Mutus Tech

Agricultural Carbon Flux Dataset: advancing multimodal environmental monitoring

8 June 2025

A multimodal carbon flux dataset combining climate variables, remote sensing imagery and soil properties to support high-resolution carbon cycle modelling.

At Mutus Tech, we have developed a multimodal carbon flux dataset that combines climate variables, remote sensing imagery and soil properties to support high-resolution carbon cycle modelling and AI-enabled environmental forecasting.

Multimodal dataset composition

The dataset combines several environmental data sources:

  • Climate time series — hourly meteorological variables (temperature, precipitation, solar radiation, humidity, wind speed and direction).
  • Remote sensing imagery — MODIS surface reflectance, vegetation indices (NDVI, EVI) and land cover classifications.
  • Soil characteristics — derived from SoilGrids: pH, organic carbon, sand/silt/clay proportions, and water holding capacity.
  • Carbon flux targets — hourly measurements from eddy covariance towers: gross primary productivity (GPP), ecosystem respiration (Reco) and net ecosystem exchange (NEE).

Each sample is linked to its corresponding timestamp, geographic location and ecosystem type — for example, cropland, grassland or forest — making it suitable for spatio-temporal modelling.
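
As an illustration of how one such sample might be represented in code, here is a minimal sketch. The class name `FluxSample`, its field names, the units in the comments and the `validate` helper are all assumptions made for this example; they do not describe the dataset's actual file format or schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Ecosystem types named in the text; the real dataset may use other labels.
ECOSYSTEM_TYPES = {"cropland", "grassland", "forest"}


@dataclass(frozen=True)
class FluxSample:
    """One aligned observation (hypothetical layout, for illustration only)."""
    site_id: str
    timestamp: datetime          # assumed to be timezone-aware UTC
    latitude: float              # decimal degrees
    longitude: float             # decimal degrees
    ecosystem_type: str
    climate: dict[str, float]    # e.g. {"air_temp_c": 18.2}
    soil: dict[str, float]       # e.g. {"ph": 6.4}
    vegetation: dict[str, float] # e.g. {"ndvi": 0.71}
    nee: float                   # flux targets; units assumed, not stated in the source
    gpp: float
    reco: float


def validate(sample: FluxSample) -> None:
    """Raise ValueError if the spatio-temporal keys are implausible."""
    if not -90.0 <= sample.latitude <= 90.0:
        raise ValueError(f"latitude out of range: {sample.latitude}")
    if not -180.0 <= sample.longitude <= 180.0:
        raise ValueError(f"longitude out of range: {sample.longitude}")
    if sample.ecosystem_type not in ECOSYSTEM_TYPES:
        raise ValueError(f"unknown ecosystem type: {sample.ecosystem_type}")
    if sample.timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
```

Keying every record on site, timestamp and location in this way is what lets a model treat the data as a spatio-temporal series rather than as independent rows.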

Structured input modalities

Level 1 — core climatic inputs

  • Temperature (2 m and soil)
  • Precipitation
  • Radiation
  • Wind (u/v components)
  • Vapour pressure deficit (VPD)
  • Surface pressure
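
Two of the Level 1 inputs, VPD and the wind u/v components, are usually derived from the raw meteorological variables (temperature, humidity, wind speed and direction). The sketch below shows one common way to do this. It assumes the Tetens-type saturation vapour pressure formula used in FAO-56 and the meteorological convention that wind direction is the bearing the wind blows from; the function names are hypothetical, and the dataset's own preprocessing may differ.

```python
import math


def saturation_vapour_pressure_kpa(temp_c: float) -> float:
    """Saturation vapour pressure in kPa (Tetens-type formula, as in FAO-56)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd_kpa(temp_c: float, rh_percent: float) -> float:
    """Vapour pressure deficit in kPa from air temperature and relative humidity."""
    es = saturation_vapour_pressure_kpa(temp_c)
    return es * (1.0 - rh_percent / 100.0)


def wind_uv(speed: float, direction_deg: float) -> tuple[float, float]:
    """Convert speed and 'from' direction (degrees clockwise from north) to u/v.

    u is the eastward component and v the northward component.
    """
    rad = math.radians(direction_deg)
    u = -speed * math.sin(rad)
    v = -speed * math.cos(rad)
    return u, v


# At 20 °C and 50 % relative humidity the deficit is roughly 1.17 kPa.
print(round(vpd_kpa(20.0, 50.0), 2))
# A 5 m/s westerly (from 270°) blows towards the east, so u is about +5 and v about 0.
print(wind_uv(5.0, 270.0))
```

Using u/v components instead of raw direction avoids the discontinuity at 0°/360°, which is awkward for most regression models.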

Level 2 — land and soil features

  • Soil pH
  • Soil moisture and texture
  • Land use/land cover class
  • Elevation and slope
  • Vegetation indices and biophysical variables (NDVI, LAI, FPAR)
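
For readers less familiar with the vegetation features, NDVI is computed from red and near-infrared surface reflectance. The sketch below uses the standard definition; the function name and the choice to return `None` for a zero denominator are assumptions for this example, not a description of how the dataset was processed.

```python
from typing import Optional


def ndvi(nir: float, red: float) -> Optional[float]:
    """Normalised difference vegetation index from NIR and red reflectance.

    Returns None when both reflectances sum to zero (for example, a
    masked or missing pixel), since the ratio is undefined there.
    """
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator


# Dense green vegetation reflects strongly in NIR and weakly in red.
print(ndvi(nir=0.5, red=0.1))
```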

Level 3 — target and historical signals

  • Net ecosystem exchange (NEE)
  • Gross primary productivity (GPP)
  • Ecosystem respiration (Reco)
  • Historical flux trends
  • Management practices (cropping, irrigation, fertilisation, where available)
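
The three flux targets are not independent. Under the sign convention commonly used for eddy covariance data, NEE equals Reco minus GPP, so a negative NEE means net carbon uptake by the ecosystem. The source does not state which convention this dataset follows, so the sketch below is an assumption, and the helper names are hypothetical.

```python
def nee_from_components(gpp: float, reco: float) -> float:
    """NEE under the assumed convention NEE = Reco - GPP (negative = uptake)."""
    return reco - gpp


def is_consistent(nee: float, gpp: float, reco: float, tol: float = 1e-6) -> bool:
    """Check that a record's NEE matches its partitioned GPP and Reco."""
    return abs(nee - nee_from_components(gpp, reco)) <= tol


# A daytime record where photosynthesis exceeds respiration.
print(nee_from_components(gpp=10.0, reco=4.0))  # -6.0
```

A check like `is_consistent` can flag records whose sign convention or units differ from the rest before they are used as training targets.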

Real-world and simulated samples

To support robust carbon modelling in agricultural ecosystems, we have curated data from 39 real-world eddy covariance tower sites in cropland regions of North America and Europe. The sites span a range of climatic and soil conditions.

The dataset includes:

  • 39 real-world agricultural flux tower sites
  • Multimodal inputs at daily to hourly resolution (climate, satellite imagery, soil properties)
  • Millions of aligned observations, suitable for model training and temporal generalisation

This forms a scientifically grounded benchmark for evaluating multimodal spatio-temporal learning approaches.
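
Because the inputs arrive at different temporal resolutions, they need to be brought onto a common time step before training. The following sketch shows one simple approach under stated assumptions: hourly values are averaged per calendar day and then joined with a daily series on the days both have in common. The function names and the choice of a daily mean are illustrative only and do not describe the dataset's actual alignment procedure.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean


def hourly_to_daily_mean(hourly: list[tuple[datetime, float]]) -> dict[date, float]:
    """Average hourly (timestamp, value) pairs into one value per calendar day."""
    buckets: dict[date, list[float]] = defaultdict(list)
    for timestamp, value in hourly:
        buckets[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(buckets.items())}


def align_daily(
    flux_daily: dict[date, float], ndvi_daily: dict[date, float]
) -> list[tuple[date, float, float]]:
    """Keep only the days present in both series, in chronological order."""
    common_days = sorted(flux_daily.keys() & ndvi_daily.keys())
    return [(day, flux_daily[day], ndvi_daily[day]) for day in common_days]


# Toy example: two days of hourly NEE joined with a daily NDVI series.
hourly_nee = [
    (datetime(2020, 6, 1, 10), -4.0),
    (datetime(2020, 6, 1, 11), -6.0),
    (datetime(2020, 6, 2, 10), -2.0),
]
daily_ndvi = {date(2020, 6, 1): 0.70, date(2020, 6, 3): 0.72}
print(align_daily(hourly_to_daily_mean(hourly_nee), daily_ndvi))
```

Here only 1 June survives the join, because it is the only day with both a flux value and an NDVI value. A production pipeline would also need to handle time zones, gap-filling and quality flags, which this sketch omits.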

Applications and ongoing work

The dataset supports the development of spatio-temporal models for:

  • Carbon flux forecasting under climate change
  • Agricultural greenhouse gas emission assessment
  • Remote sensing-based carbon cycle analytics
  • AI-enabled land management decision tools

We continue to expand the dataset by integrating new sources (for example, ERA5-Land, Sentinel, drone imagery) and validating results through international collaborations and flux site campaigns.

Sample data

A representative sample of the carbon flux dataset is available for research and testing. It includes formatted input features, aligned output labels and metadata needed for model development. To request access, please contact us.