MapBiomas Soil

Annual maps of soil organic carbon stock
Static maps of soil granulometry (clay, silt, and sand) and texture

This is a summary of the methodology developed and applied in MapBiomas Soil for spatial modeling of soil particle size distribution and spatiotemporal modeling of soil organic carbon stock across Brazil. For a detailed description of the data, methods, and algorithms, visit https://doi.org/10.58053/MapBiomas/3KXXVV.

1. Introduction

The MapBiomas Soil developed the second collection (beta version) of annual soil organic carbon (SOC) stock maps for Brazil, covering the period from 1985 to 2023 with a spatial resolution of 30 meters (Table 1). The maps present the soil organic carbon stock, in tons per hectare (t/ha), for the top 30 cm of soil depth.

The second collection of MapBiomas Soil also introduces new soil particle size distribution maps (beta version), expressed as the percentage of mineral fractions clay, silt, and sand, for five depth layers (0-10, 10-20, 20-30, 0-20, and 0-30 cm) (Table 1). Additionally, soil texture classification map (beta version) are presented for the 0-10, 0-20, and 0-30 cm depth layers according to three classification schemes (five, eight, and thirteen textural classes).

The maps were developed using field soil data published in the SoilData Repository (http://soildata.mapbiomas.org/) and dozens of predictor variables available in public spatial data sources. Soil property maps can be accessed through the Soil module on the MapBiomas platform (https://plataforma.brasil.mapbiomas.org/solo). On the platform, soil properties can be visualized for different territorial, environmental, and administrative areas.

Table 1. Soil property maps with 30-meter spatial resolution available in the MapBiomas Soil module for the Brazilian territory.

Soil propertyDefinitionType of variableSoil layerTemporal resolutionNominal year¹
Soil organic carbon stock (SOC)Maps of SOC stocksContínua (estoques de 20  a 255  t/ha e 2 a 26 kg/m²)0 – 30 cmAnnualDoes not apply
Particle sizeMaps of sand, silt, andContinuous0-10, 10-20, 20-30, 0-20 e 0-30 cmDoes not apply1990
TextureMaps of soil texture classesCategorical (five, eight, and thirteen classes)0-10, 10-20, 20-30, 0-20 e 0-30 cmDoes not apply1990
¹ The "nominal year" refers to the year (e.g., 1990) assigned to the particle size and texture maps, representing field data from multiple years (1958–2023); this simplifies the application of the data in modeling exercises.

2. Concept

The collection of particle size distribution and soil organic carbon stock maps was produced using random forest machine learning algorithms. This algorithm models statistical relationships between field-collected soil data and predictor variables. The static predictor variables represent the soil-forming factors (climate, organisms, topography, and parent material), which explain the spatial variation of soil properties. The dynamic predictor variables, on the other hand, capture change vectors over time, such as climate changes and shifts in land use and management.

The random forest model consists of hundreds of independent regression trees, each trained with distinct portions of the training data. Each tree builds specific statistical relationships between soil data and predictor variables, simulating soil scientists working autonomously and using different data subsets to identify soil-landscape patterns.

The predictor variables cover the entire Brazilian territory, some with annual frequency throughout the 38-year period (1985–2023). This allows the application of the random forest model, trained with field samples, to unsampled areas, generating soil property estimates at the national scale. The process results in the static and annual maps of MapBiomas Soil. The workflow is presented in Figure 1, and the following sections describe the training samples, predictor variables, and the modeling and mapping process for soil properties.


Figure 1. Flowchart of the method for generating annual soil organic carbon stock maps and static soil particle size distribution and texture maps for Brazil.

3. Training samples

Soil data were collected from 262 studies published in the SoilData repository (http://soildata.mapbiomas.org/). After analysis, 182 studies were selected for the spatial modeling of soil particle size distribution (clay, silt, and sand). These studies cover 11,633 points distributed across Brazil, totaling 19,965 soil layers up to a depth of 40 cm (Figure 2). For each layer, the percentage proportions of clay, silt, and sand were transformed into "log-additive ratios" — log(clay/sand) and log(silt/sand) — a mathematical representation that enhances the stability of the modeling process.

A total of 250 datasets were selected for the space-time modeling of soil organic carbon stock. These data represent 15,729 points distributed across Brazil, with information on 29,881 soil layers. Some properties, such as coarse fragment content and soil density, were missing in many samples. These missing values were estimated through statistical models known as pedotransfer functions. After this step, 12,575 points and 19,028 layers from 236 datasets were considered suitable for composing the training samples for the space-time model of soil organic carbon stock (Figure 2).

Soil organic carbon stock (g/m²) was calculated for each layer using the equation: carbon content × (1 - coarse fragments) × soil bulk density × layer thickness. At sampling points with multiple layers up to 30 cm depth, the carbon stocks of each layer were summed to estimate the total soil organic carbon stock at the point.


Figure 2. Spatial distribution of the field training samples used in the spatial modeling of soil particle size distribution (left; n = 19,965 samples) and the spatio-temporal modeling of annual soil organic carbon stock (right; n = 12,575 samples).

The processing of field soil data was carried out locally, using R and Python, and in the cloud, using Google Earth Engine. The developed code is available under an open license in the GitHub repository at https://github.com/Laboratorio-de-Pedometria/mapbiomas-socThe data are available under the open Creative Commons Attribution CC-BY license in the SoilData repository at the following links: https://doi.org/10.60502/SoilData/P6R332 (particle size distribution) and https://doi.org/10.60502/SoilData/SXCSDK (carbon stock).

4. Predictor variables

Predictor variable data, in raster and vector formats, were obtained from open databases (Table 2). These include static maps (one for the entire series) and dynamic maps (one for each year of the series), covering the entire Brazilian territory. After integration into Google Earth Engine, the data had gaps filled and were standardized to a 30 m resolution. Soil properties were aggregated for the 0-30 cm layer, and multicategorical data were transformed into binary indicator variables (presence or absence of the category). In total, 105 plausible predictor variables for the spatial modeling of particle size distribution and the spatiotemporal modeling of carbon stock were included.

The Köppen climate classification was included to capture large-scale climate effects. Localized effects of hydrology and climate change were represented by land cover indicators with water layers throughout the series. Maps of phytophysiognomy and IBGE structural provinces were used to represent large-scale effects of primary vegetation and geology. Terrain morphometric variables, derived from the digital elevation model, were obtained from the Geomorpho90m database to represent local effects of topography.

Vegetation indices such as NDVI, SAVI, and EVI were derived from annual Landsat satellite imagery. The long-term effect of vegetation on the soil was modeled using weighted moving averages of these indices, considering the last seven years with greater weight assigned to more recent years. The temporal effect of land cover and land use was represented by the evolution of the age of each class from MapBiomas Cover 9.0. Anthropogenic and natural pressures on vegetation were incorporated with data from MapBiomas Degradation and MapBiomas Fire.

Static maps of soil properties and classes related to particle size distribution and soil organic carbon stocks were obtained from SoilGrids and FAO. However, particle size distribution maps were derived from the products generated in this Collection 2 of MapBiomas Soil. Spectral indices derived from Landsat images were used to indicate the mineral content in the soil. Territorial cuts, such as the IBGE biome classification and spatial position, were used to represent the relationships between climate, organisms, terrain, geology, and soil.

Table 2. Static and dynamic predictor variables used in the modeling of particle size distribution and annual soil organic carbon stock, associated with the soil state factor or change vector they represent.

Factor or vectorPredictor variableApplicationType of variable
SoilSand, silt, and clay content Source: MapBiomas Soil, Collection 2.0Carbon and particle size distributioncontinuous; static
Soil properties such as cation exchange capacity, total nitrogen content, and pH in water Source: SoilGrids 2.Carboncontinuous; static
Probability of occurrence of soil classes Source: SoilGrids 1.0 and FAO GBSmapCarbon and particle size distributioncontinuous; static
Spectral indices indicative of minerals such as iron oxides, clay minerals, and quartz Source: Landsat ImagesCarbon and particle size distributioncontinuous; static
ClimateKöppen climate classification Source: Alvares et al. (2013)Carbon and particle size distributioncategorical; static
Hydrology; Climate changeSoil cover with water layer (presence/absence and absolute frequency) Source: MapBiomas, Collection 9.0Carboncontinuous; static
OrganismsClassification of primary vegetation (phytophysiognomy) Source: IBGECarbon and particle size distributioncategorical; static
Land cover and land use persistence and changLand cover and land use persistence and change Land cover and land use class age Source: MapBiomas, Collection 9.0Carboncontinuous; dynamic
Vegetation indices such as NDVI, SAVI, and EVI Source: Landsat imagesCarbon and particle size distributioncontinuous; dynamic
Loss of organismsIndicators of native vegetation degradation (distance from anthropogenic use edge) Source: MapBiomas Degradation, Beta CollectionCarboncontinuous; dynamic
Fire occurrence indicators (cumulative occurrence; frequency over the period; years since the last fire) Source: MapBiomas Fogo, Collection 3.0Carboncontinuous; dynamic and static
ReliefTerrain morphometric variables such as slope, compound topographic index, and elevation Source: Geomorpho90mCarbon and particle size distributioncontinuous; static
GeologyGeological classification (structural provinces) Source: IBGEParticle sizecategorical; static
Territory or spatial positionTerritorial classification by biome Source: IBGECarbon and particle size distributioncategorical; static
Geographic coordinates (latitude and longitude)Carboncontinuous; static
The processing of the predictor variables was carried out in the cloud using Google Earth Engine

5. Modeling and mapping

The training matrices for the random forest models of particle size distribution (spatial) and soil organic carbon stock (spatio-temporal) were generated by intersecting the training samples with the predictor variables. For particle size distribution, the predictor variables included layer depth and the particle size of the upper layer. The optimal configuration of parameters for each model and the final list of predictor variables for each soil property were determined after training and evaluating multiple parameter combinations, using resampling techniques such as bootstrap and cross-validation. In total, 68 and 90 predictor variables were selected for the particle size distribution and carbon stock models, respectively.

Two random forest models were trained for particle size distribution, one for each log-additive ratio log(clay/sand) and log(silt/sand). These models were applied to the predictor variables to generate maps of the log-additive ratios for the depths of 0-10, 10-20, and 20-30 cm. The generated maps were then converted to the original scale of percentage content of clay, silt, and sand, and aggregated for the 0-20 and 0-30 cm layers. Subsequently, the particle size maps were classified using three different texture schemes (five, eight, and thirteen classes) (Figure 3).


Figure 3. Soil texture classification schemes with five (left), eight (center), and thirteen (right) classes based on clay, silt, and sand content (textural triangle).

Three random forest models were trained for soil organic carbon stock, one for the Amazon biome, another for the Pampa biome, and a third for the Caatinga, Pantanal, Cerrado, and Atlantic Forest biomes. This division was based on the availability of training samples in each area. The trained models were applied to the predictor variables for each of the 38 years in the series (1985-2023). For each pixel, the results from the regression trees were aggregated by the median, considered the predicted value.

Modeling and mapping were carried out locally using R and Python, and in the cloud, using Google Earth Engine. The developed code is available under an open license in the GitHub repository at https://github.com/mapbiomasThe data is available under the open Creative Commons Attribution CC-BY license in the MapBiomas Data repository, at the following links: https://doi.org/10.58053/MapBiomas/8LQB4S (particle size distribution) and https://doi.org/10.58053/MapBiomas/TVRX77 (carbon stock).