MapBiomas Soil
Annual maps of soil organic carbon stock
Static maps of soil granulometry (clay, silt, and sand) and texture
This is a summary of the methodology developed and applied in MapBiomas Soil for spatial modeling of soil particle size distribution and spatiotemporal modeling of soil organic carbon stock across Brazil. For a detailed description of the data, methods, and algorithms, visit https://doi.org/10.58053/MapBiomas/3KXXVV.
1. Introduction
The MapBiomas Soil developed the second collection (beta version) of annual soil organic carbon (SOC) stock maps for Brazil, covering the period from 1985 to 2023 with a spatial resolution of 30 meters (Table 1). The maps present the soil organic carbon stock, in tons per hectare (t/ha), for the top 30 cm of soil depth.
The second collection of MapBiomas Soil also introduces new soil particle size distribution maps (beta version), expressed as the percentage of mineral fractions clay, silt, and sand, for five depth layers (0-10, 10-20, 20-30, 0-20, and 0-30 cm) (Table 1). Additionally, soil texture classification map (beta version) are presented for the 0-10, 0-20, and 0-30 cm depth layers according to three classification schemes (five, eight, and thirteen textural classes).
The maps were developed using field soil data published in the SoilData Repository (http://soildata.mapbiomas.org/) and dozens of predictor variables available in public spatial data sources. Soil property maps can be accessed through the Soil module on the MapBiomas platform (https://plataforma.brasil.mapbiomas.org/solo). On the platform, soil properties can be visualized for different territorial, environmental, and administrative areas.
Table 1. Soil property maps with 30-meter spatial resolution available in the MapBiomas Soil module for the Brazilian territory.
Soil property | Definition | Type of variable | Soil layer | Temporal resolution | Nominal year¹ |
Soil organic carbon stock (SOC) | Maps of SOC stocks | Contínua (estoques de 20 a 255 t/ha e 2 a 26 kg/m²) | 0 – 30 cm | Annual | Does not apply |
Particle size | Maps of sand, silt, and | Continuous | 0-10, 10-20, 20-30, 0-20 e 0-30 cm | Does not apply | 1990 |
Texture | Maps of soil texture classes | Categorical (five, eight, and thirteen classes) | 0-10, 10-20, 20-30, 0-20 e 0-30 cm | Does not apply | 1990 |
2. Concept
The collection of particle size distribution and soil organic carbon stock maps was produced using random forest machine learning algorithms. This algorithm models statistical relationships between field-collected soil data and predictor variables. The static predictor variables represent the soil-forming factors (climate, organisms, topography, and parent material), which explain the spatial variation of soil properties. The dynamic predictor variables, on the other hand, capture change vectors over time, such as climate changes and shifts in land use and management.
The random forest model consists of hundreds of independent regression trees, each trained with distinct portions of the training data. Each tree builds specific statistical relationships between soil data and predictor variables, simulating soil scientists working autonomously and using different data subsets to identify soil-landscape patterns.
The predictor variables cover the entire Brazilian territory, some with annual frequency throughout the 38-year period (1985–2023). This allows the application of the random forest model, trained with field samples, to unsampled areas, generating soil property estimates at the national scale. The process results in the static and annual maps of MapBiomas Soil. The workflow is presented in Figure 1, and the following sections describe the training samples, predictor variables, and the modeling and mapping process for soil properties.
Figure 1. Flowchart of the method for generating annual soil organic carbon stock maps and static soil particle size distribution and texture maps for Brazil.
3. Training samples
Soil data were collected from 262 studies published in the SoilData repository (http://soildata.mapbiomas.org/). After analysis, 182 studies were selected for the spatial modeling of soil particle size distribution (clay, silt, and sand). These studies cover 11,633 points distributed across Brazil, totaling 19,965 soil layers up to a depth of 40 cm (Figure 2). For each layer, the percentage proportions of clay, silt, and sand were transformed into "log-additive ratios" — log(clay/sand) and log(silt/sand) — a mathematical representation that enhances the stability of the modeling process.
A total of 250 datasets were selected for the space-time modeling of soil organic carbon stock. These data represent 15,729 points distributed across Brazil, with information on 29,881 soil layers. Some properties, such as coarse fragment content and soil density, were missing in many samples. These missing values were estimated through statistical models known as pedotransfer functions. After this step, 12,575 points and 19,028 layers from 236 datasets were considered suitable for composing the training samples for the space-time model of soil organic carbon stock (Figure 2).
Soil organic carbon stock (g/m²) was calculated for each layer using the equation: carbon content × (1 - coarse fragments) × soil bulk density × layer thickness. At sampling points with multiple layers up to 30 cm depth, the carbon stocks of each layer were summed to estimate the total soil organic carbon stock at the point.
Figure 2. Spatial distribution of the field training samples used in the spatial modeling of soil particle size distribution (left; n = 19,965 samples) and the spatio-temporal modeling of annual soil organic carbon stock (right; n = 12,575 samples).
The processing of field soil data was carried out locally, using R and Python, and in the cloud, using Google Earth Engine. The developed code is available under an open license in the GitHub repository at https://github.com/Laboratorio-de-Pedometria/mapbiomas-socThe data are available under the open Creative Commons Attribution CC-BY license in the SoilData repository at the following links: https://doi.org/10.60502/SoilData/P6R332 (particle size distribution) and https://doi.org/10.60502/SoilData/SXCSDK (carbon stock).
4. Predictor variables
Predictor variable data, in raster and vector formats, were obtained from open databases (Table 2). These include static maps (one for the entire series) and dynamic maps (one for each year of the series), covering the entire Brazilian territory. After integration into Google Earth Engine, the data had gaps filled and were standardized to a 30 m resolution. Soil properties were aggregated for the 0-30 cm layer, and multicategorical data were transformed into binary indicator variables (presence or absence of the category). In total, 105 plausible predictor variables for the spatial modeling of particle size distribution and the spatiotemporal modeling of carbon stock were included.
The Köppen climate classification was included to capture large-scale climate effects. Localized effects of hydrology and climate change were represented by land cover indicators with water layers throughout the series. Maps of phytophysiognomy and IBGE structural provinces were used to represent large-scale effects of primary vegetation and geology. Terrain morphometric variables, derived from the digital elevation model, were obtained from the Geomorpho90m database to represent local effects of topography.
Vegetation indices such as NDVI, SAVI, and EVI were derived from annual Landsat satellite imagery. The long-term effect of vegetation on the soil was modeled using weighted moving averages of these indices, considering the last seven years with greater weight assigned to more recent years. The temporal effect of land cover and land use was represented by the evolution of the age of each class from MapBiomas Cover 9.0. Anthropogenic and natural pressures on vegetation were incorporated with data from MapBiomas Degradation and MapBiomas Fire.
Static maps of soil properties and classes related to particle size distribution and soil organic carbon stocks were obtained from SoilGrids and FAO. However, particle size distribution maps were derived from the products generated in this Collection 2 of MapBiomas Soil. Spectral indices derived from Landsat images were used to indicate the mineral content in the soil. Territorial cuts, such as the IBGE biome classification and spatial position, were used to represent the relationships between climate, organisms, terrain, geology, and soil.
Table 2. Static and dynamic predictor variables used in the modeling of particle size distribution and annual soil organic carbon stock, associated with the soil state factor or change vector they represent.
Factor or vector | Predictor variable | Application | Type of variable |
Soil | Sand, silt, and clay content Source: MapBiomas Soil, Collection 2.0 | Carbon and particle size distribution | continuous; static |
Soil properties such as cation exchange capacity, total nitrogen content, and pH in water Source: SoilGrids 2. | Carbon | continuous; static | |
Probability of occurrence of soil classes Source: SoilGrids 1.0 and FAO GBSmap | Carbon and particle size distribution | continuous; static | |
Spectral indices indicative of minerals such as iron oxides, clay minerals, and quartz Source: Landsat Images | Carbon and particle size distribution | continuous; static | |
Climate | Köppen climate classification Source: Alvares et al. (2013) | Carbon and particle size distribution | categorical; static |
Hydrology; Climate change | Soil cover with water layer (presence/absence and absolute frequency) Source: MapBiomas, Collection 9.0 | Carbon | continuous; static |
Organisms | Classification of primary vegetation (phytophysiognomy) Source: IBGE | Carbon and particle size distribution | categorical; static |
Land cover and land use persistence and chang | Land cover and land use persistence and change Land cover and land use class age Source: MapBiomas, Collection 9.0 | Carbon | continuous; dynamic |
Vegetation indices such as NDVI, SAVI, and EVI Source: Landsat images | Carbon and particle size distribution | continuous; dynamic | |
Loss of organisms | Indicators of native vegetation degradation (distance from anthropogenic use edge) Source: MapBiomas Degradation, Beta Collection | Carbon | continuous; dynamic |
Fire occurrence indicators (cumulative occurrence; frequency over the period; years since the last fire) Source: MapBiomas Fogo, Collection 3.0 | Carbon | continuous; dynamic and static | |
Relief | Terrain morphometric variables such as slope, compound topographic index, and elevation Source: Geomorpho90m | Carbon and particle size distribution | continuous; static |
Geology | Geological classification (structural provinces) Source: IBGE | Particle size | categorical; static |
Territory or spatial position | Territorial classification by biome Source: IBGE | Carbon and particle size distribution | categorical; static |
Geographic coordinates (latitude and longitude) | Carbon | continuous; static |
5. Modeling and mapping
The training matrices for the random forest models of particle size distribution (spatial) and soil organic carbon stock (spatio-temporal) were generated by intersecting the training samples with the predictor variables. For particle size distribution, the predictor variables included layer depth and the particle size of the upper layer. The optimal configuration of parameters for each model and the final list of predictor variables for each soil property were determined after training and evaluating multiple parameter combinations, using resampling techniques such as bootstrap and cross-validation. In total, 68 and 90 predictor variables were selected for the particle size distribution and carbon stock models, respectively.
Two random forest models were trained for particle size distribution, one for each log-additive ratio log(clay/sand) and log(silt/sand). These models were applied to the predictor variables to generate maps of the log-additive ratios for the depths of 0-10, 10-20, and 20-30 cm. The generated maps were then converted to the original scale of percentage content of clay, silt, and sand, and aggregated for the 0-20 and 0-30 cm layers. Subsequently, the particle size maps were classified using three different texture schemes (five, eight, and thirteen classes) (Figure 3).
Figure 3. Soil texture classification schemes with five (left), eight (center), and thirteen (right) classes based on clay, silt, and sand content (textural triangle).
Three random forest models were trained for soil organic carbon stock, one for the Amazon biome, another for the Pampa biome, and a third for the Caatinga, Pantanal, Cerrado, and Atlantic Forest biomes. This division was based on the availability of training samples in each area. The trained models were applied to the predictor variables for each of the 38 years in the series (1985-2023). For each pixel, the results from the regression trees were aggregated by the median, considered the predicted value.
Modeling and mapping were carried out locally using R and Python, and in the cloud, using Google Earth Engine. The developed code is available under an open license in the GitHub repository at https://github.com/mapbiomasThe data is available under the open Creative Commons Attribution CC-BY license in the MapBiomas Data repository, at the following links: https://doi.org/10.58053/MapBiomas/8LQB4S (particle size distribution) and https://doi.org/10.58053/MapBiomas/TVRX77 (carbon stock).