Spectral regionalization of tropical soils in the estimation of soil attributes

Conventional soil analysis produces large amounts of residues, demands resources, and is time consuming. Building soil spectral databases for estimating soil attributes is a recent alternative for soil mapping. The objective of this study was to build spectral libraries and evaluate the quality of the prediction models generated from them for soil attributes. A total of 7185 soil spectra (400-2500 nm) were obtained in the laboratory, together with the respective soil analyses. The spectral libraries “general”, “regional”, and “local” were generated from these spectral readings. The general spectral library contained the full range of data from several states, the regional libraries contained data from geographically close municipalities, and the local libraries contained soil data from a single municipality. In general, R2 followed the sequence General (0.85), Regional (0.67 to 0.77) and Local (0.55 to 0.77). In conclusion, the best database was the general one. On the other hand, independent of the size of the database, predictive models for physical attributes such as sand, clay, and organic matter generated good predictions, with R2 of up to 0.7. Spectral libraries that included highly variable soils formed from different parent materials gave worse results for the estimation of chemical attributes and better results for the estimation of physical ones. A low range of variation in a given attribute was a limiting factor in the generation of effective predictive models. A large spectral library can improve the quantitative evaluation of soils.


INTRODUCTION
Agricultural planning requires detailed information about soil attributes. However, obtaining this information through routine analysis is often expensive and time consuming, and it generates large amounts of potentially toxic residues. Because of this, and given the growing demand for the rational use of natural resources, Chang et al. (2001) highlight the need to develop techniques that meet the needs of large-scale land use planning.
Thus, reflectance spectroscopy has emerged as a promising alternative method for estimating various soil properties (SORIANO-DISLA et al., 2014). This technique is based on the spectral behavior of a single attribute interacting with a specific wavelength of electromagnetic radiation (EMR). The energy incident on the target can promote electronic transitions at the inter- or intra-atomic level and/or the stretching and/or bending of structural groups, which characterizes the spectral response of the target (GENOT et al., 2011). The repeatability and reproducibility of the spectroscopic technique, together with the accuracy of the method, make it a reliable reference method (GENOT et al., 2011).
However, the organization and exhaustive study of spectroscopy applied to soils still need to be properly developed in order to achieve the level of reliability required for implementing the technique (VISCARRA-ROSSEL; CHEN, 2011). This requires complex statistical methods for modeling and validating the results, which become more robust as the scope of the reference database grows (GOMEZ; VISCARRA-ROSSEL; MCBRATNEY, 2008). According to Longley et al. (2013), the quality of a geographic analysis is limited by the quality of the database used and of the data modeling from which it derives. Thus, the application of spectroscopic techniques in soil science depends on the extensive and efficient characterization and registration of the spectral response of soil samples in spectral libraries (VISCARRA-ROSSEL et al., 2008). In this respect, spectral libraries are the ultimate goal in preparing the field for future soil analysis. Although quantification has been widely studied, few papers deal with spectral regionalization. In fact, there are still doubts about how to use spectral libraries and about which are best, local or regional.
The objective of this study was to evaluate the quality of predictive models for soil attributes generated from spectral libraries (obtained in the laboratory in the VIS-NIR region) containing different ranges and quantities of data. The hypothesis is that regional spectral libraries (within the same geological and climatic zone) will yield better predictive models than a single, general library.
Spectral data were obtained in the laboratory with a FieldSpec Pro spectroradiometer (Analytical Spectral Devices, Boulder, Colo.). This instrument has a sensor with a spectral resolution of 1 nm for wavelengths from 350 to 1100 nm and 2 nm for wavelengths from 1100 to 2500 nm. For the collection of reflectance data, samples corresponding to a volume of approximately 15 cm³ of soil were dried in an oven at 45 °C for 24 h (HENDERSON et al., 1992) and then ground and sieved (2 mm). Each sample was placed in a petri dish for sensor reading. The reflectance of each sample was given by the average of 100 sensor readings. The point of light capture of the equipment (the opening of the fiber optic cable) was placed in a vertical position 8 cm from the sample to measure the light reflected by an area of approximately 2 cm² in the center of the sample. The light source used was a 50 W halogen lamp with a non-collimated beam at the target surface. The light source was positioned 35 cm from the sample with a zenith angle of 35°. A white spectralon plate was used as the reference standard for 100 % reflectance.

Construction of spectral libraries
From the total spectral data collected, the following distinct databases, or spectral libraries (SLs), were derived: the General SL, which contained all of the collected spectral data; Regional SL 01 and Regional SL 02; and local SLs for Guararapes-SP, Porto Velho-AP, Goianesia-GO, and Luis Antonio-SP. Table 1 summarizes the different SLs and Figure 1 illustrates their construction.

Treatment of spectral data and statistical analysis
All reflectance data were pre-processed to improve the stability of the regressions. Reflectance (R) values were transformed to absorbance, log(1/R), and the transformed data were mean-centered. Statistical analysis of the laboratory data was performed using SPSS 11.0. The mean, standard deviation, maximum value, and minimum value were calculated for each attribute.
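The pre-processing described above can be sketched as follows. This is only a minimal illustration, assuming that reflectance is stored as a fraction between 0 and 1, that a base-10 logarithm is used, and that centering is done per wavelength across samples; the function name and the sample values are hypothetical and are not taken from the study.

```python
import math

def log_inverse_reflectance_centered(spectra):
    """Convert reflectance to log10(1/R) and mean-center each wavelength column.

    spectra: list of samples, each a list of reflectance values in (0, 1].
    Returns the centered matrix and the per-wavelength means that were removed.
    """
    absorbance = [[math.log10(1.0 / r) for r in sample] for sample in spectra]
    n_samples = len(absorbance)
    n_bands = len(absorbance[0])
    means = [sum(row[j] for row in absorbance) / n_samples for j in range(n_bands)]
    centered = [[row[j] - means[j] for j in range(n_bands)] for row in absorbance]
    return centered, means

# Hypothetical reflectance values (3 samples x 4 wavelengths), for illustration only.
reflectance = [
    [0.10, 0.20, 0.25, 0.50],
    [0.20, 0.40, 0.50, 0.50],
    [0.40, 0.80, 1.00, 0.50],
]
centered, band_means = log_inverse_reflectance_centered(reflectance)
```

The per-wavelength means are returned because any new sample would have to be centered with the means of the calibration set before a model is applied to it.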
A principal component analysis (PCA) of the spectral data was performed using the program Unscrambler 9.7. The PCA results indicate the soil variability within each spectral library. Due to the large quantity of spectral data included in the General SL (7185), it was necessary to perform a representative [...] MCBRATNEY, 2006).
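As a rough illustration of how PC scores can summarize the variability within a spectral library, the sketch below computes a PCA through a singular value decomposition with NumPy. It is not the Unscrambler procedure used in the study; the synthetic spectra (one common curve shape scaled by a per-sample brightness) and all names are assumptions made for the example.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Return PC scores and explained-variance fractions via SVD."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each wavelength
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic spectra: a common curve shape scaled by a per-sample brightness.
rng = np.random.default_rng(0)
shape = np.linspace(0.2, 0.6, 50)
albedo = rng.uniform(0.5, 1.5, size=30)
spectra = albedo[:, None] * shape[None, :] + rng.normal(0, 0.005, size=(30, 50))

scores, explained = pca_scores(spectra)
# Spread of PC1 scores: a simple indicator of variability within the library.
pc1_range = scores[:, 0].max() - scores[:, 0].min()
```

In this synthetic case PC1 tracks overall brightness almost perfectly. The sign of a principal component is arbitrary, so only the magnitude of that relationship is meaningful in the sketch.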

Calibration and validation of spectral models and quantification of attributes
For each SL, a subset of the data was used for development and calibration of the predictive models and the remainder of the data was used for subsequent validation (test samples) (Table 1). The chemometric program Unscrambler 9.7 was used to generate predictive models (calibration phase). The partial least squares regression (PLS) and projection on latent structures regression (PLSR) modules included in this program were used to develop the models. PLS regressions have been widely used and perform well in the estimation of soil attributes based on spectral behavior (VASQUES; GRUNWALD, 2008; VISCARRA-ROSSEL et al., 2008; ZORNOZA et al., 2008).
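The calibration and validation split can be illustrated with a small single-response PLS regression written with NumPy. This is a sketch of the general technique (the NIPALS form of PLS1), not the Unscrambler implementation; the synthetic data, the 70/30 split, the choice of two factors, and the function name are all assumptions for the example.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # work on centered data
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xk.T @ yk                        # weights: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores of this factor
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        c = yk @ t / tt                      # y loading
        Xk = Xk - np.outer(t, p)             # deflate X and y
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in X space
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic "spectra" driven by two latent factors (illustrative only).
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 40))
X = latent @ loadings + rng.normal(scale=0.001, size=(100, 40))
y = latent @ np.array([1.0, -0.5])

# Calibration subset and validation (test) subset.
X_cal, y_cal = X[:70], y[:70]
X_val, y_val = X[70:], y[70:]

coef, intercept = pls1_fit(X_cal, y_cal, n_factors=2)
y_pred = X_val @ coef + intercept
rmse_val = float(np.sqrt(np.mean((y_val - y_pred) ** 2)))
r2_val = float(1.0 - np.sum((y_val - y_pred) ** 2) / np.sum((y_val - y_val.mean()) ** 2))
```

The model is fitted on the calibration subset only, and the validation subset is used solely to judge the predictions.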
To calibrate the model for each of the attributes, the number of PLS factors was chosen using the results of the model validation as the criterion (validation phase). During the validation phase, each model was evaluated based on the coefficient of determination (R²), the root mean square error (RMSE) (Equation 1), the mean error (ME or bias) (Equation 2), the standard error of the mean (SEM or SDE) (Equation 3), and the ratio of prediction deviation (RPD) (WILLIANS, 1987):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (1)$$

$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right) \qquad (2)$$

$$\mathrm{SDE} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \hat{y}_i - \mathrm{ME}\right)^2} \qquad (3)$$

where $\hat{y}_i$ indicates the values estimated by the model, $y_i$ indicates the observed values, and N is the number of observations of the variable to be modeled. The difference between the observed value (reference value) and the predicted value is called the residual.
The RPD is the ratio of the standard deviation of the reference data to the RMSE for the model validation. The ME is the mean of the residuals and indicates whether a model overestimates or underestimates the values of the attribute. The SDE is the standard deviation of the residuals, where high values of SDE indicate that the model predicts some of the test samples well and predicts others poorly. The results of these figures of merit were analyzed as described in the literature (CHANG et al., 2001; DUNN et al., 2002; SAYES; MOUAZEN; RAMON, 2005).
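The figures of merit defined above can be computed as in the following sketch. It assumes that the residual is taken as observed minus predicted, that the SDE and the standard deviation of the reference data use the N - 1 denominator, and that R² is computed as one minus the ratio of residual to total sum of squares; the study may have used slightly different conventions. The function name and the numbers are hypothetical.

```python
import math
import statistics

def figures_of_merit(observed, predicted):
    """Return R2, RMSE, ME, SDE and RPD for a set of validation samples."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in residuals)
    rmse = math.sqrt(sse / n)
    me = sum(residuals) / n                      # > 0 means the model underestimates
    sde = math.sqrt(sum((e - me) ** 2 for e in residuals) / (n - 1))
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / ss_tot
    rpd = statistics.stdev(observed) / rmse      # SD of reference data over RMSE
    return {"R2": r2, "RMSE": rmse, "ME": me, "SDE": sde, "RPD": rpd}

# Hypothetical observed and predicted clay contents (g kg-1), for illustration only.
observed = [100, 200, 300, 400, 500]
predicted = [110, 190, 320, 390, 480]
merit = figures_of_merit(observed, predicted)
```

With the residual defined as observed minus predicted, a positive ME indicates that the model underestimates the attribute on average, and a negative ME that it overestimates it.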

Principal Component Analysis (PCA)
The SLs showed great variability of soils (Figure 2) because the data came from different regions and different parent materials. The first principal component (PC1) was correlated with the albedo of the samples (GALVÃO; PIZARRO; EPIPHANIO, 2001), while PC2 represents the curve shape. Samples with greater reflectance had lower PC1 scores. Reflectance is a function of the levels of Fe₂O₃, sand, clay, and organic matter and the presence of opaque minerals (BELLINASO; DEMATTÊ; ARAUJO, 2010). These attributes are related to the soil class and parent material of each sample. Therefore, SLs with high variation in PC1 scores showed high variability of soils formed by different parent materials. SLs with low variability of soils formed from the same parent material showed tighter clustering of points in the graph of PC1 vs. PC2 (Figure 2).
Regional SL 01 encompassed less variability of soils than Regional SL 02, as indicated by the lesser variation in PC1 scores.The data that made up Regional SL 01 represented regions in close geographic proximity where most soils were derived from sandstones.Regional SL 02 also encompassed data from geographically proximate regions, but the soils were derived from a greater variety of parent materials, including volcanic rocks, siltstones, argillites, and even sandstones.
Among Local SLs, the Luis Antonio-SP SL and the Porto Velho-AP SL showed greater variability of soils than the Goianesia-GO SL and the Guararapes-SP SL. In fact, soils from Porto Velho have a contribution from the Barreiras Formation, and those from Goianésia from the Serra Geral. Greater variability of soils occurs primarily in the Luis Antonio-SP region, with greater variation in the levels of clay and sand (Table 2). This region also showed high variation in PC1 scores.

Statistical analysis of reference data
The Goianesia-GO and Guararapes-SP SLs and Regional SL 01 showed a smaller range of variation in the levels of clay and sand (Table 2), confirming the PCA results (Figure 2). All SLs showed high variation in OM content (Table 2), with the Guararapes-SP SL having the least variation. The range of variation differed among the other chemical attributes. Overall, the Porto Velho-AP SL showed the least variability of the chemical attributes, followed by the Luis Antonio-SP SL and Regional SL 01 (Table 2).

Calibration and validation of predictive models generated from the spectral libraries
Determining which R² and RMSE values indicate an appropriate model is subjective, and it is difficult to compare the results of different calibrations (DUNN et al., 2002). However, Sayes, Mouazen and Ramon (2005) established that R² values between 0.50 and 0.65 indicate that the model can discriminate high and low concentrations, while R² values of approximately 0.66 to 0.81, 0.82 to 0.90, and 0.90 or greater indicate acceptable, good and excellent quantitative predictive models, respectively, for chemical attributes. Chang et al. (2001) and Dunn et al. (2002) suggested that models with RPD values lower than 1.5 should be considered insufficient for most applications, whereas models with values greater than 2.0 should be considered excellent. Models with RPD values between 1.5 and 2.0 are considered useful in relation to the accuracy of their predictions. Gogé et al. (2013) found R² > 0.75 in predictive models for a local library, with R² = 0.58 for clay, CEC, CaCO₃ and Fe contents, but when the national database was used alone, the prediction of soil properties at the local site was inaccurate for some properties. The predictive models for all SLs showed greater accuracy for physical attributes such as clay, sand, organic carbon, and organic matter than for chemical attributes (Tables 3-5). This agrees with the findings of Nanni and Demattê (2006) and, more recently, of Araújo et al. (2014).
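The interpretation thresholds cited above can be written as a small lookup, which makes it easier to apply them consistently across tables. The sketch below assumes that each quoted range is treated as a lower bound for its class (so the small gaps between ranges fall into the lower class) and that an RPD of exactly 2.0 counts as useful; values of R² below 0.50 are not classified in the text and are labeled as such. The function names are hypothetical.

```python
def classify_r2(r2):
    """Classes for R2 following the ranges attributed to Sayes, Mouazen and Ramon (2005)."""
    if r2 >= 0.90:
        return "excellent"
    if r2 >= 0.82:
        return "good"
    if r2 >= 0.66:
        return "acceptable"
    if r2 >= 0.50:
        return "discriminates high and low concentrations"
    return "not classified in the text"

def classify_rpd(rpd):
    """Classes for RPD following Chang et al. (2001) and Dunn et al. (2002)."""
    if rpd > 2.0:
        return "excellent"
    if rpd >= 1.5:
        return "useful"
    return "insufficient"

# R2 values reported in the abstract for the general library and for the
# lower bounds of the regional and local libraries.
general_class = classify_r2(0.85)
regional_low_class = classify_r2(0.67)
local_low_class = classify_r2(0.55)
```

Applied to the values reported in the abstract, the general library (0.85) falls in the good class, the lower regional bound (0.67) in the acceptable class, and the lower local bound (0.55) in the class that only discriminates high and low concentrations.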
Organic matter, organic carbon, and CEC showed good predictive ability, with R² values greater than 0.6 and RPD greater than 1.6 but lower than 2.0. The remaining attributes showed unsatisfactory results, with RPD values lower than 1.6. The attributes that produced the worst predictions were P, K, Al³⁺, CaCl₂ and pH (Table 4).
In fact, this is in agreement with the review by Soriano-Disla et al. (2014), which reported low values for these elements. The accuracy of the predictive models differed between the regional SLs (Table 4). Regional SL 01 led to good predictions for clay, sand, organic carbon, OM, Al³⁺, SB, CEC, V% and m%. Regional SL 02 generated excellent predictions for clay and sand and good predictions for organic carbon and OM.
Regional SL 01 resulted in better predictions than Regional SL 02 for P, K, Mg, Al, SB, CEC, V%, and m% and in poorer predictions for sand, clay, organic carbon, OM, and Ca. To summarize, Regional SL 02 resulted in better predictions for physical attributes, while Regional SL 01 did so for chemical attributes. Regional SL 02 had greater soil variability than Regional SL 01, as shown in Figure 2. Region 02 is located at the transition between the Peripheral Depression and the Western Plateau of São Paulo, where distinct parent materials such as sandstones, siltstones, and volcanic rocks are present, whereas Region 01 is located on the Western Plateau of São Paulo, where the parent material is mostly sandstone.
The presence of distinct parent materials causes a high diversity in the soils of Region 02, where sandy soils and clay soils with high Fe₂O₃ content can be found. Therefore, the spectral libraries with greater soil variability generated better predictive models for physical attributes such as clay, sand, and organic carbon, consistent with the results found for models generated from the general SL. Sankey et al. (2008) observed that the combination of local spectral data with global data leads to better predictions for sand and organic carbon than the use of global data or local data alone. In this context, the importance of local data and the need for data variability become evident for the generation of better predictive models.
The quality of predictions varied among Local SLs (Table 5). Excellent predictions for clay and sand were found in the model for the local SL of Porto Velho-AP, and for sand in the model of the local SL of Luis Antonio-SP. Good predictions were obtained for OM and organic carbon in the model for the local SL of Porto Velho-AP; [...] Stevens et al. (2013), working at a continental scale (Europe), reported a relatively large error (more than 4 g kg⁻¹), suggesting that organic carbon predictions from large-scale spectral libraries are not accurate. The comparison among the results of the local SLs (Table 5) demonstrated that, overall, SLs with greater soil variability led to better predictions for physical attributes and poorer predictions for chemical attributes. According to Zornoza et al. (2008), soil properties such as CEC and exchangeable bases are primarily controlled by the types of clay minerals and the types and content of organic matter, which possess functional groups of varying adsorption capacities for different cations and water.
Therefore, when working with data from various soil classes with different parent materials, there is likely to be variation in the types of clay minerals that contribute to CEC. Each type of clay mineral (oxides, 1:1 clay minerals, and 2:1 clay minerals) is associated with specific spectral bands; thus, the CEC may be correlated with different spectral bands in different soils. Therefore, if an attribute is correlated with a particular band in one soil, with another band in another soil, and so forth, this will lead to confusion in the development of predictive models. This partially explains why, in some areas of lower soil variability, the predictive models for chemical attributes are better: for the majority of samples in these areas, the predicted attributes are related to the same bands.
Another important factor in the development of predictive models for chemical attributes is the range of variation. This can be observed when comparing the results obtained for Regional SL 01 to those of the Guararapes SL. Both exhibit low soil class variability and similar parent materials, as shown in Figure 2. However, SLs with greater ranges of data variation produced better predictive models (Table 6). This is consistent with the conclusions of Dunn et al. (2002), who found that a low range of variation in a particular attribute may lead to poor predictive models. For example, Viscarra-Rossel et al. (2008) obtained excellent predictive models for Ca²⁺ from a database whose values ranged from 1.9 to 313.5 mmolc kg⁻¹. Sankey et al. (2008) presented results in which the prediction of clay and organic carbon tends to improve from global SLs to regional SLs and from regional SLs to local SLs. However, the size of the SL is perhaps not the best basis for comparison among soil spectral libraries. Rather, the variation in the data within a spectral library may be a better basis for comparison. For example, there are relatively small areas in Brazil with high soil variability and large areas with low soil variability.
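One arithmetic reason why a narrow range of variation penalizes a model can be seen directly from the definition of the RPD as the standard deviation of the reference data divided by the RMSE: for the same prediction error, a narrower range gives a lower RPD. The sketch below illustrates this with hypothetical values; it does not reproduce Table 6 and is not claimed to be the only mechanism at work.

```python
import math
import statistics

def rpd(reference_values, rmse):
    """Ratio of the standard deviation of the reference data to the RMSE."""
    return statistics.stdev(reference_values) / rmse

# Two hypothetical validation sets for the same attribute and units,
# assumed to be predicted with the same RMSE.
narrow_range = [10, 12, 14, 16, 18]
wide_range = [10, 30, 50, 70, 90]
same_rmse = 4.0

rpd_narrow = rpd(narrow_range, same_rmse)   # below 1.5
rpd_wide = rpd(wide_range, same_rmse)       # above 2.0
```

Under these assumed numbers the same absolute error falls in the insufficient class for the narrow data set and in the excellent class for the wide one, which is why figures of merit should be read together with the range of the reference data.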
No general rule was found by Gogé et al. (2013) for the use of local or global models. For CaCO₃, for example, the best strategy was to use the global library without local samples, whereas for clay content the prediction model was better when 50 local samples were used (representing 35 % of the total local samples).
The discussion of whether local or general libraries are best for predicting soil attributes was raised by Henderson et al. (1992), who already noted this type of difficulty. Later, Demattê and Garcia (1999), working in Paraná State, Brazil, observed that local models were significantly better than general ones.

CONCLUSIONS
1. Greater or lesser variability in soil classes formed from different parent materials influences the efficiency of the predictive models generated from a spectral library;
2. Variability of soil and parent materials within a spectral library impairs the accuracy of predictive models for chemical attributes and improves predictive models for physical attributes;
3. A spectral library aimed at predicting chemical attributes should have low soil variability, and the data used to calibrate the models should have a large range of variation of the modeled attribute;
4. The results indicate the sequence of R² for General (0.85), Regional (0.67 to 0.77) and Local (0.55 to 0.77). Despite this, in all cases the models still indicate good accuracy for soil quantification;
5. Further research on the subject is still necessary to arrive at the best database.

Figure 1 - Construction of Spectral Libraries

Figure 2 - Principal Components Analysis of General, Regional and Local Spectral Libraries

Table 1 - Characterization of Spectral Libraries

Table 2 - Statistical analysis of the attributes for General, Regional and Local Spectral Libraries

Table 3 - Internal validation of the calibration models generated by General, Regional and Local Spectral Libraries
¹R²: coefficient of determination; RMSE: root mean square error; NF: number of factors. *O.M.: organic matter; SB: sum of bases; CEC: cation exchange capacity; V: base saturation; m: aluminum saturation; x: there is no data.

Table 4 - Results of model validation for estimating attributes generated from General, Regional and Local Spectral Libraries

Table 5 - Results of model validation for estimating attributes generated from Local Spectral Libraries

Table 6 - The influence of the range of data variation on the quality of predictive models