Burrough and McDonnell (1998, p.155, 2015 Ch.9) describe Kriging as a multi-stage process of interpolation that builds upon calculation and modeling of the experimental variogram. Slightly modifying their original summary we have the following stages:
•examine the data for Normality and trends — transform where necessary. Note that Kriging only requires Normality in order to create confidence intervals (probability maps). Whatever the data distribution, Kriging will provide a best unbiased predictor of values at unsampled points
•compute the experimental variogram and fit a suitable model. If the nugget value is very high then interpolation is not sensible
•check the model by cross-validation, and/or by the inclusion of a separate validation file (e.g. additional measurements not included in the modeling process). Examine the detailed validation results for both size and sign of deviations, and any unexpected patterns that these show (e.g. spikes, spatial bias)
•decide on whether to apply Kriging or conditional simulation (see further, below — conditional simulation is becoming increasingly widely used, but is not supported in many geostatistical packages — GSLIB and GS+ and Isatis are exceptions). Kriging is effectively a smoothing interpolator whereas conditional simulation is not
•if using Kriging, then use the variogram model to interpolate sites onto a regular grid where the sites are either equal in size to the original sample (Point Kriging) or are larger blocks of land (Block Kriging). Block Kriging generates smoother surfaces ― it estimates average values over a block. Kriging is not exact if a nugget is included
•display results as grid cell maps or contours, singly or draped over other data layers. Both predictions and standard deviations of predictions should be computed and mapped in most cases. If the random errors are Normally distributed and satisfy stationarity conditions then probability, quantile or confidence maps can be produced
The basic geostatistical interpolation procedure, assuming Ordinary Kriging (OK) for the time being, is essentially identical to that used in deterministic interpolation, notably interpolation using radial basis functions (see Section 6.6.4, Radial basis and spline functions). The values at unsampled (grid) points are computed as a simple linear weighted average of neighboring measured data points, where the (optimal) weights are determined from the fitted variogram rather than determined by the user. For a specific grid point, p, we have:
with the condition that the weights must add up to 1:
In Kriging these weights may be positive or negative. The condition that the weights must sum to 1 is equivalent to a process of re-estimating the mean value at each new location. Hence Ordinary Kriging is essentially the same as Simple Kriging (see below) with a location-dependent mean. Note that this allows for variation of the mean value from location to location, it is assumed that the mean value is fixed within the search window used for computations. If the mean values changes substantially over very short distances then it is described as non-stationary and alternative methods such as Universal Kriging should be considered.
Kriging assumes a general form of surface model which is a mix of:
•a deterministic model m(x,y), sometimes called the structural component of z at locations (x,y) — in OK this is assumed to be an unknown mean value, i.e. a constant, within a given neighborhood under consideration
•a regionalized statistical variation from z(x,y), which we will denote e1(x,y) — in OK and other Kriging methods this regionalized variation is determined by modeling the semivariance and using the fitted function in the interpolation process
•a random noise (Normal error) component, with 0 mean, which we will denote e2(x,y) (actually two components, measurement error and random or “white” noise, which we cannot generally separate)
In equation form this model is of the form:
or combining the two error terms and using Greek letter notation:
If the vector s is used to represent the surface coordinates (x,y) then the standard model is often written as:
If there is no overall trend or drift present then m(x,y) or μ(s) is simply the overall (local) mean value of the sampled data. If a marked trend is observed this may be removed before modeling in most packages (usually a linear or quadratic regression surface) and then added back in at the end of the process. The procedure for Ordinary Kriging is essentially the same as we provided earlier for radial basis interpolation (Section 6.6.4), but with the second step using the modeled semivariances rather than a user-selected function. As before, the steps are:
•Compute the nxn matrix, D, of inter-point distances between all (x,y) pairs in the source dataset (or a selected subset of these)
•Apply the chosen variogram model, φ(), to each distance in D, to produce a new array Φ
•Augment Φ with a unit column vector and a unit row vector, plus a single entry 0 in position (n+1),(n+1). Call this augmented array A — see below for an illustration
•Compute the column vector r of distances from a grid point, p, to each of the n source data points used to create D
•Apply the chosen variogram model, φ(), to each distance in r, to produce a column vector φ=φ(r) of modeled semivariances augmented with a single 1 as the last entry. Call this augmented column vector c — see below for an illustration
•Compute the matrix product b=A‑1c. This provides the set of n weights, λi, to be used in the calculation of the estimated value at p, plus the Lagrangian value, m, which is utilized in the calculation of variances as described below
In matrix form the system of linear equations being solved is of the form:
or more simply Ab=c, hence b=A‑1c.
The estimated value for point p is computed using the n weights, λi, (but ignoring the Lagrangian, m) in the normal manner:
or using matrix notation:
The estimated variance at point p is then determined from the computed weights, times the modeled semivariance values, plus the Lagrangian, m:
or again using matrix notation:
The variance estimates can be plotted directly and/or used to create confidence intervals for the modeled data surface under the (possibly risky) assumption that modeled random errors are distributed as Normal variates and are second-order stationary. Note that if the estimated points, p, lie on a fine grid, then average predicted values over discrete subsets of the grid (blocks) will be similar to those obtained by applying block Kriging to the same area, albeit with far more computation.
Goodness of fit
Having undertaken Kriging interpolation a variety of methods and procedures exist to enable the quality of the results, or goodness of fit, to be evaluated. These include:
Residual plots: (sometimes referred to as error plots) and standard deviation maps; maps of residuals from trend analyses
Crossvalidation: this typically involves removing each observed data point one at a time (leaving the model otherwise unaltered) and then calculating the predicted values at these points. Best fit models should minimize residuals (often referred to as errors) in these predictions: minimizing maximum residuals/errors; minimizing total sum of squared residuals/errors; removing extreme residual values (outliers/invalid results)…. Some packages allow the validation results to be exported for further examination and statistical analysis
Jackknifing: see Efron (1982) and Efron and Tibshirani (1997) for more details. This is essentially a form of cross-validation achieved by statistically re-sampling the source dataset. Typically one or more subsets consisting of x% of data points are removed at random (without replacement) and statistics of interest calculated on these subsets, which are then compared to the global value. Note that the term bootstrapping, often used alongside jackknifing, refers to procedures involving re-sampling with replacement
Re-sampling the source material (field data): an expensive or impractical option in many cases, but a realistic option in others
Detailed modeling of related datasets (e.g. easy to measure variables)
Detailed modeling incorporating related datasets: look at possible stratification of the source dataset (e.g. using underlying geological information); explicitly model non-stationarity, if appropriate; explicitly incorporate boundaries/faults or similar linear or areal elements that may substantive affect the results
Comparison with independent data (e.g. satellite data or other aerial imagery)
Examination of the fit for possible artifacts and bull’s-eyes (a form of circular artifact) — these may be generated by the interpolation process and/or be a feature inherited from the underlying dataset
Simple Kriging assumes that the data have a known, constant, mean value throughout the study area (i.e. a global mean value, in practice 0, is a suitable assumed value for modeling purposes) and exhibits second order stationarity. These assumptions are overly restrictive for most problems and hence this method is rarely used. De-trending and z-score (Normal) transformation may help to remove some of these problems. Ordinary Kriging (OK) and its variants have more relaxed assumptions than Simple Kriging. OK is the mostly widespread procedure implemented in GIS packages.
Ordinary Kriging (OK), plus variants of OK, is the most widespread procedure of this type implemented in GIS packages. OK has been described in more detail at the start of Section 6.7.2. As with other forms of Kriging, OK methods may use point or block computations, the latter resulting in a smoothed surface and inexact interpolation. As noted earlier, if the vector s is used to represent the surface coordinates (x,y) then the standard model is often written as:
Figure 6‑46 shows the result of applying Ordinary Kriging to the zinc dataset discussed earlier (Figure 6‑39 and Figure 6‑40; from Burrough and McDonnell, 1998, Appendix 3). Figure 6‑46A provides a map of the predicted z-values, whilst Figure 6‑46B shows the estimated standard deviations. In each of these the variogram model is based on the simple isotropic exponential formula shown earlier (Figure 6‑40) and the analysis is based on untransformed data. The steps involved in generating Figure 6‑46A are as follows:
•compute D=d(i,j), the inter-point distance matrix for all 98 sample points, z.
•Use these distances to compute matrix Φ=c0+c1(1-e-d/a0), the modeled variogram values. Augment Φ with unit column and row vectors plus 0 in the lower right hand corner to produce matrix A
•compute the inverse matrix A1=A‑1
•define a rectangular grid, 100x100 say, from min-max values of (x,y) for interpolated values and mapping purposes
•for every grid point, k: compute r(k)=d(i,k), the distance vector, r, from grid point k to sample points, z; then compute modeled variogram values r1 using r1(k)=c0+c1(1-e-r/a0). These steps involve applying the chosen variogram model, φ(), to each distance in r, to produce a column vector φ=φ(r)
•compute w(k)= A1*b1(k); these are the weights for point k
•then est(k)=Sum(w(k)*z(i)), i.e. the estimated value at point k is a weighted sum of the original data values, z
A. z-values (predictions)
B. Standard deviations
The predicted z-values appear as a smoothly varying map, whilst the predicted standard deviations appear somewhat as bull’s-eyes around the original source data points. To the east of the region, where there are no source data points, uncertainty is much greater and the values of the results in this area are of little value. To the west of the region is a masked-off area, a facility which is explicitly supported within some packages. The fact that there are no data points in extensive parts of the sampled region presents problems, including issues of apparent anisotropy and questions as to the appropriateness of a broadly radial modeling approach. Substantial improvements to this approach (in terms of standard deviations) are reported by Burrough and McDonnell (1998, p.156) by the use of stratification (see further below).
Also called ‘Kriging with a trend’, or KT Kriging, Universal Kriging (UK) is a procedure that uses a regression model as part of the Kriging process, typically modeling the unknown local mean values as having a local linear or quadratic trend. UK with no trend is the same as Ordinary Kriging (OK). Ordinary Kriging with a deterministic trend is often used in preference to UK as the modeling process is simpler and the results using the two approaches are often very similar.
Median-Polishing and Kriging
As an alternative to Universal Kriging (UK), which involves identification and removal of trends in the data, some packages (such as Manifold) support a technique known as median-polishing. This involves finding the median of observed (sampled) data values in a neighborhood (typically using a rectangular grid), generating row and column medians of these values in an iterative process (‘polishing’ the dataset). The median values are then subtracted from the original data points as local estimates of the mean value and Kriging of the residuals is carried out using either Ordinary Kriging (OK) or UK. For a discussion and application of such methods see Berke (2000, 2004).
Indicator Kriging is essentially the same as Ordinary Kriging, but using the binary data generated using the thresholding procedure described earlier and illustrated in Figure 6‑43. Indicator Kriging can be used with a nugget effect, and Co-Kriging (see below) of indicator data is also possible and supported in some packages. In the latter case, a second thresholding procedure is used and then both binary variables employed to predict values at unknown points for the first indicator variable. Note that the output does not provide a prediction of actual data values at grid points. The standard model for Indicator Kriging is often written as:
where μ is an unknown constant and I(s) is a binary variable.
Like Indicator Kriging, Probability Kriging is a non-linear method employing indicator variables. It can be seen as a form of Co-Kriging in which the first variable is the indicator and the second variable is the original (un-transformed) data. As with Indicator Kriging the output does not provide a prediction of actual data values at grid points. With Probability Kriging the Indicator Kriging model is extended to the form:
where μ1 and μ2 are unknown constants and I(s) is a binary variable, as before. This procedure may be compared with that of Co-Kriging (see further, below).
Disjunctive Kriging is a non-linear procedure in which the original dataset is transformed using a series of additive functions, typically Hermite polynomials (see further, below). In this case the standard model is altered to the form:
where F() is the function in question. Indicator Kriging can be regarded as a special case of Disjunctive Kriging since the original data are transformed by the Indicator function. Disjunctive Kriging assumes pairwise bivariate Normality in the data, which is rarely verifiable in spatial datasets. A z-score (Normal) transformation of the data prior to Disjunctive Kriging is often desirable.
Hermite polynomials, Hn(x), are a set of orthogonal polynomials that are most simply defined using the recurrence relation:
with the first 3 terms being H0(x)=1, H1(x)=2x and H2(x)=4x2‑2. The range of x is [‑∞,+∞], whereas most orthogonal polynomials have a more restricted range.
Non-stationary Modeling and Stratified Kriging
Variogram analysis and Kriging are often applied to a complete dataset as a single, global process. However, it may be desirable to sub-divide the study region into separate regions or strata, based on additional information. Stratification might be based on underlying geological formations, frequency of flooding, land and water regions, vegetation cover or other attributes that are of relevance to the problem at hand. In such cases the size and shape of sub-regions, and the effect of edges on the computed variograms, needs to be considered. Interpolated values using different models and parameters for each region will result in discontinuities at sub-region boundaries unless Kriging equations and/or the resulting surfaces are adjusted to support continuity. Currently, general purpose software does not support such procedures directly, but they may be implemented using standard facilities by manual stratification and re-combination of surfaces, albeit with problems of discontinuities to be resolved.
If non-stationarity of the semi-variogram is suspected, the study region can be subdivided into regularly shaped areas (assuming sufficient data are available) by using a moving window and/or systematic partitioning. Separate variograms can then be estimated for the subdivisions and ‘local’ rather than global Kriging models applied if appropriate. The resulting predictions (interpolations) may then show an improvement over the global model, at the cost of additional computational effort and a considerable increase in the number of parameters used.
Many GIS packages support the analysis of a primary and secondary dataset, mapped across the same region, using a straightforward extension of standard Kriging techniques. With Co-Kriging the estimated value at an unsampled location is a linear weighted sum of all of the variables being examined (i.e. two or more). Co-Kriging extends the Ordinary Kriging model to the form:
where μ1 and μ2 are unknown mean values (constants) and ε1 and ε2 are random errors. Each of these sets of random errors may exhibit autocorrelation and cross-correlation between the datasets, which the procedure attempts to model. Co-Kriging uses this cross-correlation to improve the estimation of Z1(s). Co-Kriging can also be applied to models other than Ordinary Kriging (e.g. Indicator Co-Kriging, etc.).
The term Factorial Kriging is used to refer to procedures in which the modeled variogram exhibits multi-scale variation. If these various scales of variation can be identified and extracted, Kriging may then be carried out using separate component parts of the variogram — for example, a spherical model component with range 2,000 meters, a similar model component with range 10,000 meters, and a separate drift component. Factorial Kriging is not widely supported in geostatistics packages at present — Isatis, a very powerful package, is one of the few that does provide support for such analysis.
This is an interpolation procedure that can be seen as a development of Kriging. However, unlike Kriging it exactly reproduces the global characteristics of the source data (notably the source data points, the histogram of input data values and covariances) and it creates multiple maps which collectively provide an estimate of local and global uncertainty. Conditional simulation also reflects patterns of local variability more satisfactorily than Kriging, since the latter tends to smooth data locally, especially where source data points are sparse. Conditional simulation is an exact interpolator and cross-validation of the type used in other interpolation methods is not applicable, although a secondary (conditioning) dataset may be used.
The most widely available procedure for conditional simulation is one of three such methods provided in the GSLIB geostatistical package (the others being simulated annealing and sequential indicator simulation). The technique is known as Gaussian Sequential Simulation (GSS) and involves creating a surface using a random (or stochastic) process, applied many times to the same initial dataset. The specifics of implementations vary from package to package (further development of conditional simulation is provided in packages such as GMS, GS+ and Isatis, and also in the Idrisi and ArcGIS GIS software), but we shall first summarize the general procedure and then illustrate the results with the zinc dataset referred to in previous subsections (from Burrough and McDonnell, 1998, Appendix 3).
The inputs to conditional simulation are similar to those for Simple or Ordinary Kriging — a set of scattered data points for which one or more measured z-values have been recorded. The standard algorithm is then a three step process:
•Step 1 initialization: analyze the input data for Normality, applying de-clustering and/or transformation if necessary; compute a best-fit experimental variogram model for the data; then define a fine grid (generally finer than is used for Kriging) over the entire study region. In some instances the definition of this grid is a multi-level process, starting with a relatively coarse grid and then progressively generating finer and finer grids. This has the advantage of improved representation of longer distance effects as well as short-range variability
•Step 2 random walk: pick a random location on the grid, X, and perform step 3, below. Then move to another (unvisited) grid location at random and continue until all grid nodes have been visited. If a multi-level grid approach is applied then repeat this process with the finer grid, using the results from the previous stage plus the input data points as the set of source data values, z*(x,y). The use of random walks or similar procedures minimizes the risk of creating artifacts that have been found to occur if systematic grid search is utilized
•Step 3 local search and conditional estimation: identify all z*(x,y) that lie within a pre-defined radius of X or locate the N nearest neighboring z*(x,y) locations (e.g. N=16). Use the variogram model obtained in step 1 to provide a predicted (mean, m) value at X and estimated standard deviation (sd) at X, as per Ordinary Kriging, i.e. as a linear weighted combination of the N selected points (this is the conditional control applied); then, using the cumulative Normal distribution with mean m and standard deviation sd, select a random value (i.e. a random Normal variate in this case), and assign this as the estimate at X
These steps result in a single instance of a simulated grid at a given resolution (single level or multi-level process). The entire process is then repeated M times using a different random starting point and random number sequence. M may be quite large (e.g. M=1000) in order to obtain improved estimates of the surface mean and uncertainty (confidence bands), together with a smoother resulting surface. Inevitably the process is very computationally intensive, particularly if M is large, the grid resolution fine and/or multi-level processing is utilized.
Conditional simulation applied as an interpolation method on the zinc dataset yields results of the type shown in Figure 6‑47. The results illustrated are based on 100 iterations. Although Burrough and McDonnell (1998, 2015) recommend 500+ iterations, increasing the iterations with this dataset and modeling system (GS+) results in very minor alteration of the z-values ― it has been suggested that this particular implementation may over-smooth the results.
Comparing this result with that obtained by Ordinary Kriging (Figure 6‑46A) shows the resultant interpolations appear very similar, although the predicted data value range is greater in the conditional simulation model — Kriging tends to smooth and under-estimate the spatial variability of data as it is a form of weighted moving average. Predicted values in the unsampled areas are also often better behaved (smoother, less subject to artifacts) in the conditional simulation model. Figure 6‑47B shows the standard deviation map, which is substantially different from that obtained by Ordinary Kriging (Figure 6‑46B) — both the pattern and the range are altered, with the range being very much reduced. This is particularly apparent in regions where the initial data points are sparse or absent. Increasing M results in minor alteration of the mapped patterns and value range, for the standard deviations in this example. A particular additional feature of conditional simulation lies in its generation of a set of individual realizations. These may be used, for example, to create a distribution of the areas above or below a given value, which is not the case with Kriging.
A. z-values, M=100 (predicted values)
B. Standard deviations, M=100