Core concepts in Geostatistics



The Geary and Moran ratios previously described are very similar to the functions used within geostatistics to understand the pattern of variation of observed surface values. These latter functions measure differences in observed values over a set of distance bands or lags. The function most commonly used is known as the (experimental) semivariance, and has the general form:

γ(h) = [1/(2N(h))] Σ (zi − zj)², summed over all pairs (i,j) for which dij ∈ [h−Δ/2, h+Δ/2]

where h is a fixed distance or lag, say 1000 meters, dij is the distance between points i and j, and Δ is the width of a band whose center is at h, say 100m wide, hence in this case extending over the interval [950m,1050m]. The summation considers all pairs of observed data values whose spatial separation falls into the chosen band. There are N(h) of these pairs, hence the expression measures the average squared dissimilarity between the data pairs in this band. Important and useful relationships exist between the variogram, the variance (Var) and the covariance (Cov):

2γ(h) = Var[z(u) − z(u+h)], and, under second order stationarity, γ(h) = Cov(0) − Cov(h)

In these expressions the u vector is the set of locations at which the observations, z(u), have been made, and h is a separation vector (i.e. with both distance and directional components). Cov(0) is the stationary variance and Cov(h) is the stationary covariance, for all u and u+h.

In practice, software implementations first compute the squared differences between all pairs of values in the dataset, and then allocate these to lag classes based on the distance (and optionally direction) separating the pair. This provides a set of semivariance values for distance lags, h, increasing in steps from 0 (or 0 plus some small increment) to a value somewhat less than the greatest distance between point pairs. When this set of values is plotted against the separation distance, h, it provides a graph known as a variogram (Figure 6‑39).
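This pair-binning procedure can be sketched in a few lines of Python. This is a minimal numpy-only illustration; the function name and signature are our own, not those of any particular package:

```python
import numpy as np

def experimental_variogram(coords, z, lag_width, n_lags):
    """Experimental semivariance: average squared difference between
    data pairs, binned by separation distance (isotropic case)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    # indices of all unique point pairs
    i, j = np.triu_indices(len(z), k=1)
    diff = coords[i] - coords[j]
    d = np.hypot(diff[:, 0], diff[:, 1])      # pair separation distances
    sq = (z[i] - z[j]) ** 2                   # squared value differences
    gamma = np.full(n_lags, np.nan)
    counts = np.zeros(n_lags, dtype=int)
    for k in range(n_lags):
        mask = (d >= k * lag_width) & (d < (k + 1) * lag_width)
        counts[k] = mask.sum()
        if counts[k] > 0:
            gamma[k] = sq[mask].sum() / (2 * counts[k])
    return gamma, counts
```

Plotting the returned semivariances against the lag midpoints reproduces a variogram of the kind shown in Figure 6‑39.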

Figure 6‑39 Sample variogram


In this example the dataset consists of the 98 soil pollution data records provided in Burrough and McDonnell (1998, Appendix 3). The specific variable analyzed is the level of zinc in parts per million (ppm) in the samples taken. Lags are shown at 200 meter intervals. The first black square shows the semivariance which has been calculated using 207 of the data pairs, i.e. those that fall into the distance band [0,200] meters. The total number of possible pairs in this dataset is 98*97/2=4753. This dataset will be used to illustrate several of the geostatistical procedures described in the following subsections. Note that the worked example in Burrough and McDonnell (1998, p140-141) using this dataset contains computational errors.

Sample size

With very dense data points many interpolation techniques will give similar results. As the density falls and becomes less regular, selection of interpolation method becomes more critical, and careful examination of the dataset and any known characteristics of the underlying surface becomes increasingly important. If the number of data points is too small (the sample size is too low) the method may either be unworkable (for example, there are insufficient data pairs in a given distance band to provide meaningful information) or may produce unusable results. Some authors have suggested that ideally there should be at least 150 data points in order to obtain satisfactory semivariance estimates, although fewer points may be used with some success, especially where ancillary information about the data is available. Others suggest that the number of grid points interpolated should not be more than 20‑50 times the number of data points. In practice it is the variability of the underlying surface and the importance of capturing a given level of detail that provide the best guidance. Conversely, prior knowledge of the patterns of variation of spatial data may be used to direct subsequent sampling schemes (e.g. point spacing on a regular grid) to have the best chance of ensuring that prediction errors lie within predetermined bounds.


Support

This is the term used to describe the length, area or volume applying to each of the (physical) samples taken, even though these may be nominally assigned to a point for computational purposes. Predicted values at other locations based on such data apply to similar supports at these locations. If a series of observations have been made in a small area or block, and then combined to produce an averaged value, again assigned to a point, the support comprises this combined or “upscaled” data.


Declustering

Source data points are often preferentially sampled, such that some areas are more densely sampled than others. Where this occurs the data may not provide a truly random sample from the hypothetical population of all possible values, and hence the distribution of sampled values will not be Normal even after simple transforms are applied. Some packages (e.g. ArcGIS Geostatistical Analyst, and those utilizing GStat) offer procedures for declustering the source data points by assigning weights to these points. One simple procedure for weighting is to use the area of the Voronoi polygon surrounding the point to adjust for clustering (this topic is also discussed in Section 5.1.2, Spatial sampling).
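As an illustration, the related cell-declustering technique (which avoids the polygon construction needed for Voronoi-area weighting) can be sketched in a few lines: a grid of cells is overlaid on the data and each point is down-weighted by the number of points sharing its cell. This is a minimal numpy-only sketch with hypothetical names, not the algorithm of any specific package:

```python
import numpy as np

def cell_declustering_weights(coords, cell_size):
    """Cell declustering: weight each point by 1/(points in its grid cell),
    so clustered samples contribute less to global statistics."""
    coords = np.asarray(coords, dtype=float)
    cells = np.floor(coords / cell_size).astype(np.int64)
    # pack the (ix, iy) cell indices into a single integer key for counting
    # (a simple pairing adequate for illustration)
    keys = cells[:, 0] * 1_000_003 + cells[:, 1]
    _, inverse, counts = np.unique(keys, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    return w / w.sum()        # normalize the weights to sum to 1
```

In practice the result is sensitive to the chosen cell size, which is usually varied and inspected before use.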


Variogram

If the observed pattern of variation with distance (the variogram) can be modeled using a relatively simple equation (some function of distance), then values at unknown locations can be estimated by applying the model to obtain weights that may be used in an interpolation process, much the same as those described in Section 6.6. Indeed, there is a strong similarity between the methods used in radial basis interpolation (Section 6.6.4, Radial basis and spline functions) and some forms of geostatistical interpolation. The modeling of variograms and similar graphs is described later in this section. The use of such models in interpolation is described in Section 6.7.2, Kriging interpolation. These methods are generally described as different forms of Kriging, so-named after the South African mining engineer, Krige, who introduced the general procedures used. Variograms can be re-scaled using the sample variance of the input data. This process is useful where several variograms on different variates are to be compared.


Stationarity

The term stationarity originates in the analysis of random processes, in particular in connection with time series. In this context a stationary random process is one for which all of its statistical properties (e.g. mean, variance, distribution, correlations etc.) do not vary over time. In the context of spatial analysis a stationary process would be one that is stationary in space. Stationarity is often described in terms of first order, second order etc. First order stationary processes must have a constant mean, irrespective of the spatial configuration of sampled points. Second order stationarity implies that the autocorrelation function depends solely on the degree of separation of the observations in time or space. A non-stationary process would be one where the process varies depending on when in time or where in space it is examined. Several techniques, including a number of those in regression modeling (e.g. GWR), accept that many spatial processes will be non-stationary and seek to model this characteristic explicitly.

Sill, range and nugget

The increase in the semivariogram values with increasing lags seen in Figure 6‑39 diminishes with distance and levels off, in this example at around 1100 meters. This distance is known as the range or active lag distance, and is the approximate distance at which spatial autocorrelation between data point pairs ceases or becomes much more variable. At this range a plateau or sill in the semivariance values has been reached (Figure 6‑40). Such variograms are called transitive. Non-transitive variograms are ones in which the sill is not reached within the region of interest. The term practical range is sometimes used and defined as the distance at which 95% of the sill is reached for an asymptotic variogram model (as in Figure 6‑40), or for which variation around the sill is <5% for a wave/hole-effect model (see further, Figure 6‑44K). When a curve is fitted to the set of points that lie within this range and sill the model used may have a zero or non-zero intercept with the y-axis. In Figure 6‑40 a non-zero intercept is shown, which is known as the nugget or nugget variance, C0, referring back to the original application of such methods to mineral exploration. The nugget is usually assumed to be non-spatial variation due to measurement error and variations in the data that relate to shorter ranges than the minimum sampled data spacing. The sill minus the nugget is sometimes known as the partial sill or structural variance, C. The values C and C0 often appear as parameters in fitted models.
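For an asymptotic model the practical range can be checked directly. A small illustrative sketch follows, assuming an exponential model written as γ(h) = C0 + C(1 − e^(−3h/a)), one common convention in which the factor 3 makes the parameter a coincide with the practical range; the function name and the parameter values are hypothetical:

```python
import numpy as np

def exponential_model(h, C, a, nugget=0.0):
    """Exponential variogram model: nugget C0 plus partial sill C,
    scaled so that a is the practical range (95% of sill)."""
    return nugget + C * (1.0 - np.exp(-3.0 * h / a))

C, a = 1.0, 1100.0   # sill 1, range ~1100 m as suggested by Figure 6-39
# at h = a the model reaches 1 - exp(-3) ~ 0.9502 of the sill
assert exponential_model(a, C, a) > 0.95 * C
```

The same check applied to a Gaussian model, γ(h) = C(1 − e^(−3h²/a²)), also yields 95% of the sill at h = a under this scaling.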

Figure 6‑40 Sill, range and nugget



Transformation

Because the techniques involved make assumptions about the statistical nature of the observed data values it may be necessary to transform these values prior to analysis, in order to achieve a closer approximation to Normality. Frequently this is performed via a log or Box-Cox transform (see Table 3.1, Statistical measures and related formulas, for a range of data transformation functions). In Figure 6‑41 we show a series of transformations of 301 Radon level measurements made in the area around Castleisland in South-West Ireland. These graphs and the associated test statistics were generated using the Minitab package. Three very large-valued outliers were removed from the original dataset prior to distribution analysis.

The raw data plotted in Figure 6‑41A is clearly non-Normal, diverging substantially from the straight line which indicates a Normal distribution. In this case simple Log transformation of the data improves the fit to Normal but still diverges, even when corrections are made for background radiation (Figure 6‑41C). In each of these three cases the Anderson-Darling test of fit to Normal fails, whereas for the Box-Cox transform (Figure 6‑41D) with optimized parameter, k, the test passes and in this instance analysis proceeded using this specific transform. The Anderson-Darling test is a variation on the Kolmogorov-Smirnov test, again based on the cumulative distribution function, but is more sensitive to the tails of the distribution.
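The search for an optimized Box-Cox parameter can be sketched without specialist software by maximizing the Normal profile log-likelihood over a grid of candidate values. This is an illustrative numpy-only sketch, not the procedure Minitab applies; all function names are our own:

```python
import numpy as np

def boxcox_transform(x, lam):
    """Box-Cox transform: (x^lam - 1)/lam for lam != 0, log(x) at lam = 0.
    Requires strictly positive data."""
    x = np.asarray(x, dtype=float)
    if abs(lam) < 1e-9:
        return np.log(x)
    return (x ** lam - 1.0) / lam

def boxcox_loglik(x, lam):
    """Normal profile log-likelihood of the transformed data for a given lam,
    including the Jacobian term of the transform."""
    y = boxcox_transform(x, lam)
    n = len(y)
    return -0.5 * n * np.log(y.var()) + (lam - 1.0) * np.log(x).sum()

def best_lambda(x, lams=np.linspace(-2.0, 2.0, 401)):
    """Pick the candidate lam that maximizes the profile log-likelihood."""
    ll = [boxcox_loglik(x, lam) for lam in lams]
    return float(lams[int(np.argmax(ll))])
```

For lognormally distributed data the optimum lies near λ = 0, i.e. the simple log transform, consistent with the progression shown in Figure 6‑41.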

Figure 6‑41 Data transformation for Normality


Source: F O’Sullivan (2005)


Anisotropy

In many instances the variogram is not of constant form in all directions. This feature is known as anisotropy, and the example shown in Figure 6‑39 is of an isotropic variogram, i.e. one in which any directional variations are ignored. An alternative approach to the calculation of the semivariance would be to divide the lag band into a series of discrete segments, say at 45 degree intervals, and then calculate separate variograms for each direction. Since the directions are not oriented (i.e. directions such as 0 and 180 degrees are treated as equivalent) there would be four such variograms in this case, at 0, 45, 90 and 135 degrees. More generally the variogram can be thought of as a function of two variables, or surface, γ(θ,h), that may or may not be radially symmetric.
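The directional variant simply adds an angular filter to the pair selection. The following is a hedged sketch; the function name and the default 22.5 degree angular tolerance are our own choices, not a standard:

```python
import numpy as np

def directional_semivariance(coords, z, lag_lo, lag_hi, direction_deg, tol_deg=22.5):
    """Semivariance for pairs within one distance band and one direction
    sector. Directions are unoriented: 0 and 180 degrees are equivalent."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)
    dx, dy = (coords[j] - coords[i]).T
    d = np.hypot(dx, dy)
    # pair bearing folded into [0, 180)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0
    diff = np.abs(ang - direction_deg)
    diff = np.minimum(diff, 180.0 - diff)      # wrap-around at 0/180
    mask = (d >= lag_lo) & (d < lag_hi) & (diff <= tol_deg)
    if not mask.any():
        return np.nan, 0
    sq = (z[i][mask] - z[j][mask]) ** 2
    return sq.sum() / (2 * mask.sum()), int(mask.sum())
```

Calling this for each of the four directions 0, 45, 90 and 135 degrees across all lag bands yields the four directional variograms described above.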

The semivariance surface may be plotted as a 2D map (Figure 6‑42) or 3D surface representation. The center of the map is the origin of the semivariogram, with a radially symmetric structure indicating that there is little or no anisotropy. If there is a strong directional bias, this can be regarded as the major axis of a directional ellipse, with the major axis providing the range in the primary direction, and the minor axis providing the range in the orthogonal direction. A single anisotropic model may be fitted to such datasets, or a series of separate models fitted to the data grouped into distinct directional bins. Note that the practical range may vary with direction in anisotropic modeling.

Figure 6‑42 Anisotropy 2D map, zinc data


Not all software packages use radial sectors — some (including ArcGIS) use approximations to this form based on a grid structure. Other packages, such as Isatis, support more complex directional models and 3D variograms and anisotropy (e.g. for mineral deposit applications). They take into account the problem that simple directional sectors, which can be seen as cone-like regions in 2D and 3D, widen as distance from the selected point increases. To ensure that only points from a narrower band of directions are selected they provide for cylindrical and block-like anisotropic modeling.

Indicator semivariance

Some spatial variations of continuous datasets are less “smooth” than others. For example, the concentrations of radioactive isotopes in soil samples are much lower than those of zinc or nickel, but may show marked peaks in some locations. Instead of using the z-values of the source data points directly in such cases, it may be more meaningful to record or transform the data as a binary set, with z-values above a pre-specified value having a value of 1, and those below this threshold, t, having a value of 0 (or vice versa). The semivariogram formula is then applied, but this time to the {0,1} values rather than the original values. These new values are called indicators, and are often denoted in the form i(u,t). Indicator semivariance facilitates modeling of the probability that the variable of interest exceeds a given threshold value, t, at unknown points.

As noted above, indicator semivariance analysis is a form of analysis on transformed variables, in this case thresholding to binary values. As with other variograms, indicator variograms may be calculated from the source data values transformed to binary, or re-scaled values using the variance of the sample data. Figure 6‑43 illustrates both variants, using the same dataset as in Figure 6‑40. Typically the re-scaled graph (Figure 6‑43B) will show values in the range [0,1]. This scaling should be removed before proceeding with interpolation.
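The indicator transform itself is a one-line thresholding operation; the resulting 0/1 values are then fed through the ordinary semivariogram calculation described earlier. A minimal sketch with a hypothetical function name:

```python
import numpy as np

def to_indicator(z, threshold):
    """Indicator transform i(u, t): 1 where z exceeds the threshold t,
    else 0. The standard semivariogram formula is then applied to
    these 0/1 values in place of the raw data."""
    return (np.asarray(z, dtype=float) > threshold).astype(float)
```

For the zinc data, for instance, a threshold of interest might be a regulatory limit in ppm, with the resulting indicator variogram describing the spatial continuity of exceedances.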

Figure 6‑43 Indicator variograms

A. Simple indicator variogram of zinc dataset


B. Re-scaled indicator variogram of zinc dataset



Cross-semivariance

Spatial datasets that contain measurements on two or more variables may be analyzed in a similar manner to the procedures described for a single variate. Typically analysis is performed on pairs of variates, {z} and {w} say, where patterns of co-variation are of interest. Examination of such patterns may be of theoretical interest, for example in helping to understand underlying processes, or may assist in the prediction of one variate based on better information or more data that are available for another variate.

Comments on geostatistical software packages

The following (edited) observations on facilities provided in available geostatistical software (stand-alone or within/linked to GIS) are taken (with permission) from the AI‑GEOSTATS website prior to its re-implementation:

Display of proportional/classed symbol maps: often the first step in the analysis of geostatistical data. Showing symbols with a size that is proportional or characteristic of their attributed value can help the data analyst to identify outliers and trends.

Moving windows statistics: geostatistics assumes second order stationarity in the data: the mean has to exist and is constant and independent of location within the region of stationarity. The covariance has to exist as well, and is only dependent on the distance between any two values, and not on the locations. One way to evaluate second order stationarity is to divide the studied area into cells and compare the mean and the covariance between all cells. The moving windows technique uses an overlap of the cells as if a window was moving over the whole dataset.

Sampling network analysis: techniques used to quantify the degree of irregularity of the sampling network. These include the analysis of: Thiessen polygons; the fractal dimension of the sampling network; and the use of the Morishita Index. Nearest-neighbors statistics and many other tests can also be used to describe the spatial distribution of the measurements.

Trend analysis: An underlying trend may appear in the data and needs to be taken into account when applying Ordinary Kriging (Universal Kriging is supposed to take the drift into account). The drift is frequently modeled with a simple linear or a quadratic model.

Declustering: When global estimations are desired one needs to obtain statistics that are representative of the whole area of interest. By attributing lower weights to clustered measurements one can reduce their influence and so limit the bias one would have obtained without a prior declustering. Two declustering techniques are frequently used: the cell and the polygonal declustering method.

Variogram analysis: The use of spatial autocorrelation being one of the main keys in a geostatistical case study, one would expect the GIS to provide at least the possibility to calculate the experimental semivariogram.

Interactive variography: The display of the experimental semivariogram is not enough to analyze the spatial autocorrelation. Prior to the modeling of the experimental semivariogram one needs to evaluate its robustness. Software packages such as Isatis provide tools for the interactive analysis of the spatial autocorrelation. Their success comes from the high level of interactivity between pairs of samples, h-scatterplots, variogram clouds and the experimental variogram. Such interactivity is necessary in order to identify and remove outliers as well as to evaluate the impact of the choice of the distance of the lags.

Whilst many software packages provide automatic fitting functions of the experimental semivariograms, most practitioners still prefer to adjust their model by hand. As an example, the additional knowledge of a component of the nugget effect (e.g. documented measurement errors) will help to define a more realistic value of the nugget than the one that would have been calculated with an automatic fitting. Therefore, the interactivity during the variogram modeling still remains an essential requirement of a good package.

In many case studies a spatial structuring of the studied phenomenon appears in a particular direction. Not taking this anisotropy into account may strongly affect conclusions derived from the analysis of the spatial correlation or from the estimations.

Models implemented: Very often, spatial interpolation functions are implemented like black boxes in GIS or contouring software. It is even truer with geostatistical techniques that require the users to understand properly the underlying theory. The quality of the documentation of the available geostatistical functions should be a key for the GIS users willing to apply their tools to a geostatistical case study.

Additional interpolation/estimation functions are more than welcome since a geostatistical case study can be very time consuming, especially with very large datasets. On the other hand if one has the time to compare the efficiency of the various functions, one can expect to find that sometimes a non-geostatistical method has performed better, according to the user’s criteria, than geostatistical techniques. Here again, clear documentation of the implemented functions is essential.

Search strategies: The search strategy of the neighboring measurements to use during the estimation/interpolation step clearly has a strong impact on the estimates. One should ideally have the possibility to select: the number of measurements to use; the maximum distance at which one should search; and the measurements according to their locations. For example, one should have the possibility to select a few points in each quarter of the circle used to select the neighbors. As for the analysis of the spatial correlation, one should have the possibility to use an ellipse of search rather than a circle.

Masking: When the studied area has an irregular shape, or when estimates are not desired at certain locations, one should have the option to prevent certain regions from being interpolated by defining inclusive or exclusive polygons.

Semivariance modeling

Assuming a dataset has been examined, outliers identified and removed or masked where appropriate, data transformed where necessary and semivariances computed, modeling of the variogram may proceed. A large variety of alternative models have been proposed over recent years, and many of the more popular models are incorporated into GIS packages and geostatistical software. Table 6‑3 shows a selection of the principal models provided within such packages and Figure 6‑44 provides sample graphs for each of those listed. The majority of these models are included within ArcGIS Geostatistical Analyst (which also includes a number of other models) and Surfer, and in packages that are based on or utilize GStat, such as Idrisi, PCRaster and GRASS. Other products, such as GS+ and GSLIB, may provide a smaller range of models but offer considerable flexibility in the modeling and display process (see also, the R spatial projects page on the GeoDa website). In Table 6‑3 the distance variable, h, is pre-scaled by the effective range, a. In much of the literature and in product documentation this scaling is not shown, so where we show h they may show h/a, h/d, or a similar ratio. Linear combinations of models are widely supported, for example: p*nugget + q*spherical1 + r*spherical2, but justifying, selecting and combining the component functions may be difficult.

Selecting and fitting variograms is something of an art rather than a science. Many packages provide a default model and attempt to find the best set of parameters to fit the dataset, whilst others apply this process for all models they support and select the one with the highest correlation coefficient or lowest residual sum of squares. Selecting only models that are asymptotic to the sill (i.e. to 1 in the diagrams shown) provides a useful first level of discrimination between functions. Both the selection of active lag distance to be considered, and the lag interval to be used, will affect the model fitted — the use of values that are too large in either case may result in over-large estimates of nugget variance. An example for which a combined model might be deemed appropriate is where multiple (nested) scales of spatial variation are observed.
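Automated fitting of, say, a nugget-plus-spherical model can be illustrated with a crude grid search minimizing the residual sum of squares over the empirical (lag, semivariance) pairs. Packages typically use weighted least squares or maximum likelihood instead, so this is only a sketch; all names and grids are hypothetical:

```python
import numpy as np

def spherical(h, c0, c, a):
    """Nugget-plus-spherical model: c0 + c*(1.5(h/a) - 0.5(h/a)^3) up to
    the range a, flat at the sill c0 + c beyond it."""
    s = np.clip(np.asarray(h, dtype=float) / a, 0.0, 1.0)
    return c0 + c * (1.5 * s - 0.5 * s ** 3)

def fit_spherical(lags, gamma, a_grid, c0_grid, c_grid):
    """Grid-search fit minimizing the residual sum of squares."""
    best, best_rss = None, np.inf
    for a in a_grid:
        for c0 in c0_grid:
            for c in c_grid:
                rss = ((spherical(lags, c0, c, a) - gamma) ** 2).sum()
                if rss < best_rss:
                    best, best_rss = (c0, c, a), rss
    return best, best_rss
```

Even this toy fitter illustrates the point made above: the result depends directly on which lags are included and how the candidate parameter ranges are chosen.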

Examining the profiles of these functions it is clear that the main differences between many of them applies in the first and last quarters of the effective range, hence close examination of the data within these ranges is advisable. Anisotropy may be automatically modeled or require selection, with attributes such as size and type of sectors being specified. Optimized selection of model and/or parameters under anisotropy may well alter the major axis of anisotropy identified to some extent, depending on the model selected. Most packages do not allow interactive alteration of models and parameters (an exception is Isatis), but all will allow exploration by trial-and-error, re-calculating the model to user-defined specifications. One difficulty with these approaches is that it is very difficult to replicate results across packages, even when selected parameters are “forced” by the user.

Table 6‑3 Variogram models (univariate, isotropic)

In each case C denotes the partial sill, C0 the nugget variance, and the lag h is pre-scaled by the effective range, a.

Nugget effect: γ(h)=C0 for h>0, γ(0)=0. Simple constant. May be added to all models. Models with a nugget will not be exact

Linear: γ(h)=C·h. No sill. Often used in combination with other functions. May be used as a ramp, with a constant sill value set at a range, a

Spherical: γ(h)=C(1.5h−0.5h³) for h≤1, γ(h)=C otherwise. Useful when the nugget effect is important but small. Given as the default model in some packages.

Exponential: γ(h)=C(1−e^(−kh)). k is a constant, often k=1 or k=3. Useful when there is a larger nugget and slow rise to the sill — see for example, Figure 6‑40.

Gaussian: γ(h)=C(1−e^(−kh²)). k is a constant, often k=3. Can be unstable without a nugget. Provides a more s-shaped curve.

Quadratic: γ(h)=C(2h−h²) for h≤1, γ(h)=C otherwise

Rational quadratic: γ(h)=C·kh²/(1+kh²). k is a constant. ArcGIS uses k=19

Power: γ(h)=C·hⁿ. No sill. 0<n<2 is a constant

Logarithmic: γ(h)=C·ln(h). No sill.

Cubic: γ(h)=C(7h²−8.75h³+3.5h⁵−0.75h⁷) for h≤1, γ(h)=C otherwise. Compare to Gaussian — S-shaped curve with well-defined sill

Pentaspherical: γ(h)=C(1.875h−1.25h³+0.375h⁵) for h≤1, γ(h)=C otherwise. Similar to Circular with extra final term

Circular: γ(h)=C(1−(2/π)arccos(h)+(2h/π)√(1−h²)) for h≤1, γ(h)=C otherwise

Wave hole effect: γ(h)=C(1−sin(kh)/(kh)). k is a constant, often 2π. Useful where periodic patterns in the data with distance are observed or expected. cos() rather than sin() function is sometimes used


Figure 6‑44 Variogram models — graphs

A. Linear

B. Spherical

C. Exponential, k=3




D. Gaussian, k=3

E. Quadratic

F. Rational quadratic




G. Power, k=0.5

H. Logarithmic

I. Cubic




J. Pentaspherical

K. Wave hole effect

L. Circular





Fractal analysis

Determining the fractal dimension of terrain surfaces is a contentious process. In practice such analysis involves obtaining an estimate that is highly dependent on the way in which the data have been captured and stored, and the modeling choices made by the user, rather than simply being a reflection of the complexity of the original surface. For many datasets the fractal dimension as computed in one program will differ from that computed by other programs, and may only apply across a limited scale range.

Figure 6‑45 shows the result of fractal analysis of one of our test surfaces, OS tile TQ81NE, using the Landserf package. The fractal dimension in this example is computed by selecting a fixed interval lag and computing the standard variogram for the surface based on this lag, assuming an isotropic process, and then plotting the log of the lag against the log of the variogram. The best fit line y=ax+b is then used to compute the fractal or capacity dimension, DC, using the expression:

DC = 3 − a/2

Hence in this example DC=2.2. The Pentland Hills test dataset (tile OS NT04) has a higher value of 2.4, although interpretation of the computed differences is not straightforward. This approach yields a single measure of fractal dimension for the entire surface.
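The slope-based calculation can be reproduced in a few lines, assuming the standard relation DC = 3 − a/2 between a surface's capacity dimension and the slope a of its log-log variogram; the function name is our own:

```python
import numpy as np

def fractal_dimension(lags, gamma):
    """Capacity dimension of a surface from its variogram:
    fit log(gamma) = a*log(h) + b and return Dc = 3 - a/2."""
    a, b = np.polyfit(np.log(lags), np.log(gamma), 1)
    return 3.0 - a / 2.0
```

A log-log slope of 1.6, for example, corresponds to DC = 2.2, the value reported for tile TQ81NE.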

Figure 6‑45 Fractal analysis of TQ81NE


Idrisi computes fractal values for an entire grid using a 3x3 window or kernel, based on the slope values computed for that window. If sij is the slope value in degrees from 0 at location (i,j) then the fractal dimension DC,ij is computed as:

where the logs can be in any base. This formula results in a linear (profile-like) fractal measure, providing an index of surface texture in the range [1,2].

Madograms and Rodograms

Madograms are computed in the same way as variograms, but the absolute values of lagged differences are calculated rather than the squared differences. Hence the equivalent expression to the semivariogram is:

γM(h) = [1/(2N(h))] Σ |zi − zj|, summed over all pairs in the lag band centered at h

The use of absolute values rather than squared differences reduces the impact of clustering and outliers, assisting in the determination of range and anisotropy for data with these characteristics. They may be seen as a supplement to variogram analysis and are not used subsequently in interpolation. Madograms should not be used for modeling nugget variance.

Rodograms are essentially the same as Madograms, and are used in a similar manner, but take the square root of the absolute values:

γR(h) = [1/(2N(h))] Σ √|zi − zj|
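Both measures drop straight out of the same pair-binning framework used for the variogram. A numpy-only sketch with hypothetical names:

```python
import numpy as np

def madogram_rodogram(coords, z, lag_width, n_lags):
    """Madogram: mean absolute lagged difference per lag band (halved);
    rodogram: the same computed on square-rooted absolute differences."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)
    diff = coords[i] - coords[j]
    d = np.hypot(diff[:, 0], diff[:, 1])
    absdiff = np.abs(z[i] - z[j])
    mado = np.full(n_lags, np.nan)
    rodo = np.full(n_lags, np.nan)
    for k in range(n_lags):
        mask = (d >= k * lag_width) & (d < (k + 1) * lag_width)
        if mask.any():
            mado[k] = absdiff[mask].mean() / 2.0
            rodo[k] = np.sqrt(absdiff[mask]).mean() / 2.0
    return mado, rodo
```

As noted above, these are diagnostic tools for judging range and anisotropy in the presence of outliers, not inputs to the subsequent interpolation step.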

Periodograms and Fourier analysis

A surface represented as a grid, z(x,y), may be analyzed to identify possible two-dimensional periodicities in the data using a combination of sin() and cos() functions. Surfer, for example, analyzes the grid looking for frequencies that are integer multiples of 2π/k, where k represents the number of grid cells in the x or y directions. This form of analysis, whilst widespread in many disciplines, has only been applied in a few areas of spatial analysis, for example meteorological modeling.