Geographically Weighted Regression (GWR)


GWR is the term introduced by Fotheringham, Charlton and Brunsdon (1997, 2002) to describe a family of regression models in which the coefficients, β, are allowed to vary spatially. GWR uses the coordinates of each sample point or zone centroid, t_i, as a target point for a form of spatially weighted least squares regression (for some models the target points can be defined separately, e.g. as grid intersection points, rather than as observed data points). The result is a model of the form:

$$y = X\beta(t) + \varepsilon$$

The coefficients β(t) are determined by examining the set of points within a well-defined neighborhood of each of the sample points. This neighborhood is essentially a circle, radius r, around each data point (anisotropic modeling is not currently supported). However, if r is a fixed value and all points within it are regarded as of equal importance, the circle could include every point (for r large) or no other points at all (for r very small). The fixed radius is therefore replaced by a distance-decay function, f(d), as described earlier (Section 4.4.5, Distance decay models). This function may be finite or infinite in extent, much as with kernel density estimation (Section 4.3.4, Density, kernels and occupancy). The functions utilized in the GWR software package are of the form:
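As a hedged illustration, two kernels commonly used for this purpose are a Gaussian (infinite extent) and a bi-square (finite extent). The sketch below assumes the forms exp(-0.5 (d/h)^2) and (1 - (d/h)^2)^2; the scaling constants and the function names `gaussian_kernel` and `bisquare_kernel` are assumptions for illustration and may differ from those in the GWR package.

```python
import numpy as np

def gaussian_kernel(d, h):
    """Infinite-extent kernel: the weight decays smoothly and never reaches 0."""
    d = np.asarray(d, dtype=float)
    return np.exp(-0.5 * (d / h) ** 2)

def bisquare_kernel(d, h):
    """Finite-extent kernel: the weight is exactly 0 at and beyond distance h."""
    d = np.asarray(d, dtype=float)
    w = (1.0 - (d / h) ** 2) ** 2
    return np.where(d < h, w, 0.0)

# Weights for points at increasing distance from a target point (assumed h = 10)
distances = np.array([0.0, 5.0, 10.0, 20.0])
print(gaussian_kernel(distances, h=10.0))
print(bisquare_kernel(distances, h=10.0))
```

Both kernels give a weight of 1 at the target point itself; only the bi-square kernel excludes distant points entirely.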

In these functions the parameter, h, also known as the bandwidth, is the key factor determining the way in which the weighting schemes operate. A small bandwidth results in very rapid distance decay, whereas a larger value results in a smoother weighting scheme. This parameter may be defined manually or chosen by an adaptive method such as cross-validation minimization, e.g. jackknifing (see further Section 6.7.2, Kriging interpolation, and Efron (1982) and Efron and Tibshirani (1997)), or by minimization of the Akaike Information Criterion (AIC). The use of a kernel function also raises the possibility of generating additional descriptive statistics, as have been described earlier.
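Bandwidth selection by cross-validation can be sketched as a leave-one-out search over candidate values of h. This is a minimal illustration on synthetic data, assuming a Gaussian kernel and a simple grid of candidates; the helper names (`loo_cv_score`, `select_bandwidth`) and the data are hypothetical, and production software typically uses a more efficient search.

```python
import numpy as np

def gaussian_weights(d, h):
    return np.exp(-0.5 * (d / h) ** 2)

def loo_cv_score(coords, X, y, h):
    """Sum of squared leave-one-out prediction errors for bandwidth h."""
    score = 0.0
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = gaussian_weights(d, h)
        w[i] = 0.0                      # leave point i out of its own local fit
        XtW = X.T * w                   # equivalent to X.T @ diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        score += (y[i] - X[i] @ beta) ** 2
    return score

def select_bandwidth(coords, X, y, candidates):
    """Return the candidate bandwidth with the smallest CV score."""
    scores = [loo_cv_score(coords, X, y, h) for h in candidates]
    return candidates[int(np.argmin(scores))], scores

# Synthetic data: the slope on x1 drifts from west to east
rng = np.random.default_rng(0)
n = 80
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + slope * x1 + rng.normal(scale=0.3, size=n)

candidates = [1.0, 2.0, 4.0, 8.0, 16.0]
best_h, scores = select_bandwidth(coords, X, y, candidates)
print(best_h, scores)
```

Setting the weight of point i to zero in its own fit matters: without it, a very small bandwidth would reproduce each observation almost exactly and the score would always favor h close to 0.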

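Using a selected kernel function and bandwidth, h, a diagonal weighting matrix, W(t), may be defined for every sample point, t, with off-diagonal elements being 0. The parameters β(t) for this point can then be determined using the standard solution for weighted least squares regression:

$$\hat{\beta}(t) = \left(X^{T} W(t) X\right)^{-1} X^{T} W(t)\, y$$

or, letting $C = \left(X^{T} W(t) X\right)^{-1} X^{T} W(t)$,

$$\hat{\beta}(t) = C y, \qquad \mathrm{Var}\left[\hat{\beta}(t)\right] = C C^{T} \sigma^{2}$$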
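The weighted least squares step at a single target point can be sketched in a few lines of NumPy. The function name `local_wls` and the toy data are assumptions for illustration; the weights would normally come from the chosen kernel evaluated at the distances from the target point.

```python
import numpy as np

def local_wls(X, y, w):
    """Weighted least squares with diagonal weights w (one per observation).

    Returns the local coefficients beta(t) and the matrix C such that
    beta(t) = C @ y.
    """
    XtW = X.T * w                       # equivalent to X.T @ diag(w)
    C = np.linalg.solve(XtW @ X, XtW)   # (X' W X)^-1 X' W
    return C @ y, C

# Toy data lying exactly on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
w = np.array([1.0, 0.8, 0.5, 0.2, 0.05])   # hypothetical kernel weights

beta, C = local_wls(X, y, w)
print(beta)
```

Because the toy data are exactly linear, any set of positive weights recovers an intercept of 1 and a slope of 2 up to rounding; with real data the estimates change as the weights change from one target point to the next.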

The standard errors of the parameter estimates can be computed as the square root of these variances and used in t-tests to obtain estimates of the significance of the individual components. In this model the variance component, σ², is defined by the normalized residual sum of squares, i.e. the RSS divided by the degrees of freedom. The latter are determined by the number of parameters, p, in a global model, or by the effective number of parameters in the GWR model. This value is approximated by the authors as the trace of a matrix S, tr(S) (the sum of the diagonal elements of S), defined by the relation:

$$\hat{y} = S y$$
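Putting these pieces together, a full fit over all sample points might look like the following sketch. It assumes a Gaussian kernel with a fixed bandwidth, takes the degrees of freedom as n - tr(S), and uses synthetic data; the function name `fit_gwr` and all variable names are hypothetical rather than taken from the GWR software.

```python
import numpy as np

def fit_gwr(coords, X, y, h):
    """Fit local WLS at every sample point.

    Returns local coefficients, their standard errors, the effective
    number of parameters tr(S), the variance estimate and the residuals.
    """
    n, p = X.shape
    betas = np.empty((n, p))
    cct_diag = np.empty((n, p))          # diagonal of C C' at each point
    S = np.empty((n, n))                 # hat matrix: y_hat = S @ y
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / h) ** 2)  # assumed Gaussian kernel
        XtW = X.T * w
        C = np.linalg.solve(XtW @ X, XtW)
        betas[i] = C @ y
        cct_diag[i] = np.sum(C * C, axis=1)
        S[i] = X[i] @ C                  # row i of S
    resid = y - S @ y
    enp = np.trace(S)                    # effective number of parameters
    sigma2 = resid @ resid / (n - enp)   # RSS / degrees of freedom
    se = np.sqrt(cct_diag * sigma2)
    return betas, se, enp, sigma2, resid

# Synthetic data with a spatially drifting slope
rng = np.random.default_rng(0)
n = 80
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x1 + rng.normal(scale=0.3, size=n)

betas, se, enp, sigma2, resid = fit_gwr(coords, X, y, h=2.0)
t_values = betas / se                    # local t statistics
print(enp, sigma2)
```

With a very large bandwidth every weight approaches 1, so tr(S) approaches p and the local estimates collapse to the global OLS solution, which is a useful sanity check on any implementation.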

A set of such equations is solved for all points, t. The fit of the model may be examined in the usual manner, although the fit in terms of variance explanation can be expected almost always to improve on that of global methods, if only because far more parameters are fitted to the dataset. For this reason comparisons should be made on additional criteria, for example the AIC measure, which takes account of model complexity. As with conventional regression, the modeled surface and (standardized) residuals may be mapped for exploratory purposes, but the parameters β(t) and their estimated standard errors may also be mapped, since these vary spatially as well. Within GWR the residuals are standardized using the residual variance, i.e. the sum of squared residuals, ε^Tε, divided by the degrees of freedom, n − tr(S). The authors recommend examining any standardized residuals whose magnitude exceeds 3.
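To make the complexity penalty concrete, the sketch below uses one commonly quoted form of the corrected AIC for GWR, AICc = 2n ln(σ̂) + n ln(2π) + n(n + tr(S))/(n − 2 − tr(S)), with σ̂² = RSS/n. This expression is an assumption here, since the calculation varies between packages. The RSS and tr(S) values are made up purely for illustration and are not the Table 5‑12 figures; only n = 159 is taken from the Georgia example.

```python
import numpy as np

def aicc(rss, n, enp):
    """Corrected AIC for a model with residual sum of squares rss,
    n observations and enp (effective) parameters; one common form."""
    sigma_hat = np.sqrt(rss / n)         # maximum likelihood estimate
    return (2.0 * n * np.log(sigma_hat)
            + n * np.log(2.0 * np.pi)
            + n * (n + enp) / (n - 2.0 - enp))

n = 159                                  # number of Georgia counties
aicc_global = aicc(rss=2000.0, n=n, enp=7.0)    # hypothetical global fit
aicc_local = aicc(rss=1500.0, n=n, enp=13.0)    # hypothetical GWR fit
print(aicc_global - aicc_local)          # positive if the local model is preferred
```

The last term grows as the effective number of parameters rises, so a local model is only preferred when its reduction in RSS outweighs the extra parameters it uses.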

To illustrate this process we use an example dataset comprising educational attainment by county in the state of Georgia, USA (the dependent or response variable, Table 5‑12). This dataset lists the percentage of university graduates by county together with a range of social data that might act as independent variables in predicting the dependent variable. The data have been assigned to a set of 159 point locations (county centroids) and show an overall average of 19% of the population recorded as being graduates, with an average per county of 10.9% (i.e. not population weighted). The range by county is from 4.2% to 37.5%.

Table 5‑12 shows the predictor variables and the global regression parameters estimated by OLS, which collectively account for around 63% of the variance. Also shown are the GWR parameter estimates, expressed as the range of values computed. In the diagnostics section of this table note the drop in residual sum of squares, the increase in the adjusted R², and a modest fall in the AICc statistic. The authors suggest that a fall of 3 or more in the AICc value warrants examination as demonstrating a meaningful improvement in model fit. Note that differences in the method of calculating the AIC statistic can easily result in differences of greater than 3, so caution is required when comparing alternative software packages on this measure.

As mentioned above, the standardized residuals from the GWR predictions can be mapped in order to identify any prediction outliers. Figure 5‑38A shows this mapped dataset, with the dark gray/dark blue and red counties being those with the highest and lowest deviations. These counties may then be examined to try to ascertain whether any special characteristics of these cases might explain the large residuals. By definition the GWR modeled parameters falling within the ranges shown in Table 5‑12 include a value for every county. Hence each parameter can also be mapped, as shown in Figure 5‑38B. In this case the map highlights a distinct pattern of variation, with higher values in the north and lower values in the south.

As Fotheringham et al. (2005) have noted: “In some instances … it is difficult to justify why some relationships should be allowed to vary spatially. In others, empirical results may suggest that some relationships are stationary over space while others vary significantly. In these instances, ‘mixed’ GWR models, where some relationships are allowed to vary spatially while others are held constant, would seem to be more appropriate.” For such cases the authors are planning to release a Version 4 of the GWR software which will provide specific support for mixed models.

The same kind of GWR analysis can be carried out on count data, using Poisson regression (GWPR), and on binary data, using Logistic regression (GWLR). The GWR V3 program supports both models. The standard Poisson and Logistic regression models described in Table 5‑11 are utilized, but with the coefficients β(t) varying with location, t, as before. GWPR has, for instance, been applied to counts of disease incidence amongst a particular age/sex grouping, recorded by health district. In such models an offset value is applied, based on a matching count variable such as the total number of people in that district in the selected age/sex cohort.

For example, Nakaya et al. (2005) applied GWPR to mortality rates in Tokyo. The dependent variable in this instance was based on the standardized mortality ratio (SMR) for each of 262 municipality zones. The SMR is defined as the observed number of deaths, O_i, in a specified time period (e.g. 1990) in a given zone i, divided by the expected number, E_i, for that zone based on national or regional mortality rates (i.e. by demographic grouping). In the GWPR model the O_i became the response variable and the E_i provided the offset values. The study was able to examine relationships between the dependent variable and a number of independent variables at the local level, highlighting particular relationships that global models may not have identified. In fact the best model the authors were able to produce included a mix of global and local parameter estimates, with the proportion of older people (64+) and of house-owners taken as globals and the proportions of professional and technical people and of unemployed people allowed to vary regionally.
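In symbols, a geographically weighted Poisson model with an offset can be written as follows. This is the standard log-link form, stated here as an assumption rather than as the exact specification used by Nakaya et al.; x_{ik} denotes the k-th predictor for zone i.

$$O_i \sim \mathrm{Poisson}(\mu_i), \qquad \ln \mu_i = \ln E_i + \beta_0(t_i) + \sum_{k} \beta_k(t_i)\, x_{ik}$$

Moving the offset to the left-hand side gives ln(μ_i / E_i) on the left, so the local coefficients describe how the predictors raise or lower the expected SMR in the neighborhood of each zone.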

For Logistic GWR one might have true presence/absence data, or recoded continuous data based on some critical threshold value. For example, with the Georgia dataset the dependent variable could be recoded as 1 if the percentage of graduates is above the state average and 0 otherwise. This is a rather artificial example, but recoding of this type is often applied in decision-making, for example coding land as contaminated (1) if, say, the average measured cadmium level in the soil exceeds a certain number of parts per million (ppm), and as not contaminated (0) with respect to this trace element if it is below this threshold level.

Table 5‑12 Georgia dataset: global regression estimates and diagnostics

| Predictor variables | Global parameter estimate | GWR parameter estimates |
|---|---|---|
| Total population, β1 | 0.24 ×10⁻⁴ | 0.14 to 0.28 ×10⁻⁴ |
| % rural, β2 | | -0.06 to -0.03 |
| % elderly, β3 | -0.06 (not signif.) | -0.26 to -0.06 |
| % foreign born, β4 | | 0.51 to 2.42 |
| % poverty, β5 | | -0.20 to -0.00 |
| % black, β6 | 0.022 (not signif.) | -0.04 to 0.08 |
| Intercept, β0 | | 12.62 to 16.49 |
| Residual SS | | |
| Adjusted R² | | |
| Effective parameters | | |



Figure 5‑38 Georgia educational attainment: GWR residuals map, Gaussian adaptive kernel

A. Standardized residuals

B. Parameter 5: % foreign born, β4



Both Poisson and Logistic GWR require model fitting using a technique known as iteratively reweighted least squares (IRLS). The analysis is carried out in much the same manner as previously described, but the computation of the Akaike Information Criterion (AIC) and AICc differs from the OLS expressions.
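For the Poisson case, the IRLS step at a single target point can be sketched as below. This is a minimal illustration that assumes a log link, an offset equal to the log of the expected counts, and geographic weights supplied from a Gaussian kernel; the function name `local_poisson_irls` and the synthetic data are hypothetical.

```python
import numpy as np

def local_poisson_irls(X, y, offset, w, n_iter=25, tol=1e-8):
    """Fit a Poisson log-linear model with geographic weights w by IRLS.

    offset is added to the linear predictor with a fixed coefficient of 1.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = (eta - offset) + (y - mu) / mu   # working response, offset removed
        a = w * mu                           # geographic weight times IRLS weight
        XtA = X.T * a
        beta_new = np.linalg.solve(XtA @ X, XtA @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Synthetic zones: observed counts depend on one predictor and on expected counts
rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
expected = rng.uniform(20, 60, size=n)
observed = rng.poisson(expected * np.exp(0.1 + 0.3 * x1))

# Geographic weights around one target point (assumed Gaussian kernel, h = 3)
target = np.array([5.0, 5.0])
d = np.linalg.norm(coords - target, axis=1)
w = np.exp(-0.5 * (d / 3.0) ** 2)

beta = local_poisson_irls(X, observed, np.log(expected), w)
print(beta)
```

Each pass is an ordinary weighted least squares solve in which the kernel weight is multiplied by the current fitted mean, which is why the same machinery used for Gaussian GWR carries over with only an outer iteration added.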

The ready availability of GWR software supporting Gaussian, Poisson and Logistic models, together with a companion book and materials, has resulted in an upsurge of interest in the technique, including from spatial econometricians, medical statisticians and ecologists amongst others. GWR support has recently been included in the R-Spatial collection as spgwr and in the SAM ecology package, as well as in the latest versions of ESRI’s ArcGIS. The technique has the attraction of accepting the non-stationarity of most spatial datasets (see further, Section 6.7.1, Core concepts in Geostatistics), and proceeding to create models that have improved information characteristics and are amenable to further exploratory analysis. For large-scale problems the processing overheads of GWR may become prohibitive, but the technique is well-suited to parallel or grid-enabled processing, as has been demonstrated by Harris et al. (2006) and others.