Density, kernels and occupancy



In Section 4.3.3, Ratios, indices, normalization, standardization and rate smoothing, we noted that spatially extensive variables, such as total population, should not be plotted directly for zones under normal circumstances, but standardized in some manner. An exception to this general rule may be made where cartograms rather than conventional maps are produced (see below, Cartograms). For such maps the areas are distorted in some manner to reflect the (positive) magnitude of an attribute of each region, whilst retaining a sense of the original areal-based map. Densities, with respect to the variable under consideration, thus become uniform in such maps.

For variables that are a subset of a total it is often useful to compute and map these data as a proportion of this total, but for totals themselves we will need to compute a density measure — dividing the total population, N, by the zone area, A. This yields a density value, N/A, for each zone, but assumes that population density is constant through the zone and then may suddenly change to some other value at the zone boundaries. In our discussions on boundaries and on areal interpolation we have already alluded to the inadequacies of simple density calculation, but it remains in widespread use. Conversion of the source dataset to a raster model using some form of intelligent interpolation may well provide a more satisfactory result. A good introduction to mapping density is provided in Mitchell (1999, Chapter 4). Techniques for computing density estimates, including so-called kernel density measures, are discussed in the subsections below.

Point density

If the dataset we have is not zone-based but point- or line-based, alternative methods of determining density are required. The simplest measures apply some form of zoning and count the number of events in each zone, returning us to a zone-based dataset. For example, with data on thefts from cars (TFC events), each event may be geocoded and assigned to a unique zone such as a pre-defined 100mx100m grid (or quadrat), or to city blocks, census blocks, or policing districts. The density of TFC events is then simply the event count divided by the area of the associated zone in which they are reported to have occurred.
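The quadrat-counting procedure just described can be sketched very simply. The coordinates, event set and 100m cell size below are purely illustrative:

```python
# Quadrat-based point density: count events per grid cell and divide
# by the cell area. Coordinates in metres; 100 m square cells.
from collections import Counter

def quadrat_density(points, cell=100.0):
    """Return {(col, row): events per square metre} for a square grid."""
    counts = Counter((int(x // cell), int(y // cell)) for x, y in points)
    area = cell * cell
    return {cell_id: n / area for cell_id, n in counts.items()}

# Five hypothetical theft-from-car (TFC) events
events = [(30, 40), (70, 20), (150, 60), (160, 80), (310, 90)]
dens = quadrat_density(events)
# Cell (0, 0) holds two events: density 2 / 10000 per square metre
```

Note that the result depends entirely on the (arbitrary) cell size and grid origin, which is precisely the difficulty discussed below.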

A difficulty with this approach is that densities will vary substantially depending on how the zoning, grid or quadrat is selected, and on how variable the size and associated attributes of each zone are (e.g. variations in car ownership and usage patterns). Overall density calculation for all zones (e.g. a city-wide figure for the density of TFC events) suffers similar problems, especially with regard to definition of the boundary (such problems are generally known as edge effects, and methods to adjust for them are known as edge correction or edge bias correction methods). Use of such information to compare crime event levels within or across cities is clearly fraught with difficulties. An alternative approach, particularly suited to ecological datasets, is to define zones based on a covariate of interest, for example the underlying geology and terrain slope. Each classified area can then be taken as a separate zone type and densities computed within these (polygonal) boundaries. This type of computation is supported within spatstat.

An alternative approach to density computation for two-dimensional point-sets is based on techniques developed in one-dimensional (univariate) statistical analysis. This is simplest to understand by looking at the example in Figure 4‑41, below. We have a set of 5 events marked by crosses. They occur at positions 7, 8, 9, 12 and 14 along the line. We could argue that the point density across the entire 20-unit line length is 5/20=0.25 points per unit length and assign a value of 0.25 to each section, as shown on the gray line. We might equally well argue that if we divide the overall line into two halves, the density over the first half should be 0.3 per unit length and over the second, 0.2 per unit length, to reflect the variation in positioning of the 5 events.

There is clearly no single right answer or single method for assigning the points to the line’s entire length, so the method we choose will depend on the application we are considering. Important observations about this problem include:

the length of the line we start with has an important effect on the density values we obtain, and since this may be arbitrary, some method of removing dependence on the line length is desirable

if the line is partitioned into discrete chunks, a sudden break in density occurs at the partition boundaries, which is often undesirable

depending on the number of partitions and the distribution of points, some sections may have zero density, even if this is not the kind of spread we are seeking or regard as meaningful

the line is assumed to be continuous, so allocation of density values to every part is valid

finally, if we have too many partitions, all sections will contain only values of 1 or 0, which is essentially back where we started

Figure 4‑41 Point data


Figure 4‑42 Simple linear (box or uniform) kernel smoothing


These observations can be dealt with by treating each point in the original set as if it was spread over a range, then adding together overlapping zones and checking that the total adds up to the original value. For example, choosing each point and smoothing it over 5 units in a uniform symmetric manner, we obtain the result shown in Figure 4‑42. The rows in the diagram show the spreading of each of the 5 original points, with the total row showing the sums (densities) assigned to each unit segment. These add up to 5, as they should, and a chart showing this distribution confirms the pattern of spread. This method still leaves us with some difficulties: there are no density values towards the edges of our linear region; density values still jump abruptly from one value to the next; and values are evenly spread around the initial points, whereas it might be more realistic to have a greater weighting of importance towards the center of each point. All of these concerns can be addressed by selecting a well-defined, smooth and optionally unbounded function, known as a kernel, and using this to spread the values. The function often used is a Normal distribution, which is a bell-shaped curve extending to infinity in each direction, but with a finite (unit) area contained underneath the bell (Figure 4‑43).
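The box-kernel spreading of Figure 4‑42 can be reproduced directly, using the same five events and a 5-unit uniform spread:

```python
# Box (uniform) kernel smoothing as in Figure 4-42: each event is
# spread evenly over 5 unit segments centred on its position,
# contributing 1/5 = 0.2 to each; the segment totals still sum to 5.
def box_smooth(points, n_segments=20, width=5):
    half = width // 2
    density = [0.0] * (n_segments + 1)   # index 1..n_segments used
    for p in points:
        for seg in range(p - half, p + half + 1):
            if 1 <= seg <= n_segments:
                density[seg] += 1.0 / width
    return density

d = box_smooth([7, 8, 9, 12, 14])
# Segment 8 receives contributions from the points at 7, 8 and 9 (0.6)
```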

In Figure 4‑43, for each point (7,8,9,12 and 14) we have provided a Normal distribution curve with central value (the mean) at the point in question and with an average spread (standard deviation) of 2 units. We can then add the areas under each of these curves together to obtain a (cumulative) density curve with two peaks (dotted line), and then divide this curve by 5 to adjust the area under the curve back to 1 giving the lower (red) curve shown. When adjusted in this way the values are often described as probability densities, and when extended to two dimensions, the resulting surface is described as a probability density surface, rather than a density surface. We now have a density value for every position along the original line, with smooth transitions between the values, which is exactly what we were trying to achieve.
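The construction just described amounts to summing five Normal curves and rescaling, which can be verified numerically (using the same event positions and a standard deviation of 2 units):

```python
# Gaussian kernel density for the five events of Figure 4-43: one
# Normal curve (mean = event position, sd = 2) per event, summed and
# divided by 5 so that the total area under the curve is 1.
import math

def gaussian_kde(points, sd=2.0):
    n = len(points)
    def f(x):
        return sum(math.exp(-0.5 * ((x - p) / sd) ** 2)
                   for p in points) / (n * sd * math.sqrt(2 * math.pi))
    return f

f = gaussian_kde([7, 8, 9, 12, 14])
# Numerical check that the density integrates to (almost exactly) 1
area = sum(f(x * 0.01) * 0.01 for x in range(-1000, 3000))
```

The resulting curve is bimodal, with peaks near the cluster at 7-9 and the pair at 12 and 14, as in the figure.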

Figure 4‑43 Univariate Normal kernel smoothing and cumulative densities


Figure 4‑44 Alternative univariate kernel density functions


There still remain some questions: why should we use the Normal distribution? Could we not use almost any unimodal symmetric function with a finite area under it? And why did we choose a value of 2 units for the average spread? The answer is that the specific selections made are a matter of choice and experience, although in some instances a symmetric distribution with a finite extent (e.g. a box or triangular function) may be regarded as more suitable than one with an infinite possible extent. Figure 4‑44 shows a selection of commonly used functions plotted for the same point set, using the MATLAB Statistics Toolbox function ksdensity(). As may be seen from examining the various curves shown in Figure 4‑44, the exact form of the kernel function does not tend to have a major impact on the set of density values assigned across the linear segment (or area in 2D applications). Of much greater impact is the choice of the spread parameter, or bandwidth. For further details on these and other functions used for smoothing and density estimation see Bowman and Azzalini (1997) and Silverman (1986).
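A kernel density estimator with interchangeable kernel functions can be sketched as follows (the kernels shown are standard normalized forms; the evaluation point and bandwidth are illustrative):

```python
# Univariate KDE with a choice of kernel: estimate = (1/nh) * sum of
# K((x - x_i)/h) over all events x_i. Each K below integrates to 1.
import math

KERNELS = {
    "uniform":    lambda t: 0.5 if abs(t) <= 1 else 0.0,
    "triangular": lambda t: max(0.0, 1 - abs(t)),
    "normal":     lambda t: math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi),
}

def kde_at(x, points, h, kernel):
    K = KERNELS[kernel]
    return sum(K((x - p) / h) for p in points) / (len(points) * h)

pts = [7, 8, 9, 12, 14]
estimates = {name: kde_at(10, pts, 2.0, name) for name in KERNELS}
```

Note that with the same nominal bandwidth the bounded kernels have smaller effective spread than the Normal, so a fair comparison of kernel shapes (as in Figure 4‑44) requires matching the spreads, and in practice the bandwidth choice dominates the result.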

Two-dimensional kernel density

All of this discussion so far addresses problems in one dimension (univariate density estimation). We now need to extend the process to two dimensions, which turns out to be simply a matter of taking the univariate procedures and adding a second dimension (effectively rotating the kernel function about each point). If we were to use the Normal distribution again as our kernel function it would have a two-dimensional bell-shaped form over every point (Figure 4‑45). As before, we place the kernel function over each point in our study region and calculate the value contributed by that point over a finely drawn grid. The grid resolution does not affect the resulting surface form to any great degree, but if possible should be set to be meaningful within the context of the dataset being analyzed, including any known spatial errors or rounding that may have been applied, and making allowance for any areas that should be omitted from the computations (e.g. industrial zones, water, parks etc. when considering residential-based data). Values for all points at every grid intersection or for every grid cell are then computed and added together to give a composite density surface.
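The grid-based computation described above might be sketched as follows, using a radially symmetric Gaussian kernel (the coordinates, bandwidth and grid size are illustrative, not taken from any example in the text):

```python
# 2D kernel density on a regular grid: place a bivariate Gaussian
# kernel over each event and sum the contributions at each cell
# centre. Values returned are absolute densities (events per cell).
import math

def kde_grid(points, nx, ny, cell, h):
    grid = [[0.0] * nx for _ in range(ny)]
    norm = cell * cell / (2 * math.pi * h * h)   # kernel mass per cell
    for row in range(ny):
        for col in range(nx):
            cx, cy = (col + 0.5) * cell, (row + 0.5) * cell
            for px, py in points:
                d2 = (cx - px) ** 2 + (cy - py) ** 2
                grid[row][col] += norm * math.exp(-d2 / (2 * h * h))
    return grid

pts = [(25, 25), (30, 28), (70, 60)]
g = kde_grid(pts, nx=10, ny=10, cell=10, h=8)
total = sum(sum(r) for r in g)   # ~ number of events, less edge loss
```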

Figure 4‑45 2D Normal kernel


The resulting grid values may be provided as: (i) relative densities, which give values in events per unit area (i.e. adjusted by the grid size, as events per square meter or per hectare); this is the default or only option in many GIS packages, including ArcGIS; (ii) absolute densities, which give values in events per grid cell and hence are not adjusted by cell size; the sum of the values across all cells should equal the number of events used in the analysis; and (iii) probabilities, as per (ii) but divided by the total number of events. Crimestat supports all three variants. Computed density values may then be plotted in 2D (e.g. as density contours) or as a 3D surface (as in Figure 4‑46). In the latter case the kernel density procedure has been applied to a dataset of reported cases of lung cancer in part of Lancashire, England. This example is discussed further in Section 5.4.4, and fully in Diggle (1990) and in Baddeley et al. (2005). The software used in this instance was Crimestat, with a Normal kernel function and average spread (bandwidth) determined from the point pattern itself. Map data were output in ESRI shapefile format (SHP) and mapped in ArcGIS to produce a 2D map, and in ASCII grid format for visualization in 3D using Surfer. Other GIS packages support a variety of kernel functions and procedures. ArcGIS Spatial Analyst provides kernel density estimation for point and line objects, but supports only one kernel function, which it describes as a quadratic kernel (a bounded kernel) but which is often described as an Epanechnikov kernel (see further, Table 4‑8). In general, KDE software implementations do not take boundaries or barriers (fixed or porous) into account; typically such areas are simply masked off, which can result in inaccuracies when computing probabilities or matching absolute densities to regionalized values.
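The three output conventions differ only by constant factors, so conversion between them is straightforward (the cell size, event count and cell value below are illustrative):

```python
# Converting between the three density output conventions for a
# single grid cell (hypothetical values throughout).
cell_area = 100 * 100          # a 100 m grid cell, in square metres
n_events = 250                 # total events in the analysis
absolute = 4.0                 # (ii) events assigned to this cell
relative = absolute / cell_area      # (i) events per square metre
probability = absolute / n_events    # (iii) share of all events
```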

MapInfo’s grid analysis add-on package, Vertical Mapper, includes kernel density mapping as do the spatial statistics packages splancs and spatstat, and the crime analysis add-on for MapInfo, Hot Spot Detective (in the latter two cases based on quartic kernels). TransCAD/Maptitude supports what it describes as density grid creation with the option of count (simple), quartic, triangular or uniform kernels. Crimestat supports four alternative kernels to the Normal, all of which have finite extent (i.e. typically are defined to have a value of 0 beyond a specified distance). These are known as the quartic, exponential, triangular and uniform kernels. If kernel density methods are applied in urban areas the use of network kernel density estimation should be considered (see further, below), as Okabe et al. (2009) have shown that the use of planar KDEs can lead to erroneous conclusions regarding clustering for point events on networks.

Figure 4‑46 Kernel density map, Lung Case data, 3D visualization


The details of each of the main kernel functions used in various GIS and statistical packages are as shown in Table 4‑8. The value at grid location gj, at a distance dij from an event point i, is obtained as the sum of individual applications of the kernel function over all event points in the source dataset. The table shows normalized functions, where the distances dij have been divided by the kernel bandwidth, h, i.e. t=dij/h. Graphs of these functions are shown in Figure 4‑47, where each has been normalized such that the area under the graph sums to 1.

Whether the kernel for a particular event point contributes to the value at a grid point depends on: (i) the type of kernel function (bounded or not); (ii) the parameter, k, if applicable, which may be user defined or determined automatically in some instances; and (iii) the bandwidth, h, that is selected (a larger bandwidth spreads the influence of event points over a greater distance, but is also more likely to experience edge effects close to the study region boundaries). Event point sets may be weighted, resulting in some event points having greater influence than others.

Bandwidth selection is often more of an art than a science, but it may be subject to formal analysis and estimation, for example by applying kernel density estimation (KDE) procedures to sets of data where actual densities are known. An alternative to fixed bandwidth selection is adaptive selection, whereby the user specifies the selection criterion, for example defining the number of event points to include within a circle centered on each event point, and taking the radius of this circle as the bandwidth around that point. A somewhat different form of adaptive density model is provided within the spatstat software. In this case a random sample of points (a fraction, f%) is taken and used to create a Voronoi tessellation of the sample point set. The density of the remaining (100-f)% of points is then computed for each cell of the tessellation. The process is then replicated n times and the average densities across all replicates computed.
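The circle-based adaptive criterion can be sketched as follows, taking the distance to the k-th nearest neighbouring event as the local bandwidth (a one-dimensional illustration using the event positions from the earlier figures; k is user-chosen):

```python
# Adaptive bandwidth sketch: for each event, the bandwidth is the
# distance to its k-th nearest neighbouring event.
def adaptive_bandwidths(points, k=2):
    hs = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

h = adaptive_bandwidths([7, 8, 9, 12, 14], k=2)
# The isolated point at 14 receives a wider bandwidth than the
# clustered points at 7-9, as adaptive selection intends
```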

Table 4‑8 Widely used univariate kernel density functions

Note: t = dij/h, where h is the bandwidth

Normal (or Gaussian): Unbounded, hence defined for all t. The standard kernel in Crimestat; the bandwidth h is the standard deviation (and may be fixed or adaptive)

Quartic (spherical): Bounded. Approximates the Normal; k is a constant

(Negative) Exponential: Optionally bounded. A is a constant (e.g. A=3/2) and k is a parameter (e.g. k=3); weights the central point more heavily than other kernels

Triangular (conic): Bounded. Very simple linear decay with distance

Uniform (flat): Bounded. k is a constant; no central weighting, so the function is like a uniform disk placed over each event point

Epanechnikov (paraboloid/quadratic): Bounded. Optimal smoothing function for some statistical applications; used as the smoothing function in the Geographical Analysis Machine (GAM/K) and in ArcGIS

Kernel smoothing, or kernel density estimation (KDE), methods of the type described have a variety of applications: exploratory point data analysis; point data smoothing; creation of continuous surfaces from point data in order to combine or compare these with other datasets that are continuous/in raster form; probability distribution estimation; interpolation (strictly a misnomer, although Crimestat is amongst a number of packages that use this terminology); and hot spot detection.

Figure 4‑47 Univariate kernel density functions, unit bandwidth

A. Constant

B. Normal, SD=1



C. Exponential

D. Quadratic



E. Quartic

F. Triangular (or linear)



KDE can also be used in visualizing and analyzing temporal patterns, for example crime events at different times of day and/or over different periods, with the objective of understanding and potentially predicting event patterns. KDE can also be applied with more than one point set, for example a set of cases and a set of controls. The output of such “dual” dataset analysis is normally a ratio of the primary set to the secondary set (or ratio of the log transform of the variables), with the objective being to analyze the primary pattern with background effects being removed or minimized. Care must be taken in such cases that the results are not subject to distortion by very low or zero values in the second density surface, and that common bandwidths are used. Crimestat provides support for 6 alternative dual density outputs: simple ratio; log ratio; difference in densities (two variants, with and without standardization); and sum of densities (again, with and without standardization). Levine (2007, Chapter 8) provides an excellent discussion of the various techniques and options, including alternative methods for bandwidth selection, together with examples from crime analysis, health research, urban development and ecology.
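The simple-ratio variant of such dual-density analysis might be sketched as follows, with a guard against near-zero control values (the two small surfaces and the epsilon threshold are illustrative; both surfaces are assumed to share a common bandwidth):

```python
# Dual-density simple ratio: case surface divided by control surface,
# cell by cell, with small control values clamped to avoid blow-ups.
EPS = 1e-6

def density_ratio(cases, controls, eps=EPS):
    return [[c / max(b, eps) for c, b in zip(crow, brow)]
            for crow, brow in zip(cases, controls)]

cases    = [[0.4, 0.1], [0.0, 0.2]]
controls = [[0.2, 0.1], [0.0, 0.4]]
r = density_ratio(cases, controls)
# r[0][0] is 2.0; the zero-control cell yields 0.0 rather than
# an unbounded ratio
```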

The use of kernel functions as a form of point weighting enables the creation of local weighted means and variances. The locally weighted (kernel) statistics supported within the GWR software are defined as follows:


These are local statistics, based on locations, u, and weighting function w() defined by a fixed or adaptive kernel function. The bandwidth in this case must be user-defined, and might be chosen on the basis of experience or could use an adaptively derived bandwidth (e.g. from an associated regression study) for comparison purposes.
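A one-dimensional sketch of these local statistics, using the standard weighted mean and variance forms with a Gaussian weighting function (the locations, attribute values and bandwidth are all illustrative, and the GWR software's own definitions should be consulted for the precise formulation):

```python
# Locally weighted mean and variance at location u, with Gaussian
# weights w_i = exp(-d_i^2 / (2 h^2)) and d_i the distance to point i.
import math

def local_stats(u, points, values, h):
    w = [math.exp(-((u - p) ** 2) / (2 * h * h)) for p in points]
    sw = sum(w)
    mean = sum(wi * v for wi, v in zip(w, values)) / sw
    var = sum(wi * (v - mean) ** 2 for wi, v in zip(w, values)) / sw
    return mean, var

m, v = local_stats(10, [7, 8, 9, 12, 14], [2.0, 3.0, 3.0, 8.0, 9.0], h=2)
# m is pulled toward the values of the nearest points
```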

Kernel density for networks

The preceding discussion is not readily applicable to the case of points on networks, such as accidents along roads or thefts in urban areas. In these instances one might attempt to compute the density by applying a two-dimensional kernel procedure as described in the preceding section, overlaying the network, and examining the density patterns along the network routes. However, this approach has been shown to produce biased density estimates (e.g. it suggests variable densities along routes when these are known to be uniform). To resolve this problem, kernel density computations where events occur along or adjacent to networks should be based on shortest (network) paths and adjusted kernel functions; the conventional functions described earlier are not suitable without modification. This question has been addressed by Okabe et al. (2009), whose solution is implemented in their SANET software. They show that a non-continuous fixed bandwidth kernel function, which is unimodal and symmetric, can be used to compute unbiased kernel densities on large networks (e.g. the street network of Kashiwa, Japan, with 35,235 links, 25,146 nodes and 3109 point events). The function used is essentially a modified form of the linear or triangular kernel (Figure 4‑47F), applied to networks that may divide into multiple segments at junctions, but for which the integral of the kernel function is held at unity. The authors suggest that for problems of the type they have investigated (i.e. dense, urban networks) the bandwidth should be 100-300 metres.
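The basic ingredients, shortest network paths plus a triangular kernel applied to network distance, can be sketched on a toy graph. Note that this naive version deliberately omits the equal-split adjustment at junctions that Okabe et al. show is required for unbiased estimates; it is illustrative only, and the graph, events and bandwidth are hypothetical:

```python
# Naive network density sketch: events at nodes of a small graph,
# network distances by Dijkstra, triangular kernel on those distances.
import heapq

def shortest_paths(graph, source):
    """Dijkstra: graph is {node: [(neighbour, edge_length), ...]}."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue
        for nbr, length in graph[node]:
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr))
    return dist

def network_density(graph, events, target, h):
    d = shortest_paths(graph, target)
    return sum(max(0.0, 1 - d[e] / h) / h for e in events if e in d)

# Hypothetical street graph; edge lengths in metres
g = {"a": [("b", 100)], "b": [("a", 100), ("c", 50), ("d", 80)],
     "c": [("b", 50)], "d": [("b", 80)]}
val = network_density(g, events=["c", "d"], target="b", h=200)
```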

Line and intersection densities

Density methods may be applied to linear forms as well as point forms. Some statistics of interest include:

the number of line segments per unit area (sometimes called the line frequency, λ, rather than the density)

the length of line segments (by type) per unit area. The equation La = L0e^(-r/b) has been found to provide a good model of the distribution of urban road length with radial distance, r, from many city centers. L0 is the road density at the city center, which for London, UK is c.19.5 km/km2; b is a parameter to be fitted from sample data. The inverse of this density measure is the average area associated with a unit of path length, sometimes called the constant of path maintenance

the number of line intersections per unit area

the number of line intersections per unit of length (the latter is known as the intersection density, k). This is a particular case of the density of points on lines, which may be subject to kernel density smoothing in the manner discussed earlier in this Section. It can be shown that if λ is the line density (frequency) then for a random network of straight lines there will be an average of k = λ²/π intersections per unit length. K = k/λ² is sometimes called the crossings factor, and experimental evidence suggests that the random model provides an upper limit to this measure, so K has the range [0, 1/π]
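The intersection-density relationships above can be checked numerically (the line frequency value used here is purely hypothetical):

```python
# For a random network of straight lines: k = lambda^2 / pi expected
# intersections per unit length, and the crossings factor
# K = k / lambda^2, which attains its upper limit 1/pi for this model.
import math

lam = 2.5                  # line frequency (density), hypothetical
k = lam ** 2 / math.pi     # expected intersections per unit length
K = k / lam ** 2           # crossings factor
```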

All these measures provide information that may be useful for analysis, including inter-regional comparisons. Applications include hydrology, ecology and road traffic analysis. For further discussion of applications in these areas and statistical measures see Haggett and Chorley (1969, Chapter 2.II) and Vaughan (1987, Chapter 3). A number of GIS packages provide facilities for kernel density mapping of lines (i.e. not points on lines, which is covered above) as well as points (e.g. ArcGIS Spatial Analyst, Density operations). Essentially these work in a similar manner to 2D point operations, spreading the linear form symmetrically according to a specified kernel function. ArcGIS provides two sets of point density and line density functions (for SIMPLE or KERNEL density operations), one set that results in a grid being generated and the other for use in map algebra environments (i.e. where a grid already exists). SIMPLE density operations generate a set of grid values based on a neighborhood shape (e.g. circular, annular, rectangular, or polygonal) and size (e.g. radius). Similar facilities do not appear to be available in most other mainstream GIS packages. The GRASS open source package includes a vector processing function v.kernel (formerly s.kernel) that now supports linear kernel generation (i.e. a 1D kernel) in addition to its generation of 2D raster grids from vector point sets with a Normal (Gaussian) kernel.

Cartograms

A cartogram is a form of map in which some spatially extensive variable such as population size or gross national product is substituted for land area. The geometry or space of the map is distorted (transformed) in order to convey the information of this alternate variable. The end result is a map or diagram in which the density of the variable under consideration is uniform throughout the map; variations in values are represented by variations in the size of mapped regions. Cartograms may be created using a variety of algorithms. One simple approach, due to Dorling (1995, CATMOG 59), is to associate the centroid of each mapped zone with the value of the variable to be mapped, and then to construct a circle around this point with area proportional to the magnitude of the variable. The size of the circles is typically scaled so that the largest is some fraction of a bounding rectangle encompassing all the original zones. Generally these circles will overlap, as illustrated for the communes of Zurich canton in Figure 4‑48A. By systematically moving and rescaling the circles, overlapping can be minimized whilst retaining approximate contiguity of the mapped regions (Figure 4‑48B).

Dorling (1995, p32) describes the algorithm, for which sample code is provided, as follows:

Each region ... is treated as an object in a gravity model which is repelled by other circles with which it overlaps, but is attracted to circles which were neighboring regions on the original map. Forces akin to forces of gravity are calculated, including velocity and acceleration, and the whole process is acted out as if it were occurring in treacle, to avoid any circles moving too quickly (as can occur when small objects orbit too close to large objects in the original gravity models).

This example illustrates a procedure that seeks to represent the variable ‘population’ by the areas of the circles, whilst retaining the approximate relative positions and contiguity of the communes and avoiding overlap. It results in a discontinuous map with no attempt to retain any of the original region shapes. This form of cartogram is supported in several GIS and mapping packages, including MAPresso (Java-based), MapViewer and GeoDa.
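The repulsion step of this idea can be sketched in a toy form. This is illustrative only: it omits the attraction to former neighbours, the treacle-like damping and the other gravity-model details Dorling describes, and the circle data are hypothetical:

```python
# Toy Dorling-style relaxation: overlapping circles repel each other
# a little on each sweep until the overlaps have largely gone.
import math

def relax(circles, iterations=200, step=0.25):
    """circles: list of [x, y, r]; moves overlapping circles apart."""
    for _ in range(iterations):
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                xi, yi, ri = circles[i]
                xj, yj, rj = circles[j]
                dx, dy = xj - xi, yj - yi
                d = math.hypot(dx, dy) or 1e-9
                overlap = ri + rj - d
                if overlap > 0:          # push each circle apart a little
                    ux, uy = dx / d, dy / d
                    shift = step * overlap / 2
                    circles[i][0] -= ux * shift; circles[i][1] -= uy * shift
                    circles[j][0] += ux * shift; circles[j][1] += uy * shift
    return circles

c = relax([[0, 0, 5], [3, 0, 4], [0, 2, 3]])
```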

If the objective is to create a cartogram that retains full contiguity, then the original area-based map needs to be stretched, or ‘rubber-sheeted’, in some way. Many algorithms have been proposed for such cartograms, commencing with Tobler’s work on map transformations, Tobler (1963). More recently, continuous transformation cartograms have been implemented using the rubber-sheeting algorithm of Dougenik et al. (1985) and the diffusion-based algorithm of Gastner and Newman (2004); see Figure 4‑49B (Zurich canton), Figure 4‑50 and Figure 4‑51B (which uses the North Carolina dataset discussed in several sections of this Guide). Examples and discussion of visualizations that utilize this approach can be found on the World Mapper website and the Views of the World website.

It is important to note that within-zone variation is not represented in this technique, hence representation of highly concentrated datasets (such as population) will provide a very particular view of the dataset. If zoned or gridded data are available within zones it may be useful to consider using such information when generating cartograms.


Figure 4‑48 Cartogram creation using basic Dorling algorithm

A. Initial stage

B. Final stage



Created using MAPresso (Adrian Herzog)

Figure 4‑49 Cartogram creation using Dougenik, Chrisman and Niemeyer algorithm

A. Zurich Canton, Switzerland. 171 communes, population mapped thematically

B. Cartogram creation using Dougenik, Chrisman and Niemeyer algorithm



Created using MAPresso (Adrian Herzog)

Figure 4‑50 World Population as a Cartogram


© Copyright 2006 SASI Group (University of Sheffield) and Mark Newman (University of Michigan).

The Gastner-Newman technique commences with the source map with its varying ‘population’ density and conceptually allows the individual members of this population to diffuse from higher density zones to lower density zones until such time as density is uniform. Zone boundaries are progressively moved during the diffusion process such that the net flow through them is zero, until they reach a stable position. It is important to note that here, as with many software implementations of geospatial algorithms, the final cartograms may differ depending upon the software package used to generate them and the parameters used or selected. For example, Dorling cartograms are produced very rapidly by GeoDa, with no option for parameter selection, and with some residual overlapping of the circular regions (Figure 4‑51C). If the MapViewer software is chosen with the same dataset the results are markedly different for almost all parameter selections (Figure 4‑51D). Of course, as another alternative, the shape of regions could be retained and their size altered to reflect the mapped variable if they are sufficiently well separated (i.e. a form of ‘explosion’ or multipolygon representation of the original mapped regions – see Figure 4‑51E).

A final cartogram procedure that warrants attention involves identifying a zoning of the original source data in which the population is approximately equal (e.g. census areas, electoral districts). These zones, which will generally be of varying sizes, are then replaced with equal-sized hexagons and re-arranged into a lattice that approximates the source map. This procedure provides a contiguous mapping of the source data, retaining some of the overall form of the conventional map, whilst eliminating most of the variations due to underlying population levels. The Atlas of Mortality in Britain by Shaw et al. (2008) uses this approach for all of its maps (see for example, Figure 4‑52) – these show the cause of death for roughly 15 million cases over a 24 year period.

Figure 4‑51 Cartograms of births data, 1974

A. Source data


B. Gastner-Newman diffusion algorithm (ArcGIS 9 Cartogram Geoprocessing tool)


C. GeoDa’s Dorling cartogram


D. MapViewer’s Dorling cartogram


E. MapViewer’s non-contiguous ‘explosion’ cartogram


Figure 4‑52 Hexagonal cartogram showing UK mortality data, age group 20-24


Note: in this diagram each hexagon is divided into two halves which may have different principal causes of death within a given district, for this age group