Analysis: Analytical methods and tools

One of the principal purposes of this Guide is to assist readers in selecting appropriate analytical methods, tools and models from those readily available in the GIS and related marketplace. Since analysis takes place well down the chain in what we have described as the analytical methodology, initial selection of methods and tools should have been made well before the analytical stage has been reached. Simplicity and parsimony — using the simplest and clearest tools, models and forms of visualization — and fit to the problem and objectives are the key criteria that should be applied. Other factors include: the availability of appropriate tools; time and cost constraints; the need to provide validity and robustness checks, which could be via internal and/or external checks on consistency, sensitivity and quality; conformance to relevant standards; the use of multiple techniques; and the use of independent and/or additional datasets or sampling.

There are a very large number of software tools available for performing spatial analysis: both for simple data summarizing and exploratory spatial data analysis (ESDA), which will often form the initial stage of analysis; and to assist the development and/or application of specific techniques or models. Frequently spatial model-building and spatial analysis are very closely coupled, with the output of one leading to revisions in the other. This is particularly true of micro-modeling techniques, such as geosimulation, where results often ‘emerge’ from the modeling exercise and reveal unexpected patterns and behaviors which in turn lead to revised ideas and hypotheses (see for example, Section 8.2.2 on Agents and agent-based models).

All products describing themselves as geographic information systems have a core set of analytical facilities, including many of those we describe in this Guide. We cannot hope to draw examples from even a representative sample of products, so we have made our own selection, which we hope will provide sufficient coverage to illustrate the range of analytical techniques implemented in readily available toolsets. In addition to mainstream GIS products we have included a number of specialist tools that have specific spatial analysis functionality, and which provide either some GIS functionality and/or GIS file format input and/or output. Throughout this Guide there are many examples of the application of tools and models to geospatial problems. In addition, in Section 3.4, we discuss the application of certain types of modeling, particularly high-level process-flow models, to geospatial problems.

A recurrent theme in spatial analysis is the notion of pattern. Frequently the objective of analysis is described as being the identification and description of spatial patterns, leading on to attempts to understand and model the processes that have given rise to the observed patterns. Unfortunately the word ‘pattern’ has a very wide range of meanings and interpretations. One way of deciding whether a particular set of observations constitutes a spatial pattern is to attempt to define the opposite, i.e. what arrangements of objects are not considered to constitute a pattern. The generally agreed notion of ‘not a pattern’ is a set of objects or an arrangement that provides no information to the observer. From a practical point of view, ‘no information’ is often the situation when the arrangement is truly random, or is indistinguishable from a random arrangement. An alternative definition of ‘not a pattern’ might be an even arrangement of objects, with deviations from this uniformity considered as patterns. Thus spatial pattern is a relative concept, for which a model of ‘not a pattern’ (e.g. Complete Spatial Randomness or CSR) is a pre-condition (see for example, Figure 2‑11, First- and second-order processes).
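The idea that a pattern is only defined relative to a chosen model of ‘not a pattern’ can be illustrated with a small sketch. The example below is not taken from this Guide: it assumes CSR as the reference model and uses the Clark-Evans nearest-neighbor ratio (observed mean nearest-neighbor distance divided by its expected value under CSR, 1/(2√λ) for density λ) as one possible summary statistic. The unit-square study area, the point counts, the cluster spread and the function names are all illustrative assumptions, and no edge correction is applied.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbor (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbor distance relative to the CSR expectation.

    Values near 1 are consistent with CSR, values below 1 suggest clustering
    and values above 1 suggest an even (regular) arrangement.
    No edge correction is applied in this sketch.
    """
    density = len(points) / area
    expected_under_csr = 1.0 / (2.0 * math.sqrt(density))
    return mean_nn_distance(points) / expected_under_csr


rng = random.Random(42)      # fixed seed so the run is repeatable
SIDE = 20                    # points per side of the even arrangement (assumed)
N = SIDE * SIDE              # same number of points in every arrangement

# Reference model 1: complete spatial randomness in the unit square.
csr = [(rng.random(), rng.random()) for _ in range(N)]

# Reference model 2: a perfectly even (square lattice) arrangement.
even = [((i + 0.5) / SIDE, (j + 0.5) / SIDE)
        for i in range(SIDE) for j in range(SIDE)]

# A clustered arrangement: 20 hypothetical centers with 20 points around each.
clustered = []
for _ in range(20):
    cx, cy = rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)
    clustered += [(rng.gauss(cx, 0.01), rng.gauss(cy, 0.01)) for _ in range(20)]

for name, pts in [("CSR", csr), ("even", even), ("clustered", clustered)]:
    print(f"{name:10s} R = {clark_evans_ratio(pts, 1.0):.2f}")
```

Measured against CSR, the even arrangement appears as a strong departure (a pattern); had evenness been chosen as the reference model instead, it would be the random and clustered arrangements that counted as departures.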

Observed spatial arrangements are frequently of indirect or mapped data rather than direct observations: the process of data capture and storage (e.g. as points and lines or remotely sensed images) has already imposed a model upon the source dataset and has, to an extent, pre-determined aspects of the observable arrangements. The method(s) of visualization and the full and/or sampled extent of the dataset may also impose pre-conditions on the interpretation and investigation of spatial patterns.

Identification of spatial pattern is thus closely associated with a number of assumptions or pre-conditions: (i) the definition of what constitutes ‘not a pattern’ for the purposes of the investigation; (ii) the definition of the dataset being studied (events/observations) and the spatial (and temporal) extent or scale of the observations; and (iii) the way in which the observations are made, modeled and recorded.

Observed patterns may suggest a causal relationship with one or more principal processes, but they do not provide a secure means of inference about process. For example, consider the case of the distribution of insect larvae. Imagine that an insect lays its eggs at a location in a large region at least 200 meters from the egg sites of other insects of the same species, and then flies away or dies. Other insects of the same species do likewise, at approximately the same time, each laying a random number of eggs, most of which hatch to form larvae that slowly crawl away in random directions from the original site for up to 100 meters. At some point in time shortly thereafter an observer samples a number of sites and records the pattern of all larvae within a given radius, say 10 meters. The pattern for each sampled site is then individually mapped and examined. The observer might find that the mapped patterns appear entirely random, or that they show a gradient of larval density across the sampled regions. Zooming out to a 100 m radius (i.e. using a larger region for sampling), a different pattern might be observed, with a distinct center and decreasing numbers of larvae scattered in the sampled region away from this point. However, if observations had been made over 1 km squares in a 10 km x 10 km region it might only have been practical to identify the centers or egg sites. At this scale the pattern may appear regular, since we stipulated that egg sites are not randomly distributed but are influenced by the location of other sites. However, it could equally well be the case that egg sites are actually random, but that only eggs laid on suitable, regularly distributed vegetation survive and go on to produce live larvae.
Zooming out again to 100 km x 100 km the observer might find that all egg sites are located in particular sub-regions of the study area, thus appearing strongly clustered, a pattern perhaps related to an attraction factor between insects or uneven large-scale vegetation cover or some other environmental variable. The mapped patterns at any given scale may not be sufficient to enable the analyst to determine whether a given set of observations is random, evenly spread, clustered or exhibits some specific characteristic such as radial spread, nor to infer with any reliability that some particular process is at work (as discussed in Section 2.1.9: Detail, resolution, and scale and Section 5.4, Point Sets and Distance Statistics).
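The larvae example can be sketched as a small simulation to show how the same dataset looks different at different scales. This is a minimal illustration under stated assumptions, not a model from this Guide: the 200 meter minimum spacing and 100 meter maximum crawl distance come from the text, while the 10 km x 10 km region, the number of egg sites, the clutch-size range, the uniform crawl distance, the ring boundaries and all function names are hypothetical choices made here.

```python
import math
import random

REGION = 10_000.0    # side of the square study region, meters (assumed)
MIN_SEP = 200.0      # minimum spacing between egg sites, meters (from the text)
MAX_CRAWL = 100.0    # maximum crawl distance, meters (from the text)
N_SITES = 300        # number of egg sites (assumed)
RING_EDGES = [0.0, 25.0, 50.0, 75.0, 100.0]   # ring boundaries, meters (assumed)

rng = random.Random(7)   # fixed seed so the run is repeatable


def place_egg_sites(n):
    """Accept a random location only if it lies at least MIN_SEP from every
    site accepted so far (a simple sequential inhibition scheme)."""
    sites = []
    while len(sites) < n:
        x, y = rng.uniform(0, REGION), rng.uniform(0, REGION)
        if all(math.hypot(x - sx, y - sy) >= MIN_SEP for sx, sy in sites):
            sites.append((x, y))
    return sites


def hatch_larvae(site):
    """Random clutch size (assumed 20 to 60); each larva crawls a uniformly
    random distance up to MAX_CRAWL in a uniformly random direction."""
    sx, sy = site
    clutch = []
    for _ in range(rng.randint(20, 60)):
        r = rng.uniform(0, MAX_CRAWL)
        theta = rng.uniform(0, 2 * math.pi)
        clutch.append((sx + r * math.cos(theta), sy + r * math.sin(theta)))
    return clutch


sites = place_egg_sites(N_SITES)
clutches = [hatch_larvae(site) for site in sites]

# Scale of about 100 m: count larvae in concentric rings around their own
# egg site, pooled over all sites, then convert counts to densities.
ring_counts = [0] * (len(RING_EDGES) - 1)
for (sx, sy), clutch in zip(sites, clutches):
    for x, y in clutch:
        d = math.hypot(x - sx, y - sy)
        for k in range(len(ring_counts)):
            if RING_EDGES[k] <= d < RING_EDGES[k + 1]:
                ring_counts[k] += 1
                break

ring_density = []
for k, count in enumerate(ring_counts):
    ring_area = math.pi * (RING_EDGES[k + 1] ** 2 - RING_EDGES[k] ** 2)
    ring_density.append(count / (N_SITES * ring_area))

# Regional scale: only the egg sites are considered.
min_spacing = min(
    math.hypot(ax - bx, ay - by)
    for i, (ax, ay) in enumerate(sites)
    for bx, by in sites[i + 1:]
)

for k, dens in enumerate(ring_density):
    print(f"{RING_EDGES[k]:.0f}-{RING_EDGES[k + 1]:.0f} m: {dens:.5f} larvae per m^2")
print(f"closest pair of egg sites: {min_spacing:.0f} m")
```

At the 100 meter scale the ring densities fall away from each center, which is the radial pattern described above; at the regional scale the only visible structure is that no two egg sites are closer than the minimum spacing. Neither view on its own identifies the generating process.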

On the other hand it is fairly straightforward to generate particular spatial patterns, in the manner described above, using simple (stochastic) process models (see further, Section 5.4, Point Sets and Distance Statistics). These patterns are specific realizations of the models used, but there is no guarantee that the same pattern could not have been generated by an entirely different process. A specific mapped dataset may be regarded as a specific outcome of a set of (known and unknown) processes, and as such is but one of many possible outcomes or process realizations. To this extent, and perhaps to this extent only, such datasets can be thought of as samples (see further, Sections 2.2.8, Spatial sampling and 2.3.4, Statistical inference).
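The point that quite different processes can yield similar patterns can be sketched with two simple stochastic generators for egg sites. Both follow the larvae example, but every parameter value, the grid layout of vegetation patches, the rule of at most one surviving site per patch, and the function names are assumptions made for illustration. The Clark-Evans nearest-neighbor ratio (without edge correction) is used here only as a convenient summary; it is not presented as the Guide's own method.

```python
import math
import random

REGION = 10_000.0    # side of the square study region, meters (assumed)
rng = random.Random(11)   # fixed seed so the run is repeatable


def nn_ratio(points, area):
    """Clark-Evans ratio: mean nearest-neighbor distance divided by its
    expected value under CSR. Values above 1 suggest regular spacing."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    observed = total / len(points)
    expected_under_csr = 0.5 * math.sqrt(area / len(points))
    return observed / expected_under_csr


def survival_process(n_eggs, spacing, patch_radius):
    """Egg sites are laid completely at random, but a site survives only if
    it falls on a vegetation patch. Patches are discs centered on a regular
    grid and (by assumption) each patch supports at most one site."""
    survivors = {}
    for _ in range(n_eggs):
        x, y = rng.uniform(0, REGION), rng.uniform(0, REGION)
        i, j = int(x // spacing), int(y // spacing)        # grid cell of the site
        cx, cy = (i + 0.5) * spacing, (j + 0.5) * spacing  # nearest patch center
        on_patch = math.hypot(x - cx, y - cy) <= patch_radius
        if on_patch and (i, j) not in survivors:
            survivors[(i, j)] = (x, y)
    return list(survivors.values())


def inhibition_process(n, min_sep):
    """Each insect avoids existing egg sites: a random location is accepted
    only if it is at least min_sep from every site accepted so far."""
    sites = []
    while len(sites) < n:
        x, y = rng.uniform(0, REGION), rng.uniform(0, REGION)
        if all(math.hypot(x - sx, y - sy) >= min_sep for sx, sy in sites):
            sites.append((x, y))
    return sites


# Random laying followed by survival on regularly spaced vegetation.
sites_survival = survival_process(n_eggs=20_000, spacing=400.0, patch_radius=50.0)

# Behavioral inhibition, generating the same number of sites for comparison.
sites_inhibition = inhibition_process(len(sites_survival), min_sep=200.0)

r_survival = nn_ratio(sites_survival, REGION * REGION)
r_inhibition = nn_ratio(sites_inhibition, REGION * REGION)
print(f"survival on regular vegetation: n = {len(sites_survival)}, R = {r_survival:.2f}")
print(f"inhibition between insects:     n = {len(sites_inhibition)}, R = {r_inhibition:.2f}")
```

Under these assumptions both realizations are more evenly spaced than CSR would suggest, even though one process involves interaction between insects and the other involves none. Changing the seed gives different realizations of the same two processes.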

Thus the ANALYSIS phase can be seen as a multi-part exercise. It commences with the review of the data collected and the manipulation of the many inputs to produce consistent and usable data. It then extends into the pure analytical phase, where the data are subjected to study in order to identify patterns of various kinds, which in turn help to develop new ideas and hypotheses regarding form and process. This may lead on to the use or development of one or more models within a formal build-fit-criticize cycle. Finally the output of the models and analysis is examined and, where necessary, the dataset and data gathering are re-visited, working back up the PPDAC model chain, prior to moving on to producing the output from the project and delivering this in the Conclusion stage.