The simplest form of EDA involves the computation of basic statistics, and in the context of spatial data, statistical summaries of attribute tables and grid values. Useful online references on EDA include the NIST e-Handbook (EDA section), CATMOG 49 (an early monograph that deals with ESDA), and especially for ESDA, the GeoDa workbook and GeoViz toolkit documentation. Graphical analysis of such data tends to rely on histograms, pie charts, box plots and/or scatter plots. None of these provides an explicitly spatial perspective on the data. However, where such facilities are dynamically linked to mapped and tabular views of the data they can provide a powerful toolset for ESDA purposes. The selection of objects through such linking may be programmatically defined (e.g. all values lying more than 2 standard deviations from the mean) or user-defined, often by graphical selection. The latter is known as brushing, and generally involves selection of a number of objects (e.g. points) from a graphical or mapped representation (Figure 5‑6, Columbus, Ohio). Selected features are automatically highlighted (linked) in the other views of the dataset (histogram, map and attribute table in the example illustrated).
Figure 5‑6 Brushing and linking, GeoDa
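The programmatic form of selection described above can be illustrated with a minimal Python sketch. It is not taken from GeoDa or any other package: the function name `select_outliers`, the zone labels and the crime-rate values are all hypothetical, and the use of the population standard deviation (rather than the sample version) is an assumption. Linking is represented simply by using the selected row indices to look up the same records in another view.

```python
from statistics import mean, pstdev

def select_outliers(values, k=2.0):
    """Return indices of values lying more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values identical, so nothing can be selected
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Hypothetical attribute table: one row per zone
zones = ["A", "B", "C", "D", "E", "F", "G", "H"]
crime_rate = [12.0, 14.5, 13.2, 11.8, 58.0, 12.9, 13.7, 14.1]

# Selection made on the attribute values ...
selected = select_outliers(crime_rate)

# ... is "linked" to another view by sharing the row indices
highlighted = [zones[i] for i in selected]
print(highlighted)  # ['E']
```

In a linked display the same index set would drive the highlighting in the map, histogram and table views at once; brushing differs only in that the index set comes from a graphical selection rather than a rule.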
Facilities of this type are implemented in a number of GIS packages, notably in ArcGIS V9 (with a range of tools for different data types, but limited to a fixed number of points for selected ESDA tools such as semivariance analysis) and in the stand-alone package GeoDa. The latter has been built using ArcGIS objects and reads and writes ArcGIS shapefiles. GeoDa limits its attention to lattice data, by which is meant discrete spatial units (zones/areas) rather than point sets or point samples from a continuous surface.
A wide range of ESDA tools, which include brushing and linking as core functionality, have been implemented in GeoVista Studio and the somewhat more accessible Java version, the GeoViz toolkit (which also reads SHP files). These focus on the exploration of multivariate datasets, and include the visualization tool known as the parallel coordinate plot, or PCP. In the example illustrated in Figure 5‑7 five variables are included: house values, income levels, crimes recorded (residential burglaries and vehicle thefts), open space and the percentage of housing with deficient plumbing. Each variable is shown with a [min, max] vertical scale and a linking line that corresponds to the case (census tract). Lines are colored according to a user-chosen single variable and classification rule. By selecting a single line, such as that shown, the variable values are displayed and all the other visualization windows, such as various forms of maps and graphs, show the selected object highlighted.
Figure 5‑7 Parallel coordinate plot
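The geometry behind a parallel coordinate plot can be sketched in a few lines of Python. This is an illustrative sketch only, not the GeoViz toolkit's implementation: the function `pcp_polylines`, the variable names and the data values are hypothetical. Each variable is rescaled to its own [min, max] range, and each case (e.g. census tract) becomes a polyline with one vertex per axis.

```python
def pcp_polylines(table, variables):
    """Rescale each variable to [0, 1] and return one polyline per case.

    Each polyline is a list of (axis_index, scaled_value) vertices.
    """
    # Per-variable [min, max] range, used as the vertical scale of each axis
    ranges = {}
    for var in variables:
        column = [row[var] for row in table]
        ranges[var] = (min(column), max(column))

    lines = []
    for row in table:
        line = []
        for axis, var in enumerate(variables):
            lo, hi = ranges[var]
            # A constant variable is drawn mid-axis (an arbitrary choice)
            y = 0.5 if hi == lo else (row[var] - lo) / (hi - lo)
            line.append((axis, y))
        lines.append(line)
    return lines

# Hypothetical data: three census tracts, three variables
tracts = [
    {"house_value": 80.0, "income": 20.0, "crime": 15.0},
    {"house_value": 40.0, "income": 10.0, "crime": 45.0},
    {"house_value": 60.0, "income": 15.0, "crime": 30.0},
]
lines = pcp_polylines(tracts, ["house_value", "income", "crime"])
print(lines[0])  # [(0, 1.0), (1, 1.0), (2, 0.0)]
```

Because every axis has its own scale, the plot shows relative position within each variable rather than comparable magnitudes across variables.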
Other visualization facilities of this type include star plots and star plot maps, which show multivariate data as a star of values (Figure 5‑8).
Figure 5‑8 Star plot
In this type of chart each variable is plotted in a separate direction, with the length of the star arm being proportional to the variable magnitude. Each mapped region will have an individual star plot, which can be mapped as an overlay onto a classified base map to provide a star plot map.
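As a minimal sketch of this construction, the Python function below computes the arm end-points of a single star. It assumes non-negative values, evenly spaced directions, and scaling of each arm by a supplied per-variable maximum; these choices, the function name `star_vertices` and the sample values are illustrative assumptions rather than the method of any particular package.

```python
import math

def star_vertices(values, maxima, radius=1.0):
    """Return the (x, y) end-point of each star arm.

    Arm k points in direction 2*pi*k/n and has length
    radius * values[k] / maxima[k].
    """
    n = len(values)
    vertices = []
    for k, (v, vmax) in enumerate(zip(values, maxima)):
        angle = 2 * math.pi * k / n
        length = radius * v / vmax if vmax else 0.0
        vertices.append((length * math.cos(angle), length * math.sin(angle)))
    return vertices

# Hypothetical region with four variables and their maxima over all regions
values = [10.0, 5.0, 20.0, 0.0]
maxima = [10.0, 10.0, 20.0, 10.0]
star = star_vertices(values, maxima)

# Arm lengths recovered from the end-points
arm_lengths = [math.hypot(x, y) for x, y in star]
```

For a star plot map, the end-points for each region would be offset by that region's centroid coordinates and drawn over the base map, with `radius` chosen so that neighboring stars do not overlap.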
Extensions to a number of these techniques to the spatio-temporal domain (ESTDA) have recently been made available in a number of software packages. These include: the STARS open source project; BioMedware’s (commercial) Space-time Intelligence System (STIS/SPACESTAT); and the National Cancer Institute’s (NCI) SaTScan software, available free of charge, from www.satscan.org/. The NCI’s GIS website (gis.cancer.gov/) is an excellent source of information and guidance as regards the mapping and analysis of epidemiological datasets. In conjunction with the NCI the GeoVista team at Penn State University have recently developed ESTAT, their Exploratory Spatio-Temporal Analysis Toolkit, a Java-based implementation of several of the ESDA tools provided within GeoVista augmented by linked time-series plots. The latter are similar in format to the PCP display, but with the horizontal axis providing fixed time periods (e.g. years) and the data plotted being a single variable, e.g. the morbidity rate for a particular form of cancer. Individual lines again correspond to the spatial entities (e.g. counties, states).
A recent publication with a specific focus on spatial and spatio-temporal EDA is Andrienko and Andrienko’s “Exploratory Analysis of Spatial and Temporal Data”. They view the purpose of ESTDA as providing a data focus in which:
• peculiarities of the data can be revealed and an appreciation obtained as to how the data should be further processed (e.g. filtered, transformed, split, combined, …)
• hypotheses can be generated for further testing (e.g. using statistical methods), and
• proper methods can be selected for in-depth analysis of the data.
They make the case that space and time must be seen as complementary views of the same data. This leads to the need for systematic analysis of: (i) the evolution of spatial patterns in time; and (ii) the distribution of temporal behaviors in space. Because spatio-temporal datasets are complex and often incomplete and inconsistent they recommend that such data are divided up and explored by slices and subsets (species, age groups, countries, years etc.), and that care is taken to examine outliers and unexpected patterns. This approach is especially important for data obtained from non-governmental sources, e.g. much of the data obtained via the Internet. This time-slice view of spatio-temporal data is being superseded by other approaches; see the section Spatial and Spatio-temporal Data Models and Methods for more details.
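The two complementary views, and the check for incomplete data, can be sketched in Python as follows. This is an illustration under stated assumptions, not the Andrienkos' own procedure: records are assumed to be (region, year, value) tuples, and the function names and sample rates are hypothetical.

```python
from collections import defaultdict

def space_time_views(records):
    """Organize (region, year, value) records into two complementary views.

    by_year[year][region]: the spatial pattern in each time slice
    by_region[region][year]: the temporal behavior of each spatial unit
    """
    by_year = defaultdict(dict)
    by_region = defaultdict(dict)
    for region, year, value in records:
        by_year[year][region] = value
        by_region[region][year] = value
    return dict(by_year), dict(by_region)

def missing_cells(by_year, by_region):
    """List the (region, year) combinations that have no recorded value."""
    return sorted((r, y) for y in by_year for r in by_region
                  if r not in by_year[y])

# Hypothetical morbidity rates; South has no record for 2002
records = [
    ("North", 2001, 4.1), ("South", 2001, 5.0),
    ("North", 2002, 4.4),
    ("North", 2003, 4.9), ("South", 2003, 5.6),
]
by_year, by_region = space_time_views(records)
gaps = missing_cells(by_year, by_region)
print(gaps)  # [('South', 2002)]
```

Each entry of `by_year` could be mapped as a separate time slice, while each entry of `by_region` corresponds to one line in a linked time-series plot; the list of gaps flags slices that need care before patterns are compared.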