Spatial data models and methods
Spatial datasets make it possible to build operational models of the real world based upon the field and object conceptions discussed in Section 2.1.6, Fields, and the use of coordinate geometry to represent the object classes described in Section 2.1.3, Objects. These include: discrete sets of point locations; ordered sets of points (open sets forming composite lines or polylines, and closed, non-crossing sets forming simple polygons); and a variety of representations of continuously varying phenomena, sometimes described as surfaces or fields. The latter are frequently represented as a continuous grid of square cells, each containing a value indicating the (estimated) average height or strength of the field in that cell. In most of the literature and within software packages the points/lines/areas model is described as vector data, whilst the grid model is described as raster (or image) data.
Longley et al. (2015, Ch. 7.2) provide a summary of spatial data models used in GIS and example applications (Table 4‑1, below). The distinctions are not as clear-cut as they may appear, however. For example, vector data may be converted (or transformed) into a raster representation, and vice versa. In most cases transformation will result in a loss of information (e.g. resolution, topological structure) and thus such transformations may not be reversible. For example, suppose we have a raster map containing a number of distinct zones (groups of adjacent cells) representing soil type. To convert this map to vector form we need to specify the target vector form we wish to end up with (polygons in this example) and then apply a conversion operation that will locate the boundaries of the zones and replace them with a complex, jagged set of polygons following the outline of the grid form. These polygons may then be automatically or selectively smoothed to provide a simplified and more acceptable vector representation of the data. Reversing this process, by taking the smoothed vector map and generating a raster output, will generally result in a slightly different output file from the one we started with, for various reasons including: the degree of boundary detection and simplification undertaken during vectorization; the precise nature of the boundary detection and conversion algorithms applied both when vectorizing and rasterizing; and the way in which special cases are handled, e.g. edges of the map, “open zones”, isolated cells or cells with missing values.
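The lossy nature of this round trip can be illustrated with a toy example. In the sketch below (pure Python; the L-shaped zone, the grid extent, and the use of a convex hull to stand in for boundary "smoothing" are all illustrative choices, not any package's actual algorithm), a raster zone is "vectorized and simplified" and then rasterized again, and the two rasters are compared:

```python
# Sketch: raster zone -> simplified vector outline -> raster is lossy.
# "Vectorization + smoothing" is approximated, crudely, by the convex
# hull of the zone's cell centres; "rasterization" by a point-in-convex-
# polygon test on cell centres.

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_convex(poly, p):
    """True if p lies inside or on the boundary of a CCW convex polygon."""
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i+1) % n]
        if (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0]) < 0:
            return False
    return True

# An L-shaped zone of cells (cell centres on an integer grid)
zone = {(x, y) for x in range(6) for y in range(6)
        if not (x >= 3 and y >= 3)}

hull = convex_hull(zone)                     # "vectorize + smooth"
rezone = {(x, y) for x in range(6) for y in range(6)
          if inside_convex(hull, (x, y))}    # "rasterize" again

print(len(zone), len(rezone))                # the round trip gained cells
print(sorted(rezone - zone))                 # cells added by simplification
```

Here the simplification fills in the concave corner of the L-shape, so the regenerated raster contains cells the original did not; real vectorize/rasterize algorithms differ in detail but exhibit the same kind of discrepancy.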
Table 4‑1 Geographic data models
Data model: Example application
Computer-aided design (CAD): Automated engineering design and drafting
Graphical (non-topological): Simple mapping
Image: Image processing and simple grid analysis
Raster/grid: Spatial analysis and modeling, especially in environmental and natural resource applications
Vector/georelational topological: Many operations on vector geometric features in cartography, socio-economic and resource analysis, and modeling
Network: Network analysis in transportation, hydrology and utilities
Triangulated irregular network (TIN): Surface/terrain visualization
Object: Many operations on all types of entities (raster/vector/TIN etc.) in all types of application
Similar issues arise when vector or raster datasets are manipulated and/or combined in various ways (e.g. filtering, resampling). In the following sections we describe a large variety of such operations that are provided in many of the leading software packages. We concentrate on those operations which directly or indirectly form part of analysis and/or modeling procedures, rather than those relating to data collection, referencing and management. These processes include the various “methods” that form part of the OGC Simple Features specification (Table 4‑2) and its test protocols, including the procedures for determining convex hulls, buffering, distances, set-like operators (e.g. spatial intersection, union etc.) and similar spatial operations. In each case it is important to be aware that data manipulation will almost always alter the data in both expected and unexpected ways, in many instances resulting in some loss of information. For this reason it is usual for new map layers and associated tables, and/or output files, to be created or augmented rather than for source datasets to be modified. In many cases these procedures are closely related to the discipline known as Computational Geometry.
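As a concrete illustration of two of these methods, the sketch below approximates Distance and Buffer on a coarse raster grid. This is a minimal pure-Python sketch whose geometries, distance threshold and grid extent are invented for illustration; real GIS software computes these operations on continuous vector coordinates (for example via a geometry engine such as GEOS/JTS).

```python
# "Distance" and "Buffer" in the OGC sense, sketched on a unit grid.
# Geometries are represented as sets of cell centres.

import math

line = {(x, 2) for x in range(5)}            # a horizontal "polyline"
point = {(7, 6)}                             # a single point feature

def distance(g1, g2):
    """Shortest distance between any two points of the two geometries."""
    return min(math.hypot(x1 - x0, y1 - y0)
               for x0, y0 in g1 for x1, y1 in g2)

def buffer_cells(g, d, extent=10):
    """All grid cells within distance d of geometry g."""
    return {(x, y) for x in range(extent) for y in range(extent)
            if min(math.hypot(x - gx, y - gy) for gx, gy in g) <= d}

print(distance(line, point))                 # 3-4-5 triangle: 5.0
buf = buffer_cells(line, 1.5)
print(len(buf))                              # cells in the 1.5-unit buffer
```

Note that on a grid the buffer is only an approximation of the true continuous buffer region, which is one of the information losses discussed above.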
Table 4‑2 OGC OpenGIS Simple Features Specification — Principal Methods
Method: Description

Spatial relations

Equals: spatially equal to: a = b
Disjoint: spatially disjoint: equivalent to: a∩b = ∅
Intersects: spatially intersects: [a intersects(b)] is equivalent to [not a disjoint(b)]
Touches: spatially touches: equivalent to: I(a)∩I(b) = ∅ and a∩b ≠ ∅; does not apply if a and b are points
Crosses: spatially crosses: equivalent to: dim(I(a)∩I(b)) < max(dim(I(a)), dim(I(b))) and a∩b ≠ a and a∩b ≠ b
Within: spatially within: [a within(b)] is equivalent to: a∩b = a and I(a)∩I(b) ≠ ∅
Contains: spatially contains: [a contains(b)] is equivalent to [b within(a)]
Overlaps: spatially overlaps: equivalent to: dim(I(a)) = dim(I(b)) = dim(I(a)∩I(b)) and a∩b ≠ a and a∩b ≠ b
Relate: spatially relates, tested by checking for intersections between the interior, boundary and exterior of the two components

Spatial analysis

Distance: the shortest distance between any two points in the two geometries, as calculated in the spatial reference system of this geometry
Buffer: all points whose distance from this geometry is less than or equal to a specified distance value
Convex Hull: the convex hull of this geometry (see further, Section 4.2.13, Boundaries and zone membership)
Intersection: the point set intersection of the current geometry with another selected geometry
Union: the point set union of the current geometry with another selected geometry
Difference: the point set difference of the current geometry with another selected geometry
Symmetric difference: the point set symmetric difference of the current geometry with another selected geometry (logical XOR)
Note: a and b are two geometries (one or more geometric objects or features — points, line objects, polygons, surfaces including their boundaries); I(x) is the interior of x; dim(x) is the dimension of x, or maximum dimension if x is the result of a relational operation 
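For finite point sets these operators reduce to familiar set algebra, which makes the logical relationships in the table easy to demonstrate. The sketch below uses small sets of raster cells to stand in for geometries a, b and c (all invented for illustration); a production implementation, such as the GEOS/JTS engines used by many GIS packages, works on continuous geometries and tests relations via interior/boundary/exterior intersections, as the Relate row indicates.

```python
# The Table 4-2 operators on finite point sets (raster cells standing
# in for continuous geometries).

a = {(x, y) for x in range(4) for y in range(4)}        # 4x4 block
b = {(x, y) for x in range(2, 6) for y in range(2, 6)}  # overlapping 4x4 block
c = {(0, 0), (0, 1), (1, 0), (1, 1)}                    # small block inside a

# Spatial relations
disjoint   = not (a & b)
intersects = bool(a & b)        # [a intersects(b)] == [not a disjoint(b)]
within     = (c & a) == c       # c within(a): c's points all lie in a
contains   = within             # [a contains(c)] iff [c within(a)]
overlaps   = bool(a & b) and a & b != a and a & b != b

# Point set operators
union        = a | b
intersection = a & b
difference   = a - b
sym_diff     = a ^ b            # symmetric difference: logical XOR

assert sym_diff == union - intersection
print(len(intersection), len(union), len(sym_diff))
```

The final assertion illustrates why the symmetric difference is described as a logical XOR: it is exactly the union minus the intersection.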
Spatiotemporal data models and methods
The focus of most GIS software has historically been on spatial data rather than spatiotemporal data. Indeed, many GIS packages are relatively weak in this area. However, the huge quantities of spatiotemporal data now available demand a rethink by many vendors and an extension of their data models and analytical toolsets to embrace these new forms of data. One of the most important aspects of this development is a change in perspective, with the temporal domain becoming ever more important.
It is perhaps simplest to clarify these developments using examples of spatiotemporal data and considering how these may be represented and analyzed:
• complete spatial fields recorded at distinct points in time, viewed as a set of time slices (typically at fixed time intervals). This is sometimes referred to as a T-mode data model, and analysis of such data as T-mode analysis. The T-mode view of spatiotemporal field data is the most common within GIS software. If sufficient time slices are available (from observation or generated computationally) they may be suitable for display as videos rather than simply as sets of static images. Analysis has tended to focus on the differences between the time-sliced datasets
• complete spatial fields recorded at distinct points in time, viewed as a set of point locations or pixels, each of which has a temporal profile. This view of spatiotemporal data can be regarded as a form of space-time cube, similar conceptually to multispectral datasets (see further, Classification and clustering), with analytical methods that concentrate on patterns detected in the set of profiles. This is known as S-mode analysis, and is widely used in disciplines that study dynamic data over very large spatial extents, such as the meteorological and oceanographic sciences
• incomplete spatial fields recorded at (regular) distinct points in space and time (often very frequently, e.g. each minute or every few seconds). Data of this type are typical of environmental monitoring using automated equipment — for example, weather stations, atmospheric pollution monitoring equipment, river flow meters, radiation monitoring devices and many similar datasets (including geolocated human activities). For such datasets, analysis of the time-series data is often as important as the process of estimating the complete spatial field. Lack of completeness of time series at sample points is a common problem — this may apply to a single attribute (variable being measured) or across multiple attributes
• mobile objects (points) tracked in space-time (track data). Track data are typically a set of geospatial coordinates together with a timestamp for each coordinate. The temporal spacing may not be regular and the data elements may not be complete (e.g. when a moving object disappears from radio or satellite contact in a tunnel or a forest). If the timestamping is designed to be regular, then irregularities indicate that the spatial component of the track is incomplete during that period — i.e. the track from location X to the next observed location, Y, may be missing some intermediate points. Track data may also include additional attributes that reflect the underlying continuity in much data of this type, for example velocity, acceleration and direction
• network-based data. The most common form of such data relates to traffic monitoring, but event data on networks or related to networks (e.g. crime events, accidents, transaction data, trip data, environmental monitoring data) are often either specific to the network representation of space, or the network structure provides important insights for modeling purposes (e.g. noise and pollution diffusion)
• patterns of points (events) over time. This kind of data is exemplified by epidemiology, where evolving patterns of disease (human, animal) or crime activity are monitored, although in some instances event evolution at fixed locations forms the dataset
• patterns of regions (zones) over time. This kind of data is common with census-based information but applies to a wide variety of data collected on a regular basis by zone — socio-economic and health district data are typical examples. An added complexity is that the zoning applied may itself have changed over time, requiring some form of common zoning or rasterization of all the data to be carried out prior to spatiotemporal analysis
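As a simple illustration of working with track data of the kind described above, the sketch below derives per-segment speed and heading from a timestamped track and flags likely gaps rather than interpolating across them. The track itself, the planar metre/second units and the nominal 10-second sampling interval are all invented for illustration; real GPS tracks would first need projecting from latitude/longitude to a planar coordinate system.

```python
# Sketch: deriving velocity attributes from point-track data
# (timestamped x, y positions on a local planar grid, in metres/seconds).

import math

track = [  # (t_seconds, x_m, y_m); one fix nominally every 10 s
    (0, 0.0, 0.0), (10, 50.0, 0.0), (20, 100.0, 10.0), (60, 120.0, 150.0),
]

NOMINAL_DT = 10.0  # expected sampling interval, seconds

segments = []
for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
    dt = t1 - t0
    dist = math.hypot(x1 - x0, y1 - y0)
    segments.append({
        "speed": dist / dt,                       # mean speed, m/s
        "heading": math.degrees(math.atan2(x1 - x0, y1 - y0)) % 360,
        "gap": dt > 1.5 * NOMINAL_DT,             # likely missing fixes
    })

for s in segments:
    print(s)
```

The final segment spans 40 seconds against a nominal 10, so it is flagged: its straight-line speed is really only a lower bound, since intermediate points (and any turns between them) are missing, exactly the inaccuracy described above.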
There is no standard database model or analytical approach to these complex, large, often incomplete and highly varied datasets (see Tao et al., 2013, for a recent discussion of spatiotemporal data mining and analysis techniques). Specialized techniques have been developed for specific cases — for example: for land-use change modeling (e.g. as provided in Idrisi's Land Change Modeler package); for pollution datasets, the use of time-series analysis (prediction using ARMA, autoregressive moving average, models) followed by spatial interpolation or diffusion modeling (only really valid if the spatial and temporal components can be regarded as separable), or the use of extended spatiotemporal versions of spatial and/or temporal modeling tools (e.g. STARMA, spatiotemporal autoregressive moving average, models); and for spatiotemporal event data, the application of extensions to traditional or novel spatial statistical models, e.g. the extension of spatial scan statistic procedures to spatiotemporal point data (see the crime analysis example illustrated below, Figure 4, and Cheng and Adepeju, 2013, for full details).
Figure 4 Space-time detection of emergent crime clusters
In the example illustrated, the free software package SaTScan was used to look for possible unusual clusters of crime events in both space and time in an area of North London. The height of each cylinder in the 3D visualization provides a measure of its temporal extent, and the radius a measure of its spatial extent (up to a maximum of 750 meters). Probability measures were determined by Monte Carlo simulation (built into the SaTScan software). The idea behind this analysis was to use retrospective data on crimes as a means of identifying well-developed clusters of crimes, and then to re-run the process to see whether watching the early development of possible clusters (effectively a form of surveillance) would have pointed to the locations and start dates of significant crime clusters. The authors found that they could, indeed, detect emergent clusters using this technique, which opens up the prospect of a new tool for identifying emerging spatiotemporal hotspots before they have become statistically significant.
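SaTScan's space-time scan statistic is too involved to reproduce here, but the underlying logic, comparing an observed space-time statistic against its distribution under random re-labeling of event times via Monte Carlo simulation, can be sketched with a simpler Knox-style test for space-time interaction. Everything below (the synthetic events, the closeness thresholds and the number of permutations) is invented for illustration and is not the method or data used in the study described above.

```python
# Knox-style space-time interaction test with a Monte Carlo p-value:
# count event pairs close in BOTH space and time, then shuffle the
# timestamps over the fixed locations to judge significance.

import itertools, math, random

random.seed(1)

# Toy events: (x, y, day). 40 background events plus a tight
# space-time cluster of 10 events.
events = [(random.uniform(0, 10), random.uniform(0, 10),
           random.randint(0, 99)) for _ in range(40)]
events += [(5 + random.uniform(0, 0.5), 5 + random.uniform(0, 0.5),
            50 + random.randint(0, 3)) for _ in range(10)]

pts  = [(x, y) for x, y, _ in events]
days = [d for _, _, d in events]

D_MAX, T_MAX = 1.0, 7    # "close in space" / "close in time" thresholds

def knox(points, times):
    """Count pairs of events close in both space and time."""
    close = 0
    for i, j in itertools.combinations(range(len(points)), 2):
        (x0, y0), (x1, y1) = points[i], points[j]
        if (math.hypot(x1 - x0, y1 - y0) <= D_MAX
                and abs(times[i] - times[j]) <= T_MAX):
            close += 1
    return close

observed = knox(pts, days)

# Monte Carlo: permute the timestamps, keeping locations fixed
perm, sims = days[:], []
for _ in range(199):
    random.shuffle(perm)
    sims.append(knox(pts, perm))
p = (1 + sum(s >= observed for s in sims)) / (1 + len(sims))

print(observed, p)
```

Because the ten clustered events are close in both space and time, every one of their 45 pairs counts towards the observed statistic, whereas the permutations destroy the time component of the clustering, so the observed count sits far out in the simulated distribution.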
More common than such quasi-statistical procedures is the application of a range of microsimulation techniques (including agent-based modeling, ABM) and machine learning (notably the application of artificial neural networks, ANN, and support vector machines, SVM — see Kanevski et al., 2009, for an extensive discussion of such methods). In many cases the volume and complexity of such datasets make traditional statistical analysis impractical, and forms of data mining, data filtering, advanced visualization techniques and simulation modeling are the only practical approaches currently available for obtaining a fuller understanding of the datasets.