# Spatial and Spatio-temporal Data Models and Methods

Spatial data models and methods

Spatial datasets make it possible to build operational models of the real world based upon the field and object conceptions discussed in Section 2.1.6, Fields, and the use of coordinate geometry to represent the object classes described in Section 2.1.3, Objects. These include: discrete sets of point locations; ordered sets of points (open sets forming composite lines or polylines, and closed, non-crossing sets, forming simple polygons); and a variety of representations of continuously varying phenomena, sometimes described as surfaces or fields. The latter are frequently represented as a continuous grid of square cells, each containing a value indicating the (estimated) average height or strength of the field in that cell. In most of the literature and within software packages the points/lines/areas model is described as vector data, whilst the grid model is described as raster (or image) data.

Longley et al. (2015, Ch7.2) provide a summary of spatial data models used in GIS and example applications (Table 4‑1, below). The distinctions are not as clear-cut as they may appear, however. For example, vector data may be converted (or transformed) into a raster representation, and vice versa. Transformation in most cases will result in a loss of information (e.g. resolution, topological structure) and thus such transformations may not be reversible. For example, suppose we have a raster map containing a number of distinct zones (groups of adjacent cells) representing soil type. To convert this map to vector form you will need to specify the target vector form you wish to end up with (polygon in this example) and then apply a conversion operation that will locate the boundaries of the zones and replace these with a complex jagged set of polygons following the outline of the grid form. These polygons may then be automatically or selectively smoothed to provide a simplified and more acceptable vector representation of the data. Reversing this process, by taking the smoothed vector map and generating a raster output, will generally result in a slightly different output file from the one we started with, for various reasons including: the degree of boundary detection and simplification undertaken during vectorization; the precise nature of the boundary detection and conversion algorithms applied both when vectorizing and rasterizing; and the way in which special cases are handled, e.g. edges of the map, “open zones”, isolated cells or cells with missing values.
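The loss of information on conversion can be illustrated with a minimal sketch, which is not taken from any particular GIS package. It rasterizes a hypothetical triangular polygon using an assumed "cell centre inside polygon" rule and compares the area of the resulting grid with the exact vector area at two cell sizes. The function names, the triangle and the grid dimensions are all illustrative assumptions.

```python
def polygon_area(vertices):
    """Exact polygon area by the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def point_in_polygon(x, y, vertices):
    """Ray-casting test: does (x, y) lie inside the polygon?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(vertices, cell_size, n_cols, n_rows):
    """Grid of 0/1 flags: a cell is 1 if its centre lies inside the polygon."""
    grid = []
    for row in range(n_rows):
        y = (row + 0.5) * cell_size
        grid.append([
            1 if point_in_polygon((col + 0.5) * cell_size, y, vertices) else 0
            for col in range(n_cols)
        ])
    return grid


def raster_area(grid, cell_size):
    """Area represented by the flagged cells."""
    return sum(sum(row) for row in grid) * cell_size ** 2


# Hypothetical zone boundary covering part of a 10 x 10 unit map.
triangle = [(1.0, 1.0), (9.0, 2.0), (4.0, 8.0)]
exact = polygon_area(triangle)
coarse = raster_area(rasterize(triangle, 2.0, 5, 5), 2.0)
fine = raster_area(rasterize(triangle, 0.25, 40, 40), 0.25)
print(exact, coarse, fine)
```

The coarse grid misrepresents the area of the zone far more than the fine grid does, and in neither case can the original vertex coordinates be recovered from the cells alone.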

Table 4‑1 Geographic data models

| Data model | Example application |
|---|---|
| Computer-aided design (CAD) | Automated engineering design and drafting |
| Graphical (non-topological) | Simple mapping |
| Image | Image processing and simple grid analysis |
| Raster/grid | Spatial analysis and modeling, especially in environmental and natural resource applications |
| Vector/Geo-relational topological | Many operations on vector geometric features in cartography, socio-economic and resource analysis, and modeling |
| Network | Network analysis in transportation, hydrology and utilities |
| Triangulated irregular network (TIN) | Surface/terrain visualization |
| Object | Many operations on all types of entities (raster/vector/TIN etc.) in all types of application |

Similar issues arise when vector or raster datasets are manipulated and/or combined in various ways (e.g. filtering, resampling). In the following sections we describe a large variety of such operations that are provided in many of the leading software packages. We concentrate on those operations which directly or indirectly form part of analysis and/or modeling procedures, rather than those relating to data collection, referencing and management. These processes include the various “methods” that form part of the OGC “simple feature” specifications (Table 4‑2) and test protocols, including the procedures for determining convex hulls, buffering, distances, set-like operators (e.g. spatial intersection, union etc.) and similar spatial operations. In each case it is important to be aware that data manipulation will almost always alter the data in both expected and unexpected ways, in many instances resulting in some loss of information. For this reason it is usual for new map layers and associated tables, and/or output files, to be created or augmented rather than source datasets modified. In many cases these procedures are closely related to the discipline known as Computational Geometry.
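As an illustration of one such computational geometry procedure, the following is a minimal sketch of a convex hull computation using Andrew's monotone chain algorithm. This is one of several well-known hull algorithms and is not necessarily the one used by any given package. The function names and sample coordinates are assumptions made for the example. In keeping with the practice described above, the function returns a new list and leaves the source points unmodified.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))  # new sorted list; the input is not altered
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Discard points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical point layer: four corners plus two interior points.
source_points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(source_points)
print(hull)
```

The interior points are dropped from the result, which is one simple case of a derived layer carrying less information than its source.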

Table 4‑2 OGC OpenGIS Simple Features Specification — Principal Methods

| Method | Description |
|---|---|
| Spatial relations | |
| Equals | spatially equal to: a = b |
| Disjoint | spatially disjoint: equivalent to a ∩ b = ∅ |
| Intersects | spatially intersects: [a intersects(b)] is equivalent to [not a disjoint(b)] |
| Touches | spatially touches: equivalent to [a ∩ b ≠ ∅ and I(a) ∩ I(b) = ∅]; does not apply if a and b are points |
| Crosses | spatially crosses: equivalent to [dim(I(a) ∩ I(b)) < max{dim(I(a)), dim(I(b))} and a ∩ b ≠ a and a ∩ b ≠ b] |
| Within | spatially within: [a within(b)] is equivalent to [a ∩ b = a and I(a) ∩ I(b) ≠ ∅] |
| Contains | spatially contains: [a contains(b)] is equivalent to [b within(a)] |
| Overlaps | spatially overlaps: equivalent to [dim(I(a) ∩ I(b)) = dim(I(a)) = dim(I(b)) and a ∩ b ≠ a and a ∩ b ≠ b] |
| Relate | spatially relates, tested by checking for intersections between the interior, boundary and exterior of the two components |
| Spatial analysis | |
| Distance | the shortest distance between any two points in the two geometries as calculated in the spatial reference system of this geometry |
| Buffer | all points whose distance from this geometry is less than or equal to a specified distance value |
| Convex Hull | the convex hull of this geometry (see further, Section 4.2.13, Boundaries and zone membership) |
| Intersection | the point set intersection of the current geometry with another selected geometry |
| Union | the point set union of the current geometry with another selected geometry |
| Difference | the point set difference of the current geometry with another selected geometry |
| Symmetric difference | the point set symmetric difference of the current geometry with another selected geometry (logical XOR) |

Note: a and b are two geometries (one or more geometric objects or features — points, line objects, polygons, surfaces including their boundaries); I(x) is the interior of x; dim(x) is the dimension of x, or maximum dimension if x is the result of a relational operation
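The point-set character of several of these methods can be sketched by treating each geometry as a finite set of grid cells and using Python's built-in set operations. This is a deliberately crude, discretized stand-in and not an implementation of the OGC specification: it has no notion of interior, boundary or dimension, so Touches, Crosses and Overlaps cannot be expressed with it. The helper `cells` and the four sample geometries are hypothetical.

```python
def cells(x0, y0, x1, y1):
    """Set of unit grid cells covering columns x0..x1-1 and rows y0..y1-1."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


a = cells(0, 0, 4, 4)      # 4 x 4 block
b = cells(2, 2, 6, 6)      # 4 x 4 block sharing a 2 x 2 corner with a
c = cells(1, 1, 3, 3)      # 2 x 2 block lying wholly inside a
d = cells(10, 10, 12, 12)  # block far away from the others

# Spatial relations (point-set analogues only)
a_disjoint_d = a.isdisjoint(d)        # a ∩ d is empty
a_intersects_b = not a.isdisjoint(b)  # defined as "not disjoint"
c_within_a = c <= a                   # c ∩ a = c
a_contains_c = a >= c                 # same test with the roles swapped
a_equals_copy = a == cells(0, 0, 4, 4)

# Spatial analysis: each result is a new set, the inputs are unchanged
intersection = a & b
union = a | b
difference = a - b
symmetric_difference = a ^ b          # logical XOR

print(len(intersection), len(union), len(difference), len(symmetric_difference))
```

With true vector geometries the same operations require computational geometry routines, but the set identities they satisfy are the same as those shown here.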

Spatio-temporal data models and methods

The focus of most GIS software has historically been on spatial data rather than spatio-temporal data. Indeed, many GIS packages are relatively weak in this area. However, the huge quantities of spatio-temporal data now available demand a re-think by many vendors and an extension of their data models and analytical toolsets to embrace these new forms of data. One of the most important aspects of this development is a change in perspective, with the temporal domain becoming ever more important.

It is perhaps simplest to clarify these developments using examples of spatio-temporal data and considering how these may be represented and analyzed:

complete spatial fields recorded at distinct points in time, viewed as a set of time slices (typically in fixed time intervals). This is sometimes referred to as a T-mode data model and analysis of such data as T-mode analysis. The T-mode view of spatio-temporal field data is the most common within GIS software. If sufficient time slices are available (from observation or generated computationally) they may be suitable for display as videos rather than simply as sets of static images. Analysis has tended to focus on the differences between the time-sliced datasets

complete spatial fields recorded at distinct points in time, viewed as a set of point locations or pixels, each of which has a temporal profile. This view of spatio-temporal data can be regarded as a form of space-time cube, similar conceptually to multi-spectral datasets (see further, Classification and clustering) with analytical methods that concentrate on patterns detected in the set of profiles. This is known as S-mode analysis, and is widely used in disciplines that study dynamic data over very large spatial extents, such as meteorological and oceanographic sciences

incomplete spatial fields recorded at (regular) distinct points in space and time (often very frequently, e.g. each minute or every few seconds). Data of this type is typical of environmental monitoring using automated equipment, for example weather stations, atmospheric pollution monitoring equipment, river flow meters, radiation monitoring devices and many similar datasets (including geolocated human activities). For such datasets analysis of the time-series data is often as important as the process of estimating the complete spatial field. Lack of completeness of time series at sample points is a common problem; this may apply to a single attribute (variable being measured) or across multiple attributes

mobile objects (points) tracked in space-time (track data). Track data are typically a set of geospatial coordinates together with a time-stamp for each coordinate. The temporal spacing may not be regular and the data elements may not be complete (e.g. when a moving object disappears from radio or satellite contact when in a tunnel or a forest). If the time-stamping is designed to be regular then irregularities indicate that the spatial component of the track is inaccurate during that period, i.e. the track from location X to the next observed location Y may be missing some intermediate points. Track data may also include attributes that reflect the underlying continuity of movement in much data of this type, for example velocity, acceleration and direction

network-based data. The most common form of such data relates to traffic monitoring, but event data on networks or related to networks (e.g. crime events, accidents, transaction data, trip data, environmental monitoring data) are often either specific to the network representation of space or are best modeled using the network structure, which provides important insights for modeling purposes (e.g. noise and pollution diffusion)

patterns of points (events) over time. This kind of data is exemplified by epidemiology, where evolving patterns of diseases (human, animal) or crime activity are monitored, although in some instances event evolution at fixed locations forms the dataset

patterns of regions (zones) over time. This kind of data is common with census-based information but applies to a wide variety of data collected on a regular basis by zone — socio-economic and health district data are typical examples. An added complexity is that the zoning applied may itself have changed over time, requiring some form of common zoning or rasterization of all the data to be carried out prior to spatio-temporal analysis
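The difference between the T-mode and S-mode views in the first two items above can be sketched on a tiny, hypothetical space-time cube held as a list of time slices. The T-mode step differences consecutive slices, while the S-mode step extracts the temporal profile of each cell and summarizes it, here with an ordinary least-squares trend against the time index. The choice of a linear trend as the profile summary is an assumption for illustration only, as are all names and values.

```python
# Hypothetical space-time cube: three time slices of a 2 x 3 field.
slices = [
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]],
    [[1.5, 2.0, 2.5],
     [4.0, 6.0, 6.0]],
    [[2.0, 2.0, 2.0],
     [4.0, 7.0, 6.0]],
]


def slice_difference(later, earlier):
    """T-mode: cell-by-cell change between two time slices."""
    return [[l - e for l, e in zip(later_row, earlier_row)]
            for later_row, earlier_row in zip(later, earlier)]


def temporal_profiles(cube):
    """S-mode: map each (row, col) cell to its series of values over time."""
    n_rows, n_cols = len(cube[0]), len(cube[0][0])
    return {(r, c): [s[r][c] for s in cube]
            for r in range(n_rows) for c in range(n_cols)}


def linear_trend(profile):
    """Least-squares slope of the profile against the time index 0, 1, 2, ..."""
    n = len(profile)
    t_mean = (n - 1) / 2.0
    v_mean = sum(profile) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(profile))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


t_mode = [slice_difference(slices[k + 1], slices[k])
          for k in range(len(slices) - 1)]
profiles = temporal_profiles(slices)
s_mode = {cell: linear_trend(profile) for cell, profile in profiles.items()}
print(t_mode[0])
print(s_mode)
```

Both views are derived from the same values; they differ only in whether the slice or the cell profile is treated as the unit of analysis.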

There is no standard database model or analytical approach to these complex, large, often incomplete and highly varied datasets (see Tao et al., 2013, for a recent discussion of spatio-temporal data mining and analysis techniques). Specialized techniques have been developed for specific cases, for example: for land use change modeling (e.g. as provided in Idrisi's Land Change Modeler package); for pollution datasets, the use of time-series analysis (prediction using ARMA, auto-regressive moving average, models) followed by spatial interpolation or diffusion modeling (only really valid if the spatial and temporal components can be regarded as separable), or the use of extended spatio-temporal versions of spatial and/or temporal modeling tools (e.g. STARMA, spatio-temporal auto-regressive moving average, models); and for spatio-temporal event data, the application of extensions to traditional or novel spatial statistical models (e.g. the extension of spatial scan statistical procedures to spatio-temporal point data; see further the crime analysis example illustrated below, Figure 4, and Cheng and Adepeju, 2013, for full details).
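The separable "time series first, then space" approach mentioned above for pollution data can be sketched as follows. This is a heavily simplified illustration under stated assumptions: an AR(1) model (the simplest special case of ARMA, with no moving-average terms) is fitted by least squares at each monitoring station, a one-step forecast is made, and the forecasts are then interpolated to an unsampled location by inverse distance weighting, which is an assumed choice of interpolator. The station coordinates and readings are invented for the example.

```python
def fit_ar1(series):
    """Least-squares AR(1) fit on the mean-centred series.

    Model: x_t - m = phi * (x_{t-1} - m) + e_t. Returns (m, phi).
    """
    m = sum(series) / len(series)
    centred = [x - m for x in series]
    num = sum(centred[t] * centred[t - 1] for t in range(1, len(centred)))
    den = sum(c * c for c in centred[:-1])
    phi = num / den if den else 0.0
    return m, phi


def forecast_next(series):
    """One-step-ahead AR(1) forecast."""
    m, phi = fit_ar1(series)
    return m + phi * (series[-1] - m)


def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples is a list of ((sx, sy), value) pairs.
    """
    num = den = 0.0
    for (sx, sy), value in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return value  # exactly on a sample point
        w = d2 ** (-power / 2.0)
        num += w * value
        den += w
    return num / den


# Hypothetical stations: location -> regularly spaced pollutant readings.
stations = {
    (0.0, 0.0): [10.0, 12.0, 11.0, 13.0, 12.0, 14.0],
    (10.0, 0.0): [20.0, 19.0, 21.0, 20.0, 22.0, 21.0],
    (0.0, 10.0): [15.0, 15.5, 16.0, 15.0, 16.5, 16.0],
}

# Step 1 (time): forecast the next reading at every station.
forecasts = [(xy, forecast_next(series)) for xy, series in stations.items()]
# Step 2 (space): interpolate the forecasts to an unsampled location.
estimate = idw(5.0, 5.0, forecasts)
print(forecasts, estimate)
```

Because the temporal and spatial steps are run independently, the sketch ignores any space-time interaction, which is precisely the separability caveat noted in the text.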

Figure 4 Space-time detection of emergent crime clusters

In the example illustrated the free software package SaTScan was used to look for possible unusual clusters of crime events in both space and time in an area of North London. The height of cylinders in the 3D visualization provides a measure of their temporal extent and the radius a measure of their spatial extent (up to a maximum of 750 meters). Probability measures were determined by Monte Carlo simulation (built into the SaTScan software). The idea behind this analysis was to use retrospective data on crimes as a means of identifying well-developed clusters of crimes, and then to re-run the process to see if watching the early development of possible clusters (effectively a form of surveillance) translates into the locations and start dates of significant crime clusters. The authors found that they could, indeed, detect emergent clusters using this technique, which opens up the prospect of a new tool for identifying emerging spatio-temporal hotspots before they have become statistically significant.
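The general idea of scanning space-time cylinders and assessing them by Monte Carlo simulation can be sketched as below. This is not the algorithm used by the software in the example, which relies on likelihood-based scan statistics and variable cylinder sizes. Here the statistic is simply the largest event count in any cylinder of fixed radius and duration centred on an event, and the reference distribution is obtained by randomly permuting the time stamps among the locations. The event data, the 3-day duration and all function names are assumptions; only the 750 meter radius is taken from the text.

```python
import random


def cylinder_count(events, centre, radius, t_start, t_end):
    """Number of (x, y, t) events inside a space-time cylinder."""
    cx, cy = centre
    return sum(
        1 for x, y, t in events
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 and t_start <= t <= t_end
    )


def max_cylinder(events, radius, duration):
    """Scan cylinders anchored on each event; return (count, centre, start)."""
    best = (0, None, None)
    for x, y, t in events:
        n = cylinder_count(events, (x, y), radius, t, t + duration)
        if n > best[0]:
            best = (n, (x, y), t)
    return best


def monte_carlo_p_value(events, radius, duration, n_sims=199, seed=0):
    """Permutation p-value for the maximum cylinder count."""
    rng = random.Random(seed)
    observed = max_cylinder(events, radius, duration)[0]
    times = [t for _, _, t in events]
    exceed = 0
    for _ in range(n_sims):
        shuffled = times[:]
        rng.shuffle(shuffled)  # break any link between place and time
        simulated = [(x, y, t) for (x, y, _), t in zip(events, shuffled)]
        if max_cylinder(simulated, radius, duration)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)


# Hypothetical data: 40 isolated background events, one every 2 days,
# on a grid with 1000 m spacing (coordinates in meters, times in days).
background = [(1000.0 * (i % 8), 1000.0 * (i // 8), 2 * i) for i in range(40)]
# A planted cluster: 8 events close together in space and time.
cluster = [(10000.0 + 5 * k, 10000.0 - 5 * k, t)
           for k, t in enumerate([50, 50, 51, 51, 51, 52, 52, 52])]
events = background + cluster

count, centre, start = max_cylinder(events, radius=750.0, duration=3)
p_value = monte_carlo_p_value(events, radius=750.0, duration=3)
print(count, centre, start, p_value)
```

With 199 simulations the smallest attainable p-value is 1/200, so the number of simulations sets the resolution of the significance estimate.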

More common than such quasi-statistical procedures is the application of a range of microsimulation techniques (including Agent Based Modeling, ABM) and machine learning (notably the application of artificial neural networks, ANN, and support vector machines, SVM; see Kanevski et al., 2009, for an extensive discussion of such methods). In many cases the volume and complexity of such datasets make traditional statistical analysis impossible, and forms of data mining, data filtering, advanced techniques in visualization and simulation modeling are the only practical approaches currently available to obtaining a fuller understanding of the datasets.