The preceding discussion of the PPDAC methodology has illustrated how the data used for decision-making in a wide range of applications areas often have a spatial component. The general context to the supply and provision of spatially referenced data has changed profoundly over the last 20 years and this has had far-reaching implications for the practice of applications-led science. Specifically:
•Most developed countries now have advanced spatial data infrastructures, comprising ‘framework’ data. Such data typically record the variable nature of the Earth’s surface, landscape characteristics, and a variety of artificial structures, from roads to mail delivery points. This provides valuable context for decision-making. Even in parts of the world where this is not the case, high resolution remote sensing imagery or the data sources assembled into the ‘virtual Earths’ of Google and Microsoft can provide an acceptable framework from which environmental, population and infrastructure characteristics can be ascertained
•Advances in data dissemination technologies, such as geoportals, have encouraged governments to disseminate public sector data on social, health, economic, demographic, labor market and business activity to the widest possible audiences (e.g. www.gov.uk/government/statistics; data.gov.uk/). Many geoportals include metadata (‘data about data’) on their holdings, which can be used to establish the provenance of a dataset for a given application. The advent of geoportals is playing a catalytic role in stimulating public participation in GIS (PPGIS, Communityviz), for example amongst genealogists when historic census data are put online. Of more central importance to professional users, geoportals empower many more users to assemble datasets across domains and create value-added information products for specific uses
•Advances in data capture and collection are leading to conventional data sources being supplemented, and in some cases partially replaced, with additional measures of physical, social and environmental information. Developments in the supply and assembly of volunteered geographic information are creating new spatial data infrastructures, such as OpenStreetMap (www.openstreetmap.org), that help to contextualize spatial data and enhance geovisualization. Additional framework data sources are being provided by the likes of Microsoft and Google, underpinned by an advertising-led business model. This is creating challenges for those national mapping agencies that are required to recover most of their costs through user charges and copyright enforcement. One scenario is that the monolithic structure of parts of the data industry may become obsolete. In the socio-economic realm, Longley and Harris (1999) have written about the ways in which routine customer transactions in retailing, for example, are captured and may be used to supplement census-based measures of store catchment characteristics. Although the provenance of such data sources is less certain than that of conventional censuses and other government surveys, there is nonetheless some movement towards adoption of rolling surveys and other sample-based methods to replace conventional censuses (see further, Chapter 9)
•There is now a greater focus upon metadata. As suggested above, the proliferation of data from formal and informal sources is leading to greater concern with the quality and provenance of data, and with the situations in which such data are fit for purpose. This has become particularly important since the advent of the Internet, because data collectors, custodians and users are often now much more dispersed in space and time, and Internet GIS makes it apparently much easier to conflate and concatenate remotely located data
•Allied to the previous two points, there is a wider realization that the potential of data collected by public sector agencies at the local level, or at fine levels of granularity, remains under-exploited. There are very good prospects for public sector organizations to ‘pool’ data pertaining to their local operations, to the benefit of the populations that they serve. Innovations such as Google Earth are making it easy for organizations with no previous experience of GIS to add spatial context to their own data
•The practice of science itself is changing. Goodchild and Longley (1999) have written of the increasingly interdisciplinary setting to the creation of GIS applications, of the creation and maintenance of updates of GIS databases in real time, and of the assembly and disbanding of task-centered GIS teams — most obviously in emergency scenarios, but increasingly across the entire domain of GIS activity
•An increasingly important source of geospatial data is the growing body of datasets collected by automated systems that incorporate geolocation information (e.g. mobile phones, navigation devices, retail systems, social media systems). The quality, volume and ‘velocity’ of such datasets present numerous problems (see further, Chapter 9)
Each of these developments potentially contributes to the development of GIS applications that are much more data rich than hitherto. Yet such applications may be misguided if the analyst is unaware of the scientific basis of data collection, the assumptions implicitly invoked in conflation and concatenation of different datasets, or general issues of data quality. Additionally, the creation of GIS applications and the dissemination of results may be subject to a range of ethical considerations, and issues of accountability and data ownership. The remaining sections of this Guide give the flavor of these issues, and illustrate the ways in which GIS-based representations are sensitive to the properties of the software and the data that are used. In an ideal world, all data and software would be freely available, but in practice such a world would be unlikely to prioritize investments for future applications development. In the case of software, public domain sources are by no means all open source, but it is possible to form a view as to the quality of a software product by comparing the results that it yields with those obtained using similar software offerings and with known results for test datasets. Gauging the provenance of data sources is often an altogether trickier proposition, however. In seeking to validate one dataset with reference to a second, as in the comparison of two digitized street network products or the classification of two remotely sensed data resources, analysis of mismatches is more likely to suggest uncertainties than pinpoint precisely measurable inaccuracies. Discrepancies are only ultimately reconcilable through time-consuming field measurement, although models of error propagation and understanding of data lineage or classification method may optimize use of scarce validation resources.
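As a minimal sketch of what 'analysis of mismatches' between two classified datasets might involve, the Python fragment below cross-tabulates two classifications of the same sample sites and reports the proportion of agreement, Cohen's kappa (agreement corrected for chance), and the pairs of classes that disagree. The choice of kappa, the function name `agreement_summary` and the sample labels are assumptions made for illustration; they are not drawn from any particular product or from this Guide.

```python
from collections import Counter


def agreement_summary(labels_a, labels_b):
    """Compare two classifications of the same sites (hypothetical helper).

    Returns (observed agreement, Cohen's kappa, mismatch counts).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must cover the same sites")
    n = len(labels_a)
    if n == 0:
        raise ValueError("at least one site is required")

    # Cross-tabulation: (class in A, class in B) -> number of sites.
    crosstab = Counter(zip(labels_a, labels_b))
    observed = sum(c for (a, b), c in crosstab.items() if a == b) / n

    # Agreement expected by chance, from the marginal class frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)

    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    mismatches = {pair: c for pair, c in crosstab.items() if pair[0] != pair[1]}
    return observed, kappa, mismatches


# Illustrative (invented) land cover labels for eight sample sites.
source_a = ["urban", "urban", "forest", "forest", "water", "water", "urban", "forest"]
source_b = ["urban", "forest", "forest", "forest", "water", "urban", "urban", "forest"]

observed, kappa, mismatches = agreement_summary(source_a, source_b)
print(f"observed agreement: {observed:.2f}, kappa: {kappa:.2f}")
print("mismatched class pairs:", mismatches)
```

A summary of this kind indicates where the two sources disagree, but not which of them is wrong; as noted above, that can only be settled by reference to field measurement.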
While it is possible to validate some of the data used in representations of built or natural environments, validation of social measurements is an altogether more problematic task (Kempf-Leonard, 2005). Repeated measurement designs are usually prohibitively expensive, and some of the attitudes and behaviors that are under investigation may in any case be transient or ephemeral in nature. In general terms it is helpful to distinguish between direct and indirect indicators of human populations. A direct indicator bears a clear and identifiable correspondence with a focus of interest: net (earned plus unearned) income, along with measures of wealth accumulated in assets, provides a direct indicator of affluence. Direct indicators may be time-consuming or difficult to measure in practice (how affluent are you?), as well as being invasive of personal privacy. Thus many data sources rely upon indirect indicators, which are easier to assemble in practice, but which bear a less direct correspondence with the phenomenon of interest. Housing tenure, car ownership and possession of selected consumer durables might be used in a composite indicator of affluence, for example.
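One simple way in which several indirect indicators might be combined into a composite indicator is sketched below: each indicator is standardized across areas (as a z-score) and the standardized values are averaged with equal weights. The standardization method, the equal weighting, the indicator names and the figures for the three areas are all assumptions made for illustration; other schemes are equally plausible and none is prescribed here.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # An indicator with no variation carries no information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_indicator(areas, indicators):
    """Equal-weighted mean of standardized indicators for each area.

    areas: {area name: {indicator name: value}} (hypothetical structure).
    """
    names = list(areas)
    columns = [z_scores([areas[n][ind] for n in names]) for ind in indicators]
    return {n: mean(col[i] for col in columns) for i, n in enumerate(names)}


# Invented proportions of households per area, for illustration only.
areas = {
    "Area A": {"owner_occupied": 0.8, "car_owning": 0.9, "dishwasher": 0.7},
    "Area B": {"owner_occupied": 0.5, "car_owning": 0.6, "dishwasher": 0.4},
    "Area C": {"owner_occupied": 0.2, "car_owning": 0.3, "dishwasher": 0.1},
}
indicators = ["owner_occupied", "car_owning", "dishwasher"]

scores = composite_indicator(areas, indicators)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```

The resulting score ranks areas relative to one another; it remains an indirect measure, and its correspondence with affluence depends on how well the chosen indicators track it.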
Many of these issues are crystallized in the current focus upon ‘spatial literacy’, which has been defined by the National Research Council, part of the US National Academy of Sciences, as comprising the following integrated components:
•understanding spatial relationships;
•developing knowledge about how geographic space is represented; and
•the ability to reason and make key decisions about spatial concepts
In this Guide we seek to demonstrate how the practice of GIS is fundamentally about finding out about the real (geographic) world. To quote the NRC (2006):
Spatial thinking is a cognitive skill that can be used in everyday life, the workplace, and science to structure problems, find answers, and express solutions using the properties of space