Process-Mediated data

Process-mediated data are the classes of data generated primarily through the transactions of businesses and governments. Such data are said to be “found” rather than “made”, as they were typically generated to support commercial or administrative functions rather than to represent the real world in research (Connelly et al., 2016). New technologies now allow businesses and institutions to capture data at much higher velocities. Some example data sources are described below.

Government Administrative Data

Administrative data are routinely collected by public institutions in the form of registrations, transactions and records. These include data generated by welfare, taxation, licensing and electoral registration, as well as records maintained by healthcare and education institutions. Most administrative data sources aim for complete coverage of specific populations, and in some cases the data are collected from transactions that are legal requirements, making them more suitable for geodemographic representations. For instance, governments usually retain records on 100% of vehicles and legally employed workers for taxation purposes.

Administrative data systems have historically been integral to the development of spatial data. It was for administrative purposes that the UK postcode and US ZIP code systems were built, and through them, Census geographies were carefully designed to assist dissemination at a fine level without breaching guidelines on disclosure. Due to common interests, much of the data collected by governments corresponds to the social surveys historically collected by statistics authorities. Consequently, on both sides of the Atlantic, governments are becoming increasingly interested in using administrative data to bolster census statistics (see, for example, Public Administration Committee, 2014). Administrative data have the advantage of being routinely captured and regularly updated, enabling large-scale, high-quality longitudinal analysis.

In addition, the data can contribute new information that was not previously available in the public domain. For example, despite considerable interest in social class and wealth from social scientists and users of geodemographic classifications, there are still no granular data on income in the UK. Previous research has found value in harnessing government data to investigate life-chances and criminality (Britton et al., 2015). Similarly, previous geographic research on deprivation has included car ownership as a proxy for social standing because it is collected in censuses (for example the Townsend Index of Deprivation, Townsend et al., 1988). Research on car registration data from the UK’s Driver and Vehicle Licensing Agency went further, finding car characteristics (such as age and model) to be strongly associated with socio-economic characteristics, especially in large cities. Figure 9-4 demonstrates that there is a distinctive geography to the registered locations of new cars.

Of course, the extent and quality of administrative data collection vary around the world. Some countries keep detailed population registers in order to maintain better data on citizens. Such registers enable different government institutions to link their data in order to improve service delivery. Unfortunately, in most countries, administrative data are collected by multiple organizations, making it difficult to link records, although there have been efforts to overcome this using multivariate linkage approaches.
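To make the idea of multivariate linkage concrete, the following is a minimal sketch of how records from two organizations might be matched on several fields at once when no shared identifier exists. Everything in it is an illustrative assumption rather than a method described in the sources cited here: the field names, the weights, the 0.85 threshold and the sample records are all hypothetical, and operational linkage systems typically estimate match weights from the data rather than fixing them by hand.

```python
from difflib import SequenceMatcher

# Hypothetical field weights (assumption); they sum to 1 so scores lie in 0-1.
WEIGHTS = {"name": 0.5, "dob": 0.3, "postcode": 0.2}


def normalise(value):
    """Lower-case a string and drop spaces and punctuation before comparing."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def similarity(a, b):
    """Return a 0-1 similarity between two strings after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_score(rec_a, rec_b):
    """Weighted sum of per-field similarities across all linkage fields."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def link(records_a, records_b, threshold=0.85):
    """Pair each record in A with its best-scoring record in B, if it clears the threshold."""
    links = []
    for a in records_a:
        best = max(records_b, key=lambda b: match_score(a, b))
        score = match_score(a, best)
        if score >= threshold:
            links.append((a["id"], best["id"], round(score, 3)))
    return links


# Invented sample records from two hypothetical organizations.
tax_records = [
    {"id": "T1", "name": "Jane A. Smith", "dob": "1980-02-14", "postcode": "AB1 2CD"},
    {"id": "T2", "name": "Omar Khan", "dob": "1975-11-03", "postcode": "EF3 4GH"},
    {"id": "T3", "name": "Li Wei", "dob": "1962-07-30", "postcode": "JK5 6LM"},
]
health_records = [
    {"id": "H1", "name": "Jane Smith", "dob": "1980-02-14", "postcode": "AB12CD"},
    {"id": "H2", "name": "Omar Kahn", "dob": "1975-11-03", "postcode": "EF3 4GH"},
    {"id": "H3", "name": "Maria Lopez", "dob": "1990-06-21", "postcode": "ZZ9 9ZZ"},
]

links = link(tax_records, health_records)
for tax_id, health_id, score in links:
    print(tax_id, "->", health_id, score)
```

Because several fields contribute to the score, a spelling variation in one field (such as a transposed letter in a surname) need not prevent a match when date of birth and postcode agree, while a record with no plausible counterpart is left unlinked.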

Figure 9-4. The spatial distribution of the ratio of vehicles that are less than 3 years old by their registered addresses. Source: Lansley, 2016
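A measure of the kind mapped in Figure 9-4 could be derived from registration records along the following lines. This is only a sketch under stated assumptions: the source does not specify how the ratio was calculated, so it is assumed here to be the number of vehicles under three years old divided by all vehicles registered in the same area. The area codes, dates and the function name `share_of_new_vehicles` are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def share_of_new_vehicles(registrations, reference_date, max_age_years=3):
    """Return, per area, the share of vehicles younger than max_age_years.

    registrations: iterable of (area_code, date_first_registered) pairs.
    """
    totals = defaultdict(int)
    new = defaultdict(int)
    for area, first_registered in registrations:
        totals[area] += 1
        age_days = (reference_date - first_registered).days
        # Approximate a year as 365.25 days (assumption made for simplicity).
        if age_days < max_age_years * 365.25:
            new[area] += 1
    return {area: new[area] / totals[area] for area in totals}


# Invented sample data: (hypothetical area code, date of first registration).
registrations = [
    ("A01", date(2014, 6, 1)),
    ("A01", date(2015, 3, 15)),
    ("A01", date(2009, 1, 10)),
    ("A01", date(2011, 8, 20)),
    ("B02", date(2015, 9, 1)),
    ("B02", date(2005, 5, 5)),
    ("B02", date(2008, 2, 2)),
    ("B02", date(2010, 10, 10)),
]

shares = share_of_new_vehicles(registrations, reference_date=date(2016, 1, 1))
for area, share in sorted(shares.items()):
    print(area, share)
```

Joining the resulting shares to area boundaries would then give a choropleth comparable in spirit to the figure, although the published map may rest on different definitions.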

Consumer Data

An increasing share of data held on people and their actions is generated by commercial organizations in order to assist their activities. These include data created by retailers, utility providers, transport providers and banking and financial services. Typically, they require transactions of some kind to generate data although they often also retain account information. For example, retailers record the times and the locations of transactions. Some large retailers and almost all online retailers also maintain customer accounts. Sometimes, specially devised loyalty programmes (such as supermarket loyalty cards) have been implemented specifically to understand trends in consumer behavior and to target promotions. This enables retailers to retain valuable longitudinal data often including the location of shopping trips and customer residences. Loyalty databases and website shopping accounts empower retailers to focus their data collection and promotions in order to target individual customers.

A popular anecdote about the predictive capabilities of retail data describes how Target (an American discount store) predicted that a teenage customer in its loyalty database was pregnant (Duhigg, 2012). Target used its databases to personalize the coupons and advertisements it mailed to customers. In this case, a father complained to the store that his daughter had been sent promotions for baby products and maternity clothing. His concern was that the retailer was using marketing techniques that might encourage teenagers to have children. However, a few days later the father contacted Target again to apologize: he had discovered that his daughter had been pregnant for some time. It is not uncommon for private companies to use empiricist methodologies to automate tailored service provision. Here, the high dimensionality and granularity of the data made it feasible to draw associations at the individual level.

Some large retailers may achieve very large coverage of the population. For example, Tesco’s Clubcard in the UK had 15 million members in 2015 (over 30% of the adult population). However, engagement with the programme will vary considerably, as many users may disengage from loyalty schemes over time or may routinely avoid using cards when making small purchases. Even so, these data have clear merits over traditional data sources, which struggle to acquire detailed information on consumption for large shares of the population. Retailers are now able to capture data on the consumer behavior of groups who are traditionally more averse to participating in market research programmes (Manyika et al., 2011). Retail data, therefore, can be of considerable value to geospatial population research. The core use of geodemographic classifications in industry is to predict lifestyles across space, particularly consumer behavior (Harris et al., 2005). Consumption habits may have broader connotations for understanding segregation in lifestyle choices. Indeed, equivalent data from traditional sources suffer from common limitations of costs, respondent errors, sample biases, and perhaps most importantly, small sample sizes.