Human-sourced data

Human-sourced information is integral to a wide range of services, some of which exist explicitly to disseminate public information. For example, Internet-enabled handheld devices have supported the growth of various Volunteered Geographic Information (VGI) services. A popular example is OpenStreetMap (OSM), a widely used crowd-sourced map of the world and geodata portal. Like other information-sharing services (such as Wikipedia), OSM relies on a social hierarchy of moderators to maintain the quality of its information and sieve out redundant records (Goodchild, 2013). The advantages of VGI over officially produced counterparts are transparency and the velocity of data production. It is easy for lay users to update data and report misinformation, a process that is otherwise cumbersome with governments and businesses, which often impose lengthy internal administrative procedures. However, data quality issues can emerge because the large quantities of data produced daily are difficult to vet thoroughly.

One very prevalent source of Big Data is social media, which has become an integral component of modern societies. It was estimated that just under one-third of the world’s population were social media users in 2017 (Statista, 2017). There are several different types of social media, including social network services, video sharing services, and information sharing services, most of which are accessed via online platforms. Consequently, social media has also become a popular source of data on the human condition for the academic community, largely due to the volume and velocity of the data and the rich social information they provide. Furthermore, many social network services, such as Twitter and Flickr, enable users to include coordinates from their mobile devices when uploading content, thereby enabling the study of geospatial phenomena.

As users engage with online services through handheld devices, they leave geographic and temporal footprints of their real-world activity within social media data (Blanford et al., 2015). Large quantities of social media data can therefore be highly informative of general human activity, and they have many potentially useful applications in geospatial analysis. Taking Twitter data as an example, many studies have used geosocial data to predict real-world trends, thus treating the users as sensors (Haklay, 2013). Applications have included estimating the spread of influenza (Lamb et al., 2013), predicting customer catchments for retail centers (Lloyd and Cheshire, 2017), and tracking natural hazards (Guan and Chen, 2014). The data can also be repurposed as indicators of urban activity by using text mining techniques to categorize Tweets and identify routine spatial and temporal patterns. For instance, Figure 9-3 displays the relative density of Tweets about education in inner London: here, most areas of high concentration coincide with the locations of university campuses.
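To make the idea of categorizing Tweets and mapping their relative density more concrete, the following is a minimal Python sketch. It is not the method used to produce Figure 9-3: the keyword list, the function names, the grid cell size, and the sample records are all hypothetical, and simple keyword matching stands in for the text mining techniques a real study would use. Relative density is approximated here as the share of Tweets in each grid cell that match the topic.

```python
import math
from collections import Counter

# Hypothetical keyword list (an assumption for illustration only);
# a real study would use a proper text-mining or classification model.
EDUCATION_TERMS = {"lecture", "university", "exam", "seminar", "campus"}


def is_about_education(text):
    """Return True if any education keyword appears as a word in the text."""
    words = {w.strip(".,!?#@").lower() for w in text.split()}
    return not words.isdisjoint(EDUCATION_TERMS)


def grid_cell(lon, lat, cell_size=0.01):
    """Map a coordinate (in degrees) to the integer index of its grid cell."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def relative_density(tweets, cell_size=0.01):
    """Share of Tweets in each grid cell that are about education.

    `tweets` is an iterable of (lon, lat, text) tuples.
    """
    totals = Counter()
    topical = Counter()
    for lon, lat, text in tweets:
        cell = grid_cell(lon, lat, cell_size)
        totals[cell] += 1
        if is_about_education(text):
            topical[cell] += 1
    # Counter returns 0 for cells with no topical Tweets.
    return {cell: topical[cell] / n for cell, n in totals.items()}


# Invented sample records, for illustration only.
sample = [
    (-0.1335, 51.5246, "Late for my lecture again"),
    (-0.1338, 51.5249, "Coffee near campus"),
    (-0.1332, 51.5241, "Lovely weather today"),
    (-0.0755, 51.5055, "Walking across the bridge"),
]

for cell, share in sorted(relative_density(sample).items()):
    print(cell, round(share, 2))
```

Dividing by the total number of Tweets in each cell, rather than mapping raw counts, prevents the result from simply mirroring where people tweet most. A fixed degree-based grid is a simplification; smoothed density surfaces on projected coordinates are a common alternative.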

Although social media data are curated by companies, they are distinct from most other commercial datasets in that the bulk of the data are created by members of the public. The data typically record digitized media and can take many forms. Social network data also differ from other forms of Big Data in that there is little control or regulation over what is produced, so it is inherently difficult to draw objective research findings from them (see Tinati et al., 2014).


Figure 9-3 The relative density of Tweets about education across inner London (20 km East-West). Red indicates locations with higher densities. Source: Lansley and Longley, 2016.