Data are unlike other assets in that they are difficult to collect but easy to copy and disseminate. However, only a very small proportion of big datasets are released into the public domain, owing to their strategic importance, commercial value and potentially disclosive attributes. Breaches of privacy can have serious adverse consequences for public opinion of businesses and governments, so considerable effort goes into safeguarding data. The recent events surrounding the misuse of Facebook profile data by an academic and Cambridge Analytica only serve to highlight how fraught issues of access and privacy can be. This has inevitably had a detrimental impact on Big Data research, as commercial organizations that produce geospatial Big Data products often value efficiency and customer satisfaction over other interests. Companies are also less likely to share their methodologies and results, thus hampering the advancement of research, methodologies and theoretical insights.
Where data access is granted, researchers and analysts are usually at the mercy of their data providers. It is not uncommon for elements of the data collection or sampling process to be kept private due to commercial sensitivities. Many companies that opt to share samples of their data disseminate them via Application Programming Interfaces (APIs). However, when using such channels the coverage of the data is usually uncertain (Boyd and Crawford, 2012). Taking the example of Twitter, little is known about how the free data feeds are sampled, and feeds often drop in velocity for short intervals of time. Furthermore, data owners may set conditions on data access and use, and sometimes these limitations may be contrary to the interests of researchers.
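As a rough illustration of how an analyst might check a feed for such drops in velocity, the following Python sketch counts messages per fixed interval and flags intervals whose count falls well below the median. The function name, the 60-second interval and the 50% threshold are illustrative assumptions rather than properties of any particular API. A flagged interval indicates only a fall in volume; on its own it cannot distinguish a change in the provider's sampling from a genuine lull in activity.

```python
from collections import Counter
from statistics import median


def flag_low_velocity(timestamps, interval=60, threshold=0.5):
    """Return the start times of intervals with unusually few messages.

    timestamps: message arrival times in seconds (hypothetical input).
    interval:   bin width in seconds (assumed value).
    threshold:  fraction of the median count below which a bin is flagged
                (assumed value).
    """
    if not timestamps:
        return []

    start = min(timestamps)

    # Count how many messages fall into each fixed-width bin.
    counts = Counter(int((t - start) // interval) for t in timestamps)

    # Include empty bins so that complete outages are also detected.
    n_bins = max(counts) + 1
    per_bin = [counts.get(i, 0) for i in range(n_bins)]

    # Use the median as a robust estimate of the typical message rate.
    typical = median(per_bin)

    return [
        start + i * interval
        for i, count in enumerate(per_bin)
        if count < threshold * typical
    ]
```

The median is used instead of the mean so that a few very sparse intervals do not drag down the baseline against which they are compared.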
For these reasons there needs to be greater support at all levels for data sharing and safe data practices in order to facilitate research and maximize the potential of Big Data. In 2008, the OECD launched a call in support of open data aimed at national governments. In response, some governments have launched open data initiatives to disseminate administrative data which may be of value to research and the general public (for instance data.gov in the USA, data.gov.uk in the UK and aurin.org.au in Australia). Example datasets that have recently been made openly available in the UK include Land Registry Price Paid data (the location and value of almost every domestic property sale) and Domestic Energy Performance Certificates (the energy performance certificates for over 15 million properties).
Commercial organizations are generally under less pressure to share their data. Therefore, with or without legislative intervention, institutions are needed to provide safe environments through which commercial data can be accessed by researchers without violating data protection protocols. One such institution is the UK Consumer Data Research Centre (CDRC, based at University College London), which specializes in large consumer datasets and manages data from a wide range of sectors.