Accessing Big (Commercial) Data across a Global Research Infrastructure - Modelling Consumer Behaviour in China
Lloyd, Ashley D.
Antonioletti, Mario A.
Sloan, Terence M.
The use of globally distributed computing systems and globally distributed data to understand and manage global organisations is a well-established vision. It can be found in patents awarded for electrical communications systems that are integrated with electro-mechanical computing devices as far back as 1927. Use of electrical communications to reproduce images goes back even further, to the first fax patent awarded to Scottish inventor Alexander Bain in 1843, preceding Alexander Graham Bell's patent for the telephone by over 30 years. Like many other company assets, data has value; however, it has two additional characteristics that establish tensions with a globally distributed vision: (i) its value cannot be assessed until after it has been analysed, and (ii) that analysis may prove to be of more value to a competitor than to the company itself. This type of concern is not typical of the global scientific collaborations that have driven the development of global network infrastructure, a distinction Jim Gray of Microsoft highlighted by describing data exchanged in radio-astronomy collaborations as “completely worthless”, by which he meant that it had all the dimensionality and scale of the most complex problems in business or medicine, but none of the sensitivities that impede how and with whom you share that data, or what analyses you attempt. Since the Economic and Social Research Council defines social science as “the study of society and the manner in which people behave and influence the world around us”, it is clear that the sensitivities involved in exposing commercial data on behaviour in global markets to globally distributed computational environments present a major challenge for (Social) Data Scientists.
This paper describes some of the challenges of building the first Global Computing Grid to connect collaborating sites on three continents and of installing an embedded analytical facility within a Chinese commercial organisation that has enabled collaborative analysis of millions of consumers. We report how this access has provided new insights into consumer behaviour within China, ranging from testing strategic models of economic development to exploring ‘digital exclusion’ and the impact of migration on technology adoption.