SINGAPORE: Singapore, like many other cities around the globe, has bet heavily on “big data” as a way to drive economic development and solve urban problems.
Data is widely seen as a kind of resource – the oil of the 21st century – that can be mined to “unlock value and innovation”, as Prime Minister Lee Hsien Loong has put it. It is now touted as a way to solve problems in healthcare, transport, crime and education, and to bring new economic opportunities.
But such metaphors can be misleading. Data, unlike oil, is not something that is just sitting in the ground waiting to be collected.
It is mostly generated by people – when we use our smartphones, ride the MRT, visit the doctor, surf the Internet or drive our cars.
Some of this data is captured by authorities including the Land Transport Authority, the Ministry of Health or other government bodies; the rest is collected by Singtel, Facebook, Google or other private companies.
But in almost all cases, data refers back to people, to us. And this means we need to be very careful how we use and interpret it. Data may not lie, but – like the people it comes from – it can be messy, incomplete and misleading.
MISINTERPRETED, MISUSED DATA
In the United States, for example, crime data is increasingly used to direct police to particular neighbourhoods. Famed American mathematician Cathy O’Neil has recently written about how sophisticated software such as Predpol and Hunchlab analyse crime statistics.
In Santa Cruz, California, for example, Predpol examined eleven years of data on past crimes to generate predictions about where future crimes may occur. These “hotspots” appear as red squares on a map on a computer screen. More police can then be sent to these areas.
So far, so good. But having a greater police presence in an area also means more crimes may be detected, especially petty crimes. These are then fed back into the software, making an area look even more dangerous and attracting an even greater police presence.
The result is that some neighbourhoods end up with many more arrests and many more people being sent to jail. This is exactly what has happened in cities across the US including Philadelphia, Chicago and New York.
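The feedback loop can be made concrete with a toy simulation. This is only a sketch under loud assumptions, not a description of how Predpol or Hunchlab actually work: the two areas, the starting counts, the 70 per cent patrol share and the function name are all hypothetical. Both areas are given the same underlying level of crime; the only difference is that area A starts with one more recorded crime than area B.

```python
def simulate_hotspot_feedback(rounds=10, detections_per_round=100,
                              hotspot_patrol_pct=70):
    """Toy model: two areas with the SAME underlying level of crime."""
    # Hypothetical starting records: A has just one more recorded crime than B.
    recorded = {"A": 11, "B": 10}
    history = []
    for _ in range(rounds):
        # The "prediction": the area with more recorded crime is the hotspot.
        hotspot = max(recorded, key=recorded.get)
        for area in recorded:
            # Assumption: the hotspot receives the larger share of patrols,
            # and detections follow patrol share rather than actual crime.
            if area == hotspot:
                pct = hotspot_patrol_pct
            else:
                pct = 100 - hotspot_patrol_pct
            recorded[area] += detections_per_round * pct // 100
        history.append(dict(recorded))
    return history


if __name__ == "__main__":
    for round_number, counts in enumerate(simulate_hotspot_feedback(), start=1):
        print(round_number, counts)
```

In this sketch a one-crime difference in the records grows into a gap of several hundred after ten rounds, even though nothing about the two areas actually differs. The records end up measuring where police were sent, not where crime occurred.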
That’s not fair, and it’s also not a smart use of data – it amplifies inequalities between groups. This could happen in Singapore too. One use of big data here draws on location information from our phones, collected by telcos in Singapore.
Patterns of foot traffic can be mined to find the most heavily walked areas within malls and shopping streets. This is used by retail brands to find the ideal location for their next outlet or by malls to set rental prices.
But this kind of reasoning may contribute to a vicious cycle in which depressed areas remain depressed, while rich areas get richer.
DATA CAN EXACERBATE INEQUALITY
Directing business to high-traffic neighbourhoods means that those areas will retain good jobs; this means less money and less spending in low-traffic neighbourhoods, potentially leading to even less economic activity.
The use of data may exacerbate the differences between the best-served and worst-served areas.
For now this remains a hypothetical in Singapore, but elsewhere in the world the results of big data approaches are already becoming clear.
For example, US activist Virginia Eubanks has recently exposed how data-driven systems – such as the Homeless Management Information System in Los Angeles and the predictive risk modelling Allegheny Family Screening Tool in Pennsylvania – are trapping vulnerable individuals in cycles of poverty and homelessness.
If we begin to allocate public services – such as healthcare, education, public transportation – according to data, we need to make sure we’re interpreting it correctly.
Recent news has highlighted the growing significance of big data in the healthcare sector.
One report described how Fullerton Healthcare had examined data from medical transactions at public hospitals in Singapore, searching for spikes in reported cases of chronic conditions such as diabetes in particular areas.
When it found an uptick, the company tried to help areas in need by delivering health education and awareness campaigns about diet and healthy eating. In some areas, the number of medical claims dropped substantially.
But this data only captures those individuals who attended hospitals or clinics; it does not measure the actual occurrences of chronic disease.
Those who are the most vulnerable may be left out of these statistics – perhaps because they find it difficult to travel to a doctor, or are worried about the costs of medical care.
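A small worked example shows why a fall in claims is ambiguous. The figures and the function below are hypothetical and are not drawn from Fullerton's data; the sketch simply assumes that a claim is recorded only when someone with the condition actually visits a clinic.

```python
def recorded_claims(true_cases, pct_seeking_care):
    """Claims visible in the data: only cases where the person sees a doctor."""
    return true_cases * pct_seeking_care // 100


# Hypothetical baseline: 200 people with the condition, 90% of whom seek care.
baseline = recorded_claims(true_cases=200, pct_seeking_care=90)

# Scenario 1: the campaign works and there are genuinely fewer cases.
fewer_cases = recorded_claims(true_cases=160, pct_seeking_care=90)

# Scenario 2: nothing improves, but fewer people go to the doctor.
fewer_visits = recorded_claims(true_cases=200, pct_seeking_care=72)

if __name__ == "__main__":
    print(baseline, fewer_cases, fewer_visits)
```

Both scenarios produce the same drop in recorded claims. The claims data alone cannot tell a healthier neighbourhood apart from one where people have stopped seeking care.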
Certainly the reduced number of claims is saving Fullerton money, but is it really serving the public interest?
MAKING DATA ACCESSIBLE
One solution is to make data as open as possible – to let everyone see and use it. The Government’s data.gov.sg website aims to give Singaporeans access to large quantities of public data they can examine for themselves.
A quick glance at the portal reveals information about crime, housing, education, the economy, transport, health and many other areas. This is surely a good thing because we need as many people as possible to be aware of what data exists and to participate in its analysis.
But opening up data does not solve every problem. We also need to know and take into account where and how data is collected and who it is collected from.
If data is collected from web and smartphone users, for example, we can’t expect it to fairly represent elderly people, who are likely to use such technologies less. If data is collected from public transport, we can’t expect it to evenly reflect the whole socio-economic spectrum of Singapore, since wealthier individuals are more likely to drive. Nor can we necessarily expect data to equally represent races, religions or language groups.
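The effect of an unrepresentative sample can be illustrated with made-up numbers. Everything in the sketch below is an assumption for illustration only: the group sizes, the sample sizes and the measured quantity (call it average daily minutes of some activity) are hypothetical. It compares a naive average taken straight from smartphone data with the average for the whole population.

```python
# Hypothetical population of 1,000 people in two age groups.
POPULATION = {"under_65": 800, "65_and_over": 200}
# Hypothetical smartphone sample: older residents are badly under-represented.
SAMPLE = {"under_65": 95, "65_and_over": 5}
# Hypothetical group averages for the quantity being measured.
GROUP_AVERAGE = {"under_65": 40, "65_and_over": 20}


def weighted_mean(counts, values):
    """Average of per-group values, weighted by the number of people in each group."""
    total = sum(counts.values())
    return sum(counts[group] * values[group] for group in counts) / total


# Naive estimate: treat the smartphone sample as if it were the population.
naive_estimate = weighted_mean(SAMPLE, GROUP_AVERAGE)
# Population figure: weight each group by its real share of residents.
population_mean = weighted_mean(POPULATION, GROUP_AVERAGE)

if __name__ == "__main__":
    print(naive_estimate, population_mean)
```

Here the naive estimate overstates the population figure because the sample is dominated by younger users. Reweighting by known group sizes, as the second calculation does, can correct this only when we know who is in the sample and when the missing group appears in it at all.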
Of course, all this doesn’t mean we should abandon big data. But data requires great care in its interpretation.
Data is not inherently smart and won’t automatically lead to the best solutions to social or economic problems.
We need to temper the enthusiasm for data with continued awareness of its limitations. This means keeping an eye on who data represents, who it includes, and who it leaves out. That’s the really smart thing to do.
Hallam Stevens is a historian whose work focuses on understanding the social, political and economic impact of new and emerging technologies. He is an Associate Professor of History at Nanyang Technological University and Associate Director (Academic) of the NTU Institute of Science and Technology for Humanity.