Have you ever been out on the town when an overwhelming feeling of hunger takes over, and all there is to eat is a greasy, soggy slice of “pepperoni” pizza? You might be in a food desert: an area in which it is difficult to buy affordable or good-quality fresh food.
The world of open data has a similar phenomenon: the data desert. Researchers, practitioners and civil society organisations often cannot find the specific datasets or information they need. Granted, there are several “fast-food” data options (such as incomplete government datasets, or indexes gathered by private sector interests with unclear methodologies), and clever researchers can find ways around a lack of access and information. However, these options are not of ideal quality for policy, are time- and resource-consuming, or both.
Furthermore, much like chowing down on unhealthy fast food, blind spots and a lack of representation and inclusivity in data affect the wellbeing of individuals. In New York, for example, researchers at the Mayor’s Office of Data Analytics found diverging patterns in how New Yorkers use open data. This led them to conclude that certain cross-sections of society were not using open datasets and, in turn, were failing to benefit from the government’s Open Data efforts.
The New York Open Data portal holds over 1,500 datasets collected by the city on nearly all dimensions of urban life, everything from 8th-grade New York State math test results to the geographical locations of public payphones. In spite of this commendable effort by the city to open up and share its data, researchers found data deserts that had very tangible impacts on New Yorkers.
In the borough of the Bronx, for example, although datasets on housing codes were available (e.g., Housing Maintenance Code Violations, Rodent Inspection, and Housing Maintenance Code Complaints), there was substantially less use of these datasets in the neighbourhoods of Mott Haven, Port Morris, and Melrose. Even compared to areas with similar demographic and socio-economic composition, the discrepancy in usage for these three neighbourhoods was substantial. On deeper inspection, it turned out that Mott Haven, Port Morris, and Melrose had a higher concentration of public housing. While datasets on New York City housing were open, specific data points on public housing were never published or shared.
This oversight places those accessing public housing at a disadvantage compared to their peers. For example, they could not check whether a prospective residence had been infested with rodents or whether a particular address had multiple housing code violations. This phenomenon is known as data poverty (defined by Yanurzha et al.)1: “The situation in which one is deprived of the benefits of data driven solutions by the lack of access, use, and representation within data”.
Furthermore, the lack of representation of public housing in these datasets entrenches existing power structures and further marginalizes already marginalized groups. In New York, for example, those requiring access to public housing tend to come from Latino and African-American backgrounds, and they tend to be single-parent households, the elderly, and those at risk of falling into homelessness.
In the Canadian context, one can ask similar questions for important pieces of policy both inside and outside government. Take, for example, the case of Prime Minister Justin Trudeau’s flagship National Housing Strategy. Could data used to plan, inform and/or articulate this policy be subject to issues of data poverty? Are we missing important or relevant information?
One solution to this problem is to encourage and promote discussions on the ethics of data. From collection to use and distribution, policy makers make decisions on inclusion and representation in the datasets that inform policy. A great first step is to open datasets connected to relevant and important pieces of policy in real time. Engaging data experts in academia, civic technology and civil society during policy planning would also help identify instances of data poverty and determine which datasets should be prioritized for sharing with the wider community. Such first steps would take the onus of identifying data poverty away from a few government experts and help identify demand for data that would benefit and inform truly inclusive policy making.
As academics and business leaders call for a Canadian National Data Strategy, data poverty, inclusion and representation need to be addressed through open conversation and concrete safeguards. For both practical and ethical reasons, we need to consider the blind spots in the data we collect for policy making.
1 Yanurzha et al. (2016), “Reducing Data Poverty in NYC: Achieving Open Data for All”, NYC Office of Data Analytics. Available at: www1.nyc.gov/assets/analytics/downloads/pdf/cusp_open_data_poverty_capstone.pdf