Nesta’s Big and Open Data for the Common Good raced through (I think) 7 different projects, all of them detailed in the report, and including my work with @lagaia, @rowanelena and @mparsfield in Hounslow.
The projects underpinned a more general debate about two recurrent topics – ethics, and who should be responsible for building the open data infrastructure.
Ethics and data
Whenever you are using data about people the question “Have the people in question given informed consent?” arises. When the data is not directly about people, there is still a question “Is the end to which we are using this data ethical?”. This topic generated much debate on the Twitter hashtag – as can be seen in the storify.
Clearly, using people’s data without their consent is an invasion of their privacy as well as a disservice to society. In @lagaia’s example, if the Citizens Advice Bureau opened up detailed data about what people ask them about payday loans (which, by the way, they have no intention of doing), that might be very useful to unscrupulous lenders.
For upstanding, morally conscious individuals, the obvious answer is to be extremely conservative with the uses we put data to. This has a number of non-obvious drawbacks:
- Informed consent is extremely difficult to parse, since most people have no idea of the conclusions that can be drawn from a given set of data using statistical approaches. So a strict interpretation of informed consent will be extremely limiting. Much of the activity discussed at the event would be at best in a grey zone, for example. The ‘bigger’ the data, the harder it is to claim ‘informed’ consent because the information that can be derived becomes more surprising.
- There is a free-rider problem. If one person does not consent for their medical data to be shared for research purposes, but others do, is it fair for the person who does not consent to benefit from any research breakthroughs predicated on other people’s generosity with their own personal data?
- Traditionally, academia and the third sector have been very strict about ethics, while unsurprisingly the commercial sector has not. On a case-by-case analysis we might see the strictest ethical interpretation as morally preferable, but if the cumulative outcome is for the commercial sector to have a vast lead in theoretic and behavioral understanding, and to be decisively more adept at data processing, is that really for the greater good?
- Perhaps the most important point is the huge opportunity cost of not doing certain big and open data activities. Being over-cautious could have as bad an outcome for society as an incautious approach. Playing it safe is not cost free.
@Stianwestlake pointed out that rules to enforce ethics are unsuccessful, suggesting that disaster in the financial sector was the result of bad faith and could not have been averted by more rules. In some countries bankers now have to take a Hippocratic oath. Perhaps something similar could be beneficial for those using data? Bankers and data scientists both work with social abstractions that make it easy to forget the human cost of bad decisions, and they both potentially face perverse incentives.
Data as infrastructure
We (nearly) all accept the government’s role in enforcing contracts and standardising weights and measures. These activities are seen as precursors to all the public and private activity that makes our society work. Imagine trying to buy petrol if every station used its own system of measurement. Systems such as company registration, agreeing to use litres for fuel etc. become part of the furniture. We need rules about how information is recorded and transmitted to make the system work; a kind of systemic infrastructure.
Yet it seems clear the government does not have enough interest in enforcing similar rules for data formats and data sharing in the digital realm. For me this is the most fascinating part of the debate. @willperrin pointed out the huge potential for giving to local causes that is untapped in the UK simply because there is no mechanism to discover local charitable causes. @edtparkes talked about the important data the private sector has and which it easily could share. To me this issue is exactly the same as requiring suppliers to list ingredients on packaging (and it put me in mind of this amazing podcast, in part about Tesco’s and immigration patterns). @carljackmiller called for an ‘eBay’ style clearing house for collective social action.
How will the systems described above be built? Clearly the commercial sector is going to play a role: @edtparkes said “We’ll have no social impact if we don’t make a profit”, implying that anything that doesn’t make a profit won’t exist in the long term. On the other hand, @trisml suggested that the idea that for-profit companies could build all of this infrastructure was ‘magical thinking’ – noting that historically infrastructure has always been pioneered by the state. Finally, @duncan3ross, perhaps partially in answer to these questions, pointed out that when local authorities award contracts they should require that some part of the budget be allocated to open data concerns.
It’s hard to stress enough the idea – beautifully articulated by Keller Easterling here – that this systemic, digital infrastructure is as important to the public good as the network of roads or the hidden plumbing that we take to be the signifiers of civilisation.