In this long read, Maham Saleem explains why getting the balance between privacy and public good when regulating digital technology is trickier than it may at first appear.
In 2018, the Guardian published the Cambridge Analytica files, uncovering a huge breach of Facebook users’ data and causing enough commotion for Netflix to produce an Emmy-nominated documentary and for one of the whistleblowers, Christopher Wylie, to publish a bestselling book. This happened in the same year that the EU’s long-awaited General Data Protection Regulation (GDPR) came into force, and everyone’s inboxes started filling up with companies begging for consent to continue spamming us with marketing emails. Today, five years on, we are still trying to navigate the rocky territory of data protection and privacy ethics. As generative AI becomes mainstream and algorithmic targeting becomes more aggressive, the conversation around data and privacy is still developing. The data bill is back in Parliament, online safety and digital services are still a hot topic of conversation (think TikTok bans), and papers on AI regulation are springing up everywhere - the UK had the AI white paper, the US had the AI Bill of Rights, and the EU (in true EU style) has an entire Act in the works. And yet if, despite streams of regulation aimed at improving people’s online safety and experience, we still have to choose which cookies track us across the internet every time we open a new website, we might want to reconsider how we use data.
Since 2018, data minimisation has effectively become the norm. Companies fear any privacy infringement debacles or hefty fines, and rightly so. If we want to live in an open, liberal democratic society, individual privacy is an integral value that can protect us from unwanted surveillance by authoritarian governments, large businesses, and other powerful entities. But in a bid to protect individual choice and identity, we mustn’t ignore the many social and public benefits that can be derived from collecting and processing more data, including increasing transparency and accountability in governments, accelerating innovation and development of new technologies, improving the quality of public research, and making the user’s experience with technology more efficient and convenient. Alongside the cynicism around data collection and processing, there has also been a growing movement towards ‘open data’, pioneered by the Open Data Institute (ODI) and Tim Berners-Lee. The ODI focus their attention on how data can be used for public good, and you can read about some of their work over the last decade here.
Data in algorithmic and intelligent systems
Improving data quality sounds like a technical, low-stakes issue, but with the rapid integration of AI systems into virtually every field, the consequences of inadequate and fragmentary data collection are far-reaching. This has been particularly true for the representation of different demographics within datasets. A good example is Labeled Faces in the Wild (LFW), a database of pictures of faces assembled in 2007 which has been widely used to train facial recognition software. With over 13,000 images, it was a big step forward in training algorithms to recognise different faces and movements, establishing the foundations for Snapchat face filters as well as facial recognition for surveillance CCTV cameras. Seven years later, in 2014, some researchers decided to study the LFW dataset itself and found shocking disparities in the types of faces it included. The faces in LFW were 77% male and 83% white, with 530 individual images of George Bush - more than double the number of pictures of ethnic minority women. As a result, systems trained on the LFW set would have been much worse at classifying and recognising ethnic minority women than white women, and women than men. Professor Moritz Hardt, now director at the Max Planck Institute for Intelligent Systems, was pointing these problems out as early as 2013. In a blog post, he wrote, ‘The whole spiel about big data is that we can build better classifiers largely as a result of having more data… the contrapositive is that less data leads to worse predictions. Unfortunately, it’s true by definition that there is always proportionately less data available about minorities. This means that our models about minorities generally tend to be worse than those about the general population.’
The same problem existed for people over the age of 80, young children, babies, and people in bad lighting because the LFW database just didn’t have a diverse enough range of images.
And it wasn’t just this particular dataset that faced problems because of its inadequate input. An early IBM classification system (which classifies age and gender amongst other things) had a 0.3% error rate for white men and a 34.7% error rate for dark-skinned women. Given the use of machine learning and other algorithmic models in healthcare, recruitment, resource allocation, and criminal justice, it’s clear that incomplete representation in data collection has consequences far worse than not being recognised by a Snapchat filter. And the consequences of misuse or misapplication of AI or other algorithmic models will rarely be faced by everyone, instead being felt primarily by vulnerable or historically disadvantaged groups. A recent research paper by Arthur Holland Michel for Chatham House highlighted how ‘yet-to-be-proven AI technologies are more likely to be used against populations with less policy leverage to advocate for protections… for example, early experiments involving AI for welfare fraud detection, predictive policing, and surveillance in public housing.’ Ultimately, this is a reflection of bias in society, but it is a bias whose consequences have been exacerbated by historically unrepresentative data. It is not one that can be fixed by tightening data collection practices to the point of minimisation, under the guise that honouring the integrity of individual privacy will create a fairer and safer society.
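For readers who want to see what checking for this kind of disparity might look like in practice, here is a minimal sketch of a per-group audit. It is purely illustrative: the function name `audit_by_group`, the group labels, and the toy records are all made up for this example, and none of it reflects how the LFW researchers or IBM actually measured their figures. It simply reports, for each group, what share of the data it makes up and how often the system got it wrong.

```python
from collections import defaultdict


def audit_by_group(records):
    """Return {group: (share_of_dataset, error_rate)}.

    Each record is a (group, true_label, predicted_label) tuple.
    The structure is an assumption made for this sketch.
    """
    totals = defaultdict(int)  # number of records per group
    errors = defaultdict(int)  # number of wrong predictions per group
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    n = sum(totals.values())
    return {g: (totals[g] / n, errors[g] / totals[g]) for g in totals}


# Made-up toy data: group_a is well represented and always classified
# correctly; group_b is under-represented and misclassified half the time.
records = (
    [("group_a", "f", "f")] * 8
    + [("group_b", "f", "f")] * 1
    + [("group_b", "f", "m")] * 1
)

report = audit_by_group(records)
for group, (share, error_rate) in sorted(report.items()):
    print(f"{group}: {share:.0%} of data, {error_rate:.0%} error rate")
```

A single headline accuracy figure for this toy data would be 90%, which hides the fact that every error falls on the smaller group. Breaking results down by group is what makes that visible, and it is only possible if the demographic data has been collected in the first place.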
Open Banking and innovation
One of the most important initiatives born out of the open data movement is Open Banking. It involves the use of APIs that allow banks to securely share information with third-party developers, who can then build applications and services around that financial institution. Before the existence of Open Banking, we would not have been able to securely use budgeting apps like Plum or Money Dashboard, or even some comparison software to get the best current account or lending deals. Pre-Open Banking, you’d have to share your bank account details and PIN in order to be able to use these apps. As such, banks would refuse to guarantee financial security, as you’d have to willingly share passwords with these apps so they could access your accounts - information your bank would have asked you to keep private. Most people have money in multiple accounts with different banks, but as a result of a lack of secure information-sharing mechanisms, budgeting apps or personalised current account comparators were unable to take off. Open Banking changes that - with the consumer’s permission, banks can now securely share individual consumers’ financial account information with third-party apps or software. This would enable a price comparison service, for example, to recommend the best bank accounts, credit cards, or mortgages based on your financial situation. That eases the process of switching bank accounts or cards, in a way similar to switching energy suppliers, allowing consumers to save an average of £140 a year by changing overdraft facilities based on their financial situation. A flurry of startups have launched as a result of the Open Banking initiative, supporting innovation in the UK’s fintech industry and providing new services that make consumers’ lives easier.
UK Data Protection and Digital Information Bill
On 17th April the data reform bill, after much delay, finally reached its second reading in the Commons. The bill, which is meant to reform the UK’s version of GDPR, aims to restructure the data protection watchdog, the ICO, as well as reform administrative requirements for small businesses. To Michelle Donelan’s credit, the bill also aims to broaden the scope of research that can be carried out and address the issue of ‘cookie fatigue’ - repeatedly having to close cookie pop-ups. It remains to be seen whether this reform will ultimately be a genuine improvement on GDPR or a crass pursuit of deregulation in a bid to seem industry-friendly, with Lucy Powell warning it could ‘add another complicated and uncertain layer of bureaucracy.’
Ultimately, there is a lot to gain from reframing the conversation around privacy and data protection. Privacy concerns are of course extremely important to prevent mass surveillance of people or the public disclosure of personal information. But my argument is not that privacy is incompatible with modern problems or that we should deregulate the data protection sphere. Instead, I take issue with the big data-fearing attitude that has taken hold of us after the Facebook/Cambridge Analytica scandal. The many benefits of data openness are clear, and to not harness those benefits because of cynicism about how our data might be used is at best a waste, and at worst can damage our ability to attain a more progressive society. It is also important to mention that whilst ‘privacy’ and ‘data protection’ are often used interchangeably, they are not the same thing. A surveillance company that films you without your consent in a private place, but deletes your data from its software right away, might be protecting your data, but not your privacy. It is a small distinction, but one that would prevent companies from mistaking data deletion for a solid privacy framework.
I’ll end with a tricky example that lies at the crossroads between privacy and public good: banks can now tell if someone is developing a gambling addiction or if someone with bipolar disorder is about to embark on a spending spree, purely from financial transaction patterns. Now, if a bank detects that you’re developing a gambling addiction and sends you a text that links to a gambling support page, or alternatively if it detects that you’re going to go on a spending spree as a result of bipolar disorder and asks if you’d like to set upper limits on your transactions for the next week, would you consider that a nanny-state invasion of your privacy or a useful measure that could help your financial stability in the long run? There is no correct or obvious answer, but it might make you wonder whether prioritising privacy over other principles is always fruitful for the public good.
From June, the Young Fabians Tech, Defence, and Cyber network will be running a series on AI, big data, and big tech. More details coming soon @YF_TDaC.
Image by Towfiqu Barbhuiya on Unsplash.