The Difference Between What and Why: Machine Learning, Science, and Policy

Gautam Kambhampati discusses our increasing dependence on machine learning models in policymaking.

Delegating decision-making to machine learning models is all the rage today. From deciding where to deploy policing resources to sifting through CVs, the technology appears all over everyday life [1].

However, faith in these methods is misguided, and our increasing dependence on them in policymaking should trouble us. This is not because machine learning models are opaque (though they can be) or because they reproduce systemic injustices hidden in the training data (though they do). Both of these issues can be mitigated through novel techniques, more attentive modellers, and end-users critically evaluating outputs [2]. My criticism is more fundamental: machine learning models only answer what questions, whilst progressive policy development requires both asking and answering why questions.

Consider the case of so-called ‘proactive’ policing, which uses machine learning models to predict the crime rate in certain areas so that limited policing resources might be deployed more effectively [3]. This answers a what question: what is the crime rate likely to be in Bishop Burton? It does not answer the why question: why is the crime rate like that in the first place? The latter is the question that needs to be answered in order to develop a progressive policy that tackles the root causes of crime.

In fact, machine learning models are structurally incapable of answering a why question.

We can contrast them with scientific models, which are constructed specifically for the purpose of answering why questions. At their core, scientific models are just sets of laws, drawn from experimental evidence and typically directly testable. These laws can be taken together with a set of circumstances to provide an explanation [4]: the answer to a why question. For example, a law of natural selection is that genes can be inherited and mutate randomly. The circumstances may be that only a certain kind of food is available in a particular area. Together, these explain why animals in that area have evolved a particular kind of mouth and digestive system.

On the other hand, a machine learning model, like any statistical model, is constructed from some set of data (the training data) and a fixed mathematical function. Certain parameters in the function are varied until the output of the function matches the data, a process called fitting. The values of these parameters – in a neural network, for example, the weights and biases – take the place of the laws in a scientific model.
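To make the idea of fitting concrete, here is a minimal sketch in Python. The data and the straight-line function are invented purely for illustration; real models use far more complicated functions, but the procedure is the same.

    import numpy as np

    # Toy training data: inputs x and observed outputs y (invented numbers).
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

    # A fixed mathematical function with two free parameters, a and b.
    def model(x, a, b):
        return a * x + b

    # Fitting: nudge the parameters until the function's output matches the
    # data, here by gradient descent on the mean squared error.
    a, b = 0.0, 0.0
    learning_rate = 0.01
    for _ in range(5000):
        error = model(x, a, b) - y
        a -= learning_rate * 2 * np.mean(error * x)
        b -= learning_rate * 2 * np.mean(error)

    print(f"fitted parameters: a = {a:.2f}, b = {b:.2f}")

Scaled up to millions of parameters and a far more elaborate function, this is, in essence, how a neural network is trained.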

The issue is that these parameters cannot be verified independently of the training data, or of each other. In natural selection, the statements ‘genes can be inherited’ and ‘genes mutate randomly’ can be tested separately and without reference to the digestive tracts of animals. However, the parameters in a machine learning model are all fundamentally linked to each other and to the specific question being asked.

This lack of explanatory power bleeds into other areas where machine learning is used, too. For example, if your application for car insurance were rejected by a machine learning model, you may (whilst GDPR applies) appeal the decision. The true explanation for why the model rejected you would be something like ‘because parameter A was 3, parameter B was 7, and parameter C was 98’. Does this suffice as an explanation? Ideally, we would like to know what the parameters A, B, and C correspond to in the real world, so that we can check whether 3, 7, and 98 are correct. However, a machine learning model is not a scientific model, and parameters are not laws: they do not correspond to anything individually.
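The point can be made vivid with a small sketch in Python using scikit-learn. The data, the features, and the tiny network are all invented for illustration; a real insurance model would be far larger, which only makes matters worse.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Entirely invented stand-in for an insurance dataset:
    # columns might be age and number of past claims; y is some risk score.
    X = np.array([[25, 0], [40, 1], [18, 0], [60, 2], [33, 1]])
    y = np.array([0.9, 0.3, 1.0, 0.2, 0.5])

    net = MLPRegressor(hidden_layer_sizes=(4,), max_iter=5000, random_state=0)
    net.fit(X, y)

    # The only 'explanation' the fitted model contains: its weights and biases.
    # No single number here corresponds to anything in the real world on its own.
    for layer, weights in enumerate(net.coefs_):
        print(f"layer {layer} weights:\n{weights}")

The printed matrices are the whole of the model’s ‘reasoning’, and no individual entry can be checked against the world the way a scientific law can.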

This is not to say that machine learning is not an incredibly useful tool: it is, both for answering what questions and for highlighting correlations that help policymakers decide which questions to ask. However, it is the responsibility of progressives to understand the causes of injustice and make things fairer. Our evidence-based policy should tackle why questions: why is the crime rate what it is? Why do students from certain areas achieve lower grades than their peers? Why are BAME people more likely to die from COVID? It is not enough merely to ask what the world is like; we must ask why it is like that so that we can strive to change it.


Gautam Kambhampati is a PhD student at Imperial College London. His research focusses on quantum information and control. He is also the director of Academical Machine Learning and editor of the Tamarind Literary Magazine.


References

[1]: Storm, M. (ed.) (2020) “Modern Britain: Global Leader in Ethical AI” Young Fabians Pamphlets. Available from: http://www.youngfabians.org.uk/modern_britain_global_leader_in_ethical_ai [Accessed 16th November 2020]

[2]: Zerilli, J., et al. (2018) “Transparency in Algorithmic and Human Decision-Making: Is There a Double Standard?” Philosophy & Technology 32:661–683. Available from: doi:10.1007/s13347-018-0330-6

[3]: Meijer, A. and Wessels, M. (2019) “Predictive policing: Review of benefits and drawbacks” International Journal of Public Administration 42.12:1031–1039.

[4]: Hempel, C. and Oppenheim, P. (1948) “Studies in the Logic of Explanation” Philosophy of Science 15.2:135–175.
