On interpretability in AI

opinion
Published November 20, 2022

In a recent issue of Nature Reviews Physics, Prof. Matthew D. Schwartz from Harvard, whom I know through IAIFI, shared an opinion piece titled “Should artificial intelligence be interpretable to humans?”. In this essay, he argues that interpretability of AI is neither achievable nor necessary. I could not disagree more.

Interpretability is a topic I care about deeply. In fact, interpretability of AI, or rather the lack thereof, is what got me into AI research. As deep learning began to demonstrate impressive results in computer vision, many researchers, including physicists, started to adopt DL in their fields despite its obvious shortcomings (brittleness, unexpected failure modes, etc.). The reason these shortcomings cannot easily be addressed is the lack of interpretability: characterizing limitations (i.e., knowing the failure modes), ensuring more reliable results, and possibly extending the regime of applicability of AI models all require a sufficiently thorough understanding of the factors that determine a model’s output. Without such understanding, we can only use AI in settings where we can cross-check its outputs against known answers (which is insufficient in science), or we can take a risk and apply AI without knowing the full scope of its capabilities (which is unacceptable in science). This is why I have been pursuing interpretability since I started doing research in AI.

So, the following is my impromptu rebuttal to Schwartz’s essay. Just for context: I wrote it in our internal IAIFI chat where we were discussing the essay, i.e., it was not meant for publication and is therefore not as polished as I’d like. However, it is still a reasonable little opinion piece that I deem worth sharing, since I think this is an important discussion topic. You can follow my arguments best after reading Schwartz’s essay (unfortunately it’s behind a paywall).

Part I

Should artificial intelligence be interpretable to humans? Yes, absolutely. To be precise, I take “artificial intelligence” here to be “AI methods”, i.e., all sorts of Machine Learning (ML) and Deep Learning (DL) techniques, rather than an “artificial intellect”. The latter does not exist (yet), but some believe it can be achieved by scaling up current AI methods; whether or not this is the right path, I believe that interpretability of AI methods should be our primary concern.

My main criticism of Schwartz’s essay is that it presents a rather narrow viewpoint, focusing on potential achievements of AI in physics while completely ignoring the reality of AI deployment today. The prime application of AI is automated decision-making across a variety of fields, including healthcare and science. The rapid export of AI methods into these fields happened over the past couple of years, inspired by some impressive achievements of AI in computer vision. On closer inspection, however, it turned out that AI-based predictions are often flawed and that AI models are brittle. For instance, a common unresolved problem is failure under “distribution shift”, i.e., changes in the input data distribution between training and deployment.

Here is a concrete example from medical diagnostics: an AI-based model was deployed to classify X-ray images. It had been trained on a dataset originating from lab A and worked as desired there; however, it performed poorly on images taken at lab B. An ablation study revealed that the model was basing its decisions largely on lab markers in the image margins. And even after this was corrected, further investigation revealed that the model’s performance varied strongly depending on factors such as image size and resolution, imaging device, patient age and ethnicity, etc. To learn more about this specific case and AI for medicine in general, I recommend watching this talk by Jayashree Kalpathy-Cramer.
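To make this shortcut-learning mechanism concrete, here is a minimal synthetic sketch (my own toy construction, not the actual study from the talk): a classifier sees a weak “true” feature and a lab-specific marker feature that happens to correlate almost perfectly with the label in the training data. The model latches onto the marker, so its accuracy collapses towards chance once the marker no longer carries the label information.

```python
# Toy illustration of shortcut learning under distribution shift.
# Feature 0 is a weak true signal; feature 1 mimics a lab-specific marker
# that correlates with the label at "lab A" but not at "lab B".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, marker_is_predictive):
    y = rng.integers(0, 2, size=n)
    true_signal = y + rng.normal(scale=1.5, size=n)      # weak real signal
    if marker_is_predictive:
        marker = y + rng.normal(scale=0.1, size=n)       # near-perfect shortcut
    else:
        marker = rng.normal(scale=0.1, size=n)           # shortcut gone
    return np.column_stack([true_signal, marker]), y

X_a, y_a = make_data(2000, marker_is_predictive=True)    # "lab A" (training)
X_b, y_b = make_data(2000, marker_is_predictive=False)   # "lab B" (deployment)

clf = LogisticRegression().fit(X_a, y_a)
print("accuracy at lab A:", clf.score(X_a, y_a))  # high: the shortcut works here
print("accuracy at lab B:", clf.score(X_b, y_b))  # near chance: shortcut is gone
```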

One could argue that all of the above is a problem of training data, and that one simply needs to ensure that the training set is clean, balanced and complete. Well, this is in fact a very hard problem and not always achievable. Furthermore, how do we ensure the removal of all possible confounders to prevent the model from learning “spurious correlations”? And even if we did our best, the question remains: at what point can we rely on the AI’s predictions? This is why we need interpretability research.

Luckily, this issue has been gaining recognition in the AI research community over the past couple of years, and increasing effort is being put towards its solution with a number of different approaches. For instance, there are attempts to reverse-engineer neural networks – most famously, the “circuits” work led by Chris Olah, first on vision models and more recently on transformers. There is also a resurgent community around causality, which advocates formalizing and modeling the causal relationships between inputs and model predictions in order to tackle the problem of “spurious correlations” (e.g., check out this recent ICML 2022 workshop).

Part II

In the following, I refer to passages in Schwartz’s essay and comment on several of the points he makes, occasionally wandering off on some tangents.

weights do not equal synapses

In the section comparing biological and artificial intelligence, the author juxtaposes the number of synapses in a brain with the number of weights in a large language model to make some estimates about the models’ capabilities surpassing those of humans. I have doubts about this being a meaningful comparison. First of all, can we equate biological synapses with trainable parameters? I remember at least one reference arguing that a single synapse is rather equivalent to a multi-layer ANN (artificial neural network). Which metric this work used and whether this result is the consensus in the neuro-AI community, I do not know. However, since the resemblance between artificial and biological neurons is more spiritual than exact (in particular once we take it beyond a single neuron and towards comparing systems such as PaLM with the animal brain), I argue that brain synapses and ANN weights cannot be treated as equivalent “computational units”; their 1:1 comparison is therefore not a fair one, and not a predictive measure of ANN capabilities relative to the brain.

data is crucial

Setting aside this biological-vs-artificial comparison, the second caveat is that all arguments in the related section of Schwartz’s essay tacitly rest on the assumption that the capabilities of an ANN will necessarily and indefinitely increase as the number of trainable parameters grows. However, the success of modern AI models relies not only on scale, but also on data – its quantity, its quality, and the form in which it is presented to the ANN during training. In fact, the author brings up data scarcity in the next section, but does not offer any resolution to this issue.

language models’ capabilities in science

With regards to the question posed at the end of the second section about ANNs generating scientific papers, I cannot help but think about the new model GALACTICA, released by Meta as an “AI system which can produce academic papers based on simple prompts”. It was made publicly available a couple of days ago. Unfortunately, it did not pass the stress test by the public, and I think its failure is quite instructive. It is also worth noting that the earlier-mentioned model Minerva, which “aced” high-school math questions, scores about 50%. This is a record for language models, but certainly not good enough compared to a (well-prepared) human.

At least from my experience, language models are currently far from being able to write a coherent scientific paper. Interestingly, AI models demonstrate excellent results on creative generative tasks and even manage to combine multiple modalities (e.g., image generation from a text prompt with Stable Diffusion has taken the spotlight recently). On the other hand, AI-based chat agents are pretty good at reciting facts. But the combination of the two, i.e., the creative generation of factually correct output, is a much harder challenge and is yet to be mastered. Maybe that would constitute the leap to “understanding”?

understanding, technically and emotionally

The question Schwartz raises about “understanding” is an extremely interesting and deep one! I think it deserves a long interdisciplinary discussion. I do not have a good definition of what “understanding” exactly means, but to me “understanding” is more than AI is currently capable of. Personally, I advocate refraining from loosely using formulations such as “the AI model understands”, because they contribute to a misleading narrative, in particular for people who are not involved in AI research. Apart from overstating the ability of AI models, there is an important psychological aspect to this wording: in colloquial language, “exhibiting understanding” is strongly associated with the subject having a human-like intellect, which in turn is directly entangled with emotions, possibly because “technical understanding” and “emotional understanding” (as in, “feeling understood”) easily get mixed up. For instance, consider the statement “Dogs are so smart, they understand everything!” – for many humans, an observation like this elevates the subject in question (dogs) to a higher level, i.e., closer to humans. Consequently, when formulations like “it understands” or “it thinks” are used to describe AI models, humans subconsciously attribute human-like qualities to them, which, at the very least, is misleading. I think the significance of this psychological trap has grown immensely since the advent of language models, because output in the form of human language is easily interpreted as “communication”, suggesting that there is some intelligent entity behind it, as opposed to, e.g., numerical output.

Focusing only on technical understanding, when could we claim that an AI model has “understood” something? For instance, for most standard math problems there exists an exact algorithm that solves them. Therefore, one could argue that once the model has discovered and learned this algorithm, it has “understood” the underlying principle: if it can parse any variation of the same problem, it will always compute the correct answer. However, math-based problems are only a small subset of all possible tasks, and the algorithmic approach does not easily carry over to the others.
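One way to make this criterion operational, sketched below purely as an illustration, is to compare a model’s answers with the exact algorithm on inputs far outside any plausible training range: if the answers keep matching, the model behaves as if it had internalized the underlying principle rather than memorized examples. (The `model_answer` function here is a hypothetical stand-in for querying an actual trained model.)

```python
# A minimal sketch of a "has it learned the algorithm?" check: compare a
# model's answers against the exact algorithm on out-of-range inputs.
import random

def exact_gcd(a: int, b: int) -> int:
    # Euclid's algorithm: the exact solution to this standard math problem.
    while b:
        a, b = b, a % b
    return a

def model_answer(a: int, b: int) -> int:
    # Hypothetical stand-in: replace with a query to a real trained model.
    return exact_gcd(a, b)

def matches_algorithm(n_trials=1000, lo=10**6, hi=10**9) -> bool:
    # Probe inputs far larger than anything a small training set would cover.
    for _ in range(n_trials):
        a, b = random.randint(lo, hi), random.randint(lo, hi)
        if model_answer(a, b) != exact_gcd(a, b):
            return False
    return True

print(matches_algorithm())
```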

WE build AI models

Using the ability to explain as proof of understanding seems like a suitable approach to achieving AI interpretability. In this context, Schwartz debates whether communication between humans and (future) AI is possible and raises the question “Why should we expect the machines to be able to explain things to us?”. My answer is: because we build these machines. These are our tools, and we engineer them according to our needs. And if we are doing it properly, we should expect to see the desired result. There are limits on the machines’ capabilities, of course, but within the physical and technical constraints it is our scientific and engineering challenge to come up with solutions that yield the desired results. By the way, some researchers are actively working specifically on communication – e.g., check out Been Kim’s work and her keynote speech at ICLR 2022.

AI is not an independently evolving organism

Finally, referring to the last paragraph in Schwartz’s essay, I want to point out: AI is not an independently evolving organism; therefore, I think that evolutionary arguments cannot be used to contrast the cognition of any organism on Earth with AI. Moreover, an important detail is that current AI models are not goal-seeking agents. In this sense, just like a computer or a pocket calculator, AI is a powerful tool, and it is up to us how we deploy it and what we achieve with it.

One could certainly entertain the possibility that future AI will be radically different from and infinitely more capable than humans. I think that even this scenario is not a reason to adopt a defeatist attitude. Future AI will still be the result of our work; therefore, we are best prepared by putting in research and engineering effort today towards developing AI with interpretability, communication skills, and other desired properties.