Trevor Paglen: On 'From Apple to Anomaly'

Exploring the underbelly of our digital world, Trevor Paglen discusses the deeper meanings in his installation in The Curve, revealing the powerful, and often hidden, forces at play in artificial intelligence.

About the exhibition

Trevor Paglen From 'Apple' to 'Anomaly' © Tim P. Whitby, Getty Images

Trevor Paglen introduces the processes behind training and visualising gargantuan data sets before discussing with Anthony Downey the role such data sets play in machine learning and the impacts these algorithms have on our world order.

'Something very dramatic is happening in the world of visual culture and the world of images that surround us, that are part of the societies that we live in. We're increasingly embedded within sensing systems, whether that's cameras that are embedded within urban infrastructures that take pictures of cars' license plates or facial recognition software that is installed at airports and borders and commercial places. Commercial imaging systems in shopping malls that record your movements through department stores, that monitor your facial expressions, trying to figure out what products you might like, or systems that try to read your lips and try to decipher what you're talking about when you're talking to other people.

These kinds of sensing systems are also in places that are maybe less obvious, for example, you put a picture on Facebook for a few of your friends to see it, but in the background, those images are being scrutinised in great detail by a host of artificial intelligence algorithms that have been trained to try and recognise things in them.'

New ways of seeing

'What we're seeing is the advent of a new relationship to images. In the past, images needed somebody to look at them in order for them to come into existence. An image that nobody ever saw basically didn't exist. But with things like computer vision and artificial intelligence that's not true anymore.

There's a vast world of images that are now machine readable that don't need humans to look at them anymore to make sense of them. I've been trying to learn how machines look at images. I want to know what forms of seeing are embedded within technical systems. In my studio we've been developing a lot of software that allows us, as humans, to try to see through the eyes of machines.

For example, we created a programming language that allows us to do things like take a picture of a string quartet playing music and run it through software that you would use for a guided missile, or for a self-driving car, or an AI algorithm that's doing object detection, and it will draw pictures that represent what that algorithm is seeing when it's looking at an image.'

Building a neural network

'To do object recognition, you use things called neural networks. Basically, that's another way of describing artificial intelligence or machine learning systems. In order to build a neural network that can recognise different objects, the first thing you have to do is you start with a taxonomy. In other words, you need to create a giant list of all the objects you want to be able to recognise. For example, let's pretend we're going to make a neural network to recognise things in our kitchen - apples, oranges, spoons - you make a list of all the things you want it to be able to recognise. Then you start to build what's called a training library or training set. You give the neural network many, many hundreds, if not thousands, of examples of each of those objects you want it to learn how to see. You feed it thousands of pictures of oranges, thousands of pictures of spoons, thousands of pictures of plates, forks…

The system then conducts a statistical analysis of all of those images and breaks them down into what I think of as primitive components, or primitive shapes. They could be horizontal lines, diagonal lines, vertical lines, and it will invent a series of primitive shapes that it can then assemble in various ways to make sense of more complex objects.'
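As a rough illustration of the 'primitive shapes' idea, the sketch below slides two small hand-written filters over a tiny image, one tuned to horizontal lines and one to vertical lines. Everything in it (the function names, the filter values, the 5x5 image) is an assumption made for illustration; a trained network learns its own filters from the training images rather than being given them.

```python
# Illustrative sketch: the two filters are written by hand here, whereas a
# trained network would arrive at its own filters from the training images.

def apply_filter(image, kernel):
    """Slide a small grid of weights over the image (no padding) and
    return the weighted sum at each position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for r in range(len(image) - k_rows + 1):
        out_row = []
        for c in range(len(image[0]) - k_cols + 1):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            out_row.append(total)
        output.append(out_row)
    return output


def response_strength(image, kernel):
    """Add up the absolute filter responses over the whole image."""
    return sum(abs(v) for row in apply_filter(image, kernel) for v in row)


# Hypothetical filter that responds to a bright row between darker rows.
HORIZONTAL = [[-1, -1, -1],
              [ 2,  2,  2],
              [-1, -1, -1]]

# Hypothetical filter that responds to a bright column between darker columns.
VERTICAL = [[-1, 2, -1],
            [-1, 2, -1],
            [-1, 2, -1]]

# A 5x5 greyscale image containing a single bright horizontal line.
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]

horizontal_score = response_strength(image, HORIZONTAL)
vertical_score = response_strength(image, VERTICAL)
print("horizontal filter response:", horizontal_score)
print("vertical filter response:", vertical_score)
```

The horizontal filter responds strongly to this image and the vertical filter does not respond at all, which is the sense in which a filter can be said to 'find' a primitive shape.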

Training a neural network

'When a neural network has been trained in this way, you can show it a picture of something it's never seen before and it will analyse that image and ask, what are the formal components that make up that image? A spoon might be some vaguely parallel lines, chrome silver colour, an ellipse on one side of it. And you break it down into these shapes and if you find all of these primitive shapes, it's probably more likely to be a spoon than something else. Banana will be very different - it'll be made of arcs, and more yellowish colours - and when it finds all of those primitive shapes put together, the system will determine - this is a banana. Once you've trained your network to recognise these different objects you can start doing things like showing it an apple and it says 'this is an apple'.

The images you're feeding into the network are what are called training images or training data. It's this data, these images, that are training the neural network how to see.'
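The steps described above (choose a taxonomy, collect labelled examples, then classify something new) can be sketched in a few lines. This is a deliberately simplified stand-in: it uses a nearest-centroid classifier rather than a neural network, and the three features and all of the numbers are made up for illustration.

```python
import math

# Step 1: the taxonomy, i.e. the only things the system can ever "see".
TAXONOMY = ["apple", "banana", "spoon"]

# Step 2: the training set. Each example is a hypothetical feature triple
# (yellowness, curvature, shininess), each between 0 and 1. A real system
# would start from thousands of images per category, not two hand-typed rows.
TRAINING_SET = {
    "apple":  [(0.2, 0.9, 0.3), (0.3, 0.8, 0.4)],
    "banana": [(0.9, 0.6, 0.1), (0.8, 0.5, 0.2)],
    "spoon":  [(0.1, 0.3, 0.9), (0.2, 0.2, 0.8)],
}


def train(training_set):
    """Summarise each category by the average of its examples (a centroid).
    This replaces the neural network with a much simpler statistical model."""
    centroids = {}
    for label, examples in training_set.items():
        count = len(examples)
        centroids[label] = tuple(
            sum(example[i] for example in examples) / count
            for i in range(len(examples[0]))
        )
    return centroids


def classify(centroids, features):
    """Return the category whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


model = train(TRAINING_SET)

# Step 3: show the model something it has not seen before.
print(classify(model, (0.85, 0.55, 0.15)))  # yellowish, gently curved, matte
print(classify(model, (0.10, 0.20, 0.95)))  # grey, fairly straight, shiny
```

Note that `classify` can only ever return one of the labels in `TAXONOMY`, whatever it is shown.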

Interrogating training sets

'Looking at these collections of training images has been something that I've spent a lot of time working on for the last few years. When you're creating a training set, you have to create an overall taxonomy - this is true of all kinds of training sets. Every time you create a taxonomy, there's always a politics to that - because when you're creating a taxonomy you're saying this is a range of categories that are intelligible, and it's always going to be a limited range. In doing so you're always creating a negative space - the things that are outside of that, the things that are not intelligible.

When you're creating data sets for something like emotion, as used in ‘affective computing’ where a computer learns your emotional state by looking at your face, we can start looking at what kind of assumptions are built into a training set.'
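One way to make the 'negative space' concrete: many classifiers end in a softmax step, which turns a score per category into probabilities that always add up to one. The sketch below assumes a hypothetical four-label emotion taxonomy and made-up scores; it is not taken from any particular affective computing system. It shows that a closed list of categories forces an answer from that list, even when the scores barely differ.

```python
import math

# Hypothetical taxonomy for illustration; real systems define their own lists.
EMOTIONS = ["happy", "sad", "angry", "surprised"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forced_choice(scores, labels=EMOTIONS):
    """Pick the most probable label. There is no 'none of the above'."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Made-up scores that are nearly identical, as if the input matched nothing well.
ambiguous_scores = [0.10, 0.00, 0.05, 0.02]
label, confidence = forced_choice(ambiguous_scores)
print(label, round(confidence, 3))
```

Whatever the input, the result is one of the four labels: anything outside the taxonomy has no way of being reported.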

Cracking open ImageNet

'The most ambitious training set, and the most widely cited, is called ImageNet. ImageNet is the training set that provides the basis for the installation in The Curve.

It was published in 2009 and created by researchers at Stanford and Princeton University and has become the gold standard for training sets. It consists of over 14 million images and those images have been labelled into more than 20,000 categories. It was an attempt - in the words of its creators - to 'map out the entire world of objects'. It's a massive database and it's intricate. It's kind of wondrous to behold. There are, for example, 1,478 pictures of strawberries, 932 pictures of strawberry ice cream, there are 604 pictures of strawberry daiquiris. There are classes for apples, apple butter, apple fritters, apple dumplings, apple jelly, apple juice, apple pie - and that isn't even all of the apple things!'

'When we start looking at ImageNet, we start to see a world view coming into view. A world view that is contained within the training set itself. We have the overall taxonomy, but then we can look at the individual categories. What kinds of concepts become ‘real’ by getting a class within the training set and having images attached to those? Things like apple might be relatively uncontroversial but as we go further into the dataset, it gets stranger and stranger...'

'At some point, the computer vision system isn't so much describing people, as judging them. Where does that come from? When we start looking more closely at ImageNet, we find that there's about 2,800 categories relating to people in there. When we start looking at the categories of people we very quickly find categories are not only judgemental, but also classist, ableist, racist, homophobic, misogynistic and just cruel. We have things like 'bad person', 'debtor' - like you can tell what someone's bank account is by looking at their face in the weird epistemology of ImageNet?

'Swinger'
'Tramp'
'Ball buster'
'Ball breaker'
'A demanding woman who destroys men's confidence'
'Failure'
'Loser'
'Non-starter'
'Unsuccessful person'
'Jezebel'
'Drug-addict'
'Junkie'

These are all pictures that the researchers collected by scraping the internet, scraping people's Flickr accounts, and then hiring Amazon Mechanical Turk workers to label those images and sort them into these 20,000 categories. What does it mean to look at an image and label it? When we look at an image, what do we see? A woman on holiday on the beach has been labelled a 'kleptomaniac'. A man labelled a 'loser'.

The point is, looking at these training sets and trying to look at the categories they've been collected into is trying to look at what world views and what forms of politics and what cultural and political forms of seeing are kind of hard-wired into computer vision systems and technical systems. Systems whose creators would often like you to imagine they are neutral, that it’s all maths, algorithms and science. But when you crack open the hood of them, you will find all sorts of questionable things going on...'

Trevor Paglen In Conversation with Anthony Downey

An edited extract

AD: The installation in The Curve is overwhelmingly visual, but it’s not about what we can see – it’s about the backend, what we don’t see… The sense of visibility I think is key to a lot of your work, visualising invisibility. I think there's a cliche that we're somehow drowning in images whereas the reality is we're only seeing about 1% of the images that are currently circulating via network systems of communication. And the other 99% are manifesting a new world order - and that world order is closed to us, in as much as we do not see that system at work.

TP: You can think about them as infrastructure. You're looking at these masses of images - they aren't actually for us to look at, but they are hardwired into technical systems that are looking at us.

AD: The show is called From Apple to Anomaly. An apple is an apple is an apple, right?

TP: Right. Except for when it's not... which is always.

AD: And this is the problem! Because apple is a normative noun and anomaly is a relational noun. What you define as an anomaly, I might not define as an anomaly. There are cultural, political, economic ways of thinking about an anomaly. It's a cultural construct as a word. A computer cannot understand that effect.

TP: Exactly. We can't even say an apple is an apple. What's an apple? It's knowledge, it's sin - it has all kinds of cultural associations and depending on that context it can invoke those or not. But, that's a more philosophical case than 'anomaly' which like everybody would agree, that's not a 'thing' - it's a relational concept, just something we think is weird! What does it mean to universalise that? And by universalise, I mean hardwire it into a technological system.

The Treachery of Object Recognition, 2019

AD: The first image in the exhibition is not from a data set; it's by René Magritte. A very specific image, 'Ceci n'est pas une pomme'. Why did you choose this particular image?

TP: For me, a print of this image begins the whole show and it's basically posing the question: what is an apple? What is a representation? And more significantly, who gets to decide what an image means? In the case of the Magritte painting, it's a picture of an apple - that says 'this is not an apple'. And for me, that becomes a kind of allegory for a kind of self-representation - an acknowledgement that representations are always relational and they are based on some kind of consensus. And those consensuses can change.

If we think about it, a lot of queer or feminist liberation projects are about trying to change the meaning of images - to be able to define the meaning of one’s own image. In Magritte’s painting, it's pointing in that direction, to the politics of representation. And then to have the computer vision system imposing its will on that image and saying, ‘No I don't care what you think you're doing here, Magritte, this is a picture of an apple’. To me, the piece then becomes an allegory for the underlying politics about who gets to decide what the meaning of images is. Who gets to create those classifications?

There is no way to debias these systems. They are always going to have a world view built into them and the best thing you can do is pick what kind of world view you want with the understanding that not only is a world view embedded in technical systems, but that world view is then reimposed on the world that it is intervening in.

There are projects right now within the field of machine learning around ‘fairness and transparency’. People are trying to technically debias training data – there’s a realisation that it’s a bad idea if all CEOs are white and all criminals are black, as it is now. But the technical solutions, to simply make more racial or gender classifications, still imply that you can make assumptions on such things without asking the person. It still requires this ‘pre-labelling’ and preconceptions being applied to images.

AD: Absolutely. And what’s interesting is that these images are acting as a kind of ‘apparatus’, a notion explored by philosopher Giorgio Agamben. Potentially the algorithm actually is an apparatus. We're not meant to see these images, per se. Yet they are imbricated with an ideology, they are not abstract, they are affecting a world order which we, despite the fact that we upload to it, have very little to do with.

One of the things I was thinking about was data-mining and who produces data. What I wanted you to talk about is Training Humans and your ImageNet Roulette, because this is obviously practical. The art world, despite its best intentions and despite the political claims made on the art work - it doesn't really change anything, ever. Whereas this actually did change something. I was wondering if you could just talk about ImageNet Roulette, Training Humans and how you subvert this idea of data-mining and data surveillance?

TP: ImageNet Roulette was a project where I took these people categories from the ImageNet data set and I trained a model on them. So we made a neural network that was trained on just the kinds of people in ImageNet. Then you can upload a picture and ImageNet Roulette would classify you. For me, I was a psycholinguist and a skinhead!

I built this application earlier in the year and then for some reason last week a postdoc tweeted it - and it went viral and we were doing 1.2 million images a day. Then basically what happened was the people that created ImageNet said that they were going to delete a lot of the people categories and undertake a bigger project to de-bias it. Like having white people and black people as criminals, for example. I don't know if the approach they are taking is going to work but the project very clearly underlined that there was a problem.

On the website for the project it said in big letters 'this produces racist, misogynistic and horrible results' all the time - and the point of the project was to show you how bad these systems are by giving you feedback about what it's seeing when it's looking at your picture.

AD: One of the things we need to talk about is labour. The idea that an algorithm emerges, genie-like, out of nowhere and then replicates and self-replicates is untrue. And one of the things this work makes clear is the labour, physical labour. I want to talk about the physical labour going into this and how that is remunerated - because that is important.

TP: When you're looking at these training sets, you have massive, massive numbers of images that have to be categorised and put into categories - someone has to do this. So the way they do this is they have online platforms like Mechanical Turk, that are mostly outsourced, often to developing countries, where you basically hire people to look at images. There is an enormous amount of labour that is underneath the training layer itself. You have actual people, bodies clicking on things - it's called 'click work'. And that is something we were trying to mirror in the installation. In the installation downstairs there are 35,000 images that have been individually printed and individually pinned to the wall. And that was 10 people working for 2.5 weeks basically around the clock. For me that was a really important part of the project - to create an installation where the amount of labour that went into the creation of the collection of images, there was a trace of it that was still visible in the installation itself in a way that is harder to see in the training sets, but is certainly underlying them.

AD: This precarity of labour is interesting, it's the dark side of the web. I want to ask a very basic question - what is to be done? Because I don't think this is a generic or abstract question. I think something profound has happened. There are algorithms out there that we genuinely don't know what they are doing, but they are doing something.

How do we offer something that could potentially disrupt this very system which is creating a new world order, not before our eyes, but almost behind our eyes?

TP: I don't know the policy answer to that, but I know how to begin the conversation. I think you have to begin the conversation by a reconceptualisation of how we think about what technology is. And the metaphor that drives me crazy all the time is 'technology is like a hammer, you can use it to build a house or you can hit someone on the head with it' - and that's useless. Technologies have politics built into them.

The most dramatic example - nuclear weapons. If you're going to have nuclear weapons you need to have certain types of infrastructure in place which means you have to have certain kinds of security measures in place, certain kinds of economies - the existence of nuclear weapons is going to have geopolitical ordering as a consequence of it. In other words there is a vision of society that is built into the weapon and that the existence of the weapon has to reproduce. And I think that that is true for these kinds of systems as well, as it is for all kinds of technology. When we're talking about machine learning, I can build little models in my studio, but to really do this at scale you need to be able to collect all the data of everybody on the planet. There are 5 companies in real life that can do that at scale and so what kind of vision of politics is inherent in that? What are the places where we collectively decide that these sorts of technologies might be a good - or bad - thing? I think that's not an impossible thing to imagine - and I think we should definitely start imagining that that is possible. In my opinion, there are problems that we want to believe these systems will solve that are not solvable by them - and that will only become worse if we apply those tools to them.

AD: There was a time before the internet and there will be a time after the internet but effectively we are living through a moment which will define, at least for a generation, how we interact not with the internet but with the world. And ultimately, the 'apparatus' of that, we are not yet privy to.

Transcribed from Trevor Paglen in conversation with Anthony Downey on Thursday 26 September 2019.

Trevor Paglen: From Apple to Anomaly

26 Sep-16 Feb, The Curve

Artist Trevor Paglen’s new Curve commission takes as its starting point the way in which AI networks are taught how to ‘see’ and ‘perceive’ the world by taking a closer look at image datasets.

Paglen has incorporated approximately 30,000 individually printed photographs, largely drawn from ImageNet, the most widely shared, publicly available dataset. Discover how the advent of autonomous computer vision and AI has developed, rife with hidden politics, biases and stereotypes.

Free entry