Abstract
By looking at the politics of classification within machine learning systems, this article demonstrates why the automated interpretation of images is an inherently social and political project. We begin by asking what work images do in computer vision systems, and what is meant by the claim that computers can “recognize” an image. Next, we examine how images are introduced into computer systems and how taxonomies order the foundational concepts that will determine how a system interprets the world. Then we turn to the question of labeling: how humans tell computers which words will relate to a given image. What is at stake in the way AI systems use these labels to classify humans, including by race, gender, emotions, ability, sexuality, and personality? Finally, we turn to the purposes that computer vision is meant to serve in our society—the judgments, choices, and consequences of providing computers with these capacities. Methodologically, we call this an archeology of datasets: studying the material layers of training images and labels, cataloguing the principles and values by which taxonomies are constructed, and analyzing how these taxonomies create the parameters of intelligibility for an AI system. By doing this, we can critically engage with the underlying politics and values of a system, and analyze which normative patterns of life it assumes, supports, and reproduces.
Data availability
Not applicable.
Code availability
Not applicable.
Change history
25 November 2021
The Editor-in-Chief has removed Figure 1 and replaced Figure 2a due to copyright and consent concerns.
23 November 2021
A Correction to this paper has been published: https://doi.org/10.1007/s00146-021-01301-1
Notes
Minsky currently faces serious allegations related to convicted pedophile and rapist Jeffrey Epstein. Minsky was one of several scientists who met with Epstein and visited his island retreat, where underage girls were forced to have sex with members of Epstein’s coterie. As scholar Meredith Broussard has observed, a broader culture of exclusion and hostility became endemic in AI: “as wonderfully creative as Minsky and his cohort were, they also solidified the culture of tech as a billionaire boys’ club. Math, physics, and the other ‘hard’ sciences have never been hospitable to women and people of color; tech followed this lead.” See Broussard (2018).
See Crevier D (1993).
Minsky gets the credit for this idea, but clearly Papert, Sussman, and teams of “summer workers” were all part of this early effort to get computers to describe objects in the world. See Papert SA (1966). As Papert wrote: “The summer vision project is an attempt to use our summer workers effectively in the construction of a significant part of a visual system. The particular task was chosen partly because it can be segmented into sub-problems which allow individuals to work independently and yet participate in the construction of a system complex enough to be a real landmark in the development of ‘pattern recognition’.”
See Russell and Norvig (2010).
In the late 1970s, Ryszard Michalski wrote an algorithm based on “symbolic variables” and logical rules. This symbolic approach was very popular in the 1980s and 1990s, but as the rules of decision-making and qualification became more complex, it became less usable. At the same moment, the potential of large training sets triggered a shift from this conceptual clustering to contemporary machine-learning approaches. See Michalski R (1980).
There are hundreds of scholarly books in this category, but for a good place to start, see Mitchell WJT (2007).
As described in the AI Now Report 2018, this classification of emotions into six categories has its roots in the work of the psychologist Paul Ekman. “Studying faces, according to Ekman, produces an objective reading of authentic interior states—a direct window to the soul. Underlying his belief was the idea that emotions are fixed and universal, identical across individuals, and clearly visible in observable biological mechanisms regardless of cultural context. But Ekman’s work has been deeply criticized by psychologists, anthropologists, and other researchers who have found his theories do not hold up under sustained scrutiny. The psychologist Lisa Feldman Barrett and her colleagues have argued that an understanding of emotions in terms of these rigid categories and simplistic physiological causes is no longer tenable. Nonetheless, AI researchers have taken his work as fact, and used it as a basis for automating emotion detection.” Whittaker M et al. (2018). See also Barrett LF et al. (2019).
Fei-Fei Li, as quoted in Gershgorn D (2017).
Markoff J (2012).
Their paper can be found here: Krizhevsky et al. (2012).
Released in the mid-1980s, this lexical database for the English language can be seen as a thesaurus that defines and groups English words into synsets, i.e., sets of synonyms. https://wordnet.princeton.edu This project is part of a broader history of computational linguistics and natural-language processing (NLP), which developed during the same period. This subfield aims at programming computers to process and analyze large amounts of natural-language data, using machine-learning algorithms.
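To make the synset structure concrete, here is a minimal sketch for illustration, using NLTK’s standard WordNet interface (this example is ours, added for exposition; it is not drawn from ImageNet’s own tooling). It prints the distinct noun concepts behind a single word, then one of the hypernym chains of the kind ImageNet inherited as its taxonomy:

    # Illustrative only: WordNet synsets via NLTK (requires: pip install nltk).
    import nltk

    nltk.download("wordnet", quiet=True)  # one-time fetch of the WordNet data
    from nltk.corpus import wordnet as wn

    # One surface word ("crane") maps onto several distinct synsets/concepts:
    for synset in wn.synsets("crane", pos=wn.NOUN):
        print(synset.name(), "->", synset.definition())

    # Each synset sits in a hypernym ("is-a") hierarchy; chains like this
    # form the noun taxonomy that ImageNet adopted from WordNet.
    first = wn.synsets("crane", pos=wn.NOUN)[0]
    print(" > ".join(s.name() for s in first.hypernym_paths()[0]))

Running this shows one word splitting into several unrelated concepts, which is precisely the disambiguation work that synsets, and by extension ImageNet’s noun categories, are asked to perform.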
These are some of the categories that have now been entirely deleted from ImageNet as of January 24, 2019.
For an account of the politics of classification in the Library of Congress, see Berman S (1971).
We’re drawing in part here on the work of Lakoff (2012).
See Deng et al. (2009).
Quoted in Sekula A (1986).
Ibid.; for a broader discussion of objectivity, scientific judgment, and a more nuanced take on photography’s role in it, see Daston and Galison (2010).
UTKFace (2019).
See Edwards and Hecht (2010). Earlier classifications used in the 1950 Population Act and Group Areas Act used four classes: “Europeans, Asiatics, persons of mixed race or coloureds, and ‘natives’ or pure-blooded individuals of the Bantu race” (Bowker and Star, 197). Black South Africans were required to carry pass books and could not, for example, spend more than 72 hours in a white area without permission from the government for a work contract (198).
Bowker and Star, 208.
See Davis FJ (2001).
See Buolamwini and Gebru (2018).
Merler et al. (2019).
Webscope | Yahoo Labs (2019).
Solon O (2019).
Figure Eight (2019).
The authors made a backup of the ImageNet dataset prior to much of its deletion.
Their “MegaPixels” project is here: https://megapixels.cc/
Satisky (2019).
2nd Unconstrained Face Detection and Open Set Recognition Challenge (2015).
Locker M (2019).
Murgia M (2019).
Locker, “Microsoft, Duke, and Stanford Quietly Delete Databases”.
Full video here: Singh (2018).
Melendez (2018).
Vincent (2018).
Ibid.
Gould, The Mismeasure of Man, 140.
References
2nd unconstrained face detection and open set recognition challenge. https://vast.uccs.edu/Opensetface/. Accessed 28 Aug 2019
Barrett LF (2006) Are emotions natural kinds? Perspect Psychol Sci 1:1. https://doi.org/10.1111/j.1745-6916.2006.00003
Barrett LF et al (2019) Emotional expressions reconsidered: challenges to inferring emotion from human facial movements. Psychol Sci Pub Interest 20:1. https://doi.org/10.1177/1529100619832930
Bechmann A, Bowker GC (2019) Unsupervised by any other name: hidden layers of knowledge production in artificial intelligence on social media. Big Data Soc 6:1. https://doi.org/10.1177/2053951718819569
Berman S (1971) Prejudices and antipathies: a tract on the LC subject heads concerning people. Scarecrow Press
Bowker GC, Star SL (2000) Sorting things out: classification and its consequences, 1st edn. MIT Press
Broca P (1864) Sur le crâne de Schiller et sur l’indice cubique des cranes. Bulletin de la Société d’anthropologie de Paris
Broussard M (2018) Artificial unintelligence: how computers misunderstand the world. MIT Press, p 174
Buolamwini J, Gebru T (2018) Gender shades: intersectional accuracy disparities in commercial gender classification. In: Proceedings of the conference on fairness, accountability and transparency. http://proceedings.mlr.press/v81/buolamwini18a.html. Accessed 28 Aug 2019
Crevier D (1993) AI: the tumultuous history of the search for artificial intelligence. Basic Books
Daston L, Galison P (2010) Objectivity. Paperback edn. Zone Books
Davis FJ (2001) Who is black? One nation’s definition, 10th anniversary edn. Pennsylvania State University Press
Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255
Edwards PN, Hecht G (2010) History and the technopolitics of identity: the case of apartheid South Africa. J South Afr Stud 36:3. https://doi.org/10.1080/03057070.2010.507568
Figure Eight | The essential high-quality data annotation platform. https://www.figure-eight.com/. Accessed 28 Aug 2019
Gershgorn D (2017) The data that transformed AI research—and possibly the world. Quartz. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/. Accessed 28 Aug 2019
Gould SJ (1996) The mismeasure of man, revised and expanded. Norton
Justin E (1943) Lebensschicksale artfremd erzogener Zigeunerkinder und ihrer Nachkommen [Biographical destinies of Gypsy children and their offspring who were educated in a manner inappropriate for their species]. Friedrich-Wilhelms-Universität Berlin
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Pereira F et al (eds) Advances in neural information processing systems 25. Curran Associates, Inc. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf. Accessed 28 Aug 2019
Lakoff G (2012) Women, fire, and dangerous things: what categories reveal about the mind. University of Chicago Press
Le Bon G (1881) L’homme et les sociétés. Leurs origines et leur développement
Leys R (2010) How did fear become a scientific object and what kind of object is it? Representations 110:1. https://doi.org/10.1525/rep.2010.110.1.66
Leys R (2017) The ascent of affect: genealogy and critique. University of Chicago Press
Locker M (2019) Microsoft, Duke, and Stanford quietly delete databases with millions of faces. Fast Company. https://www.fastcompany.com/90360490/ms-celeb-microsoft-deletes-10m-faces-from-face-database. Accessed 28 Aug 2019
Markoff J (2012) Seeking a better way to find web images. The New York Times. https://www.nytimes.com/2012/11/20/science/for-web-images-creating-new-technology-to-seek-and-find.html. Accessed 28 Aug 2019
MegaPixels Project. https://megapixels.cc/
Melendez S (2018) Watch this drone use AI to spot violence in crowds from the sky. Fast Company. https://www.fastcompany.com/40581669/watch-this-drone-use-ai-to-spot-violence-from-the-sky. Accessed 28 Aug 2019
Merler M et al (2019) Diversity in faces. arXiv preprint arXiv:1901.10436. http://arxiv.org/abs/1901.10436
Michalski R (1980) Pattern recognition as rule-guided inductive inference. IEEE Trans Pattern Anal Mach Intell 2:349–361
Miller GA (1998) WordNet: an electronic lexical database. MIT Press
Mitchell WJT (2007) Picture theory: essays on verbal and visual representation. Paperback reprint edn. University of Chicago Press
Murgia M (2019) Who’s using your face? The ugly truth about facial recognition. Financial Times. https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e. Accessed 28 Aug 2019
Papert SA (1966) The summer vision project. https://dspace.mit.edu/handle/1721.1/6125.
Russell SJ, Norvig P (2010) Artificial intelligence: a modern approach, 3rd edn. Prentice Hall Series in Artificial Intelligence
Satisky J (2019) A Duke study recorded thousands of students’ faces. Now they’re being used all over the world. The Chronicle. https://www.dukechronicle.com/article/2019/06/duke-university-facial-recognition-data-set-study-surveillance-video-students-china-uyghur. Accessed 28 Aug 2019
Sekula A (1986) The body and the archive. October 39:3–64
Siegel EH et al (2018) Emotion fingerprints or emotion populations? A meta-analytic investigation of autonomic features of emotion categories. Psychol Bull. https://doi.org/10.1037/bul0000128
Singh A (2018) Eye in the sky: real-time drone surveillance system (DSS) for violent individuals identification. https://www.youtube.com/watch?time_continue=1&v=zYypJPJipYc9. Accessed 20 Sept 2019
Solon O (2019) Facial recognition’s ‘Dirty Little Secret’: millions of online photos scraped without consent. https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921. Accessed 20 Sept 2019
Stewart R (2015) Brainwash dataset. Stanford Digital Repository. https://purl.stanford.edu/sx925dc9385. Accessed 28 Aug 2019
UTKFace – AICIP. http://aicip.eecs.utk.edu/wiki/UTKFace. Accessed 28 Aug 2019
Vincent J (2018) Drones taught to spot violent behavior in crowds using AI. The Verge. https://www.theverge.com/2018/6/6/17433482/ai-automated-surveillance-drones-spot-violent-behavior-crowds. Accessed 20 Sept 2019
Webscope | Yahoo Labs. https://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67&guccounter=1. Accessed 28 Aug 2019
Whittaker M et al (2018) AI Now Report 2018. https://ainowinstitute.org/AI_Now_2018_Report.pdf. Accessed 20 Sept 2019
Acknowledgements
An earlier version of this article was originally published on September 19, 2019 at http://www.excavating.ai. Thanks to all those who have given editorial feedback, technical support, research contributions, and conversations on these issues over the years, including Arvind Narayanan, Daniel Neves, Varoon Mathur, Olga Russakovsky, Leif Ryge, Léa Saint-Raymond, and Kiran Samuel. Additional thanks to Mario Mainetti and Carlo Barbatti and all the staff at the Fondazione Prada, and to Alona Pardo and the staff at the Barbican Centre. The images in this essay and many more are part of the Training Humans exhibition, at the Fondazione Prada Osservatorio in Milan from September 12, 2019 through February 24, 2020.
Funding
None.
Ethics declarations
Conflicts of interest
The authors declare no conflict of interest other than the professional affiliations as listed.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Crawford, K., Paglen, T. Excavating AI: the politics of images in machine learning training sets. AI & Soc 36, 1105–1116 (2021). https://doi.org/10.1007/s00146-021-01162-8