The Digital Civil Society Lab was delighted to be joined by Dr. Alex Hanna, who discussed "Shifting the Frame: The Labors of ImageNet and AI Data."
Artificial intelligence (AI) technologies like ChatGPT, Stable Diffusion, and LaMDA have led a multibillion-dollar industry in generative AI, and a potentially much larger industry in AI more generally. However, these technologies would not exist were it not for the immense amount of data mined to make them run, the low-paid and exploited annotation labor required for labeling and content moderation, and the questionable arrangements around consent to use these data. Although datasets used to train and evaluate commercial models are often obscured from view under the shroud of trade secrecy, we can learn a great deal about these systems by interrogating certain publicly available datasets that are considered foundational in academic AI research.
In this talk, Dr. Hanna investigates a single dataset, ImageNet. It is not an overstatement to say that without ImageNet, we may not have the current wave of deep learning techniques that power nearly all modern AI technologies. She begins from three vantage points: the histories of ImageNet from the perspective of its curators and its linguistic predecessor WordNet, the testimony of the data annotators who labeled millions of ImageNet images, and the data subjects and the creators of the images within ImageNet. Academically, she situates this analysis within a larger theory and practice of infrastructure studies. Practically, she points to a vision for technology that is not based on practices of unrestricted data mining, exploited labor, and the use of images without meaningful consent.