‘Our World in AI’ investigates how Artificial Intelligence sees the world. We use AI to generate images for some aspect of society and analyse the result. Does Artificial Intelligence reflect reality, or does it make biases worse?
Here’s how it works. We use a prompt that describes a scene from everyday life. The description needs to be specific: that helps the AI generate consistent output quickly and helps us find relevant data about the real world. We then take the first 40 images, analyse them, and compare the result with reality. Here goes.
Today’s prompt: “the perfect English mum pushing a pram”
We tried OpenAI’s DALL-E and Stable Diffusion, which is open source. Fig 1 shows output from DALL-E on the left (or view the public collection here) and Stable Diffusion on the right.
Let’s look at DALL-E first. It really ran with that ‘perfect’ part of the prompt, producing a consistent early-2000s-South-East-England-middle-class-aspirational vibe. Perfect mums have long (mostly) blonde hair, are well-dressed, and are in great shape. Stable Diffusion is similar but more realistic, with more variation in age and body shape. It also created our favourite buggies. Check out the image on the right in the third row from the bottom, and the second image from the left in the fifth row from the top.
Stable Diffusion has an interesting pink theme. It feels like it was trained on a small data set, yet it displays more diversity than DALL-E. In any case, both AIs clearly grew up on a diet of stock photography and turn-of-the-century stereotypes. We will return to this point at the end of the quarter, when we review the previous 12 weeks.
For today’s analysis, we planned to look at ethnic minorities. The 2021 census published by the Office for National Statistics reports that 19% of the English population is non-White. But both AIs generated only White people. So, instead, we look at weight.
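Before moving on, here is a rough back-of-envelope check of how surprising an all-White set of images is. It is only a sketch under assumptions of our own: we treat each of the 40 images from one model as an independent draw from the English population, and we take the 19% census figure as the chance that any one image shows a non-White mum. The function name `prob_no_minority` is made up for illustration.

```python
def prob_no_minority(share: float, n_images: int) -> float:
    """Chance that none of n_images independent draws shows a non-White person."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return (1.0 - share) ** n_images


NON_WHITE_SHARE = 0.19  # 2021 census figure quoted in the text
N_IMAGES = 40           # assumption: 40 images per model, treated as independent

p_zero = prob_no_minority(NON_WHITE_SHARE, N_IMAGES)
print(f"Chance of 40 all-White images by luck alone: {p_zero:.4%}")
```

Under those assumptions, the chance comes out at roughly 0.02% for a single model, so luck alone is a poor explanation for what we see.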
None of DALL-E’s mothers is overweight or obese, and Stable Diffusion shows two slightly overweight mums, if we are uncharitable. Let’s compare that to the real world. We use the 2021 Health Survey for England published by the NHS and take averages for the 25-34 and 35-44 age buckets. Fig 2 shows our findings.
The good news is that the gaps are plain to see, so we don’t need to perform any statistical tests to check how the AIs compare with reality or each other. In the final section of this column, we choose whether AI’s interpretation of society is leading, lagging, or live.
Today’s verdict: Lagging
DALL-E and Stable Diffusion both generated images based on an artless and persistent notion of perfection. Or, at least, we think so. To be sure, we should test what happens if we repeat the prompt without the ‘perfect’.
Next week in Our World in AI: doctors.