11. Our World in AI: School teachers

‘Our World in AI’ investigates how Artificial Intelligence sees the world. We use AI to generate images for some aspect of society and analyse the result. Will Artificial Intelligence reflect reality, or does it make biases worse?

Here’s how it works. We use a prompt that describes a scene from everyday life. The description needs to be specific: that helps the AI generate consistent output quickly and helps us find relevant data about the real world. We then take the first 40 images, analyse them, and compare the result with reality. Let’s see what we get.

Today’s prompt: “a school teacher in England helping a pupil”

We used OpenAI’s DALL-E 2 and Stable Diffusion, which is open source. Fig 1 has the results with DALL-E on the left (or view the public collection here) and Stable Diffusion on the right.

Two panels of 40 images generated for the prompt 'a school teacher in England helping a pupil'. The left panel has results from DALL-E and the right panel from Stable Diffusion.
Fig 1: Result with DALL-E 2 on the left and Stable Diffusion on the right

Let’s look at Stable Diffusion first. It generated seven black-and-white images. Interestingly, an alternative prompt where the teacher sits at a desk produced only black-and-white pictures in an old-fashioned photograph style. Stable Diffusion, again, created our favourite image. We love the first image in the fourth row from the top: the casual hand-in-pocket with the third arm is very teacher. Most striking, though, is that every school teacher in the Stable Diffusion panel is a woman.

DALL-E did better on gender diversity by generating seven male teachers. Let’s compare our results with the real world. We use the UK Government’s school teacher workforce report for the 2021/22 academic year, and Fig 2 shows the graph.

A hundred percent stacked column chart showing the distribution of gender categories by source.
Fig 2: Distribution of school teachers by gender and data source

DALL-E performed well: the distribution of female and male teachers is not significantly different from reality at the 10% level.* Now, in the final section of this column, we choose whether AI’s interpretation of society is leading, lagging, or live.

Today’s verdict: Live for DALL-E and Lagging for Stable Diffusion

DALL-E reflected reality, but Stable Diffusion made bias worse on this prompt. The Stable Diffusion Playground server was busy when we collected the data, and we saw some weirdness with the images, so we’ll try this one again at another time. For now, we’re delighted that DALL-E did a great job!

Next week in Our World in AI: professors.


* We run a Chi-Square test for independence. The null hypothesis is that there is no relationship between gender and the data source. The alternative hypothesis is that there is a relationship between gender and the data source. We interpret the result as follows. If we reject the null hypothesis, the gender distribution depends on the data source, and we can tell the AI output apart from the real-world data. If we do not reject the null hypothesis, we find no evidence of a relationship, and we cannot distinguish between the data sources. We evaluate at significance level α = 0.10.

  • DALL-E and real-world data: p = 0.585 (do not reject)
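For readers who want to try the test themselves, here is a minimal sketch in plain Python. It is not the code behind the column, and it rests on some assumptions. The DALL-E counts (33 female, 7 male) follow from the seven male teachers out of 40 images, assuming every teacher was classified as female or male. The real-world counts (30 female, 10 male) are a hypothetical placeholder that scales roughly three-quarters female to 40; substitute the figures from the workforce report. The function name `chi_square_2x2` and the use of Yates's continuity correction are also our own choices.

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). With one degree of freedom, the
    chi-square survival function equals erfc(sqrt(x / 2)).
    Assumes every expected cell count is greater than zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Yates's continuity correction (our assumption, see above)
                diff = max(0.0, diff - 0.5)
            statistic += diff ** 2 / expected

    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


ALPHA = 0.10  # significance level used in the footnote

# Rows are data sources, columns are (female, male).
dalle = (33, 7)         # 40 images, seven male teachers
real_world = (30, 10)   # ASSUMPTION: placeholder, not taken from the report

statistic, p_value = chi_square_2x2((dalle, real_world))
decision = "reject" if p_value < ALPHA else "do not reject"
print(f"chi2 = {statistic:.3f}, p = {p_value:.3f} ({decision})")
```

With these placeholder counts the sketch gives a p-value of about 0.585, well above α = 0.10, so the null hypothesis is not rejected. Setting `yates=False` gives a smaller p-value, which is worth keeping in mind when comparing results across tools.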

