10. Our World in AI: Doctors

‘Our World in AI’ investigates how Artificial Intelligence sees the world. We use AI to generate images for some aspect of society and analyse the result. Does Artificial Intelligence reflect reality, or does it make biases worse?

Here’s how it works. We use a prompt that describes a scene from everyday life. The description needs to be specific: that helps the AI generate consistent output quickly and helps us find relevant data about the real world. We then take the first 40 images, analyse them, and compare the result with reality. Let’s see what we get.

Today’s prompt: “a GP in England writing up notes after seeing a patient”

GP is short for General Practitioner, the term used in England for a primary care doctor: the one you go and see when you have a cold that won’t go away or need a prescription. We tried the prompt with OpenAI’s DALL-E 2 and Stable Diffusion, which is open source. Fig 1 has the results with DALL-E on the left (or view the public collection here) and Stable Diffusion on the right.

Fig 1: Result with DALL-E 2 on the left and Stable Diffusion on the right

The results show DALL-E with a neutral white-and-beige theme, while Stable Diffusion selects a medical blue palette. DALL-E has made real progress with faces and hands in the past three months, but Stable Diffusion is more mixed. Just look at the top row: the first image shows an arm with a floating hand from a different body, the doctor in the third image is super-productive writing with both his right hands, while the GP in the last image uses only one of them.

Both AIs generated female and ethnic minority doctors, but Stable Diffusion generated more of both: 13 women vs DALL-E’s ten, and eight ethnic minority doctors vs DALL-E’s five. Today we focus on gender.
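To put those counts on a common scale, here is a minimal sketch that turns them into shares of the 40 images per model. It assumes every image counts as exactly one doctor and that all 40 images could be classified, which the column does not state; the names `counts` and `shares` are ours, for illustration only.

```python
# Counts as reported in the column; 40 images were generated per model.
IMAGES_PER_MODEL = 40

counts = {
    "DALL-E 2": {"women": 10, "ethnic minorities": 5},
    "Stable Diffusion": {"women": 13, "ethnic minorities": 8},
}


def shares(model_counts, total=IMAGES_PER_MODEL):
    """Convert raw counts into fractions of the images generated.

    ASSUMPTION: each image shows one doctor and all `total` images
    were classifiable, so `total` is the right denominator.
    """
    return {group: n / total for group, n in model_counts.items()}


for model, model_counts in counts.items():
    for group, share in shares(model_counts).items():
        print(f"{model}: {group} {share:.1%}")
```

Under those assumptions, a quarter of DALL-E’s doctors and just under a third of Stable Diffusion’s are women.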

We use data from Statista. They published a table with the number of GPs in England from December 2016 to January 2022 by gender (Fig 2). England has had more female than male primary care doctors since 2017 – something we learnt just now!

Fig 2: Statista table of GPs from 2016 to 2022 by gender

We compare our results with the 2022 figures (Fig 3). Both AIs underrepresent women, but Stable Diffusion is closer to reality. In fact, Stable Diffusion’s distribution of female and male doctors is not significantly different from the real world at our chosen significance level, but DALL-E’s is.*

Fig 3: Distribution of primary care doctors by gender and data source

So that’s good news for Stable Diffusion, but it’s a close call because the test result is near the boundary. In the final section of this column, we choose whether AI’s interpretation of society is leading, lagging, or live.

Today’s verdict: Lagging

DALL-E and Stable Diffusion both underrepresented female doctors. Stable Diffusion is closer to reality, but the evidence is not quite strong enough to award it a live verdict. Still, it’s definitely nearly-live, and we’re excited about that.

Next week in Our World in AI: school teachers.


* We run chi-square tests of independence. The null hypothesis is that there is no relationship between gender and the data source; the alternative hypothesis is that there is one. We interpret the result as follows. If we reject the null hypothesis, the gender distribution differs between the two sources, and we can tell them apart. If we do not reject the null hypothesis, we have no evidence of a difference, and we cannot distinguish between the sources. We evaluate at significance level α = 0.10.

  • DALL-E and real-world data: p = 0.021 (reject)
  • Stable Diffusion and real-world data: p = 0.113 (do not reject)
  • DALL-E and Stable Diffusion: p = 0.388 (do not reject)
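For readers who want to try this kind of test themselves, here is a minimal sketch of a chi-square test of independence on a 2×2 table, using only the Python standard library. It is an illustration, not the column’s actual calculation: it assumes two gender categories, treats every image not counted as a woman as a man, and applies no continuity correction. The column’s own categories and settings are not published here, so the p-value it prints need not match the 0.388 reported above. The helper names `chi_square_2x2` and `verdict` are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if gender and source were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has one degree of freedom, for which the chi-square
    # survival function reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def verdict(p_value, alpha=0.10):
    """Apply the decision rule at significance level alpha."""
    return "reject" if p_value < alpha else "do not reject"


# Rows: DALL-E, Stable Diffusion. Columns: women, men.
# ASSUMPTION: the remaining 30 and 27 images are all counted as men.
dalle_vs_sd = [[10, 30], [13, 27]]
statistic, p_value = chi_square_2x2(dalle_vs_sd)
print(f"chi2 = {statistic:.3f}, p = {p_value:.3f} ({verdict(p_value)})")

# The decision rule applied to the p-values reported in the column.
reported = {
    "DALL-E and real-world data": 0.021,
    "Stable Diffusion and real-world data": 0.113,
    "DALL-E and Stable Diffusion": 0.388,
}
for pair, p in reported.items():
    print(f"{pair}: p = {p} ({verdict(p)})")
```

The decision rule is a plain threshold: any p-value below α = 0.10 leads to rejecting the null hypothesis, which is why 0.113 counts as a close call.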

