AI and the future of work: Forecasts explained

A recent report by Goldman Sachs warns that AI could impact 300 million jobs, while a study by OpenAI suggests that half of your tasks may be affected. Both analyses use similar methods to arrive at their conclusions, and this article shows you what’s behind the headlines.

In what follows, I focus on the work by OpenAI, written with collaborators from OpenResearch and the University of Pennsylvania. Their paper, titled ‘GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models’, finds that 80% of workers could see at least 1 in 10 of their tasks affected in the next few years. And for 19% of workers, that number could be as high as half.

The title ‘GPTs are GPTs’ seems cryptic, but it is actually quite clever. When expanded, it reads, ‘Generative Pre-trained Transformers are General-Purpose Technologies.’ Large Language Models (LLMs) like GPT-4 and Bard are examples of Generative Pre-trained Transformers. And general-purpose technologies are technologies that have the potential to revolutionise entire industries and even society as a whole. They are rare – think of the steam engine, electricity and the internet. So, ‘GPTs are GPTs’ is a neat way to say that LLMs are the next big thing.

They probably are, but it’s hard to know right now what will change and for whom. If 1 in 10 of your tasks is affected, LLMs might automate some repetitive activities, freeing time for more creative or strategic work. That could improve your productivity and efficiency, and you might even enjoy a better work-life balance.

But what about the 19% who see an impact on half of their job? Who are they? Have their job opportunities suddenly halved? Or will they collaborate with LLMs, doing work that doesn’t exist today? 

Let’s see what’s behind the predictions – and who should start thinking about a side hustle.

The foundation

The analysis is based on two U.S. government datasets: the Occupational Employment Statistics dataset from the Bureau of Labor Statistics and the O*NET 27.2 database. The Occupational Employment Statistics dataset contains occupation-level information about wages, education requirements, and on-the-job training needs. The O*NET 27.2 database has 1,016 occupations associated with 2,087 Detailed Work Activities (DWAs) and 19,265 tasks. Fig 1 gives an idea of what the data look like.

Fig 1: Sample of occupations, Detailed Work Activities (DWAs) and tasks from the GPTs are GPTs paper

The critical test is whether an LLM would reduce the time it takes a human to do a task or DWA by at least half. If the answer is yes, the activity is directly exposed to AI; otherwise, it’s not. A third category captures indirect exposure: here the question is whether additional software built on top of the LLM could achieve the same time saving.

A team of human annotators assessed each DWA and a subset of tasks, which is why the threshold is set at ‘by at least half’: it’s somewhat arbitrary, but it gives people a concrete rule they can apply consistently. In the end, occupation-level exposure to GPTs is a weighted average of these task- and DWA-level scores.
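As a concrete illustration, here’s a minimal Python sketch (my own, not the authors’ code) of how task-level labels could roll up into an occupation-level score. The tasks, weights and labels are invented; the three categories mirror the rubric above – no exposure, direct exposure, and exposure via additional software built on the LLM.

```python
# Hypothetical task ratings for a single occupation.
# Labels follow the rubric described above:
#   "E0" = no exposure
#   "E1" = direct exposure (the LLM alone could halve the time)
#   "E2" = indirect exposure (LLM plus additional software could)
tasks = [
    {"task": "Draft routine correspondence", "label": "E1", "weight": 0.40},
    {"task": "Negotiate with suppliers",     "label": "E0", "weight": 0.35},
    {"task": "Summarise research findings",  "label": "E2", "weight": 0.25},
]

def direct_exposure(tasks):
    """Weighted share of tasks rated as directly exposed (E1)."""
    total = sum(t["weight"] for t in tasks)
    exposed = sum(t["weight"] for t in tasks if t["label"] == "E1")
    return exposed / total

print(f"Direct exposure: {direct_exposure(tasks):.0%}")  # 40% for this toy occupation
```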

The same rating exercise was also run with an AI. An early version of GPT-4 rated all task-occupation pairs, using a similar set of rules given as a prompt. Fig 2 shows that the decisions of the human annotators and the AI were similar. The axes take values between 0 and 1; a higher number means more exposure.
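The paper doesn’t publish the exact prompt or pipeline, but the idea can be sketched roughly like this with the OpenAI Python client; the rubric wording, model name and output handling are my own assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY in the environment

RUBRIC = (
    "You rate how exposed a work task is to large language models. "
    "Reply 'E1' if an LLM alone could cut the time to complete the task by at least half, "
    "'E2' if that saving would need additional software built on the LLM, "
    "and 'E0' otherwise. Reply with the label only."
)

def rate_task(occupation: str, task: str) -> str:
    """Ask the model for an exposure label for one occupation-task pair."""
    response = client.chat.completions.create(
        model="gpt-4",  # the paper used an early GPT-4 version; this is a stand-in
        temperature=0,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Occupation: {occupation}\nTask: {task}"},
        ],
    )
    return response.choices[0].message.content.strip()

print(rate_task("Technical Writer", "Draft user documentation for new software features"))
```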

Fig 2: Similarity between human and GPT-4 ratings from the GPTs are GPTs paper

There’s some divergence at the higher end: human annotators rated highly-exposed jobs as more exposed than GPT-4 did, but there is good agreement overall. Unfortunately, however, that doesn’t mean the data are good.

The Occupational Employment dataset is limited in scope. It only captures data on documented employees in roles that the government knows about. This means it misses a portion of the economy, but there are other concerns for this analysis.

There are problems with the method. The annotators did not have in-depth knowledge about the DWAs and tasks, so their ratings were based on personal opinions rather than objective criteria. Take another look at Fig 1. How would you do if you were an annotator? Do you think they got your job right?

The same annotators also validated the AI output, so their biases may have been introduced into those data.

Then there is the task-based approach. It cannot capture the complexity of real-world jobs. For example, an LLM might write an email faster than a human, but would it manage an awkward relationship and strike the right tone? And if people have to edit the machine’s first draft, is that really more efficient than starting from scratch?

So, we should interpret the study’s conclusions with caution. Changing the ratings can radically change the results. And if the task-based approach doesn’t hold, the analysis is altogether void. With this in mind, let’s move on and investigate the source of the headline figures.

The headline figures

The authors built three models to predict the impact on the labour market. Model α is the lower bound and considers only direct exposure to LLMs. Model γ is the upper bound and considers direct exposure plus indirect exposure. And in the middle, model β considers direct exposure plus half the effect of indirect exposure.
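In other words, the three models differ only in how much credit they give to indirect exposure. A small sketch of that scoring (my own illustration, with made-up task shares):

```python
def occupation_exposure(share_direct: float, share_indirect: float, model: str) -> float:
    """Exposure score under the three measures described above.

    share_direct   -- weighted share of tasks with direct exposure
    share_indirect -- weighted share with indirect (software-enabled) exposure
    """
    indirect_credit = {"alpha": 0.0, "beta": 0.5, "gamma": 1.0}[model]
    return share_direct + indirect_credit * share_indirect

# A toy occupation: 20% of task weight directly exposed, 40% indirectly exposed.
for m in ("alpha", "beta", "gamma"):
    print(f"{m}: {occupation_exposure(0.20, 0.40, m):.2f}")
# alpha: 0.20, beta: 0.40, gamma: 0.60 -- the range widens as complementary software matures
```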

We can think of the models as representing different horizons, even though the authors don’t explicitly state this. It takes time to build the additional software to realise the indirect benefits of GPTs. But as we do so, we can expect to see a growing impact on the workforce. Fig 3 shows the average exposure for the median occupation in each model.

Fig 3: Median occupation exposure with human and GPT-4 annotation from the GPTs are GPTs paper

Put this way, LLMs could affect 14% of tasks in the short term (model α), 30% in the medium term (model β), and 50% in the long term (model γ). So, eventually, half of the median occupation’s tasks could be done in less than half the time.

But that’s only for the median occupation. In Fig 4, the authors translate all occupation-level data to workers and plot exposure for the U.S. economy.

Fig 4: Exposure of workers to automation from the GPTs are GPTs paper

And here, we find our headline figures, marked by two yellow circles. The one on the left shows that 80% of workers (y-axis) have at least 10% exposed tasks (x-axis). The one on the right indicates that 19% have a minimum of 50% exposed tasks. And that’s according to the medium-term model β, as assessed by human annotators.

When using the short-term model with only direct exposure instead, the proportion of workers with at least 10% of their tasks exposed to LLMs nearly halves to 42%, and only 5% have half of their duties affected. But in the long term, with the full benefits of indirect exposure, the share with half of their tasks exposed grows to about half of the workforce.
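To make the arithmetic behind such figures concrete, here is a toy calculation (with invented employment numbers) of the share of workers whose occupations clear a given exposure threshold:

```python
# Each tuple is (employment, occupation exposure score) -- purely illustrative values.
occupations = [
    (1200, 0.05),
    (800, 0.20),
    (650, 0.55),
    (400, 0.80),
]

def share_of_workers_at_least(occupations, threshold):
    """Employment-weighted share of workers with exposure >= threshold."""
    total = sum(emp for emp, _ in occupations)
    exposed = sum(emp for emp, score in occupations if score >= threshold)
    return exposed / total

print(f"{share_of_workers_at_least(occupations, 0.10):.0%} of workers have >=10% of tasks exposed")
print(f"{share_of_workers_at_least(occupations, 0.50):.0%} of workers have >=50% of tasks exposed")
```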

So, in the short term, less than half of the workforce can benefit from small LLM-driven efficiency improvements. In the longer term, the extent of disruption will depend on how quickly large organisations develop and adopt new software that leverages GPTs.

AI does not eliminate jobs outright, but it can change the nature of work. Some occupations are more vulnerable to GPTs than others. Let’s examine which fields are most likely to see an impact.

Who’s in trouble

Fig 5 lists the five most exposed occupations in each model, with those annotated by humans on the left and GPT-4 on the right.

Fig 5: Exposed occupations as shown in the GPTs are GPTs paper. Human-annotated on the left and GPT-4 annotated on the right

Humans and GPT-4 agree that mathematicians are most exposed, but otherwise, the two diverge. The human-annotated list on the left has more creative jobs. In contrast, the GPT-4 list on the right has more technical and analytical roles. Still, all of the occupations listed rely on language or programming skills.

In the past, new general-purpose technologies displaced blue-collar workers. Yet the occupations most affected by GPTs require a lot of education and training.

The authors used the O*NET database to group occupations into five “Job Zones” based on education requirements, experience, and on-the-job training needs. Job Zone 1 jobs require a high school diploma and less than three months of preparation, while Job Zone 5 jobs require a master’s degree or higher. Fig 6 shows that the level of exposure is highest for those holding Bachelor’s and Master’s degrees.
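A sketch of how that comparison can be built, using a hypothetical table of occupations with their Job Zone, employment and exposure score (the values are illustrative, not the paper’s data):

```python
from collections import defaultdict

# (job_zone, employment, exposure score) -- illustrative values only.
occupations = [
    (1, 500, 0.10),
    (2, 800, 0.18),
    (3, 900, 0.25),
    (4, 1200, 0.45),
    (5, 700, 0.40),
]

# Employment-weighted mean exposure per Job Zone.
totals = defaultdict(lambda: [0.0, 0])
for zone, employment, exposure in occupations:
    totals[zone][0] += exposure * employment
    totals[zone][1] += employment

for zone in sorted(totals):
    weighted_sum, employment = totals[zone]
    print(f"Job Zone {zone}: mean exposure {weighted_sum / employment:.2f}")
```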

Fig 6: Job Zone exposure as shown in the GPTs are GPTs paper

That’s not to say we should stop going to university. LLMs do not affect all skills equally, so let’s see which ones are future-proof.

Future-proof skills

Headlines report that science and critical thinking skills are safe, while programming and writing are vulnerable. Fig 7 shows the regression analysis that supports those statements.

Fig 7: OLS regression results of exposure measures on O*NET skills from the GPTs are GPTs paper
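The regression itself is straightforward to reproduce in outline. A hedged sketch using pandas and statsmodels, where the file name, column names and the short skill list are placeholders rather than the paper’s actual variables:

```python
import pandas as pd
import statsmodels.api as sm

# One row per occupation: an exposure score plus O*NET skill importance scores.
df = pd.read_csv("occupation_skills.csv")  # hypothetical file

skills = ["science", "critical_thinking", "programming", "writing", "active_listening"]
X = (df[skills] - df[skills].mean()) / df[skills].std()  # standardise the predictors
X = sm.add_constant(X)

model = sm.OLS(df["exposure_beta"], X).fit()
print(model.summary())  # positive coefficients indicate skills associated with higher exposure
```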

LLMs cannot replicate the deep understanding of the natural world needed for research, nor can they evaluate information and make sound judgements. So, even in the long term, it’s unlikely that GPTs will replace science and critical thinking skills.

LLMs are also unlikely to change the need for reading comprehension, learning strategies, and monitoring skills. Those require a deep understanding of human cognition, and GPTs just don’t have that.

Writing and programming, on the other hand, are different. LLMs can already generate text and code, and this ability is expected to improve further. Jobs that rely on these skills are vulnerable. So are mathematics, speaking, and active listening, according to the analysis. But I’m not convinced that’s right.

Of course, the predictions are based on less-than-perfect data and the current state of LLMs, and we should expect to see changes as both improve.

In conclusion

Goldman Sachs and OpenAI used similar methods to predict that AI will significantly affect the labour market. Behind the headlines, we see that OpenAI’s projection is contingent on large organisations developing and adopting software to leverage the capabilities of GPTs.

Without additional software, less than half of the workforce can benefit from minor LLM-driven improvements. Still, in the last month, we’ve seen a few early adopters in the corporate world. Morgan Stanley is trialling ChatGPT for their financial advisors, and Goldman Sachs uses AI to extract data from documents and help engineers with code. It’s early days, but a shift has begun.

Jobs that rely on writing or programming skills are most likely to feel the effects of LLMs. This seems reasonable for technical writing, where LLMs can generate accurate content quickly and easily. But for creative writing, I’m not convinced LLMs can match the originality of humans. The same applies to programming, where I anticipate the nature of the work will change and lead to higher quality work at greater efficiency.

The analysis suggests that workers with Bachelor’s and Master’s degrees will be most at risk. At the same time, relatively unaffected skills include science, critical thinking, reading comprehension, learning strategies and monitoring. We develop these skills at universities – regardless of the subject studied. Skilled workers will have jobs, but their jobs may be different.

GPTs have the potential to change the nature of work significantly, and we must prepare for the future. OpenAI makes a valuable contribution with this paper. We need to understand which skills and what kinds of workers we will need in the future, so that governments can stimulate the development of relevant pathways. Researchers need to keep improving the data and methodology. Ultimately, the foundations need to be solid if the analysis is to give us the right answers.

And, finally, are GPTs really GPTs? General-purpose technologies must meet three criteria: improvement over time, pervasiveness throughout the economy, and the ability to spawn complementary innovations. It seems all three apply, and we are on the cusp of a new era.

For more new developments in AI, click here.

