Energy and Policy Considerations

Spring 2019. Emma Strubell and MIT’s alert.

Energy and Policy Considerations for Deep Learning in NLP

🚨 On June 5, 2019, Emma Strubell, a PhD candidate and lead author of the paper Energy and Policy Considerations for Deep Learning in NLP, alerted the scientific and political communities to the terrible ecological impact of the deep learning models used in natural language processing (NLP) at that time. Her paper appeared at ACL 2019 🇮🇹, the largest computational linguistics conference. MIT Technology Review relayed her alert on June 6, 2019, covering the energy consumption and carbon emissions of models with millions of parameters, such as the Transformer (Google, 2017) and BERT (Google, 2018). At that time, a model such as the base Transformer had about 65 million parameters (1/1000 of the largest LLaMA, 2023).

The day after the alert from Emma Strubell and MIT Technology Review, BERT received the best long paper award at NAACL 2019 🇺🇸.

Training a single AI model can emit as much carbon as…

June 6, 2019. MIT Technology Review by Karen Hao.

MIT Technology Review, June 2019.

The artificial-intelligence industry is often compared to the oil industry: once mined and refined, data, like oil, can be a highly lucrative commodity. Now it seems the metaphor may extend even further. Like its fossil-fuel counterpart, the process of deep learning has an outsize environmental impact.

In a new paper, researchers at the University of Massachusetts, Amherst, performed a life cycle assessment for training several common large AI models. They found that the process can emit more than 626,000 pounds of carbon dioxide equivalent, nearly five times the lifetime emissions of the average American car (and that includes manufacture of the car itself). What’s more, the researchers note that the figures should only be considered as baselines. “Training a single model is the minimum amount of work you can do,” says Emma Strubell, a PhD candidate at the University of Massachusetts, Amherst, and the lead author of the paper. In practice, it’s much more likely that AI researchers would develop a new model from scratch or adapt an existing model to a new data set, either of which can require many more rounds of training and tuning.
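The pound figures are easier to compare with European statistics when expressed in metric tonnes. Here is a minimal Python sketch that assumes only the exact definition of the international pound (1 lb = 0.45359237 kg). The helper name `lbs_to_tonnes` is our own hypothetical choice.

```python
# Exact definition of the international avoirdupois pound, in kilograms.
KG_PER_LB = 0.45359237


def lbs_to_tonnes(pounds: float) -> float:
    """Convert pounds to metric tonnes (1 t = 1,000 kg)."""
    return pounds * KG_PER_LB / 1000


# Figure quoted in the paragraph above, in pounds of CO2 equivalent.
reported_lbs = 626_000
print(f"{reported_lbs:,} lb CO2e = {lbs_to_tonnes(reported_lbs):.0f} t CO2e")
```

Run as is, it reports roughly 284 tonnes of CO2 equivalent for the 626,000-pound figure.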

The carbon footprint of natural-language processing

The paper specifically examines the model training process for natural-language processing (NLP), the subfield of AI that focuses on teaching machines to handle human language. In the last two years, the NLP community has reached several noteworthy performance milestones in machine translation, sentence completion, and other standard benchmarking tasks. OpenAI’s infamous GPT-2 model, as one example, excelled at writing convincing fake news articles. But such advances have required training ever larger models on sprawling data sets of sentences scraped from the internet. The approach is computationally expensive—and highly energy intensive.

The researchers looked at four models in the field that have been responsible for the biggest leaps in performance: the Transformer, ELMo, BERT, and GPT-2. They trained each on a single GPU for up to a day to measure its power draw. They then used the number of training hours listed in the model’s original papers to calculate the total energy consumed over the complete training process. That number was converted into pounds of carbon dioxide equivalent based on the average energy mix in the US, which closely matches the energy mix used by Amazon’s AWS, the largest cloud services provider.
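The method above (measure the average power draw, scale it to the reported training time, then convert with the US energy mix) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. To our understanding, the paper uses the form kWh = PUE × hours × (CPU + DRAM + number of GPUs × GPU watts) / 1000, with a data-centre PUE of 1.58 and 0.954 lb of CO2e per kWh. Treat both constants as assumptions to check against the paper. The function names and the wattages in the example are hypothetical, not the paper's measurements.

```python
# Assumed constants; verify against Strubell et al. (2019) before reuse.
PUE = 1.58  # power usage effectiveness: data-centre overhead on top of hardware draw
LBS_CO2E_PER_KWH = 0.954  # assumed US average grid carbon intensity


def training_energy_kwh(hours: float, cpu_watts: float, dram_watts: float,
                        gpu_watts: float, num_gpus: int) -> float:
    """Total energy for one training run in kWh, including PUE overhead."""
    hardware_watts = cpu_watts + dram_watts + num_gpus * gpu_watts
    return PUE * hours * hardware_watts / 1000


def training_co2e_lbs(energy_kwh: float) -> float:
    """Convert energy to pounds of CO2 equivalent using the assumed grid mix."""
    return LBS_CO2E_PER_KWH * energy_kwh


# Hypothetical inputs: average draws measured during a short run,
# and a reported training time of 24 hours on a single GPU.
energy = training_energy_kwh(hours=24, cpu_watts=100, dram_watts=20,
                             gpu_watts=250, num_gpus=1)
print(f"{energy:.2f} kWh -> {training_co2e_lbs(energy):.2f} lb CO2e")
```

Scaling a short measurement linearly to the full training time assumes the average draw stays constant over the run, which is the same simplification the article describes.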

February 2023. Macron decorates Bezos in secret. Le Point.

The estimated costs of training a model once

In practice, models are usually trained many times during research and development. The researchers found that the computational and environmental costs of training grew proportionally to model size and then exploded when additional tuning steps were used to increase the model’s final accuracy.

Strubell and her colleagues used a model they’d produced in a previous paper as a case study. They found that the process of building and testing a final paper-worthy model required training 4,789 models over a six-month period. Converted to CO2 equivalent, it emitted more than 78,000 pounds and is likely representative of typical work in the field.
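The case study shows that the total grows with the number of runs, not only with the size of one run. In the minimal Python sketch below, the per-run average uses only the two numbers quoted above. It is a lower bound, since the total is "more than" 78,000 pounds. The tuning grid is a hypothetical example of ours, not the search Strubell and colleagues actually ran. It only illustrates that the number of runs is the product of the number of options per setting.

```python
from itertools import product

# Figures quoted above for the case study (the total is a lower bound).
TOTAL_LBS_CO2E = 78_000
MODELS_TRAINED = 4_789


def average_per_run(total: float, runs: int) -> float:
    """Average emissions per training run over a whole project."""
    return total / runs


# Hypothetical tuning grid (not from the paper): every combination of
# settings is one full training run.
learning_rates = [1e-4, 3e-4, 1e-3]
layer_counts = [2, 4, 6, 8]
dropout_rates = [0.1, 0.3]
seeds = [0, 1, 2]
grid = list(product(learning_rates, layer_counts, dropout_rates, seeds))

per_run = average_per_run(TOTAL_LBS_CO2E, MODELS_TRAINED)
print(f"average per run: {per_run:.1f} lb CO2e")
print(f"runs in this grid: {len(grid)}")
print(f"grid cost at that average: {len(grid) * per_run:,.0f} lb CO2e")
```

Adding one more option to any setting multiplies the grid, which is one way the tuning costs described above can explode.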

The significance of those figures is colossal, especially when considering the current trends in AI research. “This kind of analysis needed to be done to raise awareness about the resources being spent […] and will spark a debate,” says Carlos Gómez-Rodríguez, a computer scientist at the University of A Coruña in Spain. “What probably many of us did not comprehend is the scale of it until we saw these comparisons,” echoed Siva Reddy, a postdoc at Stanford University who was not involved in the research.

The privatization of AI research

The results underscore another growing problem in AI, too: the sheer intensity of resources now required to produce paper-worthy results has made it increasingly challenging for people working in academia to continue contributing to research. “This trend toward training huge models on tons of data is not feasible for academics — grad students especially, because we don’t have the computational resources,” says Strubell. “So there’s an issue of equitable access between researchers in academia versus researchers in industry.”

Macron’s government privatized AI research. Zuckerberg-Macron buzz at VivaTech. Les Echos, May 2018. Zuckerberg and Macron met (again) at the Élysée, five days before the second edition of ‘Tech for Good’. Huffington Post, May 2019.
April 2021. “The digital bluff is all the impacts we do not see,” with Laurie Marrauld from The Shift Project and Cédric Villani. Libération.
@Yann LeCun replies to the French mafia of AI. February 2023.
The training of our models have consumed a massive quantity of energy, responsible for the emission of carbon dioxide (section 6 Carbon footprint). We plan to release larger models trained on larger pretraining corpora in the future (Conclusion). arXiv.
Carbon footprint of training Meta’s LLaMA once. arXiv.
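The LLaMA carbon section cited above uses, as we understand it, a simpler GPU-only estimate: energy in Wh = GPU-hours × GPU power × PUE, then tonnes of CO2e = MWh × grid carbon intensity. The PUE of 1.1 and the intensity of 0.385 kg CO2e per kWh below are our reading of the values that paper assumes. Verify them before reuse. The GPU-hours and wattage in the example are hypothetical round numbers, not Meta's reported figures, and the function names are our own.

```python
# Assumed constants (our reading of the LLaMA paper; verify before reuse).
PUE = 1.1
TONNES_CO2E_PER_MWH = 0.385  # same as 0.385 kg CO2e per kWh


def gpu_energy_mwh(gpu_hours: float, gpu_watts: float) -> float:
    """Energy in MWh: GPU-hours x GPU power x PUE, converted from Wh."""
    watt_hours = gpu_hours * gpu_watts * PUE
    return watt_hours / 1_000_000


def co2e_tonnes(energy_mwh: float) -> float:
    """Tonnes of CO2 equivalent for a given energy use."""
    return energy_mwh * TONNES_CO2E_PER_MWH


# Hypothetical run: one million GPU-hours on accelerators drawing 400 W each.
energy = gpu_energy_mwh(gpu_hours=1_000_000, gpu_watts=400)
print(f"{energy:.0f} MWh -> {co2e_tonnes(energy):.1f} t CO2e")
```

Because this form counts only GPU power, it leaves out CPU and memory draw, so for the same hardware and PUE it gives a lower figure than an estimate that includes them.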