Introduction
Motivate language studies.
1h20 introductory course
Why study languages?
Kate Jeffery is a professor of neuroscience at UCL, London, and scientific lead at Extinction Rebellion. Her inspiring talk on the psychology of climate inaction shows that language, as a way to communicate and collaborate with each other, has a key role to play in the way we understand the past, envision the future, and deal with the present. With language, we have gone further than any other species on Earth: we went to the Moon. What if we all started to learn a language with empathy and used language to fix some of our biggest problems?
Language has a fundamental role in understanding the hidden forces that shape our decisions. We need to embrace language and our irrationality to imagine and co-create a better tomorrow.
Computational linguistics is an interdisciplinary field that draws on languages, psychology, social sciences, statistics, computer science, artificial intelligence and more. It has gained popularity in the last decade with the release of open source datasets, libraries, courses, etc. Models have improved in accuracy and other metrics on various benchmarks (e.g., translation). However, this increase in performance comes with a drastic increase in complexity and in the resources required (data, hardware, energy). A new paradigm in AI and computational linguistics is needed.
Why study frugal innovation?
All models are wrong, some are useful
Models in data science have drastically increased in complexity in the last 10 years, to the advantage of cloud providers like Google, Microsoft and Amazon 🌥️. This happened first in computer vision in 2012, then gradually in linguistics since 2014 with word vectors, document embeddings and attention models.
BERT, RoBERTa and CamemBERT 🧀 are models with quadratic complexity in the length of the input text. AI conferences, like NeurIPS, are dominated by players that run these models as a service. Why solve a problem in 5 minutes when you can charge more for hours? This conflict of interest may sound silly, but that’s how the field became toxic. 🤢
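To make the quadratic complexity concrete: in self-attention, every token is compared with every other token, so the number of attention scores grows with the square of the input length. Here is a minimal sketch in Python. The helper name attention_score_count is hypothetical and introduced only for illustration, and it counts a single attention head in a single layer rather than a full model.

```python
def attention_score_count(sequence_length: int) -> int:
    """Number of pairwise attention scores for one head in one layer.

    Hypothetical helper for illustration, not part of any library.
    """
    # Each of the n tokens attends to all n tokens, giving an n x n matrix.
    return sequence_length * sequence_length


for n in (128, 256, 512):
    print(n, attention_score_count(n))
```

Doubling the input length multiplies the number of scores by four, which is why long documents quickly become expensive for this family of models.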
Training a single AI model can emit as much carbon as five cars in their lifetimes (…) The most costly model, BERT, has a carbon footprint of roughly 1,400 pounds of carbon dioxide equivalent, close to a round-trip trans-America flight for one person. MIT Technology Review, 2019.
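In November 2022, OpenAI, a company co-founded by Elon Musk, released ChatGPT, built on the GPT-3 family of models. GPT-3 has 175 billion parameters, roughly 500 times more than the 340 million of the largest BERT model. It’s an ego thing: who has the biggest neural network? BERT was a bazooka. OpenAI released a tank.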
Decision-making systems lack diversity. There is no such thing as a universal language model trained on English by French engineers. Conflicts of interest set us further apart from our common goals, like building an inclusive society or a low carbon economy.
In addition, the way AI is done at Google, Facebook, Microsoft or Amazon is not appropriate for many entrepreneurs or researchers working on new problems with little to no data. Relying on cheap labour and moderators to supervise machine learning models is unethical.
We are at a crossroads in the way AI, NLP and computational linguistics are taught. While big players will continue building more complex models, we will focus on building simple, intelligible, useful models first and attempt to democratize access to computational linguistics to empower creators, entrepreneurs and researchers. We will lay the scientific foundations for computational linguistics, and will not explore Artificial General Intelligence or Large Language Models. By reversing the trend set by big players, frugal innovation can get us closer to building an inclusive society and a low carbon economy 🦓. This course on frugal innovation and computational linguistics is an open source, interdisciplinary alternative for people interested in addressing societal and environmental challenges with language learning and development practitioners. The course will explore different use cases and tested models to empower creators through illustrated examples.
Applications
Here are just a few ideas for how you can apply what you will learn in this course:
- Help students learn languages with gamified applications like Duolingo.
- Support NGOs defending human rights by quantifying and monitoring diversity & inclusion indicators.
- Recommend similar articles or different points of view, for example in healthcare or jurisprudence.
- Generate seasonal vegetarian recipes, music and art.
- Counter fake news and hate speech.
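As an illustration of how frugal such an application can be, recommending a similar article can be sketched with word counts and cosine similarity, using only the Python standard library. This is a minimal sketch under stated assumptions, not the method used in the course: the function names and the three sample articles are made up for the example, and tokenization is a plain whitespace split.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count whitespace-separated words."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, articles: list[str]) -> str:
    """Return the article whose word counts are closest to the query."""
    query_counts = bag_of_words(query)
    return max(
        articles,
        key=lambda article: cosine_similarity(query_counts, bag_of_words(article)),
    )


# Made-up sample articles, for illustration only.
articles = [
    "seasonal vegetarian recipes with squash and leeks",
    "court ruling on data protection and privacy",
    "new treatment guidelines for diabetes patients",
]
print(most_similar("privacy ruling from the court", articles))
```

The model has no parameters to train and every score can be traced back to shared words, which makes it a reasonable baseline before reaching for anything heavier.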
Quiz
How many languages are spoken in the world today?
More than 7000 languages are spoken today, but just 23 languages account for more than half the world’s population. Data science, NLP and AI research is mostly done in English, introducing a bias in the way we approach computational linguistics.
True or false, BERT has a carbon footprint close to a round-trip trans-America flight for one person?
True, according to MIT Technology Review, 2019.
References
Emma Strubell, Ananya Ganesh and Andrew McCallum. Energy and policy considerations for deep learning in NLP. arXiv preprint arXiv:1906.02243 (2019). Published in the Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL). Florence, Italy. July 2019.
Karen Hao. Training a single AI model can emit as much carbon as five cars in their lifetimes. MIT Technology Review. June 6, 2019.