1 - Debias language
Use language for social innovation.
1h20 hands on
Gender analysis of 2 million streets in France and social regressions
Among the boulevards in France listed on data.gouv.fr that bear the name of a celebrity, more than 93% bear the name of a man (Victor Hugo, De Gaulle, Leclerc, Foch, etc.). By comparison, 60% of gardens bear the name of a woman, some of which are also the names of trees or flowers, such as Rose, Magnolia or Capucine. Previous studies quantified the representation of women in the naming of public spaces in different cities or sectors, at different dates and by different methods. We propose indicators derived from the National Address Database to estimate the rate of representation of women in the naming of public spaces, locally and at the scale of towns, departments and regions in France (over 2 million roads in mainland France and overseas). We find a rate of feminization of roads and public spaces close to 12% in Paris and in cities like Nantes and Montpellier. We analyze the correlations with certain professions and sectors, and compare our results with public Wikipedia pages. Our analysis suggests that the imagery associated with certain professions (doctor, professor, captain, colonel) and sectors in France (mathematics, Fields medal, computer science, Turing prize) induces a gender bias in our social representations and cultivates stereotypes, amplified by the LLaMA artificial intelligence models from Facebook AI Research (FAIR) Paris, leading to forms of social regression. We make our implementation freely available so that the evolution of these biases can be followed over time.
Words influence our representations from an early age and shape our collective imagination. Highly symbolic, the naming of streets and public spaces is an opportunity to pay tribute to famous people, especially women. Since 2014, the proportion of Parisian streets bearing the name of a woman has doubled, reaching 12% - Paris, 2021.
The National Address Database is one of the nine databases of the public reference data service. It is the only address database officially recognized by the administration and, as such, is placed under the responsibility of the Prime Minister. It is built primarily by the municipalities and is accessible as files and through an API. The dataset represents more than 2 million roads in France, of which 10% contain a gendered first name from the list of names and genders of data.gouv.fr. Figure 1 shows, locally, the Montpellier streets containing a gendered name. In the rest of this article, we propose a method to quantify gender biases in the National Address Database. We compare our results to public Wikipedia pages and discuss how mathematics, statistics, and the development of competitive, foundational, state-of-the-art artificial intelligences in Paris [4] can lead to social regressions in France.
We annotate the street names of the National Address Database with a label (‘F’, ‘H’ or ‘other’), based on the first names found in the dataset of names and genders of data.gouv.fr. We preprocess the street names (lowercase letters, no numbers or punctuation), then iterate over each word of a street name to extract first names and their gender. For example, “rue Sainte Anne” contains the first name “Anne”, so we annotate it with the label “F”. The road “av. Paul Valéry” contains “Paul”, so we annotate it with the label “H”. Otherwise, we return the label ‘other’. The vocabulary used and the Python code are freely available on Framagit under a CC-BY license at https://framagit.org/MichelDeudon/nlp201-street-names-gender-analysis.
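The annotation step described above can be sketched as follows. This is a minimal illustration, not the code from the Framagit repository: the tiny `FIRST_NAME_GENDER` lexicon and the function names `preprocess` and `annotate` are hypothetical, and the rule of returning the label of the first matching word is an assumption about how streets containing several first names are handled.

```python
import re

# Hypothetical toy lexicon. The article uses the list of first names and
# genders published on data.gouv.fr instead.
FIRST_NAME_GENDER = {
    "anne": "F", "marie": "F", "jeanne": "F",
    "paul": "H", "victor": "H", "albert": "H",
}

def preprocess(street_name):
    """Lowercase the name, drop digits and punctuation, split into words."""
    lowered = street_name.lower()
    # Replace punctuation, digits and underscores with spaces (accents are kept).
    cleaned = re.sub(r"[^\w\s]|[\d_]", " ", lowered)
    return cleaned.split()

def annotate(street_name, lexicon=FIRST_NAME_GENDER):
    """Return 'F', 'H' or 'other' for a street name.

    Assumption: the first word found in the lexicon decides the label.
    """
    for word in preprocess(street_name):
        if word in lexicon:
            return lexicon[word]
    return "other"

if __name__ == "__main__":
    for name in ["rue Sainte Anne", "av. Paul Valéry", "12 rue de la Paix"]:
        print(name, "->", annotate(name))
```

A word-level lookup of this kind cannot tell a first name from a homonymous common noun, so any real lexicon would need the same care as the vocabulary published by the author.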
Note: All results were calculated in May 2023 and are subject to change over time.
Data aggregated by municipality in mainland France is illustrated in Figure 2. For each municipality, we calculate a label between 1 and 2, which corresponds to the ratio of streets bearing a female/male first name. We report in the table in Appendix A this indicator calculated at the national level and for the 10 largest French cities, and compare our results with other indicators such as the proportion of streets containing the word “Sainte” versus “Saint”, or the proportion of female names among the $k$ most popular names. Our results highlight how disproportionately the naming of public spaces is gendered, both globally and with local disparities.
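One possible way to aggregate the labels by municipality is sketched below. It assumes that the indicator is the share of ‘F’ labels among gendered labels (‘F’ or ‘H’), which is one reading of the ratio described above and not necessarily the exact formula used for Figure 2. The function names and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

def feminization_rate(labels):
    """Share of 'F' among gendered labels ('F' or 'H'); None if there are none."""
    counts = Counter(labels)
    gendered = counts["F"] + counts["H"]
    if gendered == 0:
        return None
    return counts["F"] / gendered

def rate_by_municipality(records):
    """records: iterable of (municipality, label) pairs."""
    grouped = defaultdict(list)
    for municipality, label in records:
        grouped[municipality].append(label)
    return {name: feminization_rate(labels) for name, labels in grouped.items()}

if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        ("Ville A", "F"), ("Ville A", "H"), ("Ville A", "H"), ("Ville A", "other"),
        ("Ville B", "other"),
    ]
    print(rate_by_municipality(sample))
```

Returning `None` for municipalities without any gendered street name avoids a division by zero and keeps such municipalities distinguishable from those with a rate of 0.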
Among the 50 most common first names in street names, 4 are feminine: Marie, Blanche, Jeanne and Anne. The distribution of first names is skewed towards a male representation of history and its heroes, as illustrated in Figure 3.
The representation of women varies from one type of road to another. We observe that gardens are associated with deities of Nature, names of flowers and female first names, while avenues and boulevards are predominantly masculine and associated with warlords, tanks or weapons of war. Tunnels are exclusively male. We report in Table 1 the F/M representation rates associated with four professions.
|Profession||Number of streets in France||Label H||Label F||F/M representativeness (in %)|
Table 1. F/M representativeness of street names in France for different professions in 2023 (last column). These values are significantly lower than the national average of 12%. With these gender biases and a naive statistical model, one must generate, on average, more than 25 names of doctors (Albert Tomey, Paul Pezet, Albert Schweitzer, Robert Koch…) to randomly obtain a female name, and more than 60 names of teachers. Examples of these biases can be found by typing “rue du docteur…” in a search engine or on Google Maps, or by walking through French streets, curious, with your head up.
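The figures of 25 and 60 names quoted above can be related to a share of female names with a short calculation. The sketch assumes that the "naive statistical model" draws names independently, each being female with probability p, so that the expected number of draws before the first female name is 1/p (the mean of a geometric distribution). The function names are hypothetical.

```python
def expected_draws(p_female):
    """Expected number of independent draws needed to get a first female name."""
    if not 0 < p_female <= 1:
        raise ValueError("p_female must be in (0, 1]")
    return 1.0 / p_female

def share_for_draws(n_draws):
    """Share of female names for which n_draws names are needed on average."""
    return 1.0 / n_draws

if __name__ == "__main__":
    print(expected_draws(0.12))   # national average of 12%
    print(share_for_draws(25))    # doctors: more than 25 draws means a share below this
    print(share_for_draws(60))    # teachers: more than 60 draws means a share below this
```

Under this assumption, needing more than 25 draws corresponds to a share below 4%, and more than 60 draws to a share below about 1.7%, compared with roughly 8 draws at the national average of 12%.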
From the National Address Database to Wikipedia
The names of public roads and spaces in France and on data.gouv.fr quantify centuries of gender stereotypes, much as word embeddings learned from Wikipedia do. In 2023, Facebook AI Research (FAIR) Paris published statistical models trained on Wikipedia, called LLaMA, partly funded through the CIFRE scheme. These models were developed during the winter of 2022/23 by 14 authors, 13 of them men, including 3 normaliens and 7 polytechniciens. The models, dangerous (sexist, racist, generating fake news) according to their authors, were leaked between February 24 and March 7, 2023, in the context of a health, social and ecological crisis.
To the question “Who are the 5 people you would like to meet?”, the FAIR Paris LLaMAs answer with 5 male personalities from the Western world: Albert Einstein, Leonardo da Vinci, Socrates, William Shakespeare and Abraham Lincoln. This is partly explained by Zipf’s law applied to Wikipedia and by the risks inherent in training AI models on Wikipedia and social media, known since 2009 and captured by the famous example “the doctor is to the nurse what the man is to the woman”. Biases come from humans before they come from algorithms or datasets. Wikipedia is governed by a bureaucracy of peers, an unrepresentative, non-inclusive population. These human biases come from a lack of diversity, equity and inclusion and from the privatization of AI research, which reinforce social inequalities. This lack of diversity can lead algorithms to reproduce biases - Villani, 2018.
In 2023, the same AI models are used as in 2017, with more parameters and more energy, at a higher cost. In 5-6 years, the size of the models has been multiplied by 1000, from 65 million parameters to 65 billion, and energy consumption has grown from 27 kWh per experiment to 499 MWh. The same problems remain (sexism, racism, fake news) and get worse with the complexity of the models, according to the principle of overfitting in statistics: “We observe that toxicity increases with model size”. The authors nevertheless conclude: “we plan to release larger models, trained on larger training corpora in the future”. Can we really consider this an innovation [16, 17]? AI experts call for “AI to rescue AI” and “trusted AI” (reminiscent of “trusted men”, or V-Mann). Human biases, conflicts of interest and disinformation in the spheres of power are probably causes of, rather than mere correlates of, the social regressions observed in France and in linguistics: the publication and leak of the LLaMA models preceded the explosion of misinformation and the trivialization of violence since March 2023.
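The orders of magnitude quoted above can be checked with a few lines of arithmetic. The sketch only uses the figures given in the text. The variable names are hypothetical, and the two energy figures come from different sources that may not measure the same thing (one is "per experiment"), so the energy ratio is indicative at best.

```python
# Figures quoted in the text.
params_2017 = 65e6           # 65 million parameters
params_2023 = 65e9           # 65 billion parameters
energy_2017_kwh = 27         # 27 kWh per experiment
energy_2023_kwh = 499 * 1000 # 499 MWh expressed in kWh

param_ratio = params_2023 / params_2017
energy_ratio = energy_2023_kwh / energy_2017_kwh

if __name__ == "__main__":
    print(f"parameters multiplied by {param_ratio:.0f}")
    print(f"energy multiplied by about {energy_ratio:.0f}")
```

With these figures, the parameter count grows by a factor of 1000 while the quoted energy grows by a factor of roughly 18,500.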
|French Turing award||0/2|
|French Fields medal||0/13|
|Deep Learning godfathers||0/3|
|FAIR Paris LLaMA authors||1/13|
|Personalities LLaMA would like to meet||0/5|
Gender bias in the development of heavy artillery in AI (models with billions of parameters for billions of human beings).
Imaginations and ideologies
Social regressions and eugenics
The term regression (in statistics), ubiquitous in artificial intelligence, comes from the regression towards the mean described by the British scientist Francis Galton in 1886. Galton also coined the term eugenics, first used in 1883 in the context of his studies on the transmission of hereditary characteristics such as the height of individuals, studies that did not take the environment or way of life into account. Galton’s eugenics was born from a confusion between correlation and causality, and proposed to produce a superior human race by artificial selection, leading in the 20th century to policies for eradicating traits deemed disabling, forced sterilization programs, a tightening of the legal framework for marriage, and immigration restriction measures.
AI, a colonial fantasy of white men, of the West?
In science fiction movies (2001: A Space Odyssey, Her, etc.), AI is represented by a humanoid or by a voice, robotic or feminine. Siri, Alexa and other smartphone voice assistants do have the names and voices of female assistants, while the Wikipedia page on the beauty of mathematics refers to 26 men, including Mandelbrot, Russell, Erdős, Beethoven, Dirac, Euler, Harris, Leibniz, Pythagoras, Gauss, Andrew Wiles, Robert Langlands, Richard Borcherds, Alexandre Grothendieck, Claude Chevalley, Georges Théodule Guilbaud, Hermann Weyl, Bourbaki, Jean Dieudonné, Hermann Weyl, Plato, Aristotle, Galileo, Alain Badiou, Kepler, Watson, and no women. The page mentions “deep” results, reminiscent of “deep” learning and of the three 2019 Turing Award winners, the godfathers of Deep Learning. The page cites what “makes people hard”. The legend says there are no Nobel Prizes in Mathematics and Computer Science for a reason. Simply under- or over-sampling names or Wikipedia pages can debias statistical language models and computational linguistics, avoiding social regressions without resorting to heavy artillery. We must be wary of experts who are not direct victims of the weapons produced and whose conflicts of interest can cause certain truths to be omitted and divert attention. We hypothesize that some experts calling for more AI to the rescue are acting in good faith, but this raises the question of the place of training on the social and ecological impacts of digital technology in French grandes écoles. It also raises the question of the place of statistical regressions and cognitive biases in efforts to counter the decline of scientific culture in our schools, within the state and in our public policies: from 2019 to today, the number of final-year students with more than 6 hours of mathematics per week went from 200,000, including 96,000 girls, to 100,000, including 33,000 girls.
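The under/over-sampling idea mentioned in the paragraph above can be sketched as follows. This is a minimal illustration of random over-sampling of the under-represented groups, not the author's procedure: the function name `oversample_minority`, the `label_of` callback and the fixed seed are hypothetical choices.

```python
import random

def oversample_minority(items, label_of, seed=0):
    """Duplicate randomly chosen items of smaller groups until all groups
    have as many items as the largest one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    groups = {}
    for item in items:
        groups.setdefault(label_of(item), []).append(item)
    if not groups:
        return []
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

if __name__ == "__main__":
    # Made-up (name, label) pairs, for illustration only.
    names = [("Marie", "F"), ("Paul", "H"), ("Victor", "H"), ("Albert", "H")]
    balanced = oversample_minority(names, label_of=lambda pair: pair[1])
    print(balanced)
```

Over-sampling keeps every original item but repeats some of them; under-sampling would instead discard items from the larger groups, at the cost of losing data.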
Finally, the name given to the Bronner commission, Enlightenment in the Digital Age, will perhaps suffice for some to make the link between forms of modern slavery, colonial ideology and the disinformation that characterizes the sexist and racist AI models of FAIR Paris, partly funded by the Ministry of Higher Education and Research through the CIFRE thesis scheme.
Ecology, feminism and ecolinguistics
In the words of Christine Lagarde on March 7, 2023, if the three 2019 Turing Award winners were godmothers, rather than godfathers, of AI, maybe there would be fewer crises? Maryam Mirzakhani, mathematician and first woman to win the Fields Medal (2014), was born in Tehran, Iran. The city of Montpellier has paid tribute to her since 2020 (see Appendix B). Marie Curie, born in Warsaw, Poland, is the only person to have won Nobel Prizes in two separate scientific disciplines, physics and chemistry. As Paul Valéry said, there are two possible visions of the world: the one that divides and the one that unites. In an interview for BLAST in May 2023, Alain Damasio explains how his vision of science fiction has evolved over time and calls for new imaginaries - in solidarity, social, ecological - and frugal innovations.
Public roads and spaces in France, like certain public Wikipedia pages, present gender biases that stem from humans. GAFAM’s AI models trained on Wikipedia, like FAIR Paris’ state-of-the-art foundational LLaMAs, cultivate and amplify these stereotypes, using ever more costly energy as a competitive advantage while benefiting from public funding. In this context, the voices and representations of women in public spaces are more than symbolic. Among the directions for future research, it could be interesting to quantify and fight other forms of discrimination (religion, color, sexual orientation, age, nationality, disability, physical appearance, socio-economic status), or to analyze other countries and distributions. Authors, researchers, scientists and artists offer alternatives, new imaginaries, stories and social representations to fight against patriarchy and surveillance capitalism. In these forms of modern resistance, libraries are to the maquis what books and culture are to weapons: a means of escape.
- Blast. Comment vivre et lutter face au capitalisme de surveillance? 05/2023.
- Assemblée Nationale. Contrer le recul de la culture scientifique à l’école, au sein de l’État et dans nos politiques publiques. 2ème séance de débat. 04/2023.
- Mediapart. Écrans et Santé : Il est urgent d’agir! 03/2023.
- Touvron, H., Lavril, T. et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971. 02/2023.
- Data.gouv.fr. Base Adresse Nationale. 2022.
- Ville de Paris. Féminisons les noms des rues! 2021.
- Loose, F., Belghiti-Mahut, S., et Lafont, A.L. “L’informatique, c’est pas pour les filles!”: Impacts du stéréotype de genre sur celles qui choisissent des études dans ce secteur. 32ème Congrès de l’AGRH. 2021.
- Macron, E., Bronner, G. et al. Les Lumières à l’ère numérique. 2020.
- Pellerin, P. Les Lumières, l’esclavage et l’idéologie coloniale, XVIIIe-XXe siècles. Garnier, collection Rencontres XVIIIe siècle, Paris, 560 p. 2020.
- Grenard, F. La traque des Résistants. Éditions Tallandier. 2019.
- Strubell, E., Ganesh, A., & McCallum, A. Energy and Policy Considerations for Deep Learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019.
- Villani, C., Bonnet, Y. et al. Donner un sens à l’intelligence artificielle: pour une stratégie nationale et européenne. Conseil national du numérique. 2018.
- Garg, N., Schiebinger, L. et al. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences. 2018.
- Vaswani, Ashish, et al. Attention is all you need. Advances in neural information processing systems 30. 2017.
- O’Neil, C. Weapons of Math Destruction. 2016.
- Belghiti-Mahut, S, et al. Gender gap in innovation: a confused link? Journal of Innovation Economics & Management 1: 159-177. 2016.
- Belghiti-Mahut, S., et al. Genre et innovateur frugal: 4 cas de femmes innovatrices. Innovations 3: 69-93. 2016.
- Data.gouv.fr. Liste de prénoms et genres. 2014.
- Eckert, S., & Steiner, L. (Re) triggering backlash: Responses to news about Wikipedia’s gender gap. Journal of Communication Inquiry. 2013.
- Aaltonen, A., & Lanzara, G. F. Unpacking Wikipedia governance: the emergence of a bureaucracy of peers. In 3rd Latin American and European Meeting on Organization Studies (LAEMOS). 2010.
- Carstensen, T. Gender. Trouble in Web 2.0. Gender perspectives on social network sites, wikis and weblogs. International Journal of Gender, Science and Technology. 2009.
- Bellman, R. Curse of dimensionality. Adaptive control processes: a guided tour. Princeton, NJ 3.2. 1961.
- Zipf, G.K. Human Behaviour and the Principle of Least Effort: An Introduction to Human Ecology. AW. 1949.
- Galton, F. Regression towards mediocrity in hereditary stature. The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246-263. 1886.
Appendix A. Rate of feminization of roads and public spaces
Appendix B. Local heroines, in Occitania
|Tribute in Montpellier||Birth||Imaginary|
|Agnès Mac Laren||Edinburgh, 1837||First woman to graduate from medical school.|
|Albertine Sarrazin||Algiers, 1937||French writer, died at 29 in Montpellier.|
|Anne Bragance||Casablanca, 1945||French writer, Alice-Louis-Barthou prize in 1978.|
|Anne Marie de Backer||Contres, 1908||Poetess and translator, died in Montpellier in 1987.|
|Catherine Booth||Mumford, 1829||Co-founded the Salvation Army.|
|Clara d’Anduza||Gard, 1200||Occitan-speaking Trobairitz.|
|Clara Haskil||Bucharest, 1895||Romanian and Swiss pianist.|
|Chantal Mauduit||Paris, 1964||Alpinist.|
|Clara Zetkin||Wiederau, 1857||Teacher, journalist, figure of socialist feminism.|
|Dora Maar||Paris, 1907||Photographer and artist.|
|Elena Bonner||Mary, 1923||Pediatrician, human rights activist.|
|Elyse Deroche||Paris, 1882||Actress and aviator.|
|Françoise de Cezelli||Montpellier, 1558||Heroine during the wars of religion.|
|Frida Kahlo||Coyoacán, 1907||Mexican painter.|
|Gabriela Mistral||Vicuna, 1889||Chilean poetess.|
|Germaine Bousquet||Castres, 1920||Dean of Rieumes.|
|Helene de Savoie||Cetinje, 1873||Died in Montpellier in 1952.|
|Janine Teisson||Toulon, 1948||Novelist.|
|Jeanne Demessieux||Montpellier, 1921||Organist, pianist, improviser, teacher and composer.|
|Jeanne Dieulafoy||Toulouse, 1851||Archaeologist.|
|Jeanne Galzy||Montpellier, 1883||Teacher (agrégée), writer, winner of the Prix Femina - Vie Heureuse.|
|Joelle Wintrebert||Toulon, 1949||Writer.|
|Judith Resnik||Akron, 1949||American astronaut.|
|Juliette Gréco||Montpellier, 1927||Singer and actress.|
|Juliette Cauquil||Suc-et-Sentenac, 1914||Resistant.|
|Louise Guiraud||Montpellier, 1860||Historian.|
|Lucie Février Pascal||Hérault, 1911||Heroine, recognized as “Righteous Among the Nations”.|
|Madeleine Roch||Mureaux, 1883||French actress and tragedian.|
|Malika Mokeddem||Kénadsa, 1949||Writer.|
|Marcelle Huc||Montady, 1901||Teacher, trade union and political activist.|
|Maria Blanchard||Santander, 1881||Artist and painter.|
|Maria Casarès||A Coruña, 1922||Actress and tragedian.|
|Marie Agnès Péron||Calais, XX||Sailor disappeared at sea in 1991.|
|Marie Caizergues||XX, 1797||Benefactress.|
|Marie Reynès Montlaur||Montpellier, 1866||Writer, first woman at the Montpellier Academy.|
|Marie Sagnie||Saint-Pons-de-M, 1898||Mathematics and physics-chemistry teacher.|
|Marie Thérèse Barbé||Limoges, 1913||Writer.|
|Maryam Mirzakhani||Tehran, 1977||Mathematician, professor and Fields Medalist.|
|Paulette Hauchard||Fécamp, 1932||President of the Secular Association.|
|Régine Detambel||Saint-Avold, 1963||Writer.|
|Rosa Luxemburg||Zamosc, 1871||German communist activist and revolutionary.|
|Ruth Bader Ginsburg||New York, 1933||American lawyer, jurist, scholar and judge.|
|Suzanne Ballivet||Paris, 1904||Painter and illustrator.|
|Suzanne Bernard||Troyes, 1893||Aviator.|
|Sylvie et Josephine Fabre||Grenoble, 1951||Writer and poetess, Louise-Labé Prize.|
|Yvonne le Roux||Toulon, 1882||Resistant.|
|Yvette Llere||Amélie-les-Bains, 1939||Writer.|
|Yvonne Molinier||Grand-Combe, 1924||Dedicated her life to the cause of children.|
Tributes to XX women in Montpellier, Occitanie.
See also the page Portraits of women from Montpellier.