Summary
AI STORIES is premised on the hypothesis that narrative archetypes fundamentally structure the output of contemporary artificial intelligence (AI). Large language models (LLMs) like GPT-4 are trained on vast quantities of text and images and generate new texts that are statistically similar to the training data. The scientific consensus acknowledges that LLMs replicate and sometimes exacerbate historical biases in their training data.
AI STORIES proposes that LLMs are also affected by a deeper bias: that of the narrative structures in the social media posts, news stories, marketing blurbs and novels the models are trained on. If this is the case, it will deeply impact how we use and apply AI, and how we think about bias and cultural diversity in AI models. Currently available LLMs are largely trained on English-language texts, with a heavy weighting towards the United States. When they generate texts in non-English languages they may succeed in producing grammatically correct texts, but if my hypothesis is correct, their deeper content will be fundamentally structured by the stories that dominate in the training data. This is a threat to cultural diversity that goes well beyond the purely linguistic.
AI STORIES applies the humanities' deep knowledge of narrative to AI research by developing and testing this hypothesis. We will apply narratology to understand the narrative structures of LLMs' training data. We will test the hypothesis by training LLMs on specific kinds of narratives, then using prompt engineering and both qualitative and computational narratological analysis to reverse engineer the structures of AI-generated output. Three comparative case studies will look specifically at Scandinavian, Australian and either Indian or Nigerian stories.
The overall objective is to develop a narratology of AI, and to leverage the findings so that policymakers, developers, educators and other stakeholders can direct the future of AI.
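The project description does not specify an implementation, but a minimal sketch of the kind of experiment outlined above might look as follows: fine-tune a small causal language model on a narrative corpus, then prompt it and inspect the structure of the generated stories. The model name ("gpt2" as a small stand-in for a production LLM) and the corpus file ("scandinavian_stories.txt") are illustrative assumptions, not the project's actual pipeline.

```python
# Illustrative sketch: fine-tune a small causal LM on a narrative corpus,
# then prompt it to study the narrative structures it reproduces.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

model_name = "gpt2"  # small stand-in for a production-scale LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical corpus: one story per line, e.g. Scandinavian folktales.
dataset = load_dataset("text", data_files={"train": "scandinavian_stories.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="narrative-lm",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized["train"],
    # mlm=False gives standard next-token (causal) language modeling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Prompt the fine-tuned model; the generated continuation is the object of
# the qualitative and computational narratological analysis.
prompt = "Once upon a time, in a small fishing village,"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```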
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/101142306
Start date: 01-08-2024
End date: 31-07-2029
Total budget: 2 500 000,00 Euro
Public funding: 2 500 000,00 Euro
Cordis data
Status: SIGNED
Call topic: ERC-2023-ADG
Update date: 22-11-2024