TRIFECTA: Capturing Identity, Change, and the Long Tail in Knowledge Graphs

Summary

At first blush, entities and concepts such as “Dutch East India Company” or “coffee” may seem straightforward, but in fact they are complex and multifaceted. The wealth of digital sources offers massive potential to study these notions at an unprecedented scale. However, current distant-reading technologies are not yet capable of handling this complexity.
TRIFECTA aims to create a database that describes complex entities and concepts and their contexts by combining language technology and semantic web technology to extract and relate information from different texts over time. A further key aim of TRIFECTA is to advance the state of the art in these technologies so that they can deal with change over time and with connections to many different narratives. Many language technology methods do not incorporate enough background knowledge to recognise and interpret complex entities and concepts in their historical contexts; sophisticated knowledge representation methods from the semantic web can mitigate this shortcoming. By treating these entities and concepts as rich networks (or graphs) of knowledge that can express change and relationships to other concepts in space and time, semantic databases can capture the complexity needed to make the outputs of language technology tools suitable for humanities research.
Using two use cases, I identify a set of core contentious entities and concepts in maritime and food history. Next, through a data-driven, iterative approach, I advance beyond the state of the art in natural language technology for the humanities by targeting three key aspects of recognising and modelling complex concepts: identity, change, and the long tail. I propose a novel peer-evaluation approach in which a team of humanities scholars, computational linguists, and semantic web researchers collaborate closely to create truly hybrid artificial intelligence systems that will enable humanities research to scale to big data without losing sight of contextual complexity.

More information & hyperlinks

Web resources:	https://cordis.europa.eu/project/id/101088548
Start date:	01-11-2023
End date:	31-10-2028
Total budget:	1 998 351,00 Euro
Public funding:	1 998 351,00 Euro
