GraphInt | Principles of Graph Data Integration

Summary
This proposal tackles fundamental problems in data management, leveraging expressive, large-scale and heterogeneous graph structures to integrate both unstructured (e.g., text) and structured (e.g., relational) content. Integrating heterogeneous content has become a key hurdle in the deployment of Big Data applications, due to the meteoric rise of both machine- and user-generated data storing information in a variety of formats. Traditional integration techniques, which clean up, fuse and then map heterogeneous data onto rigid abstractions, fall short of accurately capturing the complexity and wild heterogeneity of today’s information. Having closely followed the emergence of heterogeneous information sources online, I am convinced that only an interdisciplinary approach drawing both from classical data management and from large-scale Web information processing techniques can solve the formidable data integration challenges that they pose. The project proposes an ambitious overhaul of information integration techniques that embraces the scale and heterogeneity of today’s data. I propose the use of expressive and heterogeneous graphs of entities to continuously and dynamically interrelate disparate pieces of content while capturing their idiosyncrasies. The project focuses on three core issues related to large-scale and heterogeneous information graphs: i) the effective extraction of fine-grained information from unstructured sources and its proper integration into large-scale heterogeneous and probabilistic graphs, ii) the creation of novel physical storage structures and primitives to durably and efficiently manage the profusion of data considered by such graphs using clusters of commodity machines, and iii) the development of logical data abstraction mechanisms facilitating the effective and efficient resolution of complex analytic and data integration queries on top of the physical layer.
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/683253
Start date: 01-08-2016
End date: 31-07-2021
Total budget: 1 998 339,00 Euro
Public funding: 1 998 339,00 Euro
CORDIS data

Status

CLOSED

Call topic

ERC-CoG-2015

Update Date

27-04-2024
Structured mapping
Horizon 2020
H2020-EU.1. EXCELLENT SCIENCE
H2020-EU.1.1. EXCELLENT SCIENCE - European Research Council (ERC)
ERC-2015
ERC-2015-CoG
ERC-CoG-2015 ERC Consolidator Grant