Skye | A programming language bridging theory and practice for scientific data curation

Summary
Science is increasingly data-driven. Scientific research funders now routinely mandate open publication of publicly-funded research data. Safely reusing such data currently requires labour-intensive curation. Provenance recording the history and derivation of the data is critical to reaping the benefits and avoiding the pitfalls of data sharing. There are hundreds of curated scientific databases in biomedicine that need fine-grained provenance; one important example is GtoPdb, a pharmacological database developed by colleagues in Edinburgh.

Currently there are no reusable methodologies or practical tools that support provenance for curated databases, forcing each project to start from scratch. Research on provenance for scientific databases is still at an early stage, and prototypes have so far proven challenging to deploy or evaluate in the field. Also, most techniques to date focus on provenance within a single database, but this is only part of the problem: real solutions will have to integrate database provenance with the multiple tiers of web applications, and no-one has begun to address this challenge.

I propose research on how to build support for curation into the programming language itself, building on my recent research on the Links Web programming language and on data curation. Links is a strongly-typed language that provides state-of-the-art support for language-integrated query and Web programming. I propose to build on Links and other recent language designs for heterogeneous meta-programming to develop a new language, called Skye, that can express modular, reusable curation and provenance techniques. To keep focus on the real needs of scientific databases, Skye will be evaluated in the context of GtoPdb and other scientific database projects. Bridging the gap between curation research and the practices of scientific database curators will catalyse a virtuous cycle that will increase the pace of breakthrough results from data-driven science.
Unfold all
/
Fold all
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/682315
Start date: 01-09-2016
End date: 28-02-2023
Total budget - Public funding: 1 995 181,00 Euro - 1 995 181,00 Euro
Cordis data

Original description

Science is increasingly data-driven. Scientific research funders now routinely mandate open publication of publicly-funded research data. Safely reusing such data currently requires labour-intensive curation. Provenance recording the history and derivation of the data is critical to reaping the benefits and avoiding the pitfalls of data sharing. There are hundreds of curated scientific databases in biomedicine that need fine-grained provenance; one important example is GtoPdb, a pharmacological database developed by colleagues in Edinburgh.

Currently there are no reusable methodologies or practical tools that support provenance for curated databases, forcing each project to start from scratch. Research on provenance for scientific databases is still at an early stage, and prototypes have so far proven challenging to deploy or evaluate in the field. Also, most techniques to date focus on provenance within a single database, but this is only part of the problem: real solutions will have to integrate database provenance with the multiple tiers of web applications, and no-one has begun to address this challenge.

I propose research on how to build support for curation into the programming language itself, building on my recent research on the Links Web programming language and on data curation. Links is a strongly-typed language that provides state-of-the-art support for language-integrated query and Web programming. I propose to build on Links and other recent language designs for heterogeneous meta-programming to develop a new language, called Skye, that can express modular, reusable curation and provenance techniques. To keep focus on the real needs of scientific databases, Skye will be evaluated in the context of GtoPdb and other scientific database projects. Bridging the gap between curation research and the practices of scientific database curators will catalyse a virtuous cycle that will increase the pace of breakthrough results from data-driven science.

Status

CLOSED

Call topic

ERC-CoG-2015

Update Date

27-04-2024
Images
No images available.
Geographical location(s)
Structured mapping
Unfold all
/
Fold all
Horizon 2020
H2020-EU.1. EXCELLENT SCIENCE
H2020-EU.1.1. EXCELLENT SCIENCE - European Research Council (ERC)
ERC-2015
ERC-2015-CoG
ERC-CoG-2015 ERC Consolidator Grant