ScReeningData | Scalable Learning for Reproducibility in High-Dimensional Biomedical Signal Processing: A Robust Data Science Framework

Summary
Data science has quickly expanded the boundaries of signal processing and statistical learning beyond their accustomed domains. Powerful and complex machine learning architectures have evolved to distinguish relevant information from randomness, artifacts, and irrelevant data. However, existing learning frameworks lack computationally scalable, tractable, and robust methods for high-dimensional data. Consequently, discoveries in genomic data, for example, can be coincidental findings that happen to reach statistical significance. As long as groundbreaking advances in biotechnology are not accompanied by appropriate learning frameworks, valuable effort is spent on researching false positives. ScReeningData develops a coherent, fast, and scalable learning framework that jointly addresses three fundamental challenges: drastically reducing computational complexity, providing statistical and robustness guarantees, and quantifying reproducibility in large-scale and high-dimensional settings. The project develops an unprecedented approach that builds upon very recent work of the PI. The underlying concept is to repeat randomized controlled experiments in which computer-generated fake variables serve as negative controls and trigger an early stopping of the learning algorithms, thereby mitigating the so-called curse of dimensionality. In contrast to existing methods, the proposed methods are completely tractable and scale to ultra-high dimensions. Advanced robust learning methods that are computed ultra-fast and come with tight guarantees on the targeted rate of false positives promise enormous gains: they lead to new reproducible discoveries that can be made with high statistical power. Because the proposed learning methods are fundamental and broadly applicable, the impact of this project extends far beyond the considered biomedical signal processing use cases, benefiting all scientific domains that analyze high-dimensional data.
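The dummy-variable idea in the summary can be illustrated with a toy sketch. This is not the project's algorithm and carries none of its guarantees. It assumes a simple matching-pursuit forward selector as a stand-in for the learning algorithm, and all function names, the stopping rule (`max_dummies=1`), and the voting threshold (`0.5`) are hypothetical choices made for illustration. Each experiment appends freshly generated noise columns to the real variables, runs forward selection, and stops as soon as a noise column is picked. Real variables that are picked in most experiments are kept.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def standardize(col):
    """Center a column and scale it to unit Euclidean norm."""
    mean = sum(col) / len(col)
    centered = [x - mean for x in col]
    norm = dot(centered, centered) ** 0.5
    return [x / norm for x in centered]


def one_experiment(real_cols, y, n_dummies, max_dummies, rng):
    """Forward selection that stops once `max_dummies` dummies were picked."""
    n, p = len(y), len(real_cols)
    # Computer-generated fake variables: pure noise, unrelated to y.
    dummies = [standardize([rng.gauss(0.0, 1.0) for _ in range(n)])
               for _ in range(n_dummies)]
    candidates = real_cols + dummies
    mean_y = sum(y) / n
    residual = [v - mean_y for v in y]
    remaining = list(range(len(candidates)))
    picked_real, dummies_in = set(), 0
    while remaining:
        # Matching-pursuit step: column most correlated with the residual.
        best = max(remaining, key=lambda j: abs(dot(candidates[j], residual)))
        remaining.remove(best)
        if best >= p:
            # A dummy entered the model: count it and possibly stop early.
            dummies_in += 1
            if dummies_in >= max_dummies:
                break
        else:
            picked_real.add(best)
        coef = dot(candidates[best], residual)
        residual = [r - coef * c for r, c in zip(residual, candidates[best])]
    return picked_real


def screen_with_dummies(columns, y, n_experiments=20, max_dummies=1,
                        threshold=0.5, seed=0):
    """Repeat the experiment and keep variables picked in most runs."""
    rng = random.Random(seed)
    real_cols = [standardize(c) for c in columns]
    counts = [0] * len(real_cols)
    for _ in range(n_experiments):
        picked = one_experiment(real_cols, y, len(real_cols), max_dummies, rng)
        for j in picked:
            counts[j] += 1
    freqs = [c / n_experiments for c in counts]
    selected = [j for j, f in enumerate(freqs) if f > threshold]
    return selected, freqs


def make_toy_data(n=200, p=30, active=(0, 1, 2), seed=1):
    """Synthetic data: only the `active` columns influence y."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    y = [sum(5.0 * columns[j][i] for j in active) + rng.gauss(0.0, 1.0)
         for i in range(n)]
    return columns, y


if __name__ == "__main__":
    columns, y = make_toy_data()
    selected, freqs = screen_with_dummies(columns, y)
    print("selected variables:", selected)
```

Because each run stops at the first dummy, the selector never has to fit a full model over all candidates, which is the computational point the summary makes about early stopping. The sketch only demonstrates the mechanism on synthetic data. It makes no claim about false-positive control.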
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/101042407
Start date: 01-09-2022
End date: 31-08-2027
Total budget: 1 500 000,00 Euro
Public funding: 1 500 000,00 Euro
Cordis data

Status: SIGNED
Call topic: ERC-2021-STG
Update date: 09-02-2023
Structured mapping
Horizon Europe
  HORIZON.1 Excellent Science
    HORIZON.1.1 European Research Council (ERC)
      HORIZON.1.1.0 Cross-cutting call topics
        ERC-2021-STG ERC STARTING GRANTS
      HORIZON.1.1.1 Frontier science
        ERC-2021-STG ERC STARTING GRANTS