LearningStructurE | Machine Learning and Mass Spectrometry for Structural Elucidation of Novel Toxic Chemicals

Summary
Nearly half a million known chemicals have been deemed relevant for exposure studies, and an even larger number of their transformation products are likely to co-occur in the environment. The sheer number of possible chemical structures makes it impossible to generate them all in silico, let alone synthesise and analytically confirm them, which limits the discovery of novel chemicals. Today, the structural elucidation of chemicals detected with high-resolution mass spectrometry relies on databases and machine learning models trained on the known chemical space. Both are fundamentally ill-suited to discovering novel chemical structures. As a result, only a few percent of the toxic activity of environmental samples is explained by currently known and monitored chemicals. Accessing the novel chemical space is crucial to improving our understanding of the origin, fate, and impact of these chemicals.

The aim of LearningStructurE is to turn the discovery of novel chemical structures from serendipity into routine. As a stepping stone in this pursuit, I will combine a fundamental understanding of chromatography and high-resolution mass spectrometry with machine learning to pinpoint novel toxic chemical structures from their empirical analytical information. To significantly advance the predictive power of machine learning models for empirical analytical information, I will use the candidate structures as a sample-specific training set. The improved predictive power will feed into in-silico structure generation, allowing structures to be elucidated directly from the empirical analytical information.

LearningStructurE will pave the way for exploring the unknown chemical space detected in environmental samples, thereby improving our understanding of emissions and of the chemical processes that transform emitted chemicals, and helping to close the gap between measured and explained toxicity.
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/101124488
Start date: 01-01-2024
End date: 31-12-2028
Total budget: 1 867 187,00 Euro
Public funding: 1 867 187,00 Euro
Cordis data

Status

SIGNED

Call topic

ERC-2023-COG

Update Date

12-03-2024
Geographical location(s)