Summary
Metagenome-assembled genomes (MAGs) obtained from metagenomic sequencing are of fundamental value for understanding microbes in diverse ecological niches, such as the human gut, with applications in medicine, biotechnology, and climate science. However, the quality of MAGs constructed with state-of-the-art tools is often unsatisfactory and lower than the quality the tools themselves report. The main source of error is binning, the computational step that groups sequences assembled from short sequencing reads (contigs) into species-level bins. The two chief challenges are accurately binning (1) low-abundance genomes and (2) highly conserved regions. Because reads cross-map between related genomes, contigs from conserved regions appear to have an abundance equal to the sum of the abundances of the related species or strains. Conventional binning tools all cluster contigs by their abundance profiles across samples, and this summed profile matches none of the individual genomes, so conserved regions end up forming separate bins. Moreover, most existing methods optimise quality measures (purity and completeness based on conserved marker genes) and then assess final quality with these very same measures, which leads to overly optimistic results. I aim to solve these problems by developing a binning algorithm that applies
i) linear mixture models, fitted by non-negative matrix factorization, to account for cross-mapping;
ii) Poisson statistics to accurately model low abundances; and
iii) Bayesian multinomial clustering to determine the number of bins.
Importantly, the algorithm does not rely on marker gene-based quality measures for binning.
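The first two ingredients can be illustrated with a minimal sketch. It assumes coverage is available as a contigs × samples matrix of read counts and that the number of components k is already known (choosing it, as in iii), is not shown). It uses the textbook Lee and Seung multiplicative updates for non-negative matrix factorization under a Poisson likelihood (equivalently, generalised KL divergence), a natural model for low read counts; this is a generic stand-in, not the project's actual algorithm. The function names poisson_nmf, poisson_loglik and mixture_fractions, and the toy abundances, are hypothetical.

```python
import math
import random

# Hypothetical illustration of Poisson NMF on contig coverage; not the
# project's implementation.


def poisson_nmf(V, k, iters=2000, seed=0, eps=1e-12):
    """Approximate V (contigs x samples) as W @ H with W, H >= 0.

    Each V[i][j] is treated as a Poisson count with mean (W @ H)[i][j].
    Maximising that likelihood is equivalent to minimising the generalised
    KL divergence, which the Lee and Seung multiplicative updates below do
    not increase. W holds per-contig mixture weights (contigs x k) and H
    holds latent genome abundances (k x samples).
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.5, 1.5) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0.5, 1.5) for _ in range(m)] for _ in range(k)]

    def poisson_means():
        # Current Poisson means (W @ H), recomputed from the mutated W and H.
        return [[sum(W[i][a] * H[a][j] for a in range(k)) for j in range(m)]
                for i in range(n)]

    for _ in range(iters):
        R = poisson_means()
        for a in range(k):  # update latent genome abundances H
            w_sum = sum(W[i][a] for i in range(n)) + eps
            for j in range(m):
                num = sum(W[i][a] * V[i][j] / (R[i][j] + eps) for i in range(n))
                H[a][j] *= num / w_sum
        R = poisson_means()
        for i in range(n):  # update per-contig mixture weights W
            for a in range(k):
                h_sum = sum(H[a][j] for j in range(m)) + eps
                num = sum(H[a][j] * V[i][j] / (R[i][j] + eps) for j in range(m))
                W[i][a] *= num / h_sum
    return W, H


def poisson_loglik(V, W, H):
    """Sum of log Poisson(V[i][j] | (W @ H)[i][j]) over all entries."""
    ll = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            lam = max(sum(W[i][a] * H[a][j] for a in range(len(H))), 1e-300)
            ll += v * math.log(lam) - lam - math.lgamma(v + 1)
    return ll


def mixture_fractions(W, H):
    """Share of each contig's modelled coverage attributed to each component."""
    totals = [sum(h) for h in H]
    fractions = []
    for w in W:
        contrib = [w[a] * totals[a] for a in range(len(H))]
        s = sum(contrib)
        fractions.append([c / s for c in contrib])
    return fractions


if __name__ == "__main__":
    A = [10.0, 1.0, 8.0, 1.0]  # hypothetical depth of genome A in 4 samples
    B = [1.0, 9.0, 2.0, 12.0]  # hypothetical depth of genome B
    conserved = [a + b for a, b in zip(A, B)]  # cross-mapped reads add up
    V = [A, A, B, B, conserved]  # two unique contigs per genome + one shared
    W, H = poisson_nmf(V, k=2)
    for row in mixture_fractions(W, H):
        print([round(f, 2) for f in row])
```

On this toy input the four single-genome contigs should each be dominated by one component, while the conserved contig should be split between both components, roughly in proportion to the two genomes' total abundances, instead of being forced into a bin of its own.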
By improving the binning of low-abundance and highly conserved contigs, this approach should yield more high-quality MAGs, thereby improving many downstream metagenomic analyses across all areas of microbiome research.
More information & hyperlinks
Web resources: https://cordis.europa.eu/project/id/101111457
Start date: 01-08-2023
End date: 31-07-2025
Total budget - Public funding: - 189 687,00 Euro
Cordis data
Status: SIGNED
Call topic: HORIZON-MSCA-2022-PF-01-01
Update date: 31-07-2023
Geographical location(s)