Despite extensive efforts to address it, the vastness of uncharacterized ‘dark matter’ microbial genetic diversity can impact short-read sequencing based metagenomic studies. Population-specific biases in genomic reference databases can further compound this problem. Leveraging advances in hybrid assembly (using short and long reads) and Hi-C technologies in a cross-sectional survey, we deeply characterized 109 gut microbiomes from three ethnicities in Singapore to comprehensively reconstruct 4497 medium and high-quality metagenome-assembled genomes, 1708 of which were missing in short-read only analysis and with >28× N50 improvement. Species-level clustering identified 70 (>10% of total) novel gut species out of 685, improved reference genomes for 363 species (53% of total), and discovered 3413 strains unique to these populations. Among the top 10 most abundant gut bacteria in our study, one of the species and >80% of strains were unrepresented in existing databases. Annotation of biosynthetic gene clusters (BGCs) uncovered more than 27,000 BGCs with a large fraction (36–88%) unrepresented in current databases, and with several unique clusters predicted to produce bacteriocins that could significantly alter microbiome community structure. These results reveal significant uncharacterized gut microbial diversity in Southeast Asian populations and highlight the utility of hybrid metagenomic references for bioprospecting and disease-focused studies.

While estimates for microbial diversity on Earth vary widely, studies suggest that there are nearly a million prokaryotic species of which only around 20,000 have been cultured [1, 2]. The use of culture-free metagenomic techniques has therefore been key to unravel this ‘dark matter’ of genetic diversity on Earth. Microbial communities in a wide range of biospheres have been explored, including terrestrial [3], aquatic [4] and extreme environments [5], as well as plant, animal and human-associated microbiomes [6]. Improvements in metagenomic assembly workflows [7, 8, 9, 10, 11] and computing resources have further enabled the assembly of these large datasets to construct metagenome-assembled genomes (MAGs) that serve to augment isolate-based reference genome databases [12, 13]. Despite this, existing databases only represent approximately 48,000 species with genome sequences, and the accuracy and completeness of short-read based MAGs is frequently lower than isolate-based references [2].

Human gut metagenomes represent an area of intense scientific interest due to their association with various cancers, metabolic, immunological and neurological disease conditions [14, 15]. Metagenome-wide association studies frequently rely on the completeness of reference genomes to correctly assign short reads to taxa, and link microbial genes and function to diseases [16]. In particular, existing studies suggest that there might be key population-specific differences in metagenomic associations with various diseases [17, 18, 19]. The availability of a large number of short-read metagenomic datasets (e.g., >20,000 for human gut in public repositories) has spurred the generation of MAG reference collections based on short-read assembly [13, 20, 21, 22]. While these studies have added an impressive collection of genomes to existing databases, it is unclear yet if they are representative of the genetic diversity seen in gut metagenomes around the world. In addition, recent advances in sequencing assays (e.g., Hi-C [23], read cloud [24]), hybrid [25] and long-read metagenomic analysis [26] have sought to address the shortcomings of short-read metagenomics, and opened the possibility that long-read based MAGs can provide near-complete genomes rivaling isolate genomes in quality. As access to genome sequencing becomes democratized and gut metagenomes are explored in understudied populations such as those in Southeast Asia, the strategy and value for establishing population-specific MAG references remains an open question.

Leveraging the availability of a healthy Singaporean adult cohort comprising three ethnicities (Chinese, Malay and Indian), two of which (Chinese and Malay) represent significant populations in Southeast Asia, we deeply characterized 109 gut metagenomes with state-of-the-art hybrid sequencing (short and long read) and Hi-C technologies (Singapore Platinum Metagenomes Project – SPMP).
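
For readers unfamiliar with the N50 statistic behind the ">28× N50 improvement" quoted in the abstract, the following is a minimal sketch of how N50 is commonly computed: the length of the contig at which the running sum of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the contig lengths are hypothetical, chosen only for illustration. They are not data from the study, and this is not necessarily how the authors computed their figure.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # N50 is the first contig length at which the cumulative sum
        # covers at least half of the total assembled bases.
        if 2 * running >= total:
            return length
    return 0


# Hypothetical contig lengths in base pairs (illustrative only).
short_read_contigs = [5_000, 4_000, 3_000, 2_000, 1_000]
hybrid_contigs = [12_000, 3_000]

# Fold improvement in contiguity between the two toy assemblies.
fold_change = n50(hybrid_contigs) / n50(short_read_contigs)
print(f"N50 fold improvement: {fold_change:.1f}x")
```

Both toy assemblies contain the same total number of bases, so the ratio reflects contiguity alone. A fold change reported between short-read and hybrid assemblies can be read in the same way.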