A gene atlas for human pathogenic fungi.
In recent years the number of fully sequenced and annotated unicellular fungi has increased dramatically. Many of these species are human pathogens, and a major motivation for sequencing them was ultimately to obtain a reliable set of protein-coding genes, together with their DNA and encoded protein sequences. Unfortunately, for several medically important fungal species and their sequenced strains, a large percentage of genes still have unsatisfactory protein-coding sequences or unsatisfactory automated predictions of their exact location on the chromosome or contig (the coordinates of the start and stop codons and of exon/intron boundaries). Consistency checks are needed. Their results may then call for re-annotation and, in some cases, for resequencing or for imputing unidentified or missing nucleotides in order to reveal the true gene structure.

Two very recent advances make this an opportune moment to close the remaining knowledge gap systematically. First, some parts of the fungal tree already have a high phylogenetic density of fully sequenced, annotated genomes, which allows genes to be aligned and missing nucleotides to be imputed with high confidence. Second, affordable Illumina next-generation sequencing (NGS), which delivers paired short reads, now offers read lengths above 100 bp, together with higher read quality (making NGS-only assembly easier and more accurate) and higher throughput at a reasonable cost. We propose to build a sustainable, expandable, human-curatable and easily modifiable prototype database system to hold fungal gene sets (alignments of orthologous or homologous coding sequences) for the nuclear and mitochondrial protein-coding genes of species sequenced by our group and by other sequencing centers.
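The consistency checks called for above can be illustrated with a minimal sketch. The function `cds_issues` below is hypothetical (it is not part of the proposed system) and assumes that a predicted coding sequence (CDS) is given as a spliced nucleotide string, that the standard nuclear genetic code applies (start codon ATG; stop codons TAA, TAG, TGA), and that unidentified nucleotides are written as N. It flags only sequence-level symptoms of a wrong gene model; it does not validate exon/intron coordinates against the genome.

```python
# Assumption: standard nuclear genetic code (NCBI translation table 1).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_issues(cds: str) -> list[str]:
    """Return consistency problems found in a predicted, spliced coding sequence.

    Hypothetical helper: checks sequence-level properties only, not exon/intron
    coordinates or splice-site signals.
    """
    seq = cds.upper()
    issues = []
    if len(seq) % 3 != 0:
        issues.append("length not a multiple of 3")
    if not seq.startswith("ATG"):
        issues.append("no ATG start codon")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        issues.append("no terminal stop codon")
    # A stop codon before the last codon suggests a frameshift or a mis-predicted intron.
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        issues.append(f"internal stop codon(s) at codon index {internal}")
    n_count = seq.count("N")
    if n_count:
        issues.append(f"{n_count} unidentified nucleotide(s) (N)")
    return issues


print(cds_issues("ATGTAAAAATAG"))  # ['internal stop codon(s) at codon index [1]']
```

Here the check reports an internal stop codon at codon index 1, a typical symptom of a frameshift or a mis-predicted intron. For mitochondrial genes the stop-codon set would have to be adjusted, since in the mitochondrial genetic codes used by fungi TGA encodes tryptophan.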
For the prototype, which we will build and test through ongoing use in this project, we propose to populate the database with the human pathogenic and other fungi of the order Onygenales. This order includes important primary pathogens endemic to South and/or North America, such as the dimorphic fungi Histoplasma capsulatum, Coccidioides spp., Paracoccidioides brasiliensis and Blastomyces dermatitidis. As an outgroup, we will also include another clinically important group within the ascomycetes, such as the genus Aspergillus. The protein-coding gene sets of these taxa will be carefully corrected and curated: by imputing unidentified or missing nucleotides via alignment and re-annotating where possible, and by sequencing 10 strains, previously sequenced or not, where initial results indicate a need for higher sequence quality or where additional species or strains are needed to fill regions of the fungal tree. We will also use this work to raise the gene sets and the approximately 10,000 expected alignments to the level of a gene atlas: for the selected human pathogenic fungi (and later for additional ones), we will provide curated descriptions of the roles of the genes (orthology classes) together with associated metadata. The design, platform and database structure of the system will be deliberately chosen to support, as far as possible, efficient human curation that can be sustained after the end of the project's three-year duration.
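Imputing unidentified nucleotides via alignment, as described above, can be sketched under a deliberately simple rule. This is an illustrative assumption, not the project's stated procedure: the hypothetical function `impute_from_orthologs` takes a target sequence and its orthologs as rows of one existing multiple alignment (equal length, '-' for gaps) and replaces an N in the target only when at least `min_support` orthologs carry an unambiguous base (A, C, G or T) in that column and all of them agree. Every other N is left unresolved.

```python
UNAMBIGUOUS = frozenset("ACGT")


def impute_from_orthologs(target: str, orthologs: list[str], min_support: int = 2) -> str:
    """Fill unidentified nucleotides (N) in an aligned target sequence from its orthologs.

    Hypothetical, conservative consensus rule: an N is replaced only if at least
    `min_support` orthologs have an unambiguous base (A/C/G/T) in that alignment
    column and all of those bases agree. The result is returned in upper case.
    """
    if any(len(s) != len(target) for s in orthologs):
        raise ValueError("target and orthologs must be rows of the same alignment")
    imputed = []
    for col, base in enumerate(target.upper()):
        if base != "N":
            imputed.append(base)  # keep observed bases and alignment gaps unchanged
            continue
        # Collect the unambiguous bases the orthologs show in this column (gaps and Ns ignored).
        calls = [s[col].upper() for s in orthologs if s[col].upper() in UNAMBIGUOUS]
        if len(calls) >= min_support and len(set(calls)) == 1:
            imputed.append(calls[0])
        else:
            imputed.append("N")  # insufficient or conflicting evidence: leave unresolved
    return "".join(imputed)


print(impute_from_orthologs("ATGNAATAA", ["ATGCAATAA", "ATGCAATAG"]))  # ATGCAATAA
```

A rule this strict trades coverage for confidence; with a denser sampling of closely related genomes, more columns reach the support threshold and more Ns can be resolved.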
Main Author: | |
---|---|
Other Authors: | |
Format: | Research report (library) |
Language: | Spanish (spa) |
Published: | 2020-10-23T13:26:32Z |
Subjects: | Bioinformatics, Fungal pathogens, Genome informatics, Microbial genomics, Onygenales |
Online Access: | https://colciencias.metadirectorio.org/handle/11146/39122 http://colciencias.metabiblioteca.com.co |