GnpAnnot community annotation system: Features, qualifiers, values

As of January 2009, 991 complete genomes had been published and 3376 genome sequencing projects were ongoing, leading to an explosion of data that needs to be stored, curated and analyzed. GnpAnnot is a green genomics project that aims to develop a system for structural and functional annotation, supported by comparative genomics and dedicated to plant and bio-aggressor genomes, allowing both automatic prediction and manual curation of genomic objects. The core of GnpAnnot is a community annotation system (CAS) based on GMOD components: Chado / GBrowse / Apollo / Artemis. The system should also make it possible to browse comparative genomics results, to build queries and to export gene lists and gene reports in various formats. It should support annotation reconciliation, history, integrity, consistency and updates, as well as the management of public and private projects. To facilitate the work of the curators, four steps are crucial: 1. To provide homogeneous features, qualifiers and values for genomic objects; 2. To share a robust CAS: run high-quality combiners / pipelines to automatically predict genomic objects, which are stored in a relational database management system and then made available through fast graphical and textual browsers and powerful editors; 3. To define annotation rules, train the annotators and organize annotation jamborees; 4. To submit the results to public sequence knowledge bases in an easy way. In this work we focus on the first and third steps. We describe a mapping between different known sources: the Sequence Ontology, the DDBJ / EMBL / GenBank feature definitions, GFF3, Chado, gene nomenclatures, transposable element classification and annotation guidelines from various genome project consortia. We propose homogeneous feature keys, qualifiers and value formats, with a maximum of controlled vocabularies, for genes and transposable elements. We also propose rules to annotate, in a coherent way, the structure and function of genes and the structure and classification of transposable elements. These rules could be useful both for automatic predictions and manual curation. Examples of annotations on a BAC sequence of a monocot are presented. (Full text)
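
To make the idea of a mapping between GFF3 and DDBJ / EMBL / GenBank feature keys concrete, the following is a minimal Python sketch. It is not the GnpAnnot mapping itself, which this record does not reproduce. The table `SO_TO_INSDC`, the function names `parse_gff3_line` and `to_insdc_key`, and the sample record are all hypothetical and chosen only for illustration. The sketch reads one GFF3 line, splits its ninth column into qualifiers and their values, and looks up a feature key for the Sequence Ontology type.

```python
from urllib.parse import unquote

# Illustrative table only (an assumption, not the mapping proposed by the authors):
# Sequence Ontology type as used in GFF3 column 3 -> INSDC feature key.
SO_TO_INSDC = {
    "gene": "gene",
    "mRNA": "mRNA",
    "CDS": "CDS",
    "exon": "exon",
    "five_prime_UTR": "5'UTR",
    "three_prime_UTR": "3'UTR",
    "transposable_element": "mobile_element",
}


def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine columns and decode the qualifiers."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line must have 9 tab-separated columns")
    seqid, source, so_type, start, end, score, strand, phase, attrs = cols

    qualifiers = {}
    for pair in attrs.split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        # GFF3 separates multiple values with ',' and percent-encodes reserved characters.
        qualifiers[unquote(key)] = [unquote(v) for v in value.split(",")]

    return {
        "seqid": seqid,
        "source": source,
        "type": so_type,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "qualifiers": qualifiers,
    }


def to_insdc_key(so_type):
    """Return the feature key for a Sequence Ontology type, or fail loudly."""
    try:
        return SO_TO_INSDC[so_type]
    except KeyError:
        raise ValueError(f"no feature key defined for type {so_type!r}") from None


# Hypothetical record on a BAC sequence, for demonstration only.
example = "BAC01\tmanual\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=Example%20gene;Dbxref=X:1,Y:2"
feature = parse_gff3_line(example)
print(to_insdc_key(feature["type"]), feature["qualifiers"])
```

Rejecting unknown types, rather than passing them through, mirrors the goal of keeping feature keys within a controlled vocabulary. A real pipeline would more likely validate types against the Sequence Ontology itself.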

Bibliographic Details
Main Authors: Sidibé-Bocs, Stéphanie, Legeai, Fabrice, Droc, Gaëtan, Rouard, Mathieu, Alaux, Michael, Leroy, Philippe, Fournier, P., Terrier, Nancy, Baurens, Franc-Christophe, Garsmeur, Olivier, Poiron, Claire, Guignon, Valentin, Simon, A., Hoede, Claire, Steinbach Samson, Delphine, Lebrun, Marc-Henri, Tagu, Denis, Quesneville, H., Amselem, Joelle
Format: conference item
Language: eng
Published: s.n.
Subjects: C30 - Documentation and information, F30 - Plant genetics and breeding, H10 - Pests of plants, genetic engineering, http://aims.fao.org/aos/agrovoc/c_15974
Online Access: http://agritrop.cirad.fr/551749/