Improving transcriptome de novo assembly by using a reference genome of a related species: translational genomics from oil palm to coconut

Next-generation transcriptome sequencing has expanded genomic knowledge of nonmodel organisms. However, when the transcripts identified with these techniques are compared against reference transcripts from model organisms, the percentage of recovered transcripts varies widely because of several biological and technical factors. In this work we propose a methodology that improves the results of de novo assembly of coconut, our species of interest, by using the reference genome of a related species (oil palm), thereby ensuring the underlying biological relevance of the results. Two critical stages were evaluated: the identification of novel transcripts and/or exons, and the translation of transcripts into amino acid sequences. Two existing approaches, BRANCH and Cufflinks RABT, were used to identify new transcripts and/or exons not found by de novo assembly, using the genes of the related species, the raw transcriptome data (reads) and the transcripts obtained by de novo assembly. For the translation stage, the FrameDP pipeline, which relies on a statistical model that discriminates coding regions, was assessed. The biological relevance of these two stages was evaluated through functional analysis. The two approaches identified more than 70,000 new transcripts and exons in Cocos nucifera, a 35% increase over the initial set. Over 96% of these new sequences had significant alignments against the UniProt database. A more detailed functional analysis revealed features specific to these new transcripts and exons compared with those obtained by de novo assembly. The translation pipeline defined protein sequences for 56% of the new transcripts. The quality of these protein sequences was confirmed by a significant improvement in the identification of biological functions. Finally, the annotated sequences were used for a comparative analysis across palms.
Our methodology provides a more comprehensive representation of the gene space of nonmodel organisms ahead of comparative genomics studies.
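
As a rough illustration of the first stage, the sketch below treats a reference-guided transcript or exon as "new" when it overlaps no de novo transcript once both sets are placed on the oil palm genome, then expresses the gain as a percentage of the initial de novo set. This simplified interval-overlap criterion is an assumption made for illustration only; it is not the actual logic of BRANCH or Cufflinks RABT, and the `Locus` type, the helper functions and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    """Hypothetical record: a transcript or exon placed on the reference
    genome, using 0-based, half-open coordinates [start, end)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Locus, b: Locus) -> bool:
    # Half-open intervals on the same sequence overlap when each one
    # starts before the other ends; touching end-to-start is not overlap.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def novel_loci(guided: list[Locus], de_novo: list[Locus]) -> list[Locus]:
    """Assumed criterion: keep reference-guided models that overlap
    no de novo transcript at all."""
    return [g for g in guided if not any(overlaps(g, d) for d in de_novo)]


def percent_increase(n_new: int, n_initial: int) -> float:
    # Gain expressed relative to the size of the initial de novo set.
    return 100.0 * n_new / n_initial


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    de_novo = [
        Locus("dn1", "chr1", 100, 500),
        Locus("dn2", "chr1", 900, 1400),
        Locus("dn3", "chr2", 50, 300),
        Locus("dn4", "chr2", 700, 1000),
    ]
    guided = [
        Locus("g1", "chr1", 450, 600),    # overlaps dn1: already assembled
        Locus("g2", "chr1", 1500, 1800),  # no overlap: counted as new
        Locus("g3", "chr2", 300, 650),    # only touches dn3's end: new
    ]
    new = novel_loci(guided, de_novo)
    print([n.name for n in new])  # ['g2', 'g3']
    print(f"{percent_increase(len(new), len(de_novo)):.0f}% increase")  # 50% increase
```

The all-pairs scan is quadratic and only suits toy data; at genome scale the same test would normally run on sorted intervals or an interval tree. Applied to the abstract's own figures, 70,000 new sequences representing a 35% increase implies an initial set of roughly 200,000 sequences (70,000 / 0.35).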

Bibliographic Details
Main Authors: Armero Villanueva, Alix Augusto; Bocs, Stéphanie; Baudouin, Luc; This, Dominique
Format: Conference item
Language: English
Published: Elsevier
Subjects: F30 - Plant genetics and breeding
Online Access: http://agritrop.cirad.fr/580121/
http://agritrop.cirad.fr/580121/1/Poster_PGE_COCNU_armero_2015_LB-2.pdf