Consistency of metagenomic assignment programs in simulated and real data

[Background] Metagenomics is the genomic study of uncultured environmental samples, which has been greatly facilitated by the advent of shotgun-sequencing technologies. One of the main focuses of metagenomics is the discovery of previously uncultured microorganisms, which makes the assignment of sequences to a particular taxon both a challenge and a crucial step. Recently, several methods have been developed to perform this task, based on different methodologies such as sequence composition or sequence similarity. The sequence composition methods have the ability to completely assign the whole dataset. However, their use in metagenomics and the study of their performance with real data are limited. In this work, we assess the consistency of three different methods (BLAST + Lowest Common Ancestor, Phymm, and Naïve Bayesian Classifier) in assigning real and simulated sequence reads.
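As a rough illustration of what consistency between assignment methods can mean in practice, the following Python sketch counts how often several classifiers give the same taxon to a read at one taxonomic level. It is not the authors' pipeline: the function names, the method keys (`blast_lca`, `phymm`, `nbc`), and the example reads are hypothetical, and the per-read assignments are assumed to have already been parsed from each program's output.

```python
from collections import Counter


def consensus_assignment(assignments, min_agree=2):
    """Return the taxon named by at least `min_agree` methods, else None.

    `assignments` maps a method name to the taxon it gave one read at one
    taxonomic level; None means that method left the read unassigned.
    With three methods and min_agree >= 2, at most one taxon can qualify.
    """
    counts = Counter(t for t in assignments.values() if t is not None)
    if not counts:
        return None
    taxon, votes = counts.most_common(1)[0]
    return taxon if votes >= min_agree else None


def consistency_rate(reads, min_agree=3):
    """Fraction of reads on which at least `min_agree` methods agree."""
    if not reads:
        return 0.0
    agreed = sum(
        consensus_assignment(a, min_agree) is not None for a in reads.values()
    )
    return agreed / len(reads)


# Hypothetical phylum-level assignments for three reads.
reads = {
    "read_1": {"blast_lca": "Proteobacteria", "phymm": "Proteobacteria", "nbc": "Proteobacteria"},
    "read_2": {"blast_lca": None, "phymm": "Firmicutes", "nbc": "Firmicutes"},
    "read_3": {"blast_lca": "Firmicutes", "phymm": "Actinobacteria", "nbc": "Proteobacteria"},
}

all_three = consistency_rate(reads, min_agree=3)
at_least_two = consistency_rate(reads, min_agree=2)
```

Running the same calculation separately for each taxonomic level (phylum, class, and so on) would show how agreement changes from higher to lower levels. Raising `min_agree` trades sensitivity for reliability, since fewer reads are kept but each kept read is supported by more methods.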

Bibliographic Details
Main Authors: García-Etxebarria, Koldo, Garcia-Garcerà, Marc, Calafell, Francesc
Other Authors: Ministerio de Ciencia e Innovación (España)
Format: article, library
Published: BioMed Central 2014-03-28
Online Access: http://hdl.handle.net/10261/95685
http://dx.doi.org/10.13039/501100004837
id dig-ibe-es-10261-95685
record_format koha
[Results] Both in real and in simulated data, BLAST + Lowest Common Ancestor (BLAST + LCA), Phymm, and Naïve Bayesian Classifier consistently assign a larger number of reads in higher taxonomic levels than in lower levels. However, discrepancies increase at lower taxonomic levels. In simulated data, consistent assignments between all three methods showed greater precision than assignments based on Phymm or Bayesian Classifier alone, since the BLAST + LCA algorithm performed best. In addition, assignment consistency in real data increased with sequence read length, in agreement with previously published simulation results.
[Conclusions] The use and combination of different approaches is advisable to assign metagenomic reads. Although the sensitivity could be reduced, the reliability can be increased by using the reads consistently assigned to the same taxa by, at least, two methods, and by training the programs using all available information.
Funding: This work was financed by the MICINN (Spanish Ministry of Science and Innovation) grant SAF2010-16240. MGG was supported by a predoctoral fellowship from MICINN.
Peer Reviewed
Citation: BMC Bioinformatics 15(1): 90 (2014)
DOI: http://dx.doi.org/10.1186/1471-2105-15-90
ISSN: 1471-2105
PMID: 24678591
Version: Publisher’s version
Access: open
institution IBE ES
collection DSpace
country Spain
countrycode ES
component Bibliographic
access Online
databasecode dig-ibe-es
tag library
region Southern Europe
libraryname Biblioteca del IBE España