Empirical Validation of Pooled Whole Genome Population Re-Sequencing in Drosophila melanogaster

The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in the individual strains as a reference. We also estimated the allele frequencies of the SNPs from the “pooled” data and compared them with “true” frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency, with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling, because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time-consuming, or expensive.
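The two sources of error named in the abstract, binomial read sampling and unequal DNA contributions from individual strains, can be illustrated with a minimal sketch. This is not the authors' analysis: the function names `binomial_se` and `simulate_pool`, the Gaussian model for per-strain DNA amounts, and the `weight_sd` parameter are all assumptions made for illustration.

```python
import math
import random


def binomial_se(p, depth):
    """Standard error of an allele frequency estimated from `depth` reads,
    assuming each read is an independent draw (binomial sampling)."""
    return math.sqrt(p * (1.0 - p) / depth)


def simulate_pool(strain_alleles, depth, weight_sd=0.0, rng=None):
    """Simulate one pooled estimate of the alternate-allele frequency.

    strain_alleles: list of 0/1, one entry per (isogenic) strain in the pool
    depth:          number of reads covering the site
    weight_sd:      spread of relative DNA amounts per strain (0 = equal);
                    the Gaussian model here is an illustrative assumption
    """
    rng = rng or random.Random()
    # Relative DNA contribution of each strain, kept strictly positive.
    weights = [max(1e-9, rng.gauss(1.0, weight_sd)) for _ in strain_alleles]
    total = sum(weights)
    # Allele frequency actually present in the pooled DNA.
    p_pool = sum(w for w, a in zip(weights, strain_alleles) if a) / total
    # Reads sample alleles from the pool at random.
    alt_reads = sum(1 for _ in range(depth) if rng.random() < p_pool)
    return alt_reads / depth


if __name__ == "__main__":
    rng = random.Random(1)
    few = [1, 0, 0, 0]            # 4 strains, true frequency 0.25
    many = [1] * 10 + [0] * 30    # 40 strains, true frequency 0.25
    for name, pool in (("4 strains", few), ("40 strains", many)):
        estimates = [simulate_pool(pool, 100, weight_sd=0.3, rng=rng)
                     for _ in range(2000)]
        mean = sum(estimates) / len(estimates)
        sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates))
        print(name, "observed sd:", round(sd, 4),
              "binomial se:", round(binomial_se(0.25, 100), 4))
```

Under these assumptions, the observed spread for the small pool should exceed the binomial expectation by more than it does for the large pool, because the uneven contributions average out as more strains are pooled.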

Bibliographic Details
Main Authors: Zhu, Yuan, Bergland, Alan O., González Pérez, Josefa, Petrov, Dmitri A.
Other Authors: Ministerio de Ciencia e Innovación (España)
Format: Article (library)
Language: English
Published: Public Library of Science 2012-07-26
Online Access: http://hdl.handle.net/10261/99950
http://dx.doi.org/10.13039/501100004837
http://dx.doi.org/10.13039/501100000780
http://dx.doi.org/10.13039/100000002
http://dx.doi.org/10.13039/501100001348
id dig-ibe-es-10261-99950
record_format koha
spelling dig-ibe-es-10261-99950 2021-12-28T16:27:33Z
funder Ministerio de Ciencia e Innovación (España); European Commission; National Institutes of Health (US); Agency for Science, Technology and Research A*STAR (Singapore)
funding YZ was supported by the A*STAR National Science Scholarship PhD. AOB was supported by the NIH NRSA fellowship (F32 GM097837-01). JG was supported by a Spanish Ministry of Science and Innovation grant (RYC-2010-07306) and a European Commission Marie Curie CIG Grant (PCIG09-GA-2011-293860). DAP was supported by NIH grants 1R01GM089926 and P50HG002568.
peer_review Peer reviewed
dates 2014-07-16T07:52:21Z 2014-07-16T07:52:21Z 2012-07-26
type article http://purl.org/coar/resource_type/c_6501
citation PLoS ONE 7(7): e41901 (2012)
issn 1932-6203
doi 10.1371/journal.pone.0041901
pmid 22848651
version Publisher's version http://dx.doi.org/10.1371/journal.pone.0041901
access open
institution IBE ES
collection DSpace
country Spain
countrycode ES
component Bibliographic
access Online
databasecode dig-ibe-es
tag biblioteca
region Southern Europe
libraryname Biblioteca del IBE España