Separating homeologs by phasing in the tetraploid wheat transcriptome
Background: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs. Results: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble into 140,118 contigs, including 96 percent of the benchmark cDNAs. We use a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22 percent relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7 percent of SNPs analyzed are correctly separated by phasing. Conclusions: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.
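Main Authors: | Krasileva, Ksenia V.; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo Abel; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge; IWGS, Consortium |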
---|---|
Format: | Text, library |
Language: | eng |
Subjects: | GENE PREDICTION, PHASING, POLYPLOID, PSEUDOGENES, TRANSCRIPTOME ASSEMBLY, TRITICUM TURGIDUM, TRITICUM URARTU, WHEAT, CONTIG, PROTEOME, TRANSCRIPTOME, CONTROLLED STUDY, DIPLOIDY, GENE SEQUENCE, GENOME, GENOMICS, HETEROZYGOTE, HOMEOLOG, NONHUMAN, OPEN READING FRAME, PLANT GENOME, SINGLE NUCLEOTIDE POLYMORPHISM, TETRAPLOIDY, TRITICUM AESTIVUM, MULTIPLE K-MER ASSEMBLY, COMPLEMENTARY DNA |
Online Access: | http://ceiba.agro.uba.ar/cgi-bin/koha/opac-detail.pl?biblionumber=46945 |
id |
KOHA-OAI-AGRO:46945 |
---|---|
record_format |
koha |
institution |
UBA FA |
collection |
Koha |
country |
Argentina |
countrycode |
AR |
component |
Bibliographic |
access |
Online |
databasecode |
cat-ceiba |
tag |
biblioteca |
region |
South America |
libraryname |
Biblioteca Central FAUBA |
language |
eng |
format |
Text |
author |
Krasileva, Ksenia V. Buffalo, Vince Bailey, Paul Pearce, Stephen Ayling, Sarah Tabbita, Facundo Soria, Marcelo Abel Wang, Shichen Akhunov, Eduard Uauy, Cristobal Dubcovsky, Jorge IWGS, Consortium |
title |
Separating homeologs by phasing in the tetraploid wheat transcriptome |
spelling |
KOHA-OAI-AGRO:46945 2023-11-23T14:52:04Z AAG text eng application/pdf Genome Biology |