Separating homeologs by phasing in the tetraploid wheat transcriptome

Background: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs. Results: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble into 140,118 contigs, including 96 percent of the benchmark cDNAs. We use a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22 percent relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7 percent of SNPs analyzed are correctly separated by phasing. Conclusions: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.
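The read-sorting step named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data layout (0-based contig positions, reads given as name, start, and sequence), the function names, and the simple majority-vote rule are all assumptions made for illustration.

```python
from collections import Counter


def assign_phase(read_start, read_seq, phased_snps):
    """Return 0 or 1 for the phase a read supports, or None if it is
    uninformative (covers no phased SNP) or ambiguous (tied votes).

    phased_snps maps a 0-based contig position to a pair
    (allele in phase 0, allele in phase 1). Hypothetical layout.
    """
    votes = Counter()
    for pos, alleles in phased_snps.items():
        offset = pos - read_start
        # Only SNP positions covered by this read can vote.
        if 0 <= offset < len(read_seq):
            base = read_seq[offset]
            if base in alleles:
                votes[alleles.index(base)] += 1
    if votes[0] == votes[1]:
        return None
    return 0 if votes[0] > votes[1] else 1


def sort_reads(reads, phased_snps):
    """Group read names by supported phase; None collects unassigned reads."""
    bins = {0: [], 1: [], None: []}
    for name, start, seq in reads:
        bins[assign_phase(start, seq, phased_snps)].append(name)
    return bins
```

In a sketch like this, each phase bin would then be re-assembled separately, while reads in the None bin carry no usable phase information and would need a separate policy.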

Bibliographic Details
Main Authors: Krasileva, Ksenia V., Buffalo, Vince, Bailey, Paul, Pearce, Stephen, Ayling, Sarah, Tabbita, Facundo, Soria, Marcelo Abel, Wang, Shichen, Akhunov, Eduard, Uauy, Cristobal, Dubcovsky, Jorge, IWGS, Consortium
Format: Library text
Language: eng
Subjects: GENE PREDICTION, PHASING, POLYPLOID, PSEUDOGENES, TRANSCRIPTOME ASSEMBLY, TRITICUM TURGIDUM, TRITICUM URARTU, WHEAT, CONTIG, PROTEOME, TRANSCRIPTOME, CONTROLLED STUDY, DIPLOIDY, GENE SEQUENCE, GENOME, GENOMICS, HETEROZYGOTE, HOMEOLOG, NONHUMAN, OPEN READING FRAME, PLANT GENOME, SINGLE NUCLEOTIDE POLYMORPHISM, TETRAPLOIDY, TRITICUM AESTIVUM, MULTIPLE K-MER ASSEMBLY, COMPLEMENTARY DNA
Online Access: http://ceiba.agro.uba.ar/cgi-bin/koha/opac-detail.pl?biblionumber=46945
id KOHA-OAI-AGRO:46945
record_format koha
institution UBA FA
collection Koha
country Argentina
countrycode AR
component Bibliographic
access Online
databasecode cat-ceiba
tag biblioteca
region South America
libraryname Biblioteca Central FAUBA
language eng
spelling KOHA-OAI-AGRO:46945
2023-11-23T14:52:04Z
AAG
application/pdf
Genome Biology