ddRADseq‑mediated detection of genetic variants in sugarcane
Sugarcane (Saccharum sp.), a feedstock well known worldwide for sugar, bioethanol, and energy production, has an extremely complex genome, being highly polyploid and aneuploid. A double-digestion restriction site-associated DNA sequencing protocol (ddRADseq) was tested in four commercial sugarcane hybrids and one high-fibre biotype for the detection of single nucleotide polymorphisms (SNPs). In this work we tested two Illumina sequencing platforms, two read sizes (70 vs. 150 bp), different sequencing coverage per individual (medium and high coverage), and single-end versus paired-end reads. We also explored different variant calling strategies (with and without a reference genome) and filtering schemes [combining two minor allele frequencies (MAFs) with three depth-of-coverage thresholds]. For the discovery of a large number of novel SNPs in sugarcane, we recommend longer, paired-end reads, medium sequencing coverage per individual, and the Illumina NovaSeq6000 platform for a cost-effective approach, together with filter parameters of lower MAF and higher depth-of-coverage thresholds. Although the de novo analysis retrieved more SNPs, the reference-based method allows downstream characterization of variants. For the two best-performing matrices, the number of SNPs per chromosome correlated positively with chromosome length, demonstrating the presence of variants throughout the genome. Multivariate comparisons with both matrices showed closer relationships among the commercial hybrids than between them and the high-fibre biotype. Functional analysis of the SNPs demonstrated that more than half of them landed within regulatory regions, whereas the remainder affected coding, intergenic, and intronic regions. Allelic distance values were lower than 0.07 when analysing two replicated genotypes, confirming the robustness of the protocol.
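The filtering design described in the abstract (two MAF thresholds crossed with three depth-of-coverage thresholds, giving six schemes) can be illustrated with a minimal Python sketch. The threshold values, the toy SNP records, and the rule of requiring the minimum per-sample depth to reach the threshold are all assumptions made for illustration; the record does not state the values or the exact depth criterion the authors used.

```python
from itertools import product

# Hypothetical thresholds: the abstract only says "two MAFs" and
# "three depth of coverage thresholds", not which values were used.
MAF_THRESHOLDS = (0.05, 0.10)
DEPTH_THRESHOLDS = (10, 20, 50)

# Hypothetical toy SNP records: minor allele frequency plus read depth
# in each of five samples (mirroring the five genotypes in the study).
SNPS = [
    {"id": "snp1", "maf": 0.02, "depths": [30, 35, 40, 28, 50]},
    {"id": "snp2", "maf": 0.06, "depths": [12, 15, 11, 20, 14]},
    {"id": "snp3", "maf": 0.20, "depths": [25, 30, 22, 40, 21]},
    {"id": "snp4", "maf": 0.30, "depths": [60, 55, 70, 52, 80]},
    {"id": "snp5", "maf": 0.08, "depths": [8, 30, 30, 30, 30]},
]


def passes(snp, min_maf, min_depth):
    """Keep a SNP if its MAF and its lowest per-sample depth meet the thresholds.

    Using the minimum depth across samples is an assumed criterion.
    """
    return snp["maf"] >= min_maf and min(snp["depths"]) >= min_depth


def filter_grid(snps, mafs, depths):
    """Apply every MAF x depth combination and return the retained SNP ids."""
    return {
        (maf, depth): [s["id"] for s in snps if passes(s, maf, depth)]
        for maf, depth in product(mafs, depths)
    }


grid = filter_grid(SNPS, MAF_THRESHOLDS, DEPTH_THRESHOLDS)
for (maf, depth), kept in grid.items():
    print(f"MAF >= {maf:.2f}, depth >= {depth}: {len(kept)} SNPs retained")
```

In this toy data, lowering the MAF threshold retains more SNPs at the same depth threshold, while raising the depth threshold retains only the best-supported ones.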
Main Authors: | , , , , , , , |
---|---|
Format: | info:ar-repo/semantics/artículo biblioteca |
Language: | eng |
Published: | Springer, 2022-11-11 |
Subjects: | Single Nucleotide Polymorphism, Hybrids, Sugar Cane, Saccharum, Genotyping by Sequencing, Polyploid Genome, Sequencing |
Online Access: | http://hdl.handle.net/20.500.12123/13460 https://link.springer.com/article/10.1007/s11103-022-01322-4 https://doi.org/10.1007/s11103-022-01322-4 |