Minimum error correction-based haplotype assembly: Considerations for long read data

The single nucleotide polymorphism (SNP) is the most widely studied type of genetic variation. A haplotype is defined as the sequence of alleles at SNP sites on each haploid chromosome. Haplotype information is essential in unravelling the genome-phenotype association. Haplotype assembly is a well-known approach for reconstructing haplotypes from reads generated by DNA sequencing devices. The Minimum Error Correction (MEC) metric is often used for reconstructing haplotypes from reads. However, problems with the MEC metric have been reported. Here, we investigate the MEC approach to demonstrate that it may result in incorrectly reconstructed haplotypes for devices that produce error-prone long reads. Specifically, we evaluate this approach for devices developed by Illumina, Pacific BioSciences and Oxford Nanopore Technologies. We show that incorrect haplotypes may be reconstructed with a lower MEC than that of the true haplotype. The performance of MEC is explored for different coverage levels and error rates of the data. Our simulation results reveal that, to avoid incorrect MEC-based haplotypes, a coverage of 25× is needed for reads generated by Pacific BioSciences RS systems.
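To make the central claim concrete, the following is a minimal Python sketch of the MEC score under a common textbook formulation (diploid organism, biallelic SNPs, each read charged against whichever of the two complementary haplotypes it matches better). The function names, the read encoding and the toy reads are hypothetical illustrations, not the authors' code or data, and the exhaustive search is only meant for tiny examples, not how practical haplotype assemblers solve MEC.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# A read is a vector over SNP sites: 0 or 1 for an observed allele,
# None where the read does not cover that SNP (hypothetical encoding).
Read = Sequence[Optional[int]]


def hamming_covered(read: Read, hap: Sequence[int]) -> int:
    """Count mismatches between a read and a haplotype at the SNPs the read covers."""
    return sum(1 for a, h in zip(read, hap) if a is not None and a != h)


def mec(reads: List[Read], hap: Sequence[int]) -> int:
    """MEC score of the diploid pair (hap, complement of hap).

    Each read is charged against whichever haplotype it matches better, so the
    score is the minimum number of allele flips that makes every read consistent
    with one of the two haplotypes.
    """
    comp = [1 - h for h in hap]
    return sum(min(hamming_covered(r, hap), hamming_covered(r, comp)) for r in reads)


def mec_optimal(reads: List[Read], n_snps: int) -> Tuple[Tuple[int, ...], int]:
    """Exhaustively find a haplotype with minimum MEC (only feasible for tiny n_snps)."""
    best = min(product((0, 1), repeat=n_snps), key=lambda h: mec(reads, h))
    return best, mec(reads, best)


# Toy example: the true pair is 000 / 111, and at low coverage every read
# happens to carry a sequencing error at SNP 3.
true_hap = (0, 0, 0)
reads = [
    (0, 0, 1),  # sampled from 000, error at SNP 3
    (0, 0, 1),  # sampled from 000, error at SNP 3
    (1, 1, 0),  # sampled from 111, error at SNP 3
]

print(mec(reads, true_hap))   # 3
print(mec_optimal(reads, 3))  # ((0, 0, 1), 0): a wrong pair with lower MEC
```

Because all three reads share an error at SNP 3, the swapped pair 001/110 explains them with zero corrections, so minimising MEC alone selects it over the true pair. Deeper coverage makes such coincident errors less likely, which is the intuition behind a coverage requirement; this toy example does not reproduce the simulations reported above.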

Bibliographic Details
Main Authors: Majidian, Sina, Kahaei, Mohammad Hossein, de Ridder, Dick
Format: Article/Letter to editor
Language: English
Subjects: Life Science
Online Access:https://research.wur.nl/en/publications/minimum-error-correction-based-haplotype-assembly-considerations-
Published in: PLoS ONE 15 (2020) 6
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0234470
Full text (PDF): https://edepot.wur.nl/525368
License: https://creativecommons.org/licenses/by/4.0/
Institution: Wageningen University & Research