Bioinformatics' approaches to detect genetic variation in whole genome sequencing data

Current genetic marker repositories are insufficient, or lacking altogether, for most farm animals. Genetic markers are nevertheless essential for developing research tools that facilitate the discovery of genetic factors contributing to disease resistance and to the overall welfare and performance of farm animals. By large-scale identification of single nucleotide polymorphisms (SNPs) and structural variants (SVs), we aimed to contribute to the development of a repository of genetic variants for farm animals. For this purpose, bioinformatics data pipelines were designed and validated to address the challenge of cost-effective identification of genetic markers in DNA sequencing data, even in the absence of a fully sequenced reference genome. To find SNPs in pig, we analysed publicly available whole genome shotgun sequencing datasets by sequence alignment and clustering. Sequence clusters were assigned to genomic locations using publicly available BAC sequencing and BAC mapping data. Within the sequence clusters, thousands of SNPs were detected whose genomic location is approximately known. For turkey and duck, both of which lacked a sufficient sequence data repository for variant discovery, we applied next-generation sequencing (NGS) to a reduced genome representation of a pooled DNA sample. For turkey, a reference genome was reconstructed from our sequencing data and available public sequencing data, whereas for duck the reference genome constructed by an NGS project was used. The SNPs obtained by our cost-effective detection procedure nevertheless covered, at intervals, the whole turkey and duck genomes and are of sufficient quality to be used in genotyping studies. Allele frequencies obtained by genotyping animal panels with a subset of our SNPs correlated well with those observed during SNP detection. The availability of two external duck SNP datasets allowed us to construct a subset of SNPs that we had in common with these sets. Genotyping showed that this subset was of outstanding quality and can be used for benchmarking the other SNPs that we identified in duck. Ongoing developments in NGS enabled paired-end sequencing, an extension of sequencing analysis that provides information about which pairs of reads come from the outer ends of a single sequenced DNA fragment. We applied this technique to a reduced genome representation of four chicken breeds to detect SVs. Paired-end reads were mapped to the chicken reference genome, and SVs were identified from abnormally aligned read pairs whose orientation or span size is discordant with the reference genome. SV detection parameters, used to distinguish true structural variants from false positives, were designed and optimized by validating a small representative sample of SVs using PCR and traditional capillary sequencing. To conclude, we developed SNP repositories that fulfil the need for SNPs to perform linkage analysis, comparative genomics, QTL studies and ultimately GWA studies in a range of farm animals. We also took the first step in developing a repository for SVs in chicken, a relatively new type of genetic marker in animal sciences.
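
The abstract describes SV detection as finding read pairs whose orientation or span size is discordant with the reference genome. The Python sketch below illustrates that general idea only; it is not the pipeline used in the thesis. The `ReadPair` record, the `classify_pair` function, the category labels, the expected forward/reverse orientation, and the `min_span`/`max_span` thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical minimal record for one mapped read pair."""
    chrom1: str
    pos1: int        # leftmost aligned position of read 1
    reverse1: bool   # True if read 1 maps to the reverse strand
    chrom2: str
    pos2: int        # leftmost aligned position of read 2
    reverse2: bool   # True if read 2 maps to the reverse strand


def classify_pair(pair: ReadPair, min_span: int, max_span: int) -> str:
    """Label a read pair as concordant or as a candidate SV signature.

    Assumption: a normal pair maps with the left read on the forward
    strand and the right read on the reverse strand, and its span lies
    within [min_span, max_span]. Span is simplified here to the distance
    between the two leftmost aligned positions.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"

    # Order the two reads by mapped position so "left" and "right" are defined.
    (left_pos, left_rev), (right_pos, right_rev) = sorted(
        [(pair.pos1, pair.reverse1), (pair.pos2, pair.reverse2)]
    )

    if left_rev == right_rev:
        return "inversion_candidate"      # both reads on the same strand
    if left_rev and not right_rev:
        return "everted"                  # reads point away from each other

    span = right_pos - left_pos
    if span > max_span:
        return "deletion_candidate"       # mapped span larger than expected
    if span < min_span:
        return "insertion_candidate"      # mapped span smaller than expected
    return "concordant"


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 100, False, "chr1", 400, True),
        ReadPair("chr1", 100, False, "chr1", 5000, True),
        ReadPair("chr1", 100, False, "chr1", 400, False),
    ]
    for p in pairs:
        print(p.pos1, p.pos2, classify_pair(p, min_span=200, max_span=500))
```

In practice the span thresholds would be derived from the observed fragment-size distribution of the library, and, as the abstract notes, detection parameters need validation against an independent method before candidate calls are trusted.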

Bibliographic Details
Main Author: Kerstens, H.H.D.
Other Authors: Groenen, Martien
Format: Doctoral thesis
Language: English
Subjects: anas platyrhynchos, animal breeding, bioinformatics, fowls, genetic variation, genomes, genomics, marker assisted breeding, nucleotide sequences, pigs, single nucleotide polymorphism, turkeys, chickens, gene expression analysis
Online Access: https://research.wur.nl/en/publications/bioinformatics-approaches-to-detect-genetic-variation-in-whole-ge
id dig-wur-nl-wurpubs-396761
record_format koha
spelling dig-wur-nl-wurpubs-396761 2024-12-23 Kerstens, H.H.D. Groenen, Martien Smits, Mari Doctoral thesis Bioinformatics' approaches to detect genetic variation in whole genome sequencing data 2010 en application/pdf https://research.wur.nl/en/publications/bioinformatics-approaches-to-detect-genetic-variation-in-whole-ge 10.18174/151106 https://edepot.wur.nl/151106 Wageningen University & Research
institution WUR NL
collection DSpace
country Netherlands
countrycode NL
component Bibliographic
access Online
databasecode dig-wur-nl
tag library
region Western Europe
libraryname WUR Library Netherlands
language English
topic anas platyrhynchos
animal breeding
bioinformatics
fowls
genetic variation
genomes
genomics
marker assisted breeding
nucleotide sequences
pigs
single nucleotide polymorphism
turkeys
anas platyrhynchos
bio-informatica
dierveredeling
genetische variatie
genexpressieanalyse
genomen
kalkoenen
kippen
marker assisted breeding
nucleotidenvolgordes
single nucleotide polymorphism
varkens
author2 Groenen, Martien
author_facet Groenen, Martien
Kerstens, H.H.D.
format Doctoral thesis
author Kerstens, H.H.D.
author_sort Kerstens, H.H.D.
title Bioinformatics' approaches to detect genetic variation in whole genome sequencing data
url https://research.wur.nl/en/publications/bioinformatics-approaches-to-detect-genetic-variation-in-whole-ge