Bioinformatics' approaches to detect genetic variation in whole genome sequencing data

Current genetic marker repositories are not sufficient or even are completely lacking for most farm animals. However, genetic markers are essential for the development of a research tool facilitating discovery of genetic factors that contribute to resistance to disease and the overall welfare and performance in farm animals. By large scale identification of Single Nucleotide Polymorphisms (SNPs) and Structural Variants (SVs) we aimed to contribute to the development of a repository of genetic variants for farm animals. For this purpose bioinformatics data pipelines were designed and validated to address the challenge of the cost effective identification of genetic markers in DNA sequencing data even in absence of a fully sequenced reference genome. To find SNPs in pig, we analysed publicly available whole genome shotgun sequencing datasets by sequence alignment and clustering. Sequence clusters were assigned to genomic locations using publicly available BAC sequencing and BAC mapping data. Within the sequence clusters thousands of SNPs were detected of which the genomic location is roughly known. For turkey and duck, species that both were lacking a sufficient sequence data repository for variant discovery, we applied next-generation sequencing (NGS) on a reduced genome representation of a pooled DNA sample. For turkey a genome reference was reconstructed from our sequencing data and available public sequencing data whereas in duck the reference genome constructed by a (NGS) project was used. SNPs obtained by our cost-effective SNP detection procedure still turned out to cover, at intervals, the whole turkey and duck genomes and are of sufficient quality to be used in genotyping studies. Allele frequencies, obtained by genotyping animal panels with a subset our SNPs, correlated well with those observed during SNP detection. The availability of two external duck SNP datasets allowed for the construction of a subset of SNPs which we had in common with these sets. Genotyping turned out that this subset was of outstanding quality and can be used for benchmarking other SNPs that we identified within duck. Ongoing developments in (NGS) allowed for paired end sequencing which is an extension on sequencing analysis that provides information about which pair of reads are coming from the outer ends of one sequenced DNA fragment. We applied this technique on a reduced genome representation of four chicken breeds to detect SVs. Paired end reads were mapped to the chicken reference genome and SVs were identified as abnormally aligned read pairs that have orientation or span sizes discordant from the reference genome. SV detection parameters, to distinguish true structural variants from false positives, were designed and optimized by validation of a small representative sample of SVs using PCR and traditional capillary sequencing. To conclude: we developed SNP repositories which fulfils a requirement for SNPs to perform linkage analysis, comparative genomics QTL studies and ultimately GWA studies in a range of farm animals. We also set the first step in developing a repository for SVs in chicken, a relatively new genetic marker in animal sciences.

Saved in:
Bibliographic Details
Main Author: Kerstens, H.H.D.
Other Authors: Groenen, Martien
Format: Doctoral thesis biblioteca
Language:English
Subjects:anas platyrhynchos, animal breeding, bioinformatics, fowls, genetic variation, genomes, genomics, marker assisted breeding, nucleotide sequences, pigs, single nucleotide polymorphism, turkeys, bio-informatica, dierveredeling, genetische variatie, genexpressieanalyse, genomen, kalkoenen, kippen, nucleotidenvolgordes, varkens,
Online Access:https://research.wur.nl/en/publications/bioinformatics-approaches-to-detect-genetic-variation-in-whole-ge
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Current genetic marker repositories are not sufficient or even are completely lacking for most farm animals. However, genetic markers are essential for the development of a research tool facilitating discovery of genetic factors that contribute to resistance to disease and the overall welfare and performance in farm animals. By large scale identification of Single Nucleotide Polymorphisms (SNPs) and Structural Variants (SVs) we aimed to contribute to the development of a repository of genetic variants for farm animals. For this purpose bioinformatics data pipelines were designed and validated to address the challenge of the cost effective identification of genetic markers in DNA sequencing data even in absence of a fully sequenced reference genome. To find SNPs in pig, we analysed publicly available whole genome shotgun sequencing datasets by sequence alignment and clustering. Sequence clusters were assigned to genomic locations using publicly available BAC sequencing and BAC mapping data. Within the sequence clusters thousands of SNPs were detected of which the genomic location is roughly known. For turkey and duck, species that both were lacking a sufficient sequence data repository for variant discovery, we applied next-generation sequencing (NGS) on a reduced genome representation of a pooled DNA sample. For turkey a genome reference was reconstructed from our sequencing data and available public sequencing data whereas in duck the reference genome constructed by a (NGS) project was used. SNPs obtained by our cost-effective SNP detection procedure still turned out to cover, at intervals, the whole turkey and duck genomes and are of sufficient quality to be used in genotyping studies. Allele frequencies, obtained by genotyping animal panels with a subset our SNPs, correlated well with those observed during SNP detection. The availability of two external duck SNP datasets allowed for the construction of a subset of SNPs which we had in common with these sets. Genotyping turned out that this subset was of outstanding quality and can be used for benchmarking other SNPs that we identified within duck. Ongoing developments in (NGS) allowed for paired end sequencing which is an extension on sequencing analysis that provides information about which pair of reads are coming from the outer ends of one sequenced DNA fragment. We applied this technique on a reduced genome representation of four chicken breeds to detect SVs. Paired end reads were mapped to the chicken reference genome and SVs were identified as abnormally aligned read pairs that have orientation or span sizes discordant from the reference genome. SV detection parameters, to distinguish true structural variants from false positives, were designed and optimized by validation of a small representative sample of SVs using PCR and traditional capillary sequencing. To conclude: we developed SNP repositories which fulfils a requirement for SNPs to perform linkage analysis, comparative genomics QTL studies and ultimately GWA studies in a range of farm animals. We also set the first step in developing a repository for SVs in chicken, a relatively new genetic marker in animal sciences.