Bayesian networks for omics data analysis

Gavai, A.K.; Leunissen, Jack

This thesis focuses on two aspects of high-throughput technologies, data storage and data analysis, in particular in transcriptomics and metabolomics. Both technologies belong to a research field generally called ‘omics’ (or ‘-omics’, with a leading hyphen), which covers genomics, transcriptomics, proteomics, and metabolomics. Although these fields study different entities (genes, gene expression, proteins, or metabolites), they all rely on high-throughput technologies such as microarrays and mass spectrometry, and thus generate huge amounts of data. Experiments conducted with these technologies allow one to compare different states of a living cell at different levels, for example a healthy cell versus a cancer cell, or the effect of food on cell condition. The tools needed to apply omics technologies, in particular microarrays, are often manufactured by different vendors and require separate storage and analysis software for the data they generate. Moreover, experiments conducted with different technologies cannot be analyzed simultaneously to answer a biological question. Chapter 3 presents MADMAX, our software system that supports storage and analysis of data from multiple microarray platforms. It consists of a vendor-independent database that is tightly coupled with vendor-specific analysis tools. Upcoming technologies such as metabolomics, proteomics, and high-throughput sequencing can easily be incorporated into this system. Once the data are stored in this system, one wants to deduce a biologically relevant meaning from them, and here statistical and machine learning techniques play a key role. The aim of such analysis is to search for relationships between entities of interest, such as genes, metabolites, or proteins.
One of the major goals of these techniques is to search for causal relationships rather than mere correlations. The literature often emphasizes that "correlation is not causation", because people tend to jump to conclusions about causal relationships when they actually only observe correlations. Statistics is often good at finding these correlations; linear regression and analysis of variance form the core of applied multivariate statistics. However, these techniques cannot find causal relationships, nor can they incorporate prior knowledge of the biological domain. Graphical models, a machine learning technique, do not suffer from these limitations. Combining graph theory, statistics, and information science, they are one of the most exciting developments in machine learning applied to biological problems (see chapter 2 for a general introduction). This thesis deals with a special type of graphical model known as probabilistic graphical models, belief networks, or Bayesian networks. The advantage of Bayesian networks over classical statistical techniques is that they allow the incorporation of background knowledge from a biological domain, and that the analysis is intuitive because it is represented in the form of graphs (nodes and edges). Standard statistical techniques are good at describing the data but are not able to find nonlinear relations, whereas Bayesian networks allow prediction and the discovery of nonlinear relations. Moreover, Bayesian networks allow a hierarchical representation of data, which makes them particularly useful for biological data, since most biological processes are hierarchical by nature.
Once we have such a causal graph, whether learned by a computer program or constructed manually, we can predict the effect on a certain entity of manipulating the state of other entities, or make backward inferences from effects to causes. If the graph is big, the necessary calculations can be very difficult and CPU-expensive, and in such cases approximate methods are used. Chapter 4 demonstrates the use of Bayesian networks to determine the metabolic state of feeding and fasting mice, and thereby the effect of a high-fat diet on gene expression. This chapter also shows how selecting genes based on key biological processes generates more informative results than standard statistical tests. Chapter 5 applies Bayesian networks to the combination of gene expression data and clinical parameters, to determine the effect of smoking on gene expression and which genes are responsible for the DNA damage and the rise in plasma cotinine levels in a smoking population. This study was conducted at Maastricht University, where 22 twin smokers were profiled. Chapter 6 presents the reconstruction of a key metabolic pathway that plays an important role in the ripening of tomatoes, showing the versatility of Bayesian networks in metabolomics data analysis. The general trend in research shows a flood of data emerging from sequencing and metabolomics experiments. Mining these data requires intelligent techniques that are computationally feasible and can take expert knowledge into account to generate relevant results. Graphical models fit this paradigm well, and we expect them to play a key role in mining the data generated by omics experiments.

Bibliographic Details
Main Author:	Gavai, A.K.
Other Authors:	Leunissen, Jack
Format:	Doctoral thesis
Language:	English
Subjects:	bayesian theory, biochemical pathways, bioinformatics, gene expression, genomics, human nutrition research, microarrays, network analysis, networks, nutrigenomics, probabilistic models, smoking, volatile compounds, bayesiaanse theorie, bio-informatica, biochemische omzettingen, genexpressie, genexpressieanalyse, netwerkanalyse, netwerken, nutrigenomica, roken, vluchtige verbindingen, voedingsonderzoek bij de mens, waarschijnlijkheidsmodellen
Online Access:	https://research.wur.nl/en/publications/bayesian-networks-for-omics-data-analysis

id	dig-wur-nl-wurpubs-380114
record_format	koha
institution	WUR NL
collection	DSpace
country	Netherlands
countrycode	NL
component	Bibliographic
access	Online
databasecode	dig-wur-nl
tag	biblioteca
region	Western Europe
libraryname	WUR Library Netherlands
language	English
topic	bayesian theory biochemical pathways bioinformatics gene expression genomics human nutrition research microarrays network analysis networks nutrigenomics probabilistic models smoking volatile compounds bayesiaanse theorie bio-informatica biochemische omzettingen genexpressie genexpressieanalyse microarrays netwerkanalyse netwerken nutrigenomica roken vluchtige verbindingen voedingsonderzoek bij de mens waarschijnlijkheidsmodellen
spellingShingle	bayesian theory biochemical pathways bioinformatics gene expression genomics human nutrition research microarrays network analysis networks nutrigenomics probabilistic models smoking volatile compounds bayesiaanse theorie bio-informatica biochemische omzettingen genexpressie genexpressieanalyse microarrays netwerkanalyse netwerken nutrigenomica roken vluchtige verbindingen voedingsonderzoek bij de mens waarschijnlijkheidsmodellen Gavai, A.K. Bayesian networks for omics data analysis
author2	Leunissen, Jack
author_facet	Leunissen, Jack Gavai, A.K.
format	Doctoral thesis
topic_facet	bayesian theory biochemical pathways bioinformatics gene expression genomics human nutrition research microarrays network analysis networks nutrigenomics probabilistic models smoking volatile compounds bayesiaanse theorie bio-informatica biochemische omzettingen genexpressie genexpressieanalyse microarrays netwerkanalyse netwerken nutrigenomica roken vluchtige verbindingen voedingsonderzoek bij de mens waarschijnlijkheidsmodellen
author	Gavai, A.K.
author_sort	Gavai, A.K.
title	Bayesian networks for omics data analysis
title_short	Bayesian networks for omics data analysis
title_full	Bayesian networks for omics data analysis
title_fullStr	Bayesian networks for omics data analysis
title_full_unstemmed	Bayesian networks for omics data analysis
title_sort	bayesian networks for omics data analysis
url	https://research.wur.nl/en/publications/bayesian-networks-for-omics-data-analysis
work_keys_str_mv	AT gavaiak bayesiannetworksforomicsdataanalysis
_version_	1819150407834796032
spelling	dig-wur-nl-wurpubs-380114 2024-12-03 Gavai, A.K. Leunissen, Jack Muller, Michael Hooiveld, Guido Lucas, P.J.F. Doctoral thesis Bayesian networks for omics data analysis 2009 en application/pdf https://research.wur.nl/en/publications/bayesian-networks-for-omics-data-analysis 10.18174/7208 https://edepot.wur.nl/7208 bayesian theory biochemical pathways bioinformatics gene expression genomics human nutrition research microarrays network analysis networks nutrigenomics probabilistic models smoking volatile compounds bayesiaanse theorie bio-informatica biochemische omzettingen genexpressie genexpressieanalyse microarrays netwerkanalyse netwerken nutrigenomica roken vluchtige verbindingen voedingsonderzoek bij de mens waarschijnlijkheidsmodellen Wageningen University & Research
