Uncovering cluster structure and group-specific associations: variable selection in multivariate mixture regression models.

Variable selection for mixture of regression models has been the focus of much research in recent years. These models combine the ideas of mixture models, regression models, and variable selection to uncover group structures and key relationships between data sets. The objective is to identify homogeneous groups of objects and determine the cluster-specific subsets of covariates modulating the outcomes. In this chapter we review frequentist and Bayesian methods we have proposed to address in a unified manner the problems of cluster identification and cluster-specific variable selection in the context of mixture of regression models. These methods have a wide range of applications, in particular in the context of high-dimensional data analysis. We illustrate their performance in two diverse areas: one in ecology for modeling species-rich ecosystems and the other in genomics for integrating data from different genomic sources.

Bibliographic Details
Main Authors: Tadesse, Mahlet G., Mortier, Frédéric, Monni, Stefano
Format: book_section biblioteca
Language: eng
Published: Springer International Publishing
Subjects: U10 - Computer science, mathematics and statistics
Online Access: http://agritrop.cirad.fr/585189/
http://agritrop.cirad.fr/585189/1/tadesse16.pdf