On the estimation of mixtures of Poisson regression models with large number of components

Panagiotis PAPASTAMOULIS, Marie-Laure MARTIN-MAGNIETTE, Cathy MAUGIS-RABUSSEAU

2016, January

Abstract

Modelling heterogeneity in large datasets of counts in the presence of covariates demands advanced clustering methods. To this end, a mixture of Poisson regressions is proposed. Conditionally on the covariates and a cluster, the multivariate distribution is a product of independent Poisson distributions. A variety of parameterizations is considered for the slope of the conditional log-means. The case where the response variables are partitioned into sets of replicates sharing the same conditional log-mean up to an additive constant is also considered. Model parameters are estimated via an Expectation–Maximization algorithm with Newton–Raphson steps. In particular, an efficient initialization is introduced to improve inference: a splitting scheme is combined with a Small-EM strategy. Simulations and applications to two real high-throughput sequencing datasets highlight improvements in parameter estimation. The proposed methodology is implemented in the R package poisson.glm.mix, available on CRAN.
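The estimation scheme outlined above can be illustrated on a much simpler case. This is an EM algorithm whose M-step uses Newton–Raphson updates.

The Python sketch below is a hypothetical illustration. It is not taken from the paper or from poisson.glm.mix. It assumes:

- a single count response;
- one scalar covariate x;
- cluster-specific log-means log mu_k(x) = a_k + b_k * x.

It omits the multivariate responses, the replicate structure, the alternative slope parameterizations and the splitting/Small-EM initialization, so the caller must supply starting values. The function name em_poisson_mixture_regression and its arguments are illustrative only.

```python
import math


def em_poisson_mixture_regression(x, y, weights, coefs, n_iter=100, n_newton=5):
    """Hypothetical, simplified sketch (not the poisson.glm.mix implementation).

    Model: given covariate x_i and cluster k, y_i ~ Poisson(mu_ik) with
    log(mu_ik) = a_k + b_k * x_i, and P(z_i = k) = weights[k].
    weights: initial mixing proportions (length K)
    coefs:   initial [a_k, b_k] pairs (length K)
    Returns (weights, coefs, loglik); loglik is the observed-data
    log-likelihood computed in the last E-step.
    """
    n, K = len(y), len(weights)
    weights = list(weights)
    coefs = [list(c) for c in coefs]
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: tau[i][k] = P(z_i = k | x_i, y_i), normalized with
        # log-sum-exp to avoid underflow.
        tau, loglik = [], 0.0
        for xi, yi in zip(x, y):
            logs = []
            for k in range(K):
                eta = coefs[k][0] + coefs[k][1] * xi
                # log pi_k + log Poisson(y_i; exp(eta))
                logs.append(math.log(weights[k]) + yi * eta
                            - math.exp(eta) - math.lgamma(yi + 1))
            m = max(logs)
            s = sum(math.exp(v - m) for v in logs)
            loglik += m + math.log(s)
            tau.append([math.exp(v - m) / s for v in logs])
        # M-step (a): mixing proportions are the average memberships.
        weights = [sum(t[k] for t in tau) / n for k in range(K)]
        # M-step (b): each (a_k, b_k) maximizes a tau-weighted Poisson
        # log-likelihood (concave); take a few Newton-Raphson steps.
        for k in range(K):
            a, b = coefs[k]
            for _ in range(n_newton):
                g0 = g1 = h00 = h01 = h11 = 0.0
                for xi, yi, t in zip(x, y, tau):
                    mu = math.exp(a + b * xi)
                    r = t[k] * (yi - mu)  # weighted residual (gradient term)
                    w = t[k] * mu         # weighted information term
                    g0 += r
                    g1 += r * xi
                    h00 += w
                    h01 += w * xi
                    h11 += w * xi * xi
                # Newton step: solve (X^T W X) delta = gradient (2x2, by hand).
                det = h00 * h11 - h01 * h01
                a += (h11 * g0 - h01 * g1) / det
                b += (h00 * g1 - h01 * g0) / det
            coefs[k] = [a, b]
    return weights, coefs, loglik
```

EM can converge to a poor local maximum from bad starting values, which is why initialization matters. That is the issue the splitting and Small-EM scheme in the abstract targets. A practical implementation would also need guards that this sketch lacks:

- against empty components, since a zero weight makes math.log fail;
- against singular Newton systems.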

Type

Journal article

Publication

Computational Statistics & Data Analysis, vol. 93, pp. 97–106

