Matlab Codes and Data Sets


MATLAB Codes : inference in logistic regression models                     [Updated : January 31, 2005]


Download the package PLS-logistic.tar.gz , developed by G. Fort.
contact about the codes : gfort at tsi dot enst dot fr
Demo : download the package, then
    • call the function ExampleBinary in the Matlab command window for a demonstration (binary classification).
    • call the function ExampleMulti in the Matlab command window for a demonstration (multi-class classification).

The package contains :  
  • Codes for the 2-class case :
    • IRRLS : a  Matlab function called by RPLS.

    • ExampleBinary : an example of Matlab instructions showing how to use the above functions. The different procedures are applied to the estimation of a regression coefficient in a logistic regression model. The data set (from the microarray literature) is characterized by a number of covariates far larger than the number of observations, and by a highly collinear data matrix. In this example, the estimate of the regression coefficient arising from extensions of PLS to GLM is plugged into a logistic discrimination procedure, and (binary) classification is performed. The data set is the Colon data (see below); the publicly available data have to be pre-processed, which is done by calling the function PreProcess.
    • PreProcess : a Matlab function called by ExampleBinary, that codes pre-processing steps (that can be applied to Leukemia data, Colon data, Prostate data, ...).
  • Codes for the multi-class case :
    • MIRRLS : a  Matlab function called by MRPLS.
    • ExampleMulti : an example of Matlab instructions showing how to use the above functions. The different procedures are applied to the estimation of a regression coefficient in a logistic regression model. The data set (from the microarray literature) is characterized by a number of covariates far larger than the number of observations, and by a highly collinear data matrix. In this example, the estimate of the regression coefficient arising from extensions of PLS to GLM is plugged into a polychotomous discrimination procedure, and (multi-class) classification is performed. The data set is the Leukemia data (see below); the publicly available data have to be pre-processed, which is done by calling the function PreProcess (same as above).
  •  Help
    • Each function comes with a help text. For help on the function RPLS, for example, type help RPLS in the MATLAB command window.
    • The input and output variables of the different functions are outlined below :

INPUT arguments of the functions NR, IRPLS, IRPLSF, RPLS, RIDGE, MNR, MIRPLSF, MRPLS, MRIDGE :
    • Y contains the {0, ..., c}-valued response variable from the learning set : matrix nLearn x 1.
    • X contains the data matrix : matrix nLearn x p (do NOT include the intercept term; it is added in the function).
    • flag is {0,1}-valued; it is set to 1 if the data matrix has to be standardized (each column is centered and scaled to norm 1).
    • All of these methods have a dimension reduction step based on PLS; Kappa contains the number of PLS components.
    • All of these methods contain at least one iterative part; Parameters is a vector that contains the maximal number of iterations and a tolerance for the stopping criterion. A suggestion for the definition of this variable is given in the help of each function.
    • In the multi-class case, NbrClass is the number of classes. For example, with 5 classes, NbrClass=5 and Y is {0, ..., 4}-valued.
    • RPLS and MRPLS also depend on a shrinkage parameter; Lambdarange is a range of candidate values; the computation of the 'optimal' one is part of the RPLS and MRPLS functions.
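
The input variables described above could be prepared along the following lines. This is only a sketch on a small placeholder matrix: the variable names Xs and colNorm, the values chosen for Kappa and Lambdarange, and the ordering of the two entries of Parameters are assumptions made for illustration (the help of each function gives the actual suggestion). The standardization lines show what flag = 1 is described as doing; the package performs this step itself, so they are here only to make the definition concrete.

```matlab
% Placeholder learning set: 4 samples (rows), 6 covariates (columns).
X = magic(6);
X = X(1:4, :);              % nLearn x p, WITHOUT an intercept column
Y = [0; 1; 1; 0];           % nLearn x 1, {0,1}-valued response

% Standardization as described for flag = 1:
% center each column, then scale it to unit Euclidean norm.
% (Implicit expansion below requires MATLAB R2016b or later.)
Xc = X - mean(X, 1);
colNorm = sqrt(sum(Xc.^2, 1));
colNorm(colNorm == 0) = 1;  % guard against constant columns
Xs = Xc ./ colNorm;

% Remaining arguments (all values are illustrative assumptions).
flag        = 1;            % ask the function to standardize X
Kappa       = 1:3;          % candidate numbers of PLS components
Lambdarange = 10.^(-3:2);   % hypothetical grid of shrinkage values
Parameters  = [100, 1e-6];  % ASSUMED order: [max iterations, tolerance]
```

After these lines, every column of Xs has mean 0 and norm 1, which is the property the flag is meant to enforce.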
OUTPUT structure of the functions NR, IRPLS, IRPLSF, RPLS, RIDGE, MNR, MIRPLSF, MRPLS, MRIDGE :
    • The structure contains a field Gamma, that collects the estimate of the regression coefficients
      • 2-class case : the regression coefficient is a (p+1) x 1 matrix. Thus Gamma contains an estimate for each value of the hyper-parameter Kappa : (p+1) x length(Kappa) matrix.
      • Multi-class case : the regression coefficient is a (p+1) x (NbrClass-1) matrix. Thus Gamma contains an estimate for each value of the hyper-parameter Kappa : length(Kappa) x (p+1) x (NbrClass-1) matrix.
    • The structure also collects information on the convergence of the different iterative schemes.
    • The functions RPLS and MRPLS also return the optimal value of the hyperparameter Lambda determined over the range Lambdarange.
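
The two layouts of Gamma described above can be indexed as in the sketch below. The arrays Gamma2 and GammaM are hand-filled placeholders with the stated dimensions, not output of the package, and the variable names are hypothetical. The position of the intercept inside the coefficient vector is not stated on this page; the sketch assumes it is the first entry, which should be checked against the help of the function used.

```matlab
% Placeholder dimensions and coefficient arrays (NOT package output).
p = 4;  NbrClass = 3;  Kappa = [1 2 3];
Gamma2 = zeros(p + 1, numel(Kappa));                % 2-class layout
GammaM = zeros(numel(Kappa), p + 1, NbrClass - 1);  % multi-class layout
Gamma2(:, 2)    = [0.5; 1; -1; 0; 2];
GammaM(2, :, :) = reshape(1:10, 1, p + 1, NbrClass - 1);

k = 2;   % position in the vector Kappa, not the number of components

% 2-class case: one column per value of Kappa.
g2 = Gamma2(:, k);                                  % (p+1) x 1

% Multi-class case: the first dimension runs over the values of Kappa.
% reshape is used rather than squeeze so that the result keeps the
% shape (p+1) x (NbrClass-1) even when NbrClass - 1 equals 1.
gM = reshape(GammaM(k, :, :), p + 1, NbrClass - 1);

% Logistic discrimination for one new sample x (1 x p).
% ASSUMPTION: the intercept is the first entry of g2.
x     = [1 0 2 1];
eta   = [1, x] * g2;                % linear predictor
prob1 = 1 / (1 + exp(-eta));        % estimated P(Y = 1 | x)
label = double(prob1 > 0.5);        % predicted class in {0,1}
```

If the data matrix was standardized during estimation, a new sample would need the same transformation before the linear predictor is computed.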




DATA SETS (publicly available) from the Microarray literature.

In our convention, the expression levels of a microarray are collected in a row of the data matrix :
** column #j contains all the values relative to gene #j.
** row #i contains all the values relative to sample #i.
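
As a small illustration of this convention, a table stored the other way round (one row per gene) only needs a transpose. The matrix raw below is a hypothetical placeholder, not one of the data sets listed on this page.

```matlab
% Hypothetical raw table stored genes x samples (3 genes, 2 samples).
raw = [10 20;
       30 40;
       50 60];

X = raw.';                    % samples x genes, the convention used here
[nSamples, nGenes] = size(X);

geneJ   = X(:, 3);            % all values relative to gene #3
sampleI = X(1, :);            % all values relative to sample #1
```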

Some data sets have to be pre-processed before use. The classical pre-processing steps for the first three data sets are coded in the MATLAB function PreProcess (see above).


  • Colon data set :
    • download the data in .txt format from Princeton : data matrix 62 x 2000.
    • see Alon et al. (1999) for a complete description of the data set.
    • 2-class example. To be pre-processed.
  • Prostate data set :
    • download the data from MIT : data matrix 102 x 12600.
    • see Singh et al. (2002) for a complete description of the data set.
    • 2-class example. To be pre-processed.
  • Leukemia data set :
    • download the data in .txt format from MIT : training set (38 x 7129) and test set (34 x 7129).
    • see Golub et al. (1999) for a complete description of the data set.
    • 2- or 3-class example. To be pre-processed.
  • NCI60 data set :
    • download the data from NCI : data matrix 60 x 1415.
  • Ovarian data set :
    • download the data from the Latent Process Decomposition home page : data matrix 54 x 1536.
  • Other famous data sets (breast cancer, diffuse large B-cell lymphoma, ...) :