1. Feature Selection Methods for Efficient Classification of Gene Expression Data

    Dr. B. Chandra

    Abstract - Classification of gene expression data plays a significant role in the prediction and diagnosis of diseases. Gene expression data has a special characteristic: the number of genes far exceeds the number of samples. Not all genes contribute to efficient classification of the samples, so a robust feature selection algorithm is required to identify the important genes that help classify the samples efficiently. The tutorial will focus on both supervised and unsupervised feature selection techniques suited to gene expression data. Supervised feature selection techniques such as Relief-F and mRMR, and unsupervised feature selection techniques such as the Laplacian Score, will be discussed in depth using benchmark microarray datasets. A new feature selection technique based on the statistically defined effective range of features, termed Effective Range based Gene Selection, ERGS (Chandra et al., 2011), will also be covered in depth; it helps identify the most relevant genes responsible for diseases such as leukemia and colon cancer.

    Topics to be covered:

    1. Introduction to filter, wrapper, and hybrid approaches in feature selection for gene expression data
    2. Relief-F feature selection method and its illustration
    3. mRMR feature selection method for gene expression data (both continuous and discrete)
    4. Laplacian Score - an unsupervised feature selection technique
    5. Effective Range based Gene Selection (ERGS) technique, developed by the presenter and published in the Journal of Biomedical Informatics, 2011
    6. Comparison of all methods
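
As an illustration of topic 4, the Laplacian Score can be sketched in a few lines of NumPy. This is a minimal sketch of the commonly used formulation, not material taken from the tutorial itself: the function name `laplacian_score`, the neighbourhood size `k`, the heat-kernel width `t`, and the choice of a symmetrised k-nearest-neighbour graph are all assumptions made for the example. Rows of `X` are samples and columns are genes; a lower score suggests a gene that better preserves the local structure of the samples.

```python
import numpy as np


def laplacian_score(X, k=5, t=1.0):
    """Return one Laplacian Score per column of X (lower is better).

    X : array of shape (n_samples, n_features)
    k : number of nearest neighbours per sample (assumed default)
    t : heat-kernel width for the edge weights (assumed default)
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # k nearest neighbours of each sample, excluding the sample itself.
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]

    # Heat-kernel weights on the neighbour graph, then symmetrise.
    S = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    S = np.maximum(S, S.T)

    deg = S.sum(axis=1)          # diagonal of the degree matrix D
    L = np.diag(deg) - S         # graph Laplacian L = D - S

    # Remove the degree-weighted mean of every feature.
    Xc = X - (deg @ X) / deg.sum()

    num = np.sum(Xc * (L @ Xc), axis=0)           # f^T L f per feature
    den = np.sum(Xc * deg[:, None] * Xc, axis=0)  # f^T D f per feature

    # Constant features have zero weighted variance; give them score +inf.
    scores = np.full(X.shape[1], np.inf)
    np.divide(num, den, out=scores, where=den > 1e-12)
    return scores
```

Genes would then be ranked by ascending score, for example with `np.argsort(laplacian_score(X))`. Because no class labels are used, the ranking depends only on the sample neighbourhood graph, so the result can be sensitive to `k`, `t`, and to how the expression values are scaled.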