Feature Selection for Gene Expression Data Analysis – A Review
DOI: https://doi.org/10.61841/k6renr51
Keywords: Feature Selection Methods, Microarray Gene Expression Data, Gene Selection, Classification
Abstract
Gene selection in microarray data analysis is the process of identifying a small set of informative, relevant genes that allow each sample in a dataset to be assigned to its correct class. Feature selection techniques fall into three categories: filter, wrapper, and embedded methods. Filter methods rank features individually using statistical measures and select relevant features independently of any supervised learning algorithm. Wrapper methods use search strategies to evaluate candidate subsets of features and select the subset that yields the best classification accuracy. In embedded methods, feature selection is incorporated into the training process of the learning algorithm itself. This paper reviews several feature selection methods used to identify significant features in gene expression data for classification.
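To make the three categories concrete, the minimal sketch below shows one representative of each: a mutual-information filter, a forward sequential-search wrapper, and an L1-penalized embedded selector. It assumes scikit-learn is available and uses synthetic data in place of a real microarray matrix; the particular estimators, gene counts, and parameter values are illustrative assumptions, not methods prescribed by the papers reviewed here.

# Illustrative sketch (assumes scikit-learn); synthetic data stands in for a
# real microarray matrix of samples x genes.
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, mutual_info_classif,
                                        SequentialFeatureSelector, SelectFromModel)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic "expression matrix": 100 samples, 200 genes, 10 of them informative.
X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           random_state=0)

# Filter: rank genes individually by mutual information with the class label,
# independently of any classifier, and keep the top 20.
filter_selector = SelectKBest(score_func=mutual_info_classif, k=20)
X_filter = filter_selector.fit_transform(X, y)

# Wrapper: greedy forward search over gene subsets, scoring each candidate
# subset by the cross-validated accuracy of a k-NN classifier.
wrapper_selector = SequentialFeatureSelector(KNeighborsClassifier(n_neighbors=3),
                                             n_features_to_select=10,
                                             direction="forward", cv=3)
wrapper_selector.fit(X, y)

# Embedded: L1-penalized logistic regression drives the coefficients of
# irrelevant genes to zero during training, so selection happens inside fitting.
embedded_selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
embedded_selector.fit(X, y)

print("filter kept:", X_filter.shape[1], "genes")
print("wrapper kept:", int(wrapper_selector.get_support().sum()), "genes")
print("embedded kept:", int(embedded_selector.get_support().sum()), "genes")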
License
Copyright (c) 2020 AUTHOR

This work is licensed under a Creative Commons Attribution 4.0 International License.