Feature Selection for Gene Expression Data Analysis – A Review

Authors

  • Dr. R. Prema, Assistant Professor, Department of MCA, New Horizon College of Engineering, Bangalore

DOI:

https://doi.org/10.61841/k6renr51

Keywords:

Feature Selection Methods, Microarray Gene Expression Data, Gene Selection, Classification

Abstract

Gene selection in microarray data analysis is the process of identifying a small number of informative, relevant genes that allow any sample in the dataset to be assigned to its correct class. Feature selection techniques fall into three categories: filter, wrapper, and embedded methods. Filter methods rank features individually using a statistical score and select the relevant ones independently of any supervised learning algorithm. Wrapper methods use search strategies to explore candidate subsets of features, evaluate each subset with a classifier, and select the subset that gives the best classification accuracy. Embedded methods incorporate feature selection into the training process of the classifier itself. This paper reviews several feature selection methods used to identify significant features in gene expression data for classification.
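The three families of methods described above can be contrasted in a short sketch. This is a minimal, illustrative Python example under stated assumptions, not the method of any paper reviewed here: it uses a hypothetical synthetic two-class "expression matrix" (`make_toy_data`), a signal-to-noise ratio as the filter score, greedy forward selection scored by leave-one-out nearest-centroid accuracy as the wrapper, and an L1-penalized (lasso) least-squares fit as the embedded method. All function names and parameter values (`k`, `max_genes`, `lam`) are illustrative choices.

```python
import random
import statistics


def make_toy_data(n_per_class=20, n_genes=30, n_informative=3, seed=0):
    """Hypothetical two-class expression matrix (rows = samples, columns = genes).
    Only genes 0..n_informative-1 differ between the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += 2.0 if label == 1 else -2.0  # separate the class means
            X.append(row)
            y.append(label)
    return X, y


def snr_filter(X, y, k):
    """Filter: score every gene on its own with a signal-to-noise ratio,
    without consulting any classifier, and keep the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, c in zip(X, y) if c == 0]
        b = [row[g] for row, c in zip(X, y) if c == 1]
        snr = abs(statistics.fmean(a) - statistics.fmean(b)) / (
            statistics.stdev(a) + statistics.stdev(b))
        scores.append((snr, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on a gene subset."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            centroids[c] = [statistics.fmean(r[g] for r in rows) for g in genes]
        dist = {c: sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def forward_wrapper(X, y, max_genes):
    """Wrapper: greedy forward search; each candidate subset is scored by the classifier."""
    selected, best = [], 0.0
    while len(selected) < max_genes:
        candidates = [g for g in range(len(X[0])) if g not in selected]
        acc, g = max((loo_accuracy(X, y, selected + [g]), g) for g in candidates)
        if acc <= best:
            break  # stop once adding a gene no longer improves accuracy
        selected.append(g)
        best = acc
    return selected, best


def lasso_embedded(X, y, lam=0.5, n_iter=100):
    """Embedded: L1-penalized least squares on +/-1 labels fitted by coordinate
    descent; genes whose coefficient ends at exactly zero are discarded by training."""
    n, p = len(X), len(X[0])
    cols = []
    for g in range(p):
        col = [row[g] for row in X]
        mu, sd = statistics.fmean(col), statistics.pstdev(col)
        cols.append([(v - mu) / sd for v in col])  # standardized: mean of x**2 is 1
    # Balanced classes give the target mean 0, so no intercept is fitted here.
    target = [1.0 if c == 1 else -1.0 for c in y]
    beta = [0.0] * p
    resid = target[:]  # resid = target - X @ beta, which starts as the target
    for _ in range(n_iter):
        for g in range(p):
            rho = sum(x * (r + x * beta[g]) for x, r in zip(cols[g], resid)) / n
            new = max(abs(rho) - lam, 0.0) * (1.0 if rho > 0 else -1.0)  # soft-threshold
            if new != beta[g]:
                resid = [r - x * (new - beta[g]) for x, r in zip(cols[g], resid)]
                beta[g] = new
    return [g for g in range(p) if beta[g] != 0.0]


X, y = make_toy_data()
print("filter  :", snr_filter(X, y, k=3))
print("wrapper :", forward_wrapper(X, y, max_genes=3))
print("embedded:", lasso_embedded(X, y))
```

The filter never calls a classifier, the wrapper calls the classifier once per candidate subset (which is why wrappers become far costlier on real microarrays with thousands of genes), and the embedded method obtains its gene subset as a by-product of training, namely the genes whose coefficients remain nonzero.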

Published

31.07.2020

How to Cite

Prema, R. (2020). Feature Selection for Gene Expression Data Analysis – A Review. International Journal of Psychosocial Rehabilitation, 24(5), 6955-6964. https://doi.org/10.61841/k6renr51