Wrapper approaches evaluate the quality of a feature subset through classification accuracy. These algorithms heuristically search for the important gene set within an exponential search space. Backward Feature Elimination (BFE), Ant Colony Optimization, Particle Swarm Optimization (PSO), and Genetic Algorithm (GA) [7-9] are wrapper approaches.
The Gravitational Search Algorithm (GSA) is a global search approach in which Newton's law of gravity and second law of motion are used to search for an optimal solution to optimization problems. GSA looks for optimal feature subsets through a fitness function value and is able to converge quickly toward a global optimum within a limited number of iterations [10,11]. Many GSA-based wrapper methods have been proposed to select important features in real applications [12–15].
For microarray datasets, the filter approach usually has the advantage of a lower computational burden. Wrapper approaches incur high computational overhead in assessing candidate feature subsets, since they run a learning algorithm on the dataset for each feature subset. Because they embed a learning algorithm, wrapper approaches can achieve better classification accuracy than filter methods.
Recursive feature selection approaches have gained increasing attention [18-20]. These methods recursively eliminate gene features from microarray cancer data. They have been observed to improve overall accuracy slightly and to reduce computational cost by using fewer features; however, they can lose the optimal solution.
In this work, we develop a novel classification model called ReliefF-RBGSA-MNB (ReliefF-Recursive Binary Gravitational Search Algorithm-Multinomial Naive Bayes) to accurately classify test data by selecting the most informative and discriminative gene subset. The model integrates ReliefF and RBGSA into a unified approach, thereby obtaining both lower computational cost and higher classification performance. In our wrapper method, we develop a recursive binary GSA (RBGSA) scheme, motivated by [18-21], which gradually transforms a very raw gene set into an optimized one by shrinking the gene set at each iteration. RBGSA selects important features without degrading accuracy while simultaneously reducing the computational cost.
In Section 2, we present the proposed ReliefF-RBGSA-MNB model in detail. The experimental setup and results are given in Section 3. Section 4 provides conclusions.
2. Proposed cancer classification
2.1. The proposed ReliefF-RBGSA-MNB
We use a multivariate filter to remove redundant and irrelevant genes. Before gene pre-filtering, we first perform a data preprocessing step: missing gene values are replaced with average values, and all data are normalized using Eq.(1).
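The preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: Eq.(1) is not reproduced in this text, so per-gene min-max scaling to [0, 1] is an assumption, as is taking the average over the observed values of each gene. The function names `impute_and_normalize` and `_is_missing` are hypothetical.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing expression values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_and_normalize(rows):
    """Mean-impute and min-max scale a samples-by-genes table.

    rows: list of samples, each a list of gene expression values.
    Assumption: the normalization is per-gene min-max scaling to [0, 1];
    the paper's Eq.(1) may define a different normalization.
    """
    n_genes = len(rows[0])
    result = [list(row) for row in rows]
    for g in range(n_genes):
        observed = [row[g] for row in rows if not _is_missing(row[g])]
        # Average of the observed values of this gene (0.0 if none observed).
        mean = sum(observed) / len(observed) if observed else 0.0
        column = [mean if _is_missing(row[g]) else row[g] for row in rows]
        low, high = min(column), max(column)
        span = high - low
        for i, value in enumerate(column):
            # A constant gene carries no information; map it to 0.0.
            result[i][g] = (value - low) / span if span > 0 else 0.0
    return result


if __name__ == "__main__":
    data = [[1.0, None], [3.0, 4.0], [float("nan"), 8.0]]
    print(impute_and_normalize(data))
```

In practice the per-gene averages and ranges would normally be estimated on the training samples only and then reused for the test samples, so that no test information leaks into the preprocessing.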
Then we use ReliefF to reduce the high dimensionality of the gene space. We denote the training set with m samples as D = {(xi, yi) | xi ∈ X, yi ∈ L, 1 ≤ i ≤ m}, where X denotes the original gene set containing m samples of gene expressions, and each sample xi (i = 1, 2, ..., m) consists of n gene features. We use ReliefF to explore X and obtain a smaller subset whose dimensionality is s (s ≤ n); in this subset, each sample consists of s gene features. Given a randomly selected instance R from the training set D, the k nearest neighbour samples from the same class (called Hj) are found, and another k nearest neighbour samples are obtained from the other classes (called Mj(C)). The weight of a feature measures its ability to distinguish different classes. The weight of each feature is calculated through Eq.(2):
in which function
represents the discrepancy between the instance