To implement an efficient survivability prediction system for BC patients, a genetic algorithm-based online gradient boosting (GAOGB) model is proposed in this research. Boosting is an ensemble learning method for improving the accuracy of a given weak learner (Freund, Schapire, & Abe, 1999). Among classification techniques, boosting has shown strong performance, with high accuracy and little tendency to overfit. For online boosting algorithms in practical BC prognosis, the optimality of the parameters is a critical factor. The genetic algorithm (GA) is one of the typical methods applied to BC research for cancer classification and image segmentation tasks in recent years, focusing on optimizing model structures (Bhardwaj & Tiwari, 2015) or determining parameters (Pereira, Ramos, & Do Nascimento, 2014). Parameter selection has been a challenge for online learning, which makes it hard to apply to practical problems such as BC research. Hence, GA, as a classical optimizer, has been applied to tackle parameter selection problems. To formulate a practical BC prognosis model, the GAOGB model is developed. The proposed algorithm and other online learning techniques, such as the Adaptive Boosting (AdaBoost) based online learning algorithm (Parag, Porikli, & Elgammal, 2008) and the Online Sequential Extreme Learning Machine (OSELM) (Liang, Huang, Saratchandran, & Sundararajan, 2006), are tested and compared on three UCI datasets as well as the Surveillance, Epidemiology, and End Results (SEER) Breast Cancer dataset. The GAOGB model is comprehensively compared to the other models on measures such as accuracy and area under the ROC curve (AUC). The strong performance of the GAOGB model indicates the benefit of applying online learning to practical BC research, which is expected to enhance both learning effectiveness and retraining efficiency.
In addition, this research shows the potential to promote online learning research by focusing on issues of adaptiveness and parameter tuning.
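As a rough illustration of how a genetic algorithm can select a parameter for a learner, the following minimal Python sketch searches for a single learning rate. It is not the GAOGB procedure itself, which is presented in Section 3. The fitness function `validation_error`, its optimum at 0.3, and all settings (population size, number of generations, mutation scale) are hypothetical placeholders; in practice the fitness would come from training and scoring the boosting model with the candidate parameter.

```python
import random


def validation_error(learning_rate):
    # Hypothetical stand-in for the validation error of a boosting model.
    # A real fitness function would train and score the model instead.
    return (learning_rate - 0.3) ** 2


def genetic_search(fitness, low, high, pop_size=20, generations=40,
                   mutation_scale=0.05, seed=0):
    """Minimize `fitness` over [low, high] with a simple elitist GA."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation

    for _ in range(generations):
        # Selection: keep the better half (lower error is fitter).
        population.sort(key=fitness)
        parents = population[: pop_size // 2]
        history.append(fitness(parents[0]))

        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = w * a + (1 - w) * b              # arithmetic crossover
            child += rng.gauss(0.0, mutation_scale)  # Gaussian mutation
            children.append(min(high, max(low, child)))  # clip to the bounds

        # Elitism: parents survive, so the best error never gets worse.
        population = parents + children

    best = min(population, key=fitness)
    return best, history


best_rate, error_history = genetic_search(validation_error, 0.0, 1.0)
print(f"selected learning rate: {best_rate:.3f}")
```

Because the parents are carried over unchanged, the best error recorded in `error_history` cannot increase from one generation to the next. This property is useful when each fitness evaluation is expensive, as it is when every candidate requires retraining a model.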
The rest of this paper is organized as follows. Section 2 presents the current trends of BC research and online boosting research. Section 3 introduces the methodology of the proposed GAOGB model. The experimental design and results are discussed in Section 4. Section 5 presents conclusions and prospects for future extensions.
2. Literature review
Since the 1990s, basic data mining models have been applied to BC diagnosis and prognosis, including artificial neural networks (ANN) (Ravdin & Clark, 1992), linear programming (LP) (Mangasarian, Street, & Wolberg, 1995), and the C4.5 and C5.0 classifiers (Delen, Walker, & Kadam, 2005; Quinlan, 1996). Among the various model structures applied to BC research in recent years, ensemble learning methods have shown clear advantages over other models due to their ability to overcome the limitations and aggregate the strengths of a group of base models. An ensemble ANN model achieved a 94.1% generalization accuracy on a BC dataset with 692 specimens of fine needle aspirates of breast lumps (FNAB) (Sharkey, Sharkey, & Cross, 1998). West, Mangiameli, Rampal, and West (2005) proposed a bagging ensemble model with a diverse collection of base learners, including linear discriminant analysis (LDA), logistic regression (LR), and the multilayer perceptron (MLP). Their model outperformed other pure-bagging ensembles and single models by achieving a 97.1% generalization accuracy on the WBCD. To improve training efficiency, Lavanya and Rani (2012) introduced a classification and regression tree (CART) classifier with feature selection and bagging. Evaluated on the WBCD, their model achieved a good balance between effectiveness and efficiency, with a 97.9% testing accuracy and 1.38 s of training time. To reduce diagnosis variance and enhance diagnostic effectiveness, Wang, Zheng, Yoon, and Ko (2017) proposed an SVM-based ensemble learning algorithm using the weighted area under the receiver operating characteristic (ROC) curve. Compared to the state-of-the-art SVM model, their model achieved an accuracy of 76.4% on the balanced SEER BC dataset, with a 97.9% reduction in variance and a 33.3% improvement in training accuracy, which supports the effectiveness of ensemble modeling techniques.
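The way an ensemble can aggregate the strengths of its base models may be illustrated with a simple majority vote. The following Python sketch uses made-up toy labels and predictions that do not come from any of the cited studies; the names `model_a`, `model_b`, and `model_c` are hypothetical. It shows only the voting step, not bagging or boosting as used in the works above.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    # zip(*...) groups together the predictions made for the same sample.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]


def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy data for illustration only: each model errs on two different samples.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [0, 0, 1, 1, 0, 0, 1, 1]  # wrong on samples 0 and 7
model_b = [1, 1, 1, 0, 0, 0, 1, 0]  # wrong on samples 1 and 3
model_c = [1, 0, 0, 1, 1, 0, 1, 0]  # wrong on samples 2 and 4

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("A", model_a), ("B", model_b), ("C", model_c),
                    ("ensemble", ensemble)]:
    print(name, accuracy(preds, labels))
```

In this toy case each base model reaches an accuracy of 0.75, while the vote recovers every label because no two models err on the same sample. When base models make correlated errors, the gain from voting shrinks accordingly.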