
2010) for this purpose. The learning process usually starts by running the K2 algorithm to find an improved solution. If an optimal network structure is obtained within a given period of time, the process stops and the structure is returned; otherwise, one of the other methods (BB, IP or DP) is applied, depending on the number of variables.

3.2. Predictive models

A decision tree is a supervised model and a useful, understandable and simple method for classification. Its input is labeled training data and its output is an organized, sequential tree structure. One of its basic advantages is that it decomposes a complex problem into smaller and simpler problems. Classification consists of labeling an input instance by traversing the tree from the root to a leaf node, where the leaf nodes are class labels. Building a decision tree involves two issues: (1) growing the tree until the training set instances are accurately classified, and (2) pruning unnecessary nodes in order to improve the overall accuracy (Zheng et al., 2014; Buyse, 2006; Jaakkola and Meila, 2010). C4.5 is a decision tree algorithm that can be used for both discrete and continuous data; hence, we used it for our classification task.
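The root-to-leaf traversal described above can be sketched in a few lines of Python. This is only an illustration of how a finished tree classifies an instance, not the C4.5 learning procedure itself; the `Leaf` and `Node` classes, the `classify` function and the toy weather-style tree are hypothetical names and data introduced here. Discrete attributes branch on the attribute value, and continuous attributes branch on a threshold test, which mirrors the two kinds of data C4.5 can handle.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Leaf:
    label: str  # class label stored at a leaf node


@dataclass
class Node:
    feature: str                       # attribute tested at this node
    children: Dict[Any, Any]           # branch key -> subtree (Node or Leaf)
    threshold: Optional[float] = None  # set only for continuous attributes


def classify(tree, instance):
    """Walk from the root to a leaf and return the leaf's class label."""
    node = tree
    while isinstance(node, Node):
        value = instance[node.feature]
        # Continuous attribute: binary test "value <= threshold".
        # Discrete attribute: one branch per attribute value.
        key = value <= node.threshold if node.threshold is not None else value
        node = node.children[key]
    return node.label


# Hand-built toy tree (hypothetical data, not learned from a dataset).
toy_tree = Node("outlook", {
    "sunny": Node("humidity", {True: Leaf("play"), False: Leaf("stay")},
                  threshold=75.0),
    "overcast": Leaf("play"),
    "rain": Node("windy", {True: Leaf("stay"), False: Leaf("play")}),
})

prediction = classify(toy_tree, {"outlook": "sunny", "humidity": 70})
```

In this sketch each internal node solves one small sub-problem (a single attribute test), which is the decomposition property mentioned above.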

3.2.2. K-nearest neighbor classifier

This is a well-known instance-based method. The class label of a new instance is predicted by a majority vote among the labels of its k nearest neighbors in the training set, according to a chosen distance measure.

K-NN involves two important issues: (1) the number of neighbors (k) and (2) the distance measure (d). If the number of neighbors is low, outliers may affect the results, while a large number of neighbors may introduce interference from unrelated data (Zheng et al., 2014; Buyse, 2006; Clark and Niblett, 1989; Purwar and Singh, 2015; Cortes and Vapnik, 1995). In this study, the optimal number of neighbors is obtained using a cross-validation procedure.
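The majority vote and the selection of k by cross-validation can be sketched as follows. The text does not specify the cross-validation scheme or the distance measure, so leave-one-out validation and Euclidean distance are assumptions made here for brevity; the function names and the toy data (two clusters plus one deliberately mislabeled outlier) are likewise hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k, distance=euclidean):
    """Majority vote among the k training instances closest to `query`.

    `train` is a list of (features, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each instance from all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def select_k(data, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))


# Toy data: two clusters, plus one "b" outlier placed inside the "a" cluster.
data = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.8, 1.1), "a"), ((1.1, 1.3), "a"),
    ((4.0, 4.0), "b"), ((4.2, 3.9), "b"), ((3.8, 4.1), "b"), ((4.1, 4.3), "b"),
    ((0.9, 0.9), "b"),  # outlier
]

query = (0.9, 0.85)
label_k1 = knn_predict(data, query, k=1)  # decided by the outlier alone
label_k3 = knn_predict(data, query, k=3)  # outlier is outvoted
best_k = select_k(data, candidates=[1, 3, 5])
```

With k = 1 the query takes the outlier's label, while with k = 3 the two genuine neighbors outvote it, which is the sensitivity to small k noted above.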

The support vector machine (SVM) (Cortes and Vapnik, 1995) is a kernel-based method that is also widely used for classification. If the training set is linearly separable, SVM constructs a hyperplane with maximum margin; otherwise, the data are mapped to a higher-dimensional space in which they become linearly separable (Vidyasagar, 2017). Although SVM is designed for datasets containing two classes, and its basic idea is to find the optimal discrimination between two classes, it can be extended to multi-class datasets through the one-against-all (OAA) and one-against-one (OAO) approaches. The OAA approach requires k separators for k-class classification, such that every separator separates one class from all other classes. The OAO approach instead creates one binary support vector machine for each possible pair of classes, so it requires k(k − 1)/2 binary support vector machines (Choi and Jiang, 2010).
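The difference between the two multi-class schemes is easiest to see by enumerating the binary problems each one creates. The sketch below does only that; it does not train any SVM, and the function names and class labels are hypothetical.

```python
from itertools import combinations


def one_against_all_tasks(classes):
    """One binary problem per class: that class versus all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def one_against_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


classes = ["c1", "c2", "c3", "c4"]  # k = 4 hypothetical class labels
oaa = one_against_all_tasks(classes)
oao = one_against_one_tasks(classes)

k = len(classes)
print(len(oaa), "OAA separators for k =", k)
print(len(oao), "OAO separators for k =", k)
```

For k = 4 this gives 4 separators under OAA and 4 · 3 / 2 = 6 under OAO; the OAO count grows quadratically in k, although each of its binary problems involves only the instances of two classes.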

4. Proposed approach

As stated before, the Bayesian network is a strong model for representing conditional dependencies between variables, but it does not generalize well to continuous variables because prior knowledge of the conditional densities is required. For this reason, the Bayesian network is suitable for imputing missing values when the variables have finite domains. On the other hand, tensor factorization estimates a missing feature as a linear combination of the others (which are not necessarily continuous or discrete), and therefore the estimated value is usually precise. However, constructing tensors in the presence of a large number of missing values is error-prone. Therefore, for better value estimation, the proposed approach first estimates the categorical values using the Bayesian network and then feeds the partially completed dataset to the tensor factorization approach to impute the continuous missing values.
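The ordering of the two stages can be sketched as a small pipeline. The estimators below are deliberately simple stand-ins and are not the methods of the proposed approach: the most frequent observed value replaces Bayesian network inference, and a within-group mean replaces tensor factorization. All function names, column names and data are hypothetical; only the control flow (categorical values first, then continuous values estimated from the partially completed data) follows the description above.

```python
from collections import Counter
from statistics import mean


def fill(rows, col, estimate):
    """Return new rows in which each missing (None) entry of `col` is estimated."""
    return [
        {**row, col: estimate(rows, row, col)} if row[col] is None else dict(row)
        for row in rows
    ]


def mode_estimate(rows, row, col):
    """Stand-in for Bayesian network inference: most frequent observed value."""
    observed = [r[col] for r in rows if r[col] is not None]
    return Counter(observed).most_common(1)[0][0]


def make_group_mean_estimate(group_col):
    """Stand-in for tensor factorization: mean over rows sharing `group_col`."""
    def estimate(rows, row, col):
        observed = [r for r in rows if r[col] is not None]
        same = [r[col] for r in observed if r[group_col] == row[group_col]]
        # Fall back to the overall mean if no row shares the group value.
        return mean(same or [r[col] for r in observed])
    return estimate


def two_stage_impute(rows, categorical_cols, continuous_cols,
                     categorical_estimate, continuous_estimate):
    # Stage 1: complete the categorical (finite-domain) columns.
    for col in categorical_cols:
        rows = fill(rows, col, categorical_estimate)
    # Stage 2: estimate continuous columns from the partially completed rows.
    for col in continuous_cols:
        rows = fill(rows, col, continuous_estimate)
    return rows


rows = [
    {"group": "x", "score": 2.0},
    {"group": None, "score": 4.0},
    {"group": "x", "score": None},
    {"group": "y", "score": 9.0},
]
completed = two_stage_impute(
    rows, ["group"], ["score"],
    mode_estimate, make_group_mean_estimate("group"),
)
```

In this toy run the second stage benefits from the first: the missing score is averaged over both "x" rows only because stage 1 has already filled in the missing group value.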