Original article
12 (8); 2141-2149
doi: 10.1016/j.arabjc.2014.12.021

QSAR study of CK2 inhibitors by GA-MLR and GA-SVM methods

Department of Chemistry, Payame Noor University (PNU), P.O. Box 19395-3697, Tehran, Iran
Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece
Center of Excellence in Electrochemistry, Faculty of Chemistry, University of Tehran, P.O. Box 14155-6455, Tehran, Iran
Biosensor Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran

⁎Corresponding author. Tel.: +98 45 33515003; fax: +98 45 33513005. pourbasheer@ut.ac.ir (Eslam Pourbasheer)

Disclaimer:
This article was originally published by Elsevier and was migrated to Scientific Scholar after the change of Publisher.

Peer review under responsibility of King Saud University.

Abstract

In this work, quantitative structure–activity relationship (QSAR) models were developed to predict the activity of a series of CK2 inhibitors using multiple linear regression (MLR) and support vector machine (SVM) methods. The data set of 48 compounds was randomly divided into a training set and a test set. The most relevant molecular descriptors were selected using a genetic algorithm (GA) as the feature selection tool. The predictive ability of the models was evaluated using the Y-randomization test, cross-validation and an external test set. The GA-MLR model with six selected molecular descriptors showed high statistical quality (R2train = 0.893, R2test = 0.921, Q2LOO = 0.844, F = 43.17, RMSE = 0.287). Comparison of GA-MLR and GA-SVM shows that GA-SVM gave better results for the training set compounds; however, the predictive quality of both models is acceptable. The results suggest that atomic masses and polarizabilities, as well as the number of heteroatoms in the molecules, are the main independent factors contributing to the CK2 inhibition activity. The results of this study can be used to design new and potent CK2 inhibitors.

Keywords

QSAR
Support vector machine
Genetic algorithm
Multiple linear regressions
CK2 inhibitors

1 Introduction

Protein kinase CK2 (casein kinase 2) is a ubiquitous serine/threonine protein kinase located in the cytoplasm and the nucleus (Meggio and Pinna, 2003). The protein is a heterotetrameric complex composed of two catalytic isoforms and regulatory subunits in different combinations (Meggio and Pinna, 2003; Guerra and Issinger, 2008; Faust et al., 2000; Orlandini et al., 1998). CK2 plays a key role in proliferation (Guerra and Issinger, 2008, 1999), transformation (Ruzzene and Pinna, 2010), apoptosis (Guerra and Issinger, 2008; Ahmad et al., 2008), survival (Guerra and Issinger, 2008; Ruzzene and Pinna, 2010; Barata, 2011) and cell growth (Meggio and Pinna, 2003; Guerra and Issinger, 2008; Duncan et al., 2010). Besides the involvement of CK2 in various cellular functions (Meggio and Pinna, 2003; Ahmad et al., 2008; Canton and Litchfield, 2006), overexpression of CK2 has been linked to a number of cancers (Faust et al., 2000; Kramerov et al., 2006; Duncan and Litchfield, 2008; Landesman-Bollag et al., 2001), including breast (Drygin et al., 2011), renal (Landesman-Bollag et al., 2001), leukemias (Piazza et al., 2013), prostate and lung cancers (Guerra and Issinger, 2008). Increased levels of CK2 can also result in several central nervous system diseases such as Alzheimer's disease, Parkinson's disease, brain ischemia and memory impairments (Meggio and Pinna, 2003; Landesman-Bollag et al., 2001; Sarno and Pinna, 2008). For this reason, CK2 is a possible target for the treatment of various cancers and nervous system diseases (Sarno and Pinna, 2008). Designing new drugs requires screening of their estrogenic and biological activities; however, these experiments require biological materials from human and rat trials, are costly and time-consuming, and may generate toxic products. It is therefore of interest to employ a model that predicts the biological activities of newly designed compounds before synthesis.

There has been growing interest in computational methods for predicting the biological activities of compounds, since new compounds with higher inhibitory activities cannot be designed without knowledge of their biological features. In this regard, there is a well-known method that provides useful information based on the biological activities and chemical structures of the designed molecules (Habibi-Yangjeh et al., 2008). Quantitative structure–activity relationship (QSAR) modeling (Pourbasheer et al., 2013; Timmerman, 1995) is widely used to predict the biological activities of compounds from experimental data and chemical structures (Habibi-Yangjeh et al., 2009).

Since a QSAR model is built on molecular descriptors, selecting the most appropriate descriptors is one of the essential steps in a QSAR study. Effective and widely used variable selection methods include stepwise (SW) selection (Draper and Smith, 1981; Hocking, 1976), genetic algorithms (GAs) (Holland, 1975), and simulated annealing (Shen et al., 2003). These methods select the most relevant descriptors, from which a predictive QSAR model can be built using methods such as multiple linear regression (MLR) (Pourbasheer et al., 2017), partial least squares (PLS) (Khajehsharifi et al., 2009), artificial neural networks (ANN) (Habibi-Yangjeh et al., 2008), and support vector machines (SVM) (Pourbasheer et al., 2014c). In the present work, SVM was used as a nonlinear method, with the genetic algorithm as the variable selection tool, to construct the QSAR model, and its results were compared with those of MLR, a linear method, using the same selection tool. The primary aim of this work was to develop new QSAR models relating molecular structure to CK2 inhibition activity using the GA-MLR and GA-SVM methods, and then to compare the results of the two models.

2 Methodology

2.1 Data set

In this work, a data set of 48 CK2 inhibitors, with their inhibition activities expressed as IC50 values, was taken from the literature (Pierre et al., 2010). The activity data [IC50 (μM)] for each molecule were converted to the negative logarithmic scale [pIC50 (M)] to give numerically larger values, and were then used as the response for the subsequent QSAR analysis. The data set was randomly split into a training set (38 compounds) and a test set (10 compounds), corresponding to a ratio of 80% to 20%, respectively. In the splitting step, however, an appropriate distribution of chemical structures and biological activities was taken into account when selecting the test set compounds. The chemical structures of the studied molecules and their corresponding activity data are listed in Table 1.
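The IC50-to-pIC50 conversion can be sketched as follows. This is a minimal illustration assuming the usual definition pIC50 = -log10(IC50 in mol/L); the function name and the example IC50 values are hypothetical and are not taken from Table 1.

```python
import math

def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)

# Illustrative IC50 values only (not from the data set).
for ic50 in (1.0, 0.1, 0.001):
    print(f"IC50 = {ic50} uM -> pIC50 = {pic50_from_ic50_um(ic50):.2f}")
```

On this scale a tenfold lower IC50 raises pIC50 by one unit, so more potent inhibitors get larger response values.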

Table 1 Chemical structures and the corresponding experimental and predicted pIC50 values by GA-MLR and GA-SVM methods.
No R1 R2 R3 X Y Exp. GA-MLR GA-SVM
1 —CO2H —CO —NH— 5.68 5.79 5.48
2 —CO2H —CO —NH((CH2)3OH) 5.82 5.94 6.02
3 —CO2H —(C—O(CH2)3OH) =N— 6.00 6.00 5.84
4 —CO2H —(C—NH(CH2)3OH) =N— 6.12 5.92 5.94
5 —CO2H H —(CH2)2OH 5.90 6.09 6.10
6 —CO2H H —(CH2)2NMe2 6.99 6.36 6.80
7a 5.75 5.74 5.84
8a —CO2H H Phenyl 7.04 6.56 6.64
9 —CO2H Me Phenyl 5.97 6.26 6.17
10a —CO2H H 2-Me-Phenyl 6.01 6.37 6.32
11 C-(1H-tetrazol-5-yl) H Phenyl 7.02 6.61 6.82
12 —CO2H H —(CH2)2Ph 6.29 6.34 6.49
13 —CO2H H —(4-F-phenyl) 6.66 6.77 6.86
14 —CO2H H —(3-F-phenyl) 7.17 7.26 7.07
15 —CO2H H —(4-Cl-phenyl) 6.75 7.03 6.95
16 —CO2H H —(3-Cl-phenyl) 7.50 7.55 7.48
17 —CO2H H —(3-MeO-phenyl) 7.11 6.89 7.00
18 —CO2H H —(3-acetyl phenyl) 7.55 7.13 7.35
19a —CO2H H —(3-(PhO)-phenyl) 6.40 6.76 7.10
20a —CO2H H —(3-(CONHMe)-phenyl) 6.89 6.75 7.11
21 C-(1H-tetrazol-5-yl) H —(3-Cl-phenyl) 6.89 7.32 6.86
22 C-(1H-tetrazol-5-yl) H —(3-F-phenyl) 7.12 7.01 6.92
23 8.22 8.01 8.02
24 —CO2H H —(CH2)2NMe2 7.60 7.73 7.60
25 —CO2H H —cyclopentyl 7.57 7.68 7.49
26 —CO2H H —OMe 8.10 8.42 7.90
27a —CO2H H —cyclopropyl 7.80 8.19 7.81
28 —CO2H H —(CH2)2O-i-Pr 7.96 7.98 7.76
29 —CO2H H —(CH2)phenyl 8.04 7.95 8.06
30 —CO2H H —(CH2)2phenyl 8.52 7.84 8.32
31 —CO2H H —(CH2)3phenyl 7.80 8.04 8.00
32 —CO2H H —(3-MeO-phenyl) 8.40 8.18 8.24
33a —CO2H H —(3-Cl, 4-F-phenyl) 8.40 8.46 8.13
34 —CO2H H —(3-F-phenyl) 8.30 8.38 8.26
35 —CO2H H —(2-Cl-phenyl) 8.10 8.19 7.90
36 —CO2H H —(3-Cl-phenyl) 9.00 8.49 8.79
37a —CO2H H —(4-Cl-phenyl) 8.15 8.40 8.44
38a —CO2H H —(3-acetyl phenyl) 8.52 8.40 8.25
39 —CO2H H —(3-CN-phenyl) 8.40 8.42 8.32
40 —CO2H H —(4-(PhO)-phenyl) 7.16 7.50 7.36
41 —CO2H H —(3-(PhO)-phenyl) 7.72 7.80 7.91
42a —CO2H H —(3-(SO2NH2)-phenyl) 7.37 7.25 7.05
43 H H C-(1H-tetrazol-5-yl) N CH 7.35 7.01 7.15
44 H H —CONH2 N CH 6.38 7.06 6.82
45 H Me —CO2H N CH 8.22 8.49 8.42
46 —CO2H H H N CH 6.19 6.11 6.12
47 H H —CO2H CH N 6.66 6.87 6.46
48 H H —CO2H N CH 8.15 8.00 7.95
a Test set compounds.

2.2 Descriptor calculation

All 2D chemical structures of the compounds were drawn in HyperChem 7.03 (HyperChem, 2002) and pre-optimized using the molecular mechanics force field (MM+). The final optimization was performed using the semi-empirical AM1 method with a root mean square gradient of 0.01 kcal mol−1. The molecular descriptors for each molecule were calculated using the DRAGON v2.2 package (Todeschini et al., 2005). A total of 1481 molecular descriptors were calculated for each molecule, including constitutional, functional groups, atom-centered fragments, topological, Burden eigenvalues, walk and path counts, autocorrelations, connectivity indices, information indices, topological charge indices, eigenvalue-based indices, Randic molecular profiles from the geometry matrix, geometrical, weighted holistic invariant molecular (WHIM) and geometry, topology, and atom-weights assembly (GETAWAY) descriptors. The calculated descriptors were then screened for constant and near-constant descriptors, which were removed from the data set. The remaining descriptors were then inspected, together with their correlation with the inhibitory activities, to reduce redundancy in the data, and collinear descriptors (r > 0.9) were deleted. Finally, 387 of the 1481 molecular descriptors remained.
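The pre-filtering step can be sketched as below. This is an illustration, not the authors' procedure: the near-constant threshold `min_std` and the tie-breaking rule (of two collinear descriptors, keep the one more correlated with the activity) are assumptions, since the text only states the r > 0.9 cut-off. The toy matrix is invented.

```python
import numpy as np

def prefilter_descriptors(X, y, min_std=1e-6, max_r=0.9):
    """Return indices of columns kept after removing constant and collinear descriptors.

    X: (n_compounds, n_descriptors) array; y: (n_compounds,) activity vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: drop constant and near-constant columns.
    candidates = [j for j in range(X.shape[1]) if X[:, j].std() > min_std]
    # Step 2: rank the remaining columns by |correlation with activity|, best first.
    r_with_y = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in candidates}
    ranked = sorted(candidates, key=lambda j: r_with_y[j], reverse=True)
    # Step 3: keep a column only if it is not collinear with any column already kept.
    kept = []
    for j in ranked:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_r for k in kept):
            kept.append(j)
    return sorted(kept)

# Toy example: column 1 is nearly a multiple of column 0, column 2 is constant.
a = np.arange(1.0, 9.0)
d = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
X_demo = np.column_stack([a, 2 * a + 0.1 * d, np.full(8, 5.0), d])
y_demo = a.copy()
kept_demo = prefilter_descriptors(X_demo, y_demo)
print(kept_demo)
```

In the toy example the constant column and the redundant copy are discarded, leaving the two informative columns.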

2.3 Variable selection

Selecting the most relevant descriptors is one of the important steps in QSAR analysis, since the model is built on the selected variables. In general, the problem is to find a subset of the available descriptors such that the derived model predicts the inhibitory activity with minimum error relative to the experimental data. In this study, the genetic algorithm was employed to select the most relevant descriptors with respect to an objective function (Waller and Bradley, 1999; Aires-de-Sousa et al., 2001; Ahmad and Gromiha, 2003; Hunger and Huttner, 1999). The first step of the genetic algorithm is to generate a large number of randomly selected subsets of variables, called chromosomes, in which each included variable is called a gene (Ghasemi and Saaidpour, 2007; Holland, 1975). These subsets are then evaluated by their fitness in predicting the inhibitory activity values. Here, the fitness function was the leave-one-out cross-validated correlation coefficient (Q2LOO, derived from MLR) (Leardi et al., 1992). The next step is to exclude the worse subsets and breed the remaining ones. Finally, mutation is carried out. The use of genetic algorithms for feature selection was described by Leardi et al. (1992). The genetic algorithm used here as the selection tool was written in Matlab 6.5 (Mathworks, 2005). MLR and SVM, both implemented in Matlab 6.5 (Mathworks, 2005), were then employed as modeling tools to correlate the GA-selected descriptors with the biological response linearly and non-linearly, respectively.
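The loop described above (random chromosomes, Q2LOO fitness, exclusion of the worse subsets, breeding, mutation) can be sketched in Python. This is a toy version and not the authors' Matlab code: the population size, number of generations, mutation rate, crossover rule and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out cross-validated Q^2 of an ordinary least-squares (MLR) model."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        A = np.column_stack([np.ones(n - 1), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        pred = coef[0] + X[i] @ coef[1:]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, n_genes=6, pop_size=30, n_gen=40, p_mut=0.2, seed=0):
    """Pick n_genes descriptor columns that maximise Q^2_LOO (toy GA)."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]

    def fitness(chrom):
        return q2_loo(X[:, chrom], y)

    # Initial population: random subsets of descriptor indices (chromosomes).
    pop = [rng.choice(n_desc, n_genes, replace=False) for _ in range(pop_size)]
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]              # exclude the worse half
        children = []
        while len(survivors) + len(children) < pop_size:
            i, j = rng.choice(len(survivors), 2, replace=False)
            pool = np.union1d(survivors[i], survivors[j])  # crossover: pool parents' genes
            child = rng.choice(pool, n_genes, replace=False)
            if rng.random() < p_mut:                  # mutation: swap in one unused gene
                unused = np.setdiff1d(np.arange(n_desc), child)
                child[rng.integers(n_genes)] = rng.choice(unused)
            children.append(child)
        pop = survivors + children
    best = max(pop, key=fitness)
    return np.sort(best), fitness(best)

# Synthetic demonstration: the response depends on columns 0 and 5 only.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(38, 20))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 5] + rng_demo.normal(scale=0.1, size=38)
selected, q2 = ga_select(X_demo, y_demo, n_genes=2, pop_size=20, n_gen=15)
print(selected, round(q2, 3))
```

Keeping the better half of each generation unchanged means the best subset found so far is never lost, which is one simple way to make the search monotone in fitness.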

3 Results and discussion

First, the data set of 48 compounds was divided into a training set of 38 compounds and a test set of 10 compounds, a ratio of 80% to 20%, respectively. The split was made randomly; however, the test set compounds were chosen so as to preserve the distribution of structural diversity and biological data. The training set was used to build the model, and the predictive ability of the constructed model was then evaluated on the test set compounds.

3.1 Genetic algorithm-multiple linear regression method

The genetic algorithm was used to select the most appropriate descriptors. Based on the selected descriptors, multiple linear regression analysis was performed on the training set and then evaluated on the test set. Using the genetic algorithm, six descriptors that contribute to the inhibition activity were selected: GATS6m, GATS8p, RDF080m, E3s, R8m and C-028. The variance inflation factor (VIF) (Agrawal and Khadikar, 2001) was calculated for each of the six selected descriptors to check for multi-collinearity, as follows:

$$\mathrm{VIF} = \frac{1}{1 - r^{2}} \tag{1}$$

In this formula, r is the multiple correlation coefficient between one descriptor and the other descriptors in the constructed QSAR model. A VIF equal to 1 indicates no inter-correlation; a value between 1.0 and 5.0 indicates that the model is acceptable; and a value larger than 10.0 indicates that the model is unstable and not acceptable. The correlation coefficients and the corresponding VIF values for the descriptors selected by GA-MLR are shown in Table 2. According to Table 2, the VIF values of all selected descriptors are less than 2.5.

Table 2 The correlation coefficient of selected descriptors and corresponding VIF values based on GA-MLR.
GATS6m GATS8p RDF080m E3s R8m C-028 (VIF)a
GATS6m 1 0 0 0 0 0 1.38
GATS8p 0.287 1 0 0 0 0 1.60
RDF080m −0.453 −0.112 1 0 0 0 2.37
E3s −0.155 −0.245 0.036 1 0 0 1.34
R8m −0.335 −0.248 0.715 0.240 1 0 2.44
C-028 0.231 0.550 −0.065 −0.362 −0.042 1 1.73
a Variance inflation factors.
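As an illustration of Eq. (1), the VIF of every descriptor can be read off the diagonal of the inverse of the descriptor correlation matrix, because that diagonal equals 1/(1 − r²) with r the multiple correlation of one descriptor on the others. The sketch below applies this to the rounded coefficients of Table 2; the function name is hypothetical, and small deviations from the tabulated VIF values would not be surprising given the rounding.

```python
import numpy as np

def vif_from_correlation(R):
    """VIF_j = 1 / (1 - r_j^2), taken as the diagonal of the inverse correlation matrix."""
    R = np.asarray(R, dtype=float)
    return np.diag(np.linalg.inv(R))

# Lower triangle of Table 2, mirrored into a full symmetric matrix.
names = ["GATS6m", "GATS8p", "RDF080m", "E3s", "R8m", "C-028"]
lower = [
    [1.0],
    [0.287, 1.0],
    [-0.453, -0.112, 1.0],
    [-0.155, -0.245, 0.036, 1.0],
    [-0.335, -0.248, 0.715, 0.240, 1.0],
    [0.231, 0.550, -0.065, -0.362, -0.042, 1.0],
]
R = np.eye(6)
for i, row in enumerate(lower):
    for j, r in enumerate(row):
        R[i, j] = R[j, i] = r

vif_values = vif_from_correlation(R)
for name, v in zip(names, vif_values):
    print(f"{name}: VIF = {v:.2f}")
```

For two descriptors with a pairwise correlation of 0.6, the same function returns 1/(1 − 0.36) = 1.5625 for both, which is a quick sanity check of the identity.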

The genetic algorithm-multiple linear regression (GA-MLR) analysis yielded a predictive QSAR model with six descriptors, given by the following equation:

$$
\begin{aligned}
\mathrm{pIC}_{50} ={}& 10.64\,(\pm 0.4686) - 150.5\,(\pm 32.15)\,\mathrm{GATS6m} - 0.6157\,(\pm 0.1548)\,\mathrm{GATS8p} \\
& - 0.0869\,(\pm 0.0328)\,\mathrm{RDF080m} - 6.84\,(\pm 0.7624)\,\mathrm{E3s} + 5.831\,(\pm 0.2971)\,\mathrm{R8m} \\
& - 0.961\,(\pm 0.1218)\,\text{C-028}
\end{aligned} \tag{2}
$$

$$N_{\mathrm{train}} = 38,\quad R^{2}_{\mathrm{train}} = 0.893,\quad R^{2}_{\mathrm{test}} = 0.922,\quad R^{2}_{\mathrm{adj}} = 0.872,\quad F_{\mathrm{train}} = 43.18,\quad F_{\mathrm{test}} = 6.074,\quad Q^{2}_{\mathrm{LOO}} = 0.844,\quad Q^{2}_{\mathrm{LGO}} = 0.781$$

where N is the number of compounds in the training set, and Q2LOO and Q2LGO are the squared cross-validation coefficients for leave-one-out and leave-group-out (generally 20% of the compounds are excluded; here, 10 molecules), respectively. The high value of Q2LOO (0.844) indicates that the built model is reliable. R2 is the squared correlation coefficient, R2adj is the adjusted R2 and F is the Fisher F statistic. The statistical parameters of the GA-MLR model are shown in Table 3. Judging by the R2 values of the two sets, the model gave slightly better results for the test set. The high R2 and F values together with the low root mean square error (RMSE) values (RMSEtrain = 0.288 and RMSEtest = 0.272) show the predictive capability of the built model. The predicted inhibitory activities of all molecules are listed in Table 1, and the plot of predicted against experimental pIC50 values is shown in Fig. 1.

To further evaluate the robustness of the constructed model, a Y-randomization test was performed. In this method, as explained in our previous works (Pourbasheer et al., 2013, 2014a), the pIC50 values are shuffled and a new model is built on the randomized data. If the original model is valid, the new QSAR models should have lower R2 and Q2LOO values. The results of the Y-randomization test are presented in Table 4. According to Table 4, the R2 and Q2LOO values were all less than 0.3, which means that the quality of the built model is not due to chance.

Table 3 Statistical results of different QSAR models.
Model R2(train) RMSE(train) F(train) R2(test) RMSE(test) F(test)
GA-MLR 0.893 0.288 43.18 0.922 0.272 6.07
GA-SVM 0.959 0.184 107.50 0.877 0.334 3.07
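As a consistency check on the statistics reported with Eq. (2), the adjusted R2 and the F statistic can be recomputed from R2, the number of compounds and the number of descriptors. The sketch below assumes the standard textbook definitions; whether the authors used exactly these forms is not stated in the text.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k descriptors fitted to n compounds."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2: float, n: int, k: int) -> float:
    """Fisher F for the overall regression: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

n_train, k = 38, 6          # training compounds and descriptors in Eq. (2)
r2_train = 0.893

print(round(adjusted_r2(r2_train, n_train, k), 3))   # 0.872, as reported
print(round(f_statistic(r2_train, n_train, k), 2))   # about 43.1; reported value is 43.18
```

The small gap between the recomputed and reported F values is within what rounding R2 to three decimals would produce.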
Figure 1 The plot of predicted vs. experimental pIC50 by GA-MLR.
Table 4 The Q2LOO and R2training values after several Y-randomization tests.
No. Q2 R2
1 0.023 0.253
2 0.248 0.052
3 0.005 0.156
4 0.162 0.063
5 0.009 0.130
6 0.016 0.107
7 0.061 0.287
8 0.188 0.055
9 0.0007 0.164
10 0.0008 0.148
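A Y-randomization test of the kind summarized in Table 4 can be sketched as follows. This is an illustration on synthetic data, not a reproduction of the authors' run: the descriptor matrix, coefficients, noise level and seed are invented, and only R2 is computed here (Q2LOO could be obtained in the same loop).

```python
import numpy as np

def r2_fit(X, y):
    """R^2 of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_rounds=10, seed=0):
    """Refit the model n_rounds times on shuffled responses and return the R^2 values."""
    rng = np.random.default_rng(seed)
    return [r2_fit(X, rng.permutation(y)) for _ in range(n_rounds)]

# Synthetic stand-in for 38 training compounds and 6 descriptors.
rng_demo = np.random.default_rng(0)
X_demo = rng_demo.normal(size=(38, 6))
true_coef = np.array([1.0, -0.5, 0.8, 0.3, -1.2, 0.6])
y_demo = X_demo @ true_coef + rng_demo.normal(scale=0.3, size=38)

r2_true = r2_fit(X_demo, y_demo)
r2_random = y_randomization(X_demo, y_demo)
print(f"original R2 = {r2_true:.3f}")
print("randomized R2:", [round(float(r), 3) for r in r2_random])
```

If the shuffled models scored as well as the original one, the original fit would be suspect; a large gap supports the claim that the model is not a chance correlation.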

To check the data set for possible outliers, the Williams plot (Eriksson et al., 2000), i.e. the plot of cross-validated standardized residuals vs. hat (leverage) values, was used to visualize the applicability domain. The Williams plot is shown in Fig. 2. Details of the Williams plot have been reported in our previous works (Pourbasheer et al., 2014b,d). As can be seen, all compounds lie inside the domain of the built model, with leverages lower than the warning value h* (the warning leverage limit is 0.55). Fig. 2 also shows that all compounds in the training and test sets have standardized residuals smaller than three standard deviation units (3σ). Therefore, there are no outliers for the developed model, and its predictions can be considered reliable.
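The leverage side of the Williams plot can be sketched as below. The warning limit is taken here as h* = 3(k + 1)/n, a common convention in the QSAR literature that reproduces the 0.55 quoted above for k = 6 descriptors and n = 38 training compounds; including an intercept column in the hat matrix is an assumption, and the random descriptor matrix is a stand-in for the real data.

```python
import numpy as np

def leverages(X_train, X_query):
    """Hat values h_i = x_i^T (X^T X)^-1 x_i, with an intercept column added."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)

def warning_leverage(n_train: int, n_descriptors: int) -> float:
    """Warning leverage h* = 3 (k + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train

h_star = warning_leverage(38, 6)
print(round(h_star, 2))  # 0.55

# Stand-in descriptor matrix for 38 training compounds and 6 descriptors.
rng_demo = np.random.default_rng(0)
X_demo = rng_demo.normal(size=(38, 6))
h_train = leverages(X_demo, X_demo)
print("compounds above h*:", int(np.sum(h_train > h_star)))
```

A compound with a leverage above h* lies far from the centroid of the training descriptors, so its prediction would be an extrapolation.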

Figure 2 The Williams plot of GA-MLR model for the training and test sets.

3.2 Genetic algorithm-support vector machine method

After developing the linear GA-MLR model, the SVM method was used to construct a nonlinear model based on the same selected descriptors, and its performance was compared with that of the GA-MLR method. The results of both methods are summarized in Table 3. SVM regression depends on a combination of factors: the kernel function type and its corresponding parameters, the capacity parameter C, and ε of the ε-insensitive loss function (Vapnik, 1998).

The kernel function type determines the sample distribution in the mapping space, so it must be specified. The radial basis function (RBF) was applied owing to its good performance (Pourbasheer et al., 2014d). The RBF is given below:

$$K(u, v) = \exp\left(-\gamma \lVert u - v \rVert^{2}\right) \tag{3}$$

In this formula, u and v are two independent variables, and γ is the kernel parameter. γ controls the shape of the RBF and directly affects the SVM performance and training time. As in our previous work, the γ parameter was optimized: to obtain the optimal parameters, a grid search was performed via leave-one-out cross-validation on the original training set. To find the optimal value of γ, it was varied from 0.1 to 5 in increments of 0.1, and the RMSE of cross-validation was obtained in each case. Fig. 3 shows the gamma (γ) parameter vs. the RMSE of cross-validation. As can be seen from Fig. 3, the optimal value of γ is 1.5.
Figure 3 The gamma (γ) vs. RMSE for the training set.
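The RBF kernel of Eq. (3) and the γ grid described above can be written out directly. This is a minimal sketch; the function and variable names are hypothetical, and the two input vectors are arbitrary.

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial basis function kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

# Candidate gamma values: 0.1 to 5.0 in steps of 0.1, as in the grid search.
gamma_grid = [round(0.1 * i, 1) for i in range(1, 51)]

print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=1.5))  # 1.0 for identical points
print(rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=1.5))  # exp(-1.5), about 0.223
```

A larger γ makes the kernel decay faster with distance, so each training compound influences a smaller neighborhood, which is why γ affects both fit quality and the risk of overfitting.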

The ε parameter of the ε-insensitive loss function prevents the entire training set from meeting the boundary conditions, and so allows for sparsity in the solution of the dual formulation. The optimal value of this parameter depends on the type of noise present in the data. The RMSE of cross-validation was obtained for ε values from 0.01 to 0.3 in increments of 0.01. Fig. 4 shows ε against the RMSE of cross-validation; as is clear from the figure, the optimal value of ε is 0.2.

Figure 4 The epsilon (ε) vs. RMSE for the training set.
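For reference, the ε-insensitive loss used in SVM regression ignores errors smaller than ε and penalizes larger ones linearly. The sketch below uses the standard definition with ε = 0.2; the two example pairs are experimental and GA-SVM values of compounds 6 and 19 from Table 1, used purely to illustrate the arithmetic.

```python
def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float = 0.2) -> float:
    """Standard epsilon-insensitive loss: max(0, |y_true - y_pred| - epsilon)."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Compound 6: error of 0.19 falls inside the epsilon tube, so the loss is zero.
print(epsilon_insensitive_loss(6.99, 6.80))
# Compound 19: error of 0.70 exceeds epsilon, so the loss is about 0.50.
print(round(epsilon_insensitive_loss(6.40, 7.10), 2))
```

Points inside the tube contribute nothing to the objective and do not become support vectors, which is the source of the sparsity mentioned above.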

The last parameter in SVM modeling is the capacity parameter C, which controls the trade-off between maximizing the margin and minimizing the training error. To find its optimal value, C was varied from 1 to 50 in increments of 1; the result is shown in Fig. 5. As can be seen from Fig. 5, the optimal value of C is 6.

Figure 5 The capacity parameter (C) vs. RMSE for the training set.

The inhibitory activities predicted by the SVM method are listed in Table 1. The predicted vs. experimental pIC50 values for the training and test sets based on the SVM model are presented in Fig. 6. As described above, the optimal parameter values for the SVM model were C = 6, ε = 0.2 and γ = 1.5. The statistical parameters of the optimal model for the training set (R2 = 0.959, F = 107.5, RMSE = 0.184) and the test set (R2 = 0.877, F = 3.074, RMSE = 0.334) indicate that the built model has appropriate predictive ability. As can be seen from Table 3, GA-SVM gave better predictions than GA-MLR for the training set compounds: its lower RMSE and higher F and R2 values show the superiority of GA-SVM over GA-MLR for the training set. For the test set, however, GA-MLR gave better results than GA-SVM. Based on these results, the GA-SVM model can also be used to predict the inhibitory activity of CK2 inhibitors.

Figure 6 The plot of predicted vs. experimental pIC50 by GA-SVM.

3.3 Interpretation of descriptors

Interpreting the selected descriptors and their effects on inhibitory activity can provide chemical insight into the mechanism of inhibition and, consequently, help in designing new drugs with higher inhibitory activities. An interpretation of the QSAR results is therefore given below.

The first and second descriptors are GATS6m (Geary autocorrelation – lag 6/weighted by atomic masses) and GATS8p (Geary autocorrelation – lag 8/weighted by polarizability). These descriptors belong to the 2D autocorrelation descriptors. In these descriptors, the Geary coefficient is a distance-type function of an atomic property, which can be any physicochemical property calculated for each atom of the molecule, such as atomic mass, polarizability, or electronegativity. The physicochemical properties used in these two descriptors are atomic mass and polarizability, respectively. GATS6m and GATS8p have negative signs in Eq. (2), which indicates that the pIC50 value is inversely related to these descriptors. It can therefore be concluded that increasing the atomic mass and polarizability of the compounds increases the values of these descriptors, which reduces the pIC50 values.
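To make the Geary autocorrelation concrete, the sketch below computes it for a given lag from per-atom weights and a topological distance matrix. It follows the classical Geary coefficient, normalized so that an uncorrelated property gives a value near 1; the normalization conventions vary between software packages, and the exact DRAGON implementation is not described in the text, so this should be read as an assumption. The three-atom chain and its weights are invented.

```python
def geary_autocorrelation(weights, dist, lag):
    """Geary coefficient at a topological lag.

    weights: per-atom property (e.g. atomic mass or polarizability).
    dist: matrix of topological distances (bond counts) between atoms.
    """
    n = len(weights)
    mean_w = sum(weights) / n
    # Unordered atom pairs separated by exactly `lag` bonds.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] == lag]
    if not pairs:
        return 0.0
    num = sum((weights[i] - weights[j]) ** 2 for i, j in pairs) / (2 * len(pairs))
    den = sum((w - mean_w) ** 2 for w in weights) / (n - 1)
    return num / den

# Hypothetical three-atom chain A-B-C with made-up weights.
weights_demo = [1.0, 2.0, 3.0]
dist_demo = [[0, 1, 2],
             [1, 0, 1],
             [2, 1, 0]]
print(geary_autocorrelation(weights_demo, dist_demo, lag=1))  # 0.5
print(geary_autocorrelation(weights_demo, dist_demo, lag=2))  # 2.0
```

Values below 1 mean that atoms at that separation tend to carry similar property values, and values above 1 mean they tend to differ, so GATS6m and GATS8p describe how mass and polarizability are distributed across atoms six and eight bonds apart.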

The third descriptor is RDF080m (radial distribution function – 080/weighted by atomic masses), which is one of the RDF descriptors. RDF descriptors reflect the requirements for the 3D structure of compounds (Todeschini and Consonni, 2008). These descriptors are independent of the number of atoms, i.e. the size of the molecule. In addition, RDF descriptors can be restricted to specific atom types or distance ranges to represent specific information in a certain region of the 3D structure space. RDF descriptors are based on the distance distribution in the molecule. In this descriptor, the weighting scheme is the atomic masses. The negative sign of this descriptor (see Eq. (2)) suggests that the pIC50 value is inversely related to it: increasing the value of this descriptor, by increasing the distribution and mass of some specific groups of atoms, decreases the inhibitory activity.

The fourth descriptor is E3s (3rd component accessibility directional WHIM index/weighted by I-state). This descriptor belongs to the WHIM directional descriptors, which are based on statistical indices calculated on the projections of the atoms along the principal axes (Todeschini et al., 1996). In this algorithm, principal component analysis is applied to the centered Cartesian coordinates of a molecule using a weighted covariance matrix obtained from various weighting schemes for the atoms. For E3s, the atomic electrotopological state is the weighting scheme used to calculate the weighted covariance matrix. E3s has a negative sign, which indicates that the pIC50 value is inversely related to this descriptor. This descriptor is related to the number of centrally symmetric atoms, and increasing it leads to a decrease in the inhibitory activity.

The fifth selected descriptor is R8m (R autocorrelation of lag 8/weighted by atomic masses). This descriptor is one of the GETAWAY R-indices. GETAWAY stands for geometry, topology and atom-weights assembly (Hall and Kier, 1995; Todeschini and Consonni, 2000). These geometrical descriptors encode the position of substituents and fragments in the molecule, and also carry information about molecular size and shape. R8m is related to the masses of the atoms in the molecule. It has a positive sign, which indicates that the pIC50 is directly related to this descriptor: increasing the atomic mass of some specific substituents and fragments in the molecule would result in higher inhibitory activity.

The final descriptor is C-028 (R–CR–X), which belongs to the atom-centered fragment descriptors. This atom-centered fragment is defined for each ring atom that has three neighbors. Here, R–CR–X denotes a central carbon atom (C) on an aromatic ring that has one carbon neighbor (R) and one heteroatom neighbor (X) in the same ring, and a third neighbor outside the ring that is a carbon (R). This descriptor has a negative sign in Eq. (2), which indicates that the pIC50 is inversely related to it. It was therefore concluded that increasing the number of heteroatoms (in the R–CR–X arrangement) in the molecules increases the value of this descriptor and decreases the pIC50 value.

4 Conclusion

In this work, a QSAR analysis of a series of CK2 inhibitors was carried out using support vector machine and multiple linear regression methods. The most relevant descriptors were selected using the genetic algorithm. The validation methods applied (Y-randomization and cross-validation) demonstrate the accuracy and robustness of the built models. Comparison of the results indicates the superiority of GA-SVM over GA-MLR for predicting the training set compounds; however, the GA-MLR model gave reasonable predictions for the test set, with higher statistical parameters than GA-SVM. The genetic algorithm selected six descriptors correlated with the inhibitory activity. From the interpretation of the selected descriptors, it can be concluded that the activity of the studied molecules increases with decreasing atomic mass and polarizability and with a decreasing number of heteroatoms in the molecules. The developed QSAR models can be used to predict the activity of new CK2 inhibitors and can provide better insight for designing new potent CK2 inhibitors.

Acknowledgments

The authors would like to thank the State Scholarships’ Foundation of Greece (I.K.Y.) for financial support.

References

  1. , , . QSAR prediction of toxicity of nitrobenzenes. Bioorg. Med. Chem.. 2001;9:3035-3040.
    [Google Scholar]
  2. , , . Design and training of a neural network for predicting the solvent accessibility of proteins. J. Computat. Chem.. 2003;24:1313-1320.
    [Google Scholar]
  3. , , , , , . Protein kinase CK2 – a key suppressor of apoptosis. Adv. Enzyme Regul.. 2008;48:179-187.
    [Google Scholar]
  4. , , , . Prediction of 1H NMR chemical shifts using neural networks. Anal. Chem.. 2001;74:80-90.
    [Google Scholar]
  5. , . The impact of PTEN regulation by CK2 on PI3K-dependent signaling and leukemia cell survival. Adv. Enzyme Regul.. 2011;51:37-49.
    [Google Scholar]
  6. , , . The shape of things to come: an emerging role for protein kinase CK2 in the regulation of cell morphology and the cytoskeleton. Cell. Signal.. 2006;18:267-275.
    [Google Scholar]
  7. , , . Applied Regression Analysis (second ed.). New York: John Wiley & Sons Inc.; .
  8. , , , , , , , , , , , , , . Protein kinase CK2 modulates IL-6 expression in inflammatory breast cancer. Biochem. Biophys. Res. Commun.. 2011;415:163-167.
    [Google Scholar]
  9. , , . Too much of a good thing: the role of protein kinase CK2 in tumorigenesis and prospects for therapeutic inhibition of CK2. Biochim. Biophys. Acta (BBA) – Protein Proteom.. 2008;1784:33-47.
    [Google Scholar]
  10. , , , , , , . Regulation of cell proliferation and survival: convergence of protein kinases and caspases. Biochim. Biophys. Acta (BBA). Protein Proteom.. 2010;1804:505-510.
    [Google Scholar]
  11. , , , , . On the selection of the training set in environmental QSAR analysis when compounds are clustered. J. Chemomet.. 2000;14:599-616.
    [Google Scholar]
  12. , , , , , . Antisense oligonucleotides against protein kinase CK2-α inhibit growth of squamous cell carcinoma of the head and neck in vitro. Head Neck. 2000;22:341-346.
    [Google Scholar]
  13. , , . Quantitative structure–property relationship study of n-octanol–water partition coefficients of some of diverse drugs using multiple linear regression. Anal. Chim. Acta. 2007;604:99-106.
    [Google Scholar]
  14. , , . Protein kinase CK2 and its role in cellular proliferation, development and pathology. Electrophoresis. 1999;20:391-408.
    [Google Scholar]
  15. , , . Protein kinase CK2 in human diseases. Curre Med. Chem.. 2008;15:1870-1886.
    [Google Scholar]
  16. , , , . Prediction of melting point for drug-like compounds using principal component-genetic algorithm-artificial neural network. Bull. Korean Chem. Soc.. 2008;29:833-841.
    [Google Scholar]
  17. , , , . Application of principal component-genetic algorithm-artificial neural network for prediction acidity constant of various nitrogen-containing compounds in water. Monatsh. Chem.. 2009;140:15-27.
    [Google Scholar]
  18. , , . Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. J. Chem. Inf. Comput. Sci.. 1995;35:1039-1045.
    [Google Scholar]
  19. A Biometrics invited paper. The analysis and selection of variables in linear regression. Biometrics. 1976;32:1-49.
  20. Adaptation in Natural and Artificial Systems. USA: University of Michigan Press.
  21. Optimization and analysis of force field parameters by combination of genetic algorithms and neural networks. J. Comput. Chem. 1999;20:455-471.
  22. HyperChem, 2002. Molecular Modeling System. Hypercube, Inc., Gainesville, FL.
  23. Spectrophotometric simultaneous determination of creatine, creatinine, and uric acid in real samples by orthogonal signal correction–partial least squares regression. Monatsh. Chem. 2009;140:685-691.
  24. Expression of protein kinase CK2 in astroglial cells of normal and neovascularized retina. Am. J. Pathol. 2006;168:1722-1736.
  25. Protein kinase CK2 in mammary gland tumorigenesis. Oncogene. 2001;20:3247-3257.
  26. Genetic algorithms as a strategy for feature selection. J. Chemomet. 1992;6:267-281.
  27. Mathworks, 2005. Genetic Algorithm and Direct Search Toolbox User's Guide. The Mathworks Inc.
  28. One-thousand-and-one substrates of protein kinase CK2? FASEB J. 2003;17:349-368.
  29. Protein kinase CK2α′ is induced by serum as a delayed early gene and cooperates with Ha-ras in fibroblast transformation. J. Biol. Chem. 1998;273:21291-21297.
  30. Novel players in multiple myeloma pathogenesis: role of protein kinases CK2 and GSK3. Leukemia Res. 2013;37:221-227.
  31. Discovery and SAR of 5-(3-chlorophenylamino)benzo[c][2,6]naphthyridine-8-carboxylic acid (CX-4945), the first clinical stage inhibitor of protein kinase CK2 for the treatment of cancer. J. Med. Chem. 2010;54:635-654.
  32. QSAR study on hERG inhibitory effect of kappa opioid receptor antagonists by linear and non-linear methods. Med. Chem. Res. 2013;22:4047-4058.
  33. QSAR study of Nav1.7 antagonists by multiple linear regression method based on genetic algorithm (GA–MLR). Med. Chem. Res. 2014;23:2264-2276.
  34. QSAR study of mGlu5 inhibitors by genetic algorithm-multiple linear regressions. Med. Chem. Res. 2014;23:3082-3091.
  35. QSAR study of α1β4 integrin inhibitors by GA-MLR and GA-SVM methods. Struct. Chem. 2014;25:355-370.
  36. QSAR study of IKKβ inhibitors by the genetic algorithm: multiple linear regressions. Med. Chem. Res. 2014;23:57-66.
  37. Quantitative structure activity relationship study of p38α MAP kinase inhibitors. Arabian J. Chem. 2017;10:33-40.
  38. Addiction to protein kinase CK2: a common denominator of diverse cancer cells? Biochim. Biophys. Acta (BBA) – Protein Proteom. 2010;1804:499-504.
  39. Protein kinase CK2 as a druggable target. Mol. BioSyst. 2008;4:889-894.
  40. Quantitative structure–activity relationships (QSAR): studies of inhibitors of tyrosine kinase. Eur. J. Pharm. Sci. 2003;20:63-71.
  41. New developments and applications: QSAR and drug design. In: Pharmacochemistry Library. Amsterdam, Netherlands: Elsevier. p. 413-450.
  42. Handbook of Molecular Descriptors. Weinheim: Wiley-VCH.
  43. Handbook of Molecular Descriptors. Weinheim: Wiley-VCH.
  44. Modeling and prediction by using WHIM descriptors in QSAR studies: submitochondrial particles (SMP) as toxicity biosensors of chlorophenols. Chemosphere. 1996;33:71-79.
  45. Todeschini, R., Consonni, V., Mauri, A., Pavan, M., 2005. DRAGON: Software for the Calculation of Molecular Descriptors. Talete srl, Milan, Italy.
  46. Statistical Learning Theory. New York: Wiley.
  47. Development and validation of a novel variable selection technique with application to multidimensional quantitative structure−activity relationship studies. J. Chem. Inf. Comput. Sci. 1999;39:345-355.