Using partial least squares and principal component regression in simultaneous spectrophotometric analysis of pyrimidine bases
⁎Corresponding author. khajeh_h@yahoo.com (Habibollah Khajehsharifi)
This article was originally published by Elsevier and was migrated to Scientific Scholar after the change of Publisher.
Peer review under responsibility of King Saud University.
Abstract
This study applies, for the first time, multivariate spectrophotometric calibration to the simultaneous determination of three pyrimidine bases: uracil (URA), cytosine (CYT) and thymine (THY). Determining these bases is important from a physiological and pharmaceutical perspective, but it is difficult because their absorption spectra overlap. Principal component regression (PCR) and partial least-squares (PLS) models were used to overcome this problem. The calibration sets contained URA, CYT and THY in the concentration ranges 1.12–22.42, 1.11–27.78 and 1.26–25.22 μg mL−1, respectively, and the absorption spectra were recorded from 220 to 320 nm. The numbers of principal components (NPCs) for URA, CYT and THY were 6, 5 and 4 by PCR and 4, 5 and 3 by PLS, respectively. The root mean square errors of prediction (RMSEPs) for URA, CYT and THY were 0.7067, 0.5093 and 0.6371 by PCR and 0.5469, 0.2700 and 0.5087 by PLS, respectively. The proposed method yielded recoveries ranging from 93.85% to 107.45% by PLS and from 90.48% to 111.42% by PCR. The method was applied successfully to the simultaneous determination of the three bases in urine, serum and plasma.
Keywords
Uracil
Cytosine
Thymine
Spectrophotometry
Partial least squares
Principal component regression
1 Introduction
Pyrimidine bases are building blocks of both DNA and RNA, which in turn play important roles in cell metabolism. Base changes in DNA may seriously affect the structure and function of proteins, the products of gene expression, and are considered the main cause of inherited diseases and most human cancers (Vnencak-Jones, 1999; Lane, 1999; Lindblom and Nordenskjold, 1999). The pyrimidine bases considered here are uracil (URA), cytosine (CYT) and thymine (THY).
URA, a common and naturally occurring pyrimidine derivative (Garrett and Grisham, 1997), was discovered in 1900, when it was isolated by hydrolysis of yeast nuclein; it was also found in bovine thymus and spleen, herring sperm, and wheat germ (Brown, 1994). URA can be used for drug delivery and also as a pharmaceutical drug. In the body, URA helps carry out the synthesis of many enzymes necessary for cell function by bonding with riboses and phosphates (Garrett and Grisham, 1997). It serves as an allosteric regulator and coenzyme for reactions in the human body and in plants. URA is also involved in the biosynthesis of polysaccharides and the transportation of sugars containing aldehydes (Brown, 1998). It can also increase the risk of cancer when the body is deficient in folate (Mashiyama et al., 2004). URA derivatives containing a diazine ring are used in pesticides (Pozharskii et al., 1997).
CYT is one of the five main bases found in DNA and RNA. It is a pyrimidine derivative with a heterocyclic ring and two substituents attached (an amine group at position 4 and a keto group at position 2) (Kossel and Steudel, 1903). Recently, CYT has been used in quantum computation. CYT can be found as a part of DNA, RNA, or a nucleotide.
THY is one of the four bases in the nucleic acid of DNA and it is also known as 5-methyluracil. THY may be derived by methylation of URA at the 5th carbon. THY could also be a target for actions of 5-fluorouracil (5-FU) in cancer treatment. 5-FU can be a metabolic analog of THY (in DNA synthesis) or URA (in RNA synthesis). THY bases are often oxidized to hydantoins over time after the death of an organism (Hofreiter et al., 2001).
Many areas such as pharmacological studies, clinical diagnosis and DNA damage assay need a quick, inexpensive and accurate method for the determination of pyrimidine bases (Cadet and Weinfeld, 1993; Tseng et al., 1994; Lin et al., 1997; Ames, 1998; Atamna et al., 2000; Ames and Acad, 1999). Because of the central biological significance of nucleic acids, it is important to know as much as possible about their function, their structure, and their chemical composition.
High performance liquid chromatography (HPLC) is a commonly used method for the analysis of pyrimidine bases (Perrett and Simmonds, 1990; Grune et al., 1993; Minniti et al., 1998). Capillary electrophoresis (CE) is a powerful alternative to HPLC for the separation of charged and polar compounds (Yang et al., 1997; Altria, 1999; Fritz, 2000; Krylov and Dovichi, 2000). CE has been successfully used for the analysis of nucleic acids and nucleotides because they are negatively charged in neutral pH buffer (Boyce, 2001; Cohen et al., 1987; Deforce et al., 1996; Geldart and Brown, 1998). Spectrophotometry is an attractive alternative to these techniques because it combines acceptable precision and accuracy with lower cost.
Multivariate calibration has become a standard approach to quantitative spectral analysis. Among the regression methods available for multivariate calibration, the factor-analysis-based methods, including partial least squares (PLS) regression and principal component regression (PCR), have received considerable attention in chemometrics (Martens and Naes, 1989). PLS and PCR decompose the data into spectral loadings and scores and then build the model on these new variables. In PCR, the decomposition uses only the spectral information, whereas PLS uses both the spectral and the concentration data. These techniques are powerful multivariate statistical tools that have been successfully and widely used in the quantitative analysis of spectroscopic data. They are robust enough to overcome common statistical problems such as collinearity, band overlaps and interactions (Martens and Naes, 1989; Boyce, 2001; Cohen et al., 1987; Deforce et al., 1996; Geldart and Brown, 1998).
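The difference between the two decompositions can be illustrated with a minimal NumPy sketch. It is not the PLS Toolbox implementation used in this work: the Gaussian bands, noise level, random seed and all function names are assumptions chosen only to mimic three overlapping spectra on the 220–320 nm grid. PCR takes its scores from the spectra alone, while the PLS1 (NIPALS) weights are computed from the covariance between spectra and concentration.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.arange(220, 321)  # 220-320 nm in 1 nm steps

def band(center, width):
    """Hypothetical Gaussian absorption band (not a measured spectrum)."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# Three strongly overlapping pure-component spectra (assumed shapes).
S = np.vstack([band(259, 12), band(267, 13), band(265, 11)])

def simulate(n):
    """Random three-component mixtures with additive noise (Beer-Lambert law)."""
    C = rng.uniform(1.0, 25.0, size=(n, 3))
    A = 0.05 * C @ S + rng.normal(0.0, 0.002, size=(n, wavelengths.size))
    return A, C

def pcr_fit(A, c, k):
    """PCR: loadings from the spectra only (SVD), then regression on k scores."""
    a_mean, c_mean = A.mean(axis=0), c.mean()
    _, _, Vt = np.linalg.svd(A - a_mean, full_matrices=False)
    P = Vt[:k].T                                  # spectral loadings
    T = (A - a_mean) @ P                          # scores
    q = np.linalg.lstsq(T, c - c_mean, rcond=None)[0]
    b = P @ q                                     # regression vector
    return lambda X: (X - a_mean) @ b + c_mean

def pls1_fit(A, c, k):
    """PLS1 via NIPALS: each weight vector uses spectra and concentrations."""
    a_mean, c_mean = A.mean(axis=0), c.mean()
    X, y = A - a_mean, c - c_mean
    W, P, q = [], [], []
    for _ in range(k):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        qk = y @ t / tt
        X = X - np.outer(t, p)                    # deflate spectra
        y = y - qk * t                            # deflate concentrations
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)           # regression vector
    return lambda Xnew: (Xnew - a_mean) @ b + c_mean

def rmsep(pred, ref):
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

A_cal, C_cal = simulate(36)    # calibration set size as in this study
A_test, C_test = simulate(10)  # independent prediction set
results = {}
for name, fit in (("PCR", pcr_fit), ("PLS", pls1_fit)):
    results[name] = [rmsep(fit(A_cal, C_cal[:, j], 3)(A_test), C_test[:, j])
                     for j in range(3)]
    print(name, np.round(results[name], 3))
```

On noise-free, perfectly linear data both models would need only three factors; real spectra usually need more, which is why the number of factors is optimized separately for each analyte.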
This study aims to use PCR and PLS to develop a suitable method for simultaneous spectrophotometric determination of CYT, THY and URA.
2 Experimental
2.1 Chemicals
All chemicals used were of analytical reagent grade, and double distilled water was used throughout the experiments. CYT, THY and URA were purchased from Fluka, and trichloroacetic acid was supplied by Merck. Stock solutions of CYT, THY and URA were prepared daily by dissolving them in a buffer solution (pH = 7.0) prepared from KH2PO4 and NaOH (Merck).
2.2 Instrumentation and software
Electronic absorption measurements were carried out on a Jasco V-570 spectrophotometer (slit width: 1.0 nm, scan rate: 2000 nm/min) using 1.00 cm quartz cells. Measurements of pH were made with a Metrohm 692 pH meter using a combined electrode. All spectra were digitized and stored at wavelengths from 220 to 320 nm in steps of 1 nm and then transferred in TXT format to a Pentium 4, 2.4 GHz computer running MATLAB, version 7 (The MathWorks). PCR and PLS calculations were carried out with the PLS Toolbox (Eigenvector Research, version 2.5).
2.3 Procedure
2.3.1 Calibration set
A mixture design for three components was used for the calibration set. To provide good prediction by the PCR and PLS methods, a training set of 36 samples was used (Table 1). The concentrations of CYT, THY and URA were varied over 1.11–27.78, 1.26–25.22 and 1.12–22.42 μg mL−1, respectively. The mixed standard solutions were placed in 10 ml volumetric flasks and made up to the final volume with buffer solution (pH = 7.0). Finally, the absorption spectra of all prepared solutions were recorded between 220 and 320 nm against a blank of universal buffer.
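Table 1. Composition of the calibration set.
| Mixture | URA (μg/ml) | CYT (μg/ml) | THY (μg/ml) | Mixture | URA (μg/ml) | CYT (μg/ml) | THY (μg/ml) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M1 | 22.42 | 1.11 | 1.26 | M19 | 13.34 | 1.11 | 11.48 |
| M2 | 19.39 | 4.89 | 1.26 | M20 | 16.36 | 1.11 | 8.07 |
| M3 | 19.36 | 7.78 | 1.26 | M21 | 19.39 | 1.11 | 4.67 |
| M4 | 13.34 | 12.56 | 1.26 | M22 | 19.39 | 23.89 | 8.07 |
| M5 | 10.2 | 16.33 | 1.26 | M23 | 16.36 | 23.89 | 11.48 |
| M6 | 7.17 | 20.11 | 1.26 | M24 | 13.34 | 23.89 | 15.01 |
| M7 | 4.15 | 23.89 | 1.26 | M25 | 10.2 | 23.89 | 18.41 |
| M8 | 1.12 | 27.78 | 1.26 | M26 | 7.17 | 23.89 | 21.82 |
| M9 | 1.12 | 23.89 | 4.67 | M27 | 10.2 | 20.11 | 21.82 |
| M10 | 1.12 | 20.11 | 8.07 | M28 | 13.34 | 20.11 | 18.41 |
| M11 | 1.12 | 16.33 | 11.48 | M29 | 16.36 | 20.11 | 15.01 |
| M12 | 1.12 | 12.56 | 15.01 | M30 | 19.39 | 20.11 | 11.48 |
| M13 | 1.12 | 7.78 | 18.41 | M31 | 19.39 | 16.33 | 15.01 |
| M14 | 1.12 | 4.89 | 21.82 | M32 | 16.36 | 16.33 | 18.41 |
| M15 | 1.12 | 1.11 | 25.22 | M33 | 13.34 | 16.33 | 21.82 |
| M16 | 4.15 | 1.11 | 21.82 | M34 | 16.36 | 12.56 | 21.82 |
| M17 | 7.17 | 1.11 | 18.41 | M35 | 19.39 | 12.56 | 18.41 |
| M18 | 10.2 | 1.11 | 15.01 | M36 | 19.39 | 7.78 | 21.82 |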
2.3.2 Prediction set
Ten mixtures with randomly chosen compositions were prepared as the prediction set. Because they serve as an independent test, their concentrations differ from those in the calibration set. Table 2 lists the solutions used for the prediction set. The added concentrations ranged over 3.33–24.45, 2.52–21.44 and 1.79–17.04 μg mL−1 for CYT, THY and URA, respectively.
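Table 2. Composition of the prediction set and the concentrations found and recoveries obtained by PLS and PCR.
| Method | Mixture | Added URA (μg/ml) | Added CYT (μg/ml) | Added THY (μg/ml) | Found URA (μg/ml) | Found CYT (μg/ml) | Found THY (μg/ml) | Recovery URA (%) | Recovery CYT (%) | Recovery THY (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PLS | 1 | 6.72 | 17.78 | 2.52 | 7.03 | 17.94 | 2.66 | 104.61 | 100.90 | 105.56 |
| PLS | 2 | 11.21 | 24.45 | 18.92 | 11.86 | 24.13 | 18.26 | 105.8 | 98.69 | 96.51 |
| PLS | 3 | 1.79 | 10.00 | 6.31 | 1.68 | 9.78 | 6.28 | 93.85 | 97.80 | 99.52 |
| PLS | 4 | 5.60 | 3.33 | 21.44 | 5.37 | 3.55 | 21.98 | 95.89 | 106.61 | 102.52 |
| PLS | 5 | 13.45 | 6.67 | 12.61 | 13.29 | 6.36 | 12.83 | 98.81 | 95.35 | 101.74 |
| PLS | 6 | 7.85 | 21.20 | 10.09 | 7.98 | 21.66 | 9.85 | 101.66 | 102.17 | 97.62 |
| PLS | 7 | 17.04 | 12.00 | 11.98 | 17.37 | 12.01 | 12.23 | 101.94 | 100.08 | 102.09 |
| PLS | 8 | 3.20 | 14.00 | 8.30 | 3.30 | 14.11 | 8.22 | 103.13 | 100.79 | 99.04 |
| PLS | 9 | 9.00 | 8.20 | 14.10 | 9.71 | 8.02 | 15.15 | 107.89 | 97.80 | 107.45 |
| PLS | 10 | 14.80 | 19.80 | 16.20 | 16.12 | 20.19 | 16.95 | 108.92 | 101.97 | 104.63 |
| PCR | 1 | 6.72 | 17.78 | 2.52 | 6.08 | 18.49 | 2.32 | 90.48 | 103.99 | 92.06 |
| PCR | 2 | 11.21 | 24.45 | 18.92 | 10.19 | 25.27 | 19.48 | 90.90 | 103.35 | 102.96 |
| PCR | 3 | 1.79 | 10.00 | 6.31 | 1.66 | 9.54 | 6.39 | 92.74 | 95.40 | 101.27 |
| PCR | 4 | 5.60 | 3.33 | 21.44 | 5.86 | 3.22 | 20.39 | 104.64 | 96.70 | 95.10 |
| PCR | 5 | 13.45 | 6.67 | 12.61 | 14.21 | 7.09 | 14.05 | 105.65 | 106.30 | 111.42 |
| PCR | 6 | 7.85 | 21.20 | 10.09 | 7.72 | 21.87 | 10.81 | 98.34 | 103.16 | 107.14 |
| PCR | 7 | 17.04 | 12.00 | 11.98 | 17.82 | 12.26 | 11.93 | 104.58 | 102.17 | 99.58 |
| PCR | 8 | 3.20 | 14.00 | 8.30 | 3.30 | 14.35 | 8.33 | 103.13 | 102.50 | 100.36 |
| PCR | 9 | 9.00 | 8.20 | 14.10 | 9.71 | 8.28 | 14.13 | 107.89 | 100.98 | 100.21 |
| PCR | 10 | 14.80 | 19.80 | 16.20 | 16.12 | 20.41 | 16.23 | 108.92 | 103.08 | 100.19 |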
2.4 Real sample preparation
2.4.1 Serum and plasma samples
The serum and plasma samples were homogenized. For deproteinization, 1 ml of 24% w/v trichloroacetic acid was added to 1 ml of serum and to 1 ml of plasma. After 15 min, the resulting mixtures were centrifuged at 3000 rpm (Khajehsharifi and Eskandari, 2009). The pH of the supernatant was adjusted to 7.0 with NaOH solution. An appropriate amount of the CYT, THY and URA stock solutions was then added to 0.5 ml of the prepared serum or plasma, and the mixture was made up to the final volume (10 ml) with buffer solution to obtain the desired concentration. The electronic absorption spectrum was recorded in the range 220–320 nm against a blank solution of serum or plasma.
2.4.2 Urine sample
The urine sample was diluted 1:3 with double distilled water. Cell debris and particulate matter were then removed by low-speed centrifugation for 5 min at 1500 rpm. The pH of the sample was adjusted to 7.0 with NaOH solution. An appropriate amount of the CYT, THY and URA stock solutions was then added to 0.5 ml of the prepared urine, and the mixture was made up to the final volume (10 ml) with buffer solution to obtain the desired concentration. The electronic absorption spectrum was recorded in the range 220–320 nm against a urine blank (Khajehsharifi and Eskandari, 2008).
3 Results and discussion
3.1 Spectral characteristics
The electronic absorption spectra of URA, CYT and THY are shown in Fig. 1. As can be seen, the spectra of the three components overlap. Thus, these compounds cannot be analyzed in the presence of each other by a simple calibration procedure without prior separation. Multivariate calibration was therefore used to resolve the spectra and to determine each component in the mixtures. The composition data of the solutions are listed in Table 1. Spectral data were recorded between 220 and 320 nm in 1.00 nm steps. The same procedure was used for the validation, synthetic and unknown samples.
Fig. 1. Absorption spectra of URA (6.5 μg ml−1), CYT (6.5 μg ml−1), THY (6.5 μg ml−1) and a mixture of URA (19.4 μg ml−1), CYT (7.8 μg ml−1) and THY (1.3 μg ml−1) at pH 7.0 and T = 273 K.
3.2 Univariate calibration
Individual calibration curves were constructed from several points (Fig. 2) as absorbance versus the concentration of each pyrimidine base, in the ranges 1.12–22.42, 1.11–27.78 and 1.26–25.22 μg mL−1 for URA, CYT and THY, respectively. The absorption maxima of URA, CYT and THY are at 259, 267 and 265 nm, respectively. The wavelengths used to make calibration curves were 220–320 nm. Linear regression results, line equations and R2 values are shown in Fig. 2.
Fig. 2. Analytical curves for univariate determination of URA, CYT and THY.
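A univariate calibration of this kind amounts to an ordinary least-squares line through the absorbance and concentration pairs. The sketch below shows the calculation in plain Python. The absorbance values are hypothetical placeholders (the measured points behind Fig. 2 are not tabulated here), and the function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def predict_concentration(absorbance, slope, intercept):
    """Invert the calibration line to estimate a concentration."""
    return (absorbance - intercept) / slope

# Hypothetical URA standards (ug/mL) and absorbances at a single wavelength.
concentration = [1.12, 5.60, 11.21, 16.81, 22.42]
absorbance = [0.08, 0.41, 0.82, 1.24, 1.64]

slope, intercept, r_squared = fit_line(concentration, absorbance)
print(f"A = {slope:.4f} * c + {intercept:.4f}, R^2 = {r_squared:.4f}")
print(f"c at A = 0.60: {predict_concentration(0.60, slope, intercept):.2f} ug/mL")
```

Such a single-wavelength line is only valid for one absorbing species at a time, which is why it cannot be applied directly to mixtures whose spectra overlap.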
3.3 Multivariate calibration and prediction
Multivariate calibration methods such as PCR and PLS require a suitable experimental design for the standards in the calibration set in order to provide good prediction. In this study, a mixture design was used. It is important to select the standards in a way that does not create an underlying correlation among the concentrations of the components.
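One simple way to inspect a design for such correlations is to compute the pairwise Pearson coefficients between the concentration columns. The sketch below does this for the 36 calibration mixtures of Table 1. The choice of the Pearson coefficient as the diagnostic and the helper names are assumptions made for illustration, not a procedure described in this work.

```python
import math
from itertools import combinations

# (URA, CYT, THY) in ug/mL for mixtures M1-M36, transcribed from Table 1.
DESIGN = [
    (22.42, 1.11, 1.26), (19.39, 4.89, 1.26), (19.36, 7.78, 1.26),
    (13.34, 12.56, 1.26), (10.2, 16.33, 1.26), (7.17, 20.11, 1.26),
    (4.15, 23.89, 1.26), (1.12, 27.78, 1.26), (1.12, 23.89, 4.67),
    (1.12, 20.11, 8.07), (1.12, 16.33, 11.48), (1.12, 12.56, 15.01),
    (1.12, 7.78, 18.41), (1.12, 4.89, 21.82), (1.12, 1.11, 25.22),
    (4.15, 1.11, 21.82), (7.17, 1.11, 18.41), (10.2, 1.11, 15.01),
    (13.34, 1.11, 11.48), (16.36, 1.11, 8.07), (19.39, 1.11, 4.67),
    (19.39, 23.89, 8.07), (16.36, 23.89, 11.48), (13.34, 23.89, 15.01),
    (10.2, 23.89, 18.41), (7.17, 23.89, 21.82), (10.2, 20.11, 21.82),
    (13.34, 20.11, 18.41), (16.36, 20.11, 15.01), (19.39, 20.11, 11.48),
    (19.39, 16.33, 15.01), (16.36, 16.33, 18.41), (13.34, 16.33, 21.82),
    (16.36, 12.56, 21.82), (19.39, 12.56, 18.41), (19.39, 7.78, 21.82),
]
NAMES = ("URA", "CYT", "THY")

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

columns = list(zip(*DESIGN))  # one tuple of 36 values per analyte
correlations = {
    (NAMES[i], NAMES[j]): pearson(columns[i], columns[j])
    for i, j in combinations(range(3), 2)
}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Coefficients close to ±1 would indicate that two analytes vary together across the standards, in which case the model could not separate their contributions reliably.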
3.3.1 Selection of the optimum number of factors
The optimum number of factors (latent variables) to be included in the calibration model was determined by computing the prediction error sum of squares (PRESS) for cross-validated models using a high number of factors (half the number of standards plus one). PRESS is defined as follows:
\[ \mathrm{PRESS} = \sum_{i=1}^{n} \left( \hat{c}_i - c_i \right)^2 \]
where $c_i$ is the reference concentration of the analyte in sample $i$, $\hat{c}_i$ is the concentration predicted by the model, and $n$ is the number of samples.
One reasonable choice for the optimum number of factors would be the number that yields the minimum PRESS. However, because the training set contains a finite number of samples, the model at the minimum PRESS often over-fits and predicts poorly for unknown samples that were not included in the model. A solution to this problem was suggested by Haaland and Thomas (1988): the PRESS values for all models with fewer factors are compared with the PRESS value at the minimum, and an F-test is used to determine the significance of PRESS values greater than the minimum. In all cases, the number of factors for the first PRESS value whose F-ratio probability drops below 0.75 was selected as the optimum. The maximum number of factors used to calculate the optimum PRESS was selected, and the optimum number of factors obtained by the PCR and PLS models is summarized in Table 3. Helland (1988) discusses the relation of PLS to PCR and gives heuristic arguments to explain why PLS needs fewer factors to give optimal prediction. Plots of PRESS vs. number of factors by PCR and PLS are shown in Fig. 3.
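The selection rule can be sketched in plain Python. The sketch assumes, following the usual reading of Haaland and Thomas (1988), that the ratio F(h) = PRESS(h)/PRESS(h*) is referred to an F distribution with n and n degrees of freedom, where h* is the number of factors at the minimum PRESS and n is the number of calibration samples. The PRESS values in the example are hypothetical, and the F distribution function is evaluated by simple numerical integration rather than a statistics library.

```python
import math

def f_cdf(x, d1, d2, steps=2000):
    """CDF of the F distribution by Simpson integration of the beta density.

    Assumes d1 > 2 and d2 > 2 so that the density vanishes at t = 0.
    """
    if x <= 0.0:
        return 0.0
    a, b = d1 / 2.0, d2 / 2.0
    upper = d1 * x / (d1 * x + d2)  # maps x to the beta variable in (0, 1)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return total * h / 3.0

def select_factors(press, n_samples, threshold=0.75):
    """Smallest number of factors whose PRESS is not significantly above the minimum."""
    h_star = min(range(len(press)), key=press.__getitem__)
    for h in range(h_star + 1):
        ratio = press[h] / press[h_star]
        if f_cdf(ratio, n_samples, n_samples) < threshold:
            return h + 1  # factors are counted from 1
    return h_star + 1

# Hypothetical PRESS values for models with 1 to 6 factors, 36 calibration samples.
press_values = [40.0, 12.0, 5.0, 3.2, 3.0, 3.1]
print(select_factors(press_values, n_samples=36))  # prints 4
```

In this example the minimum PRESS occurs at five factors, but the four-factor model is not significantly worse, so the more parsimonious model is chosen.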
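Table 3. Optimum number of factors and statistical parameters obtained by the PLS and PCR models.
| Method | Component | NPC | PRESS | RMSEP | RSEP (%) |
| --- | --- | --- | --- | --- | --- |
| PLS | URA | 4 | 2.9915 | 0.5469 | 5.1480 |
| PLS | CYT | 5 | 0.7292 | 0.2700 | 1.7663 |
| PLS | THY | 3 | 2.5876 | 0.5087 | 3.7312 |
| PCR | URA | 6 | 2.7902 | 0.7067 | 6.6715 |
| PCR | CYT | 8 | 2.4827 | 0.5093 | 3.2528 |
| PCR | THY | 5 | 4.0570 | 0.6371 | 4.7094 |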
Fig. 3. Plots of PRESS vs. number of factors by PCR (□) and PLS (○).
3.3.2 Statistical parameters
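To evaluate the predictive ability of a multivariate calibration model, the root mean square error of prediction (RMSEP) and the relative standard error of prediction (RSEP) can be used (Lin et al., 1997):
\[ \mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} \left( \hat{c}_i - c_i \right)^2}{n}} \]
\[ \mathrm{RSEP}\,(\%) = 100 \times \sqrt{\frac{\sum_{i=1}^{n} \left( \hat{c}_i - c_i \right)^2}{\sum_{i=1}^{n} c_i^2}} \]
where $c_i$ is the reference concentration of the analyte in sample $i$, $\hat{c}_i$ is the predicted concentration, and $n$ is the number of samples in the prediction set.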
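As a worked check, the sketch below recomputes PRESS, RMSEP and RSEP for URA from the PLS rows of Table 2. The function names are illustrative. RSEP is evaluated with two denominators: the usual definition divides by the sum of squared added (reference) concentrations, whereas this recalculation suggests that the RSEP listed for PLS/URA in Table 3 is matched when the sum of squared found concentrations is used instead. That is an observation from the arithmetic, not a statement made in the text.

```python
import math

# URA in the 10 prediction mixtures (PLS rows of Table 2), in ug/mL.
added = [6.72, 11.21, 1.79, 5.60, 13.45, 7.85, 17.04, 3.20, 9.00, 14.80]
found = [7.03, 11.86, 1.68, 5.37, 13.29, 7.98, 17.37, 3.30, 9.71, 16.12]

def press(pred, ref):
    """Prediction error sum of squares."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref))

def rmsep(pred, ref):
    """Root mean square error of prediction."""
    return math.sqrt(press(pred, ref) / len(ref))

def rsep(pred, ref, denominator):
    """Relative standard error of prediction (%) for a chosen denominator set."""
    return 100.0 * math.sqrt(press(pred, ref) / sum(d * d for d in denominator))

press_ura = press(found, added)
rmsep_ura = rmsep(found, added)
rsep_ref = rsep(found, added, denominator=added)    # usual definition
rsep_found = rsep(found, added, denominator=found)  # variant matching Table 3

print(f"PRESS = {press_ura:.4f}")   # Table 3 lists 2.9915
print(f"RMSEP = {rmsep_ura:.4f}")   # Table 3 lists 0.5469
print(f"RSEP (reference denominator) = {rsep_ref:.3f} %")
print(f"RSEP (found denominator)     = {rsep_found:.3f} %")  # Table 3 lists 5.1480
```

The same functions can be applied to the CYT and THY columns, or to the PCR rows, to compare the two models on an equal footing.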
3.3.3 Resolution of synthetic mixtures
The predictive ability of the method was determined using 10 three-component mixtures (their compositions are given in Table 2). The results obtained by applying the PCR and PLS algorithms to these 10 synthetic samples are listed in Table 2, which also shows the recoveries for the synthetic series of URA, CYT and THY mixtures. As can be seen, the recoveries were acceptable. Plots of the predicted concentration versus the actual values by PLS are shown in Fig. 4 for URA, CYT and THY (line equations and R2 values are also shown).
Fig. 4. Plots of predicted concentration vs. actual concentration of URA, CYT and THY by PLS.
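The recovery values in Table 2 follow from the ratio of found to added concentration. A minimal sketch, using mixture 1 of the PLS block of Table 2 and an illustrative function name:

```python
def recovery_percent(found, added):
    """Recovery (%) = 100 * found / added."""
    if added <= 0:
        raise ValueError("added concentration must be positive")
    return 100.0 * found / added

# Mixture 1, PLS block of Table 2: (added, found) in ug/mL.
mixture_1 = {"URA": (6.72, 7.03), "CYT": (17.78, 17.94), "THY": (2.52, 2.66)}

for analyte, (added, found) in mixture_1.items():
    print(analyte, round(recovery_percent(found, added), 2))
# URA 104.61
# CYT 100.9
# THY 105.56
```

These agree with the recoveries tabulated for that mixture (104.61, 100.90 and 105.56%).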
3.3.4 Determination of URA, CYT and THY in spiked real samples
To assess the reliability of the method, six spiked preparations of each real sample (serum, plasma and urine) were analyzed. Table 4 shows the results as well as the composition of the real samples. The method was validated by comparing the amounts found with the amounts added. The recoveries were quantitative, and there were no significant differences between the amounts obtained by this method and the amounts added.
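Table 4. Determination of URA, CYT and THY in spiked serum, plasma and urine samples.
| Sample type | Mixture | Added URA (μg/ml) | Added CYT (μg/ml) | Added THY (μg/ml) | Found URA (μg/ml) | Found CYT (μg/ml) | Found THY (μg/ml) | Recovery URA (%) | Recovery CYT (%) | Recovery THY (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Serum | S1 | 7.85 | 15.56 | 6.31 | 7.96 | 16.59 | 6.13 | 101.40 | 106.62 | 97.15 |
| Serum | S2 | 13.45 | 7.78 | 11.35 | 13.26 | 7.45 | 12.11 | 98.59 | 95.76 | 106.70 |
| Serum | S3 | 4.48 | 11.11 | 17.66 | 4.26 | 11.39 | 17.34 | 95.09 | 102.52 | 98.19 |
| Serum | S4 | 6.72 | 12.22 | 11.35 | 11.97 | 8.90 | 4.80 | 97.08 | 100.11 | 95.24 |
| Serum | S5 | 12.33 | 8.89 | 5.04 | 8.18 | 13.12 | 12.51 | 97.27 | 98.42 | 99.21 |
| Serum | S6 | 8.41 | 13.33 | 12.61 | 7.96 | 16.59 | 6.13 | 101.40 | 106.62 | 97.15 |
| Plasma | P1 | 7.85 | 15.56 | 6.31 | 7.53 | 15.35 | 6.24 | 95.92 | 98.65 | 98.89 |
| Plasma | P2 | 13.45 | 7.78 | 11.35 | 13.74 | 7.64 | 11.30 | 102.16 | 98.20 | 99.56 |
| Plasma | P3 | 4.48 | 11.11 | 17.66 | 4.32 | 10.66 | 18.34 | 96.43 | 95.95 | 103.85 |
| Plasma | P4 | 6.72 | 12.22 | 11.35 | 13.05 | 9.50 | 5.01 | 105.84 | 106.86 | 99.40 |
| Plasma | P5 | 12.33 | 8.89 | 5.04 | 8.35 | 13.92 | 11.83 | 99.29 | 104.43 | 93.81 |
| Plasma | P6 | 8.41 | 13.33 | 12.61 | 7.53 | 15.35 | 6.24 | 95.92 | 98.65 | 98.89 |
| Urine | U1 | 7.85 | 15.56 | 6.31 | 7.63 | 15.79 | 6.46 | 97.20 | 101.48 | 102.38 |
| Urine | U2 | 13.45 | 7.78 | 11.35 | 13.18 | 7.61 | 11.04 | 97.99 | 97.81 | 97.27 |
| Urine | U3 | 4.48 | 11.11 | 17.66 | 4.34 | 11.48 | 18.49 | 96.88 | 103.33 | 104.70 |
| Urine | U4 | 6.72 | 12.22 | 11.35 | 11.64 | 8.52 | 5.06 | 94.40 | 95.84 | 100.40 |
| Urine | U5 | 12.33 | 8.89 | 5.04 | 8.70 | 13.97 | 12.42 | 103.45 | 104.80 | 98.49 |
| Urine | U6 | 8.41 | 13.33 | 12.61 | 7.63 | 15.79 | 6.46 | 97.20 | 101.48 | 102.38 |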
4 Conclusions
This study aimed to determine the pyrimidine bases uracil, cytosine and thymine in the presence of one another. Because the signals of the three bases overlap in the mixture, multivariate analysis tools, namely principal component regression and partial least squares, were needed to determine each base separately. PLS appears to reach its optimal prediction with fewer factors than PCR. In addition, PLS reaches better values of PRESS, RMSEP and RSEP. Also, unlike PCR, the PLS method gives a unique way of choosing which factor to include next. The results for the URA, CYT and THY mixtures demonstrate that the predictive ability of the models obtained was satisfactory. It can be concluded that the spectrophotometric method used in this study is simple and inexpensive. The good agreement between added and found amounts shows the utility of this procedure for the simultaneous spectrophotometric determination of URA, CYT and THY in human serum, urine and plasma samples.
Acknowledgment
We are grateful to Yasouj University for supporting this research.
References
- J. Chromatogr. A. 1999;856:443-463.
- Toxicol. Lett. 1998;102/103:5-18.
- Science. 1999;889:87-106.
- Proc. Natl. Acad. Sci. U.S.A. 2000;97:686-691.
- Electrophoresis. 2001;22:1447-1459.
- Heterocyclic Compounds: The Pyrimidines. New York: Interscience; 1994.
- Ring Nitrogen and Key Biomolecules: The Biochemistry of N-Heterocycles. Boston: Kluwer Academic Publishers; 1998.
- Anal. Chem. 1993;65:675A-682A.
- Anal. Chem. 1987;59:1021-1027.
- Anal. Chem. 1996;68:3575-3584.
- Anal. Chem. 1997;69:3391-3399.
- J. Chromatogr. A. 2000;884:261-275.
- Principles of Biochemistry with a Human Focus. United States: Brooks/Cole Thomson Learning; 1997.
- J. Chromatogr. A. 1998;828:317-336.
- J. Chromatogr. 1993;636:105-111.
- Anal. Chem. 1988;60:1193-1202.
- Commun. Stat. Simulat. 1988;17:581-607.
- Nat. Rev. Genet. 2001;2:353-359.
- Drug Test. Anal. 2008;55:163-170.
- Monatsh. Chem. 2009;140:685-691.
- Z. Physiol. Chem. 1903;38:49-59.
- Anal. Chem. 2000;72:111R-128R.
- Br. J. Cancer. 1999;80:1-5.
- J. Chromatogr. A. 1997;760:227-233.
- Acta Oncol. 1999;38:439-447.
- Multivariate Calibration. Chichester: Wiley; 1989.
- Anal. Biochem. 2004;330:58-69.
- Adv. Exp. Med. Biol. 1998;431:843-848.
- Biomed. Chromatogr. 1990;46:267-272.
- Heterocycles in Life and Society: Technology, Medicine and Agriculture. New York: John Wiley and Sons; 1997.
- J. Royal Stat. Soc. 1974;36:111-147.
- Anal. Biochem. 1994;222:55-58.
- Am. J. Clin. Pathol. 1999;112:S19-S32.
- J. Chromatogr. Sci. 1997;35:358-373.