[Frontiers in Bioscience E2, 250-257, January 1, 2010] |
|
|
Comparison of methods for statistical analysis of combination studies School of Mathematics, University of Manchester, UK TABLE OF CONTENTS
1. ABSTRACT This paper is concerned with the statistical analysis of data obtained in studies of the joint action of drugs. The three methods that are compared are illustrated on real data (1), using the statistical package SAS. It is argued that while the results obtained using these methods do not differ substantially, the method allowing for estimating simultaneously all required parameters is to be preferred. It allows for a statistical test for the significance of the joint action of the drug combinations to be carried out. 2. INTRODUCTION Combinations of drugs have been known to be highly successful in treating different diseases for many years (2). There is a considerable amount of literature discussing related scientific issues as well as the experimental design and analysis of experiments aiming at identifying the existence of synergistic or antagonistic drug effects (2, 3, 4). Such experiments are a great deal more complex than studying the effect of the drugs separately, and typically take a long time to complete, and require substantial resources. Therefore, it is very important to develop a suitable statistical approach for analyzing such data. This paper discusses typical methods of analysing such data, demonstrating them on a dataset kindly provided by William Greco and based on his work (1). The dataset contains data from two studies. Greco's description of the experiments is as follows. "The first drug is trimetrexate (TMQ), a lipophilic inhibitor of the enzyme, dihydrofolate reductase; and the second drug is AG2034, an inhibitor of the enzyme, glycinamide ribonucleotide formyltransferase (GARFT). The first experiment included 2.3 m M folic acid in the medium, and the second experiment included 78 m M folic acid in the medium. All drug concentrations are in m M. The endpoint was the growth of HCT-8 human ileocecal adenocarcinoma cells, in wells of 96-well plates, as measured by the SRB protein stain. Treatments of cells in wells by drugs were randomized across the plates. Each 96-well plate included 8 wells as instrumental blanks (no cells; values are NOT subtracted from treated cell wells); thus 88 wells were used for drug treatments. Five replicate plates were used for each set of 88 treatments. Each of these two large data sets came from two 5-plate stacks (a maximum of 880 treated wells per experiment). There were 176 control wells per experiment (cells, but zero concentrations of both drugs). Each single agent was studied at 11 dilutions; the two drugs were combined in 12 different fixed ratios per experiment, and also studied in 11 dilutions. The recorded endpoint is an absorbance measurement (ranging about 0.0-2.0), recorded in an automated 96-well plate reader. Complete experimental details and mechanistic implications are included in Faessel et al (1998). The data set for the low (2.3 m M) folic acid case contains 871 data points (9 points were deleted for experimental problems); and the data set for the high (78 m M) case includes 879 data points (1 point was deleted). The former data set includes the complication of synergism dominating for some fixed ratios and antagonism dominating for other ratios. The latter data set includes the phenomenon of very large (super) synergy, and possible differential background effects (recorded absorbances at very high drug concentrations) for the single agents." We refer to the compounds TMQ and AG2034 as compounds A and B, to the response of interest as Y, and to the data collected at 2.3 m M and 78 m M folic acid as Study 1 and Study 2, respectively. A ray design (4) has been used in each of the studies, where the rays are defined by the constant ratios of the tested compounds for which the response has been measured - see Table 1. Each study is conducted using two stacks of plates. In order to allow for testing the significance of the differences between the stacks within each study, one combination in common was tested on each of the two stacks within each study. These combinations are rays 3 and 8. 3. STATISTICAL METHODOLOGIES The statistical analysis of data obtained in a combination study should follow the usual steps of any complex statistical investigations. The details of these steps depend on the aim of the study itself. We present some of the available methods and apply them to analyse the data described in Section 2. The computer package SAS is used to analyse the data. SAS has been previously reported to be a powerful tool in such calculations (5), however the code provided by the authors is incomplete and cannot be used. The SAS code implementing the statistical analysis in the current paper is available from the author on request. All figures in this paper were created using general graphical tools provided by ODS Statistical Graphics in SAS Software, Version 9.2. 3.1. Preliminary statistical analysis Figure 1 shows a histogram of the observations when no compounds have been used. It is clear that there are unusually small observations in the Study 1. Using such values in the subsequent statistical analysis can create various problems. On the basis that these values were not seen as representing genuine variability, they were excluded from the dataset. The remaining observations can be seen as approximately following the normal distribution, seen by the closeness of curves for the kernel estimation and the normal probability density function for the data. Statistical tests show no sufficient evidence for differences between the observations taken in the two studies and the two stacks of plates. The data for the latter are therefore pooled. Figure 2 shows all data for each ray of Study 1, while Figure 3 shows all data for each ray of Study 2. Careful inspection of these plots indicates possible problems in the subsequent analyses of the data. For example, it can be seen that the data for Ray 7 of Study 1 includes virtually no observations producing responses in the middle between the maximum and the minimum and without such observations the estimation of important parameters may not be possible. It can be also seen that in all plots the variability of the response increases with the response. This is an important observation which influences the definition of the model in Section 3.2.
3.2. Main Analysis The statistical analysis of bioassay data for a single compound action requires estimating the model < where Yi is the response observed at a dose whose logarithm to base 10 is xi, a
corresponds to the logarithm to base 10 of the IC50 of the drug; b
is the Hill slope, named after Hill (6) who proposed the model; g
is the mean response when no compound has been used and d
is the mean response for maximum possible drug effect. When a
is smaller the drug is more potent. In (1) < When a ray design has been used to study the joint effect of two drugs, say A and B, the data for each ray can be treated as a new drug, defined by the ratio in which the two drugs have been combined, and analysed by estimating model (1). Evidence that model (1) is not applicable would indicate that mechanistic considerations different from those assumed by Hill and Michaelis and Menten (10) to justify model (1) are present, and therefore the study of such a drug combination may require a different treatment. Typically the assessment of the joint action of two compounds is carried out by evaluating the evidence of its deviation from what would be expected if the joint action of the two compounds were additive. The definition of additivity proposed by Loewe (11) is most commonly used, though other definitions (12) also exist. Loewe's definition leads to the so called combination, also called interaction, index < where DA and DB are doses of the compounds A and B, respectively, producing the same mean response as a combination of doses Da and Db of the two compounds. If CI < 1, this provides evidence of synergistic action of the drugs, while if CI > 1, this suggests an antagonistic action. Some researchers provide general advice about how smaller or larger than one the CI value should be in order to declare the drug action as synergistic or antagonistic. However, as the combination index is a random variable following approximately lognormal distribution, a better approach is to test CI for its significance. Such a statistical test would take into account the variability in the experiment and the number of observations used to estimate the combination index. We estimate DA = IC50A and DB = IC50B as well as Da and Db for each ray using the following three methods, with SAS/STAT Software, Version 9.1.3.
In all cases the default setting for the SAS procedures are used. 4. COMPARISON OF METHODOLOGIES 4.1. Study 1 Tables 2-5 show the estimates of the parameters of model (1) α, β, g and δ when methods M1 and M2 are used. Not surprisingly the standard errors of the parameters' estimates when method M2 is used are smaller. However, the differences between the estimates for α for the studied combinations are relatively small. Both methods incurred difficulties in estimating α for ray 7, although this is shown in the computer output differently: when method M1 is used, the standard error is reported much higher than the other standard errors, while for method M2 a standard error for the estimate of α is not produced. The inspection of the data (Figure 2) discussed in Section 3.1 indicated the problem with the data for this drug combination - there are not enough points in the middle range of values for the response. The close similarity of the estimates of g
and δ (Tables 4 and 5) provides further evidence that there is no notable variability in the results between plates and raises a question whether separate estimates of the parameters for each ray in this study is indeed necessary. This concern can also be extended to the estimates of r
and < Method M3 does not have any of the shortcomings that methods M1 and M2 have. The model parameters estimates when this method is used are shown in Table 8. The estimates of α are similar to those obtained previously by methods M1 and M2, and their standard errors are in general smaller then those obtained with method M1, but they are larger then those obtained using method M2. The latter could be explained by the over-fitting that methods M1 and M2 do, i.e. providing different estimates for the parameters g
, δ, r
and <
Table 9 shows the estimates of the logarithms at base 10 of the combination index (2) for all rays obtained using method M3. The tests for significance show strong evidence of synergistic action for rays 3, 5, 8, 10, 12 and 14; some evidence for synergy for rays 4 and 6, but no evidence for synergy for rays 11 and 13. Furthermore, the strength of synergy appears to increase with the ratio of the drugs up to a ratio of 49.65, and then decreases as the ratio increases further. Clear illustration of this phenomenon is shown on Figure 4 which shows the 95% confidence intervals for the logarithm to base 10 of the combination index. If the distribution of the combination index, rather than of its logarithm, is assumed to be approximately normal, the tests for significance provide very similar results to those presented in Table 9. This may not be the case in the analysis of other data. 4.2. Study 2 The data for this study are shown in Figure 3. When their analysis is carried out in the same way as that used for Study 1, similar results are obtained. Therefore complete details are not provided. All methods failed to fit a sensible model to the data for ray 14 and the results for this drug combination are omitted. Method M2 also had difficulties estimating b for rays 7 and 10. Table 10 lists the logarithm to base 10 of the combination indices for the studied drug combinations, while Figure 5 shows their 95% confidence intervals. There is clear evidence for synergistic action of the drugs, and it appears to be largest at the drug ratio 49.65, the same ratio for which the synergy was largest in Study 1. Therefore, it seems the increase of the folic acid increases the strength of synergy between the two drugs, but does not appear to change the drug ratio at which the largest synergy occurs. However, this observation has to be checked experimentally. 5. CONCLUSIONS The results obtained during all stages of the statistical analysis are important for the understanding of the joint action of drugs. For example, the plots of the raw data (Figure 2 and Figure 3) help the motivation and understanding of the subsequent statistical analysis. The three compared methods showed consistent results. However, results obtained using method M1 ignored the heterogeneity of the variance of the response of interest, while those obtained with method M2 appear to underestimate the standard errors of the estimates of a and β for all drug combinations. Method M3 allows for all required estimates to be obtained using all data. A statistical test for the significance of the drug combinations is easy to carry out too. The data that were used to compare methods M1, M2 and M3 have some limitations and therefore various possible complications in the statistical analysis of combination data could not be illustrated. For example, no information is available about which data are collected on the same plates and whether the data are collected on one or on different occasions. Often many compounds are studied and the variability due to using different plates and running experiments on different occasions cannot be ignored. Evidence that such variability exists in the two studies analysed in Section 4 is obvious from the comparison of the results for rays 3 and 8 which correspond to identical drug combinations tested on different batches of plates. There is not enough data in this experiment to obtain estimates of the variability due to this and other sources of variability. Naturally this leads to the need to carefully consider the experimental design for a combination study. However, this issue will be addressed in a separate publication. There are situations when the distribution of the response may not be approximately normal, e.g. the response could be a binary or categorical variable. In such cases, method M3 is still capable of producing correctly the required statistical analysis. This also applies to the case when one or all of the drugs are not full inhibitors. Furthermore, the approach that we recommend, based on programs written in SAS, is flexible and can be tailored to the specific features of any particular combination study. Alternative approaches based on software specifically developed for analysis of combination studies may be a useful tool in the hands of an experienced statistician but should not replace him or her. 6. ACKNOWLEDGEMENT The author of this paper is grateful to Professor William Greco for the invitation to contribute to a timely discussion of the appropriateness of statistical methods that can be used in the analysis of combination data. He is also in debt to Randy Tobias who produced the figures in the paper. 7. REFERENCES 1. Faessel, H. M., H. K. Slocum, R. C. Jackson, T. J. Boritzki, Y. M. Rustum, M. G. Nair, and W. R. Greco: Super in vitro Synergy between Inhibitors of Dihydrofolate Reductase and Inhibitors of Other Folate-requiring Enzymes: The Critical Role of Polyglutamylation. Cancer Res. 58, 3036-3050 (1998) Key Words: Combination index, Drug combination, Interaction index, Ray design, Review Send correspondence to: Alexander N. Donev, School of Mathematics, University of Manchester, Oxford Road, Manchester, M13 9PL, UK; Tel: 44-0-1613699, Fax: 44-0-161275 5819, E-mail:a.n.donev@manchester.ac.uk |