by guest » Sat May 08, 2004 9:45 am
A Review of Clinical Trials: Design and Expected Outcomes
ACVIM 2003
Marlene Hauck, DVM, PhD, DACVIM
Raleigh, NC
INTRODUCTION
The advancement of knowledge in veterinary oncology relies, in part, upon the testing of new (or old) therapies in clinical trials. Just as important is our ability, as the consumers of the veterinary journals reporting these trials, to understand the limits of different types of clinical trials and to avoid the overinterpretation of data presented in such studies. The goal of this presentation is to briefly review the different types of clinical trials that are published in the veterinary literature and to discuss the basics of trial design and the types of outcomes that can be assessed.
FEASIBILITY STUDIES
These studies are often the first step in the introduction of a new method of treatment in veterinary oncology. The purpose of a feasibility study is not to demonstrate that a treatment is effective, but instead to show that it can be done. Feasibility studies (sometimes called "proof of principle") typically report the new methods in detail, with emphasis on repeatability and achievement of endpoints. An example of this type of study is a recent publication by Thrall et al. entitled "Using units of CEM 43°C T90, local hyperthermia thermal dose can be delivered as prescribed" (Int J Hyperthermia 16:415-428, 2000). This paper described the techniques used to deliver a prescribed heat dose to a tumor, how heat dose was measured and how precise the delivery was. Feasibility studies are often required before any type of clinical trial is undertaken. These types of studies make no effort to determine if a new treatment is beneficial.
DOSE-FINDING STUDIES
Often called phase I trials, dose-ranging studies or schedule-finding studies, dose-finding studies are designed to determine the optimum dose or schedule of a new therapeutic agent. The classical definition is a study designed to identify the side effects and the maximum dose that can be safely administered. With traditional cytotoxic agents, this type of study is relatively straightforward--the maximum tolerated dose is that at which a predefined level of toxicity is seen. With newer agents that act via non-traditional pathways, the maximum biologically effective dose is the endpoint desired, and the means by which this is measured are more complex. But the goals of the dose-finding study are ultimately the same: determination of the dose to be used in future safety and efficacy trials, the types of toxicity that can be expected and the pharmacokinetics of the new agent. These studies do not have patient benefit as a goal or as an endpoint (other than toxicity). For ethical reasons, patients entered into dose-finding studies are typically those for whom no definitive treatment exists. In veterinary medicine, this sometimes includes patients for whom this is the only financially available treatment.
When reading a report of a dose-finding study, the reader should be able to clearly understand the rationale for the starting dose, the dose escalation scheme, the toxicity measurement scale, the number of patients in a cohort (traditionally at least 3) and the means by which the maximum tolerated dose (MTD) is determined (usually one dose level below that at which at least 2 of 3 or at least 2 of 6 patients experience dose-limiting toxicity, or DLT). Information on the pharmacokinetics and/or pharmacodynamics of the drug is also a critical component of this type of trial. The types of toxicities seen also need to be reported.
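The escalation logic described above can be sketched in a few lines of Python. This is only an illustration of one common form of the "3+3" rule, assumed here: escalate after 0 of 3 DLTs, expand to 6 patients after 1 of 3, and stop once 2 or more patients at a dose level experience DLT. The function names, dose values and DLT counts are hypothetical and are not taken from any particular trial.

```python
from typing import List, Optional


def cohort_decision(dlts: int, treated: int) -> str:
    """Decision at one dose level after 3 or 6 patients (hypothetical helper)."""
    if treated not in (3, 6):
        raise ValueError("the 3+3 rule evaluates a dose after 3 or 6 patients")
    if dlts >= 2:
        return "stop"      # dose exceeds the MTD
    if dlts == 1 and treated == 3:
        return "expand"    # treat 3 more patients at the same dose
    return "escalate"      # 0 of 3, or 1 of 6: move to the next dose


def find_mtd(doses: List[float], dlts_by_cohort: List[List[int]]) -> Optional[float]:
    """Return the highest dose cleared before a 'stop' decision.

    doses: dose levels in ascending order (illustrative values).
    dlts_by_cohort: for each dose, the DLT count in each 3-patient cohort
        treated at that dose; a second entry is needed whenever the first
        cohort has exactly one DLT.
    Returns None if even the lowest dose is too toxic.
    """
    mtd = None
    for dose, cohorts in zip(doses, dlts_by_cohort):
        dlts, treated = cohorts[0], 3
        decision = cohort_decision(dlts, treated)
        if decision == "expand":
            dlts, treated = dlts + cohorts[1], 6
            decision = cohort_decision(dlts, treated)
        if decision == "stop":
            return mtd
        mtd = dose
    return mtd


# Made-up example: 0/3 at dose 10, 1/6 at dose 20, 2/3 at dose 30.
print(find_mtd([10, 20, 30, 40], [[0], [1, 0], [2]]))
```

In this made-up example the MTD is the dose level immediately below the one at which 2 of 3 patients experienced DLT.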
SAFETY AND EFFICACY TRIALS
These trials are also known as phase II clinical trials. The goals of these trials are to determine if a new treatment is effective and to further define the toxicity of this new treatment in a wider population. Pharmacokinetics may also be a part of safety and efficacy trials. The optimal dose and schedule may not be known for a new agent (despite dose-finding studies), so multiple safety and efficacy trials may be performed on a novel agent. The cumulative toxicities and the effects of multiple doses on pharmacokinetics are often unknown as well, and these are studied as a part of these trials.
Phase II trials are typically performed on a defined patient population (disease), with a specified intervention (treatment) and a defined outcome of interest (how efficacy is determined). Prior to the start of the trial, 'effective' must be defined--i.e., what response rate indicates that a new agent has activity against a given disease. The patient population for safety and efficacy studies typically consists of patients who have failed front-line therapy and for whom definitive treatment is unavailable. Alternatively, safety and efficacy trials can be performed with patients with extremely poor prognoses. The types of outcomes used to assess efficacy include intermediate endpoints, such as tumor response, serum marker levels, 6-month survival, etc. These are intermediate endpoints because they may or may not be predictive of increases in patient survival. Quality of life measurements are also used in phase II trials. The critical aspect of the intermediate endpoint used in a safety and efficacy trial is that it should be unambiguously associated with clinical improvement. Therefore short-term tumor shrinkage is not usually a valid endpoint for a safety and efficacy trial unless it absolutely results in clinical improvement of the patient.
In addition to the above parameters, a safety and efficacy study should be hypothesis based. Traditionally, this includes a null hypothesis (response rate is 'low' and further evaluation is not warranted) and an alternative hypothesis (response rate is 'high' and further evaluation is warranted). Low and high rates of response are defined based upon the patient population under study. Ideally, these studies also include early stopping rules in the instance of a low response rate. It is also important that the sample size tested be adequate to draw accurate conclusions about the efficacy of a new agent. There are alternative designs for phase II trials, including 'randomized' trials where patients are assigned to different treatment groups. The goal with this type of study is not the formal comparison of the different treatment groups, but the selection of the most promising agent for further study.
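As a rough illustration of how 'low' and 'high' response rates and sample size interact, the sketch below evaluates a simple single-stage design with exact binomial probabilities. The response rates (10% and 30%), the 25-patient sample size, the 0.05 error level and the function names are all assumptions chosen for illustration; early stopping rules would require a multi-stage design, which is not shown here.

```python
from math import comb


def binom_tail(n: int, r: int, p: float) -> float:
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def single_stage_design(n: int, p_low: float, p_high: float, alpha: float = 0.05):
    """Find the smallest number of responders r that rejects the null hypothesis.

    Null hypothesis: true response rate is p_low (further study not warranted).
    Alternative: true response rate is p_high (further study warranted).
    Returns (r, actual type I error, power).
    """
    for r in range(n + 2):
        type_one = binom_tail(n, r, p_low)
        if type_one <= alpha:
            return r, type_one, binom_tail(n, r, p_high)


# Hypothetical trial: 25 patients, 'low' = 10% response, 'high' = 30% response.
r, type_one, power = single_stage_design(25, 0.10, 0.30)
print(f"declare activity if at least {r} of 25 respond")
print(f"chance of a false positive: {type_one:.3f}, power: {power:.3f}")
```

Re-running the sketch with a smaller sample size shows how quickly power falls, which is one reason an underpowered safety and efficacy trial can miss an active agent.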
When reading a report of a safety and efficacy trial, the appropriate definitions are important--patient population, intervention tested, endpoint measured and definitions of low and high 'response' rates. If early stopping rules are employed, these should be clearly spelled out. Overinterpretation of the results of safety and efficacy trials is not unusual, and must be cautioned against. These types of trials do not predict the superiority of one treatment over another; their goal is to identify promising therapeutics for further study. In this type of study, the use of a historical control group is for the purpose of defining a 'high' rate of response, not to demonstrate superiority of the new treatment versus the old treatment. These types of questions are answered with the next type of trial:
COMPARATIVE EFFICACY TRIALS
These trials, also known as phase III trials, are exactly what their name implies: they compare the efficacy of a new treatment with that of a standard treatment (or no treatment, depending upon the disease). These trials involve randomly assigning patients with a given disease to two or more treatment groups, then comparing the impact of the treatment on survival. These trials are the only trials that can tell, with some degree of confidence, whether one treatment is better than another. These trials involve many patients (since improvements in survival are typically quite small, large numbers of patients must be treated in order to detect the differences), multiple institutions (to allow accrual at a reasonable rate) and long-term follow-up. Due to the high numbers of patients and expense of performing these types of trials, they are currently rarely performed in veterinary medicine. This may change as veterinary oncology becomes more sophisticated in its approach to the development of new treatments, but the dearth of comparative efficacy trials should not allow us to claim that phase II trials are adequate to make treatment decisions regarding the "best" treatment choice (at least not without understanding the inaccuracies behind that choice).
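To give a rough sense of why small improvements demand large trials, the sketch below applies the standard normal-approximation sample size formula for comparing two proportions. It simplifies survival to the proportion of patients alive at a fixed time point, which is an assumption made here for illustration only; a real comparative trial would normally be sized using time-to-event methods. The survival rates, the two-sided 0.05 significance level, the 80% power and the function name are likewise hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(p_control: float, p_new: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed in EACH group to detect p_control vs p_new.

    Uses the normal approximation for a two-sided comparison of two
    independent proportions with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_new * (1 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_new) ** 2)


# Illustrative one-year survival rates: 50% on standard treatment.
for p_new in (0.80, 0.70, 0.60):
    n = patients_per_group(0.50, p_new)
    print(f"50% vs {p_new:.0%}: about {n} patients per group")
```

The required number of patients grows roughly with the inverse square of the difference to be detected, so halving the expected improvement roughly quadruples the size of the trial.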
THE MAGIC p < 0.05 LEVEL OF SIGNIFICANCE
Early in our training we are told that if something has a p-value <0.05 it is 'statistically significant', that is, different from whatever it is being compared with, and that this statement of difference carries a 95% chance of being correct. This is true in a study designed to test exactly one pair of variables, but after that it gets somewhat less decisive. In a brief communication in the JNCI, Ian F. Tannock suggests that at least 3 factors can result in a report of a clinical trial being a false positive: 1) publication bias (a positive trial is more likely to be published); 2) the low probability that a new treatment will result in significant therapeutic gain, which implies that the prevalence of true positives is low; and 3) the performance of multiple significance tests. When you consider that, by chance alone, 1 in 20 tests for significance will be positive even when no real difference exists, one begins to understand why 'significant' results must be carefully evaluated.
Twenty tests may seem like a lot, but when you consider that many comparative efficacy trials have multiple endpoints, perform subgroup analyses or serially test trial participants, the numbers rise quickly. This is an even greater problem if the trial endpoints are not clearly defined initially, which may lead the author of a study to present the most 'impressive' results as the primary endpoint. In addition, without predefined stopping points and total accrual numbers, positive interim results may not be confirmed with additional patients. In the report by Tannock mentioned above, an assessment was made of the number of comparisons (both reported and unreported) made in 32 published human clinical trials. His estimate of the median number of statistical comparisons performed (and not necessarily reported) was 86 per publication. When he included serial reports (abstracts) of these trials as well as the published report, the median estimated number of statistical comparisons was 95.
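The arithmetic behind these concerns can be sketched briefly. The first function below gives the chance of at least one false-positive result when a number of significance tests are performed and no true differences exist; it assumes the tests are independent, which is a simplification, since comparisons within one trial are usually correlated. The second function estimates how often a 'significant' result reflects a real effect when truly effective treatments are uncommon. The 10% prior probability, the 80% power and the function names are assumptions chosen for illustration and are not figures from the report discussed above.

```python
def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across n_tests independent tests,
    assuming no true differences exist."""
    return 1 - (1 - alpha) ** n_tests


def positive_predictive_value(prior: float, power: float = 0.80,
                              alpha: float = 0.05) -> float:
    """Fraction of 'significant' results that are true positives.

    prior: assumed proportion of tested treatments that truly work.
    """
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)


# 86 and 95 are the median numbers of comparisons quoted in the text.
for n in (1, 20, 86, 95):
    print(f"{n:>3} tests: {chance_of_false_positive(n):.3f}")

print(f"PPV with a 10% prior: {positive_predictive_value(0.10):.2f}")
```

Under the independence assumption, 20 tests already give roughly a 64% chance of at least one spurious 'significant' result, and at 86 comparisons a false positive is close to certain.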
In veterinary medicine, particularly with retrospective analyses, we are no better at resisting the impulse to test a large number of descriptors to see if they 'significantly' predict response, survival, toxicity, etc. There is nothing inherently wrong in performing these tests--we need to remember, however, that these results should be used to generate new hypotheses that can then be tested prospectively. The reader is required to assess the degree of confidence that can be placed in statistically significant results and ultimately what these results mean clinically.
SUMMARY
Publishing the results of clinical research is the only means by which the veterinary profession will see progress in medicine. As consumers of published reports, we must be cautious not to interpret results beyond what the study design can support. It is becoming much more common (and a great step forward) for statisticians to be involved in the design and analysis of clinical trials, and their inclusion is likely to improve the quality of the data and reporting as well.
REFERENCES
1. Much of this information was presented by many clinical trialists in a workshop on clinical trial design presented by AACR/ASCO from July 29th-August 4, 2000.
2. Tannock, Ian F. (1996), "False-positive results in clinical trials: multiple significance tests and the problem of unreported comparisons", Journal of the National Cancer Institute, 88:206-207.
3. Tannock, Ian F. (1992), "Some problems related to the design and analysis of clinical trials", International Journal of Radiation Oncology, Biology, Physics, 22:881-5.
4. Pocock, Stuart J., Hughes, Michael D. and Lee, Robert J. (1987), "Statistical problems in the reporting of clinical trials", The New England Journal of Medicine, 317:426-32.
Speaker Information
Marlene L. Hauck, DVM, Ph.D., DACVIM
Clinical Sciences, CVM/Box 8401
North Carolina State University
4700 Hillsborough St.
Raleigh, NC 27606
Funded by National Institutes of Health