Absolute risk reduction
Dr. Ian Stiell November 2012
This is a very simple but important concept for interpreting the results of an intervention clinical trial. ARR tells us the difference in outcome proportion or percent between the control group and the intervention group. In the HES trial, Table 2 shows us that the primary outcome of death occurred in 18.0% of HES cases and in 17.0% of Saline cases. Hence the absolute difference was -1.0% [17.0 – 18.0 = -1.0] because HES did worse. Relative risk reduction is a little more complicated and we will do that another time.
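The arithmetic is easy to verify with a few lines of Python (the counts below are illustrative, chosen only to reproduce the percentages above):

```python
# Illustrative counts reproducing the 18.0% vs 17.0% mortality rates above
deaths_hes, n_hes = 180, 1000        # intervention arm (HES)
deaths_saline, n_saline = 170, 1000  # control arm (Saline)

risk_hes = deaths_hes / n_hes
risk_saline = deaths_saline / n_saline

# ARR = control risk minus intervention risk; a negative value means
# the intervention did worse than control
arr = risk_saline - risk_hes
print(f"ARR = {arr:+.1%}")  # ARR = -1.0%
```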
Adjustment Analyses in Randomized Trials
Adjustment of Confidence Intervals for Interim Analyses
Allocation concealment is an important principle in RCT design as it helps ensure that study personnel and clinicians are unaware of how a study intervention or control is assigned. Historically, there have been instances where study personnel or clinicians have attempted to “guess” treatment allocation to ensure their patient gets assigned the “right” study group based on their own clinical biases. The robustness of an RCT is enhanced by clear reporting of how allocation was concealed, and further still if the adequacy of that concealment was evaluated.
Dr. Christian Vaillancourt
“Normal distribution” represents the probability of observing a given value for a continuous variable. “Binomial distribution” is used for dichotomous variables, which can only take one of two values. It represents the probability of observing one value or the other among n attempts/trials, given a probability p for each attempt. In contrast to the “smooth” bell curve of the normal distribution, the binomial distribution is represented as a series of stepped columns.
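The height of each of those stepped columns can be computed directly with Python's standard library; as a sketch (the coin-toss numbers are just an illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of observing exactly k 'successes' in n trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. the chance of exactly 7 heads in 10 tosses of a fair coin
print(round(binom_pmf(7, 10, 0.5), 4))  # 0.1172
```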
Blinding of Treatment Allocation
Clinical Diversity, Methodological Diversity and Statistical Heterogeneity
Clinically Important Outcomes
Cohen’s d effect size
Dr. Venkatesh Thiruganasambandamoorthy
Cohen’s d is used for continuous variables: it is the difference between two group means divided by the pooled standard deviation of the data. Values from 0 to 0.3 indicate a small effect size, values between 0.3 and 0.6 a moderate effect size, and values greater than 0.6 a large effect size. Cramér’s V is used for categorical data; it is based on the Chi-square statistic and measures effect size for nominal data.
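A minimal Python sketch of the Cohen’s d calculation, using the pooled standard deviation (the two small samples are invented for illustration):

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = sqrt(((n1 - 1) * stdev(group1) ** 2 +
                      (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

# e.g. ages in two hypothetical patient groups
print(round(cohens_d([70, 72, 68, 75, 71], [60, 62, 58, 65, 61]), 2))  # 3.86
```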
Concealment versus Blinding
Dr. Ian Stiell March 2015
These clinical trial terms have different meanings but are often confused. Concealment refers to the process whereby the treatment allocation is made unknown or concealed prior to patient randomization. This helps prevent selection bias by ensuring that health providers and research staff are not tempted to include or exclude cases according to their views on the allocated treatment. Blinding refers to the methods employed after randomization to ensure that patients, health care providers, and research staff cannot determine whether the patient is receiving the study or the control treatment. This reduces ascertainment bias (the likelihood of differential assessment of outcome).
Contamination in Randomized Trials
Cluster Randomized Controlled Trials
Dr. Ian Stiell May 2012
A cluster randomized trial is a trial in which individuals are randomized in groups (i.e. the group is randomized, not the individual); for example, all patients treated by a particular EMS service or at a particular hospital. Reasons for performing cluster randomized trials vary. Sometimes the intervention can only be administered to the group, for example an addition to the water supply; sometimes the motivation is to avoid contamination amongst health care providers; sometimes the design is simply more convenient or economical. Such trials are often appropriate when the intervention is a psychomotor task (e.g. CPR) but not when the intervention is a drug. Specific sample size and data analytic approaches are required.
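One consequence for sample size is the design effect, DEFF = 1 + (m − 1) × ICC, which inflates the sample size that individual randomization would have required. A hedged sketch (the cluster size and ICC below are assumptions for illustration):

```python
def clustered_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually-randomized sample size by the design effect
    DEFF = 1 + (m - 1) * ICC, where m is the average cluster size and ICC
    is the intracluster correlation coefficient."""
    deff = 1 + (cluster_size - 1) * icc
    return n_individual * deff

# e.g. 800 patients needed under individual randomization, clusters of
# 50 patients per EMS service, and an assumed ICC of 0.02
print(round(clustered_sample_size(800, 50, 0.02)))  # 1584
```

Even a small ICC nearly doubles the required sample size here, which is why cluster trials need their own sample size methods.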
Determining Safety of a Therapeutic Agent
Difference between the groups used for sample size calculation in RCTs
By: Dr. Venkatesh Thiruganasambandamoorthy
When evaluating the sample size calculation for a randomized controlled trial, a key step is to determine whether the difference between the two study arms used for the calculation is a clinically important one. Powering a study on a difference that has no clinical significance will have no practical implications.
Disease-Specific Quality of Life Measurement Tools
Dr. Christian Vaillancourt
Clinicians and researchers often seek to measure “quality of life” in an objective manner. One example of such a “global” health measurement instrument is the SF-12 Health Questionnaire. The Western Ontario Shoulder Instability Index is a 21-item, 4-domain “disease-specific” quality of life measure. A lot of work goes into the development of these quality measures, including: 1) clearly defining the population; 2) defining the disease (via literature review, interviews with clinicians and patients), its severity, and treatment options; 3) reducing the number of identified items; 4) pilot testing; and 5) examination of validity, reliability, responsiveness, etc. It is also customary to re-validate these tools when translated into a new language, or when considering their use with a different population.
Dr. Venkatesh Thiruganasambandamoorthy
We commonly evaluate the bivariate association (a.k.a. univariate analysis) between groups of patients and a certain variable or outcome, e.g. what is the strength of the association between age and arrhythmia among syncope patients? Since consecutive patients are enrolled, when we compare age among patients with and without arrhythmia, there will usually be a significant difference, with older patients more likely to suffer arrhythmias. How do you compare two groups that could be different at the outset? In the study by Cournoyer et al, the BLS and ACLS groups are likely to be very different. You can then use effect size to evaluate the difference in their characteristics (e.g. men are taller than women; the difference between the height of men and the height of women is the effect size).
Equivalence or Non-Inferiority Trials
Explanatory versus Pragmatic Clinical Trials
Drs. Ian Stiell & Lisa Calder March 2012
Trials of healthcare interventions are often described as either explanatory or pragmatic. Explanatory trials generally measure efficacy – the benefit a treatment produces under ideal conditions, often using carefully defined subjects in a research clinic. Pragmatic trials measure effectiveness – the benefit the treatment produces in routine clinical practice. Pragmatic trials generally reflect the reality of how the intervention will perform in everyday care.
Investigators and editors developed the CONSORT Statement (revised 2010) to improve the reporting of randomized controlled trials (RCTs) by means of a checklist and flow diagram. The flow diagram depicts the passage of participants through an RCT, with numbers and explanations from the four stages of a trial (enrollment, intervention allocation, follow-up, and analysis). The diagram explicitly shows the number of participants, for each intervention group, included in the primary data analysis.
Hazard Ratio (HR)
Dr. Ian Stiell February 2013
The hazard ratio is akin to relative risk but is used for survival analyses such as Cox proportional hazards regression. It is most often used to describe the outcome of therapeutic trials where the question is to what extent treatment can shorten the duration of an illness. The hazard ratio is an estimate of the ratio of the hazard rate in the treated versus the control group. For example, if there are two groups, group 1 and group 2, an HR of 4.5 means that the risk (of relapse) for group 2 is 4.5 times that of group 1.
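As a rough illustration only: under a constant-hazard (exponential) assumption, each arm’s hazard rate can be estimated as events divided by person-time, and the ratio of the two rates approximates the HR. Real trials estimate the HR with Cox regression; the numbers below are invented.

```python
def hazard_rate(events, person_time):
    """Crude hazard rate, assuming a constant hazard over follow-up."""
    return events / person_time

# hypothetical follow-up data: (events, person-years) per arm
hr = hazard_rate(30, 400) / hazard_rate(10, 420)
print(round(hr, 2))  # 3.15
```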
Intention-to-treat (ITT) analysis
Drs. Ian Stiell & Lisa Calder January 2015
Intention-to-treat (ITT) analyses are widely recommended as the preferred approach to the analysis of most clinical trials. The basic intention-to-treat principle is that participants in trials should be analysed in the groups to which they were randomized, regardless of whether they received or adhered to the allocated intervention. This particularly becomes a problem when patients are lost to follow-up and no outcome values are available. Authors must clearly indicate how many patients have such values missing. The alternate approach is a per protocol analysis which only includes patients for whom the protocol was followed.
Interim Analyses and Stopping Rules
Describing the Strength of Study Results Using “Levels of Evidence”
Dr. Christian Vaillancourt
Different methods of classifying levels of evidence have been proposed, most of them relying on the study design, their precision, or their endpoints (e.g. survival with good neurological outcome). The Oxford classification is one commonly quoted, where, for example, Level 1a is a meta-analysis of RCTs; Level 1b is an individual high-quality RCT; Level 2 includes cohort studies and low-quality RCTs; Level 3 includes case-control studies; Level 4 includes case series and low-quality cohort or case-control studies; and Level 5 is expert opinion.
By: Dr. Christian Vaillancourt
Rensis Likert was an American social psychologist. His scales continue to be used to measure attitudes, values, opinions, pain, agreement, etc. They can be odd-numbered (including a mid-point) or even-numbered (forcing respondents to pick a side), and can provide more (0-10) or less (1-5) granularity. They can use “ordinal” variables (e.g. SES = low-medium-high) with no fixed interval between categories, “interval” variables (e.g. annual income = $5k-$10k-$15k), or be “in between” (e.g. strongly agree-agree-neutral-disagree-strongly disagree).
Loss to Follow-up
Evaluating loss to follow-up is a helpful tool when assessing RCT validity as it provides the reader with a sense of the integrity of the estimated difference in primary outcome. As a general rule of thumb, any loss to follow-up greater than 20% should lead the reader to become concerned about resulting bias of the main result. The reader should ask themselves: if all the patients who were lost to follow-up had the worst possible outcome, to what degree would this influence the statistical and clinical significance of the main result?
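That worst-case question can be checked with simple arithmetic; a sketch for one arm of a hypothetical trial (all patient counts are invented):

```python
# Recompute the event rate assuming every patient lost to follow-up
# had the worst possible outcome (an event)
events, followed_up, lost = 40, 180, 20
n_randomized = followed_up + lost

observed_rate = events / followed_up
worst_case_rate = (events + lost) / n_randomized

print(f"observed: {observed_rate:.1%}, worst case: {worst_case_rate:.1%}")
# observed: 22.2%, worst case: 30.0%
```

If a shift of that size would overturn the statistical or clinical significance of the main result, the loss to follow-up is a real threat to validity.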
Minimal Clinically Important Difference in Clinical Trials
Modified Intention-to-treat (M-ITT) Analyses
Dr. Venkatesh Thiruganasambandamoorthy
Intention-to-treat (ITT) analyses are widely recommended as the preferred approach to the analysis of most clinical trials. The basic intention-to-treat principle is that participants in trials should be analysed in the groups to which they were randomized, regardless of whether they received or adhered to the allocated intervention, crossed over to other treatments, or were withdrawn from the study. Post-randomization exclusions may be acceptable when patients are inappropriately randomized into a clinical trial or when pre-randomization information on patients’ eligibility status is not available at the time of randomization. Such an approach is known as “modified intention-to-treat” analysis and must be pre-specified in the protocol. M-ITT is most likely to be seen in RCTs of critical situations, e.g. cardiac arrest.
Multiple Arm Clinical Trials
Dr. Ian StiellApril 2013
Multiple-arm randomized trials can be more complex in their design, data analysis, and result reporting than two-arm trials. In an RCT with three arms, there are seven theoretically possible comparisons so it is important that the investigators define a priori which comparisons are of primary interest and whether they will assess global differences between all arms and/or will assess pair-wise differences of 2 arms at a time.
Multiple Comparisons and Statistical Significance
It is not uncommon for a manuscript to report several secondary outcomes. The number of secondary comparisons is directly proportional to the chance that one of them will end up being statistically significant by chance alone. To account for this, statisticians should make it proportionally more difficult to find such a statistical difference. The Bonferroni correction suggests that the level of significance (alpha error, 0.05) should be divided by the number of comparisons made, i.e. 0.05 / 5 comparisons = a new alpha of 0.01.
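The correction itself is one line of arithmetic; a sketch with invented p-values for five secondary outcomes:

```python
alpha = 0.05

# hypothetical p-values for five secondary outcomes
p_values = {"pain score": 0.04, "length of stay": 0.008,
            "satisfaction": 0.03, "revisits": 0.20, "admission": 0.001}

# Bonferroni correction: divide alpha by the number of comparisons
corrected_alpha = alpha / len(p_values)  # 0.05 / 5 = 0.01

for outcome, p in sorted(p_values.items()):
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"{outcome}: p = {p} -> {verdict}")
```

Note that the pain score (p = 0.04) and satisfaction (p = 0.03) would have passed at the conventional 0.05 threshold but fail the corrected one.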
Non-inferiority trials are distinct from superiority trials in that they are designed to determine whether a given intervention is non-inferior to a control by a pre-specified margin. This is not the same as equivalence, and a key section of the methods to examine is the sample size calculation, where the non-inferiority margin is specified. Ideally, researchers explain how this margin was determined (based on previous placebo-controlled trials or consensus of experts). The critical reader will ask themselves whether they feel this margin is truly clinically significant.
Number Needed to Treat (NNT)
An Assessment of Clinically Useful Measures of the Consequences of Treatment
By: Dr. Christian Vaillancourt
Registries collect uniform/standardized/pre-defined demographic and clinical information on systems of care and patients suffering from a common condition. Registries are particularly helpful to conduct longitudinal observational studies. They can, on occasion, include additional data collected as part of an interventional trial. Examples of such registries include the Canadian Cancer Registry, the Canadian Cystic Fibrosis Registry, and the Canadian Resuscitation Outcomes Consortium (CanROC) Registry.
Phases of a clinical trial
Dr. Ian StiellNovember 2012
Clinical trials involving new drugs are classified into four phases with Health Canada and the FDA generally requiring a drug to have passed through Phase 3 before general approval. Phase 1 trials test the treatment in a small group of healthy people (20-80) to evaluate its safety, dosage range, and side effects. Phase 2 trials give the treatment to patients and in larger numbers (100-300) to evaluate effectiveness and safety. Phase 3 trials give the treatment to large groups of patients (1,000-3,000) to confirm its effectiveness, monitor side effects, and compare to commonly used treatments. Phase 4 trials are post-marketing studies to determine additional information about side effects and risks. [Phase 2a trials focus on proving the hypothesized mechanism of action, while the larger phase 2b trials seek to determine the optimum dose]
Pre-specified and Post-hoc Subgroup Analyses
Precision in RCTs
Propensity Score Matching
Random sampling vs Randomization
Dr. Lisa Calder June 2015
While in epidemiology we frequently use the word “random”, there is sometimes confusion about its application. When a clinical population is randomly sampled, the goal is to ensure a representative sample. If you pre-select a sample of patients using inclusion and exclusion criteria and randomize these to a given intervention and control, you do not necessarily have a representative sample. Selection bias can still occur pre-randomization.
Randomization by Pocock minimization algorithm
Dr. Lisa Calder March 2013
When reviewing a randomized trial, it is critical to determine how the randomization was conducted, as not all randomization schemes are created equal. Proper randomization uses either computer-generated randomization or random number tables. Pseudo-randomization includes studies where patients are allocated based, for example, on alternating days of the week or date of birth. The reader can verify that randomization was conducted appropriately by examining Table 1 of participant characteristics and determining whether the groups appear to be balanced.
RCT Sample Size Calculation
When calculating a sample size for a randomized controlled trial, a key step is to determine the MCID: minimally clinically important difference. By powering your trial towards this difference, not only will you seek a statistically significant difference in effect but also a clinically significant one. It is important as a critical appraiser to evaluate whether you agree that the MCID is truly clinically significant.
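A hedged sketch of the standard two-proportion sample size formula (normal approximation; the control event rate and the 5-percentage-point MCID below are assumptions for illustration):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per arm for comparing two proportions,
    where |p1 - p2| is the minimal clinically important difference (MCID)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# e.g. 10% mortality expected in control, MCID of 5 percentage points
print(n_per_group(0.10, 0.05))  # 432 per arm
```

Halving the MCID roughly quadruples the required sample size, which is why an honest MCID matters so much.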
Sample Size in Clinical Trials
Selection Bias and Randomization
Stratification of Randomization by Timing of Enrolment
Block randomization offers the benefit of ensuring overall balance of groups when you have multiple centers or clinically defined subgroups. Another approach is to randomize by the timing of enrolment when this could influence the outcome. In this study, the authors stratified their enrolment to account for early and later enrolments given that sepsis is a time sensitive condition. The sensitivity analysis for these strata reassure the reader that the overall observed effect was not influenced by timing of enrolment.
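A sketch of stratified permuted-block allocation (the stratum names and block size are illustrative, not taken from the study):

```python
import random

def block_randomize(n_blocks, block_size=4, arms=("treatment", "control")):
    """Permuted blocks: each block holds an equal number of each arm in
    random order, guaranteeing balance within every block."""
    allocations = []
    for _ in range(n_blocks):
        block = list(arms) * (block_size // len(arms))
        random.shuffle(block)
        allocations.extend(block)
    return allocations

# stratify by timing of enrolment: an independent allocation list per stratum
schedule = {stratum: block_randomize(n_blocks=2) for stratum in ("early", "late")}
```

Because each stratum gets its own allocation sequence, the arms stay balanced within early and late enrolments separately.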
Survival analyses are used in clinical trials that follow patients over time for primary outcomes such as death, relapse, adverse drug reaction, or development of a new disease. The follow-up time may range from hours to years, and a different set of statistical procedures is employed to analyze the data. Terms frequently seen in papers with survival analyses include Cox proportional hazards model, hazard ratio, and Kaplan-Meier curve.
Surrogate outcomes can be used as a measure of effect for specific treatments and might correlate with clinical outcomes. In the RE-VERSE AD (idarucizumab for dabigatran reversal) study, the investigators used dilute thrombin time and ecarin clotting time as surrogate end points for reversal of dabigatran action by the study drug idarucizumab. The actual clinical end point of restoration of hemostasis was a secondary outcome. This was a small study. We need a large clinical study with a control group to confirm that clinical outcomes among patients treated with the reversal agent are better. Be wary of studies using surrogate outcomes when clinical outcomes could have been used.
Stopping Clinical Trials after Interim Analyses
Interim analyses are commonly planned in large studies. The overwhelming majority of researchers and clinicians discourage stopping a clinical trial based on an interim analysis. Studies that were ‘negative’ at interim analysis have later turned out to be positive, and vice-versa. Interim analyses increase the chance of type 1 error (concluding that there is a treatment effect when there is none; a false positive) in clinical studies. There are approaches to reduce type 1 error: O’Brien-Fleming, Haybittle-Peto, or Pocock. Essentially, each of them increases the threshold of statistical significance (i.e. the p-value must be much lower than 0.05) at the interim analysis for stopping the study.
Use of Continuous Data as Primary Outcome
Beware of studies that compare the effectiveness of interventions by using continuous data outcomes, such as pain scales (1-100), oxygen saturation values, and minutes to pain relief. These kinds of data can produce statistically significant differences between groups with relatively small sample sizes but often give you little information about clinical importance. Far better, and almost always the norm, are outcome measures given as proportions or percentages, such as the % of patients who achieve a 20-point improvement in pain, an oxygen saturation of 90%, pain relief in less than 2 hours, or survival.