As part of our Journal Club summaries, our JC Chairs (Drs. Lisa Calder and Ian Stiell @EMO_Daddy) have been tasked with explaining epidemiological concepts so that everyone in our department can analyze the literature and appraise articles on their own. For this blog post we have gathered all the “Epi Lessons” as they relate to “Therapy Articles”. More to follow in the coming weeks.

Absolute risk reduction 

Dr. Ian Stiell  November 2012
This is a very simple but important concept for interpreting the results of an intervention clinical trial. The absolute risk reduction (ARR) is the difference in outcome proportion (or percentage) between the control group and the intervention group. In the HES trial, Table 2 shows that the primary outcome of death occurred in 18.0% of HES cases and in 17.0% of saline cases. Hence the absolute difference was -1.0% [17.0 – 18.0 = -1.0]; the value is negative because HES did worse. Relative risk reduction is a little more complicated and we will cover it another time.
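
The ARR arithmetic can be sketched in a few lines of Python. The function name is ours (not from any library), and the rates are simply the HES and saline death rates quoted above, expressed as proportions.

```python
def absolute_risk_reduction(control_rate: float, intervention_rate: float) -> float:
    """ARR = outcome rate in the control group minus the rate in the intervention group.

    Rates are proportions (0-1). A negative ARR means the intervention
    group did worse, as with HES in the example above.
    """
    return control_rate - intervention_rate


# Death rates from the HES example: 17.0% with saline (control), 18.0% with HES.
arr = absolute_risk_reduction(control_rate=0.170, intervention_rate=0.180)
print(f"ARR = {arr:+.1%}")
```

Keeping the subtraction in the order "control minus intervention" is what makes a harmful intervention show up as a negative ARR.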

Adjustment Analyses in Randomized Trials

Drs. Ian Stiell and Lisa Calder  April 2013
Clinical trials depend upon random allocation to ensure good balance of important baseline characteristics and thus allow a fair comparison between study arms, usually by unadjusted statistical analyses. Commonly, additional secondary analyses will adjust for all important measured covariates using multivariable techniques such as logistic regression. These usually confirm the findings of the primary analysis; when the adjusted results differ, they can only be considered hypothesis generating.

Adjustment of Confidence Intervals for Interim Analyses

Dr. C Vaillancourt
Interim analyses are commonly planned in large studies. They can be useful, for example, to ensure patient safety (by detecting a significant benefit or harm early on), or as a cost-saving measure by stopping the study early if there is little statistical probability of finding a difference between groups later on. There are caveats to interim analyses. A study that appears “futile” early on may become significant if allowed to continue (early analyses lack the power to find a difference). It is also possible to find a significant difference early on by chance alone. Because the likelihood of finding a difference by chance alone increases every time you analyze the data, there are strategies to account for this (such as the O’Brien-Fleming approach), most of which raise the bar for statistical significance (i.e. requiring a p-value much smaller than 0.05, or equivalently a confidence interval wider than 95%).
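
To see why repeated looks inflate false positives, here is a small Monte Carlo sketch. Every simulated trial has no true effect, and the cumulative z-statistic is checked after each block of patients. The number of looks, the 20 patients per look, the 4,000 simulations and the seed are arbitrary assumptions; 2.413 is quoted (not derived here) as Pocock's published constant boundary for five equally spaced looks at an overall two-sided alpha of 0.05.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_looks: int, z_crit: float, n_per_look: int = 20,
                        n_sims: int = 4000, seed: int = 1) -> float:
    """Share of simulated null trials declared 'significant' at any look.

    Each simulated trial has no true effect: every patient contributes a
    standard-normal difference. The cumulative z-statistic is tested
    against z_crit after every block of n_per_look patients.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(n_looks):
            for _ in range(n_per_look):
                total += rng.gauss(0.0, 1.0)
            n += n_per_look
            if abs(total / n ** 0.5) > z_crit:  # z = sum / sqrt(n) under the null
                hits += 1
                break  # trial "stopped early" for a spurious result
    return hits / n_sims


naive_z = NormalDist().inv_cdf(0.975)  # about 1.96, i.e. p < 0.05 at every look
print("1 look,  p<0.05 each time:", false_positive_rate(1, naive_z))
print("5 looks, p<0.05 each time:", false_positive_rate(5, naive_z))
print("5 looks, Pocock z=2.413:  ", false_positive_rate(5, 2.413))
```

With a single analysis the false-positive rate stays near 5%; testing at p < 0.05 at each of five looks pushes it well above that, while the stricter Pocock threshold brings it back down toward 5%.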

Allocation concealment

Dr. Lisa Calder December 2012
Allocation concealment is an important principle in RCT design as it helps ensure that study personnel and clinicians cannot know in advance whether a patient will be assigned to the study intervention or the control. Historically, there have been instances where study personnel or clinicians have attempted to “guess” treatment allocation to ensure their patient gets assigned the “right” study group based on their own clinical biases. The robustness of an RCT is enhanced by clear reporting of how allocation was concealed, and even further if the adequacy of the concealment was evaluated.

Binomial Distribution

Dr. Christian Vaillancourt
“Normal distribution” represents the probability of observing a given value for a continuous variable. “Binomial distribution” is used for dichotomous variables, which can only take one of two values. It gives the probability of observing a given number k of one of the two values (e.g. k deaths) among n independent attempts/trials, given a probability p for that value at each attempt. Whereas the normal distribution is drawn as a “smooth” bell curve, the binomial distribution is represented by a series of stepped columns, one for each possible count.
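
A minimal sketch of the binomial probability formula, using only the Python standard library. The 10 patients and the 50% probability are hypothetical numbers chosen for illustration; the text "columns" mimic the stepped bar chart described above.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k 'successes' in n independent trials, each with probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical example: 10 patients, each with a 50% chance of the outcome.
n, p = 10, 0.5
for k in range(n + 1):
    prob = binomial_pmf(k, n, p)
    print(f"{k:2d} {prob:.4f} " + "#" * round(prob * 100))  # one text 'column' per k
```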

Blinding of Treatment Allocation

By: Dr. Lisa Calder January 2013
Open-label trials may be promoted as pragmatic trials, but a lack of blinding to treatment allocation is a fundamental threat to internal validity. Blinding reduces ascertainment bias (the likelihood of differential assessment of outcome). It is not always possible to blind an RCT, especially for surgical or procedural interventions. Drug studies can and must be blinded, so readers should be very skeptical when this has not been done. For other studies, ask yourself whether it was possible to blind and whether determination of outcome was free of bias.

Clinical Diversity, Methodological Diversity and Statistical Heterogeneity

A meta-analysis may attempt to address a compelling clinical dilemma, but one of the key questions to ask when appraising meta-analyses is whether pooling the included studies is appropriate. Clinical diversity (or clinical heterogeneity) reflects differences in study populations, interventions, co-interventions and/or outcomes between the studies being pooled. Methodological diversity (or methodological heterogeneity) is variability in the study designs used and/or the risks of bias present. These are distinct from statistical heterogeneity, which is the variability in the intervention effects estimated by the included studies; statistical heterogeneity is a consequence of clinical and/or methodological diversity. Statistical heterogeneity can be assessed by visually inspecting the forest plot, or by calculating the I² statistic or Cochran’s Q. A meta-analysis should be done only when the studies are relatively homogeneous for participants, interventions and outcomes, so that it provides a meaningful summary. Always ask yourself if the meta-analysis is combining apples with apples (good) or creating a fruit salad (bad).
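
As a sketch of how Cochran’s Q and I² are typically computed, the function below uses fixed-effect inverse-variance weights; the three effect estimates and standard errors are hypothetical. I² expresses the share of total variability beyond what chance (the degrees of freedom) would explain.

```python
def heterogeneity(effects: list[float], std_errors: list[float]) -> tuple[float, float, float]:
    """Return (pooled effect, Cochran's Q, I^2 in %) using inverse-variance weights."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations of each study from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, floored at 0 when Q is smaller than its expected value
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical log odds ratios and standard errors from three studies.
pooled, q, i2 = heterogeneity([0.1, 0.3, 0.5], [0.1, 0.1, 0.1])
print(f"pooled={pooled:.2f}  Q={q:.2f}  I2={i2:.0f}%")
```

Random-effects models and formal significance testing of Q are deliberately left out to keep the sketch short.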

Clinically Important Outcomes

Dr. Ian Stiell
Outcomes may include survival, clinical events (e.g. strokes or myocardial infarction), patient-reported outcomes (e.g. symptoms, quality of life), process outcomes (length of stay, intubation, imaging), adverse events, and economic outcomes (e.g. cost and resource use). Ideally the most important relevant outcome will be the primary outcome of the study, e.g. mortality for cardiac arrest, pain relief for analgesic studies. Be cautious of studies where processes are the primary outcome because these are of little interest to patients or their families. For example, it would be of little consolation to a family to hear that their loved one was intubated on the first pass but subsequently died.

Cohen’s d effect size

Dr. Venkatesh Thiruganasambandamoorthy
Cohen’s d is used for continuous variables: it is the difference between two group means divided by the (pooled) standard deviation of the data. Values between 0 and 0.3 are considered a small effect size, values between 0.3 and 0.6 a moderate effect size, and values above 0.6 a large effect size (Cohen’s own conventional benchmarks are 0.2, 0.5 and 0.8 for small, medium and large). Cramér’s V is used for nominal (categorical) data; it is derived from the chi-square statistic to measure effect size.
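
Both effect sizes can be computed with the Python standard library. This is a minimal sketch assuming the pooled-SD version of Cohen’s d and the Pearson chi-square (without continuity correction) for Cramér’s V; the input data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group1: list[float], group2: list[float]) -> float:
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V from an r x c contingency table of counts (Pearson chi-square)."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i in range(len(table)) for j in range(len(col_tot)))
    return sqrt(chi2 / (n * (min(len(table), len(col_tot)) - 1)))


# Hypothetical data: scores in two groups, and a 2 x 2 table of counts.
print(cohens_d([1, 2, 3], [2, 3, 4]))
print(cramers_v([[30, 20], [20, 30]]))
```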

Comparator Group

Dr. Jeff Perry April 2019
Without a comparator group, it is impossible to determine how effective a treatment option is. In this study, while mortality is high, it may still be much lower than it would have been with PCC or with plasma. While outcomes can be assessed, in the absence of a comparator group neither associations nor causal relationships can be determined; only hypotheses may be generated.

Concealment versus Blinding

Dr. Ian Stiell March 2015
These clinical trial terms have different meanings but are often confused. Concealment refers to the process whereby the treatment allocation is made unknown, or concealed, prior to patient randomization. This helps prevent selection bias by ensuring that health providers and research staff are not tempted to include or exclude cases according to their views on the allocated treatment. Blinding refers to the methods employed after randomization to ensure that patients, health care providers, and research staff cannot determine whether the patient is receiving the study or the control treatment. This reduces ascertainment bias (the likelihood of differential assessment of outcome).

Contamination in Randomized Trials

Dr. Ian Stiell May 2014
This is a type of bias where there is a mixing of treatments between study groups such that the impact of the intervention is difficult to determine. This is most likely to occur in non-drug trials where the intervention cannot be blinded and relies upon physician involvement, e.g. choosing a treatment protocol. One increasingly popular solution to this problem is to randomize by ‘cluster’ (e.g. by hospital site) rather than by patient.

Cluster Randomized Controlled Trials

Dr. Ian Stiell May 2012
A cluster randomized trial is a trial in which individuals are randomized in groups (i.e. the group is randomized, not the individual); for example, all patients treated by a particular EMS service or at a particular hospital. Reasons for performing cluster randomized trials vary. Sometimes the intervention can only be administered to the group, for example an addition to the water supply; sometimes the motivation is to avoid contamination amongst health care providers; sometimes the design is simply more convenient or economical. Such trials are often appropriate when the intervention is a psychomotor task (e.g. CPR) but not when the intervention is a drug. Specific sample size and data analytic approaches are required, because outcomes of patients within the same cluster tend to be correlated.
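
One common way to adjust the sample size for clustering is the design effect, 1 + (m - 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. The sketch below applies it to hypothetical inputs: 400 patients if individually randomized, about 20 patients per hospital, and an ICC of 0.05.

```python
from math import ceil


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Inflation factor for cluster randomization: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def cluster_sample_size(n_individual: int, mean_cluster_size: float, icc: float) -> int:
    """Patients needed under cluster randomization, given the individually randomized n."""
    inflated = n_individual * design_effect(mean_cluster_size, icc)
    # round() first so floating-point noise (e.g. 780.0000000001) does not add a patient
    return ceil(round(inflated, 6))


# Hypothetical inputs: 400 patients, ~20 patients per hospital, ICC of 0.05.
print(design_effect(20, 0.05))
print(cluster_sample_size(400, 20, 0.05))
```

Even a small ICC nearly doubles the required sample size here, which is why cluster trials need their own sample size calculations.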

Determining Safety of a Therapeutic Agent

Dr. Lisa Calder May 2013
Many therapeutic interventions can have rare but important and sometimes fatal adverse effects. The only way to determine the safety of interventions in this case is to conduct large population studies, often via administrative databases (hence phase 4 of clinical trials designed to evaluate drugs). The critical reader will be wary of small RCTs that purport to have demonstrated safety as an outcome.
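
A quick calculation shows why small trials cannot establish safety. Assuming each patient independently has the same risk p of a serious adverse event, the chance of seeing at least one event among n patients is 1 - (1 - p)^n. The 1-in-10,000 event rate and the trial sizes below are hypothetical.

```python
def prob_at_least_one_event(event_rate: float, n_patients: int) -> float:
    """Chance that n_patients show >= 1 adverse event: 1 - (1 - p)^n (independence assumed)."""
    return 1 - (1 - event_rate) ** n_patients


# Hypothetical serious adverse event occurring in 1 of 10,000 treated patients.
for n in (500, 5_000, 30_000):
    print(f"{n:>6} patients: {prob_at_least_one_event(1e-4, n):.1%}")
```

A 500-patient trial arm would miss such an event entirely more than 95% of the time.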

Difference between the groups used for sample size calculation in RCTs

By: Dr. Venkatesh Thiruganasambandamoorthy
When evaluating the sample size calculation for a randomized controlled trial, a key step is to determine whether the difference between the two study arms used for the sample size calculation is a clinically important one. Powering a study based on a difference that has no clinical significance will have no practice implications.

Disease-Specific Quality of Life Measurement Tools

Dr. Christian Vaillancourt
Clinicians and researchers often seek to measure “quality of life” in an objective manner. One example of a “global” health measurement instrument is the SF-12 Health Questionnaire. By contrast, the Western Ontario Shoulder Instability Index is a 21-item, 4-domain “disease-specific” quality of life measure. A lot of work goes into the development of these quality of life measures, including: 1) clearly defining the population; 2) defining the disease (via literature review and interviews with clinicians and patients), its severity, and treatment options; 3) reducing the number of identified items; 4) pilot testing; and 5) examination of validity, reliability, responsiveness, etc. It is also customary to re-validate these tools when they are translated into a new language, or when considering their use with a different population.

Effect size

Dr. Venkatesh Thiruganasambandamoorthy
We commonly evaluate the bivariate association (a.k.a. univariate analysis) between groups of patients and a certain variable or outcome; e.g. how strongly is age associated with the presence or absence of arrhythmia among syncope patients? Since consecutive patients are enrolled, a comparison of age between patients with and without arrhythmia will usually show a significant difference, with older patients more often suffering arrhythmias. Similarly, in the study by Cournoyer et al, the BLS and ACLS groups are likely to be very different at the outset. How, then, do you compare two groups that may differ from the start? You can use effect size to evaluate how large the differences in their characteristics actually are. (E.g. men are taller than women; the magnitude of the difference between the height of men and the height of women is the effect size.)

Equivalence or Non-Inferiority Trials

Dr. Ian Stiell May 2012
Most RCTs aim to determine whether one intervention is superior to another (superiority trials). Often a non-significant test of superiority is wrongly interpreted as proof of no difference between the two treatments. By contrast, equivalence trials aim to determine whether one (typically new) intervention is therapeutically similar to another (usually existing) treatment. A non-inferiority trial seeks to determine whether a new treatment is no worse than a reference treatment. Because proof of exact equality is impossible, a margin of non-inferiority (delta) for the treatment effect on a primary patient outcome must be pre-stated, i.e. defined a priori. Equivalence trials are very similar, except that equivalence is defined as the treatment effect lying within a pre-stated two-sided margin (between -delta and +delta). True (two-sided) equivalence therapeutic trials are rare.
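
The three trial types differ mainly in how the confidence interval for the treatment difference is read against zero and against the margin. The sketch below assumes a binary "success" outcome where higher is better, a simple Wald 95% CI for the risk difference (new minus reference), and a hypothetical 10% margin. Many equivalence analyses instead use a 90% CI (two one-sided tests), so treat this as an illustration, not a prescribed method.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_new, n_new, events_ref, n_ref, level=0.95):
    """Wald confidence interval for p_new - p_ref (higher = better outcome)."""
    p1, p2 = events_new / n_new, events_ref / n_ref
    se = sqrt(p1 * (1 - p1) / n_new + p2 * (1 - p2) / n_ref)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


def interpret(lower, upper, margin):
    """Read the CI against zero and a pre-specified margin (delta)."""
    return {
        "superior": lower > 0,                          # whole CI above zero
        "non_inferior": lower > -margin,                # CI excludes losses worse than delta
        "equivalent": -margin < lower and upper < margin,  # CI entirely within +/- delta
    }


# Hypothetical trial: success in 800/1000 (new) vs 820/1000 (reference), delta = 10%.
low, high = risk_difference_ci(800, 1000, 820, 1000)
print(f"95% CI: {low:+.3f} to {high:+.3f}", interpret(low, high, margin=0.10))
```

With only 100 patients per arm and the same observed rates, the interval would be too wide to exclude the margin, so non-inferiority could not be claimed.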

Explanatory versus Pragmatic Clinical Trials

Drs Ian Stiell & Lisa Calder March 2012
Trials of healthcare interventions are often described as either explanatory or pragmatic. Explanatory trials generally measure efficacy, the benefit a treatment produces under ideal conditions, often using carefully defined subjects in a research clinic. Pragmatic trials measure effectiveness, the benefit the treatment produces in routine clinical practice. Pragmatic trials therefore generally reflect the reality of how the intervention will perform in everyday care.

Flow Diagram

Dr. Ian Stiell December 2011
Investigators and editors developed the CONSORT Statement (revised in 2010) to improve the reporting of randomized controlled trials (RCTs) by means of a checklist and flow diagram. The flow diagram is intended to depict the passage of participants through an RCT and shows numbers and explanations for four stages of a trial (enrollment, intervention allocation, follow-up, and analysis). The diagram explicitly shows the number of participants in each intervention group included in the primary data analysis.

Geometric Mean

Christian Vaillancourt
A geometric mean is a type of mean or average which can be useful when combining items measured on different scales. Rather than dividing the sum of n numbers by n, it takes the nth root of their product; for 2 numbers, this is the square root of their product. If we take the example of a survey where one answer is “3” on a 5-point Likert scale and another answer is “7” on a 10-point Likert scale, the usual/arithmetic “mean answer” would be (3+7)/2=5, whereas the geometric mean would be √(3×7)≈4.6, which is slightly less influenced by the magnitude of the answer measured on the 10-point Likert scale.
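
Python’s standard `statistics` module (3.8+) provides both averages directly, so the Likert example above can be checked in a couple of lines.

```python
from statistics import fmean, geometric_mean

answers = [3, 7]  # the two Likert answers from the example above
print(f"arithmetic mean = {fmean(answers):.2f}")         # (3 + 7) / 2
print(f"geometric mean  = {geometric_mean(answers):.2f}")  # sqrt(3 * 7)
```

`geometric_mean` generalizes to any number of positive values (the nth root of their product) and raises an error for zero or negative inputs.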

Hazard Ratio (HR)

Dr. Ian Stiell February 2013
The hazard ratio is akin to relative risk but is used for survival analyses such as Cox proportional hazards regression. It is most often used to describe the outcome of therapeutic trials where the question is: to what extent can treatment shorten the duration of an illness? The hazard ratio is an estimate of the ratio of the hazard rate (the instantaneous risk of the event) in the treated group versus the control group. For example, if there are two groups, group 1 and group 2, an HR of 4.5 for group 2 compared with group 1 means that, at any point in time, the risk (of relapse) in group 2 is 4.5 times that in group 1.
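
As a rough illustration only: if the hazard is assumed constant over time (an exponential model), each group’s hazard rate is simply events divided by person-time, and the hazard ratio is the ratio of the two rates. A Cox model estimates the HR differently and without assuming a constant hazard; the relapse counts below are hypothetical and chosen to reproduce the HR of 4.5 in the example.

```python
def hazard_rate(events: int, person_time: float) -> float:
    """Events per unit of follow-up time (constant-hazard assumption)."""
    return events / person_time


def crude_hazard_ratio(events_a, time_a, events_b, time_b) -> float:
    """Hazard in group A divided by hazard in group B (the reference group)."""
    return hazard_rate(events_a, time_a) / hazard_rate(events_b, time_b)


# Hypothetical relapse data: group 2 has 45 relapses over 1,000 patient-months,
# group 1 (reference) has 10 relapses over 1,000 patient-months.
print(f"HR = {crude_hazard_ratio(45, 1000, 10, 1000):.1f}")
```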

Imputation of missing data

Dr. Christian Vaillancourt
By default, most statistical software discards cases with missing data, potentially leading to biased results if the missingness is not random. Imputation methods allow all cases to be kept in the analyses by replacing missing data with an estimated value based on all other available information. Data imputation can be done for predictor variables, but it isn’t appropriate to impute missing data for main outcomes.
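
The simplest form of model-based imputation fills a missing predictor value from its relationship with another observed variable. The sketch below uses single regression imputation with `statistics.linear_regression` (Python 3.10+); the height and weight values are hypothetical. Single imputation understates uncertainty, and multiple imputation, which creates several plausible values per gap, is generally preferred in practice.

```python
from statistics import linear_regression  # Python 3.10+


def impute_by_regression(x: list[float], y: list) -> list[float]:
    """Fill missing y values (None) from a straight-line fit of y on x.

    Single, deterministic regression imputation: one plausible value per
    gap, fitted only on cases where y was observed.
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    slope, intercept = linear_regression([xi for xi, _ in observed],
                                         [yi for _, yi in observed])
    return [yi if yi is not None else intercept + slope * xi for xi, yi in zip(x, y)]


# Hypothetical predictor data: weight (kg) missing for one patient, height (cm) known.
heights = [150, 160, 170, 180, 190]
weights = [55, 62, None, 78, 85]
print(impute_by_regression(heights, weights))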

Intention-to-treat (ITT) analysis

Drs. Ian Stiell & Lisa Calder January 2015
Intention-to-treat (ITT) analyses are widely recommended as the preferred approach to the analysis of most clinical trials. The basic intention-to-treat principle is that participants in trials should be analysed in the groups to which they were randomized, regardless of whether they received or adhered to the allocated intervention. Applying this principle becomes particularly problematic when patients are lost to follow-up and no outcome values are available. Authors must clearly indicate how many patients have such values missing. The alternative approach is a per-protocol analysis, which only includes patients for whom the protocol was followed.
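
The difference between the two approaches is easiest to see on a toy dataset. In this hypothetical sketch (the `Patient` record and the data are ours), one patient randomized to treatment crossed over to control and died: ITT keeps that death in the treatment arm, whereas per-protocol drops the patient and makes the treatment look better.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    assigned: str   # arm allocated at randomization
    received: str   # arm actually received
    died: bool


def mortality_by_arm(patients: list, per_protocol: bool = False) -> dict:
    """Mortality per randomized arm.

    ITT (default): every patient counts in the arm they were assigned to.
    Per-protocol: only patients who received their assigned treatment.
    """
    rates = {}
    for arm in sorted({p.assigned for p in patients}):
        group = [p for p in patients
                 if p.assigned == arm and (not per_protocol or p.received == arm)]
        rates[arm] = sum(p.died for p in group) / len(group)
    return rates


# Hypothetical trial: one treatment patient crossed over to control and died.
patients = [
    Patient("treatment", "treatment", False), Patient("treatment", "treatment", False),
    Patient("treatment", "control", True), Patient("treatment", "treatment", False),
    Patient("control", "control", True), Patient("control", "control", False),
    Patient("control", "control", True), Patient("control", "control", False),
]
print("ITT:         ", mortality_by_arm(patients))
print("Per-protocol:", mortality_by_arm(patients, per_protocol=True))
```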

Interim Analyses and Stopping Rules 

Dr. Ian Stiell  
In clinical trials, an interim analysis is one that is conducted before data collection has been completed, to determine if there are safety issues or if the study should be stopped early. These interim analyses are evaluated by an independent Data Safety Monitoring Board (DSMB) that is at arm’s length from the investigators. The DSMB has the authority to recommend early termination if the study intervention is clearly better than control (for benefit) or if there is so little difference between groups that full enrolment will not show a difference (for futility). Statistical stopping rules should be used to adjust the interim P-value thresholds to a much more stringent level, e.g. <0.001 instead of <0.05, using methods described by Pocock and by O’Brien & Fleming, among others.
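
O’Brien-Fleming boundaries are very strict early on and relax toward the conventional threshold at the final analysis. This sketch uses the classic shape z_k = C × sqrt(K / k) for K equally spaced looks; the constant C = 2.040 is quoted as the tabulated value for 5 looks at an overall two-sided alpha of 0.05 and is taken as given rather than derived here.

```python
from math import sqrt
from statistics import NormalDist


def obrien_fleming_bounds(n_looks: int, final_z: float) -> list[float]:
    """O'Brien-Fleming-shaped boundaries: z_k = final_z * sqrt(K / k)."""
    return [final_z * sqrt(n_looks / k) for k in range(1, n_looks + 1)]


# 2.040: tabulated O'Brien-Fleming constant for 5 equally spaced looks
# at an overall two-sided alpha of 0.05 (assumed, not computed here).
for k, z in enumerate(obrien_fleming_bounds(5, 2.040), start=1):
    nominal_p = 2 * (1 - NormalDist().cdf(z))
    print(f"look {k}: stop if |z| > {z:.3f} (nominal p < {nominal_p:.5f})")
```

Because so little alpha is spent early, the final analysis can still be tested at a nominal p close to 0.05.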

Describing the Strength of Study Results Using “Levels of Evidence”

Dr. Christian Vaillancourt
Different methods of classifying levels of evidence have been proposed, most of them relying on the study design, its precision, or its endpoints (e.g. survival with good neurological outcome). The Oxford classification is one commonly quoted example, where Level 1a is a meta-analysis of RCTs; Level 1b is an individual high-quality RCT; Level 2 includes cohort studies and low-quality RCTs; Level 3 includes case-control studies; Level 4 includes case series and low-quality cohort or case-control studies; and Level 5 is expert opinion.

Likert Scales 

By: Dr. Christian Vaillancourt
Rensis Likert was an American social psychologist. His scales continue to be used to measure attitudes, values, opinions, pain, agreement, etc. They can have an odd number of points (including a mid-point) or an even number (forcing respondents to pick a side), and can provide more (0-10) or less (1-5) granularity. They can use “ordinal” variables (e.g. socioeconomic status = Low-Medium-High) with no fixed interval between categories, “interval” variables (e.g. annual income = $5k-$10k-$15k), or be “in between” (e.g. strongly agree-agree-neutral-disagree-strongly disagree).

Loss to Follow-up

Dr. Lisa Calder December 2012
Evaluating loss to follow-up is a helpful tool when assessing RCT validity as it gives the reader a sense of the integrity of the estimated difference in primary outcome. As a general rule of thumb, any loss to follow-up greater than 20% should make the reader concerned about resulting bias in the main result. The reader should ask themselves: if all the patients who were lost to follow-up had the worst possible outcome, to what degree would this influence the statistical and clinical significance of the main result?
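
The “worst possible outcome” question can be answered with simple arithmetic. This hypothetical sketch recomputes the ARR (control rate minus intervention rate) among all randomized patients under three assumptions about those lost to follow-up; the trial numbers are invented for illustration.

```python
def loss_to_followup_scenarios(i_events, i_followed, i_lost, c_events, c_followed, c_lost):
    """ARR (control rate - intervention rate) under three assumptions about lost patients."""
    i_n, c_n = i_followed + i_lost, c_followed + c_lost
    return {
        "complete cases only": c_events / c_followed - i_events / i_followed,
        "all lost had the bad outcome": (c_events + c_lost) / c_n - (i_events + i_lost) / i_n,
        # Stricter: lost intervention patients had the bad outcome, lost controls did not.
        "worst case for intervention": c_events / c_n - (i_events + i_lost) / i_n,
    }


# Hypothetical trial (outcome = death):
# intervention: 18 deaths among 90 followed, 10 lost; control: 27 deaths among 95 followed, 5 lost.
for name, arr in loss_to_followup_scenarios(18, 90, 10, 27, 95, 5).items():
    print(f"{name:30s} ARR = {arr:+.1%}")
```

Here an apparent 8.4% benefit shrinks to 4.0%, and disappears entirely in the most pessimistic scenario.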

Minimal Clinically Important Difference in Clinical Trials

Dr. Ian Stiell September 2014
A clinical trial must have a sample size large enough to be adequately powered to show a minimal clinically important difference (MCID) between the intervention and control arms. The MCID is the absolute difference in outcome proportions that would have to be shown by the study intervention for clinicians to accept the new treatment as better. In an effort to keep sample size low, investigators sometimes estimate an MCID much larger than is reasonable or use an outcome that is not the most important, e.g. 4-hour survival rather than survival to discharge.


Minimization

Dr. Christian Vaillancourt
The purpose of randomization is to minimize imbalance between groups. Sometimes, we know certain factors (e.g. sex) are likely to influence outcomes and ought to be equally distributed. One strategy to ensure this stratifies participants according to these important factors, then uses a separate randomization list for each subgroup. This becomes impractical when a large number of factors need to be taken into account. Minimization instead calculates the imbalance between groups that would result from each possible assignment of the next participant, and uses a strategy favoring assignment to the group that would minimize this imbalance between comparison groups.
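
Minimization can be sketched as follows. For each candidate arm, we count how unbalanced each of the new participant’s factor levels would become, sum these imbalances, and favor the arm with the lowest total. The range (max minus min count) as the imbalance measure and the 80% probability of choosing the preferred arm (a random element in the spirit of Pocock and Simon) are assumptions; real trials pre-specify these choices. The function name and data layout are hypothetical.

```python
import random


def minimization_assign(new_patient: dict, enrolled: list, arms=("A", "B"),
                        p_best: float = 0.8, rng=random):
    """Assign new_patient to the arm that minimizes total factor imbalance.

    new_patient: factor -> level, e.g. {"sex": "F", "age": ">65"}
    enrolled:    list of (arm, factors) tuples for patients already randomized
    p_best:      probability of choosing the imbalance-minimizing arm
    """
    scores = {}
    for candidate in arms:
        total = 0
        for factor, level in new_patient.items():
            # Patients sharing this level in each arm, as if the new patient joined `candidate`.
            counts = {arm: sum(1 for a, f in enrolled if a == arm and f.get(factor) == level)
                      for arm in arms}
            counts[candidate] += 1
            total += max(counts.values()) - min(counts.values())  # range as imbalance
        scores[candidate] = total
    best = min(scores.values())
    preferred = [arm for arm in arms if scores[arm] == best]
    if len(preferred) == len(arms):  # all arms tie: pure random allocation
        return rng.choice(arms)
    if rng.random() < p_best:
        return rng.choice(preferred)
    return rng.choice([arm for arm in arms if arm not in preferred])


# Hypothetical enrolment so far: (arm, factors) for each randomized patient.
enrolled = [("A", {"sex": "F", "age": ">65"}), ("A", {"sex": "F", "age": "<=65"}),
            ("B", {"sex": "M", "age": ">65"})]
print(minimization_assign({"sex": "F", "age": ">65"}, enrolled))
```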

Modified Intention-to-treat (M-ITT) Analyses

Dr. Venkatesh Thiruganasambandamoorthy

Intention-to-treat (ITT) analyses are widely recommended as the preferred approach to the analysis of most clinical trials. The basic intention-to-treat principle is that participants in trials should be analysed in the groups to which they were randomized, regardless of whether they received or adhered to the allocated intervention, crossed over to other treatments, or were withdrawn from the study. Post-randomization exclusions may be acceptable when patients are inappropriately randomized into a clinical trial or when pre-randomization information on patients’ eligibility status is not available at the time of randomization. Such an approach is known as “modified intention-to-treat” analysis and must be pre-specified in the protocol. M-ITT is most likely to be seen in RCTs of critical situations, e.g. cardiac arrest.

Multiple Arm Clinical Trials

Dr. Ian Stiell April 2013
Multiple-arm randomized trials can be more complex in their design, data analysis, and result reporting than two-arm trials. In an RCT with three arms, there are seven theoretically possible comparisons (three pairwise comparisons, three comparisons of one arm against the other two combined, and one global comparison of all three arms), so it is important that the investigators define a priori which comparisons are of primary interest and whether they will assess global differences between all arms and/or pair-wise differences between 2 arms at a time.

Multiple Comparisons and Statistical Significance

Dr. Christian Vaillancourt
It is not uncommon for a manuscript to report several secondary outcomes. The more comparisons are made, the greater the chance that at least one of them will end up being statistically significant by chance alone. To account for this, statisticians should make it proportionally more difficult to declare such a statistical difference. The Bonferroni correction divides the level of significance (alpha error, usually 0.05) by the number of comparisons made, e.g. 0.05/5 comparisons = a new alpha of 0.01.
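
The growth of chance findings can be quantified. Assuming independent tests, each at level alpha, the probability of at least one false positive is 1 - (1 - alpha)^k. The sketch compares this with the Bonferroni-adjusted alpha for a few numbers of comparisons.

```python
def familywise_error(alpha: float, k: int) -> float:
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-comparison significance level after a Bonferroni correction."""
    return alpha / k


for k in (1, 5, 10, 20):
    print(f"{k:2d} comparisons: chance of >=1 false positive = {familywise_error(0.05, k):.0%}, "
          f"Bonferroni alpha = {bonferroni_alpha(0.05, k):.4f}")
```

With 5 comparisons, the chance of at least one spurious “significant” result is about 23%; testing each at 0.01 keeps it just under 5%.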

Non-inferiority trials

Dr. Lisa Calder January 2013
Non-inferiority trials are distinct from superiority trials in that they are designed to determine whether a given intervention is non-inferior to a control by a pre-specified margin. This is not the same as equivalence, and a key section of the methods to examine is the sample size calculation, where the non-inferiority margin is specified. Ideally, researchers explain how this margin was determined (e.g. based on previous placebo-controlled trials or consensus of experts). The critical reader will ask themselves whether they feel this margin is clinically appropriate.

Number Needed to Treat (NNT)

Dr. Ian Stiell  May 2015
The NNT concept was created by Canadian clinical epidemiologist Dr Andreas Laupacis in 1988 to quantify the benefit of a new intervention. NNT is the average number of patients who need to be treated to prevent one additional bad outcome (i.e. the number of patients that need to be treated for one to benefit compared with a control in a clinical trial). It is easily calculated as the inverse of the absolute risk reduction (1/ARR). The higher the NNT, the less effective the treatment.
Reference: Laupacis et al. (1988), “An Assessment of Clinically Useful Measures of the Consequences of Treatment”.
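
The 1/ARR calculation can be sketched as below, rounding up to a whole number of patients (a common convention). The 20% and 15% event rates are hypothetical; the function refuses a non-positive ARR, such as the HES example, because there is no benefit to count.

```python
from math import ceil


def number_needed_to_treat(control_rate: float, intervention_rate: float) -> int:
    """NNT = 1 / ARR, rounded up to a whole number of patients."""
    arr = control_rate - intervention_rate
    if arr <= 0:
        raise ValueError("NNT is only defined here for a positive ARR (a benefit).")
    # round() first so floating-point noise such as 20.000000000001 does not become 21
    return ceil(round(1 / arr, 9))


# Hypothetical trial: bad outcome in 20% of controls vs 15% of treated patients (ARR 5%).
print(number_needed_to_treat(0.20, 0.15))
```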

Patient Registries

By: Dr. Christian Vaillancourt
Registries collect uniform, standardized, pre-defined demographic and clinical information on systems of care and on patients suffering from a common condition. Registries are particularly helpful for conducting longitudinal observational studies. They can, on occasion, include additional data collected as part of an interventional trial. Examples of such registries include the Canadian Cancer Registry, the Canadian Cystic Fibrosis Registry, and the Canadian Resuscitation Outcomes Consortium (CanROC) Registry.

Phases of a clinical trial

Dr. Ian Stiell November 2012
Clinical trials involving new drugs are classified into four phases, with Health Canada and the FDA generally requiring a drug to have passed through Phase 3 before general approval. Phase 1 trials test the treatment in a small group of healthy people (20-80) to evaluate its safety, dosage range, and side effects. Phase 2 trials give the treatment to larger numbers of patients (100-300) to evaluate effectiveness and safety. Phase 3 trials give the treatment to large groups of patients (1,000-3,000) to confirm its effectiveness, monitor side effects, and compare it to commonly used treatments. Phase 4 trials are post-marketing studies to determine additional information about side effects and risks. [Phase 2a studies focus on proving the hypothesized mechanism of action, while the larger Phase 2b trials seek to determine the optimum dose.]

Post-Randomization Exclusions

Dr. Ian Stiell May 2014
It is widely accepted that the primary analysis of data in a randomized clinical trial should compare patients according to the group to which they were randomly allocated, regardless of patients’ compliance, crossover to other treatments, or withdrawal from the study. Such an analysis is referred to as an intention-to-treat or “as randomized” analysis. Exclusions, however, may be acceptable when patients are inappropriately randomized into a clinical trial or when pre-randomization information on patients’ eligibility status is not available at the time of randomization. Such an approach is known as “modified intention-to-treat” analysis and is most likely to be seen in RCTs of critical situations, e.g. cardiac arrest.

Pre-specified and Post-hoc Subgroup Analyses 

Dr. Ian Stiell May 2014
Subgroup analyses involve splitting all the participant data into smaller subsets of subjects (e.g. males and females), so as to make comparisons between them. A pre-specified subgroup analysis is one that is planned and documented before any examination of the data, preferably in the study protocol. Post-hoc analyses are those planned only after examination of the results. Such analyses are of particular concern because it is often unclear how many were undertaken and whether some were motivated by inspection of the data (data-dredging). However, both pre-specified and post-hoc subgroup analyses are subject to inflated false positive rates arising from multiple testing. Subgroup analyses are often under-powered and are best used to generate new hypotheses that can be tested in future trials.

Precision in RCTs

Dr. Lisa Calder October 2012
When assessing the precision of estimates in RCTs, 95% confidence intervals are most helpful. Precision refers to how closely an estimate is likely to approximate the true value: the narrower the confidence interval, the more precise the estimate. Interquartile ranges for medians tell you the spread of the data in the study sample, but do not give you an estimate of the range within which the true value is likely to fall.
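
To see how sample size drives precision, this sketch computes a simple Wald 95% confidence interval for the same observed 20% event rate in hypothetical trials of increasing size. The Wald interval is used only for simplicity; other intervals (e.g. Wilson) behave better with small samples or extreme proportions.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(events: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wald confidence interval for a single proportion."""
    p = events / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Same observed 20% event rate, hypothetical trials of increasing size.
for events, n in [(20, 100), (200, 1000), (2000, 10000)]:
    low, high = proportion_ci(events, n)
    print(f"n={n:>6}: 20% (95% CI {low:.1%} to {high:.1%})")
```

Each tenfold increase in sample size narrows the interval by a factor of about √10 (roughly 3).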

Propensity Score Matching

Dr. Ian Stiell   September 2012
In the statistical analysis of observational data, propensity score matching is one of a family of multivariable statistical techniques that attempt to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. Compared to the gold standard of a randomized controlled trial, any observational analysis and interpretation of the usefulness of an intervention must be viewed with a large degree of healthy skepticism.
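
The matching step itself can be sketched once propensity scores exist. Here the scores are assumed to have been estimated already (typically by logistic regression of treatment on the measured covariates, which is not shown). The function pairs each treated patient with the nearest unused control within a caliper. Greedy matching in score order and the 0.05 caliper are assumptions; patient IDs and scores are hypothetical.

```python
def match_on_propensity(treated: dict, controls: dict, caliper: float = 0.05) -> list:
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated / controls map a patient id to an already-estimated propensity
    score (probability of receiving treatment given measured covariates).
    Treated patients with no control within `caliper` are left unmatched.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical propensity scores.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
controls = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.10}
print(match_on_propensity(treated, controls))
```

Note that the treated patient with a score of 0.90 has no comparable control and is dropped, which is one way matching can change the population to which the results apply.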

Random sampling vs Randomization

Dr. Lisa Calder   June 2015
While in epidemiology we frequently use the word “random”, there is sometimes confusion about its application. When a clinical population is randomly sampled, the goal is to ensure a representative sample. If you pre-select a sample of patients using inclusion and exclusion criteria and randomize these to a given intervention and control, you do not necessarily have a representative sample. Selection bias can still occur pre-randomization.

Randomization by Pocock minimization algorithm

Dr. Venkatesh Thiruganasambandamoorthy
A random allocation of patients to treatment and control groups (by sealed envelopes, etc.) generally leads to balanced groups, but can still lead to differences between the groups on some characteristics (e.g. more males or obese patients in one arm than the other). If important factors are identified (e.g. sex, obesity), the patients can be stratified by sex and BMI. Using block randomization, a list is created for each block of x patients so that they are equally assigned to the study arms within each combination of the important factors. If there are a large number of important factors, block randomization becomes extremely complex. The Pocock and Simon adaptive stratified allocation (minimization) algorithm can instead be used to calculate the imbalance between the groups for each factor, adding an additional random element to the assignment of the next patient.
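
Stratified block randomization can be sketched as one permuted-block list per stratum. The block size of 4, the two arms, and the sex × BMI strata are hypothetical choices. With two factors of two levels each there are already four lists; every added factor multiplies that number, which is why the approach becomes unwieldy.

```python
import random


def permuted_block_list(n_blocks: int, block_size: int = 4, arms=("A", "B"), seed=None) -> list:
    """Allocation list built from randomly permuted blocks with equal arm counts."""
    if block_size % len(arms):
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocations = []
    for _ in range(n_blocks):
        block = list(arms) * (block_size // len(arms))  # e.g. A, B, A, B
        rng.shuffle(block)                              # random order within the block
        allocations.extend(block)
    return allocations


# One separate list per stratum (hypothetical strata: sex x BMI category).
strata = [(sex, bmi) for sex in ("F", "M") for bmi in ("BMI<30", "BMI>=30")]
lists = {stratum: permuted_block_list(5, seed=i) for i, stratum in enumerate(strata)}
for stratum, allocation in lists.items():
    print(stratum, "".join(allocation))
```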

Randomization Procedures 

Dr. Lisa Calder   March 2013
When reviewing a randomized trial, it is critical to determine how the randomization was conducted, as not all randomization schemes are created equal. Proper randomization uses either computer-generated randomization or random number tables. Pseudo-randomization includes studies where patients are allocated based on, for example, alternating days of the week or date of birth. The reader can check whether randomization appears to have worked by examining Table 1 of participant characteristics and determining whether the groups appear to be balanced.

RCT Sample Size Calculation  

Dr. Lisa Calder  October 2012
When calculating a sample size for a randomized controlled trial, a key step is to determine the MCID: minimal clinically important difference. By powering your trial to detect this difference, you seek not only a statistically significant difference in effect but also a clinically significant one. As a critical appraiser, it is important to evaluate whether you agree that the MCID is truly clinically significant.

Sample Size in Clinical Trials

Dr. Ian Stiell   June 2013
All intervention studies should indicate how the sample size was estimated, including the desired alpha error (usually 0.05), power (usually 80-90%), and expected outcome rate in the control group. Most important is a statement of the minimal clinically important difference (MCID) that would have to be shown by the study intervention for clinicians to accept the new treatment as better. In an effort to keep sample size low, investigators sometimes estimate an MCID much larger than is reasonable.
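
The ingredients listed above (alpha, power, control event rate, MCID) plug into the standard normal-approximation formula for comparing two proportions. This is a minimal sketch under that assumption (two-sided alpha, no continuity correction, no allowance for dropouts). The 20% control rate and the MCIDs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control: float, mcid: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect an absolute reduction `mcid` from p_control.

    Normal-approximation formula for two proportions (two-sided alpha,
    no continuity correction).
    """
    p_treat = p_control - mcid
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / mcid ** 2)


# Hypothetical: 20% control event rate, alpha 0.05, power 80%.
for mcid in (0.10, 0.05):
    print(f"MCID {mcid:.0%}: {n_per_group(0.20, mcid)} patients per group")
```

Halving the MCID from 10% to 5% raises the requirement from about 200 to about 900 patients per group, which shows the temptation to choose an unrealistically large MCID.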

Selection Bias and Randomization  

Dr. Lisa Calder  April 2012 
Even though a clinical trial is randomized, this does not mean it cannot be subject to selection bias. Always look at the study flow figure (generally figure 1) to determine how many eligible patients were not included then assess whether the authors reasonably explain why these eligible but excluded patients were not systematically different from those who were randomized.

Stratification of Randomization by Timing of Enrolment 

Lisa Calder  April 2014
Block randomization offers the benefit of ensuring overall balance of groups when you have multiple centers or clinically defined subgroups. Another approach is to stratify randomization by the timing of enrolment when this could influence the outcome. In this study, the authors stratified their randomization into early and later enrolments given that sepsis is a time-sensitive condition. The sensitivity analysis for these strata reassures the reader that the overall observed effect was not influenced by timing of enrolment.

Survival Analysis  

Dr. Ian Stiell      February 2013
Survival analyses are used in clinical trials that follow patients over time for primary outcomes such as death, relapse, adverse drug reaction, or development of a new disease. The follow-up time may range from hours to years, and a different set of statistical procedures is employed to analyze the data. Terms frequently seen in papers with survival analyses include Cox proportional hazards model, hazard ratio, and Kaplan-Meier curve.
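
A Kaplan-Meier curve can be computed by hand. At each time an event occurs, survival is multiplied by (1 - deaths / patients still at risk). Patients who are censored (followed without the event) leave the risk set without lowering the curve. The follow-up data below are hypothetical.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the outcome (e.g. death) occurred at that time, 0 if censored
    Returns (time, survival probability) at each time an event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(e for tt, e in data if tt == t)
        removed = sum(1 for tt, _ in data if tt == t)  # deaths + censored at time t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
        i += removed
    return curve


# Hypothetical follow-up (months) for 5 patients; 0 = censored.
print(kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0]))
```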

Surrogate endpoints 

Dr. Venkatesh Thiruganasambandamoorthy
Surrogate endpoints can be used as a measure of effect for specific treatments and might correlate with clinical outcomes. In the RE-VERSE AD (idarucizumab for dabigatran reversal) study, the investigators used dilute thrombin time and ecarin clotting time as surrogate endpoints for reversal of dabigatran action by the study drug idarucizumab. The actual clinical endpoint of restoration of hemostasis was a secondary outcome. This was a small study; a large clinical study with a control group is needed to confirm that clinical outcomes among patients treated with the reversal agent are better. Be wary of studies using surrogate outcomes when clinical outcomes could have been used.

Stopping Clinical Trials after Interim Analyses

Interim analyses are commonly planned in large studies. Many researchers and clinicians nevertheless caution against stopping a clinical trial early on the basis of an interim analysis: studies that were ‘negative’ at an interim analysis have later turned out to be positive, and vice versa. Performing interim analyses increases the chance of a type I error (concluding that there is a treatment effect when there is none; a false positive) in clinical studies. Approaches to reduce this type I error include the O’Brien-Fleming, Haybittle-Peto, and Pocock methods. Essentially, each of them raises the threshold for statistical significance (i.e. the p-value must be much lower than 0.05) at the interim analyses used to decide whether to stop the study.

Use of Continuous Data as Primary Outcome  

Dr. Ian Stiell  June 2013
Beware of studies that compare the effectiveness of interventions by using continuous data outcomes, such as pain scales (1-100), oxygen saturation values, and minutes to pain relief. These kinds of data can produce statistically significant differences between groups with relatively small sample sizes but often tell you little about clinical importance. Far better, and almost always the norm, are outcome measures given as proportions or percentages, such as the % of patients who achieve a 20-point improvement in pain, an oxygen saturation of 90%, pain relief in less than 2 hours, or survival.

Validation of Measurement Tools 

Dr. Lisa Calder    June 2014
When investigators state they used a validated tool, this means that the tool has been evaluated to determine whether it accurately measures what it aims to measure. Two important components of validity are face validity (experts endorse that the tool is logically designed to measure a given construct) and content validity (the tool comprehensively includes all possible dimensions of a given construct). These are distinct from reliability, which indicates that the tool measures a given construct consistently (usually assessed across more than one user, e.g. inter-rater reliability).
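
Inter-rater reliability for categorical ratings is often summarized with Cohen’s kappa, which corrects the observed agreement between two raters for the agreement expected by chance. Kappa is one common choice among several (an intraclass correlation is typical for continuous scores). The table of ECG ratings below is hypothetical.

```python
def cohens_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa for two raters.

    table[i][j] = number of items rater 1 put in category i and rater 2 in category j.
    """
    n = sum(map(sum, table))
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n               # raw agreement
    row = [sum(table[i]) / n for i in range(k)]                     # rater 1 marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater 2 marginals
    expected = sum(row[i] * col[i] for i in range(k))               # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical: two physicians rate 50 ECGs as abnormal / normal.
print(round(cohens_kappa([[20, 5], [10, 15]]), 2))
```

Here the physicians agree on 70% of ECGs, but half of that agreement would be expected by chance alone, giving a kappa of 0.4.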