
The Three Deadly Threats to a Clinical Trial
Dare to risk it?

Introduction

Clinical trials require careful planning, and the first substantive discussions and preparations begin long before the first patients are recruited for the study. This period is called the study design phase.

Here, the major aspects of the study are discussed across multiple domains: medical, ethical, legal, financial, logistical, technical, and - of course - statistical. After all, it is the statistical analysis of the study results that will constitute the last stage, yielding the basis for preparing the final clinical study report and ultimately leading to the approval or rejection of the studied therapy by the regulatory authorities.

Key decisions are made about the study design (shape), type, and schema. The number, formulation, and hierarchy of research questions are decided and translated into the language of statistical hypotheses. The clinical parameters (endpoints) needed to assess the patient’s condition and to estimate the treatment effect and safety are selected. The minimum number of patients necessary to obtain adequate statistical power is calculated, and the method of randomization is decided, if applicable. Early considerations regarding the choice of statistical methods and some of their key parameters are discussed as well, along with applicable bias-reduction approaches. Such details make it possible to estimate the time frame and budget of the study, plan the logistics, and prepare a number of mandatory supporting documents, such as the data management plan (DMP), the statistical analysis plan (SAP), quality control procedures, and the safety policy.

To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination

Mistakes made at this stage will affect every subsequent stage and may consequently determine the success or failure of the study. Seemingly innocent oversights, minor errors, stumbles, and the failure to predict consequences (or even the ignoring of negative scenarios) can result in serious repercussions, including loss of reputation and trust, lawsuits, and other fateful consequences leading to the failure of a study worth a lot of money and carrying great promise. So many things can go wrong in interventional medical studies, affecting the well-being and safety of patients. Problems can accumulate over time, and the longer the study lasts, the easier it is for the plot to turn. This is especially true for studies in challenging therapeutic areas such as cardiology or oncology, which are associated with significant patient suffering, high mortality, complex treatment processes, and unforeseen physiological reactions.

But when faced with the vision of a new, revolutionary therapy succeeding, it is tempting to be overly optimistic, push aside the dark visions of worst-case scenarios, and miss important details that can later cause headaches at the end of the study, when it is too late to fight the fire. After all, everyone wants to prepare for the best, not the worst. Sir Ronald Fisher once said: “To consult the statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination.” This is probably the best justification for why it is absolutely necessary to involve experts in statistics from the very first moments of the study. With good planning, the risk of compromising the whole study can be significantly (though not completely) reduced.

With this series, we are going to describe several threats to clinical trials. Due to the size of the topic, we split them into three areas discussed in episodes: [study design negligence], [strategic blindness], and [false workbench confidence]. We invite you to stay tuned, because this knowledge will certainly be useful when planning further studies.

Now, let us briefly outline the topics covered in the episodes.

Episode I: Study Design Negligence

Objective-hypothesis mismatch

Even a perfectly conducted clinical trial can end up answering the wrong question if the research objective is misaligned with the statistical hypothesis. It is possible to obtain a technically and factually correct answer to a question that is different from the one asked. As absurd as this may sound, it happens more often than one might think: different combinations of endpoint definitions, different moments of observation (or measurement), different measures used to summarize endpoints, and other factors can inadvertently assign the research question to a hypothesis that does not accurately represent it.

Disregard for study power killers

Without sufficient statistical power, a study is doomed from the start: even the most promising treatments can appear ineffective, rendering the study meaningless. Common pitfalls include neglecting dropouts, multiplying objectives, mis-specifying statistical parameters, and incorrectly or inefficiently addressing the problem of multiple comparisons. Ignoring dropouts is especially dangerous, because patients may withdraw easily and for a variety of reasons, such as serious side effects, lack of efficacy, logistical barriers, important protocol violations, futility, and so on. The worst part? These issues often go unnoticed until it’s too late. Learn how to safeguard your study from these silent power killers.
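
To see how quickly dropouts eat into a sample size budget, here is a minimal sketch in Python using statsmodels; the effect size, dropout rate, and significance level are purely illustrative assumptions, not recommendations.

```python
# Minimal illustration (assumed numbers): per-arm sample size for 80% power,
# then inflation to compensate for an expected dropout rate.
from math import ceil

from statsmodels.stats.power import TTestIndPower

effect_size = 0.5    # assumed standardized difference (Cohen's d)
alpha = 0.05
power = 0.80
dropout_rate = 0.20  # assumed fraction of randomized patients lost to follow-up

# Per-arm sample size for a two-sample t-test at the planned power
n_per_arm = TTestIndPower().solve_power(effect_size=effect_size,
                                        alpha=alpha, power=power)

# Simple inflation so that the expected number of completers still gives 80% power
n_recruited = ceil(n_per_arm / (1 - dropout_rate))

print(f"Completers needed per arm: {ceil(n_per_arm)}")              # ~64
print(f"Recruited per arm (20% dropout assumed): {n_recruited}")    # ~80
```

Ignoring the 20% dropout here would leave the completed study underpowered; the inflation is only a crude safeguard and does not address informative withdrawal, which needs design-level countermeasures.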

Insatiable greed for objectives

Researchers often aim to answer multiple questions in a single study, which is scientifically appealing but risky. While evaluating both safety and efficacy is essential, overloading a trial with too many objectives can blur its primary focus, create internal inconsistencies, and complicate statistical significance control. A jungle of research goals can lead to contradictory findings, where individual analyses support treatment, but the combined result becomes unclear. Regardless of the number of objectives, regulatory agencies expect a well-structured distinction between primary, secondary, and exploratory objectives. Without careful planning, the study may end up lost in its own complexity. “Less is more”, therefore a balance must be struck to answer all relevant questions while keeping the objective structure as simple as possible.
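
To make the multiplicity risk concrete, here is a back-of-the-envelope Python calculation under a deliberately simplified assumption of independent tests, each run at an unadjusted α = 0.05:

```python
# Family-wise error rate for k independent tests, each at alpha = 0.05,
# with no multiplicity adjustment (simplified, hypothetical setting).
alpha = 0.05
for k in (1, 2, 5, 10):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:2d} unadjusted tests -> P(at least one false positive) = {fwer:.2f}")
# 1 -> 0.05, 2 -> 0.10, 5 -> 0.23, 10 -> 0.40
```

Real endpoints are rarely independent, so the exact numbers differ, but the direction of the problem is the same: every extra unadjusted objective inflates the chance of a spurious "win".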

Missing observations, computational difficulties, imbalances, actual lack of effect – all can turn a seemingly promising trial into a statistical disaster.

Excessive endpoint complexity

To simplify a study and limit the number of objectives while still capturing important data, researchers often bundle multiple aspects into composite endpoints. While this approach seems efficient, it comes with a major risk: if just one component fails, the entire endpoint may collapse. Missing observations, computational difficulties, imbalances, actual lack of effect – all can turn a seemingly promising trial into a statistical disaster. A nuclear explosion starts with a single neutron… We have seen studies where the initially expected treatment success rates were high, yet the final results came down to just a handful of successes due to failures in individual sub-endpoints. Knowing why overly complex endpoints can backfire helps you avoid a cold shower. Once again, “Less is more” – this is not a triviality, this is the golden rule.
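
A quick, purely hypothetical calculation illustrates how responder rates erode when a composite endpoint requires success on every component (assuming, for simplicity, that the components are independent):

```python
# Hypothetical composite endpoint: success requires ALL components to succeed.
# Independence between components is assumed purely for simplicity.
component_success = [0.90, 0.85, 0.80, 0.95]  # assumed per-component success rates

composite = 1.0
for p in component_success:
    composite *= p

print(f"Per-component success rates: {component_success}")
print(f"Composite success rate: {composite:.2f}")  # ~0.58 despite high components
```

Add a component with missing data or a weak effect and the composite rate drops further, which is exactly how "a handful of successes" scenarios arise.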

Missing plan for missing data

This is probably one of the most difficult topics in the life of a biostatistician and one of the deadliest threats to a study if it occurs in a primary endpoint (especially a composite one). Even if data are missing completely at random, missingness weakens statistical power (sometimes far below 80%) and may render the study futile. But the real danger arises when data are missing not at random, i.e. in a way that depends on the unobserved values themselves. This cannot be tested, because we cannot see the missing part. It cannot be easily addressed even with data imputation techniques – for the same reason. And yet these missing observations can badly distort the results, e.g. by exaggerating the effect or by suppressing it. The best a statistician can do is to run sensitivity analyses under different imputation scenarios. This, however, is expensive and time-consuming. While imputation techniques exist, they come with their own risks and challenges. Of course, it would be best to have no missing data at all, and the study should be designed in a way that limits the risk of data gaps. But this is a rare luxury. In most cases, missing data will exist and will be painful. Planning remedies ahead is essential – once the data are lost, fixing them may not be possible!
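
Below is a minimal sketch of such a sensitivity analysis in Python with pandas and SciPy; the data set, the binary endpoint, the missingness pattern, and the three scenarios are all made up purely for illustration.

```python
# Toy sensitivity analysis for a partially missing binary endpoint (assumed data).
import numpy as np
import pandas as pd
from scipy.stats import fisher_exact

rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "arm": rng.choice(["treatment", "control"], size=n),
    "response": rng.choice([0.0, 1.0], size=n, p=[0.4, 0.6]),
})
df.loc[rng.choice(n, size=15, replace=False), "response"] = np.nan  # 15% missing

def response_pvalue(data: pd.DataFrame) -> float:
    """Fisher exact p-value for the 2x2 table of arm vs. response."""
    table = pd.crosstab(data["arm"], data["response"])
    return fisher_exact(table)[1]

scenarios = {
    "complete cases only": df.dropna(subset=["response"]),
    "all missing -> failure": df.fillna({"response": 0.0}),
    "all missing -> success": df.fillna({"response": 1.0}),
}
for name, data in scenarios.items():
    print(f"{name:24s} p = {response_pvalue(data):.3f}")
```

If the conclusion flips between scenarios, the trial's verdict rests on untestable assumptions about the missing values - exactly the situation a pre-specified missing-data strategy is meant to prevent.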

Episode II: Strategic Blindness

Ignoring domain expertise

In clinical research, the cooperation of statisticians and domain specialists is essential. Ignoring it can lead to technically sound but scientifically misguided approaches. Without insights from field experts, extreme values may be mishandled, and assumptions about data distributions (like normality) can misrepresent the true behaviour of clinical parameters. Understanding the nuances of the therapeutic field, from setting clinically meaningful thresholds to choosing the right imputation techniques, is crucial for avoiding results that look impressive on paper but lack real-world validity!

Just as a statistician should be present at most stages of the study, from its earliest moments, so subject matter experts should be involved to validate statistical proposals, correct misconceptions, and resolve uncertainties in advance. Moreover, a mutual understanding and a basic level of shared statistical and clinical knowledge between both parties involved in the study design are essential.

Falling for fallacies

Old habits die hard! There are many deeply ingrained yet indefensible and potentially harmful beliefs and practices that need to be addressed through education, and any of them can easily derail even the most well-intentioned research. Common examples include:

- requests for “post-hoc power analysis”;
- unjustified use of the overly conservative Bonferroni correction for multiplicity, despite modern, much more powerful alternatives such as gatekeeping, fallback, or fixed-sequence procedures;
- an unjustified tendency to correct for multiplicity in all possible analyses;
- focusing on within-arm changes in randomized controlled trials (RCTs);
- requesting statistical comparisons of variables at baseline in randomized trials;
- misunderstanding the role of adjustments for baseline response and other covariates, or equating adjustment with randomization;
- confusing statistical significance with clinical significance;
- misunderstanding standardized effect-size measures such as Cohen’s d and automatically assuming that a large effect is clinically important;
- misinterpreting statistical tests based on ranks;
- unjustified categorizing (dichotomizing) of continuous variables;
- treating Likert items as numerical scores without an appropriate rationale and assuming their equidistance;
- imputing missing data with the arithmetic mean or median.

This list represents only a small subset of issues we found in statistical analysis plans written by others.
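
As a small illustration of how much the choice of correction matters, here is a sketch comparing Bonferroni with the Holm step-down procedure (chosen only because it is uniformly more powerful and readily available in statsmodels, not as a recommendation for any particular trial); the p-values are made up for the example.

```python
# Bonferroni vs. Holm on a set of made-up p-values (illustration only).
from statsmodels.stats.multitest import multipletests

pvals = [0.004, 0.011, 0.012, 0.043, 0.30]  # assumed raw p-values

for method in ("bonferroni", "holm"):
    reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method:10s} rejects {reject.sum()} hypotheses; "
          f"adjusted p = {[round(float(p), 3) for p in adjusted]}")
# bonferroni rejects 1; holm rejects 3 on the same data
```

Gatekeeping, fallback, and fixed-sequence procedures can do even better when the hierarchy of objectives is planned in advance, which is precisely why they belong in the design phase.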

If you protect your car and home, if you diversify your income, why not secure something as complex as a clinical trial against all the things that can go wrong?

Missing plan “B”

Even the best plans can collapse at the least expected moment. The assumptions of the planned statistical methods fail, statistical models used to test hypotheses do not converge to a (unique) solution, edge cases make estimates unstable, distributions take on the strangest shapes, making it difficult to draw conclusions and interpret results, and outliers only deepen the problems. The lack of a plan “B” (and sometimes even “C”) may leave one stuck with no way to move forward, wasting time and money while passively waiting for a lifeline in the form of remedial analyses proposed by statistical reviewers at the regulatory agencies (which may never come). Depriving ourselves of emergency routes of analysis easily leads to frustration: money has been spent, data have been collected, yet nothing can be done with them. Plan B is your insurance against a major or minor disaster. If you protect your car and home, if you diversify your income, why not secure something as complex as a clinical trial against all the things that can go wrong?

Remedies that do not remedy

So you have designed alternative scenarios for different adversities, and now you can be sure that they will help you in a critical situation? What if the “alternative” does not lead to the same goal as the “planned” one? There is nothing worse than relying on a remedy that gives the illusion of solving your problems while in fact leading you into even bigger ones and exposing you to difficult questions from statistical reviewers!

For example, nonparametric alternatives based on ranks or quantiles are often proposed when the assumptions of the planned parametric methods fail. Such methods test different null hypotheses: your analysis is no longer about comparing arithmetic means, but rather about stochastic superiority. The transition from comparing means to comparing medians is also not without its interpretative challenges. Flexible tests comparing survival curves in the presence of non-proportional hazards, such as the Max-Combo test, will certainly detect some differences, but what does that mean to you? Can you easily translate the result into your original question? Would comparing restricted mean survival times (RMST) instead of hazard ratios yield the same conclusions? Perhaps yes, but this cannot be taken for granted without at least some research and maybe simulations. Beware of false remedies. They do not remedy!
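
A tiny simulation in Python (with made-up distributions, purely to illustrate the point) shows that the t-test and the Mann-Whitney test can disagree because they answer different questions: the two groups below have equal means by construction, yet one is stochastically larger than the other.

```python
# Equal means, different shapes: the Welch t-test and the Mann-Whitney U test
# target different null hypotheses (illustrative simulation, assumed distributions).
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

rng = np.random.default_rng(1)
n = 500
group_a = rng.exponential(scale=1.0, size=n)      # skewed, mean = 1
group_b = rng.normal(loc=1.0, scale=0.1, size=n)  # symmetric, mean = 1

t_p = ttest_ind(group_a, group_b, equal_var=False).pvalue
u_p = mannwhitneyu(group_a, group_b).pvalue

print(f"Welch t-test p = {t_p:.3f}  (means are equal, so typically not significant)")
print(f"Mann-Whitney p = {u_p:.2g}  (stochastic ordering differs, so significant)")
print(f"Estimated P(A > B) = {(group_a[:, None] > group_b).mean():.2f}  (0.5 = no difference)")
```

Switching from one test to the other mid-study therefore silently changes the question being answered, which is exactly why the switch must be justified, pre-specified, and interpreted accordingly.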

If we recognize the importance of security and procedure audits, why not also audit the statistical analysis plan to prevent costly mistakes?

Methodological self-sabotage

Sometimes failure is built into the process from the very beginning. Planning statistical analyses requires appropriate knowledge and experience in translating research questions into statistical hypotheses, selecting appropriate procedures (tests, models), accounting for the small sample sizes typical of early-phase clinical trials, defining the necessary input parameters, and addressing potential violations of statistical assumptions. Critical topics such as dealing with missing observations, detecting and treating outliers (remembering that these may be either data entry errors or completely valid clinical observations), dealing with group imbalances, choosing adjustments, and many other issues fall into this category. What if some of these plans are made wrongly? What if we plant a time bomb in our study while resting assured that all the important points have been properly addressed?

That is why you should review your plans with the help of external experts. Two heads are better than one. Do not risk treating your mistakes leniently and leaving them until it is far too late to change anything. Yes, some shortcomings can still be fixed post hoc with some justification, and maybe the statistical reviewer will go easy on them. But it may not be so easy for other flaws. If we recognize the importance of security and procedure audits, why not also audit the statistical analysis plan to prevent costly mistakes?

Episode III: False Workbench Confidence

Wishful trust in validation

Trusting software solely based on its price (commercial) or philosophy (open-source) can be deceiving. While commercial tools often have strong financial and scientific support, their proprietary nature prevents external review of the implementation (code). On the other hand, open-source software is created by enthusiasts (both recognized experts and novices), usually for free and often in their spare time. By its nature it allows for transparency, so everyone can read the code and propose improvements or error fixes, but the quality and numerical correctness cannot be guaranteed; they depend heavily on the contributors’ knowledge, experience, skills, and… willingness to do things properly.

Regardless of whether it costs thousands of dollars or is free, has been on the market for just a few years or for whole decades, it is still created by people, who can, and do, make mistakes. Sure, we can believe the marketing assurances that a product is of top quality, but when something bad happens, will the manufacturer compensate for damages, contractual penalties for missed deadlines, lost contracts, lost profits, and fines? Will it restore our reputation? Well, maybe it will, but it is always wise to read the license carefully.

Does free always mean bad? After all, if it did, such software would not have been used so widely for decades in the most demanding applications! On the other hand, the ad populum argument is rather weak. Trust should not be determined solely by star counts on GitHub, thousands of library downloads, and nicely done vignettes. It should be earned through evidence, not marketing or popularity. Popularity means something, but it is not a substitute for quality. “Trust but verify” should be our credo here. Code review, and browsing the “Issues”, “News”, and “Changes” sections on GitHub or CRAN (or a similar one on the manufacturer’s website), should become our daily practice.

A common misconception is that older, well-established, or more expensive software is automatically better.

Overlooked software variability

It very often happens that procedures sharing the same name in several statistical packages give different results. This can happen due to errors, differences in default settings, different numerical optimization methods (yielding different performance and different numerical errors), different implementations of the random number generator (causing differences in Monte Carlo methods despite the same “seed” value), different forms of the estimators used, different conventions, differently defined stopping criteria in iterative calculations, or the use of rounded rather than exact values (with possibly different rounding methods), to name just a few. Some variations may seem minor, while others can lead to drastically different conclusions. Some discrepancies can be easily eliminated by aligning the relevant parameters, while others require deeper research and careful study of the manuals.

A common misconception is that older, well-established, or more expensive software is automatically better. However, every tool has trade-offs that must be understood to ensure methodological accuracy. Researchers should not assume consistency across tools but rather investigate discrepancies and align settings where possible to maintain reliability.
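
A classic, easily reproducible example of such a discrepancy, using nothing more exotic than NumPy and pandas defaults: the “standard deviation” of the same numbers differs, because one library defaults to the population estimator (ddof=0) and the other to the sample estimator (ddof=1).

```python
# Same data, same function name, different defaults:
# numpy.std uses ddof=0 (population SD), pandas Series.std uses ddof=1 (sample SD).
import numpy as np
import pandas as pd

values = [4.1, 5.0, 5.9, 7.2, 8.3]  # made-up measurements

print("numpy default :", np.std(values))             # ddof=0
print("pandas default:", pd.Series(values).std())    # ddof=1
print("aligned       :", np.std(values, ddof=1))     # matches pandas once aligned
```

The numbers agree only after the estimator is aligned explicitly, which is exactly the kind of setting worth recording in the statistical analysis plan.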

Toolset availability illusion

The choice of statistical methods should be dictated by research needs, not by what is readily available in a given software package. In other words, this choice should be software-agnostic. On the other hand, what is readily available allows for immediate work. While missing statistical methods can theoretically be programmed ad hoc from scratch, in more complex cases this may require specialized expertise, a lot of additional time (exceeding both the schedule and the budget), and rigorous validation, which may even be impossible if there is no validated reference to compare against. Sometimes one can simply use another (e.g. open-source) statistical package provisionally, but quite often the required procedure has no working implementation in the available ones.

Currently available statistical packages, both commercial and open-source, offer a strong arsenal of statistical methods, but it should always be remembered that the person who will perform the analysis based on the prepared plan may not have access to the proposed tools. To avoid disruptions, researchers should plan ahead, ensuring the chosen methodology aligns with the available toolset without compromising scientific integrity.

Lack of integrity controls

- It worked yesterday but today it doesn’t!
- The calculation results have changed strangely but nothing has been changed in the procedure!
- My results are a bit different from yours! We both did it well, so why the difference?

Unexpected discrepancies in results often stem from unattended, accidental or planned software updates, environmental changes, or modifications in dependencies. Even minor updates to the already validated computing environment can introduce errors, alter default values, modify numerical approximations, or introduce new optimization techniques, leading to inconsistencies.

Without proper version control, researchers may struggle to reproduce past analyses. To maintain consistency, it is crucial to audit modifications, record key computational settings (such as random seeds, iteration counts, and parameter values, even the defaults, as they can change over time), and implement strict integrity-maintenance policies, ideally covered by appropriate Standard Operating Procedures. A well-maintained and controlled computing environment ensures reproducibility, prevents unnecessary downtime spent clarifying discrepancies, reduces the associated stress, and helps avoid additional errors.
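
As a minimal sketch of what recording key computational settings can look like in practice (Python standard library plus NumPy; the package list, seed value, and file name are just illustrative assumptions):

```python
# Record the computing environment and key settings alongside the results
# (illustrative sketch: the package list, seed, and file name are assumptions).
import json
import platform
from importlib.metadata import version

import numpy as np

SEED = 20240101
rng = np.random.default_rng(SEED)  # use this single generator throughout the analysis

manifest = {
    "python": platform.python_version(),
    "platform": platform.platform(),
    "random_seed": SEED,
    "packages": {pkg: version(pkg) for pkg in ("numpy", "pandas", "scipy")},
}

with open("analysis_manifest.json", "w") as fh:
    json.dump(manifest, fh, indent=2)

print(json.dumps(manifest, indent=2))
```

Storing such a manifest next to every set of results makes the “it worked yesterday” conversations much shorter, because the environments can be compared line by line.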

Failure in clinical trials doesn’t require malice - just negligence. Avoiding these pitfalls requires rigorous planning, statistical foresight, and flexibility in execution. Otherwise, you might find yourself writing post-mortem reports instead of regulatory submissions, wasting time, money, trust, market advantage and company image.

Dare to risk it?

To be continued…


If you are thinking about a study design audit or a rescue action, contact us at or discover our other CRO services.
