The title of this post is from one of my favourite statistical thinkers (don’t judge me!), Jacob Cohen [1], in an article arguing that the logic of statistical hypothesis testing (the p<0.05 bit) simply doesn’t fit many important research questions, and that making a fetish of statistical hypothesis testing may actually impede sensible research that can make a useful difference to the world.
“Consider the
following: A colleague approaches me with a statistical problem. He believes
that a generally rare disease does not exist at all in a given population,
hence H0: P=0. He draws a more or less random sample of 30 cases
from this population and finds that one of the cases has the disease, hence Ps=1/30=.033.
He is not sure how to test H0, chi-square with Yates’s (1951)
correction or the Fisher exact test, and wonders whether he has enough power.
Would you believe it? And would you believe that if he tried to publish this
result without a significance test, one or more reviewers might complain? It
could happen.”
Why am I telling you this? Well, with the announcement of a
rapid expansion of personal health budgets have come repeated and influential
calls for randomised controlled trials (RCTs) to test whether personal health
budgets ‘work’ or not. My concern is that the RCT is becoming increasingly fetishised
as the only valid research methodology, and applying an RCT methodology to the
expansion of personal health budgets may not be the most effective use of
scarce research funding to address the essential research questions. I’m
writing not as a person who is opposed to RCT methodologies generally (I’m
involved in doing one at the moment), and not as a person who thinks the
expansion of personal health budgets shouldn’t be accompanied by rigorous
evaluation. I should make clear that I’ve been involved in the collection of practice-based evidence concerning the early development of personal health budgets [3], but as the report makes clear I see this as complementary to rigorous research on the issue.
The NHS Choices website, in their news glossary [2], provides a useful short definition of an RCT (and a judgement too):
“This is a study where people are randomly allocated to receive (or not receive) a particular intervention (this could be two different treatments or one treatment and a placebo). This is the best type of study design to determine whether a treatment is effective.”
The logic of RCTs is simple; it has its roots in testing drugs but has much wider scope. You have a treatment X, and you want to see whether its impact is better than, worse than, or the same as that of another treatment Y. Because we often (secretly or openly) want one of the treatments to ‘work’ better than the other one, we can’t trust ourselves to allocate people fairly across the two treatments (without even realising it, we might allocate all the people who we think might be more receptive to the treatment into our favoured condition). So we randomise people instead (in essence on the flip of a computerised coin) so that we have no influence on who gets what treatment. We hope that this will result in two groups of people that will, on average, turn out to be evenly matched on anything that matters, although randomisation doesn’t guarantee this.
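As a concrete illustration, the allocation step described above can be sketched in a few lines of code (a minimal sketch of my own; the group labels, seeded random generator and participant names are illustrative assumptions, not part of any actual trial protocol):

```python
import random

def randomise(participants, seed=None):
    """Allocate each participant to 'treatment' or 'control' on a coin flip.

    The researcher has no say in who gets what, which is the point:
    any hunch about who seems more 'receptive' cannot leak into the
    allocation.
    """
    rng = random.Random(seed)
    return {person: rng.choice(["treatment", "control"])
            for person in participants}

# Example: 10 participants. Note the groups are only balanced on
# average -- a single run can easily produce a 7/3 split, which is
# why randomisation does not *guarantee* evenly matched groups.
groups = randomise([f"person_{i}" for i in range(10)], seed=42)
```

In real trials the coin flip is usually replaced by constrained schemes (blocked or stratified randomisation) precisely because a plain coin flip can produce lopsided groups in small samples.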
Sounds simple, and indeed the logic of the RCT is
straightforward. So why don’t I think it’s the right way to evaluate a rapid
expansion of personal health budgets?
First, RCTs assume that there is a more or less standard ‘intervention’,
with an agreed standard ‘outcome’ (what the intervention is trying to have a
positive impact upon). For me, there are a number of problems with applying
this logic to personal health budgets:
- The way I see them, personal health budgets in themselves are not an ‘intervention’ or ‘treatment’: they’re a different way of commissioning. This means that the ‘treatment’ is in principle not separable from the complex systems within which personal health budgets are embedded. This is important, as RCTs typically start with what are called ‘efficacy’ trials (does the intervention work under ideal and controlled circumstances?). If these show that the intervention is effective, then RCTs should proceed (although this often doesn’t happen) to ‘effectiveness’ trials (does the intervention work in the real world?), where, unsurprisingly, the effect of the intervention is much smaller than under ideal conditions. For me, the nature of personal health budgets means that an ‘efficacy’ trial is either in principle impossible, or at the very least would denature the personal health budgets so severely (in the name of creating standardised, controlled conditions) as to make the findings meaningless. So, arguments that ‘proof of concept’ for personal health budgets needs to be established before their application in the real world are misplaced.
- Related to this, personal health budgets will be a moving target. Because they involve complex changes (and challenges) to existing systems, the ways in which people can use them (and how receptive systems are to them) will be very different in 2015 than in 2020, say. Conducting an RCT that will be out of date by the time it’s completed, as the personal health budget ‘treatment’ will be quite different by then, severely reduces its utility.
- Personal health budgets will be applied to hugely diverse groups of people with hugely diverse health needs, who will use personal health budgets in hugely diverse ways, with hugely diverse parts of health and other systems. Such diversity is a real issue for RCTs – is a personal health budget for a person with mental health issues the same ‘treatment’ as one for a person with COPD, for example? (And variation usually means needing bigger and bigger numbers of participants.)
- Personal health budget holders will be using personal health budgets to achieve highly personalised outcomes. Any ‘primary outcome’ measure will have to be a measure of how people are meeting their individualised goals, rather than a standard health or quality-of-life outcome measure, especially as for many people the goal of a personal health budget may be to preserve or maintain valued aspects of their lives for as long as possible rather than to improve health or quality-of-life outcomes (back to diversity again).
- The logic of the RCT requires that the form of the treatment must be standard for all research participants across the course of the project – this means that any learning about how personal health budgets could be done better must be resolutely ignored by, or withheld from, participants while they are involved in the RCT.
- The RCT requires a fairly standard comparison/control group that is completely separate from the treatment group. In the case of personal health budgets, what would this be? ‘Treatment as usual’ is likely to change as a result of the impact of personal health budgets on health systems, even for people who aren’t getting a personal health budget.
- For me, the RCT research question ‘Does Treatment X work better than Treatment Y?’ is not the most urgent or important question. When, how, why, in what ways, and for whom do personal health budgets work (or not), and what can we learn as we go to improve the scope and coverage of the positive impacts of personal health budgets? These seem a much more relevant set of questions to me.
Second, having been involved in conducting a relatively
small RCT, I’m acutely aware of the time RCTs take, the (potentially
necessary?) bureaucracy that surrounds them, and how expensive they are. Even
if my concerns above were ill-founded, there are good pragmatic reasons why an
RCT would not be the best use of scarce research resources:
- RCTs take a long time! Setting them up takes a while, decisions would need to be taken on what would be a sensible timescale for personal health budgets to ‘work’, and there would possibly be follow-ups too. As I’ve said above, I think this means that RCT findings would be assessing a historical moment in time by the time they came out.
- The standardised nature of the RCT method would denature the actual application of personal health budgets over time, both for trial participants and for the service systems with which people engage.
- As personal health budgets expand, I’m unclear how people would react to being randomised, particularly if they’re randomised to the ‘treatment as usual’ condition.
- Methodologically, the risk of ‘contamination’ (charming word) across conditions is huge – people (even health professionals!) talk to each other, which could compromise the RCT trial design in all sorts of ways.
- With the diversity of people’s health needs, how big would any RCT have to be? Across how many areas of the country? How expensive would this get? This seems to me like we’re entering the realms of ‘waste’ in research (http://www.thelancet.com/series/research).
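To give a feel for the ‘bigger and bigger numbers’ problem raised above, here is a rough back-of-the-envelope calculation using the standard normal-approximation formula for a two-group comparison of means. The effect sizes, significance level and power below are illustrative assumptions of mine, not figures from any actual trial design:

```python
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a given
    standardised effect size (Cohen's d) in a two-group comparison.

    Standard normal-approximation formula:
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ~1.96 for a two-sided 5% test
    z_beta = z(power)            # ~0.84 for 80% power
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

# The more diverse the participants (and hence the smaller and
# noisier the average effect), the more participants are needed:
for d in (0.5, 0.3, 0.1):
    print(f"d = {d}: ~{sample_size_per_group(d):.0f} per group")
```

Halving the detectable effect size roughly quadruples the required sample, which is why diversity of people and of uses translates so directly into trial size and cost.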
So, what are the alternatives to RCT methods in this context? Personally, I think the research funding that is available could be better spent on a programme of implementation research [4], with people using personal health budgets and those supporting them at the heart of decision-making about what the essential research questions are, and about which research methods will yield relevant, timely findings that can feed into ongoing learning about how to do personal health budgets well. Too many good ideas have suffered when attempts have been made to scale them up – why not design a research programme that rigorously investigates such an attempt to scale up, while simultaneously supporting it?
References
1. Cohen, J. (1994). The earth is round (p<.05). American Psychologist, 49(12), 997-1003. http://ist-socrates.berkeley.edu/~maccoun/PP279_Cohen1.pdf
2. NHS Choices. News glossary. http://www.nhs.uk/news/Pages/Newsglossary.aspx
3. Waters, J. & Hatton, C. (2014). The POET surveys 2014: Personal health budget holders and family carers. London: Think Local, Act Personal. http://www.thinklocalactpersonal.org.uk/_library/Resources/SDS/POET_health_FINAL_24_Oct.pdf
4. Peters, D.H., Tran, N.T. & Adam, T. (2013). Implementation research in health: A practical guide. Geneva: Alliance for Health Policy and Systems Research, World Health Organization. http://who.int/alliance-hpsr/alliancehpsr_irpguide.pdf