The title of this post is from one of my favourite statistical thinkers (don’t judge me!), Jacob Cohen1, in an article arguing that the logic of statistical hypothesis testing (the p<0.05 bit) simply doesn’t fit many important research questions, and that making a fetish of statistical hypothesis testing may actually impede sensible research that can make a useful difference to the world.
“Consider the following: A colleague approaches me with a statistical problem. He believes that a generally rare disease does not exist at all in a given population, hence H0: P=0. He draws a more or less random sample of 30 cases from this population and finds that one of the cases has the disease, hence Ps=1/30=.033. He is not sure how to test H0, chi-square with Yates’s (1951) correction or the Fisher exact test, and wonders whether he has enough power. Would you believe it? And would you believe that if he tried to publish this result without a significance test, one or more reviewers might complain? It could happen.”
Why am I telling you this? Well, with the announcement of a rapid expansion of personal health budgets have come repeated and influential calls for randomised controlled trials (RCTs) to test whether personal health budgets ‘work’ or not. My concern is that the RCT is becoming increasingly fetishised as the only valid research methodology, and applying an RCT methodology to the expansion of personal health budgets may not be the most effective use of scarce research funding to address the essential research questions. I’m writing not as a person who is opposed to RCT methodologies generally (I’m involved in doing one at the moment), and not as a person who thinks the expansion of personal health budgets shouldn’t be accompanied by rigorous evaluation. I should make clear that I’ve been involved in the collection of practice-based evidence concerning the early development of personal health budgets3, but as the report makes clear I see this as complementary to rigorous research on the issue.
The NHS Choices website, in their news glossary, provides a useful short definition of an RCT (and a judgement too)2:
“This is a study where people are randomly allocated to receive (or not receive) a particular intervention (this could be two different treatments or one treatment and a placebo). This is the best type of study design to determine whether a treatment is effective.”
The logic of RCTs is simple; it has its roots in drug testing but has much wider scope. You have a treatment X, and you want to see whether its impact is better than, worse than, or the same as that of another treatment Y. Because we often (secretly or openly) want one of the treatments to ‘work’ better than the other, we can’t trust ourselves to allocate people fairly across the two treatments (without even realising it, we might allocate all the people we think might be more receptive into our favoured condition). So we randomise people instead (in essence, on the flip of a computerised coin), so that we can have no influence on who gets which treatment. We hope that this will produce two groups of people that, on average, turn out to be evenly matched on anything that matters, although randomisation doesn’t guarantee this.
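The coin-flip allocation described above can be sketched in a few lines of Python. This is purely illustrative (the function name and seed are my own invention, not anything from an actual trial system), but it shows both the mechanism and the caveat: simple randomisation only balances the groups on average.

```python
import random

def randomise(participant_ids, seed=42):
    """Allocate each participant to 'treatment' or 'control'
    on the flip of a (seeded) computerised coin."""
    rng = random.Random(seed)  # seeding makes the allocation auditable
    return {pid: rng.choice(["treatment", "control"])
            for pid in participant_ids}

allocation = randomise(range(1, 31))

# Simple randomisation balances the arms only on average:
# a run of 30 can easily split 18/12 rather than 15/15.
counts = {arm: list(allocation.values()).count(arm)
          for arm in ("treatment", "control")}
```

Real trials often use block or stratified randomisation precisely to guard against the chance imbalances a plain coin flip allows.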
Sounds simple, and indeed the logic of the RCT is straightforward. So why don’t I think it’s the right way to evaluate a rapid expansion of personal health budgets?
First, RCTs assume that there is a more or less standard ‘intervention’, with an agreed standard ‘outcome’ (what the intervention is trying to have a positive impact upon). For me, there are a number of problems with applying this logic to personal health budgets:
· The way I see them, personal health budgets are not in themselves an ‘intervention’ or ‘treatment’: they’re a different way of commissioning. This means that the ‘treatment’ is in principle not separable from the complex systems within which personal health budgets are embedded. This matters because RCTs typically start with what are called ‘efficacy’ trials (does the intervention work under ideal and controlled circumstances?). If these show that the intervention is effective, RCTs should then move on (although this often doesn’t happen) to ‘effectiveness’ trials (does the intervention work in the real world?), where, unsurprisingly, the effectiveness of the intervention is much smaller than under ideal conditions. For me, the nature of personal health budgets means that an ‘efficacy’ trial is either impossible in principle, or at the very least would denature personal health budgets so severely (in the name of creating standardised, controlled conditions) as to make the findings meaningless. So arguments that ‘proof of concept’ for personal health budgets needs to be established before their application in the real world are misplaced.
· Related to this, personal health budgets will be a moving target. Because they involve complex changes (and challenges) to existing systems, the ways in which people can use them (and how receptive systems are to them) will be very different in 2015 than in 2020, say. Conducting an RCT that will be out of date by the time it’s completed, as the personal health budget ‘treatment’ will be quite different by then, severely reduces its utility.
· Personal health budgets will be applied to hugely diverse groups of people with hugely diverse health needs, who will use them in hugely diverse ways, with hugely diverse parts of health and other systems. Such diversity is a real issue for RCTs: is a personal health budget for a person with mental health issues the same ‘treatment’ as one for a person with COPD, for example? And variation usually means needing bigger and bigger numbers of participants.
· Personal health budget holders will use their budgets to achieve highly personalised outcomes. Any ‘primary outcome’ measure will therefore have to capture how people are meeting their individualised goals, rather than being a standard health or quality-of-life outcome measure. This is especially so because, for many people, the goal of a personal health budget may be to preserve or maintain valued aspects of their lives for as long as possible rather than to improve health or quality-of-life outcomes (back to diversity again).
· The logic of the RCT requires that the form of the treatment stays standard for all research participants across the course of the project. This means that any learning about how personal health budgets could be done better must be resolutely ignored by, or withheld from, participants while they are involved in the RCT.
· The RCT requires a fairly standard comparison/control group that is completely separate from the treatment group. In the case of personal health budgets, what would this be, particularly as ‘treatment as usual’ is likely to change as a result of the impact of personal health budgets on health systems, even if a person isn’t getting a personal health budget themselves?
· For me, the RCT research question ‘Does Treatment X work better than Treatment Y?’ is not the most urgent or important question. When, how, why, in what ways, for whom do personal health budgets work (or not), and what can we learn as we go to improve the scope and coverage of positive impacts of personal health budgets, seem a much more relevant set of questions to me.
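The point above about diversity inflating trial size can be made concrete. Using a standard textbook two-arm sample-size approximation (my own sketch, not a calculation from the post), the number of participants needed grows with the square of the outcome’s variability:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a mean
    difference `delta` on an outcome with standard deviation `sigma`
    (two-sided test, normal approximation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_b = z.inv_cdf(power)          # ~0.84 for 80% power
    return ceil(2 * ((z_a + z_b) ** 2) * sigma ** 2 / delta ** 2)

# Doubling the spread of outcomes roughly quadruples the trial:
small = n_per_arm(sigma=10, delta=5)  # 63 per arm
large = n_per_arm(sigma=20, delta=5)  # 252 per arm
```

With hugely diverse participants and hugely diverse outcomes, sigma is large relative to any plausible average effect, which is exactly why a credible personal health budgets RCT would need to be very big (and very expensive).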
Second, having been involved in conducting a relatively small RCT, I’m acutely aware of the time RCTs take, the (potentially necessary?) bureaucracy that surrounds them, and how expensive they are. Even if my concerns above were ill-founded, there are good pragmatic reasons why an RCT would not be the best use of scarce research resources:
· RCTs take a long time! Setting them up takes a while, decisions would need to be taken on a sensible timescale over which personal health budgets could be expected to ‘work’, and there would possibly be follow-ups too. As I’ve said above, I think this means that RCT findings would be assessing a historical moment in time by the time they came out.
· The standardised nature of the RCT method would denature the actual application of personal health budgets over time, both for trial participants and for the service systems with which people engage.
· As personal health budgets expand, I’m unclear how people would react to being randomised, particularly if they’re randomised to the ‘treatment as usual’ condition.
· Methodologically, the risk of ‘contamination’ (charming word) across conditions is huge: people (even health professionals!) talk to each other, which could compromise the RCT trial design in all sorts of ways.
· With the diversity of people’s health needs, how big would any RCT have to be? Across how many areas of the country? How expensive would this get? This seems to me like we’re entering into the realms of ‘waste’ in research (http://www.thelancet.com/series/research).
So, what are the alternatives to RCT methods in this context? Personally, I think the research funding that is available could be better spent on a programme of implementation research4, with people using personal health budgets and those supporting them at the heart of decision-making about what the essential research questions are, and which research methods will yield relevant, timely findings that can feed into ongoing learning about how to do personal health budgets well. Too many good ideas have suffered when attempts have been made to scale them up. Why not design a research programme that rigorously investigates such an attempt to scale up, while simultaneously supporting it?
1. Cohen, J. (1994). The earth is round (p<.05). American Psychologist, 49(12), 997-1003. http://ist-socrates.berkeley.edu/~maccoun/PP279_Cohen1.pdf
2. NHS Choices. News glossary. http://www.nhs.uk/news/Pages/Newsglossary.aspx
3. Waters, J. & Hatton, C. (2014). The POET surveys 2014. Personal health budget holders and family carers. London: Think Local, Act Personal. http://www.thinklocalactpersonal.org.uk/_library/Resources/SDS/POET_health_FINAL_24_Oct.pdf
4. Peters, D.H., Tran, N.T. & Adam, T. (2013). Implementation research in health: A practical guide. Geneva: Alliance for Health Policy and Systems Research, World Health Organization. http://who.int/alliance-hpsr/alliancehpsr_irpguide.pdf