The Palpable Prostate: Can most studies be wrong?

[Updated May 31, 2013]

It is a common observation that the results of many studies are ultimately contradicted by later studies, leading many to question how far the medical literature can be trusted. Amgen scientists who tried to replicate 53 published studies were able to replicate only 6 (i.e. 11%). See [nature].

If the treatment group is in some way healthier than the control group, then a study comparing the two groups may simply be measuring the difference between those populations rather than the treatment effect. Randomization can help to avoid this, but it is not always feasible.

Meta Analyses

One worry is that financial incentives might sway researchers to show an effect where none exists. Another is that simply wishing to be able to say you found something may be incentive enough.

Suppose that researchers systematically tend to suppress results which show no effect or a negative effect. Meta-analyses affected by such suppression would then include fewer component studies, so meta-analyses with fewer studies would be more likely to suffer from this bias than meta-analyses with more component studies. If that were the case, we would expect meta-analyses with more component studies to have less overall bias and therefore to show smaller effects than meta-analyses with fewer component studies.

Interestingly enough, Furukawa et al in the Feb 7, 2007 JAMA, p 468 found precisely that! Studying 156 meta-analyses, the investigators found that the odds ratio was 2.67x for meta-analyses with fewer than 20 component studies but only 1.87x for meta-analyses with over 80 component studies.
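The suppression argument can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: the function name `simulate_meta_analysis`, the normal noise model, the true effect of zero and the 80% suppression rate are all hypothetical and are not taken from Furukawa et al.

```python
import random
import statistics


def simulate_meta_analysis(n_candidates, suppress_prob, true_effect=0.0,
                           noise_sd=1.0, rng=None):
    """Return the effect estimates that survive to be pooled.

    Each candidate study estimates the true effect with normal noise.
    A study whose estimate is zero or negative is dropped with
    probability suppress_prob (hypothetical suppression model).
    """
    rng = rng or random.Random()
    published = []
    for _ in range(n_candidates):
        estimate = rng.gauss(true_effect, noise_sd)
        if estimate <= 0 and rng.random() < suppress_prob:
            continue  # null or negative result never reaches the meta-analysis
        published.append(estimate)
    return published


rng = random.Random(0)  # fixed seed so the run is repeatable
unbiased = simulate_meta_analysis(2000, suppress_prob=0.0, rng=rng)
biased = simulate_meta_analysis(2000, suppress_prob=0.8, rng=rng)

for label, studies in (("no suppression", unbiased), ("80% suppression", biased)):
    print(f"{label}: {len(studies)} studies, "
          f"pooled effect {statistics.mean(studies):.2f}")
```

Even though the true effect is zero in both runs, the run with suppression ends up with fewer studies and a clearly positive pooled effect, which is the pattern the argument predicts.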

Publication bias can be detected in meta analyses using funnel plots. See [link].

A particularly interesting and accessible [article] on publication bias appeared in the New Yorker in 2010.

For those who wish to read about meta-analyses in depth, see Michael Borenstein et al, Introduction to Meta-Analysis, 2009. It runs to 400+ pages and includes all the formulas, but it is written in a very readable style, so even if you skip the formulas you will still come away with a significant understanding.

Why It's Plausible that Most Research Findings are False

Another interesting approach to this is Ioannidis' paper entitled Why Most Published Research Findings Are False [Full Text] [PMID: 16060722].

He develops a mathematical model that gives the probability that a claimed research finding is true. His model is analogous to the models used in medical testing. The medical testing models are discussed on this blog in posts [here] and [here] and the reader may wish to review those posts first. Here we see how the concepts of medical testing and research findings are parallel. (Readers who want the non-mathematical version of this should skip directly to Part 2.)

Term	Medical Testing	Research Findings
unit of study	patient	research study
PPV	prob of having disease given positive test	prob that finding is true given that an effect is found
True positive rate	sensitivity (`Sn`)	power (`1-beta`)
True negative rate	specificity (`Sp`)	significance level (`a = 1-alpha`)
Prior	disease prevalence (`p`)	fraction of hypotheses that are true (`p`)
Bias	lab reports a -ve test as +ve	bias (`u`)

Example

We will go through an example of the model using specific numbers which correspond to the row of Table 4 in the paper marked "Adequately powered exploratory epidemiological study".

The power is the probability of finding an effect if the hypothesis is true. It depends on the number of patients (more patients make it more likely you can detect an effect) and other aspects of the study design, but for purposes of example we will use power = 0.80.
The significance level is the probability of finding no effect if the hypothesis is false. Medical studies are typically done at a 95% significance level (i.e. alpha = 5%). In terms of the paper we use a = 1-alpha = 95%.
The fraction of hypotheses that are true varies by field. It may be very small in genomics but higher in clinical studies where there has been some pre-investigation. According to Table 4 of this FTC report, drugs in Phase I, II and III have probabilities of succeeding to the point they enter the US market of 12%, 17% and 38%, respectively, although the probabilities are somewhat larger for big pharma and for biologically based drugs. We will use p = 0.09 in our example, which corresponds to odds of 1:10.
In medical testing we do not usually deal with bias, but here it is an important effect. It refers to the fraction of studies in which no effect would normally be shown but for which an effect does appear due to misreporting, manipulation or any other bias. We will use a bias of u = 0.30 in our example, but as we show later even a much smaller bias can result in very large fractions of false results.
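The link between odds of 1:10 and p = 0.09 is just a conversion from odds to a fraction. A minimal sketch follows; the helper name `odds_to_probability` is made up for illustration:

```python
def odds_to_probability(true_count, false_count):
    """Convert odds of true:false hypotheses into the fraction that are true."""
    return true_count / (true_count + false_count)


# Odds of 1:10 mean 1 true hypothesis for every 10 false ones, i.e. 1 in 11.
p = odds_to_probability(1, 10)
print(f"p = {p:.4f}")  # prints p = 0.0909
```

The example rounds this to p = 0.09.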

Assume c = 1000 studies and form a 2x2 table under the assumption of no bias, i.e. u = 0. Since 9% of the 1000 hypotheses are true, the first column sums to 90 and the second column sums to the remaining 910. Since the power is 80% and the significance level is 95%, we can fill in the two diagonal entries as 80% and 95% times their column totals. Thus 80% x 90 = 72 and 95% x 910 = 865 (rounded). The remaining numbers are not yet filled in.

No bias


            Hypothesis
            TRUE           FALSE   Tot
Finding
    Yes     72             .       .
    No       .             865     .
    Tot     90             910     1000

Now we can fill in the remaining cells in the body, i.e. the two off diagonal entries, by ensuring that the columns total correctly. With all 4 elements of the body filled in we can get the row totals by summing.

No bias


            Hypothesis
            TRUE           FALSE   Tot
Finding
    Yes     72              45      117 
    No      18             865      883
    Tot     90             910     1000
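The construction of the no-bias table can be sketched in a few lines of Python. The function name `no_bias_table` and the dictionary layout are assumptions made for this illustration. The code keeps unrounded counts, so the FALSE column comes out as 45.5 and 864.5 rather than the rounded 45 and 865 shown in the table.

```python
def no_bias_table(n_studies, p, power, a):
    """Build the body of the 2x2 table with no bias (u = 0).

    Returns a dict mapping each finding ("yes"/"no") to a pair of counts
    (hypothesis TRUE, hypothesis FALSE).
    """
    n_true = n_studies * p           # column total for true hypotheses
    n_false = n_studies - n_true     # column total for false hypotheses
    yes_true = power * n_true        # true hypotheses that show an effect
    no_false = a * n_false           # false hypotheses that show no effect
    return {
        "yes": (yes_true, n_false - no_false),
        "no": (n_true - yes_true, no_false),
    }


table = no_bias_table(n_studies=1000, p=0.09, power=0.80, a=0.95)
for finding in ("yes", "no"):
    true_count, false_count = table[finding]
    row_total = true_count + false_count
    print(f"{finding:>3}  {true_count:7.1f}  {false_count:7.1f}  {row_total:7.1f}")
```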

Now assume we have 30% bias. In other words 30% of the No findings in each column are moved from the No row to the Yes row in the same column.

That is each cell in the second row has 30% removed from it and an equal amount is added to the cell just above it.

Thus 30% x 18 = 5 (rounded) are moved from the 18 cell to the 72 cell, 30% x 865 = 260 (rounded) are moved from the 865 cell to the 45 cell, and the row totals are recomputed. Since we are moving entries within columns, the column totals are unaffected:


            Hypothesis
            TRUE           FALSE   Tot
Finding
    Yes     77             305      382
    No      13             605      618
    Tot     90             910     1000

Of the 382 studies which showed an effect, only 77 correspond to true hypotheses, so PPV = 77 / 382 = 20%.
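The bias step can be sketched as follows, starting from the rounded no-bias counts used in the text. The function name `apply_bias` is hypothetical. Because the code does not round the moved amounts, the Yes row comes out as 77.4 and 304.5 instead of 77 and 305, which gives essentially the same PPV of about 20%.

```python
def apply_bias(yes_row, no_row, u):
    """Move a fraction u of each No cell up into the Yes cell of the same column."""
    moved = [u * count for count in no_row]
    new_yes = [y + m for y, m in zip(yes_row, moved)]
    new_no = [n - m for n, m in zip(no_row, moved)]
    return new_yes, new_no


# Each row is (hypothesis TRUE, hypothesis FALSE), taken from the no-bias table.
yes_row, no_row = apply_bias(yes_row=[72, 45], no_row=[18, 865], u=0.30)

# PPV: share of the Yes findings that correspond to true hypotheses.
ppv = yes_row[0] / sum(yes_row)
print(f"Yes: {yes_row[0]:.1f} {yes_row[1]:.1f}")
print(f"No:  {no_row[0]:.1f} {no_row[1]:.1f}")
print(f"PPV = {ppv:.3f}")
```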

As mentioned, this corresponds to the row in Ioannidis Table 4 labelled "Adequately powered exploratory epidemiological study".

Breakeven Bias

By repeating the above with symbols we arrive at the following equation where, as before,

ppv is the positive predictive value, which is the probability that a study which shows an effect actually corresponds to a true hypothesis,
p is the pre-study probability, i.e. the fraction of hypotheses that are true,
power is the probability that a true hypothesis shows an effect, and is referred to as 1-beta in Ioannidis,
a is the probability that a false hypothesis does not show an effect, and is referred to as 1-alpha in Ioannidis,
u is the fraction of those hypotheses that would have shown no effect that do show an effect because of bias.


ppv = (power * p + u * (1-power)*p) / (power * p + u * (1-power)*p + (u * a + (1-a)) * (1-p))

where in our example ppv = .2, power = .80, p = .09, u = .3 and a = .95 .

In order to avoid having to do any mathematical manipulations ourselves, we will use the free mathomatic software. We can either download it from the indicated link or go to that page to find an online version which we can use right from our browser without downloading or installing anything. We can enter this equation into mathomatic (the lines that begin with 1-> are the ones we entered and the others are output):


1-> ppv = (power * p + u * (1-power)*p) / (power * p + u * (1-power)*p + (u * a + (1-a)) * (1-p))

                        ((power*p) + (u*(1 - power)*p))
#1: ppv = -----------------------------------------------------------
          ((power*p) + (u*(1 - power)*p) + (((u*a) + 1 - a)*(1 - p)))

1-> calc
Enter power: .8
Enter p: .09
Enter u: .3
Enter a: .95
 ppv = 0.20248528449967
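For readers without mathomatic, the same calculation can be sketched in Python. The function name `ppv` and its argument order are choices made for this illustration; the expression itself is the equation given above.

```python
def ppv(power, p, u, a):
    """Positive predictive value from the equation in the text.

    power: probability a true hypothesis shows an effect (1-beta)
    p:     fraction of hypotheses that are true
    u:     bias, the fraction of would-be No findings reported as Yes
    a:     probability a false hypothesis shows no effect (1-alpha)
    """
    true_yes = power * p + u * (1 - power) * p    # true hypotheses showing an effect
    false_yes = (u * a + (1 - a)) * (1 - p)       # false hypotheses showing an effect
    return true_yes / (true_yes + false_yes)


print(f"with bias u = 0.3: ppv = {ppv(power=0.8, p=0.09, u=0.3, a=0.95):.4f}")
print(f"with no bias:      ppv = {ppv(power=0.8, p=0.09, u=0.0, a=0.95):.4f}")
```

Setting u = 0 shows how much of the drop in ppv is due to bias alone: without bias the same inputs give a ppv of about 61%.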

The breakeven bias is the bias for which we have the same number of true and false studies. It corresponds to ppv = .5, so we solve for u and use the same numbers as before. The output below shows that for any bias greater than about 3% we will have more false studies than true studies (assuming the power is 80%, the significance level is 95%, i.e. alpha = 5%, and 9% of all hypotheses are true):


1-> solve u

           ((ppv*((p*(power + a - 1)) + 1 - a)) - (power*p))
#1: u = -------------------------------------------------------
        ((p*(1 + (power*(ppv - 1)))) + (ppv*((a*(p - 1)) - p)))

1-> calc
Enter ppv: .5
Enter power: .8
Enter p: .09
Enter a: .95
 u = 0.031305375073833

As we can see, even a relatively small bias can move enough studies from the No row to the Yes row to swamp the true findings with false ones.
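The breakeven bias can also be computed directly. Setting ppv = .5 in the equation means the true and false Yes counts are equal, and solving that for u gives the closed form used in this sketch. The function names `breakeven_bias` and `ppv` are made up for illustration, and the simplified formula is my own rearrangement rather than mathomatic's output, so the code checks it by substituting back.

```python
def ppv(power, p, u, a):
    """Positive predictive value from the equation in the text."""
    true_yes = power * p + u * (1 - power) * p
    false_yes = (u * a + (1 - a)) * (1 - p)
    return true_yes / (true_yes + false_yes)


def breakeven_bias(power, p, a):
    """Bias u at which ppv = 0.5, i.e. true Yes findings equal false Yes findings.

    From  p*(power + u*(1-power)) = (1-p)*((1-a) + u*a)  solved for u.
    A negative result means ppv is already below 0.5 with no bias at all.
    """
    numerator = (1 - p) * (1 - a) - p * power
    denominator = p * (1 - power) - a * (1 - p)
    return numerator / denominator


u = breakeven_bias(power=0.8, p=0.09, a=0.95)
print(f"breakeven u = {u:.4f}")
print(f"ppv at breakeven = {ppv(0.8, 0.09, u, 0.95):.4f}")
```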

Sunday, August 5, 2007
