The following is an archived copy of a message sent to the CASI Analysis List run by Cambridge Solidarity with Iraq.

Views expressed in this archived message are those of the author, not of Cambridge Solidarity with Iraq (CASI).


[casi-analysis] Summary of criticisms of the Lancet study of Iraqi mortality




Dear All,



The summary and evaluation below of the criticisms of the Lancet study of mortality in Iraq is
very good.



The original text is better rendered as a web page than as an email and can be found at:

http://www.crookedtimber.org/archives/002780.html





Regards,



Per Klevnas



======================================



November 11, 2004

Lancet roundup and literature review

Posted by Daniel

Well, the Lancet study has been out for a while now, and it seems as good a time as any to take 
stock of the state of the debate and wrap up a few comments which have hitherto been buried in 
comments threads. Lots of heavy lifting here has been done by Tim Lambert and Chris Lightfoot; I 
thoroughly recommend both posts, and while I’m recommending things, I also recommend a short 
statistics course as a useful way to spend one’s evenings (sorry); it really is satisfying to be 
able to take part in these debates as a participant and, I would imagine, pretty embarrassing and
frustrating not to be able to. As Tim Lambert commented, this study has been “like flypaper for 
innumerates”; people have been lining up to take a pop at it despite being manifestly not in 
possession of the baseline level of knowledge needed to understand what they’re talking about. 
(Being slightly more cynical, I suggested to Tim that it was more like “litmus paper for hacks”; 
it’s up to each individual to decide for themselves whether they think a particular argument is an 
innocent mistake or not). Below the fold, I summarise the various lines of criticism and whether 
they’re valid or (mostly) not.



Starting with what I will describe as “Hack critiques”, without prejudice to the possibility that they might in
isolated individual cases be innocent mistakes. These are arguments which are purely and simply 
wrong and should not be made because they are, quite simply, slanders on the integrity of the 
scientists who wrote the paper. I’ll start with the most widespread one.



The Kaplan “dartboard” confidence interval critique



I think I pretty much slaughtered this one in my original Lancet post, but it still spread; 
apparently not everybody reads CT (bastards). To recap; Fred Kaplan of Slate suggested that because 
the confidence interval was very wide, the Lancet paper was worthless and we should believe 
something else like the IBC total.



This argument is wrong for three reasons.



1) The confidence interval describes a range of values which are “consistent” with the model[1]. But
it doesn’t mean that all values within the confidence interval are equally likely, so that you can
just pick whichever one you prefer. In particular, the most likely values are the ones in the centre
of a symmetrical confidence interval. The single most likely value is, in fact, the central estimate
of 98,000 excess deaths. Furthermore, as I pointed out in my original CT post, the truly shocking
thing is that, wide as the confidence interval is, it does not include zero. You would expect to get
a sample like this fewer than 2.5 times out of a hundred if the true number of excess deaths were
less than zero (that is, if the war had made things better rather than worse). (There is a short
numerical sketch of this after these three points.)



2) As the authors themselves pointed out in correspondence with the management of Lenin’s Tomb,



“Research is more than summarizing data, it is also interpretation. If we had just visited the 32 
neighborhoods without Falluja and did not look at the data or think about them, we would have 
reported 98,000 deaths, and said the measure was so imprecise that there was a 2.5% chance that 
there had been less than 8,000 deaths, a 10% chance that there had been less than about 45,000 
deaths,….all of those assumptions that go with normal distributions. But we had two other pieces of 
information. First, violence accounted for only 2% of deaths before the war and was the main cause 
of death after the invasion. That is something new, consistent with the dramatic rise in mortality 
and reduces the likelihood that the true number was at the lower end of the confidence range. 
Secondly, there is the Falluja data, which imply that there are pockets of Anbar, or other 
communities like Falluja, experiencing intense conflict, that have far more deaths than the rest of 
the country. We set aside these data in statistical analysis because the result in this
cluster was such an outlier, but it tells us that the true death toll is far more likely to be on 
the high-side of our point estimate than on the low side.”

That is, the sample contains important information which is not summarised in the confidence 
interval, but which tells you that the central estimate is not likely to be a massive overestimate. 
The idea that the central 98,000 number might be an underestimate seemed to have blown the mind of 
a lot of commentators; they all just seemed to act like it Did Not Compute.



3) This gave rise to what might be called the use of “asymmetric rhetoric about a symmetric
confidence interval”, but to which I will give the more catchy name of “Kaplan’s Fallacy”. If your
critique of an estimate is that the range is too wide, then that is one critique you can make. 
However, if this is all you are saying (“this isn’t an estimate, it’s a dartboard”), then 
intellectual honesty demands that you refer to the whole range when using this critique, not just 
the half of it that you want to think about. In other words, it is dishonest to title your essay 
“100,000 dead – or 8,000?” when all you actually have arguments to support is “100,000 dead – or 
8,000 – or 194,000?”. This is actually quite a common way to mislead with statistics; say in 
paragraph 1 “it could be more, it could be less” and then talk for the rest of the piece as if 
you’ve established “it’s probably less”.
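
To put rough numbers on points 1) and 3), here is a minimal sketch in Python. It uses the figures
quoted in this post (a central estimate of 98,000 and a 95% interval of roughly 8,000 to 194,000)
and assumes, purely for illustration, that the sampling distribution is approximately normal on
this scale; it is my reconstruction of the arithmetic, not the authors’ own calculation.

    # Minimal sketch: the figures are the ones quoted above, the normality
    # assumption is mine.
    from statistics import NormalDist

    central = 98_000
    lower, upper = 8_000, 194_000

    # Back out the implied standard error from the width of the 95% interval.
    se = (upper - lower) / (2 * 1.96)
    dist = NormalDist(mu=central, sigma=se)

    # Values near the central estimate are more likely than values near either
    # end of the interval; the interval is not a dartboard.
    print(dist.pdf(central) > dist.pdf(lower))   # True
    print(dist.pdf(central) > dist.pdf(upper))   # True

    # Probability that the true figure is below zero (i.e. that the war made
    # things better): about 2%, not 50%.
    print(round(dist.cdf(0), 3))                 # roughly 0.02

On these admittedly crude assumptions, the 8,000 end of the range is no more plausible than the
194,000 end, and both are much less plausible than the middle.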



The Kaplan piece was really very bad; as well as the confidence interval fallacy, there are the 
germs of several of the other fallacious arguments discussed below. It really looks to me as if 
Kaplan had decided he didn’t want to believe the Lancet number and so started looking around for 
ways to rubbish it, in the erroneous belief that this would make him look hard-headed and 
scientific and would add credibility to his endorsement of the IBC number. I would hazard a guess 
that anyone looking for more Real Problems For The Left would do well to lift their head up from 
the Bible for a few seconds and ponder what strange misplaced and hypertrophied sense of 
intellectual charity it was that made Kaplan, an antiwar Democrat, decide to engage in hackish 
critiques of a piece of good science that supported his point of view.



The cluster sampling critique



There are shreds of this in the Kaplan article, but it reached its fullest and most widely-cited 
form in a version by Shannon Love on the Chicago Boyz website. The idea here is that the cluster 
sampling methodology used by the Lancet team (for reasons of economy, and of reducing the very 
significant personal risks for the field team) reduces the power of the statistical tests and makes 
the results harder to interpret. It was backed up (wayyyyy down in comments threads) by people who 
had gained access to a textbook on survey design; most good textbooks on the subject do indeed 
suggest that it is not a good idea to use cluster sampling when one is trying to measure rare 
effects (like violent death) in a population which has been exposed to heterogeneous risks of those 
rare events (i.e. some places were bombed a lot, some a little and some not at all).



There are two big problems with the cluster sampling critique, and I think that they are both so 
serious that this argument is now a true litmus test for hacks; anyone repeating it either does not 
understand what they are saying (in which case they shouldn’t be making the critique) or does 
understand cluster sampling and thus knows that the argument is fallacious. The problems are:



1) Although sampling textbooks warn against the cluster methodology in cases like this, they are 
very clear about the fact that the reason why it is risky is that it carries a very significant 
danger of underestimating the rare effects, not overestimating them. This can be seen with a simple 
intuitive illustration; imagine that you have been given the job of checking out a suspected 
minefield by throwing rocks into it.


This is roughly equivalent to cluster sampling a heterogeneous population; the dangerous bits are a 
fairly small proportion of the total field, and they’re clumped together (the mines). Furthermore, 
the stones that you’re throwing (your “clusters”) only sample a small bit of the field at a time. 
The larger each individual stone, the better, obviously, but equally obviously it’s the number of 
stones that you have that is really going to drive the precision of your estimate, not their size. 
So, let’s say that you chuck 33 stones into the field. There are three things that could happen:



a) By bad luck, all of your stones could land in the spaces between mines. This would cause you to 
conclude that the field was safer than it actually was.

b) By good luck, you could get a situation where most of your stones fell in the spaces between 
mines, but some of them hit mines. This would give you an estimate that was about right regarding 
the danger of the field.

c) By extraordinary chance, every single one of your stones (or a large proportion of them) might 
chance to hit mines, causing you to conclude that the field was much more dangerous than it 
actually was.



How likely is the third of these possibilities (analogous to an overestimate of the excess deaths) 
relative to the other two? Not very likely at all. Cluster sampling tends to underestimate rare 
effects, not overestimate them[2].
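
For anyone who prefers simulation to intuition, here is a rough Monte Carlo sketch of the
stone-throwing analogy in Python. The size of the field, the number and size of the mined patches
and the size of each stone are all invented for illustration; the only thing the sketch is meant
to show is the shape of the risk.

    import random

    FIELD = 10_000      # cells in the field
    BLOCKS = 2          # mined patches (the clumps)
    BLOCK_SIZE = 100    # cells per patch, so the true mined fraction is about 2%
    STONES = 33         # clusters thrown
    STONE_SIZE = 30     # cells covered by each stone

    def one_survey(rng):
        # Lay the mines down in a couple of contiguous clumps.
        mined = set()
        while len(mined) < BLOCKS * BLOCK_SIZE:
            start = rng.randrange(FIELD - BLOCK_SIZE)
            mined.update(range(start, start + BLOCK_SIZE))
        true_rate = len(mined) / FIELD
        # Throw the stones at random and count what fraction of sampled cells are mined.
        hits = 0
        for _ in range(STONES):
            start = rng.randrange(FIELD - STONE_SIZE)
            hits += sum(1 for c in range(start, start + STONE_SIZE) if c in mined)
        return hits / (STONES * STONE_SIZE), true_rate

    rng = random.Random(0)
    runs = [one_survey(rng) for _ in range(10_000)]
    under = sum(est < 0.5 * true for est, true in runs) / len(runs)
    over = sum(est > 2.0 * true for est, true in runs) / len(runs)
    blank = sum(est == 0 for est, _ in runs) / len(runs)

    # On these made-up numbers, surveys that badly understate the danger (or miss
    # the mines entirely) come up far more often than surveys that double it.
    print(f"estimate less than half the truth: {under:.0%}")
    print(f"estimate more than double the truth: {over:.0%}")
    print(f"no mines found at all: {blank:.0%}")

The point it makes is the textbook one: the easy mistake for a cluster sample of a rare, clumped
effect is to miss the clumps, not to hit them every time.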



And 2), this problem, and other issues with cluster sampling (basically, it reduces your effective 
sample size to something closer to the number of clusters than the number of individuals sampled) 
are dealt with at length in the sampling literature. Cluster sampling ain’t ideal, but needs must 
and it is frequently used in bog-standard epidemiological surveys outside war zones. The effects of 
clustering on standard results of sampling theory are known, and there are standard pieces of 
software that can be used to adjust (widen) one’s confidence interval to take account of these 
design effects. The Lancet team used one of these procedures, which is why their confidence 
intervals are so wide (although, to repeat, not wide enough to include zero). I have not seen 
anybody making the clustering critique who has any argument at all from theory or data which might
give a reason to believe that the normal procedures are wrong for use in this case. As Richard 
Garfield, one of the authors, said in a press interview, epidemics are often pretty heterogeneously 
distributed too.
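
For the curious, the standard adjustment works roughly as in the sketch below (Kish’s
approximation). The intra-cluster correlation and the cluster size are invented numbers, not the
Lancet team’s, and this back-of-an-envelope version is no substitute for the survey software they
actually used; it is only meant to show how clustering widens a confidence interval.

    from math import sqrt

    households_per_cluster = 30   # illustrative cluster size
    icc = 0.1                     # illustrative intra-cluster correlation

    # Kish's approximation: clustering inflates the sampling variance by
    # roughly 1 + (m - 1) * icc, where m is the cluster size.
    design_effect = 1 + (households_per_cluster - 1) * icc

    # Whatever standard error a simple random sample of the same size would have,
    # the clustered design's is wider by the square root of that factor.
    inflation = sqrt(design_effect)
    print(f"design effect: {design_effect:.1f}")          # 3.9 here
    print(f"interval is about {inflation:.1f}x wider")    # about 2x here

That is all the wide interval is: a deliberate, conservative widening to reflect the design, not a
sign that something has gone wrong.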



There is a variant of this critique which is darkly hinted at by both Kaplan and Love, but neither 
of them appears to have the nerve to say it in so many words[3]. This would be the critique that
there is something much nastier about the sample; that it is not a random sample, but is 
cherry-picked in some way. In order to believe this, if you have read the paper, you have to be 
prepared to accuse the authors of telling a disgusting barefaced lie, and presumably to accept the 
legal consequences of doing so. They picked the clusters by the use of random numbers selected from 
a GPS grid. In the few cases in which this was logistically difficult (read: insanely dangerous), 
they picked locations off a map and walked to the nearest household. There is no realistic way in
which a critique of this sort can get off the ground; in any case, it affected only a small 
minority of clusters.



The argument from the UNICEF infant mortality figures



I think that the source for this is Heiko Gerhauser, in various weblog comments threads, but again 
it can be traced back to a slightly different argument about death rates in the Kaplan piece. The 
idea here is that the Lancet study finds a prewar infant mortality rate of 29 per 1000 live births 
and a postwar infant mortality rate of 54 per 1000 live births. Since the prewar infant mortality 
rate was estimated by UNICEF to be over 100, this (it is argued) suggests that the study is giving 
junk numbers and all of its conclusions should be rejected.



This argument was difficult to track down to its lair, but I think we have managed it. One weakness 
is similar to the point I’ve made above; if you believe that the study has structurally 
underestimated infant mortality, then isn’t it also likely to have underestimated adult mortality? 
The authors discuss a few reasons why the movement in infant mortality might be exaggerated 
(mainly, issues of poor recall by the interview subjects), though, and it is good form to look very 
closely at any anomalies in data.



Which is what Chris Lightfoot did.



Basically, the UNICEF estimate is quoted as a 2002 number, but it is actually based on detailed, 
comprehensive, on-the-ground work carried out between 1995 and 1999 and extrapolated forward. The 
method of extrapolation is not one which would take into account the fact that 1999 was the year in 
which the oil-for-food program began to have significant effects on child malnutrition in Iraq. No 
detailed on-the-ground survey has been carried out since 1999, and there is certainly no systematic 
data-gathering apparatus in Iraq which could give any more solid number. The authors of the study 
believe that the infant mortality rates in neighbouring countries are a better comparator than 
pre-oil for food Iraq, and since one of them is Richard Garfield, who was acknowledged as the 
pre-eminent expert on sanctions-related child deaths in the 1990s, there is no reason to gainsay 
them.



I’d add to Chris’ work a theory of my own, based on the cluster sampling issue discussed above. 
Infant mortality is rare, and it is quite possibly heterogeneously clustered in Iraq (not least
because, post-war, part of the infant mortality was attributed to babies being born at home because
it was too dangerous to go to hospital). So it’s not necessarily the case that one needs a separate
explanation of why infant deaths might have been undersampled in this case. Since this undersampling would
tend to underestimate infant mortality both before and after the war, it wouldn’t necessarily bias 
the estimate of the relative risk ratio and therefore the excess deaths. I’d note that my theory 
and Chris’s aren’t mutually exclusive; I suspect that his is the main explanation.
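
The arithmetic behind the “wouldn’t necessarily bias the risk ratio” point is trivial but worth
seeing. In the toy sketch below the rates and the undercount factor are invented; the only
assumption doing any work is that the survey misses the same fraction of infant deaths before and
after the war.

    true_pre, true_post = 100.0, 120.0   # deaths per 1,000 live births (made up)
    undercount = 0.5                     # the survey picks up only half of them

    observed_pre = true_pre * undercount     # 50 per 1,000
    observed_post = true_post * undercount   # 60 per 1,000

    # Both observed levels are too low, but the ratio between them is untouched.
    print(observed_post / observed_pre)      # 1.2
    print(true_post / true_pre)              # 1.2

If the fraction missed differed before and after the war the ratio would of course move; the point
is only that a uniform undercount of a rare, clustered event does not by itself do the damage the
critics need.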



We now move into the area of what might be called “not intrinsically hack” critiques. These are
issues which one could raise with respect to the study which are not based on either definite or
likely falsehoods and which do not impugn the integrity of the study, but which are not themselves
based on evidence strong enough to make anyone believe that the study’s estimates were wrong unless
they thought so anyway.



There are two of these that I’ve seen around and about.



The first might be called the “Lying Iraqis” theory. This would be the theory that the interview 
subjects systematically lied to the survey team. In fact, the team did attempt to check against 
death certificates in a subsample of the interviews and found that in 81% of cases, subjects could 
produce them. This would lead me to believe that there is no real reason to suppose that the 
subjects were lying. Furthermore, I would suspect that if the Iraqis hate us enough to invent 
deaths of household members to make us look bad in the Lancet, that’s probably a fairly serious 
problem too. However, the possibility of lying subjects can’t be ruled out in any survey, so it 
can’t be ruled out in this one, so this critique is not intrinsically hackish. Any attempt to 
bolster it either with an attack on the integrity of the researchers, or with a suggestion that the 
researchers mainly interviewed “the resistance” (they didn’t), however, is hack city.



The second, which I haven’t really seen anyone adopt yet, although some people looked like they 
might, could be called the “Outlier theory”. This is basically the theory that this survey is one 
gigantic outlier, and that a 2.5% probability event has happened. This would be a fair enough thing 
to believe, as long as one admitted that one was believing in something quite unlikely, and as long 
as it wasn’t combined with an attack on the integrity of the Lancet team.



Finally, we come onto two critiques of the study which I would say are valid. The first is the one 
that I made myself in the original CT post; that the extrapolated number of 98,000 is a poor way to 
summarise the results of the analysis. I think that the simple fact that we can say with 97.5% 
confidence that the war has made things worse rather than better is just as powerful and doesn’t 
commit one to the really quite strong assumptions one would need to make for the extrapolation to 
be valid.


The second one is one that is attributable to the editors of the Lancet rather than the authors of 
the study. The Lancet’s editorial comment on the study contained the phrase “100,000 civilian 
deaths”. The study itself counts excess deaths and does not attempt to classify them as combatants 
or civilians. The Lancet editors should not have done this, and their denial that they did it to 
sensationalise the claim ahead of the US elections is unconvincing. This does not, however, affect 
the science; to claim that it does is the purest imaginable example of argumentum ad hominem.



Finally, beyond the ultra-violet spectrum of critiques are those which I would classify as “beyond 
hackish”. These are things which anyone who gave them a moment’s thought would realise are 
irrelevant to the issue.



In this category, but surprisingly and disappointingly common in online critiques, is the attempt 
to use the IBC numbers as a stick to beat the Lancet study. The two studies are simply not 
comparable. One final time; the Iraq Body Count is a passive reporting system[4], which aims to 
count civilian deaths as a result of violence. Of course it is going to be lower than the Lancet 
number. Let that please be an end of this.



And there are a number of odds and ends around the web of the sort “each death in this study is 
being taken to stand for XXYY deaths and that is ridiculous”. In other words, arguments which, if 
true, would imply that there could be no valid form of epidemiology, econometrics, opinion polling, 
or indeed pulling up a few spuds to see if your allotment has blight. This truly is flypaper for 
innumerates.



I would also include in this category attempts like that of the Obsidian Order weblog to chaw down 
the 98,000 number by making more or less arbitrary assumptions about what proportion of the excess 
deaths one might be able to call “combatants” and thus people who deserved to die. This is exactly 
what people accuse the Lancet of doing; it’s skewing a number by means of your own subjective 
assessment. Not only is there no objective basis for the actual subjective adjustments that people 
make, but the entire distinction between combatants and civilians is one which does not exist in 
nature. As a reason for not caring that 98,000 people might have died, because you think most of 
them were Islamofascists, it just about passes muster. As a criticism of the 98,000 figure, it’s 
wretched.



Finally, there is the strange world of Michael Fumento, a man who is such a grandiose and 
unselfconscious hack that he brings a kind of grandeur to the role. I can no more summarise what a 
class A fool he’s made of himself in these short paragraphs than I could summarise King Lear. Read 
the posts on Tim’s site and marvel. And if your name is Jamie Doward of the Guardian, have a word 
with yourself; not only are you citing blogs rather than reading the paper, you’re treating Flack 
Central Station as a reliable source!



The bottom line is that the Lancet study was a good piece of science, and anyone who says otherwise 
is lying. Its results (and in particular, its central 98,000 estimate) are not the last word on the 
subject, but then nothing is in statistics. There is a very real issue here, and any pro-war person 
who thinks that we went to war to save the Iraqis ought to be thinking very hard about whether we 
made things worse rather than better (see this from Marc Mulholland, and a very honourable mention 
for the Economist). It is notable how very few people who have rubbished the Lancet study have 
shown the slightest interest in getting any more accurate estimates; often you learn a lot about 
people from observing the way that they protect themselves from news they suspect will disconcert 
them.



Footnotes:

[1] This is not the place for a discussion of Bayesian versus frequentist statistics. Stats teachers
will tell you that it is a fallacy and wrong to interpret a confidence interval as meaning that 
“there is a 95% chance that the true value lies in this range”. However, I would say with 95% 
confidence that a randomly selected stats teacher would not be able to give you a single example of 
a case in which someone made a serious practical mistake as a result of this “fallacy”, so I say 
think about it this way.

[2] Pedants would perhaps object that the more common mines are in the field, the less the tendency to
underestimate. Yes, but a) by the time you got to a stage where an overestimate became seriously 
likely, you would be talking not about a minefield, but a storage yard for mines with a few patches 
of grass in it and b) we happen to know that violent death in Iraq is still the exception rather 
than the norm, so this quibble is irrelevant.

[3] And quite rightly so; if said in so many words, this accusation would clearly be defamatory.

[4] That is, they don’t go out looking for deaths like the Lancet did; they wait for someone to report
them. Whatever you think about whether there is saturation media coverage of Iraq (personally, I 
think there is saturation coverage of the green zone of Baghdad and precious little else), this is 
obviously going to be a lower bound rather than a central estimate, and in the absence of any hard 
evidence about casualties there is no reason at all to suppose that we have any basis other than 
convenient subjective air-pulling to adjust the IBC count for how much of an undersample we might 
want to believe they are making.

_______________________________________
Sent via the CASI-analysis mailing list
To unsubscribe, visit http://lists.casi.org.uk/mailman/listinfo/casi-analysis
All postings are archived on CASI's website at http://www.casi.org.uk

