I was asked last week by Sense about Science to comment on a paper that was recently published in something called Jacobs Journal of Epidemiology and Preventive Medicine. Never heard of that journal? Indeed, neither had I. It turns out that this is one of the open access journals that features prominently on Beall’s list of predatory, or in other words if-you-pay-we-will-publish-anything, journals (link). That’s not a great start for any scientific paper, and usually implies it was rejected by any number of better, peer-reviewed scientific journals. But let’s not be too hasty and judge the paper before reading it; after all, through some miraculous mechanism of science dissemination this paper got picked up by the Daily Mail (here) and was thus read by quite a lot of people. It’s a paper by Busby and de Messieres entitled ‘Cancer near Trawsfynydd Nuclear Power Station in Wales, UK: A Cross-Sectional Cohort Study’ and, according to the Daily Mail, it proves that living close to a nuclear power station, or more specifically downwind from one, is associated with massively increased cancer risks. The strongest association was found for female breast cancer, with a five times higher risk than expected. So that’s quite serious…if it is true, of course. Go and have a look; it’s open access (here). My initial and short response can be found on the Sense about Science website and essentially covers two issues: problems with the study design and the very small number of cases (see the comment here). Now that I have a bit more time for writing (the new Fun Police website you are now looking at took up most of my time last weekend instead), let’s have a closer look at the paper together and see what we think… …so what do we think? …What the hell is a cross-sectional cohort study? Let’s start there, because it is in the title.
Let me summarize two different epidemiological study designs for you:

- A cross-sectional study is, as the name implies, a study done at one point in time to get information about the population of interest by, figuratively (obviously…literally would be unethical), cutting through the population and seeing how many cancers you find. It is, therefore, a ‘snapshot’.
- A cohort study, on the other hand, is a study where you take a clearly defined group of people and then follow them over time to see who develops the disease of interest. It’s called a cohort study because, again figuratively, it resembles a Roman army cohort: a clearly defined group of soldiers which is then followed through until the end of the battle. What I described here is a prospective cohort study, but you can also conduct a retrospective cohort study in which you, for example, get information about everyone who was living in a certain area at a specific point in time (say 1970) and see what happened to them up until today.

So, in other words, a cross-sectional study looks at one point in time and a cohort study follows a set of people over time. A cross-sectional cohort study, as it turns out, is a design that was proposed previously by Hudson et al. (link), but it never gained much traction. The reason is that, basically, it is not very good: it is an amalgamation of a cross-sectional study and a badly conducted cohort study. Looking at the paper, we can conclude that it is in fact a cross-sectional study. There is nothing wrong with that approach as such, but in terms of determining a causal relation between living downwind from a nuclear power station and cancer risk, it is a pretty weak epidemiological design.
The people that currently live in this area could have lived there their whole lives, or could have just moved into the area, or they could have smoked for twenty years but stopped five years ago (i.e. now being non-smokers), or they might simply not answer the door, just to name a few problems. Indeed, a retrospective cohort study would have been a much better idea. Anyway, the investigators went to the houses of everyone living in the study area, i.e. downwind from the nuclear power station, and asked whether they had cancer (plus presumably other questions, but that is not very well described). That is fine, and you then get an estimate for the current prevalence of cancers in that area. The word prevalence here is crucial, because the investigators then compared this to the expected incidence rate of cancer (based on the population age distribution). Notice how that is a different word? With respect to this particular study, it’s quite obvious where the trouble lies. A cross-sectional study is used to determine the prevalence (i.e. all current cases) of a certain disease in a population. It cannot be used to determine the incidence, since that is the number of new cases in a certain amount of time, and no time elapses when you take a snapshot to collect the data. What you can do, and what was done in the study by Busby, is to include only the new cases that have emerged in the last three years. That’s fine; you then just have the prevalence of the new cases. To get an incidence rate, however, you would need to divide by the correct population denominator: the person-time everyone spent at risk.
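To make the prevalence/incidence distinction concrete, here is a minimal sketch with entirely made-up numbers (none of these figures come from the paper): a snapshot gives you a proportion, while a rate needs person-time at risk in the denominator.

```python
# Hypothetical illustration only -- these counts are invented, not the study's data.

current_cases = 22        # cases found alive in the area in the snapshot
population_now = 1500     # people interviewed in the snapshot

# Prevalence: a proportion at one point in time. A snapshot CAN give you this.
prevalence = current_cases / population_now

# Incidence rate: new cases divided by person-time at risk. A snapshot CANNOT
# give you the denominator, because people moved in, moved out, or died
# during the window -- the line below is only valid if none of that happened.
new_cases_last_3y = 10
person_years = population_now * 3          # naive: assumes a closed, stable population
naive_rate = new_cases_last_3y / person_years

print(f"prevalence: {prevalence:.4f}")
print(f"naive incidence rate: {naive_rate:.5f} per person-year")
```

The point of the sketch is the comment on `person_years`: the multiplication only holds under exactly the assumptions the post discusses next.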
It gets a bit technical here, but essentially, because you have not accounted for all the people who lived in that area over the past three years (some will have moved away, some will have moved in, some may have died from other diseases, et cetera), you cannot determine the denominator. So you don’t know what number to divide your count of new cases by to get the correct incidence rates. If you had done a proper retrospective cohort study, you could have obtained the rate, because you would have clearly defined the population for the denominator. I hope all this makes sense; the basic idea is that because the cancer incidence rates in the paper are incorrect, the comparison with the expected population incidence rates is, by definition, pointless. However, for the sake of argument, let’s assume not a single person moved out of the area and not a single person moved in. Also, nobody died in the years prior to the cross-sectional snapshot, and let’s assume the interviewers interviewed everyone (do you see how unlikely that is…). This would mean that the calculated incidence rates are correct and we can compare them with the expected rates. In this study, these expected rates were obtained for England and Wales in 5-year age groups, to account for the increased cancer risk with age (in fact, age is the biggest risk factor we know of). Would this give us the correct expected rates? Unfortunately not… It is very unlikely that this population, living near a nuclear power station in a remote, non-urban area in Wales, is representative of the general population. Cancer rates differ between socio-economic classes, depend on the prevalence of smoking in the population, and there are various other factors that imply the local rates are likely different.
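The expected-cases comparison the paper attempts is essentially indirect standardisation: multiply reference rates by the local age structure and sum. A small sketch, again with invented rates and counts (not the England and Wales figures), shows the mechanics, and also why a wrong denominator or an unrepresentative reference population poisons the result:

```python
# Indirect standardisation sketch. All numbers below are hypothetical.

# Reference incidence rates per person-year, by 5-year age band
reference_rates = {"50-54": 0.0020, "55-59": 0.0030, "60-64": 0.0045}

# Study-area population in the same age bands (the shaky denominator!)
study_population = {"50-54": 120, "55-59": 100, "60-64": 80}

years = 3  # observation window

# Expected cases = sum over age bands of (rate * person-years in that band)
expected = sum(reference_rates[band] * study_population[band] * years
               for band in study_population)

observed = 6                # observed cases in the study area
sir = observed / expected   # standardised incidence ratio, O/E

print(f"expected cases: {expected:.2f}, SIR: {sir:.2f}")
```

Note that both inputs are suspect in this study: `study_population` person-years cannot be recovered from a snapshot, and `reference_rates` from national data need not match a remote, non-urban Welsh population.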
So that’s not great… It would have been quite easy to check this, though: you could have applied the exact same methodology to a comparable control population (for example, people in the vicinity of, but not downwind from, the nuclear power station would have been a good choice) and seen whether you observe any increased risks there. Additionally, you could have compared the two populations directly and avoided the somewhat dodgy comparison to expected national rates (note that this still would not have solved the problem with calculating incidence rates from cross-sectional data!). It baffles me that the researchers did not do this, to be honest… So, in summary, the researchers calculated the wrong incidence rates, compared these to the wrong expected rates, and did not double-check any of this against a control population. That is pretty sloppy. But since all this is pretty technical, would it have mattered? Actually, not at all! We need not have bothered beyond reading the abstract… The study is based on only 22 cancer cases, with the main finding for breast cancer based on six female breast cancers. Five of these are below sixty years of age (this is important since, you may remember, age is by far the biggest risk factor for the development of cancer), and of these only one was a non-smoker (three were smokers and one is unknown because she had died); smoking is a known and important risk factor for breast cancer as well. In other words, any other analysis would probably have found smoking to increase cancer risk…which is not something we didn’t already know. Since epidemiology relies on statistical methods that need large numbers, any population study based on so few cases is considered very weak. Statistically significant findings will occur, but just one or two additional or fewer cases will completely change the result. Would you trust a finding for a whole population based on just one or two people?
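The small-numbers problem is easy to demonstrate with a couple of lines. Using a hypothetical expected count (the paper’s actual expected values are not reproduced here), one case more or less swings the observed/expected ratio dramatically:

```python
import math

# Hypothetical expected number of breast cancer cases, for illustration only
expected = 1.2

# One case up or down shifts the observed/expected ratio by 1/expected ~ 0.8
ratios = {observed: observed / expected for observed in (5, 6, 7)}

# Poisson tail: probability of seeing >= 6 cases if the true mean were 1.2
p_at_least_6 = 1 - sum(math.exp(-expected) * expected**k / math.factorial(k)
                       for k in range(6))

for observed, ratio in ratios.items():
    print(f"{observed} cases -> ratio {ratio:.2f}")
print(f"P(>= 6 cases | mean {expected}) = {p_at_least_6:.4f}")
```

Such a tail probability can look impressively small, which is exactly why a "statistically significant" result from six cases is not the same as a trustworthy one: the estimate itself is hostage to one or two individuals.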
So does this mean that living downwind of that nuclear power station is not associated with increased cancer risk? Not at all… Just as there is a lack of proof of increased risk, there is also a lack of proof of the opposite. In fact, this study does not tell us anything. Indeed, we need not have bothered reading the whole paper and could have just stuck with the abstract. I am, therefore, profoundly sorry for wasting your time… …although you have now learned not to trust articles in the Daily Mail… at least not blindly.