“There are lies, damned lies and statistics.”
Mathematicians and statisticians know that with a little bit of dishonesty, you can make numbers say almost anything you want.
The whole debate over sea lice and salmon farming has been a numbers issue. Aquaculture opponents say their mathematical modelling studies, which amplify weak correlations from their observational data, are proof enough to claim that sea lice from salmon farms kill wild salmon.
Aquaculture supporters say the numbers do not support that conclusion, and point to their own studies, which tend to be based strictly on observational data rather than mathematical modelling and show no meaningful correlations.
And as we all know, correlation does not equal causation, but a lack of correlation doesn’t mean nothing is going on.
Clear as mud, right?
Perhaps this recent article in Scientific American can help explain. Do we eat more food when we are given bigger plates, or smaller plates? Mathematician John Allen Paulos shows how, with just a slight manipulation of the numbers, you can argue both.
A recent study by researchers at the University of Utah suggested that the amount of food diners in a restaurant consumed was influenced by fork size. I haven’t seen details of the study, but it does remind me that people can draw diametrically opposite conclusions from the same raw data by altering definitions ever so slightly.
If only such contradictory results were contrived and isolated phenomena, but they’re not. When dealing with weakly correlated quantities, we often can come up with spurious trends and associations by artfully defining the size of the categories we use. This has been done recently in studies of violent crime to show that certain categories of crime were changing in the desired direction, and I intend to illustrate the point here with a similar story.
Using the fork study for inspiration only, let’s see how small variations in definitions can make all the difference. Imagine 10 diners at a buffet and consider the possible influence of plate size on how much they consume. Three diners were provided with plates that were deemed small, say, less than 8 inches in diameter, and they consumed 9, 11 and 10 ounces of food, for an average of 10 ounces. Now further assume that four diners were provided with medium-size plates, say, between 8 and 11 inches in diameter, and they consumed 18, 7, 15 and 4 ounces of food, for an average of 11 ounces.
Finally, we’ll assume that the remaining three diners were provided with plates deemed large, say, larger than 11 inches in diameter, and they consumed 13, 11 and 12 ounces, for an average of 12 ounces.
Spot the trend? As the plate sizes increased from small to medium to large, the average amount consumed increased from 10 to 11 to 12 ounces. Aha, a nice result!
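If you want to check the arithmetic yourself, here is a minimal Python sketch that averages each plate category using the ounce figures from the first scenario. The dictionary and the helper name `category_means` are just illustrative; they aren't from the article.

```python
# Ounces eaten by each diner, grouped by the original plate definitions:
# small < 8 in, medium 8-11 in, large > 11 in.
first_grouping = {
    "small": [9, 11, 10],
    "medium": [18, 7, 15, 4],
    "large": [13, 11, 12],
}

def category_means(groups):
    """Return the average ounces eaten for each plate category."""
    return {size: sum(amounts) / len(amounts) for size, amounts in groups.items()}

print(category_means(first_grouping))
# → {'small': 10.0, 'medium': 11.0, 'large': 12.0}
```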
But wait. What if the medium-size plates were very slightly redefined to be between 8.2 and 10.8 inches, and the small and large plates were redefined accordingly? And what if this redefinition resulted in the misclassification of two diners? The diner who ate 18 ounces of food was actually provided with a small plate (say, 8.1 inches in diameter), and the diner who ate only 4 ounces was actually provided with a large plate (say, 10.9 inches in diameter).
Let’s do the numbers once again under this assumption. Four (rather than three) diners were provided with small plates, and they consumed 9, 11, 10 and 18 ounces of food, for an average of 12 ounces. Two (rather than four) diners were provided with medium-size plates, and they consumed 7 and 15 ounces of food, for an average of 11 ounces. Four (rather than three) were provided with large plates, and they consumed 4, 13, 11 and 12 ounces of food, for an average of 10 ounces.
Spot the trend? As the plate sizes increased from small to medium to large, the average amount consumed decreased from 12 to 11 to 10 ounces. Aha, a nice result!
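The reversal comes entirely from where the category boundaries are drawn. The sketch below bins the same ten diners by plate diameter using two adjustable cutoffs. Only the two borderline diameters (8.1 and 10.9 inches) come from the article; the other eight diameters are made up so that each diner lands in the category the story describes, and the helper `average_by_category` is hypothetical.

```python
# (plate diameter in inches, ounces eaten). Only the 8.1 in and 10.9 in
# diameters come from the article; the rest are invented for illustration.
diners = [
    (7.0, 9), (7.5, 11), (6.5, 10),       # clearly small plates
    (8.1, 18),                            # borderline: small or medium?
    (9.0, 7), (10.0, 15),                 # clearly medium plates
    (10.9, 4),                            # borderline: medium or large?
    (11.5, 13), (12.0, 11), (12.5, 12),   # clearly large plates
]

def average_by_category(diners, small_max, large_min):
    """Bin diners by plate diameter and return the mean ounces per bin.

    Plates narrower than small_max are 'small', plates wider than
    large_min are 'large', and everything in between is 'medium'.
    """
    bins = {"small": [], "medium": [], "large": []}
    for diameter, ounces in diners:
        if diameter < small_max:
            bins["small"].append(ounces)
        elif diameter > large_min:
            bins["large"].append(ounces)
        else:
            bins["medium"].append(ounces)
    return {name: sum(v) / len(v) for name, v in bins.items()}

print(average_by_category(diners, 8.0, 11.0))  # original cutoffs: averages rise
# → {'small': 10.0, 'medium': 11.0, 'large': 12.0}
print(average_by_category(diners, 8.2, 10.8))  # shifted cutoffs: averages fall
# → {'small': 12.0, 'medium': 11.0, 'large': 10.0}
```

Same data, same diners; moving each boundary by two tenths of an inch flips the conclusion.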
Moreover, small samples are not the problem here. A large number of data points make this sleight of hand even easier because it provides more opportunity to fiddle with the categories. Anyone for sunspot intensity or Super Bowl outcomes?
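To see why more data doesn’t protect against this, here’s a rough simulation under stated assumptions: 200 made-up diners whose plate diameters and food amounts are drawn independently, so plate size has no effect at all by construction. The hypothetical `monotone_trends` function tries every pair of cutoffs on a 0.1-inch grid and counts how many produce a tidy rising trend and how many a tidy falling one. The exact counts depend on the random seed, but with this many ways to slice the data it is usually easy to find cutoffs that “prove” either story.

```python
import random

def monotone_trends(diners, cutoffs):
    """Count cutoff pairs whose small/medium/large averages rise or fall.

    cutoffs must be sorted ascending; each pair (small_max, large_min)
    with small_max < large_min defines one way to draw the categories.
    """
    rising = falling = 0
    for i, small_max in enumerate(cutoffs):
        for large_min in cutoffs[i + 1:]:
            bins = ([], [], [])  # small, medium, large
            for diameter, ounces in diners:
                if diameter < small_max:
                    bins[0].append(ounces)
                elif diameter > large_min:
                    bins[2].append(ounces)
                else:
                    bins[1].append(ounces)
            if not all(bins):
                continue  # skip splits that leave a category empty
            small_avg, medium_avg, large_avg = (sum(b) / len(b) for b in bins)
            if small_avg < medium_avg < large_avg:
                rising += 1
            elif small_avg > medium_avg > large_avg:
                falling += 1
    return rising, falling

random.seed(1)
# Hypothetical data: plate diameter and ounces eaten are independent.
diners = [(random.uniform(6, 13), random.gauss(11, 4)) for _ in range(200)]
cutoffs = [6.5 + 0.1 * k for k in range(60)]  # candidate boundaries, 6.5 to 12.4 in
print(monotone_trends(diners, cutoffs))  # (rising splits, falling splits)
```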
The take-home point here is that numbers can be manipulated. Be careful.
When making decisions about things like what size plate to take at the buffet or whether or not salmon farms are the root of all evil, a level head, good observational data and consistent, transparent mathematics are best.
And above all, be skeptical. Look for good science, and use your common sense.