Tuesday, April 14, 2015

Statistically, it's mayhem*

Below is a letter to the editor I recently wrote about potential flaws in the analysis that forms the basis of the Waterloo Region Record's recent article "Police call records reveal region's trouble hot spots", which can be read here: http://goo.gl/DDQEg0

One of the first things that I emphasize to the students in my biostatistics class at Wilfrid Laurier University is that statistics are a powerful tool. Used carefully and properly, statistics can provide valuable insight into the factors that shape the world around us - but used or interpreted incorrectly, statistics can lead to conclusions that are unjustified or altogether incorrect. Your recent "analysis" of police call data seems to fall into the latter category because of problems both with your data set and with the conclusions drawn from it.

First, let's consider your data set. Of the ~903,000 calls in your initial data set, almost half were excluded from the analysis for a variety of reasons. Whenever data is dropped, there is a strong possibility that what remains is a non-random (and thus biased) set of data. Furthermore, the remaining data points "do not measure crime" (as belatedly stated in the 30th inch of the story), but instead capture a wide variety of incidents (including "enforcement of traffic laws" and "attend at collisions") that are not necessarily linked to the residents of that region. It should go without saying that if your data does not contain variables that are relevant to the question, then the conclusions drawn from it will be suspect.
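To make the concern about dropped records concrete, here is a minimal sketch in Python. Every number in it is made up for illustration: the zone names, the call counts, and the retention rates are hypothetical and are not taken from the article or from the police data. It only shows that when the chance of a record being excluded differs between zones, the surviving data can misstate each zone's share of calls.

```python
# Hypothetical toy numbers; nothing here comes from the article or the police data.
true_calls = {"zone_A": 1000, "zone_B": 1000}  # both zones truly generate equal calls

# Assumed fraction of records that survive the exclusions, per zone.
kept_random = {"zone_A": 0.5, "zone_B": 0.5}  # exclusions unrelated to zone
kept_biased = {"zone_A": 0.7, "zone_B": 0.3}  # exclusions depend on zone


def share_of_calls(calls, kept):
    """Return each zone's share of the calls that remain after exclusions."""
    remaining = {zone: calls[zone] * kept[zone] for zone in calls}
    total = sum(remaining.values())
    return {zone: remaining[zone] / total for zone in remaining}


random_shares = share_of_calls(true_calls, kept_random)
biased_shares = share_of_calls(true_calls, kept_biased)

print("random exclusions:", random_shares)  # both zones stay at about one half
print("biased exclusions:", biased_shares)  # zone_B drops to roughly 0.3
```

In both scenarios half of the records are discarded overall, yet only the non-random exclusions distort the comparison between zones.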

Using this questionable data set, the conclusion "the poorer the zone, the more often people call police and the more time police spend there, responding to distress" is drawn without any consideration of potential confounding effects. There are potentially dozens of other factors besides average household income that differ between the patrol zones and that may ultimately be responsible for the observed patterns. For instance, a cursory search on Google Maps seems to indicate that the regions with the highest frequencies of calls to the police also have a greater density of Tim Hortons locations - but you would not (hopefully) conclude that their presence is responsible for "where trouble lives".
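The coffee-shop example can be sketched as a small simulation. This is an illustration under stated assumptions, not an analysis of the real data: the simulated zones, the variable names, and the choice of population density as the hidden common cause are all hypothetical. In the simulation, density drives both the number of coffee shops and the number of police calls, and the two have no direct effect on each other.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated (hypothetical) zones: density is the assumed hidden common cause.
n_zones = 2000
density = [random.gauss(0, 1) for _ in range(n_zones)]
shops = [d + random.gauss(0, 0.5) for d in density]  # depends only on density
calls = [d + random.gauss(0, 0.5) for d in density]  # depends only on density

# Raw correlation between coffee shops and police calls.
raw_r = pearson(shops, calls)

# Partial correlation of shops and calls, controlling for density.
r_sd = pearson(shops, density)
r_cd = pearson(calls, density)
partial_r = (raw_r - r_sd * r_cd) / math.sqrt((1 - r_sd**2) * (1 - r_cd**2))

print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

The raw correlation comes out strong even though neither variable influences the other, while the partial correlation that accounts for density sits close to zero. The same caution applies to household income: a strong raw association with call frequency does not, on its own, rule out a third factor behind both.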

Generations of statisticians have warned that "correlation does not imply causation", but that message seems to have been ignored in the construction of this article, to the detriment of your readership. 


Tristan A.F. Long

*The title for this post is taken from one of the hyperbolic statements made in the article. I think that, ironically, this statement is an apt description of the statistics used in the analysis.