Query: Explain, briefly, with the aid of an example, what is meant by a ‘spurious’ or ‘nonsense’ statistical correlation.
Response: High values of the correlation coefficient can sometimes be found between two variables even though neither has any direct causal influence on the other. This is a particular danger when vast amounts of multivariate data are mined and analysed.
Often neither variable causes the other; instead, each is causally linked to a third variable, and it is this shared dependence that produces the large coefficient.
A simple example can be found in the crime statistics of a local authority. If you run a correlation analysis between (1) the number of police arrests for street drunkenness and (2) the number of places of worship within one mile of the arrest, you'll probably find a significant correlation, but where's the likely causal link?
Each of these variables is likely to be positively correlated with the population of the area in which it is measured. High-population areas are often a focus for drunkenness, since they have a large number of bars and clubs to service the demand, and they also tend to have a higher density of places of worship. This shared dependence on population would probably account for the 'spurious' correlation found.
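To make the mechanism concrete, here is a minimal simulation sketch in Python. All of the figures in it (population range, arrest and worship rates, noise levels) are invented purely for illustration and are not taken from any real crime data. Arrests and places of worship are each generated from population alone, with no link between them, yet their raw correlation comes out high. Once population is controlled for (a partial correlation), the association should all but vanish.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical areas: population is the hidden third variable.
population = [rng.uniform(1_000, 100_000) for _ in range(500)]

# Each variable depends on population plus independent noise (made-up rates).
arrests = [0.002 * p + rng.gauss(0, 20) for p in population]
worship = [0.0005 * p + rng.gauss(0, 5) for p in population]

# Raw correlation between the two variables of interest.
raw_r = pearson(arrests, worship)

# Partial correlation: correlate what remains after removing population's effect.
partial_r = pearson(residuals(arrests, population),
                    residuals(worship, population))

print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

The raw figure is large only because both lists inherit the spread in population; the residuals contain nothing but the independent noise terms, so their correlation should sit close to zero.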
I remember reading a paper by a researcher from Cardiff University in which he analysed a large amount of quantitative data on schools' results and students' personal characteristics, and found a significant statistical correlation between the Mathematics results of individual schools and the altitude (i.e. the number of metres above sea level) of the school buildings.
His tongue-in-cheek conclusion was that, if possible, schools in low altitude areas should adopt the strategy of improving their student scores at Mathematics by ensuring that all lessons took place upstairs!