Big Data, and The Laws of Statistical Analyses

The work I described in yesterday’s post has got me thinking a lot about statistics. I used to hate statistics in school and at university. During his talk at NAM, Mark Thompson, the astronomer at the University of Hertfordshire whose recent work was the basis for mine, proclaimed: “Hey! I discovered I like statistics”. I laughed when he said that, because this recent paper had exactly the same effect on me. I wear my Histogram Girl badge with much pride!

(I should have mentioned, by the way, that you really shouldn’t go looking for pretty pictures of bubbles in the paper. There’s only one.)

One man who says sensible things about statistics is neuroscientist Bradley Voytek. I really enjoyed the post he wrote today on O’Reilly Radar entitled “Automated science, deep data and the paradox of information”, on the potential of Big Data and its pitfalls. He states the following three laws of statistical analysis, based on Arthur C. Clarke’s well-known “Three Laws”:

  1. The more advanced the statistical methods used, the fewer critics are available to be properly skeptical.
  2. The more advanced the statistical methods used, the more likely the data analyst will be to use math as a shield.
  3. Any sufficiently advanced statistics can trick people into believing the results reflect truth.

After spending the last few months knee-deep in histograms, correlation functions and statistical tests, I find these three points very relevant. Statistics, together with its associated mathematical methods, is an immensely powerful tool for turning data into information. Indeed, when datasets become so large or multi-dimensional that one mind can’t gain an overview of them (and the Milky Way Project bubbles are perhaps just at that limit), it’s the only tool.

But it’s incredibly easy to “over-statisticise”, to venture so far away from the data that you lose sight of what is being measured. You can still produce clever-looking plots and numbers that would convince all but the most pedantic of readers (which may or may not include your peer reviewer). It’s important to stay as close to the data as possible and find the right method to answer the question at hand, and that is the difficult bit of a statistical analysis.

 

Review: The Illustrated Guide to Astronomical Wonders

Once upon a time in a country far away I was a young girl who loved looking at the stars. I didn’t know any other keen stargazers and the internet was still in its infancy, so I relied on books to help me work out what I was looking at.

These days, sadly, stargazing doesn’t feature very heavily in my life anymore, given my light-polluted dwellings, but my earliest experiences of looking through telescopes did inspire me to help design them as a profession. So I was keen to read one of O’Reilly’s publications, The Illustrated Guide to Astronomical Wonders by Robert and Barbara Thompson, a copy of which found its way to my desk (h/t to Alasdair).