Wednesday, April 17, 2013

Big Victorian data

Some academic statisticians are feeling a bit worried about the future of their profession in an age of Big Data, Data Science, Data Analytics & Analysts.

Big Data can mean working with billions of pieces of data - the initial challenge involves computer science rather than statistical formulae.

But is the challenge any more formidable than that which faced our statistical forebears?

In 1841 the Registrars General organised the first modern census of the population of the United Kingdom. Forms were delivered to every household to record personal details of over 25 million people. An army of local enumerators delivered & collected the forms, helped those who could not read or write to fill them in, & transcribed the results into notebooks.

I do not know whether any kind of calculating machine was available to help produce the rich variety of tables & analyses which were then typeset, printed & reported to Parliament. I suspect that most of the labour involved good old mental arithmetic undertaken by rooms full of Bob Cratchits. Either way, the logistical & organisational challenge must have been formidable.

A census – a complete enumeration of the population of interest – is not a sample survey, whose results call for tests of significance & raise questions about how to make inferences about the population from which the sample was drawn. The challenge, before the vast modern increase in computer power & storage, was to specify which cross tabulations would be required out of the enormous number which could be made.
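To get a feel for how quickly the choice of cross tabulations grows, here is a rough Python sketch. The field names are my own illustrative assumptions, not the actual 1841 schedule, & I am counting only which sets of two or more fields could be tabulated against each other, ignoring how each field might be grouped into categories. For a fixed set of fields the count is finite, but it roughly doubles with every field added.

```python
from itertools import combinations
from math import comb

# Hypothetical census fields, chosen for illustration only.
FIELDS = ["age", "sex", "occupation", "birthplace", "parish"]


def cross_tabulations(fields, min_vars=2):
    """List every subset of `fields` with at least `min_vars` members."""
    tables = []
    for size in range(min_vars, len(fields) + 1):
        tables.extend(combinations(fields, size))
    return tables


def count_cross_tabulations(n_fields, min_vars=2):
    """Count the same subsets directly: all 2**n subsets minus the too-small ones."""
    too_small = sum(comb(n_fields, k) for k in range(min_vars))
    return 2 ** n_fields - too_small


if __name__ == "__main__":
    print(len(cross_tabulations(FIELDS)))   # tables available from 5 fields
    print(count_cross_tabulations(20))      # tables available from 20 fields
```

Five fields already allow 26 different tables, & twenty fields allow over a million, so somebody had to decide in advance which handful were worth the clerks' time.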

In today’s world even a billion data points may be only a sample of the population of interest, which poses a new & different set of challenges.

Links
Simply Statistics: Data science only poses a threat if we don’t adapt
Significance Special Issue: Big Data
The UK 1841 Census
Related post
The first computer error