Thursday, April 17, 2008

Missingness

Another lovely new word. As soon as I saw it I wondered how statisticians had ever managed without it

New to me, anyway. I got it from the otherwise mundanely titled A flexible marginal modelling strategy for non-monotone missing data by Ivy Jansen & Geert Molenberghs from Hasselt University [RSS(A) 171 pt2 2008]

Missing data provide a fascinating & frustrating challenge, of which most outside the world of statistics are unaware. That includes the designers of many computer applications such as databases & spreadsheets, which do not allow you to code missing values, let alone deal with them

An oversimplified version of the problem: we have the ages of 3 children - 3, 4, 5 - so the average is 4. But suppose we had only ?, 4, 5. The best we could do is to use the 4 & the 5 & calculate the average as 4½. Beware computer packages which would just assume ?=0 & give you the answer 3
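For anyone who wants to try this at home, here is a minimal Python sketch of the same sum. Using None to stand for the missing age is my own choice for illustration, not how any particular package codes it

```python
from statistics import mean

# Ages of the 3 children; None stands in for the one we do not know
ages = [None, 4, 5]

# Average over the values we actually have
known = [a for a in ages if a is not None]
available_average = mean(known)

# What a careless package does: quietly treat the missing age as 0
zero_filled_average = mean(0 if a is None else a for a in ages)

print(available_average)    # 4.5
print(zero_filled_average)  # 3
```

Both answers come out of the same three boxes on the form. Only the first one admits that a box was empty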

At least we can see what is going on here. But what if we are dealing with a survey of thousands of people or a census of 60 million?

Data can be missing for a whole variety of reasons, just as they can with any kind of form-filling

Some people just refuse, or fail to return the form. Things go missing in the post

Some individual items of information might be missing, because of Don’t knows, sensitivity to the particular question or oversight

Some answers may be illegible

Or impossible - such as a height of 6.5 metres for a human being

When we want to analyse, say, health status by age, we have to decide what to do about those cases where we know only one or the other, or neither, because of missing values. If you then add income & education to the analysis the complexity multiplies alarmingly
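A rough sketch of what that bookkeeping looks like, in Python. The five records are invented purely for illustration, & None again stands for a missing value

```python
from collections import Counter

# Invented survey records: (age, health status); None marks a missing value
records = [
    (34, "good"),
    (None, "poor"),
    (61, None),
    (None, None),
    (47, "fair"),
]

def pattern(record):
    """Label a record by which of its two items were actually observed."""
    age, health = record
    if age is not None and health is not None:
        return "both"
    if age is not None:
        return "age only"
    if health is not None:
        return "health only"
    return "neither"

counts = Counter(pattern(r) for r in records)
print(counts)

# Each variable is either present or missing, so k variables
# give 2**k possible patterns of missingness
for k in (2, 3, 4):
    print(k, "variables:", 2 ** k, "patterns")
```

With just age & health there are 4 patterns to keep track of. Add income & education & there are 16, before you have even started on the analysis proper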


Counting is no longer a simple process, which any 5-year-old can do. Serious mathematics is required

Then suppose we are doing a so-called longitudinal study – going back to the same people to see how their circumstances have changed. We get a whole new set of possibilities for missingness

Think of that TV series Seven Up which has been following a group of English children every 7 years since they were 7, & the nightmare of trying to put together a programme of manageable length when you go back to them at age 42. One which will remind the viewers of the back stories & bring them up to date with those who are still willing to co-operate

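In the jargon, drop-outs are said to produce monotone non-response (the number missing can only go up, because once someone has gone they stay gone). Non-monotone missing data means all the other gaps: someone who misses one round but turns up again at the next, or any of the bits & pieces which should have been collected over time but of which we have no useable record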
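The distinction is easy enough to check mechanically. A minimal Python sketch, assuming each person's answers are held in order of time with None wherever there is no record. The function name & the three example people are my own inventions

```python
def is_monotone(responses):
    """True if, once a value is missing, every later value is missing too."""
    seen_missing = False
    for r in responses:
        if r is None:
            seen_missing = True
        elif seen_missing:
            return False  # an answer after a gap: the person came back
    return True

# Invented answers from three people over four rounds; None = no record
stayed = [7, 14, 21, 28]
dropped_out = [7, 14, None, None]
intermittent = [7, None, 21, 28]

for name, person in [("stayed", stayed),
                     ("dropped out", dropped_out),
                     ("intermittent", intermittent)]:
    print(name, "monotone" if is_monotone(person) else "non-monotone")
```

Someone with a complete record counts as monotone here, trivially, since nothing ever goes missing. It is the third person, the one who comes & goes, who causes the real trouble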

Postscript: A Google search produces 30k results for missingness so the word has been in use for a while



Related post: I love Radio Vlaanderen