In the original business of statistics – the collection & presentation of state figures to describe the wealth of the nation or the condition of the people – the emphasis was, & to a large extent remains, on providing answers to the question of ‘How many?’, be that people, £ or things, in the various categories of interest or concern.
Puzzles such as that described by Simpson arise when there is an imbalance in the way the members of two subgroups of the population are distributed across other, different subgroups or classifications. As a purely hypothetical example, we might find that the female unemployment rate is lower than the male rate in both manual & non-manual jobs, but that when the whole workforce is considered the unemployment rate is lower for men. This would happen if men were much more likely to work in the sector which has lower unemployment.
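To make the arithmetic concrete, here is a minimal sketch of that hypothetical case. Every number in it is invented purely for illustration, as are the names `counts`, `sector_rates` and `overall_rate`; none of it comes from real labour-market figures.

```python
# Hypothetical counts, invented for illustration only.
# Each entry is (unemployed, workforce) for one sex in one sector.
counts = {
    "men":   {"manual": (40, 200),  "non-manual": (40, 800)},
    "women": {"manual": (144, 800), "non-manual": (8, 200)},
}


def sector_rates(group):
    """Unemployment rate within each sector for one group."""
    return {
        sector: unemployed / workforce
        for sector, (unemployed, workforce) in counts[group].items()
    }


def overall_rate(group):
    """Unemployment rate for one group across the whole workforce."""
    unemployed = sum(u for u, _ in counts[group].values())
    workforce = sum(w for _, w in counts[group].values())
    return unemployed / workforce


for group in counts:
    by_sector = ", ".join(
        f"{sector} {rate:.1%}" for sector, rate in sector_rates(group).items()
    )
    print(f"{group}: {by_sector}, overall {overall_rate(group):.1%}")
```

With these made-up counts the women's rate is lower in each sector (18% against 20% in manual jobs, 4% against 5% in non-manual jobs), yet the overall rate is 8% for men & 15.2% for women, because four-fifths of the men are in the low-unemployment sector while four-fifths of the women are in the high-unemployment one.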
If one were concerned, as a matter of policy, to rectify these imbalances one would have to consider the process(es) which brought them about. These in turn might not be obvious, but could be related to cultural attitudes to working mothers combined with different class & employment structures across different areas of the country.
It would be rare for those carrying out this kind of analysis to feel that the answers represented some underlying & universal law which was necessarily applicable in all other states. And, because the numbers are very large, the question of statistical significance as measured by P values is also unlikely to arise.
But what if the very process by which the data are collected is itself a source of imbalance – for example, if they come from a non-probability sample of an ill-defined population?