Hacker News
The Ecological Fallacy (statwing.com)
60 points by glaugh on Dec 20, 2012 | 20 comments


The notion that rich people voted for Romney while rich states voted for Obama may also be misleading. While you can attempt to prove this by comparing the incomes of voters at the 100k+ breakpoint, I'm fairly certain the correlation disappears if you set a breakpoint at 250k+. You can see this trend illustrated in the 2008 election exit polls[1]. Obama won the lower income ranges and McCain won the 100k - 200k ranges. However, Obama also won the 200k+ income level. In other words, if you look at those with incomes greater than 100k, it appears that McCain won the rich; however, if you look only at the top bracket (200k+ in that poll), the rich seem to favor Obama.

While I suspect something similar holds true for the 2012 election, such granular breakdowns weren't reported in this cycle's exit poll summaries. The answer is probably hidden away in Edison Research's database, but the raw data hasn't been released yet. For now, you can get a good feel for the income breakdown by looking at Reuters polls[2] and doing the cross-tabs yourself, but there are quite a few undecideds and the sample size is small.

And I agree that the 250k+ breakpoint is also arbitrary, though slightly less so, since it's the lower bound for what many politicians define as "rich". But who knows who those with incomes of 1 million+ voted for? I suspect it was probably Romney. But how about billionaires? The point is that setting such broad and arbitrary breakpoints can be misleading. I'm sure this fallacy has a special name; I just don't know what it is.

[1] http://www.cnn.com/ELECTION/2008/results/polls/#USP00p1

[2] http://elections.reuters.com/#poll


It's important when discussing this fallacy to be precise in your terminology, because phrases like "more likely" have different meanings depending on context.

>U.S. states with proportionally more immigrants have proportionally more households with income above $100k.[1] Ergo, immigrants are more likely than non-immigrants to have household incomes above $100k.

Whether or not that's a fallacy really depends on how you interpret that statement. If my only information about the world is what is stated above, then finding out that someone is an immigrant should increase my estimation of the likelihood that their household income is above $100k. Being an immigrant is evidence of living in a state where it's more common to have a household income over $100k. Living in a state where it's more common to have a household income over $100k is evidence of having a household income over $100k. When I learn more about the world, my model will change, and I'll stop being wrong about this particular thing.

The fallacy comes when you say that the group correlation implies a correlation at the individual level.


As my first introduction to the ecological fallacy, I thought it did a good job concisely stating the fallacy, with good examples to illustrate it (both intuitive and non-intuitive).

The next question that would inevitably come up is: how do you know? I'm guessing there isn't a way short of looking at the data for individuals. It would probably be safe to always assume group data does not imply individual data.

And, of course, this is another way that people can use statistics to lie to you. I would not be surprised at all to find people intentionally using this fallacy to their benefit.


The trick is to always keep in mind what the data explicitly says. There is a correlation between states with low average income and high percentage votes for Romney. The unit of observation in this data set is the state, so the correlation says nothing about individual people. You cannot draw any conclusions whatsoever about how individual income relates to voting based on that correlation. It is a fallacy to even attempt to do so without more data.
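To make the point above concrete, here is a minimal sketch in Python. The numbers are entirely made up (three hypothetical states with four voters each, not real election data), and `pearson` is a small helper written for this example. It shows that a state-level correlation can be strongly negative while the individual-level correlation is positive.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical voters: (income in $k, 1 if voted Romney else 0). Not real data.
states = {
    "A": [(20, 0), (30, 1), (40, 1), (50, 1)],  # poorer state
    "B": [(40, 0), (50, 0), (60, 1), (70, 1)],
    "C": [(60, 0), (70, 0), (80, 0), (90, 1)],  # richer state
}

# Group-level view: one data point per state.
state_income = [sum(inc for inc, _ in v) / len(v) for v in states.values()]
state_share = [sum(vote for _, vote in v) / len(v) for v in states.values()]
state_r = pearson(state_income, state_share)

# Individual-level view: one data point per voter.
voters = [pair for v in states.values() for pair in v]
pooled_r = pearson([inc for inc, _ in voters], [vote for _, vote in voters])
within_r = {
    name: pearson([inc for inc, _ in v], [vote for _, vote in v])
    for name, v in states.items()
}

print("state-level r:", round(state_r, 3))        # negative
print("pooled individual r:", round(pooled_r, 3))  # positive
print("within-state r:", {k: round(r, 3) for k, r in within_r.items()})
```

In this toy data set, richer states lean less toward Romney, yet within every state the richer voters are the ones voting for him. Only the individual rows reveal that; the three state averages alone cannot.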


Here's another image for some intuition about what is going on: http://blog.statwing.com/wp-content/uploads/2012/12/ecologic...

You wouldn't be able to say anything with confidence about individuals without the individual data but this image helps me think about what individual data would be needed to back up or refute an inference based on group data.

If you can find data within some groups and it shows little to no relationship between X and Y, then it's a little more likely that you can use the group inference. But in any case, watch out!


Unfortunately, I'm not aware of any good heuristics for identifying when the ecological fallacy is or is not an issue. As you suggested, that implies that one should always consider it to be an issue if the individual level data isn't available.

Does anyone know of anything I'm missing here?


Recommended reading: "Proofiness: The Dark Arts of Mathematical Deception" by Charles Seife. It talks about this fallacy and others in depth.

The best approach is multi-variate analysis. That is, for each correlation you find, identify another correlation that would also hold (and is measurable) if the cause were what you hypothesize it to be. It's a great way to write a paper too.

You start with "look at this correlation", then: we hypothesize that the cause is "Q", and now we go look at the following correlations to prove or disprove our hypothesis, ... analysis and graphs ..., and as you can see our hypothesis is thoroughly {proven | disproved} by these correlations.

If the data isn't available to do additional analysis then you are stuck trying to collect that data somehow. You can end up with inconclusive results in that case.


One example of this that I recently read about was the measurement of development in India, which for a long time happened at the family level and so missed a lot of inequality and the lack of basic capabilities and freedoms for women. Not only did this mean that the information was wrong, it also led to it taking longer to acknowledge just how important the empowerment of women is in fighting poverty.


For the chart showing "% Vote for Romney vs. Median Household Income (2010 data)", isn't that actually a positive correlation? It looks like the income is moving in lock step with the vote for Romney percentages.


Thanks for the question.

The fact that income moves very tightly with vote for Romney means it's a strong correlation. Since one variable goes up while the other goes down, it's a negative correlation.

You can have any combination of strong/weak and negative/positive correlation.

The image on Wikipedia is good for wrapping your head around the distinction between the strength and the direction of the correlation (and the distinction between strength and the actual slope of the line of best fit). http://en.wikipedia.org/wiki/Correlation_and_dependence
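If it helps, here is a small Python sketch of the same distinction. The data points are invented purely for illustration, and `pearson` and `slope` are helper functions written for this example. The sign of r gives the direction, the magnitude of r gives the strength, and the slope of the best-fit line is a separate quantity.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient: sign = direction, magnitude = strength."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def slope(xs, ys):
    """Slope of the ordinary least-squares line of best fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov / var_x

# Invented data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
strong_negative = [12.1, 9.9, 8.2, 5.8, 4.1, 1.9]  # tight, downhill
weak_positive = [3, 1, 4, 1, 5, 3]                 # noisy, slightly uphill
shallow_strong = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]    # perfectly tight, nearly flat

for name, y in [
    ("strong negative", strong_negative),
    ("weak positive", weak_positive),
    ("strong but shallow", shallow_strong),
]:
    print(f"{name}: r = {pearson(x, y):.3f}, slope = {slope(x, y):.3f}")
```

The first series moves tightly with x but in the opposite direction, so r is close to -1. The last one has a tiny slope but r is essentially 1, because the points sit on a straight line.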

Apologies if I misunderstood your question.

Cheers!


OP here. Thoughts/questions/comments?


I would recommend toning down the advertising a bit. Leave it in the beginning, or at the end, but not both.


Duly noted. Thanks for the feedback.

edit: Deleted the ad at the end, replaced with a link to the front page.


Ugh, it ended just when I was getting really excited. I wish you'd incorporated more rules (or at least rules of thumb) on when to use and trust group level data.


Why is it called the "ecological" fallacy?


Ecology is the scientific study of the relationships that living organisms have with each other and with their natural environment.

An ecological fallacy (or ecological inference fallacy) is a logical fallacy in the interpretation of statistical data where inferences about the nature of individuals are deduced from inference for the group to which those individuals belong.


The article was relatively short. Perhaps you could expand (or follow it up) with some consideration of how to spot the ecological fallacy and how to avoid it?

Also you don't go into much detail explaining the smoking cancer example. Is it the case that the data turned out to be right 'in spite of' the fallacy or was it a case where the data was immune to the problems of other cases?


Can you describe how the ecological fallacy is related to Simpson's paradox?


Simpson's paradox is an example of the ecological fallacy. I think it's specifically the case where the sign of the group relationship is the opposite of the individual relationship, like in http://blog.statwing.com/wp-content/uploads/2012/12/ecologic....

Wikipedia has more specific math for this at http://en.wikipedia.org/wiki/Ecological_fallacy#Simpson.27s_...
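For intuition, here is a minimal Python sketch of the sign reversal behind Simpson's paradox. The counts are hypothetical (two made-up options "A" and "B" tried in two made-up subgroups), chosen only so that the reversal shows up.

```python
# Hypothetical (successes, trials) per option within each subgroup. Not real data.
counts = {
    "small": {"A": (90, 100), "B": (800, 1000)},
    "large": {"A": (300, 1000), "B": (20, 100)},
}

def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials

# Within each subgroup, compare A against B.
within = {
    group: {opt: rate(*st) for opt, st in options.items()}
    for group, options in counts.items()
}

# Aggregate over subgroups, then compare A against B again.
overall = {}
for opt in ("A", "B"):
    successes = sum(counts[g][opt][0] for g in counts)
    trials = sum(counts[g][opt][1] for g in counts)
    overall[opt] = rate(successes, trials)

for group, rates in within.items():
    print(group, {opt: round(r, 3) for opt, r in rates.items()})
print("overall", {opt: round(r, 3) for opt, r in overall.items()})
```

Here A beats B inside both subgroups, yet B beats A once the subgroups are pooled, because A was mostly tried in the harder "large" subgroup. Reading the pooled comparison as if it applied within each subgroup is the same kind of mistake as reading a state-level correlation as an individual-level one.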


You make a lot of counter arguments without supporting evidence.



