Regressions in The Atlantic (who would have thought?)

Interesting article in The Atlantic about how unions affect state economies.  Some scatterplots with regression lines (but no regression statistics, alas) and some correlation coefficients.  Geek stuff, for sure.

Some observations, though.  The first scatterplot shows union participation rate as the independent variable and “income level” as the dependent variable.  This seems to be per capita income, but it is not clear whether it is the mean or the median.  The choice could be quite important: a few very high earners pull the mean up but barely move the median.
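A quick made-up example shows how much the mean-versus-median choice can matter.  The two "states" and their incomes below are invented for illustration and have nothing to do with the article's data.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands of dollars) for two made-up states.
# State B has one very high earner; everyone else earns less than in State A.
state_a = [30, 35, 40, 45, 50]
state_b = [25, 30, 35, 40, 200]

# (mean, median) for each state
summary = {
    "A": (mean(state_a), median(state_a)),
    "B": (mean(state_b), median(state_b)),
}

for name, (avg, mid) in summary.items():
    print(f"State {name}: mean = {avg}, median = {mid}")
```

By the mean, State B looks much richer than State A (66 versus 40).  By the median, it looks poorer (35 versus 40).  Same data, opposite ranking.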

If you look at the states on the left (lower income) part of the graph versus the right (higher income) part of the graph, what might you wonder about?  How about the cost of living in each of the states?  As our business school students quickly realize, the same salary offer in NYC versus other locations means a potentially significant difference in discretionary income.
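Here is a rough sketch of the kind of adjustment I have in mind.  The salary figures and the cost-of-living index values are hypothetical (an index of 100 stands for the national average), and `real_income` is just a helper name I made up for this example.

```python
def real_income(nominal, price_index):
    """Deflate a nominal income by a cost-of-living index (100 = national average)."""
    return nominal * 100.0 / price_index

# Hypothetical offers: (nominal salary in dollars, hypothetical cost-of-living index)
offers = {
    "New York City": (100_000, 125.0),
    "Smaller city": (85_000, 100.0),
}

adjusted = {city: real_income(salary, index) for city, (salary, index) in offers.items()}

for city, value in adjusted.items():
    print(f"{city}: {value:,.0f} in average-cost dollars")
```

With these invented numbers, the bigger nominal offer turns out to be the smaller one once prices are taken into account (80,000 versus 85,000).  A scatterplot of nominal income across states could have the same problem.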

The article does a clever academic bait-and-switch (I learned how to do this in graduate school).  It starts out talking about public sector unions, but then quickly drops the “public sector” part and does all of the statistical work on unions as a whole (both private and public).  Why do this?  One reason is that the data might be easier to get.  Another is that the more appropriate data set might not tell the story the author wants.  (Note: I am not saying that is what the author is doing, but an alarm goes off in my head when I see data that doesn’t really fit being used in an argument.)

Here’s a quote from the article:

To put it baldly, unions are associated with the country’s economic winners, not its losers.   And it’s not that unionized states work more–unionization is negatively correlated with hours worked (-.36). States with higher levels of union membership work less hours per week but make more money–higher levels of union memberships are positively correlated with wage per hour (.48).

Now suppose this is the union for a successful private company.  We are describing a classic win-win here, right?  Everyone is sharing in the success of the company (although I know some will wonder whether management is getting more than its share; that’s for another discussion).  And fewer hours for more pay only works if the company’s customers will continue to buy the product at the price the company tries to charge.

Now suppose we say the same thing for public workers.  Fewer hours, more pay.  Who is paying?  And are they getting value for that?  Could be.  But maybe not.  So do these two correlation coefficients allow us to make “bald” assertions like this?  Maybe not…
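One way to put the two quoted correlations (-.36 and .48) in perspective is to square them.  In a simple linear relationship, the square of the correlation coefficient is the share of the variation in one variable that lines up with the other.  The snippet below does nothing more than that arithmetic on the two numbers from the quote.

```python
# Correlation coefficients quoted in the article
correlations = {"hours worked": -0.36, "wage per hour": 0.48}

# r squared: share of variance associated with union membership
r_squared = {name: round(r ** 2, 4) for name, r in correlations.items()}

for name, r2 in r_squared.items():
    print(f"{name}: r^2 = {r2:.4f} ({r2:.0%} of the variation)")
```

That works out to roughly 13% for hours worked and 23% for wage per hour, which leaves most of the state-to-state variation associated with something other than union membership.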

Can you see the problems with the other two scatterplots?  What does the author claim they “show”?  Are there other reasonable explanations for the relationships?  Hidden variables (like the state cost of living I mentioned above)?
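To see how a hidden variable can manufacture a correlation, here is a small simulation.  Everything in it is made up: a hypothetical "cost of living" score drives both a "union" score and an "income" score, and the two have no direct link to each other.  I am not claiming this is what is going on in the article's data, only that a scatterplot alone cannot rule it out.  The `pearson` and `partial_corr` helpers are written out by hand for the example; the second is the standard first-order partial correlation formula.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden variable, plus two variables that each depend on it
# and on their own independent noise (no direct union -> income effect).
cost = [rng.gauss(0, 1) for _ in range(n)]
union = [c + rng.gauss(0, 1) for c in cost]
income = [c + rng.gauss(0, 1) for c in cost]

r_raw = pearson(union, income)
r_partial = partial_corr(r_raw, pearson(union, cost), pearson(income, cost))

print(f"raw correlation:     {r_raw:.2f}")
print(f"controlling for cost: {r_partial:.2f}")
```

The raw correlation comes out at around .5, about the size of the wage correlation in the quote, while the correlation after controlling for the hidden variable is close to zero.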