Complex software

One of the areas I have worked in is modeling complex manufacturing systems in order to analyze cost functions that could approximate the behavior of the system. This required lots of machinery, and reviewers were at a huge disadvantage, wondering what was going on with all my software. I tried to think of ways to “open” the code and the methods, but it was quite difficult. Getting the paper published was no picnic.

The global warming software debate is very similar. I don’t know whose software is more complicated (maybe not mine?). Here’s an article that talks about the lack of (and need for) software reviewing. Academe needs to confront this issue seriously.

Simpson’s paradox

Averages seem like simple things, but not always. Consider this simple baseball example:

Tony and Joe are competitive friends and so they compare batting averages. At the All-Star break, Tony is batting .300 and Joe is only batting .290. Joe mentions that batting in the second half of the season is more important, and so he and Tony agree to compare their batting averages for the second half of the season (and only the second half). When they finally meet, it turns out that Tony batted .390 in the second half of the season. Joe did better, too, but only batted .375. Tony wins both halves of the season.

Question: whose batting average was higher for the entire year? Turns out we don’t know, and it could very easily be Joe! (I’ll post an example later.)

What the paradox states is that a relationship that holds within every subgroup can be inconsistent with the relationship in the overall averages. The season average is not the average of the two half-season averages; it is weighted by how many at-bats each player had in each half. So Tony could win both halves of the season, but have a lower batting average for the entire season.
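Here is a small sketch of how that could happen. The hits and at-bats below are made up (the story only gives the averages); they are chosen so that each half matches the numbers above, with Tony getting most of his at-bats in the first half and Joe getting most of his in the second half.

```python
from fractions import Fraction

# Hypothetical (hits, at-bats) per half, chosen only to reproduce
# the averages in the story: .300/.390 for Tony, .290/.375 for Joe.
tony = {"first": (120, 400), "second": (39, 100)}
joe = {"first": (29, 100), "second": (150, 400)}


def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def season_average(player):
    """Total hits over total at-bats, NOT the mean of the two halves."""
    hits = sum(h for h, _ in player.values())
    at_bats = sum(ab for _, ab in player.values())
    return average(hits, at_bats)


for half in ("first", "second"):
    t = float(average(*tony[half]))
    j = float(average(*joe[half]))
    print(f"{half} half: Tony {t:.3f}, Joe {j:.3f}")

print(f"season: Tony {float(season_average(tony)):.3f}, "
      f"Joe {float(season_average(joe)):.3f}")
```

With these made-up numbers, Tony finishes at 159 for 500 (.318) and Joe at 179 for 500 (.358). Joe's season is dominated by his strong second half, while Tony's is dominated by his weaker first half.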

I do not know the details about Climategate, but it is very interesting to me, with all the statistics. Are average temperatures going up, going down, or doing something else? Here’s an article that mentions this batting average paradox about midway through. I wonder if I have a new example of the paradox involving averages.