29 November 2011

What's The Difference?

To continue with the Triage project, I've spent a day or two with more graphics texts (about which I'll be musing anon), and getting more familiar with the mapping scenarios.

Separate from the scatterplot matrix data shown in Triage, which would be used to measure the micro components of a campaign, is the question of displaying national trends, twixt Us-uns and Them-uns. For that one turns to map graphics, which is a whole other world. Still in R, mind, but not statistical in nature.

What I have recently found is this site, which replicates a US map with 2004 election results. Now, our Apparatchiks won't be downloading zip files from outside sources, of course. On the other hand, the files make a perfect diving board for the PoC: load them into PG, swapping Republican for Bush, Democrat for Kerry and Other for Nader (that's not much of a stretch!). Just for completeness: much earlier I'd found this map exercise (though I can't find that I'd cited it), but as of now the author has been too embarrassed to post the R that does it. While it shows only some form of income data (not specified), it is a follow-on to (and links to) an election stream map set, also not supplied with the R that made it. Nevertheless, one can conclude that, with enough time, this is a task suited to R. As mentioned in an earlier post, the animation bits are likely done with googleVis.

I'll be using his data, since it provides a basis and I don't have to concoct some, though not the R he used (still using the stock R from Wickham). It's not clear how the numbers were derived.

What is really useful about the 2004 map posting is the data source: a county-level count. Get these into a PG table, and we have a surrogate for data which our Apparatchiks would have, and which we can further expand with relatively simple SQL, just to see how a map would change. The notion for this part of the Triage effort is to measure the effect of national campaign spending, after some event/ad/debate/foo, at the POTUS/party level: an RNC/DNC (or 501/527/foo group) view of the country.

Here's the new PG table we load into:

CREATE TABLE public.election (
state varchar(25) NULL,
county varchar(25) NULL,
tot_precincts int4 NULL,
precincts_reporting int4 NULL,
republican int4 NULL,
democrat int4 NULL,
other int4 NULL,
constraint pk_election unique(state, county)
)
WITHOUT OIDS
TABLESPACE pg_default;


And we get it loaded thus (concatenated from the state/county files in the zip):

copy public.election from '/databases/rawdata/2004election/output.txt' with delimiter ';' csv header;
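The post doesn't show how output.txt was stitched together from the per-state files. A minimal Python sketch, assuming each file is ';'-delimited and starts with the same header line (the function name is hypothetical), keeps the first header and drops the repeats:

```python
def concat_with_single_header(paths, out_path):
    """Concatenate delimited text files that share one header line.

    Writes the header from the first file only, then every data line from
    every file, so a loader that skips one header line sees clean data.
    Raises ValueError if a later file's header differs from the first.
    """
    header = None
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        for path in paths:
            with open(path, encoding="utf-8", newline="") as src:
                first = src.readline().rstrip("\r\n")
                if header is None:
                    header = first
                    out.write(header + "\n")
                elif first != header:
                    raise ValueError(f"header mismatch in {path!r}")
                for line in src:
                    # Guard against a last line with no trailing newline.
                    out.write(line if line.endswith("\n") else line + "\n")
    return out_path
```

The reason to bother: the header option on copy skips only the first line of the file, so headers repeated from later files would otherwise arrive as data rows and fail on the integer columns.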

Note that column names are underscored rather than camelCase: PG folds unquoted identifiers to lower case, so any name created with capitals has to be double-quoted every time it's used. Yuck.

And here's the PG + PL/R. I've left it as is: comment/uncomment lines to generate each of the maps. As it stands it produces the difference map, shown last; the commented-out query and qplot lines are for the two event maps, while the active ones are for the diff map:



CREATE OR REPLACE FUNCTION "public"."us_graph" () RETURNS text AS
$BODY$
X11(display=':5');
pdf('US_graph_diff.pdf');
library(maps)
library(plyr)
library(proto)
library(reshape)
library(grid)
library(ggplot2)
library(mapproj)
states <- map_data("state")
#elections <- pg.spi.exec ('select state, sum(republican) as "Republican", sum(democrat) as "Democrat" from election where event_number = 2 group by state order by state');
elections <- pg.spi.exec ('SELECT a.state, sum(a.republican - (SELECT b.republican FROM election b WHERE b.event_number = a.event_number - 1 and a.state = b.state and a.county = b.county)) as Republican FROM election a where a.event_number = 2 group by a.state ORDER BY a.state ')
elections$state <- tolower(elections$state)
elections$republican <- elections$republican/10000
choro <- merge(states, elections, sort = FALSE, by.x = "region", by.y = "state")
choro <- choro[order(choro$order), ]
#p <- qplot(long, lat, data = choro, group = group, fill = Republican / Democrat, geom="polygon", asp=.6)
p <- qplot(long, lat, data = choro, group = group, fill = republican, geom="polygon", asp=.6, main = "Poll Shift", xlab = "", ylab = "")
p + labs(y = "", x = "")
p + opts(panel.grid.major=theme_blank(), panel.grid.minor=theme_blank(), panel.background=theme_blank(), axis.ticks=theme_blank())
p + scale_x_continuous("")
p + scale_y_continuous("") + coord_map()
p + opts(axis.text.x = theme_blank(),axis.text.y = theme_blank(), axis.title.x = theme_blank(), axis.title.y = theme_blank(), axis.tick.length = unit(0, "cm"), axis.ticks.margin = unit(0, "cm"))
p + scale_fill_gradient(limits = c(0, 90))
print(p)
dev.off();
print('done');
$BODY$
LANGUAGE 'plr';
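Two lines in that function do quiet but necessary work. R's merge() with sort = FALSE returns rows in an unspecified order, so the code re-sorts by map_data's order column; polygon vertices drawn out of sequence produce scribbles instead of state outlines. A rough Python analogue of the tolower/merge/re-sort steps, using a few hypothetical vertex records rather than real map_data output (the function and field names are illustrative):

```python
def join_and_reorder(vertices, values):
    """Mimic tolower() + merge() + re-sort on map vertices.

    vertices: list of dicts, each with at least 'region' (lower-case name)
              and 'order' (drawing sequence), as in map_data("state").
    values:   dict of state name (any case) -> numeric value to plot.
    Vertices whose region has no value are dropped, like an inner merge().
    """
    lookup = {name.lower(): v for name, v in values.items()}
    joined = [dict(vtx, value=lookup[vtx["region"]])
              for vtx in vertices if vtx["region"] in lookup]
    # Restore the original drawing order; the join makes no promise about it.
    return sorted(joined, key=lambda vtx: vtx["order"])
```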


All that spinach for the library calls got eliminated by making an .Rprofile in the postgres user's home directory with the following line:

.libPaths("/home/postgres/R/x86_64-unknown-linux-gnu-library/2.14/")

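You could also call out the library location explicitly in each library() call (via its lib.loc argument); both ways work. The additional spinach is various directions to eliminate the lat/long grid on the maps. None work! At least part of the reason is visible in the code: each bare p + ... line builds a new plot object and then discards it, because the result is never assigned back to p (p <- p + ...), so print(p) draws the plain qplot.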


Here's the Event 1 map:


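Now, let's update the table to include an event_number (easier than using a date, anyway) and an event_type. That way, we can generate maps in sequence, but also note what sort of event just/last happened. We could also generate map sequences for only certain sorts of events (the allowed types would live in a check constraint). Note that the unique constraint has to become (state, county, event_number) as well; otherwise the new rows below collide with the existing ones.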

So, let's make some new data:

insert into election (select state, county, tot_precincts, precincts_reporting, republican * .8, democrat * 1.2, other, 2, 'foo' from election where event_number = 1);

We wouldn't get such dramatic shifts (modulo Swift Boats) in the real world, but this is PoC territory.
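The post doesn't show the DDL for the table change. A sketch using Python's built-in sqlite3 (not PG, so it simply creates the table in its final form: event columns, a unique key that includes event_number, and a CHECK on event_type whose allowed values are hypothetical placeholders) then fabricates event 2 with the same INSERT ... SELECT idea, on toy numbers:

```python
import sqlite3


def build_demo_db():
    """Toy election table in its post-update shape, plus a fabricated event 2."""
    con = sqlite3.connect(":memory:")
    # In PG one would ALTER the existing table instead: add event_number and
    # event_type, set event_number = 1 on the loaded rows, then replace the
    # unique(state, county) constraint with one that includes event_number.
    con.execute("""
        CREATE TABLE election (
            state TEXT, county TEXT,
            tot_precincts INTEGER, precincts_reporting INTEGER,
            republican INTEGER, democrat INTEGER, other INTEGER,
            event_number INTEGER NOT NULL,
            -- hypothetical list of allowed event types
            event_type TEXT CHECK (event_type IN ('baseline', 'ad', 'debate', 'foo')),
            UNIQUE (state, county, event_number)
        )""")
    con.executemany(
        "INSERT INTO election VALUES (?, ?, ?, ?, ?, ?, ?, 1, 'baseline')",
        [("Ohio", "Adams", 5, 5, 300, 200, 10),
         ("Texas", "Harris", 10, 10, 1000, 900, 20)])  # toy numbers
    # Event 2: Republicans down 20%, Democrats up 20%. PG rounds numeric to
    # int4 on assignment; SQLite needs the explicit CAST(ROUND(...)).
    con.execute("""
        INSERT INTO election
        SELECT state, county, tot_precincts, precincts_reporting,
               CAST(ROUND(republican * 0.8) AS INTEGER),
               CAST(ROUND(democrat * 1.2) AS INTEGER),
               other, 2, 'foo'
        FROM election WHERE event_number = 1""")
    con.commit()
    return con
```

With the widened key, a second event 2 row for the same county is rejected, which is the behavior you want when re-running a load by mistake.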


This yields a new Event 2 map:


I'm still grappling with my main wish list item: showing the changes through the colors. As it stands, each map's color scale spans the full gamut on its own, leaving the legend to display the shifts, which it doesn't do all that well. Viewed another way, why not show the delta of polling strength (vote displays are a bit late, after all)? We can do that with a single map. How to get the data out of the election table? For that, a correlated subquery is sufficient: it's the big SQL statement in the function above.
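To see what that big SQL statement computes, here is the same correlated subquery run against a toy table in Python's built-in sqlite3 (toy numbers, not the 2004 data), alongside a self-join that gives the same result for states with matching counties in both events:

```python
import sqlite3

# Same shape as the query in the PL/R function.
DELTA_CORRELATED = """
SELECT a.state,
       SUM(a.republican - (SELECT b.republican FROM election b
                           WHERE b.event_number = a.event_number - 1
                             AND a.state = b.state AND a.county = b.county)) AS republican
FROM election a
WHERE a.event_number = 2
GROUP BY a.state
ORDER BY a.state
"""

# Equivalent self-join: pair each county's event 2 row with its event 1 row.
DELTA_JOIN = """
SELECT a.state, SUM(a.republican - b.republican) AS republican
FROM election a
JOIN election b
  ON b.state = a.state AND b.county = a.county
 AND b.event_number = a.event_number - 1
WHERE a.event_number = 2
GROUP BY a.state
ORDER BY a.state
"""


def toy_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE election (state TEXT, county TEXT, republican INTEGER,"
                " event_number INTEGER, UNIQUE (state, county, event_number))")
    rows = [
        ("Texas", "A", 100, 1), ("Texas", "B", 200, 1), ("Ohio", "C", 50, 1),
        ("Texas", "A", 80, 2), ("Texas", "B", 160, 2), ("Ohio", "C", 40, 2),
    ]
    con.executemany("INSERT INTO election VALUES (?, ?, ?, ?)", rows)
    return con


if __name__ == "__main__":
    con = toy_db()
    print(con.execute(DELTA_CORRELATED).fetchall())  # [('Ohio', -10), ('Texas', -60)]
```

One behavior worth knowing: if a county has no row for the previous event, the subquery yields NULL, the subtraction yields NULL, and SUM() skips it silently, so that county drops out of its state's delta without any warning.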


Here's what the delta map looks like:


What we see is the shift in absolute, not relative, numbers. So Texas looks to have shifted more toward Democrat from Event 1 to Event 2 just because it started with more (Republican) votes; same with California.
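Normalizing by the starting count would give the relative shift instead. A minimal Python sketch with toy per-state Republican totals (made-up numbers and state labels) shows why that matters for this PoC: event 2 was fabricated by scaling every county by the same 0.8, so every state's relative shift is -20% and a relative map would be one flat color, while the absolute map simply tracks state size.

```python
def shifts(before, after):
    """Per-state (absolute, relative) change between two events.

    States missing from either event, or starting at zero, are skipped.
    """
    out = {}
    for state, b in before.items():
        if state in after and b:
            a = after[state]
            out[state] = (a - b, (a - b) / b)
    return out


# Toy Republican totals (not real data), scaled the same way event 2 was.
before = {"big_state": 2_000_000, "small_state": 100_000}
after = {s: round(v * 0.8) for s, v in before.items()}
for state, (absolute, relative) in shifts(before, after).items():
    print(f"{state}: absolute {absolute:+,}, relative {relative:+.0%}")
```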

Getting rid of the lat/long grid is still a problem, but then, this is a free PoC. Cheap at half the price.

25 November 2011

Tiptoeing to the Truth

The NY Times makes another tiptoe step toward truth today. They published an op-ed piece by Herbert Gans, who is semi-famous. What remains unanswered is the question: what has caused the imbalance? In order to fix the problem, the problem must be correctly diagnosed. Gans, and others, continue to ignore the elephant sitting on the coffee table. He floats a few band-aid tactics, but ignores the cause.

The problem is twofold (as I have written in the past, so this is just today's musing). The first is that recovery from recession presupposes that the workforce can be re-employed doing what it did before the Crash, if only the economy's demand could be restored. That doesn't work this time, since the stability of the Bush years was based on two non-productive uses of capital: housing and finance. As unpleasant as it may be to hear, putting capital into housing requires diverting capital from productive uses. The only way for the banksters to make money on housing is for the mortgage holders (that vanishing Middle Class) to see increasing income. Houses, per se, don't generate output the way a Brown and Sharpe milling machine does (or any physical capital). Therein lies the root cause of the Crash. Without growing incomes, housing becomes unsupportable as investment.

Finance is a zero-sum game: all of the cash involved is fixed by the scope of the savers. Profit to the finance wizards is extracted from the flow of cash between the savers and the borrowers; the finance wizards don't create wealth, they purloin it. More than a few, formerly just from the left fringe and latterly from more staid venues, have suggested that our economies were better off (and would be again) when finance was reduced to the boring trade it was before the failed math fiddlers got involved. They don't appear to have brought anything positive to the situation. Off with their heads.

The elephant is distribution, as it always has been. Those who scored big (and Krugman's piece today says so again) did so out of luck, but continue to assert that it was personal brilliance. They still insist that none of the mess was their fault. It was mortgage companies, then banks, that created sub-prime and liars' loans in order to sell overpriced houses and pocket excess cash in the process. The "liars" didn't walk in waving a gun and demanding a sub-prime mortgage; there's been sufficient reporting of cases where folks (often of darker hue) were given only sub-prime mortgages when they qualified for conventional ones. The mess was created by the Gatekeepers to housing, for their benefit, not for the home buyer or society writ large. Off with their heads, too.

22 November 2011

That Stay Puft Man

One of the funniest bits in all of film comedy is the Stay Puft Man vignette in "Ghostbusters". It would be nice to see that look on the faces of Fat Men in Famine (way back in May, 2010 they were my subject). As second prize, there's today's NY Times: a mother lode of confirmation. I just got back from my reading of the dead trees version, and here are the stories which relate to Obese Oligarchs (comforting to see that the Mainstream Media finally gets it):

Corporations cash out and lay off
LinkedIn's execs cash out, big time
Europe's continuing angst
the party never started

I'll give you just one quote (from the first cite):
"But spending on capital investments like new plants and infrastructure has stagnated more broadly in corporate America, confounding efforts by the Obama administration to spur economic growth. Capital expenditures by companies on the Standard & Poor's 500-stock index are expected to total $546 billion in 2011, down from $560 billion in 2008, according to data compiled by Thomson Reuters Eikon."

As the Fat Man pieces explain, when you've got lots o' cash, deflation is the guaranteed, risk-free way to acquire income. Now, that's what's wrong with this country. What the Right Wingnuts won't admit, for if they did their arguments would collapse, is that the "debt" of our middle class (here and in Europe) is the necessary cost/price of capitalists' wealth. Without the cash, no one buys the stuff, and if no one buys the stuff, there's no return on capital. Note, carefully, the implication of the quote: the extent to which corporations make "profit" from financial manipulations rather than goods production is the measure (inverse, alas) of an economy's stature. Jesus threw out the money changers. Oddly, our self-righteous Right Wingnuts don't seem to mind them. As I've written a number of times, economic growth isn't measured by cash increasing, but by increasing production of consumable goods. The only way for a business to repay debt is to generate *new* income from increased production (or cut costs, but cost cutting has an explicit limit: shutting down the business). Real economic growth comes from better production, not money manipulation. Wall Street Banksters managed to change the rules to favour money manipulation. We, and not they, are the ones harmed by such "free market" decisions.

That last cite might seem out of place. But it fits in this way: I watch Bill Maher's show on Friday/HBO when he doesn't load up on Right Wingnuts; during his last show, however, he ranted that not enough college students were majoring in science and engineering to suit him. He quoted some numbers (I forget the particulars), but he contrasted performance art (or somesuch) with engineering (ditto), with more graduates in the former than in the latter.

Here's a paper looking at the issue, and a quote:
"Paul J. Kostek, who previously managed career activities as vice president of IEEE-USA, the electrical and electronic engineering institute, says there is no shortage. 'You saw what happened to the price of gasoline when there was a shortage last summer. If there's a shortage of engineers, why aren't people paying $200,000 to hire an engineer.'"

There's been plenty of anecdotal evidence that undergraduates flocked to Finance just because it was less rigorous and more lucrative. It's no secret that a degree from a business school was worth many Bongo Bucks. Not so much anymore. It's also documented that undergraduates are leaving computer science; there's no way to earn back the cost competing against a $2/hour Indian.

09 November 2011

Honesty in Government

[UPDATE] -- I copied the Sales figure the first time; same issue.

As I transition into a data scientist, which means re-adding my stats mojo to my RDBMS mojo (not replacing the latter with the former, by the way), I've come across more than a few postings and writings in the statosphere about truth in data. The writing is always by data professionals (not lobbyists and the like, near as I can tell), and the point is always that the data is truth. By truth one means the most accurate picture of the real world, unadorned by propaganda.

Today's Federal data dump includes September wholesale inventories. They were down .1%. Here's the quote: "..were $462.0 billion at the end of September, down 0.1 percent (+/-0.2%)* from the revised August level." What's the starry thingee, one might ask? Well, it's the link to a footnote.

Here's the footnote:
"* The 90 percent confidence interval includes zero. The Census Bureau does not have sufficient statistical evidence to conclude that the actual change is different from zero."

Two points to note about the footnote: 1) the CI is at the 90% level, which is very generous (a 95% interval would be wider still), and 2) it spans 0, which means what the note says. I wonder how many of the reports about the report will bother to tell us about that.
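The footnote's logic is easy to check by hand. A small Python sketch, assuming the "+/-0.2%" is a symmetric normal-approximation 90% margin (the release says nothing more than that), recovers the interval, backs out the implied standard error, and compares the result with a 95% interval:

```python
from statistics import NormalDist


def interval(estimate, margin_90, level=0.90):
    """Re-express a reported 90% margin of error at another confidence level.

    Assumes the margin is a symmetric normal-approximation interval,
    i.e. margin_90 = z(0.95) * standard_error.
    """
    std = NormalDist()
    se = margin_90 / std.inv_cdf(0.95)   # implied standard error
    z = std.inv_cdf(0.5 + level / 2)     # two-sided critical value
    return estimate - z * se, estimate + z * se


# Wholesale inventories: -0.1% change, reported margin +/-0.2% at 90%.
lo90, hi90 = interval(-0.1, 0.2)              # about (-0.30, 0.10)
lo95, hi95 = interval(-0.1, 0.2, level=0.95)  # about (-0.34, 0.14)
print(lo90 <= 0 <= hi90, lo95 <= 0 <= hi95)   # True True
```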

Here's the link to the original; click the link for Excel or PDF.

06 November 2011

Walk Like an Egyptian

I walk funny. Such has been observed, sometimes as compliment and sometimes as insult, for years. The earliest compliment I recall happened when I was working with guys from Touche-Ross (Boston), right out of grad school (1973, or thereabouts). One asked me how long I'd been studying karate. I hadn't said that I was, but I had been for about 6 months. I asked what led him to the conclusion. It turned out that it was my gait.

In high school, the response was somewhat different, along the lines of fruitiness. Now, I'll admit to having a preference for hot dogs over tacos when it comes to food, but just the opposite when it comes to activities Sybaritic. Interpret that as you will.

The whole thing traces back to being a Boy Scout, around 1960. The Boy Scouts back then wove in a good deal of "Indian" lore. One of the Indian practices involved how they moved through the forests. Specifically, how the feet should be held during walking and running. Most folks, especially girls who've spent time in ballet lessons, walk with toes splayed out. Donald (or Daisy) Duck in motion. Not so for the Indian; foot is exactly straight, either walking or running. The explanation in the Handbook was that walking with straight feet saved some number of steps per mile. I trained my feet to point straight ahead. I guess that led to some anomaly in how my gait looked.

That time in Boston wasn't spent only in karate academy (so he called it), but also running on the indoor track at the Boston YMCA. I ran barefoot, although I guess that wasn't within the written rules. I got spoken to occasionally. I still ran barefoot. Running barefoot required a different form from usual running. The fat running shoe had come into existence, so the common way for runners, especially distance runners, to stride was heel first. I never liked that; too much shock up the leg, shoes or not. Barefoot runners could never do that, anyway. The barefoot runner strides onto the ball of the foot, and releases some weight to the heel as the opposite foot moves forward. It takes more energy to land on the ball of the foot, since the heel stride allows the runner to "fall" onto the foreleg. Since I wasn't interested in marathon running, generating the extra oomph wasn't a problem.

Ball strike running encourages that straight foot form, as well (you want to land first on the knuckle of the big toe, that's what it's there for, then roll through to the arch). Fit very nicely with my zeitgeist.

What in the world compelled me to type all this out? Today's Times magazine has a story about Tarahumara Indians and barefoot running. Seems that what I'd learned from the Boy Scouts 50 years ago, and how I'd run 40 years ago, is now Cool. While having nothing to do with Keynesian economics, it's yet another demonstration of my humble ability to foresee the future. Now, that's cool.

(I read this in today's Times Magazine, dead trees division. Looking for the link for this piece revealed that the author of the article has been writing about this topic for some years; even has a book. First I'd heard of it.)