Serendipity (and some quasi-folk singers from the 1960s) happens when you least expect it. For some time I'd been intrigued by a title on Amazon,
"Bayesian and Frequentist Regression Methods" by one Jon Wakefield. Never heard of him, but the table of contents promised a new approach: compare and contrast real statistics and Bayesian foolishness (OK, that's harsh). So I've been wending my way through it in a desultory manner for a week or so. On the whole, IMHO, Wakefield demonstrates that Bayes offers little of real use. Good on him.
He discusses a term I'd not run into, the sandwich estimator (it dates back to the 1960s and 1980s, though), as a palliative for heteroscedasticity. Now, for those who've not been through a baby stat or econometrics course as an undergraduate, one of the teehee moments was when the instructor started discussing homoscedasticity and its evil twin, heteroscedasticity. Simply put, homo- means the error variance stays constant as values increase, while hetero- means the variance changes (typically growing) with value. The problem with hetero- is that its presence mangles the mathematical assumptions underlying the normal regression equations, so the resulting standard errors, and thus the regression results, are "unreliable". Make mine with lettuce and tomato.
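To see why it's called a sandwich, here's a sketch of my own (not from Wakefield's book) in plain NumPy: the heteroscedasticity-robust (White/HC0) covariance estimator is bread–meat–bread, where the bread is (X'X)⁻¹ and the meat uses each observation's squared residual in place of a single common variance. The simulated data and all variable names are mine, just for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0, 10, n)
# Heteroscedastic noise: the error's standard deviation grows with x.
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 * x, n)

X = np.column_stack([np.ones(n), x])      # design matrix with intercept
XtX_inv = np.linalg.inv(X.T @ X)          # the "bread"
beta = XtX_inv @ X.T @ y                  # ordinary least squares fit
resid = y - X @ beta

# Classical OLS covariance assumes one constant variance: sigma^2 (X'X)^-1.
sigma2 = resid @ resid / (n - 2)
cov_ols = sigma2 * XtX_inv

# Sandwich: bread @ meat @ bread. The "meat" lets each observation
# carry its own variance via its squared residual.
meat = X.T @ (resid[:, None] ** 2 * X)
cov_sandwich = XtX_inv @ meat @ XtX_inv

se_ols = np.sqrt(np.diag(cov_ols))
se_robust = np.sqrt(np.diag(cov_sandwich))
print("OLS SEs:   ", se_ols)
print("Robust SEs:", se_robust)
```

The point estimates don't change at all; only the standard errors do, which is exactly the "palliative" part: the coefficients from OLS are still unbiased under heteroscedasticity, it's the reported uncertainty that was lying to you.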
So, today Norman Matloff posts some slides from a talk and
new book on regression. The slides are linked to in his r-bloggers
post. You should go through the slides, tons of fun. And more sandwich estimator. I will destroy you, heteroscedasticity!!!!
There's a slew of sentences I'd turn into preamble quotes; they'd last nearly a year. My hero.
Contrary to popular opinion, statistics is not a branch of computer science.
And, my favorite
Myth #3: R² is only for linear models.
• R² (on either the sample or population level) is the squared correlation between Y and Ŷ.
• Thus it is defined for any regression procedure, even nonparametric ones like k-Nearest Neighbor.
• Example: Currency data.
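That squared-correlation definition is easy to demonstrate. Here's a small sketch of my own (not Matloff's code, and not his currency data) that fits a hand-rolled k-Nearest Neighbor regression to obviously nonlinear data and computes R² as the squared correlation between observed Y and fitted Ŷ — no linear model anywhere in sight.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5
x = rng.uniform(-3, 3, n)
y = np.sin(x) + rng.normal(0, 0.2, n)     # clearly nonlinear relationship

# k-NN fitted values: average y over the k nearest x's (including the
# point itself, which slightly flatters the fit; this is a sketch, not
# a cross-validation scheme).
dists = np.abs(x[:, None] - x[None, :])
nearest = np.argsort(dists, axis=1)[:, :k]
yhat = y[nearest].mean(axis=1)

# R^2 as the squared correlation between Y and Yhat.
r2 = np.corrcoef(y, yhat)[0, 1] ** 2
print(f"k-NN R^2 = {r2:.3f}")
```

Swap in any other predictor — a regression tree, a neural net, whatever — and the same two lines at the bottom still give you a perfectly well-defined R².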