Wednesday, March 18, 2009

how to win at everything, including march madness

Bracketology broken down to a hunch.
Don't bet the house. Unless it's already in foreclosure.

Seems as though betting on the office pool has done more to advance game theory statistics than most anything else. You can listen to those sports-talking blowhards drop hints about upset specials all you want, but just pony up your money right now 'cause I'm ready to take it.

My money and bracketology are backed by the combined power and wisdom of the Logistic Regression Markov Chain, Las Vegas odds, and a few secrets pulled from a hat. Are you ready for a schooling?

Every year some computer science engineer, or more likely a whole group of them, announces the most sophisticated computer model yet for predicting March Madness outcomes. Last year it was the Logistic Regression Markov Chain (LRMC), which is a fancy name for a logistic regression on game results feeding a Markov chain that spits out a ranking.

Logistic Regression Markov Chain was developed and refined by researchers Joel Sokol, Paul Kvam and George Nemhauser, who are optimization and statistics professors at the Stewart School of Industrial and Systems Engineering at Georgia Tech. I think what that really means is that you should never, ever, play poker against them.

LRMC is fairly simple. The data that go into the model are scoreboard inputs, meaning everyone has easy access to them. The most obvious example is who won and who lost the game. Won-loss record is one of the best predictors of who will win a head-to-head match. The rest of the variables just help to further define the difference between the teams. Teams who kick ass all the time by racking up huge margins of victory are more likely to kick a new opponent's ass. Seems pretty obvious, doesn't it? LRMC throws in some other factors, like how well the team performed against tough opponents, home record, road record, etc.
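For the curious, here's a rough sketch of the general idea, not the Georgia Tech model itself. It assumes a logistic curve turns a home margin of victory into the chance that the home team is really the better team, then lets an imaginary voter hop between teams along the schedule until the votes settle down. The teams, the scores, and the SLOPE and HOME_EDGE numbers are all made up for illustration; a real model would fit its coefficients to past seasons.

```python
import math

# Hypothetical results: (home team, away team, home margin of victory).
GAMES = [
    ("Alpha", "Bravo", 12),
    ("Bravo", "Cobra", 3),
    ("Cobra", "Alpha", -8),
    ("Delta", "Alpha", -15),
    ("Bravo", "Delta", 6),
    ("Delta", "Cobra", 2),
]

# Made-up coefficients, chosen only to make the example run.
SLOPE = 0.1      # how much each point of margin counts
HOME_EDGE = 0.3  # discount for having played at home


def prob_home_better(margin):
    """Logistic guess at the chance the home team is the better team."""
    return 1.0 / (1.0 + math.exp(-(SLOPE * margin - HOME_EDGE)))


def transition_matrix(games):
    """Build a Markov chain in which votes drift toward the likely better team."""
    teams = sorted({team for game in games for team in game[:2]})
    index = {team: i for i, team in enumerate(teams)}
    size = len(teams)
    matrix = [[0.0] * size for _ in range(size)]
    played = [0] * size
    for home, away, margin in games:
        p = prob_home_better(margin)
        h, a = index[home], index[away]
        # A voter on either team ends up on the home team with probability p
        # and on the away team with probability 1 - p.
        matrix[h][h] += p
        matrix[h][a] += 1.0 - p
        matrix[a][h] += p
        matrix[a][a] += 1.0 - p
        played[h] += 1
        played[a] += 1
    # Average over each team's games so every row sums to 1.
    for i in range(size):
        matrix[i] = [value / played[i] for value in matrix[i]]
    return teams, matrix


def rank_teams(games, steps=1000):
    """Approximate the chain's steady state by repeated multiplication."""
    teams, matrix = transition_matrix(games)
    size = len(teams)
    votes = [1.0 / size] * size
    for _ in range(steps):
        votes = [sum(votes[i] * matrix[i][j] for i in range(size))
                 for j in range(size)]
    return sorted(zip(teams, votes), key=lambda pair: pair[1], reverse=True)


for team, share in rank_teams(GAMES):
    print(f"{team}: {share:.3f}")
```

The team that hoards the biggest share of the votes sits at the top of the ranking. Change the margins and watch the order shuffle.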

Basically though, LRMC isn't that much different from a slew of other rankings, all using more or less the same basic strategy built around won-loss records, each with a few bells and whistles thrown in to try to find the magic piece of the puzzle. You can see them all side-by-side here (Massey Ratings). One good thing about this site is that you can see how each model compares to the others. Look closely, they all pretty much match up. Why? They have to; they all use pretty much the same data!

The ranking systems can give you the big picture. In general, any team ranked higher than another team would be expected to beat the one ranked lower. Anyone picking the higher seeds to win will pick correctly about 70 percent of the time, but generally you have to do better than that if you want the loot. In reality, very little separates the top teams from one another. So pray tell, how do you make the tough choices and move into the range of 75-80 percent correct picks to claim office bragging rights?

Vegas odds. There's a reason Vegas is always in the black. The public, as a whole, is good at predicting a winner. The money bet on each team moves the odds, so the team the odds favor most is the best choice. The problem with Vegas odds is that they are a constantly moving target: the odds need to be re-evaluated after each round, and your complete bracket is due by 11 am ET Thursday. Vegas odds do have the advantage of incorporating some intangibles, like near-home court advantage, injuries, suspensions, etc. Another problem is that every sportsbook in Vegas sets its own odds, and public interest can be fueled by a big star. For example, a lot of money is moving on Oklahoma because of the publicity around Blake Griffin, likely the college player of the year, and the odds that they'll win the whole show are improving. Does this make Oklahoma a better team? No. But it might get the crowd behind them, which can alter games. Find the latest Vegas futures odds here.
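If you want to turn a futures board into something you can compare against the rankings, a minimal sketch follows. It assumes the odds are quoted American style (+450 means a 100 dollar bet wins 450) and that you rescale the numbers so they sum to one, which is a crude way to back out the house's cut. The team names and odds below are placeholders, not real lines, and rescaling over a partial list of teams will overstate everyone on it.

```python
def implied_probability(american_odds):
    """Convert American odds to the win probability they imply."""
    if american_odds > 0:
        # Underdog style: risk 100 to win `american_odds`.
        return 100.0 / (american_odds + 100.0)
    # Favorite style: risk `-american_odds` to win 100.
    return -american_odds / (-american_odds + 100.0)


def strip_vig(odds_by_team):
    """Rescale implied probabilities so they sum to 1 across the listed teams."""
    raw = {team: implied_probability(odds) for team, odds in odds_by_team.items()}
    total = sum(raw.values())
    return {team: p / total for team, p in raw.items()}


# Placeholder futures board, invented for this example.
FUTURES = {
    "Team A": 300,
    "Team B": 450,
    "Team C": 600,
    "Team D": 1000,
}

for team, p in sorted(strip_vig(FUTURES).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team}: {p:.1%}")
```

Since every book posts its own numbers, running this against two or three boards shows how much of a line is consensus and how much is hype.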

Lastly, it turns out once the teams have been reduced to eight (regional finals) you can just guess at the winners. There's a 50/50 chance of each team winning. Ranking, odds, none of this matters anymore. One guess is as good as the next. But here's the problem for you, my friend. Your bracket is due Thursday, so your guesses for the later rounds have to be made before you know who'll be playing. Your chances on each game would be even if you got to pick again after each round, which you probably don't. So how do you ensure that you have the best chance of having the final eight? You go back to the rankings and the odds. This is your best chance to be in the mix. But how good is that?

Turns out the best models predict about 75 percent of the games. Actually, the models are built in reverse. They don't predict; they calibrate to past data. We try to use them like fortune tellers but they are not. Models explain the past to give us insight into the future. And if you read the statistical papers carefully, they generally talk only about how well they do with getting the final four teams right. They generally don't talk about how well they did with actually predicting the overall winner, because if they could do that with any degree of certainty, they wouldn't be slinging numbers for a living, they'd be slinging cash money at the bar.

So now you are down to eight? Maybe you have all eight, but then what? You still want to win the office pool. Time to pull the rabbit from the hat. And what's inside?

Anyone's guess. No shit. That's the secret. Just guess. It's luck at this point. Even if you have correctly picked all teams in the final 8, there are still seven games to play and 128 possible ways for them to shake out on the way to a champion. You aren't that good. So just guess. But here's a thought. At this point, since it doesn't really matter and the most you stand to lose is what, five or ten bucks, then root, root, root for the home team if available. They just might win it all.
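If you don't trust the counting, here's a brute-force sketch that plays out every possible finish from the final eight. It assumes a standard single-elimination bracket with fixed pairings and treats every game as a coin flip, as described above. The team labels are placeholders.

```python
from itertools import product

# Placeholder labels for the eight regional finalists, in bracket order.
TEAMS = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8"]


def play_out(teams, flips):
    """Run a single-elimination bracket; each flip decides one game in order."""
    flips = iter(flips)
    field = list(teams)
    while len(field) > 1:
        # Adjacent teams meet; True sends the first through, False the second.
        field = [field[i] if next(flips) else field[i + 1]
                 for i in range(0, len(field), 2)]
    return field[0]


def count_outcomes(teams):
    """Enumerate every way the remaining games can go and tally champions."""
    games = len(teams) - 1  # single elimination: one game per team knocked out
    titles = {team: 0 for team in teams}
    total = 0
    for flips in product([True, False], repeat=games):
        titles[play_out(teams, flips)] += 1
        total += 1
    return total, titles


total, titles = count_outcomes(TEAMS)
print("ways to fill out the rest of the bracket:", total)
for team, wins in titles.items():
    print(f"{team} cuts down the nets in {wins} of them")
```

Under the coin-flip assumption every team wins the title in the same number of those finishes, so a blind guess at the champion is right one time in eight, and a blind guess at all seven games is right once in 128.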

Who knows, next year's top model might just be Correlated Gaussian Methodology. But if you plan on being Top Office Dog, you'll still have to couple those fancy model predictions with a sophisticated Lucky Guess.
