Why?

Statistics and mathematical modeling have changed the way society functions.

Meteorologists use new methods of gathering information, especially from the atmosphere, to create a far grander database than any that human society has used for that purpose before. They feed these data points into supercomputers, and create complex models that involve a multitude of variables — often variables that are in a constant state of flux or that are difficult to get firm information on.

Edward Lorenz, who served as an army weather forecaster during World War II, and then studied and taught meteorology, coined the “butterfly effect” as shorthand for how tiny variations could affect weather models: “Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?” He pioneered the field of chaos theory largely in an effort to better predict meteorological changes. Scientists continue to use long-reaching models to predict damage caused to the earth by global warming and human impact, while similar (albeit less complex) models are used all over the Earth for long-range forecasting.

The world economy, which encompasses the financial dealings of seven billion-plus people across a dizzying array of legal jurisdictions, tax laws and the like, is tracked through a series of macroeconomic statistics; the ones produced at Berkeley use GDP, CPI, unemployment rate, corporate profits, change in business inventories, housing starts, interest rates, exports, and personal savings rate, among others. It’s on the basis of similar economic models that men like the Governor of the Bank of Canada or the Chairman of the Federal Reserve issue projections and make their decisions, and those decisions affect millions.

Even the advertising industry has made extensive use of mathematical modeling. Taking statistics compiled by government and the private sector, they break the population into demographics, and target their ads to different sectors of the population. By doing this they hope to better market their products. In 2001, The Coca-Cola Company alone spent nearly $2 billion on advertising — money that was largely distributed on the basis of demographic information compiled by market research companies.

Even sports teams have turned more to mathematical models in recent years. The book Moneyball, based on the success of Oakland Athletics’ GM Billy Beane, is generally considered to be the work that brought statistical analysis into the mainstream in baseball. Theo Epstein, originally a PR man with the San Diego Padres, worked his way up the ladder and in 2002 was hired by the Boston Red Sox as General Manager. He was 28, the youngest GM in the history of Major League Baseball, and a man who’d never played the sport at even the high school level. With Epstein relying on sabermetric principles (math pushed by the Society for American Baseball Research), his team has won two World Series championships in his six years at the helm.

These are just a few examples; in virtually every scientific field, whether “hard” science or the social sciences, mathematical modeling is a primary tool for extrapolating all kinds of factors. It’s used in other sports, it’s used in scenarios that are far less predictable than what occurs in the carefully regulated confines of the arena, and often it’s used with data far less complete than we have available for free from the stats page and play-by-play reports at NHL.com.

In most of these other fields, suggesting that your powers of observation and gut feeling were a better method for prediction than mathematical models using a vast database would get you laughed out of a job.

In hockey, it’s the only commonly accepted practice — and I have only one question. Why?

  • Hippy

    Ender wrote:

    Comparing hockey stats to anything in the hard or soft sciences is apples and oranges.

    I'm going to disagree with you a bit here, because even though the fields are vastly different, much of the math will still cross over.

    Take chaos theory – the same mathematical process is used to study epilepsy, the stock market, and weather patterns. I mean, the fields couldn't be more different, but the math crosses over.

    Now, we aren't talking about anything as complex as chaos theory, but it would seem to me that the same processes that allow us to account for multiple variables in a complex (but very observable) environment would translate across the fields, no?

  • Hippy

    @Jonathan

    I know I muddled a few points together, but the thing is that hockey is too fluid to deal with the kind of statistical analysis that you're talking about. Every one of your examples is "Let's roll the dice and see how they land." The issue is that in hockey they often don't land. I mean, we're talking about a quantum-mechanical level of complexity here. At best you're going to be able to work with probabilities, and even at that, the only meaningful thing you're going to be able to talk about is how players play as a team, due to scope.

    And that's assuming players are intelligent robots. What I mean by that is that quanta only act as quanta. They don't have "off-days" or get "psyched out" by a goalie. They are perfect players. Hockey players are not. Without interviewing/knowing the players or the room dynamic you're never going to know the validity of your assumptions. If Torres seems to be a streaky player, can we reasonably predict how long the streak will be? How many games? How many games on average? If we boil it to a macroscopic level (as to not look at the causes *for* the streak) we might be able to find those averages. But what exactly do they tell us? Presumably he doesn't have bipolar disorder on a strict schedule, so it would likely be things going right and wrong in his life. Can we assume his life to be reasonably static with barrels of money and a revolving door of teammates and cities? Probably not.

    I get what's going on in the oilogosphere, I really do. Just when I think about it at any sort of analytical level, I just wonder what the purpose is. People have said they want to "understand the game" better. The issue is that there are so many extenuating circumstances that we don't know about. I mean, take the "Moreau waved Pouliot out of the circle" thing or the "Garon hates Peeters" thing. At the level of study that's being done, you need reasons to make it make tangible sense, and more often than not, huge stretches are being made.

  • Hippy

    Jonathan Willis wrote:

    Ender wrote:

    Comparing hockey stats to anything in the hard or soft sciences is apples and oranges.
    I’m going to disagree with you a bit here, because even though the fields are vastly different, much of the math will still cross over.
    Take chaos theory – the same mathematical process is used to study epilepsy, the stock market, and weather patterns. I mean, the fields couldn’t be more different, but the math crosses over.
    Now, we aren’t talking about anything as complex as chaos theory, but it would seem to me that the same processes that allow us to account for multiple variables in a complex (but very observable) environment would translate across the fields, no?

    I think this is where we're getting held up. I'm saying that the dynamics of hockey are every bit as complicated as chaos theory and Quantum Mechanics.

  • Hippy

    Ender wrote:

    Jonathan Willis wrote:

    Ender wrote:
    Comparing hockey stats to anything in the hard or soft sciences is apples and oranges.
    I’m going to disagree with you a bit here, because even though the fields are vastly different, much of the math will still cross over.
    Take chaos theory – the same mathematical process is used to study epilepsy, the stock market, and weather patterns. I mean, the fields couldn’t be more different, but the math crosses over.
    Now, we aren’t talking about anything as complex as chaos theory, but it would seem to me that the same processes that allow us to account for multiple variables in a complex (but very observable) environment would translate across the fields, no?

    I think this is where we’re getting held up. I’m saying that the dynamics of hockey are every bit as complicated as chaos theory and Quantum Mechanics.

    I should add that while it's as complicated as your examples, it also has a psychological component which makes it that much more mathematically complex.

  • Hippy

    Ender wrote:

    Ender wrote:

    I should add that while it’s as complicated as your examples, it also has a psychological component which makes it that much more mathematically complex.

    And a physiological component. Both of which are non-linear variables far beyond outcomes that can be replicated.

  • Hippy

    David S wrote:

    Ender wrote:

    Ender wrote:
    I should add that while it’s as complicated as your examples, it also has a psychological component which makes it that much more mathematically complex.

    And a physiological component. Both of which are non-linear variables far beyond outcomes that can be replicated.

    Exactly.

  • Hippy

    JW, I'm going to take a wild stab at the title of your Masters thesis. Let me know if I'm right:

    A Subjective Analysis of the dietary Evolution of Polar Bears from B.C 7500 to Present

    😀 Am I even close?

  • Hippy

    @ Ender:
    Despite the earlier Avery comment, I'll step out here and say that a player with heart and spirit has a significantly better chance of looking favorable from a statistical perspective than one that does not. In other words, I can drive my score higher by loving what I do, even if I'm not particularly good at it. Sure, someone with God-given talent may do better than me. The reality is, however, that both of us are going to show up with good numbers because I simply want it more.

    I wish I had some statistics to back that up, but you're right that the quantification of spirit is very difficult.

  • Hippy

    @RLH

    I'm not really talking about "spirit" in the Rudy sense. I'm more talking about how much focus one has. It's one thing to love your job, but it's another to bring your A game when your marriage is falling apart (for example). Some people are better at dealing with stress than others, so I see the overlap between that and what you're calling spirit, but I don't think they're the same thing.

    And you're right. There's something to be said for "who wants it more."

  • Hippy

    RLH wrote:

    JW, I’m going to take a wild stab at the title of your Masters thesis. Let me know if I’m right:
    A Subjective Analysis of the dietary Evolution of Polar Bears from B.C 7500 to Present
    Am I even close?

    And I've been struggling with the title. Thanks!

  • Hippy

    @ Ender:

    Certainly if you want to predict it with game-in, game-out regularity we're talking an incredibly complex system, despite the fact that it's simple on the surface. But is that level of precision necessary?

    Basically, I view being a general manager (or a business owner or whatever) as making a series of smart bets. Say, for example, we knew that a European defenseman playing a certain level of opposition and producing certain numbers relative to his team would translate well to the NHL 70% of the time; that would be a bet worth making.

    Then there are other things – look at the contract Philly signed Daniel Briere to. Guys generally get paid on offense and reputation; obviously as a GM you want to target guys who are undervalued, so paying for somebody who brings much more to the table than offense would seem to be the way to go, and some of the advanced statistics can help us there.

    You want to know which bets are good ones to make – and I think advanced stats can tell you that.

    Now, I apologize if that didn't make sense; I just got home from hockey and I fear that I'm not coherent right now.
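
    The "smart bets" idea above can be put into rough numbers. What follows is only a sketch: the 70% figure comes from the comment, while the dollar values and the function name `expected_value` are made up for illustration and do not describe any real player or contract.

```python
def expected_value(p_success: float,
                   value_if_success: float,
                   value_if_failure: float,
                   cost: float) -> float:
    """Expected net value of a bet that pays off with probability p_success."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    gross = p_success * value_if_success + (1.0 - p_success) * value_if_failure
    return gross - cost


# Hypothetical signing, all figures in millions of dollars (assumed, not real):
# a 70% chance the player is worth 3.0, otherwise worth 0.5, on a 1.0 contract.
ev = expected_value(p_success=0.70, value_if_success=3.0,
                    value_if_failure=0.5, cost=1.0)
print(f"Expected net value: {ev:.2f}M")
```

    A positive result does not mean the signing works out every time, only that a GM who keeps making bets like it should come out ahead over many signings.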

  • Hippy

    Deans wrote:

    Contrary to public belief and practice, some things in life just aren’t quantifiable. How do you put a statistic on an important facet like desire and heart?

    Statistics are a vital asset for evaluation and prediction; however, we would be sticking our heads in the sand if we were to believe they told the whole story.

    I agree with these statements 9 times out of 10, with a margin of +/- 2%, 99 times out of 100.

  • Hippy

    If you’re using stats to confirm or deny your eyes, aren’t you running into the old correlation fallacy that if two things are correlated it “means” something? I mean, realistically there’s enough data out there to confirm just about anything to your eyes, depending on how rigorous (predictive) you want to be.

    Not necessarily. You know from talking to me personally that one of the things that bugs me the most about some of the stats stuff done in the Oilersphere is treating the Pearson correlation as some sort of all-purpose Truth Detector. The most you're ever going to get out of any of these numbers, I suspect, is correlative in quality, and so the most you can imply is a relationship, rather than direct cause and effect, but I still think that they can at least tell you something about how a player's performed in the past, if not necessarily how they'll perform in the future (though you can make intelligent guesses, no doubt). I don't think there's too big of a logical leap from looking at a guy's GD and SD on the ice and his team's GD and SD without him and concluding that he's driving things in one direction or the other, especially since it's something you can go back and see for yourself. ("Hey look, that Horcoff guy is pretty good defensively.") If anything, it's QualComp-type stats and indices of zone time that are a bloody mess, though I respect the idea they're trying for, at least.
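
    The on-ice versus off-ice comparison described above could be sketched roughly as follows. This assumes "GD" means goal differential and that everything is normalized per 60 minutes; the function names and all of the totals are invented for illustration, not real team data.

```python
def per_60(differential: float, minutes: float) -> float:
    """Scale a raw differential to a per-60-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return 60.0 * differential / minutes


def on_off_delta(on_gf: int, on_ga: int, on_minutes: float,
                 team_gf: int, team_ga: int, team_minutes: float) -> float:
    """Goal differential per 60 with the player on the ice, minus the
    team's goal differential per 60 with him off the ice."""
    off_gf = team_gf - on_gf
    off_ga = team_ga - on_ga
    off_minutes = team_minutes - on_minutes
    return per_60(on_gf - on_ga, on_minutes) - per_60(off_gf - off_ga, off_minutes)


# Made-up season totals for a hypothetical player and team.
delta = on_off_delta(on_gf=40, on_ga=30, on_minutes=1000,
                     team_gf=150, team_ga=160, team_minutes=4000)
print(f"On/off goal differential per 60: {delta:+.2f}")
```

    As the comment says, a number like this suggests a relationship rather than proving cause and effect; it ignores linemates, opposition, and sample size.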

    Statistics are a vital asset for evaluation and prediction; however, we would be sticking our heads in the sand if we were to believe they told the whole story.

    No one method of analysis will tell you everything in a complex system with a large number of confounding variables. For example, I don't think I've seen any definitive statement on whether Staios or Smid is "the problem" on the third pairing, or if it's the forwards that are causing trouble for both of them. In any case, you're never going to be right 100% of the time when making predictions, because as Ender points out, the system you're talking about is at quantum-level complexity. I never understood how anyone could say hockey was less complex than baseball, because hockey is a fluid system of ten frequently-changing moving parts and two semi-stationary ones, whereas baseball is a discrete system with mostly-consistent starting conditions, but I digress.

    And a physiological component. Both of which are non-linear variables far beyond outcomes that can be replicated.

    Well, the physiological variables can probably be modelled a lot more readily than the psychological ones. The psychological model is another hugely complex system involving a good chunk of the brain, which we still don't have the technology to map and model at the single-neuron level (though we're getting there), and even if we could, it's a system in flux, i.e. it continues to change with time. The physiological model, at least, is largely mechanical, and there's an entire field (which I'm currently studying) devoted to studying biological systems from the perspective of a mechanical engineer. There's still a ton of variables there, but I suspect they're a little more manageable, because we understand a lot more of the underlying interactions of exercise physiology than we do psychology. While there are still some aspects of physiology that don't quite match our models neatly, like Newtonian physics, it's usually good enough for everyday purposes. Hell, we still use the Hill equation to model the force-velocity relationship of muscle some 70 years after his paper was first published in 1938.

    I've talked about both components in the past at some length, and I think both have to be considered in the discussion. That being said, I do wonder if certain things show up over time as trends. For example, one of the things I'd like to see at some point is some more data on the leading-trailing-tied splits, and who scores the "true" GWG: the one that breaks the tie for good, not simply the "Opponent + 1" goal. I think that would be a better measure of "clutchness" (a concept many Statzis are allergic to, despite the very clear scientific evidence that such a thing exists from a psychological point of view, probably because it's too complex to realistically model, particularly with very little access to the most vital data), and tell us something new about some players. I suspect Hemsky would have a pretty decent "clutch" rating. I suspect "heart" and "desire," as well as better fitness, would naturally manifest themselves in those that are more consistent and more "clutch."
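
    The "true" GWG described above can be pinned down in a few lines. This is a sketch of one reading of the definition (the last goal scored while the game was tied, in a game that does not end tied), set beside the "Opponent + 1" version. The team codes, scorer labels, and goal sequence are invented for the example.

```python
from typing import List, Optional, Tuple

Goal = Tuple[str, str]  # (team, scorer), listed in the order the goals were scored


def true_gwg(goals: List[Goal], home: str, away: str) -> Optional[Goal]:
    """The goal that breaks a tie for good: the last goal scored while tied."""
    score = {home: 0, away: 0}
    last_tiebreaker = None
    for team, scorer in goals:
        if score[home] == score[away]:
            last_tiebreaker = (team, scorer)
        score[team] += 1
    if score[home] == score[away]:
        return None  # game ended level, so nobody broke the tie for good
    return last_tiebreaker


def opponent_plus_one_gwg(goals: List[Goal], home: str, away: str) -> Optional[Goal]:
    """The 'Opponent + 1' goal: the winner's goal number (loser's total + 1)."""
    totals = {home: 0, away: 0}
    for team, _ in goals:
        totals[team] += 1
    if totals[home] == totals[away]:
        return None
    winner = home if totals[home] > totals[away] else away
    target = min(totals.values()) + 1
    count = 0
    for team, scorer in goals:
        if team == winner:
            count += 1
            if count == target:
                return (team, scorer)
    return None


# Made-up 4-2 game: EDM goes up 1-0, CGY ties it, then EDM leads the rest of the way.
game = [("EDM", "A"), ("CGY", "X"), ("EDM", "B"),
        ("EDM", "C"), ("CGY", "Y"), ("EDM", "D")]
print(true_gwg(game, "EDM", "CGY"))               # scorer B broke the 1-1 tie for good
print(opponent_plus_one_gwg(game, "EDM", "CGY"))  # scorer C scored EDM's third goal
```

    In this invented game the two definitions credit different players, which is the distinction the comment is after.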

  • Hippy

    Jonathan Willis wrote:

    @ Ender:
    Certainly if you want to predict it with game-in, game-out regularity we’re talking an incredibly complex system, despite the fact that it’s simple on the surface. But is that level of precision necessary?
    Basically, I view being a general manager (or a business owner or whatever) as making a series of smart bets. Say, for example, we knew that a European defenseman playing a certain level of opposition and producing certain numbers relative to his team would translate well to the NHL 70% of the time; that would be a bet worth making.
    Then there are other things – look at the contract Philly signed Daniel Briere to. Guys generally get paid on offense and reputation; obviously as a GM you want to target guys who are undervalued, so paying for somebody who brings much more to the table than offense would seem to be the way to go, and some of the advanced statistics can help us there.
    You want to know which bets are good ones to make – and I think advanced stats can tell you that.
    Now, I apologize if that didn’t make sense; I just got home from hockey and I fear that I’m not coherent right now.

    I'm not really saying that it's impossible to get the answers that you want on any scale. Faceoff numbers, for example, are reasonably repeatable. It's also entirely possible that some players *will* be consistent enough to hold up to this kind of analysis. It just won't be all of them.

    And here's me bringing my biases in, but I'd love to see some sort of statement of intent once in a while. People present information, and sure, it's interesting, but there is generally no conclusion. There is no indication of what scale or scope people are working on or what they're trying to get at with their writing.

    Take the LeCav piece for example. It's all well and good to put up stats that indicate that divisional play makes a big difference with regards to output. LeCav + St Louis will likely be better than Hemsky + Horcoff most days, but their stats will always be higher due to the divide between the skill levels of the divs. At that point it's easy to call it obvious to anyone who's watching the game, and move on. That said, it's important to test the obvious, if you have any numbers to back it up.

    The issue is that at that point, what are you really trying to say? I mean, there have been all sorts of rumors of dissent in the Bolts' locker room. Coaching and management have always been a problem. Plus, these players have played against soft opp for so very, very long. It's entirely possible that LeCav would flourish in the NW because it would be a challenge again (and some people like challenges) and entirely possible Hemsky would flounder in Tampa due to the lack of challenge. The only way you're going to be able to know that is by knowing the character of the players, which bloggers won't.

    Now, your question was "why?" so here's the answer. Anyone with eyes can see that both LeCav and Hemsky are top-level talent, regardless of who is better on paper. That said, GMs can actually talk to players and ex-coaches, etc., to get to know the player. The "seen him good" crowd has a point in that if you're paying attention you can at least divide players up into tiers. From there, sure, you could crunch the numbers, but since we're talking about betting, I bet you that talking to the personnel would be more informative than the numbers 99 times out of 100.

  • Hippy

    Ender wrote:

    From there, sure, you could crunch the numbers, but since we’re talking about betting, I bet you that talking to the personnel would be more informative than the numbers 99 times out of 100.

    Really? 99 times out of 100? Then why do NHL franchises consistently make stupid decisions?

    Look at off-season free agency. Daniel Briere's widely hyped, signed to a contract that never ends, and hailed as a difference maker. Then he doesn't live up to expectations – from QualComp, we know he was playing sheltered minutes in Buffalo. That matters.

    Turns out, there are a whole bunch of players who are hailed as "difference makers" but then fail to live up to expectations – and using only QualComp, we should know better.

  • Hippy

    Jonathan Willis wrote:

    Ender wrote:

    From there, sure, you could crunch the numbers, but since we’re talking about betting, I bet you that talking to the personnel would be more informative than the numbers 99 times out of 100.
    Really? 99 times out of 100? Then why do NHL franchises consistently make stupid decisions?
    Look at off-season free agency. Daniel Briere’s widely hyped, signed to a contract that never ends, and hailed as a difference maker. Then he doesn’t live up to expectations – from QualComp, we know he was playing sheltered minutes in Buffalo. That matters.
    Turns out, there are a whole bunch of players who are hailed as “difference makers” but then fail to live up to expectations – and using only QualComp, we should know better.

    Because people aren't doing their research. Are you really telling me that if they talked to the personnel in a responsible way and actually watched when Briere was on the ice they wouldn't notice the level of competition?

    I mean, you're assuming people aren't lazy here.

  • Hippy

    Once each game is taped and information for each player on the ice (and the puck) is tracked, serious progress will be made – not that there hasn't been huge progress already.

    As more and more information becomes available, mathematical models are going to become a huge competitive advantage.

    Jonathan, let's go apply for jobs at the Oilers 😉

  • Hippy

    Ender wrote:

    Because people aren’t doing their research. Are you really telling me that if they talked to the personnel in a responsible way and actually watched when Briere was on the ice they wouldn’t notice the level of competition?

    If it was just Briere, but every summer there are a ton of guys who get picked up this way – and honestly, if you're gonna bet 65 million or whatever the number on Briere's contract is, you do your research.

    They aren't picking up a guy for their fantasy roster – it's their job and it's a truckload of real money that's on the line.

    I don't mess around at work with $1,000 on the line; forget about with 65 million. I don't buy the explanation that they're all lazy – I'd suggest that they just don't know who he's playing against.

  • Hippy

    Jonathan Willis wrote:

    I don’t mess around at work with $1,000 on the line; forget about with 65 million. I don’t buy the explanation that they’re all lazy – I’d suggest that they just don’t know who he’s playing against.

    And I'd suggest that if they don't know who he's playing against, they're not doing their research.

    Look, I'll give you Briere. But for every Briere there are a dozen Hejdas who come out of nowhere (I say this, certain that you'll pull numbers from his euro league days, but I still think it's beside the point). I'm not saying stats can't help that. I'm saying that stats *cannot* tell the whole story. However, between "seen him good" and access to people around the player, not using stats *can* tell the whole story. Not saying it definitely will or *should* but I still think it *can*.

    Whether or not people use the tools at their disposal for the best result is another story altogether.

  • Hippy

    Ender wrote:

    But for every Briere there are a dozen Hejdas who come out of nowhere (I say this, certain that you’ll pull numbers from his euro league days, but I still think it’s beside the point).

    Is that really beside the point? I would say a gem toiling in a Euro league is EXACTLY where the "seen him good" approach will fail, simply for lack of resources.

    It's both easier and cheaper to pay an intern to put all the numbers in a spreadsheet and flag the guys who float to the top.

    Hejda only "came out of nowhere" if you didn't know where to look.
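
    The "spreadsheet that flags the guys who float to the top" could look something like the sketch below. The ranking metric (a player's points as a share of his team's goals), the minimum-games cutoff, and every name and number here are assumptions for illustration only; a real screen would need league-quality and ice-time adjustments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlayerLine:
    name: str
    games: int
    points: int
    team_goals: int  # goals scored by the player's team over the same games


def flag_candidates(lines: List[PlayerLine],
                    min_games: int = 30,
                    top_n: int = 3) -> List[PlayerLine]:
    """Rank eligible players by points as a share of team goals; keep the top few."""
    eligible = [p for p in lines if p.games >= min_games and p.team_goals > 0]
    ranked = sorted(eligible, key=lambda p: p.points / p.team_goals, reverse=True)
    return ranked[:top_n]


# Invented stat lines for four hypothetical European-league defensemen.
rows = [
    PlayerLine("D-A", games=50, points=25, team_goals=150),
    PlayerLine("D-B", games=48, points=30, team_goals=120),
    PlayerLine("D-C", games=10, points=9, team_goals=30),   # too few games to trust
    PlayerLine("D-D", games=52, points=12, team_goals=160),
]
shortlist = flag_candidates(rows, min_games=30, top_n=2)
print([p.name for p in shortlist])
```

    The output is only a shortlist of players to go and watch, not a verdict on any of them.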

  • Hippy

    Mike wrote:

    Ender wrote:

    But for every Briere there are a dozen Hejdas who come out of nowhere (I say this, certain that you’ll pull numbers from his euro league days, but I still think it’s beside the point).
    Is that really beside the point? I would say a gem toiling in a Euro league is EXACTLY where the “seen him good” approach will fail, simply for lack of resources.
    It’s both easier and cheaper to pay an intern to put all the numbers in a spreadsheet and flag the guys who float to the top.
    Hejda only “came out of nowhere” if you didn’t know where to look.

    I even said "I’m not saying stats can’t help that," if you had followed the quote for another sentence or two. All I'm arguing is that stats are not necessary. I'm not saying they're not useful. That said, there are so many mitigating factors that anyone who bets that kind of money only on stats will always be crazy. The house always wins, as they say. However, a scenario exists in which stats are not necessary and a better decision can be made than by using all the stats in the world, and that's in-person scouting.

    Sorry, but you're arguing against a straw man, and it is beside the point.

  • Hippy

    Hejda is also a strange example because the guys who watched him and related to him all year (Oilers management) didn't think him worth re-signing (or, at least, they didn't re-sign him), while CLB did, to a 1 year, 1 mil contract.

    Meanwhile all the stats guys in the Oilers blogs were crying, hoping and praying that EDM would re-sign Hejda, as the stats they had compiled indicated that he was an underrated defenceman. And they were right.

  • Hippy

    @speeds

    Q – "In hockey, [Watching a player is] the only commonly accepted practice — and I have only one question. Why?"

    A – " I’m not saying stats can’t help that. I’m saying that stats *cannot* tell the whole story. However, between “seen him good” and access to people around the player, not using stats *can* tell the whole story. Not saying it definitely will or *should* but I still think it *can*.

    Whether or not people use the tools at their disposal for the best result is another story altogether."

    I'm not going to argue these straw man cases. I always end up trying to explain and re-explain myself in these threads, and I'm done with that. Here's another Q.

    Q – "When people use non-stats arguments, why do the stats people argue "but stats show that!" when it has absolutely nothing to do with what the person is even saying?"

  • Hippy

    What is a straw man about Hejda?

    Your contention is that by watching him play the Oilers should have known he was worth keeping? Admittedly everyone makes mistakes, but who saw him more than the Oilers?

  • Hippy

    It's not a matter of whether or not you think he was worth keeping. It's a matter of whether or not he fit into their long-term plans. Until you can tell me with any level of certainty exactly what they were thinking when they let him go, it's a straw man argument.

  • Hippy

    The problem with "saw him good" is that, as the British commercial goes, it does exactly what it says on the tin. You see him good one night, and don't see him again for another couple of months, what do you know? If you watch a guy every night, that's fine, but even then, it's easy to miss things. The stats are a tool to help fill in the blanks. Using only stats or only saw-him-good is, I think, a fool's errand. Using both things in their proper contexts, as well as things like talking to a player, his coaches, etc. (if you can) seems like the best way to get the whole story on a guy.

    Put another way, the problem with stats has less to do with whether they can tell you everything — rare and mistaken is the person who tells you so — and more to do with whether they tell you what they think they're telling you, and that's a usage and sample-size problem. The problem with saw-him-good is that it purports to do exactly that, and I don't believe that it can. Your eye can catch certain things the stats can't, just as the stats can catch certain things your eye and memory can't.

    As for Hejda, I was under the impression he refused to come back because of how he was treated (benched for most of the first half of the season). As for the stats, well, I saw him good all the way back in the pre-season that year, so make of that what you will.