Statistics Are Dumb

Yes, I wrote that title. Not only did I write it, but I mean it.

“Dumb” basically means lacking intelligence. Most NHL statistics, frankly, require no intelligence. Let’s look at the complicated math involved in the basic statistics on NHL.com’s summary page.

  • Goals: Watching and counting
  • Assists: Watching and counting
  • Points: Adding goals and assists (or, alternately, watching and counting)
  • Plus/Minus: Watching and counting
  • Penalty minutes: Watching and counting
  • Power play goals: Watching and counting
  • Shorthanded goals: Watching and counting
  • Game-winning goals: Basic addition, watching and counting
  • Overtime goals: Watching and counting
  • Shots: Watching and counting
  • Shooting percentage: Basic division, watching and counting
  • Time on ice: Watching and counting
  • Shifts per game: Watching and counting
  • Face-off percentage: Basic division, watching and counting

Basically, if you’re capable of turning on your TV and counting things, you can create almost any NHL statistic from scratch. If you’re capable of doing that and then later using the division key on a calculator or computer, you can create any NHL statistic from scratch. I’ve listed a bunch above, but they’re all basically the same – goalie stats involve counting shots, goals and minutes played, real-time statistics all consist entirely of counting, and so on.
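
For anyone who wants to see how little math is involved, here is a minimal Python sketch of the "count, then divide" statistics in the list above. The function name `percentage` and the shooting numbers are made up for illustration; the 7-of-10 faceoff figure is the Eric Belanger example used later in this post.

```python
def percentage(successes: int, attempts: int) -> float:
    """Return successes as a percentage of attempts (0.0 if nothing was attempted)."""
    if attempts == 0:
        return 0.0
    return 100.0 * successes / attempts


# Hypothetical shooter: 12 goals counted on 96 shots counted.
shooting_pct = percentage(12, 96)

# Faceoffs: 7 wins counted out of 10 draws.
faceoff_pct = percentage(7, 10)

print(f"Shooting percentage: {shooting_pct:.1f}%")  # Shooting percentage: 12.5%
print(f"Face-off percentage: {faceoff_pct:.1f}%")   # Face-off percentage: 70.0%
```

The only decision in there is what to do when a player has no attempts at all; returning zero is one arbitrary choice among several.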

What about all those fancy advanced statistics that get thrown around? Scoring chance percentage, Fenwick, Corsi, EVPTS/60 – those are more complicated, right?

No.

Scoring chances involve somebody watching the game and counting. Scoring chance percentage simply involves taking the number of good scoring chances (the ones for your team) and dividing it by the total number of scoring chances, for and against. In other words, if you know how to count and can press a division key on a calculator, you can have a firm grasp of this “advanced” statistic.
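
As a rough sketch of that division, assuming "good" chances means chances for the player's team while he is on the ice, the calculation looks like this in Python. The function name `chance_percentage` is hypothetical; the 35-for, 49-against totals are the Cam Barker numbers quoted further down in this post.

```python
def chance_percentage(chances_for: int, chances_against: int) -> float:
    """Share of all counted scoring chances that went the player's way, in percent."""
    total = chances_for + chances_against
    if total == 0:
        return 0.0
    return 100.0 * chances_for / total


# Cam Barker's on-ice totals as quoted in this post: 35 for, 49 against.
barker_pct = chance_percentage(35, 49)
print(f"{barker_pct:.1f}%")  # 41.7%
```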

What about Fenwick? Well, you take those shots and missed shots that somebody counted up, add the ones for your team, and subtract the ones against – just like plus/minus. Corsi is the same thing, except that it includes blocked shots as well.
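
Here is a small Python sketch of both numbers as differentials, which is how the plus/minus comparison reads. The `ShotCounts` class, the function names and all of the counts are invented for illustration and are not taken from any real game or data source.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Shot attempts counted for one side while a player is on the ice."""
    on_goal: int
    missed: int
    blocked: int


def fenwick(team: ShotCounts, opponent: ShotCounts) -> int:
    """Shots on goal plus missed shots, for minus against."""
    return (team.on_goal + team.missed) - (opponent.on_goal + opponent.missed)


def corsi(team: ShotCounts, opponent: ShotCounts) -> int:
    """Same as Fenwick, but blocked shots are counted as well."""
    team_total = team.on_goal + team.missed + team.blocked
    opponent_total = opponent.on_goal + opponent.missed + opponent.blocked
    return team_total - opponent_total


# Hypothetical counts for one player over a stretch of games.
team = ShotCounts(on_goal=30, missed=12, blocked=10)
opponent = ShotCounts(on_goal=25, missed=10, blocked=14)

print(fenwick(team, opponent))  # 7
print(corsi(team, opponent))    # 3
```

Addition, subtraction, and somebody willing to sit there and count.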

Points per 60 minutes of even-strength ice time (or EVPTS/60) is almost as simple – one takes all the points a player scored at even strength, divides them by his even-strength ice time in minutes, and multiplies by 60 to create a scoring rate. It is, once again, counting and pressing the divide key on a calculator. Pretty much as simple as can be.
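
A minimal Python sketch of that rate, assuming ice time is recorded in minutes. The function name `points_per_60` and the sample totals are hypothetical.

```python
def points_per_60(ev_points: int, ev_minutes: float) -> float:
    """Even-strength points scaled to a rate per 60 minutes of even-strength ice time."""
    if ev_minutes <= 0:
        return 0.0
    return 60.0 * ev_points / ev_minutes


# Hypothetical player: 20 even-strength points in 600 even-strength minutes.
rate = points_per_60(20, 600.0)
print(f"{rate:.2f} EVPTS/60")  # 2.00 EVPTS/60
```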

But let’s go back to scoring chances. In an article yesterday, I did something audacious – I added up scoring chances for and against for Oilers’ defensemen. In the comments section, Robin Brownlee jokingly advised one commenter (i.e. not me) to do the following:

Your only option is to watch the games and draw your own conclusions.

Personally, I think that’s a great idea for everyone. It’s a little obvious, perhaps, but still a great idea.

It is, after all, what I do. I look for specific things – which players play the best opponents, what part of the ice players start their shifts in, how often players help their team create a scoring chance, and how often players make mistakes that lead to chances against. As a rule, I try and get a gut feel for the game based on those things (others too, of course – which players take bad penalties, who wins faceoffs, etc.). Rather than watch the game multiple times and count those things up, I rely on others to do it – the NHL keeps track of a lot of these things (as mentioned above, by watching the game and counting) and people like Dennis King and Gabriel Desjardins catch the rest. I find that a firm number (e.g. Eric Belanger won 7 of 10 faceoffs) is better than my gut feeling (Eric Belanger wins a lot of faceoffs), so usually I’ll use the firm number instead of simply repeating my gut feeling. It’s the same thing with scoring chances – I know that Cam Barker’s getting heavily out-chanced by his opposition, but rather than say something like “man, that Cam Barker looks really bad” I’ll look up Dennis’ work and say “Cam Barker has been on the ice for 35 chances for and 49 against, which is one of the worst totals on the Oilers!” Afterward, rather than add “and he looks bad even though he’s got an easier job than other defensemen” I might use a number – like how many times he’s started shifts in the offensive zone, or how often he’s played the other team’s top line.

Of course, when I say “Barker has been on the ice for 35 chances for and 49 against” rather than “man, Cam Barker looks really bad,” someone comes along to tell me I should “watch the games.” I laugh, because it’s funny.

  • @ Clyde Frog:

    Gotcha – my bad for misunderstanding the question. I didn’t take it as an attack.

    Gabe Desjardins (behindthenet.ca, arcticicehockey.com) and Vic Ferrari (vhockey.blogspot.com), among others, do that sort of thing fairly regularly. My own statistical background is shallow enough that I need a lot of hand-holding to do complex statistical modeling, so I typically don’t do much of it myself.

  • Clyde Frog

    @ Willis,

    I may have been one of the guys giving you crap yesterday about nerding up the stats column…let’s be frank…everyone who posts on this site is at least a bit of a hockey nerd, and loves their hockey stats…I apologize for my douchie-ness…I guess I only like stats when they make my favorite players look good, I am far too biased! I guess my point yesterday is that there is no reason to jump on some guys because we had a 3 game losing streak, and 80% of Edmonton knows that 95% of stats can make the other 5% look like 50%…if you know what I mean.

  • @ Shredder:

    No problem.

    Just so you know, I don’t do much based on a team’s short-term record. Good teams have bad streaks and bad teams have good streaks all the time. So if there’s a three-game losing streak, I (as a rule) will intentionally ignore it, because it’s not long enough to tell us anything with certainty.

    When I’m writing, I write whatever strikes me as being of interest that day. Sometimes, it’s negative when the team’s playing poorly, but that’s usually just a result of chance rather than intent.

    • Clyde Frog

      So are all the advanced stats kiddies keeping their numbers close to their chest? The more I look, the less I can find about the underlying math they use to verify the numbers.

      I can find all the calculations to recreate their statistics, but have little interest in trying to verify the numbers and model them. (A very time-consuming task.) Just hoping you know whether they post them, because if you’re already doing the regression testing and more, Excel or whatever statistics tool you are using normally provides that information.

      There seems to be little information on how to determine outliers, significant observations, confidence and more. I get that the aforementioned information is kind of overkill, but it is kind of necessary for understanding how all these theories work together and impact each other.

      From my cursory knowledge and searches all I can seem to find is articles where the stats have been applied and conclusions drawn then argued about endlessly in the discussion threads.

      • SmellOfVictory

        The numbers repository is at behindthenet.ca, if that’s of interest to you. Don’t know what the exact method of collection is (shot totals could be easily enough grabbed from the NHL, but I’m not sure about blocked/missed shots).

        In terms of confidence interval, the sample size is pretty huge when it comes to shots over the course of a couple of seasons. I don’t have a strong background in statistics, but I don’t think a confidence interval is particularly relevant when you’re looking at such large samples; especially when you have precise (albeit somewhat fallible in terms of possible human error in shot recording, etc) data on the entire population.

        • Clyde Frog

          Depends on your definition of sample size; that’s what I am asking. If you’re taking a week, a month or 3 months’ worth of data, comparing it to a career and drawing statistical inference from it, there is a big issue between those sample sizes.

          So for those issues it really does matter, because what advanced stats wants to do is break things down and state how that performance is working versus the “norm”: will the future maintain this pattern, or is it an outlier?

          I hate the internet for taking out tone, so I hope that didn’t come off dickish.

          Just really interested in what amount of ice time or other stats you need to be even 80% confident you have the unbiased point estimate of the mean captured in your sample. Also what kind of qualifiers the advanced stats guys are using to ensure context isn’t lost when we dive at players.

  • I think it bears noting that your post title has a double meaning: stats are also dumb in that they do require proper interpretation. If someone throws out “this guy has a 3.5 corsi/60 rating”, that’s all fine and good, but there is no context to it whatsoever. Context (which can be provided by other stats to a great degree) and understanding of the implications of the statistics are the things that make advanced stats a little more complicated than simply counting.