Statistics Are Dumb

Jonathan Willis
November 16 2011 08:18PM

Yes, I wrote that title. Not only did I write it, but I mean it.

“Dumb” basically means lacking intelligence. Most NHL statistics, frankly, require no intelligence. Let’s look at the complicated math involved in the basic statistics on NHL.com’s summary page.

  • Goals: Watching and counting
  • Assists: Watching and counting
  • Points: Adding goals and assists (or, alternately, watching and counting)
  • Plus/Minus: Watching and counting
  • Penalty minutes: Watching and counting
  • Power play goals: Watching and counting
  • Shorthanded goals: Watching and counting
  • Game-winning goals: Basic addition, watching and counting
  • Overtime goals: Watching and counting
  • Shots: Watching and counting
  • Shooting percentage: Basic division, watching and counting
  • Time on ice: Watching and counting
  • Shifts per game: Watching and counting
  • Face-off percentage: Basic division, watching and counting

Basically, if you’re capable of turning on your TV and counting things, you can create almost any NHL statistic from scratch. If you’re capable of doing that and then later using the division key on a calculator or computer, you can create any NHL statistic from scratch. I’ve listed a bunch above, but they’re all basically the same – goalie stats involve counting shots, goals and minutes played, real-time statistics all consist entirely of counting, and so on.
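To show how little is involved, here is a minimal Python sketch of two of the “division” stats from the list. The function names and sample tallies are hypothetical. The formulas are the usual ones: goals over shots on goal, and face-offs won over face-offs taken. Returning 0.0 when nothing was counted is an arbitrary convenience choice made here.

```python
def shooting_percentage(goals: int, shots: int) -> float:
    """Goals divided by shots on goal, as a percentage."""
    if shots == 0:
        return 0.0  # arbitrary choice: no shots counted means 0%
    return 100.0 * goals / shots


def faceoff_percentage(won: int, taken: int) -> float:
    """Face-offs won divided by face-offs taken, as a percentage."""
    if taken == 0:
        return 0.0  # arbitrary choice: no face-offs counted means 0%
    return 100.0 * won / taken


# Hypothetical tallies from "watching and counting" a single game.
print(shooting_percentage(goals=1, shots=4))  # 25.0
print(faceoff_percentage(won=7, taken=10))    # 70.0
```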

What about all those fancy advanced statistics that get thrown around? Scoring chance percentage, Fenwick, Corsi, EVPTS/60 – those are more complicated, right?

No.

Scoring chances involve somebody watching the game and counting. Scoring chance percentage simply takes the number of scoring chances for (the good ones, from your team’s point of view) and divides it by the total number of scoring chances, for and against. In other words, if you know how to count and can press a division key on a calculator, you can have a firm grasp of this “advanced” statistic.
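A minimal Python sketch of scoring chance percentage, assuming it is defined as chances for divided by all chances (for plus against) while a player or team is on the ice. The function name and the numbers in the example are invented.

```python
def scoring_chance_percentage(chances_for: int, chances_against: int) -> float:
    """Share of all on-ice scoring chances that were 'for', as a percentage."""
    total = chances_for + chances_against
    if total == 0:
        raise ValueError("no scoring chances recorded")
    return 100.0 * chances_for / total


# Hypothetical: 12 chances for, 8 against.
print(scoring_chance_percentage(12, 8))  # 60.0
```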

What about Fenwick? Well, you take the shots on goal and missed shots that somebody counted up, add them together, and subtract the total against from the total for – just like plus/minus. Corsi is the same thing, except that it includes blocked shots as well.
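The same counting logic for Fenwick and Corsi, sketched in Python. This assumes the common definitions (Fenwick: shots on goal plus missed shots, for minus against; Corsi: the same with blocked attempts added in). The `ShotAttempts` class and the tallies are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempts:
    """Shot attempts counted for one side while a player was on the ice."""
    on_goal: int
    missed: int
    blocked: int  # attempts blocked before reaching the net


def fenwick(for_: ShotAttempts, against: ShotAttempts) -> int:
    """Unblocked attempts for minus unblocked attempts against."""
    return (for_.on_goal + for_.missed) - (against.on_goal + against.missed)


def corsi(for_: ShotAttempts, against: ShotAttempts) -> int:
    """Like Fenwick, but blocked attempts count too."""
    return fenwick(for_, against) + (for_.blocked - against.blocked)


# Hypothetical on-ice tallies.
ours = ShotAttempts(on_goal=10, missed=5, blocked=4)
theirs = ShotAttempts(on_goal=8, missed=6, blocked=7)
print(fenwick(ours, theirs))  # 1
print(corsi(ours, theirs))    # -2
```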

Points per 60 minutes of even-strength ice time (or EVPTS/60) is almost as simple: one takes all the points a player scored at even strength, divides them by his even-strength ice time in minutes, and multiplies by 60 to create a scoring rate. It is, once again, counting and pressing a few keys on a calculator. Pretty much as simple as can be.
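And EVPTS/60 as a Python sketch. It assumes ice time comes in minutes:seconds form, as TOI is commonly listed. The helper names and the 12 points in 480 minutes are made up for the example.

```python
def toi_to_minutes(toi: str) -> float:
    """Convert a 'minutes:seconds' ice-time string to decimal minutes."""
    minutes, seconds = toi.split(":")
    return int(minutes) + int(seconds) / 60.0


def points_per_60(points: int, toi_minutes: float) -> float:
    """Scoring rate: points per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("ice time must be positive")
    return points * 60.0 / toi_minutes


# Hypothetical: 12 even-strength points in 480:00 of even-strength ice time.
print(points_per_60(12, toi_to_minutes("480:00")))  # 1.5
```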

But let’s go back to scoring chances. In an article yesterday, I did something audacious – I added up scoring chances for and against for Oilers’ defensemen. In the comments section, Robin Brownlee jokingly advised one commenter (i.e. not me) to do the following:

Your only option is to watch the games and draw your own conclusions.

Personally, I think that’s a great idea for everyone. It’s a little obvious, perhaps, but still a great idea.

It is, after all, what I do. I look for specific things: which players play the best opponents, what part of the ice players start their shifts in, how often players help their team create a scoring chance, and how often players make mistakes that lead to chances against. As a rule, I try to get a gut feel for the game based on those things (others too, of course: which players take bad penalties, who wins faceoffs, etc.). Rather than watch the game multiple times and count those things up, I rely on others to do it. The NHL keeps track of a lot of these things (as mentioned above, by watching the game and counting), and people like Dennis King and Gabriel Desjardins catch the rest. I find that a firm number (e.g. Eric Belanger won 7 of 10 faceoffs) is better than my gut feeling (Eric Belanger wins a lot of faceoffs), so usually I’ll use the firm number instead of simply repeating my gut feeling.

It’s the same thing with scoring chances. I know that Cam Barker’s getting heavily out-chanced by his opposition, but rather than say something like “man, that Cam Barker looks really bad” I’ll look up Dennis’ work and say “Cam Barker has been on the ice for 35 chances for and 49 against, which is one of the worst totals on the Oilers!” Afterward, rather than add “and he looks bad even though he’s got an easier job than other defensemen” I might use a number, like how many times he’s started shifts in the offensive zone, or how often he’s played the other team’s top line.
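To make the Barker example concrete, here is a small Python sketch that turns the quoted on-ice chance counts (35 for, 49 against) into a differential and a percentage, alongside an offensive zone start percentage. The zone-start counts are invented placeholders, not Barker’s real numbers, and the zone-start formula (offensive-zone starts over offensive plus defensive-zone starts, ignoring neutral-zone starts) is one common convention, assumed here.

```python
def chance_summary(chances_for: int, chances_against: int) -> tuple[int, float]:
    """Return (differential, percentage of all chances that were 'for')."""
    total = chances_for + chances_against
    return chances_for - chances_against, 100.0 * chances_for / total


def offensive_zone_start_pct(oz_starts: int, dz_starts: int) -> float:
    """Share of non-neutral shift starts that began in the offensive zone."""
    return 100.0 * oz_starts / (oz_starts + dz_starts)


diff, pct = chance_summary(35, 49)  # on-ice chances quoted in the article
print(diff, round(pct, 1))          # -14 41.7

# Hypothetical zone-start counts, for illustration only.
print(offensive_zone_start_pct(oz_starts=60, dz_starts=40))  # 60.0
```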

Of course, when I say “Barker has been on the ice for 35 chances for and 49 against” rather than “man, Cam Barker looks really bad,” someone comes along to tell me I should “watch the games.” I laugh, because it’s funny.

Jonathan Willis is Managing Editor of the Nation Network. He also currently writes for the Edmonton Journal's Cult of Hockey, Grantland, and Hockey Prospectus. His work has appeared at theScore, ESPN and Puck Daddy. He was previously founder and managing editor of Copper & Blue. Contact him at jonathan (dot) willis (at) live (dot) ca.
#51 Clyde Frog
November 17 2011, 12:50PM
Jonathan Willis wrote:

@ Clyde Frog:

You're setting the confidence level awfully high at 80%.

Take goalies, for example - even-strength save percentage is the most repeatable year-to-year statistic (more so than wins, more so than GAA, more so than shutouts, etc.) and the year to year repeatability is below 50% with a single year of data (add up years and it jumps, but for one year it's fairly low).

Skater and team statistics are much better - measures like Fenwick correlate strongly with goals for/against, and goals for/against correlates strongly with wins - but at this point there's no silver bullet statistic.

Then again, by eye measures don't come close to hitting 80% either. I've spent some time looking at team predictions from the start of the year in various hockey magazines, and the best of the bunch routinely are in the 60% range (to clarify - that's each year, the leader will hit 60%. I haven't seen a magazine yet that can hit 60% every year) when it comes to predicting team finish, simply because there are so many variables to try and lock down.
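Year-to-year repeatability, as used in the comment above, is typically checked by correlating one season’s numbers with the next season’s across a group of players. Here is a minimal Python sketch of that using a plain Pearson correlation. The goalie save percentages are invented, so the printed value means nothing on its own, and whether “below 50%” refers to r or r² is not stated in the comment.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical even-strength save percentages for five goalies, two seasons.
season_1 = [0.925, 0.918, 0.931, 0.910, 0.922]
season_2 = [0.920, 0.924, 0.926, 0.915, 0.917]
print(round(pearson_r(season_1, season_2), 2))
```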

The confidence level (1 minus alpha, where alpha is the significance level) isn't a measure of the likelihood that an outcome will be right, but a measure of how confident YOU are that the data in the sample, and the model based on the sample, accurately represent the population data (the unknown).

When you set it to 80% in your calculations (handled wonderfully by Excel), you end up with parameters defining the size of your sample, the confidence that the event is captured within it, and so forth.
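For reference, here is how an 80% confidence level turns into an interval in practice, as a minimal Python sketch. The confidence level is 1 minus the significance level alpha, so 80% means alpha = 0.20 and the two-sided interval uses the standard normal’s 90th percentile (about 1.28). It uses the simple normal-approximation (Wald) interval for a proportion, which is an assumption here; other methods such as the Wilson interval behave better for small samples. The 915-saves-on-1,000-shots input is hypothetical.

```python
from statistics import NormalDist


def proportion_ci(successes: int, trials: int, confidence: float = 0.80) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    alpha = 1.0 - confidence                   # significance level
    z = NormalDist().inv_cdf(1.0 - alpha / 2)  # about 1.28 for 80%
    p = successes / trials
    half_width = z * (p * (1.0 - p) / trials) ** 0.5
    return p - half_width, p + half_width


# Hypothetical: 915 saves on 1,000 shots.
low, high = proportion_ci(successes=915, trials=1000)
print(f"{low:.3f} to {high:.3f}")
```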

Basically, what I was asking is whether the stats people are doing regression modelling, or anything else, on known populations to measure the statistical spread, standard deviation and so on, and from that determine what confidence they can put behind the models and measures they use.

I get what you're saying about the lack of a silver bullet, mainly because this ain't baseball and the events being measured are on a completely different scale of random variability.

But what quantitative results have been put forth? Correlation and correlation matrixes are nice for trend lines, but not for stating the confidence one holds in the statistical analysis.
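One standard way to attach a confidence statement to a correlation, rather than just reporting the trend line, is the Fisher z-transform. This Python sketch assumes a Pearson correlation computed from n independent pairs; the r = 0.45 from 30 goalies input is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r: float, n: int, confidence: float = 0.80) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)              # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2)
    # Build the interval on the z scale, then map back to the r scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical: r = 0.45 between two seasons for 30 goalies.
low, high = correlation_ci(0.45, 30)
print(f"{low:.2f} to {high:.2f}")
```

Fewer observations give a wider interval, which is the point about sample size being argued in this thread.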

I apologize if this comes off as an attack; it's not meant to be. I really just want to find out the theory behind the numbers. I sometimes worry that correlation-based statistical inference falls into the old trap of picking numbers to justify results rather than modeling the actual prediction.

#52 Romulus' Apotheosis
November 17 2011, 01:21PM

@Jonathan Willis

Thanks for the reply... good to know I was on the right track and didn't put anyone off.

I ask a lot of questions on here, often in a silly manner, but it's a genuine effort to solicit information. Like a lot of people who find themselves on here, I have never really thought much about hockey beyond "It's awesome!", and having knowledgeable people inform you is a real asset.

#54 Shredder
November 17 2011, 01:56PM

@ Willis,

I may have been one of the guys giving you crap yesterday about nerding up the stats column...let's be frank...everyone who posts on this site is at least a bit of a hockey nerd, and loves their hockey stats...I apologize for my douchie-ness...I guess I only like stats when they make my favorite players look good, I am far too biased! I guess my point yesterday is that there is no reason to jump on some guys because we had a 3 game losing streak, and 80% of Edmonton knows that 95% of stats can make the other 5% look like 50%...if you know what I mean.

#57 SmellOfVictory
November 17 2011, 02:48PM

I think it bears noting that your post title has a double meaning: stats are also dumb in that they do require proper interpretation. If someone throws out "this guy has a 3.5 corsi/60 rating", that's all fine and good, but there is no context to it whatsoever. Context (which can be provided by other stats to a great degree) and understanding of the implications of the statistics are the things that make advanced stats a little more complicated than simply counting.
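One way to add context to a raw “3.5 corsi/60” is to compare it with how the team does when that player is off the ice. This Python sketch assumes Corsi/60 means the on-ice Corsi differential (attempts for minus against) per 60 minutes, and defines a relative number as the on-ice rate minus the off-ice rate, similar in spirit to the relative Corsi figures sites like behindthenet report. The helper names and all counts are hypothetical.

```python
def per_60(count: int, minutes: float) -> float:
    """Convert a raw count into a rate per 60 minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 60.0 / minutes


def corsi_rel_per_60(on_cf: int, on_ca: int, on_minutes: float,
                     off_cf: int, off_ca: int, off_minutes: float) -> float:
    """On-ice Corsi differential per 60 minus the team's off-ice rate."""
    on_ice = per_60(on_cf - on_ca, on_minutes)
    off_ice = per_60(off_cf - off_ca, off_minutes)
    return on_ice - off_ice


# Hypothetical: +35 in 600 on-ice minutes (3.5/60) vs +60 in 1800 off-ice minutes (2.0/60).
print(corsi_rel_per_60(620, 585, 600.0, 1700, 1640, 1800.0))  # 1.5
```

A player at +3.5 per 60 on a team that is +2.0 per 60 without him looks far less special than the raw number suggests.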

#58 SmellOfVictory
November 17 2011, 02:49PM

*To clarify, the requirement for context means they're "dumb" because they can't do all the work for you when you simply read them off behindthenet.

#59 Clyde Frog
November 17 2011, 04:03PM

@Jonathan Willis

So are all the advanced stats kiddies keeping their numbers close to their chest? The more I look, the less I find about the underlying math they use to verify the numbers.

I can find all the calculations needed to recreate their statistics, but I have little interest in verifying the numbers and modelling them myself (a very time-consuming task). I'm just hoping you know whether they post that information, since anyone already doing regression testing in Excel or a similar statistics tool would normally get it as output.

There seems to be little information on how to determine outliers, significant observations, confidence and more. I get that the aforementioned information is kind of overkill, but it is kind of necessary for understanding how all these theories work together and impact each other.

From my cursory knowledge and searches, all I can seem to find are articles where the stats have been applied and conclusions drawn, then argued about endlessly in the discussion threads.

#60 SmellOfVictory
November 17 2011, 07:39PM
@Clyde Frog

The numbers repository is at behindthenet.ca, if that's of interest to you. Don't know what the exact method of collection is (shot totals could be easily enough grabbed from the NHL, but I'm not sure about blocked/missed shots).

In terms of confidence interval, the sample size is pretty huge when it comes to shots over the course of a couple of seasons. I don't have a strong background in statistics, but I don't think a confidence interval is particularly relevant when you're looking at such large samples; especially when you have precise (albeit somewhat fallible in terms of possible human error in shot recording, etc) data on the entire population.

#61 Clyde Frog
November 17 2011, 07:53PM

@SmellOfVictory

Depends on your definition of sample size; that's what I am asking. If you're taking a week, a month or 3 months' worth of data, comparing it to a career and drawing statistical inference from it, the gap between those sample sizes is a big issue.

So for those questions it really does matter, because what advanced stats wants to do is break things down and state how a performance compares with the "norm": will the future maintain this pattern, or is it an outlier?
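A rough way to ask “is this stretch an outlier?” is to treat a short sample as repeated trials at the player’s career rate and see how surprising the result would be. This Python sketch assumes shots are independent trials with a fixed success rate, which is a big simplification. The 8-goals-on-40-shots month against a 10% career shooting rate is hypothetical.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def z_score(successes: int, trials: int, baseline: float) -> float:
    """How many binomial standard deviations the result sits from expectation."""
    expected = trials * baseline
    sd = math.sqrt(trials * baseline * (1 - baseline))
    return (successes - expected) / sd


# Hypothetical month: 8 goals on 40 shots for a career 10% shooter.
print(round(z_score(8, 40, 0.10), 2))
print(round(binomial_upper_tail(8, 40, 0.10), 3))  # chance of 8+ goals by luck alone
```

With only 40 shots, even a doubled shooting percentage is not wildly improbable, which is exactly the short-sample problem raised here.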

I hate the internet for taking out tone, so I hope that didn't come off dickish.

I'm just really interested in what amount of ice time (or of other stats) you need to be even 80% confident that your sample captures an unbiased point estimate of the mean. Also, what kind of qualifiers are the advanced stats guys using to ensure context isn't lost when we dig into players?
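For the “how much data do you need” question, the usual back-of-the-envelope formula for estimating a proportion is n = z²·p(1 − p)/E², where E is the margin of error and z comes from the confidence level. This Python sketch applies it at 80% confidence. It counts shots rather than ice time (turning shots into minutes would need a shot rate, which isn't covered here), and the 10% shooting rate and ±2-point margin are hypothetical inputs.

```python
import math
from statistics import NormalDist


def required_sample_size(p_guess: float, margin: float, confidence: float = 0.80) -> int:
    """Trials needed so a Wald interval for a proportion has half-width <= margin."""
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must be between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2)  # about 1.28 at 80%
    return math.ceil(z * z * p_guess * (1.0 - p_guess) / margin**2)


# Hypothetical: pin a ~10% shooting percentage down to +/- 2 points at 80% confidence.
print(required_sample_size(0.10, 0.02))  # 370
```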
