Jim Corsi and His Statistic

Jonathan Willis
March 25 2009 12:42PM


Jim Corsi, who spent his entire NHL career with the Edmonton Oilers, is a rather interesting guy. He took an unusual road to the NHL, through Canadian university hockey. He spent three years with the Canadian Olympic soccer team during that time as well, before leaving soccer to focus solely on hockey.

Corsi played two seasons with the Quebec Nordiques before the World Hockey Association folded at the end of 1978-79. The next year, Corsi went 8-14-3 with a 3.65 GAA for the 1979-80 Edmonton Oilers, a team that featured 18-year-olds Mark Messier and Wayne Gretzky, along with a 20-year-old Kevin Lowe. They lost in the first round that season, although Corsi had already moved on, having been dealt to Minnesota for future considerations. He spent the next decade in Italy and represented the country at the world championships eight times.

Corsi holds a graduate degree in engineering, and speaks four different languages. He’s been the goaltending coach for the Buffalo Sabres since the 1997-98 season, and during his tenure the Sabres have been a reliable producer of NHL goaltenders (Ryan Miller, Martin Biron, Mika Noronen).

Keith Loria of NHL.com interviewed several NHL goalie coaches, including Corsi, for a January 29th piece, and Corsi talked a little about pregame preparation and about what he does with the Sabres’ minor-league goaltenders:

"On the road, you give your goalie an understanding of the surroundings, how the boards work, how the glass works, video and stats of the opposing players. I have to coordinate with the coaching staff and give a general idea on how the opposition prepares itself in the offensive side of the game and power plays."

"One of my duties is to work with our Portland team and we have various technical ways of following our goaltenders and draft picks playing in college. We'll look at videos and watch tape. We stay in contact with them as best we can. It's not always easy because you can see how the player is playing in a game but you don't always know the emotional part, so you try to be hands on with all of them."

I thought there were some interesting points in those quotes. Jim Corsi is best known for the statistic that bears his name, the Corsi number; and as a result he’s frequently mislabeled as someone who only uses numbers in his work. Such a label is obviously wrong: the statistics don’t tell you how pucks bounce off the glass or the boards. He also talks about the difficulties of managing players spread out across the country, saying that it’s difficult to coach when “you don’t always know the emotional part”.

Yet, from his quote we can also get an idea of the technical difficulties of following just the goaltenders already drafted by an organization. There are hundreds of players in the NHL, thousands more playing in Europe, the AHL and the ECHL. Then there are the players in various junior leagues around North America, and going the college route; even with a large scouting staff, it is difficult to keep tabs on everyone. The Sabres have resorted to doing the vast majority of their scouting via video, a decision that may end up making some sense – is it better to see a player a few times live or many times on video?

This is one of the real advantages of statistics over watching a player live: sample size. The short-hand for this is what I call the Brian Boucher effect. Boucher is an NHL journeyman; he’s currently the backup goaltender for the San Jose Sharks. He’s had some ups and downs in his career, occasionally taking over the starting job and sometimes getting stashed in the minors.

On December 31, 2003, Boucher, then with the Phoenix Coyotes, posted a shutout over the Los Angeles Kings. He posted another in his next game against the Dallas Stars. Over the next week, the Coyotes would beat the Carolina Hurricanes, Washington Capitals, and Minnesota Wild, with Boucher posting shutouts in all five games.

Now imagine, for a moment, that Boucher were a little-used junior goaltender, and the scout watching him caught just that five-game segment. Without statistical context, what assumption would likely be made?

That is the first difficulty. The second lies in trying to gauge all 18 skaters playing in a single game. It’s a difficulty that Gare Joyce ran into when he tagged along with Columbus scouts for his book Future Greats and Heartbreaks. Joyce attempted to do scouting reports on all of the players, and found his notes a confused and nearly useless mess. A long-time scout advised him to focus on just one player for an entire shift, following him exclusively, because that’s how he isolated players for his reports. Joyce did so, with superior results.

I’ve used that technique myself. I’m hardly a scout; I imagine that there are so many things picked up over the course of a lifetime in the game that a fan like me doesn’t start to comprehend. Still, by choosing one player and following him, you can get an excellent idea of his quality. On the other hand, such focus comes at a price; you miss much of what else is happening on the ice.

The combination of these two problems is where statistics become useful. Of course, they’re limited by some of the same problems: let’s say Player X spends 90% of his ice-time with offensive stars, and 10% with the rest of the team. In the 10% segment, his line scores 3 goals and allows 3. In the 90% segment, he puts up very good numbers playing with good players. But is he a good player, or is he being carried by his line-mates? It’s very difficult to tell, based on the numbers.

This is where the Corsi number comes in handy. It’s not a common statistic, so I’ll give a brief explanation. The NHL tracks shots on net, missed shots and blocked shots. The Corsi number is the total of all shots at net (incl. misses and blocks) for and against while a player is on the ice. To make it more accurate, this is often measured only at even-strength. Because the vast majority of shots come from the offensive zone, this statistic is a fairly good measure of who is spending a lot of time in the offensive zone, and who is getting stuck in the defensive zone.

Suddenly, that 10% segment is much bigger. Instead of 3 goals for and 3 goals against, we can expand it to 21 shots for, and 40 shots against. If we toss in missed shots and blocked shots, we could be looking at a Corsi of +60/-100, which tells us that our Player X was likely spending too much time in the wrong end of the rink. Over the course of an entire season, a first-line player is on ice for about 2000 shot attempts in one direction or the other, and even a fourth-line player generally sees around 500. Thus, this statistic gives us a big-picture view of which direction the play is going when any given player is on the ice.
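As a rough sketch of the arithmetic above, the Python snippet below adds up shots on net, missed shots and blocked shots in each direction. The 21 shots for and 40 against come from the example; the missed and blocked counts are invented purely so that the totals land on the +60/-100 figure, and the function name `corsi` is hypothetical rather than taken from any published tool.

```python
def corsi(shots, missed, blocked):
    """Total shot attempts in one direction: on net + missed + blocked."""
    return shots + missed + blocked


# Shots on net are from the Player X example; the missed and blocked counts
# are made-up placeholders chosen only so the totals match +60/-100.
attempts_for = corsi(shots=21, missed=18, blocked=21)
attempts_against = corsi(shots=40, missed=28, blocked=32)

# Net differential and the share of all attempts that went the right way.
differential = attempts_for - attempts_against
share_for = attempts_for / (attempts_for + attempts_against)

print(f"Corsi: +{attempts_for}/-{attempts_against} (net {differential:+d})")
print(f"Share of attempts for: {share_for:.1%}")
```

Under these assumed counts, only 37.5% of the shot attempts went toward the opposing net while Player X was on the ice, which is the "wrong end of the rink" signal the goal totals alone (3 for, 3 against) could not show.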

This isn’t to advocate turning control of a hockey team over to a shot-counting computer (although it probably wouldn’t do any worse than Doug MacLean or Mike Milbury). Jim Corsi, who created the statistic, talks about the importance of the emotional state of a player, and different visual variables like how pucks react along the boards of a given arena, and the same holds true for scouting. As one example, the numbers tell us that Ales Kotalik is an elite powerplay performer; but only by watching the games or video do we realize that he’s scoring goals by playing the left point. How a player gets his points is certainly a major consideration when acquiring him, and it’s only one of the things that the statistics don’t show well.

On the other hand, what the statistics do show is overall effectiveness. They don’t do a good job of showing the process, but they do an excellent job of showing the results. Sticking to the Kotalik example, by using various advanced statistics we know that at even-strength he’s been playing a third or fourth line role for years (and starting in the offensive zone more often than not), and that he really hasn’t produced much offensively relative to his ice-time 5-on-5. This is something that Oilers’ management either didn’t pick up on or chose to ignore; they put Kotalik on the top line despite a proven track record of not being a difference maker 5-on-5, and the results were predictable.

The other thing statistics do is catch stuff that has been missed, much in the way video does. Don Cherry was coaching when Roger Neilson started the push towards video, and he said something to the effect that he saw the game well enough from behind the bench and didn’t need to review tape. Now, every team in the league does - because coaches are human, and have human limitations. The scope of the game is too broad to catch everything in one go. As one quick example – is the Strudwick pairing deployed more frequently in the offensive or defensive zone? Without the numbers, I’d say defensive; Strudwick doesn’t put up points and has a solid reputation as a tough, physical, stay-at-home guy. In point of fact, no defenseman on the team is deployed more in the offensive zone (relative to ice-time) than Jason Strudwick – and that’s an important thing to know when evaluating his performance this season.

In short, statistics are a useful tool for talent evaluation; they don’t predict the future, but they show where and how a player has been used, and what his results have been in different circumstances – against different players, with different players, in different zones, with lots of icetime, with little icetime. They do it for the entire league, providing a broad picture for every player to spend significant time in the NHL. They can tell you who to watch and what to watch for; if a player is putting up negative results, video can be examined for the reason. They aren’t meant to replace visual observation; they’re meant to augment it, focus it and refine it.

As a final point, statistical analysis is frequently mocked by journalists and other fans as something that bloggers came up with and use because they don’t have access to the team, or as something that nobody who ever played the game would use. In reality, statistics have been developed by experienced NHL personnel, and are used at that level. We’re just trying to catch up to the things done by people like Jim Corsi, Ron Wilson, Roger Neilson, and the like – observing and copying a trend, not starting one.

Jonathan Willis is a freelance writer. He currently works for Oilers Nation, the Edmonton Journal and Bleacher Report. He's co-written three books and worked for myriad websites, including Grantland, ESPN, The Score, and Hockey Prospectus. He was previously the founder and managing editor of Copper & Blue.
#1 Archaeologuy
March 25 2009, 12:52PM

Epic entry Willis, thanks for the first installment in the Willictionary of Statistics

#2 442Junkie
March 25 2009, 01:00PM

Were Jim Corsi and Eugene Levy separated at birth? Other than that I've got nothing.

#3 Phil
March 25 2009, 02:12PM

Thanks for the excellent write up Jon.

Great piece by you. Education at work.

Someone cue "The More You Know" pic from those commercials.

I'm trying not to hate statistics & math so much anymore.

#4 Chris
March 25 2009, 02:14PM

Great post Willis. I get the feeling that the current management regime often "goes with their gut" as opposed to taking a more measured approach when it comes to personnel/lineup decisions. Lowe and MacTavish have been able to draw on their years of first-hand experience, and undoubtedly have a high level of insight... So their GUT calls have often been very sound... I wonder, however, as the years pass, and the game continues to evolve, and the players change: are MacT and Lowe beginning to lose their touch? They have made some pretty bizarre decisions recently... They seem to have a habit of blatantly disregarding good data, making moves that haven't worked out at all. We have both repeatedly cited the Kotalik example...

#5 Ender
March 25 2009, 02:18PM

Sidenote: My history might be bad, but I thought Vic actually developed the Corsi stat based on an offhand quote from Mr. Corsi himself.

More to the point, I'm still not sold on the Corsi concept. Sometimes teams get outshot because goalies give up a lot of rebounds. Sometimes they get outshot because the opposing team is taking a bunch of garbage shots from the outside. Sometimes they're being outshot because the opposing team is playing catch-up. You say Corsi shows results, but what results exactly? How many shots for/against?

Now, you go on to say that stats are to refine video, which is something I'm personally fine with. When your "journalists and other fans" are mocking stats as something that bloggers came up with to compensate for not being in the game, you're IMO missing the point of those journalists and other fans. Stats on the 'sphere, as a general rule, are used *instead* of video. That removes context, and really anything can be proven by presenting the right numbers in the right order.

I mean, hell, the 'sphere would get a lot further with the people you seem to be arguing against by posting youtube clips of plays and then backing it up further with stats.

At the end of the day, it's the old Math vs Physics debate. Math is always correct, but it doesn't necessarily represent the real world. Physics isn't ever entirely correct, but it is a pretty good approximation of the real world. The issue comes in when the people working Math present it as though it's Physics.

#6 Jonathan Willis
March 25 2009, 02:30PM

Ender wrote:

Sidenote: My history might be bad, but I thought Vic actually developed the Corsi stat based on an offhand quote from Mr. Corsi himself.

Behind the Net credits Jim Corsi, and IIRC Lindy Ruff discussed this in an interview somewhere. If Vic did in fact develop it, I'm sure he'll chime in and let me know.

#7 Mike
March 25 2009, 02:31PM

Ender wrote:

Sometimes they get outshot because the opposing team is taking a bunch of garbage shots from the outside. Sometimes they’re being outshot because the opposing team is playing catch-up. You say Corsi shows results, but what results exactly?

As per Willis, that would tell us "that our Player X was likely spending too much time in the wrong end of the rink."

It's a rough indicator of zone time, which the NHL does not track. Sure, once in a while you could get fewer shots in 12 minutes of offensive time than the Red Wings get in 8 minutes in your own end, but those will be the outliers, and over the course of a season they will smooth out to be less of a blip.

#8 Mike
March 25 2009, 02:32PM

And I should say "attempted shots", not just shots.

#9 Jonathan Willis
March 25 2009, 02:35PM

Ender wrote:

Sometimes teams get outshot because goalies give up a lot of rebounds. Sometimes they get outshot because the opposing team is taking a bunch of garbage shots from the outside.

To your first point, I'd suggest that adding in missed shots and blocked shots adds some clarity; it's fairly rare that rebounds result in either of those two numbers.

To your second point, I completely agree. I've made the mistake in the past of using straight Corsi from team to team, and that's wrong; Detroit for example has always been a shot-happy outfit.

On the other hand, these factors should be consistent throughout the roster, and when faceoffs are added in Corsi still gives us a fairly accurate - in broad strokes, mind you - picture of who is spending time in which zone.

#10 Jonathan Willis
March 25 2009, 02:35PM

Mike wrote:

It’s a rough indicator of zone time, which the NHL does not track.

Yes - if the NHL did track it, Corsi would be largely superfluous.

#11 Jonathan Willis
March 25 2009, 02:41PM

Ender wrote:

When your “journalists and other fans” are mocking stats as something that bloggers came up with to compensate for not being in the game, you’re IMO missing the point of those journalists and other fans. Stats on the ’sphere, as a general rule, are used *instead* of video. That removes context, and really anything can be proven by presenting the right numbers in the right order.

In a vacuum, perhaps. But the folks who follow the Oilers in the 'sphere do watch the games, as do the people who read the blogs. The video is an influence; we've all formed our opinions through a combination of watching and statistical analysis, whether we admit it or not - at least in reference to our own team.

As for those "other fans", would you agree with me that there is a wide disparity in a fan's ability to judge a game? Look at David Staples' player ratings as an example - there's very rarely a consensus when they're discussed. Not that long ago, there was a fellow on here arguing that Ales Hemsky is a perimeter player - something that's clearly untrue to a moderately competent observer. Mileage is always going to vary on personal observation, whereas statistics are universal. A 3 is a 3 is a 3, regardless of whether John Smith thought it was more of an 8. Statistics help serve as a common point of reference between fans - you can argue with me that, say, Ales Hemsky had a good or bad game, but we can both agree that he was a -2 last night.

#12 Jonathan Willis
March 25 2009, 02:43PM

Ender wrote:

I mean, hell, the ’sphere would get a lot further with the people you seem to be arguing against by posting youtube clips of plays and then backing it up further with stats.

The other day I put up all those videos of Ales Kotalik scoring from the point, and I still have people arguing with me that I'm out to lunch on him.

Besides which, scouting by youtube video is the worst possible kind of sample contamination; it's a fraction of the game, generally without context, and a tiny sample of a player's season. Even 5 or 6 clips fail to represent even a drop in the bucket.

#13 Ender
March 25 2009, 02:46PM

Jonathan Willis wrote:

Mike wrote: It’s a rough indicator of zone time, which the NHL does not track.

Yes - if the NHL did track it, Corsi would be largely superfluous.

If you're using it as an indicator of zone time, and you have the stats to count shots and (presumably) the time that they're taken, why wouldn't you do something like this instead?

Data:

Player on team A takes a shot at 10:52
Player on team A takes a shot at 10:54
Player on team A takes a shot at 10:56
Player on team B takes a shot at 11:00
Player on team A takes a shot at 11:12

Method:

Measure the time between a shot (or attempted shot) by team A and the next shot (or attempted shot) by team B. You eliminate the Detroit effect, and have a decent approximation of when teams are inside the zones. Set an arbitrary cutoff for the neutral zone: if it's longer than 30 seconds between a shot from A and a shot from B, assume they were in the neutral zone for that time.

I mean, it's obviously not perfect, and the neutral zone number would need to be tweaked, but if you're talking about that vs Corsi, at first glance I'd expect it to be more realistic as far as zone possession goes. Plus it uses the same Corsi data, so it shouldn't be too hard to work out algorithmically.
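A minimal sketch of how that method might look in Python, using the five sample events above. The description leaves several details open, so the rules here are assumptions for illustration only: a gap between two consecutive attempts by the same team is credited to that team's offensive-zone time, a gap where the shooting team changes is counted as "transition", and any gap longer than the 30-second cutoff is counted as "neutral". The function names are hypothetical, and all events are assumed to fall within a single period with the clock counting up.

```python
def to_seconds(clock):
    """Convert an 'MM:SS' game-clock string to elapsed seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def estimate_zone_time(events, neutral_gap=30):
    """Split the time between consecutive shot attempts into buckets.

    events: list of (team, 'MM:SS') tuples in chronological order.
    Assumed rules (one possible reading, not a definitive method):
      - gap longer than neutral_gap seconds -> 'neutral'
      - same team shoots twice in a row     -> credited to that team
      - shooting team changes               -> 'transition'
    """
    totals = {"neutral": 0, "transition": 0}
    for team, _ in events:
        totals.setdefault(team, 0)

    # Walk through each pair of consecutive events.
    for (team_a, t_a), (team_b, t_b) in zip(events, events[1:]):
        gap = to_seconds(t_b) - to_seconds(t_a)
        if gap > neutral_gap:
            totals["neutral"] += gap
        elif team_a == team_b:
            totals[team_a] += gap
        else:
            totals["transition"] += gap
    return totals


events = [
    ("A", "10:52"),
    ("A", "10:54"),
    ("A", "10:56"),
    ("B", "11:00"),
    ("A", "11:12"),
]
result = estimate_zone_time(events)
print(result)
```

With these rules, the sample gives team A four seconds of sustained pressure and sixteen seconds of transition, which mostly shows how much the answer depends on how the changeover gaps are assigned.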

I've said it before and I'll likely say it again, but at some point people need to decide what they want the stats to show, and look to see if there's a better way of showing that. As it sits, I look at the "advanced stats" bandied about and while I can figure out what they're trying to get across most of the time, I rarely think they actually are the best method for what the writer is trying to measure.

#14 sittingatmydesk
March 25 2009, 02:50PM

WTF

#15 Ender
March 25 2009, 02:51PM

Jonathan Willis wrote:

Ender wrote: I mean, hell, the ’sphere would get a lot further with the people you seem to be arguing against by posting youtube clips of plays and then backing it up further with stats.

The other day I put up all those videos of Ales Kotalik scoring from the point, and I still have people arguing with me that I’m out to lunch on him.

Besides which, scouting by youtube video is the worst possible kind of sample contamination; it’s a fraction of the game, generally without context, and a tiny sample of a player’s season. Even 5 or 6 clips fail to represent even a drop in the bucket.

I'm talking about illustrating an example here. Obviously you can't show a whole game, but I think more people have an easier time getting behind Coach's Cornerish views of certain plays and how players fucked up on certain plays than EVPTS/60. Like you said, you can make a really bad assumption based on youtube vids or highlights. I'd just argue that it's because on those youtube vids, people are intentionally stripping away the context. I'd also argue that it's not better or worse than most of the math that goes on around here as far as presenting something as holistic.

#16 Jonathan Willis
March 25 2009, 02:55PM

@ Ender:

Of course if you were going to use that method (which I think is good) you could expand it from just shots to include hits, faceoffs and all the other events that the NHL play-by-play sheets keep track of.

I'm well out of my depth on the programming end of things though, so I use what's available.

On the other hand, I do have real doubts that there would be much difference in terms of correlation between Corsi and zone time.

#17 Ender
March 25 2009, 03:00PM

Jonathan Willis wrote:

On the other hand, I do have real doubts that there would be much difference in terms of correlation between Corsi and zone time.

Really? I mean, already given the obvious example of Detroit? There are a bunch of teams who shoot a lot, and a bunch of teams who don't. While it might even out on a specific roster, it certainly wouldn't even out over a season, especially with the varying amount of games between divisions and conferences.

To be honest, given that you're a big proponent of math as applied to hockey in general, I'm really surprised by that statement, as it's effectively saying that it doesn't matter who you play or when, or whether another team plays them as much as you do - it'll somehow even out.

#18 Jonathan Willis
March 25 2009, 03:00PM

Ender wrote:

I’m talking about illustrating an example here. Obviously you can’t show a whole game, but I think more people have an easier time getting behind Coach’s Cornerish views of certain plays and how players fucked up on certain plays than EVPTS/60.

Obviously - but then, statistics aren't meant to give you a narrow range like that. They aren't terribly useful in small samples, only in the aggregate.

Play-by-play illustrations like you describe aren't terribly useful either outside the context of specific games, IMO, or for coaching specific players. To use the statistic you cited, EVPTS/60 probably gives you a better view of a player's offensive acumen than a play-by-play view because it takes ice-time into account. If you toss in QualComp, QualTeam and ZoneShift, you have a good idea of the context surrounding those results too.

That's not something you can get from youtube clips; I really don't even see how little snippets of video and Corsi compare - they're used for completely different purposes, and only one of the two gives you a rounded idea about the overall game of the player.
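For anyone unfamiliar with the per-60 rate mentioned above, here is a minimal sketch in Python. It assumes EVPTS/60 means even-strength points per 60 minutes of even-strength ice-time; the two players and their totals are invented for illustration, and `points_per_60` is a hypothetical helper, not part of any published tool.

```python
def points_per_60(points, minutes):
    """Points scored per 60 minutes of ice-time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return points * 60 / minutes


# Invented even-strength season totals (not real players).
player_a = points_per_60(points=30, minutes=1200)
player_b = points_per_60(points=20, minutes=600)

print(f"Player A: {player_a:.2f} EVPTS/60")
print(f"Player B: {player_b:.2f} EVPTS/60")
```

In this made-up case Player A has more raw points, but Player B produces at the higher rate once ice-time is taken into account.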

#19 Jonathan Willis
March 25 2009, 03:02PM

Ender wrote:

While it might even out on a specific roster

I'm sorry - I should have been clearer. I did mean the correlation on a specific roster, not league-wide.

#20 Ender
March 25 2009, 03:04PM

Jonathan Willis wrote:

I really don’t even see how little snippets of video and Corsi compare - they’re used for completely different purposes, and only one of the two gives you a rounded idea about the overall game of the player.

And unsurprisingly, I'd argue that neither gives you a rounded idea about the overall game of the player.

And really, without implementing some sort of control (as in control group) how can you even say "rounded"?

#21 Ender
March 25 2009, 03:10PM

Jonathan Willis wrote:

I’m sorry - I should have been clearer. I did mean the correlation on a specific roster, not league-wide.

Ok, now I'm lost. Are you using Corsi as a zone indicator, or a player on a roster indicator? While not mutually exclusive, you're still banking on all teams being of similar quality - or, if not that, on players never getting sick, tired, or injured for multiple games in a way that would skew the stat. Let's say I'm on Montreal and dressed and played in every game vs the Southeast, but was benched, sick, or injured for every game against BOS, NJD. The relevancy of the stat decreases drastically.

In short, I don't think Corsi is well tailored to either scenario, whether it's zones or a player's performance on their team.

#22 Mike
March 25 2009, 03:12PM

Ender wrote:

I mean, it’s obviously not perfect, and the neutral zone number would need to be tweaked, but if you’re talking about that vs Corsi, at first glance I’d expect it to be more realistic as far as zone possession goes. Plus it uses the same Corsi data, so it shouldn’t be too hard to work out algorithmically.

Or the NHL could get off their asses and start a program of RFID pucks. Sensors at the bluelines and both goal lines would allow both zone time and a better solution for goal review.

#23 Mike
March 25 2009, 03:16PM

Ender wrote:

Let’s say I’m on Montreal and dressed and played in every game vs the Southeast, but benched, sick, or injured for every game against BOS, NJD. The relevancy of the stat decreases drastically. In short, I don’t think Corsi is well tailored to either scenario, whether it’s zones or a player’s performance on their team.

Yeah, but even measuring actual zone time would show the same thing - unusually awesome numbers from playing against the SE instead of the Bostons and NJs.

Hell, even measuring the most basic of counting stats doesn't mean anything if you are lucky enough to only dress for the 8-2 wins and pressboxed for the 9-2 losses.

#24 Kent W
March 25 2009, 03:16PM

Ok, now I’m lost.

Truer words have never been spoken.

#25 Ender
March 25 2009, 03:28PM

Kent W wrote:

Ok, now I’m lost.

Truer words have never been spoken.

And I think that's my cue to leave.

#26 Matt
March 25 2009, 03:38PM

Ender @ 5 is right, to the best of my recollection. Vic Ferrari fleshed the thing out net-wise, and glossed it as the Corsi number because he had first heard the idea discussed specifically by Mr. Corsi in a radio interview.

Other than that, he's off the mark I think... he's made this same argument before, and I just don't follow. Do some people overstate the Corsi #'s worth on occasion? Sure [shrug]. But it has predictive value, which is why it's interesting. It's better than past outscoring at predicting future outscoring. That's valuable, no? It suggests that the on-ice elements underlying a good Corsi # are important factors in being good at hockey, no?

As for the Zone Time vs. Corsi issue, I agree that they substantially go hand-in-hand, but I'd guess that Corsi is generally better when the two diverge. For every guy who's "shot happy", there's 2 (or 3 or 5) for whom the strong relative Corsi would reflect a genuine ability to create shots and scoring chances within the zone, aka the ability to make good hockey plays.

#27 Ender
March 25 2009, 04:08PM

But it has predictive value, which is why it’s interesting. It’s better than past outscoring at predicting future outscoring. That’s valuable, no?

Yes.

It suggests that the on-ice elements underlying a good Corsi # are important factors in being good at hockey, no?

Possibly. It "suggests," but it does not prove. It's possible that other work has been done to check on that, and I'm not aware of it.

As for the Zone Time vs. Corsi issue, I agree that they substantially go hand-in-hand, but I’d guess that Corsi is generally better when the two diverge. For every guy who’s “shot happy”, there’s 2 (or 3 or 5) for whom the strong relative Corsi would reflect a genuine ability to create shots and scoring chances within the zone, aka the ability to make good hockey plays.

Maybe. That said, there's a lot of "I'd guess" going on. One would expect that the Corsi for a so-called "one shot scorer" would be lower than an average player, as well. One would expect that it would be slightly different for each *type* of player.

Look. I seem to get misunderstood a fair bit around these parts so let's see if I can clear this up. I don't think the pursuit of new stats is a bad thing. I even think it's a good thing. I don't, however, think Corsi shows what it's being used to show. In that case, it would make sense to have something that shows what you're intending it to. With Zone Time vs Corsi, there's a good chance you'd see that Detroit has much less possession than its Corsi would imply. If it's that way with one team, how many others? Maybe Corsi works well for some players or even most players as an indicator for how objectively good they are. However, if you ignore the outliers you can't make your work better. If your pursuit is to use the stats you have and try to show something with them, fine. Go ahead. You're not going to sell me on it, and you're not likely to sell the journalists and "less informed" fans (which people seem to strongly look at as idiots because they disagree) on it either.

However, if you took what you have and looked at its limitations and faults, then with your brains, which seem to be superior to those of "other fans", I'm pretty sure you could come up with better stats. Stats that do show exactly the thing you want them to. You can't argue with things that work in all cases, but you really can argue easily with any stat that ignores Detroit or treats the team as an outlier.

That doesn't make your stats useless. It doesn't make them not valuable. It is just saying that they don't show what you're using them to show. You can accept that and move on (and stop bitching about how stupid people don't agree with you, or are lost) or you can make something that does show what you want.

If you'll notice, as a general rule, I only post this type of stuff when people are implying that people are idiots or uninformed if they disagree. When people post their normal stuff I don't say a word.

#28 Jonathan Willis
March 25 2009, 04:10PM

Ender wrote:

Let’s say I’m on Montreal and dressed and played in every game vs the Southeast, but benched, sick, or injured for every game against BOS, NJD. The relevancy of the stat decreases drastically.

Arguing that a statistic needs context isn't an excuse to dismiss the statistic; it's a reason to examine the context - and since Vic Ferrari's Time On Ice tool allows that (game-by-game), I don't see that argument as having much merit.

#29 baggedmilk
March 25 2009, 04:19PM

# of words understood in this article? 6

Just tossing that out there, that dude looks like Eugene Levy. Just saying.

#30 Jonathan Willis
March 25 2009, 04:26PM

Ender wrote:

If you’ll notice, as a general rule, I only post this type of stuff when people are implying that people are idiots or uninformed if they disagree. When people post their normal stuff I don’t say a word.

People aren't idiots or uninformed because they disagree, Ender. People are either idiots or uninformed when they wander by with their "have you even ever watched the game" comments, which I get a lot of and I'm done trying to reason with.

For example, the fellow who wandered by here yesterday and said "I'll bet you've never even played a game of hockey" fits into that LCD class; a class I've never associated you with but if you feel like self-identifying with them go right ahead.

I've been patient with you every time you've raised arguments (as far as I can recall, feel free to correct me), because I have no issues discussing things rationally. Also, as far as I can tell, the closest anyone got to mocking you was Kent's rather mild jibe at your statement that you were lost; I'd suggest a thicker skin on that one.

#31 Jonathan Willis
March 25 2009, 04:32PM

Ender wrote:

However, if you ignore the outliers you can’t make your work better. If your pursuit is to use the stats you have and try to show something with them, fine. Go ahead.

Seems to me that you're making the argument that if a statistic can't do everything, it shouldn't do anything. The way I see it, first you knock down the majority, then you try and move on to outliers, making a curve that's increasingly broad. You don't scrap the process because on your first run you were only right 80% of the time.

As for progress, I think things are coming along rather nicely - personally, I'm trying to work out the math to adapt QualComp, QualTeam and ZoneShift into the same equation as Corsi to give us something else entirely. And when it comes to developing these things, I'm a pretty small fish in a very large pond.

It will take years to definitively prove the predictive power of Corsi (and even then it's never going to be 100% effective) but it's one of the most promising things I've seen on the statistical front; I see no reason to abandon it just because it isn't perfect at this instant in time, and I see no reason not to offer up working hypotheses as they come along.

#32 Jonathan Willis
March 25 2009, 04:35PM

Ender wrote:

You can’t argue with things that work in all cases, but you really can argue easily with any stat that ignores Detroit or treats the team as an outlier.

Corsi hardly ignores Detroit; I'd argue rather strongly that it compares Detroit Player X to Detroit Player Y very well at this point. It hasn't reached the level where we can take Detroit Player X and compare him straight across to Edmonton Player X, but it probably won't be that difficult to create an adjustment for a player's strength relative to team that allows us to compare from team to team - similar to the weighted +/- used by Desjardins. In fact, a weighted Corsi is the best way to go, IMO, and it's something I'd like to take a crack at this summer.
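
For anyone wondering what a team-relative adjustment might look like, here is a rough sketch. It is not Desjardins' weighted +/- and not a finished weighted Corsi; it simply assumes that Corsi is the shot-attempt differential while a player is on the ice, and subtracts the team's rate while he is on the bench. The `ShotCounts` class, the `relative_corsi` function and all of the numbers are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Even-strength shot attempts for and against over some span of ice time."""
    attempts_for: int
    attempts_against: int
    minutes: float

    def corsi_per_60(self) -> float:
        # Raw Corsi differential, scaled to a per-60-minute rate
        return (self.attempts_for - self.attempts_against) / self.minutes * 60.0


def relative_corsi(on_ice: ShotCounts, off_ice: ShotCounts) -> float:
    """On-ice Corsi rate minus the team's rate while the player sits (assumed adjustment)."""
    return on_ice.corsi_per_60() - off_ice.corsi_per_60()


# Hypothetical numbers: a player on a dominant team and a player on a weak team
strong_team_player = relative_corsi(ShotCounts(300, 240, 300.0), ShotCounts(900, 700, 900.0))
weak_team_player = relative_corsi(ShotCounts(260, 270, 300.0), ShotCounts(700, 900, 900.0))

print(round(strong_team_player, 2), round(weak_team_player, 2))
```

In this made-up case the strong-team player has the better raw Corsi but trails his own teammates, while the weak-team player is well ahead of his. That is the kind of cross-team comparison the raw number can't make on its own.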

#33 Ender
March 25 2009, 04:46PM

Alas, I'm apparently destined to be misunderstood perpetually, so I'll comment briefly on Jon's last comment and then I'm out, cuz I waste too much time as it is to make myself understood.

People aren’t idiots or uninformed because they disagree, Ender. People are either idiots or uninformed when they wander by with their “have you even ever watched the game” comments, which I get a lot of and I’m done trying to reason with.

A) You might get less of that if the stats were specific enough to back up what you're saying and B) It might be of some use to look at the reasons why they're saying that and see if it's something that should be taken into consideration.

It's possible that A would give the same results, and it's possible that you already do B. I'm just saying.

but if you feel like self-identifying with them go right ahead.

I'm not self-identifying with anyone. I'm giving you reasons as to why they might be acting the way they are. Devil's advocate and all that.

I’ve been patient with you every time you’ve raised arguments (as far as I can recall, feel free to correct me), because I have no issues discussing things rationally.

While I think the use of the word "patient" here is pretty derogatory and implying that you're right and I'm just not seeing it, I also have no issues discussing things rationally. As it is, discussions between you and I are pretty reasonable as far as I can tell - and I'd expect and hope we both take something from it. From the word use above, I don't know if that's the case, but it's a moot point.

Also, as far as I can tell, the closest anyone got to mocking you was Kent’s rather mild jibe at your statement that you were lost; I’d suggest a thicker skin on that one.

Did I say anything about people mocking me here? More than anything, I sighed when I saw Kent's comment because I thought things were pretty reasonable up to that point. Wasn't a big deal, just an example of a larger problem in the 'sphere. It's possible I'm more sensitive to them than others because I do tend to argue unpopular views and as such have gotten the brunt of a lot of unnecessary barbs, but hey, that's the internet. Do I wish people would take their actions on the net as responsibly as they do real life? Sure. Not likely to happen any time soon. It's just that this topic of conversation comes up pretty regularly and a lot of people take and give a lot of grief over something that could possibly just be eliminated. I promise, I'm trying to help here.

#34 Jonathan Willis
March 25 2009, 04:54PM

Ender wrote:

While I think the use of the word “patient” here is pretty derogatory and implying that you’re right and I’m just not seeing it, I also have no issues discussing things rationally. As it is, discussions between you and I are pretty reasonable as far as I can tell - and I’d expect and hope we both take something from it. From the word use above, I don’t know if that’s the case, but it’s a moot point.

Why, of course I'm right, Ender ;)

Patience actually refers to going through trials with an even temper, and whatever else you say you can't deny that you are putting these statistics (and thereby their adherents) through tests (i.e. trials). In other words, I'm discussing your objections with an even temper - being patient while the work is examined.

#35 Greg MC
March 25 2009, 05:41PM

Mike wrote:

Or the NHL could get off their asses and start a program of RFID pucks. Sensors at the bluelines and both goal lines would allow both zone time and a better solution for goal review.

Excellent idea, Mike. Why not also RFID the players? With some sophisticated software you have extremely accurate data not only on the individuals themselves, but how they perform with other players, against other players, how individual players perform as the game proceeds. You could build books on coaches based on the data, etc.

#36 Chris
March 25 2009, 06:16PM

Greg MC wrote:

Excellent idea, Mike. Why not also RFID the players?

Wait!!! We can feed all the RFID data from each individual player into a SUPER COMPUTER that will allocate salary based on performance alone... Then when a team like Detroit starts winning too much they will come up against the salary cap... and have players confiscated and sent to Edmonton (A Team that had previously financially rewarded mediocrity)

It's perfect! It's fair! All teams will be tied in the standings! The Oilers may finally get HOSSA!

#37 Ender
March 25 2009, 06:39PM

Jonathan Willis wrote:

Patience actually refers to going through trials with an even temper, and whatever else you say you can’t deny that you are putting these statistics (and thereby their adherents) through tests (i.e. trials). In other words, I’m discussing your objections with an even temper - being patient while the work is examined.

K. Just came across poorly, but that's fair. I'll try this one more time in bullets.

*Statistical analysis is a good thing.

*Corsi is too general and abstract a stat to be used for specific things.

*Since Corsi is too general and abstract, I propose new stats or even modifications of Corsi to be able to say specific things about specific players and specific teams.

*Defining what exactly you want to quantify will aid in figuring out the modifications on old stats, or even what new stats are necessary to say a specific thing.

*Defining what the outliers are and why they outlie will aid in figuring out the modifications on old stats, or even what new stats are necessary to say a specific thing.

*Wishing/Complaining about the NHL to give us new info (like zone times) is a waste because they don't care about us - likewise with Bettman points. Figuring out how to get what you want out of the data you have (as approximation) should be the #1 concern of the statzis.

*Forget stick and carrot. Making new stat methods by combining old ones is a good thing. There is a large community online, and having a public document like at http://www.mindmeister.com/ or even google docs and creating flowcharts as to how things work can put the information in hands of the people who code so they can just code, and the thought in the people who think, so they can just think. It's the internet people. Community + Work should = Public collaboration.

And lastly:

*For your audience's sake be clear about what you're arguing. I'm not asking for a re-definition of terms every time you use Corsi or EVPTS/60, but there have been too many posts this year about Penner making Hemsky better. Corsi and various PTS/60 have been trotted out. However, the two have played on the same line for less than half the season, have played with and without each other against varying levels of opposition, with different defensemen behind them. At *best* you can say that when Penner plays with Hemsky things seem to go in the right direction, but you can't make the argument that Penner drives the bus without looking at the underlying factors. Combine those two with and without Visnovsky. Any difference? List what games the two played together in, and how much of their icetime was actually together, etc.

I know I'm generally left wondering what the hell people's point is when they throw out those numbers, even though I know what the numbers mean and how math works at a decently high level. Do you think average Joe will understand completely at first glance and get behind you?

Now, some of those points are regarding the initial article and some are aimed at the community as a whole. Jonathan has been the most reasonable person I've discussed any of this with, and as such it goes here. More than anything else, I don't get why people get all annoyed when people don't see their point when they don't make a clear/logical point, but I'm sure people will be more than willing to say the same about me, and I won't argue with it.

#38 Fiveandagame
March 25 2009, 06:40PM

@ Jonathan Willis: "In point of fact, no defenseman on the team is deployed more in the offensive zone (relative to ice-time) than Jason Strudwick –"

That is a scary stat.........

#39 Jason Gregor
March 25 2009, 08:38PM

Jonathan Willis wrote:

Statistics help serve as a common point of reference between fans - you can argue with me that, say, Ales Hemsky had a good or bad game, but we can both agree that he was a -2 last night.

He could be -2 and have done nothing wrong on either play, so what does that stat prove exactly? Stats have their value for sure, but to take them at face value all the time is where a defence can break down.

You mention Strudwick often starts in the offensive zone. Off a faceoff, I'm guessing, since he can't start a shift on the fly in the offensive zone.

He normally plays with 3rd and 4th line guys, and those centres don't win draws. The other team gets possession and comes into the Oilers zone. Does this make Strudwick a bad player if then they are hemmed in for 30 seconds?

I don't really understand how someone can evaluate how good a defenceman is due to which end of the ice he spends his shift in. That is where one would have to watch the game and see what decisions HE is making in each respective zone. It might give you a base, but it doesn't give you the answer by itself.

#40 David Staples
March 25 2009, 09:19PM

@Matt Does Corsi predict future goals scored better than past goals scored does? I've been looking for that work and would appreciate a link as, in my mind, it's the missing link.

Other than that, after some consideration, I buy the basic premise that there's a relationship between Corsi numbers and territorial domination.

@Ender. A better stat? Here is where I completely agree with you. I'm somewhat leery of all team stats, such as Corsi (some trust), traditional plus/minus (zero trust) and Dennis King's numbers (some trust), as they don't tell us who is driving a play, for good or ill, just that a player is on the ice when something happens.

My suggestion for a useful stat -- individual tracking of all major scoring chances (Grade A and Grade B), broken down so that we know the one, two or three major contributors, for and against, on each scoring chance.

With PVRs, modern fans can fairly easily do this work as the game goes along. Dennis King has done excellent work in this regard, tracking the scoring chances, but it just needs a bit of refinement.

Me, I'd love to know which NHL players led the league in being a major contributor in Grade 'A' scoring chances, and which players were at fault for most Grade 'A' scoring chances against. I think a plus/minus on this would be useful, and would create a better Qualcomp number.

The issue with this is subjectivity. What is a major scoring chance as opposed to a minor one? How can you tell exactly which players are involved?

If you try this exercise, can various experts watch the same game and generally come up with the same results?

By setting a series of agreed-upon standards for defining a scoring chance and defining a player's involvement in the scoring chance, I think these issues can be solved, and you should be able to replicate ratings. Essentially, one expert fan would come out generally with the same ratings as the other expert fan.

So the exercise would be useful and the information would be gold for NHL fans.

We'd just be counting up the good plays here (and the bad ones) and counting up the keys who made those good players (and screwed up on the bad ones).
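
One simple way to check whether two trackers "come out generally with the same ratings" is plain percent agreement over the same list of plays. The sketch below is only an illustration of that idea, not an established method from anyone in this thread; the `percent_agreement` function, the grade labels and the sample ratings are all hypothetical.

```python
def percent_agreement(tracker_a: list[str], tracker_b: list[str]) -> float:
    """Share of plays that two trackers graded identically."""
    if len(tracker_a) != len(tracker_b):
        raise ValueError("both trackers must grade the same list of plays")
    if not tracker_a:
        raise ValueError("need at least one graded play")
    matches = sum(a == b for a, b in zip(tracker_a, tracker_b))
    return matches / len(tracker_a)


# Hypothetical grades for the same eight plays: Grade A, Grade B, or no chance
fan_one = ["A", "B", "none", "A", "B", "A", "none", "B"]
fan_two = ["A", "B", "none", "B", "B", "A", "none", "A"]

print(percent_agreement(fan_one, fan_two))  # 0.75
```

If agreement stays low even after the standards are written down, that would suggest the definitions need tightening before the counts are worth comparing.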

#41 David Staples
March 25 2009, 09:26PM

And, of course, in that last line, I meant "guys" not "keys."

Anyway, Jonathan, you wrote a strong article here, and Ender has raised some good points in reply. This debate has been heated, at times, but I've come to realize there's a lot of baggage around this issue, a lot of wars fought at other times on other blogs and discussion boards, so raising it can be like poking an old wound for many folks.

It's going to take some time to find the best statistics, though I'm sure we can all agree that when we hear how many "goals" a player has, that's not the entire story, so more precise, descriptive and possibly predictive stats are useful.

#42 Jonathan Willis
March 25 2009, 09:50PM

Jason Gregor wrote:

He normally plays with 3rd and 4th line guys, and those centres don’t win draws. The other team gets possession and comes into the Oilers zone. Does this make Strudwick a bad player if then they are hemmed in for 30 seconds? I don’t really understand how someone can evaluate how good a defenceman is due to which end of the ice he spends his shift in. That is where one would have to watch the game and see what decisions HE is making in each respective zone. It might give you a base, but it doesn’t give you the answer by itself.

Brodziak's the second-best faceoff guy on the team, and last I checked he's still on the right side of 50%.

Jason Strudwick has spent over 99 minutes of even-strength ice-time together with 11 different forwards on the team, so we've seen him spend a good amount of time with plenty of different players (he averages 11 minutes at evens per night, so this is the equivalent of at least 9 complete games played with each line on the team). Given that most of these players have a positive Corsi, and Strudwick is mired deep, deep in the red, who is the constant?
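
The "9 complete games" figure is just the shared ice time divided by Strudwick's nightly even-strength minutes. A minimal check of that arithmetic, using only the two numbers quoted above (the function name is made up for illustration):

```python
def equivalent_games(shared_minutes: float, minutes_per_game: float) -> float:
    """Convert shared even-strength ice time into full-game equivalents."""
    return shared_minutes / minutes_per_game


# 99 minutes together at roughly 11 even-strength minutes per night
print(equivalent_games(99.0, 11.0))  # 9.0
```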

Either Strudwick's just getting horribly unlucky with every possible combination of forwards, or he's the problem.

As for why which zone they're in matters, I don't care how good a player you are - you can't win from your own end.

#43 Ender
March 25 2009, 09:54PM

As for why which zone they’re in matters, I don’t care how good a player you are - you can’t win from your own end.

Unless you're Rob Davidson

http://www.youtube.com/watch?v=meFICJYORvA

#44 David S
March 25 2009, 11:46PM

I've seen Ender take it on the chin more often than not for simply challenging how stats are used, and the people who are more than a little dismissive of others who don't buy into their point of view. In fairness, Jonathan at least tries to have an open mind, whereas a lot of the stats guys are far from objective in those discussions.

From what I've seen, a lot of the guys who espouse stats as a predictive mechanism use baseball as a fallback comparative. Problem is, baseball is a fairly static sport. There's only so many options available for each type of play, and they are well categorized. Hockey on the other hand is a dynamic game, with a much larger and 3-dimensional quality that can have almost unlimited options for each event. For example, a good defenceman has to weigh the alternatives every time he touches the puck, and the consequences of every possible play he makes.

The other problem I've seen is that "outliers" are minimized by the large sample statistic, when in fact hockey as a dynamic event is full of them. I found this out last week, where the guys at MC79 simply could not accept that psychological factors (such as playing loose) would have anything to do with explaining small sample performance (such as last year's end-season run). Their reasoning came down to the assertion that those outliers are to be chalked up as "random events" - luck if you have it. Of course, that's the result of outliers being removed by the large sample effect because the statistics they employ break down in small samples.

The problem I have is that rather than admitting that stats simply don't work in hockey as much as they think they do, some guys will come right out and say others are wrong because the explanations they pose don't fit in their mathematical constructs. Sorry man, that's not cool.

At the end of the day, hobbyists try to fit events in nature to the math they know, while true mathematicians try to invent math that explains the natural event. I'd suggest that there are more than a few hobbyists out there who fancy themselves mathematicians. Ender's "physics vs. mathematics" metaphor was quite well posed too.

#45 Phil
March 26 2009, 12:04AM

STATS WARZ...

YEEAAAARRRRRRRGGGHHHHHHHHH.

#46 Jonathan Willis
March 26 2009, 12:14AM

David S wrote:

I found this out last week, where the guys at MC79 simply could not accept that psychological factors (such as playing loose) would have anything to do with explaining small sample performance (such as last year’s end-season run). Their reasoning came down to the assertion that those outliers are to be chalked up as “random events” - luck if you have it.

I read that entire debate too, and I think your interpretation of the discussion is just a little bit slanted - "luck" is the term used, but really all we're talking about is variance.

It's a bell curve, and every so often stuff is going to fall to either end of it; it's simple probability, and improbable runs tend to prove rather than disprove the math used. I think people tend to get their backs up at the word luck.

#47 David S
March 26 2009, 12:28AM

Jonathan Willis wrote:

It’s a bell curve, and every so often stuff is going to fall to either end of it; it’s simple probability, and improbable runs tend to prove rather than disprove the math used. I think people tend to get their backs up at the word luck.

And that's the problem. Looked at from the perspective of the "bell curve" or the large sample, the anomalous event is minimized precisely because it's an outlier. The cool stuff happens at the ends of the bell curve. Why not try to explain it instead of writing it off? Or at least admit other things might be going on besides "luck". No matter what you call it, be it random event or luck, it's an act of oversimplification to dismiss a small sample event in sport because you can't explain it within your current sphere of understanding.

#48 Ender
March 26 2009, 12:57AM

@David S & Jonathan

I was actually going to post in that thread but decided against it. My wife wrote her PhD thesis in Neuroscience on memory and stress responses. There is no luck here; there isn't even "anomaly." An optimal amount of stress exists for each person (and it's different from person to person) to perform at their peak. It's hard science. Tons of journal articles have been written on the subject. We're as sure of it as we're sure that gravity exists.

Now if you want to build in a "Consistency Modifier" to measure how many points per game a player gets, weighted against quality of opposition and goaltending (and adjusted for quality of own team), I'm with ya. That might make the Corsi mean something on a player to player basis. In my head, the trick isn't to throw out what you guys have been doing, but rather figure out why the outliers are outliers, and try to build that into the model. Ignoring it can only hurt the process.

#49 Jonathan Willis
March 26 2009, 08:31AM

@ David S: @ Ender:

Now, I'm a couple of years removed from my university courses, but IIRC, while outliers exist and generally have valid reasons behind them, you normally project things to the centre of the curve rather than the extremities.

Given conditions A, B, and C, while landing in the top 5% or lower 5% is possible, you're much more likely to be grouped around the centre, so when you're using it for extrapolation most sciences do cut off the fringes - but this isn't because they want to ignore them, they just accept that 90% probability (or whatever exact % they can maximize it to) is what's reasonably possible to project. Then once you have a model you can try and tweak it to make it as expansive as possible, but you start by ensuring it works for the centre of the spectrum before you move to the outsides.
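
To put a number on "cutting off the fringes": if the results really did follow a bell curve (an assumption, not something shown in this thread), the central 90% is the range left after trimming 5% from each tail. A small sketch using Python's standard library, with a hypothetical helper name and a standard normal purely for illustration:

```python
from statistics import NormalDist


def central_interval(mean: float, std_dev: float, coverage: float) -> tuple[float, float]:
    """Bounds of the central band of a normal distribution holding `coverage` of outcomes."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    tail = (1.0 - coverage) / 2.0  # probability trimmed from each end
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Standard normal: the middle 90% runs to about 1.645 standard deviations either side
low, high = central_interval(0.0, 1.0, 0.90)
print(round(low, 3), round(high, 3))
```

Anything outside that band is still possible; it is just not where a projection would normally be aimed first.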

#50 Jason Gregor
March 26 2009, 08:44AM

Jonathan Willis wrote:

As for why which zone they’re in matters, I don’t care how good a player you are - you can’t win from your own end.

But you still never answered how a defenceman can control the puck going from the offensive zone back into the defensive zone. Regardless of who the D-man is, I'm curious how you see him having an impact on that play?

Comments are closed for this article.