Jim Corsi and His Statistic

Jonathan Willis
March 25 2009 12:42PM


Jim Corsi, who spent his entire NHL career with the Edmonton Oilers, is a rather interesting guy. He took an unusual road to the NHL, through Canadian university hockey. He spent three years with the Canadian Olympic soccer team during that time as well, before leaving soccer to focus solely on hockey.

Corsi played two seasons with the Quebec Nordiques before the World Hockey Association folded at the end of 1978-79. The next year, Corsi went 8-14-3 with a 3.65 GAA for the 1979-80 Edmonton Oilers, a team that featured 18-year-olds Mark Messier and Wayne Gretzky, along with a 20-year-old Kevin Lowe. They lost in the first round that season, although Corsi had already moved on, having been dealt to Minnesota for future considerations. He spent the next decade in Italy and represented that country at the world championships eight different times.

Corsi holds a graduate degree in engineering, and speaks four different languages. He’s been the goaltending coach for the Buffalo Sabres since the 1997-98 season, and during his tenure the Sabres have been a reliable producer of NHL goaltenders (Ryan Miller, Martin Biron, Mika Noronen).

Keith Loria of NHL.com interviewed several NHL goalie coaches, including Corsi, for a January 29th piece, in which Corsi talked a little bit about pregame preparation and about what he does with the Sabres’ minor-league goaltenders:

"On the road, you give your goalie an understanding of the surroundings, how the boards work, how the glass works, video and stats of the opposing players. I have to coordinate with the coaching staff and give a general idea on how the opposition prepares itself in the offensive side of the game and power plays."

"One of my duties is to work with our Portland team and we have various technical ways of following our goaltenders and draft picks playing in college. We'll look at videos and watch tape. We stay in contact with them as best we can. It's not always easy because you can see how the player is playing in a game but you don't always know the emotional part, so you try to be hands on with all of them."

I thought there were some interesting points in those quotes. Jim Corsi is best known for the statistic that bears his name, the Corsi number, and as a result he’s frequently mislabeled as someone who only uses numbers in his work. Such a label is obviously wrong: the statistics don’t tell you how pucks bounce off the glass or the boards. He also talks about the difficulties of managing players spread out across the country, saying that it’s difficult to coach when “you don’t always know the emotional part”.

Yet, from his quote we can also get an idea of the technical difficulties of following just the goaltenders already drafted by an organization. There are hundreds of players in the NHL, and thousands more playing in Europe, the AHL and the ECHL. Then there are the players in various junior leagues around North America, and those going the college route; even with a large scouting staff, it is difficult to keep tabs on everyone. The Sabres have resorted to doing the vast majority of their scouting via video, a decision that may end up making some sense – is it better to see a player a few times live or many times on video?

This is one of the real advantages of statistics over watching a player live: sample size. The shorthand for this is what I call the Brian Boucher effect. Boucher is an NHL journeyman; he’s currently the backup goaltender for the San Jose Sharks. He’s had some ups and downs in his career, occasionally taking over the starting job and sometimes getting stashed in the minors.

On December 31, 2003, Boucher, then with the Phoenix Coyotes, posted a shutout over the Los Angeles Kings. He posted another in his next game against the Dallas Stars. Over the next week, the Coyotes beat the Carolina Hurricanes, Washington Capitals, and Minnesota Wild, with Boucher posting shutouts in all five games.

Now imagine, for a moment, that Boucher were a little-used junior goaltender, and the scout watching him caught just that five-game segment. Without statistical context, what assumption would likely be made?

That is the first difficulty. The second lies in trying to gauge all 18 skaters playing in a single game. It’s a difficulty that Gare Joyce ran into when he tagged along with Columbus scouts for his book Future Greats and Heartbreaks. Joyce attempted to do scouting reports on all of the players, and found his notes a confused and nearly useless mess. A long-time scout advised him to focus on just one player for an entire shift, following him exclusively, because that’s how he isolated players for his reports. Joyce did so, with superior results.

I’ve used that technique myself. I’m hardly a scout; I imagine there are many things picked up over a lifetime in the game that a fan like me can’t begin to comprehend. Still, by choosing one player and following him, you can get an excellent idea of his quality. On the other hand, such focus comes at a price: you miss much of what else is happening on the ice.

The combination of these two problems is where statistics become useful. Of course, they’re limited by some of the same problems: let’s say Player X spends 90% of his ice-time with offensive stars, and 10% with the rest of the team. In the 10% segment, his line scores 3 goals and allows 3. In the 90% segment, he puts up very good numbers playing with good players. But is he a good player, or is he being carried by his line-mates? It’s very difficult to tell, based on the numbers.

This is where the Corsi number comes in handy. It’s not a common statistic, so I’ll give a brief explanation. The NHL tracks shots on net, missed shots and blocked shots. The Corsi number tallies all shots directed at the net – shots on goal, misses and blocks – for and against while a player is on the ice. To make it more accurate, it is often measured only at even strength. Because the vast majority of shot attempts come from the offensive zone, this statistic is a fairly good measure of who is spending a lot of time in the offensive zone, and who is getting stuck in the defensive zone.

Suddenly, that 10% segment is much bigger. Instead of 3 goals for and 3 goals against, we can expand it to 21 shots for, and 40 shots against. If we toss in missed shots and blocked shots, we could be looking at a Corsi of +60/-100, which tells us that our Player X was likely spending too much time in the wrong end of the rink. Over the course of an entire season, a first-line player is on ice for about 2000 shot attempts in one direction or the other, and even a fourth-line player generally sees around 500. Thus, this statistic gives us a big-picture view of which direction the play is going when any given player is on the ice.
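For readers who want to see the bookkeeping concretely, the tally described above can be sketched in a few lines of Python. The event format (team code, our skaters on the ice, event type), the `corsi` function, and the toy data are all hypothetical illustrations for this post, not the NHL's actual data feed; real play-by-play data would need to be parsed into something like this first.

```python
# Minimal sketch of an on-ice Corsi tally. Event format and player
# usage below are illustrative assumptions, not real NHL data.

SHOT_ATTEMPTS = {"SHOT", "GOAL", "MISS", "BLOCK"}  # every attempt counts

def corsi(events, player, team):
    """Return (attempts_for, attempts_against, differential) for one player."""
    attempts_for = attempts_against = 0
    for ev_team, on_ice, ev_type in events:
        # Only count shot attempts taken while our player is on the ice.
        if ev_type not in SHOT_ATTEMPTS or player not in on_ice:
            continue
        if ev_team == team:
            attempts_for += 1
        else:
            attempts_against += 1
    return attempts_for, attempts_against, attempts_for - attempts_against

# Toy data: four attempts, three of them with Strudwick on the ice.
events = [
    ("EDM", {"Strudwick", "Brodziak"}, "SHOT"),
    ("CGY", {"Strudwick", "Brodziak"}, "MISS"),
    ("EDM", {"Strudwick"}, "GOAL"),
    ("CGY", {"Brodziak"}, "BLOCK"),  # Strudwick is off the ice: ignored
]
print(corsi(events, "Strudwick", "EDM"))  # (2, 1, 1)
```

An even-strength-only version would simply filter the event list by skater counts before tallying, which is the common refinement mentioned above.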

This isn’t to advocate turning control of a hockey team over to a shot-counting computer (although it probably wouldn’t do any worse than Doug MacLean or Mike Milbury). Jim Corsi, who created the statistic, talks about the importance of the emotional state of a player, and different visual variables like how pucks react along the boards of a given arena, and the same holds true for scouting. As one example, the numbers tell us that Ales Kotalik is an elite powerplay performer; but only by watching the games or video do we realize that he’s scoring goals by playing the left point. How a player gets his points is certainly a major consideration when acquiring him, and it’s only one of the things that the statistics don’t show well.

On the other hand, what the statistics do show is overall effectiveness. They don’t do a good job of showing the process, but they do an excellent job of showing the results. Sticking with the Kotalik example, by using various advanced statistics we know that at even-strength he’s been playing a third- or fourth-line role for years (and starting in the offensive zone more often than not), and that he really hasn’t produced much offensively relative to his ice-time 5-on-5. This is something that Oilers’ management either didn’t pick up on or chose to ignore; they put Kotalik on the top line despite a proven track record of not being a difference-maker 5-on-5, and the results were predictable.

The other thing statistics do is catch things that have been missed, much in the way video does. Don Cherry was coaching when Roger Neilson started the push towards video, and Cherry said something to the effect that he saw the game well enough from behind the bench and didn’t need to review tape. Now, every team in the league reviews tape - because coaches are human, and have human limitations. The scope of the game is too broad to catch everything in one go. As one quick example – is the Strudwick pairing deployed more frequently in the offensive or defensive zone? Without the numbers, I’d say defensive; Strudwick doesn’t put up points and has a solid reputation as a tough, physical, stay-at-home guy. In point of fact, no defenseman on the team is deployed more in the offensive zone (relative to ice-time) than Jason Strudwick – and that’s an important thing to know when evaluating his performance this season.

In short, statistics are a useful tool for talent evaluation; they don’t predict the future, but they show where and how a player has been used, and what his results have been in different circumstances – against different players, with different players, in different zones, with lots of ice-time, with little ice-time. They do it for the entire league, providing a broad picture of every player who spends significant time in the NHL. They can tell you who to watch and what to watch for; if a player is putting up negative results, video can be examined for the reason. They aren’t meant to replace visual observation; they’re meant to augment it, focus it and refine it.

As a final point, statistical analysis is frequently mocked by journalists and other fans as something that bloggers came up with and use because they don’t have access to the team, or as something that nobody who ever played the game would use. In reality, statistics have been developed by experienced NHL personnel, and are used at that level. We’re just trying to catch up to the things done by people like Jim Corsi, Ron Wilson, Roger Neilson, and the like – observing and copying a trend, not starting one.

Jonathan Willis is a freelance writer. He currently works for Oilers Nation, the Edmonton Journal and Bleacher Report. He's co-written three books and worked for myriad websites, including Grantland, ESPN, The Score, and Hockey Prospectus. He was previously the founder and managing editor of Copper & Blue.
#51 Jonathan Willis
March 26 2009, 08:55AM

@ Jason Gregor:

On that particular play it's unlikely that he would have much of an impact; it was, after all, a 5-skater game, which makes it more difficult to separate players from each other statistically.

Fortunately for us, most defensemen have a good rotation of forwards - sticking with Strudwick, he's spent the most time with Brodziak, but it's still less than 200 minutes. So there are three guys around 100 minutes with Strudwick, and three guys at around 180-190 minutes, and the rest fall in the middle. When you consider the opposition each line faces, it's a pretty fair curve.

#52 Ender
March 26 2009, 09:27AM

@Jonathan

A) I'm under the (possibly incorrect) assumption that people have been working with these stats for a significant amount of time and there isn't improvement to be made at the center without acknowledging the outliers.

B) Curves generally only mean something for random, repeatable experiments. You can argue that the curve is used in psychology and whatnot, but we're not trying to get into average player's head here, are we? We're trying to compare players from across the league objectively.

C) I haven't seen anyone ensuring things work at the centre. I've seen people using the data they have to make conclusions without testing the centre, while ignoring or avoiding the margins. Again, I could just be missing something here.

D) If things can be skewed to the extent that we're talking about "luck" (or variance if you prefer) on "small" spaces (~20 games), and these small spaces are 1/4 of the season, how can we rationally say we're talking reasonably about the center or the outliers?

#53 Ender
March 26 2009, 09:30AM

Oh, and most sciences cut off the fringes not because of reasonability to predict, but rather artifacts of data collection. If something falls 99 out of 100 times, it's likely that 100th time you conducted the experiment wrongly. We can see this isn't the case in hockey.

#54 Jonathan Willis
March 26 2009, 09:47AM

Ender wrote:

@Jonathan A) I’m under the (possibly incorrect) assumption that people have been working with these stats for a significant amount of time and there isn’t improvement to be made at the center without acknowledging the outliers.

NHL teams certainly have, but I have trouble finding anything published more than about five years ago. Desjardins' site only runs back to 2005-06.

We're looking at an emerging field of study here, not an established one.

B) Curves generally only mean something for random, repeatable experiments. You can argue that the curve is used in psychology and whatnot, but we’re not trying to get into average player’s head here, are we? We’re trying to compare players from across the league objectively.

Nonsense - curves are used everywhere. Demographics, for example, makes extensive use of curves. Political science has done tremendous work with the rather unstable business of predicting population segments - and voting intention subdivided by ethnicity, gender, age, income, education, etc. is a far more complex system than a hockey game.

C) I haven’t seen anyone ensuring things work at the centre. I’ve seen people using the data they have to make conclusions without testing the centre, while ignoring or avoiding the margins. Again, I could just be missing something here.

There haven't exactly been a bunch of peer-reviewed studies, but folks like Iain Fyffe and Alan Ryder have done a lot of testing on different statistics - most of it's heavy math stuff, so I usually just get at the edges. Another thing worth checking out is Yahoo's Hockey Analysis Group, which has a lot of background information on statistical research.

As for other testing, read through the archives at Irreverent Oilers Fans; there's plenty of evidence.

D) If things can be skewed to the extent that we’re talking about “luck” (or variance if you prefer) on “small” spaces (~20 games), and these small spaces are 1/4 of the season, how can we rationally say we’re talking reasonably about the center or the outliers?

I was thinking more like fifteen games, but let's use 20 as an example. That's 1/4 of a season for one NHL team, meaning that there are 120 such segments in any given NHL season. Even if 10 of them fall outside your curve's estimation, you still have roughly 92% agreement between Corsi/PDO and results. So statistically, missing one 20-game sample really isn't evidence that your theory is flawed.

Besides, what's the alternative? Look at how Montreal started the year this season, or Ottawa the season before - what answers do the non-stats guys offer (hint: Emery is a cancer... uhh...)? Whereas looking at it via Corsi it's plain that they were posting results beyond what their outshooting data would support. The non-stats guys (as far as I've seen, anyway - please point out anything I've missed in those instances) seem to accept that first 20- or 30-game segment as the team's genuine talent level, and assume that the much larger remainder of the season is the result of a disruption in the room or whatever, when it's much more easily explained as an artificial hot streak and not much else.

#55 David S
March 26 2009, 10:15AM

Jonathan Willis wrote:

We’re looking at an emerging field of study here, not an established one.

Just the fact that you said that is more than I've heard from anybody else Jonathan. Problem is, from how most guys use these stats, you'd swear it's accepted doctrine.

#56 Ender
March 26 2009, 10:21AM

@Jonathan

Re: A: Emerging field of study or not, a lot of things have popped up on the outlier list already. This implies that something is fundamentally flawed with either the way people are using the stat method, or with the stat method itself.

Re: B: The curves you list are of very specific things, and none of them are anywhere near as general as the Corsi. Also, we've been through this before, and I can state with extreme confidence that not only is hockey much more complicated than the things that you're listing, but it falls into a level of chaos theory complexity. A hockey game is less predictable than subatomic particles in an atom.

Re: C:

OK.

Re: D:

It all depends on what your theory is purported to be. Any first-year stats course will teach you that correlation is not the same as causation, though it's treated as such here and elsewhere. Also, anybody in math or science seeing less than 99.5% agreement between data and real life will tell you that your experiment or stat method isn't specific enough to say anything with any certainty.

The alternative is to try to make things better rather than wasting time and energy complaining that people don't understand or believe in the stats. Rationally, I and every single person who doesn't buy into what's currently being done are entirely justified in thinking so. It isn't a matter of who's right and who's wrong. It's a matter of: if you believe in it so much, hone it and make it as perfect as it can be.

It seems like every few months you come out and ask why people don't believe in the stats, or say "people did it before us" or "science people use them." That's fine. However, I and others keep bringing up a lot of logically and mathematically sound reasons why, and they're ignored because they don't fit into the existing model. Then the question gets asked again.

Newton "discovered" gravity and wrote some mathematical laws that seemed to govern it. Years later, people found reasons why it didn't work all of the time, and explained it away with luminiferous ether (sound is transmitted by air, light is transmitted by luminiferous ether). Someone who wanted to prove the LE was there in fact proved that it wasn't. Lorentz built some math constructs as a fudge factor to make Newtonian gravity exist in the real world. Einstein then explained the Lorentz transformations in real-world terms in Special Relativity, expounding on it in General Relativity.

The stats guys aren't to the point of Newton's initial leap, and don't seem interested in getting there. They're ignoring the outliers that need Lorentz transformations, and not explaining things in real life.

And you're content with this? Really?

#57 Ender
March 26 2009, 10:52AM

Whereas looking at it via Corsi it’s plain that they were posting results beyond what their outshooting data would support.

IIRC Edmonton nearly always posts results beyond what their outshooting data would support. Even in the 80s. I'm pretty sure Bruce even looked it up at one point.

But I'm out (for real). Radio silence for the rest of the day. I might post this evening, but it looks like a busy night as well.

#58 Jonathan Willis
March 26 2009, 10:54AM

Ender wrote:

And you’re content with this? Really?

As a starting point, absolutely.

You use Newton's Theory of Gravity as an example - but the mathematical laws that he used aren't even accurate; General Relativity disagrees with Newtonian gravity in places, and observation (one example being the Pioneer probes, another being the deflection of light) has shown that our understanding of gravity isn't totally accurate, either.

It's interesting to me that you talk about needing the equivalent of Lorentz transformations - but those didn't come until 200 years after Newton first proposed the theory of gravity.

Maybe I'm misinterpreting you, but you seem to be saying: 'look, you have a small amount of variance and until you explain it the theory doesn't hold water.' But if Newton had waited to publish his findings until that small amount of variance was explained it wouldn't even have happened in his century. It is enough to start with that there is a high correlation (and that correlation being unequal to causation argument only holds water if there's reason to believe that they are unrelated - and here there isn't) and then work to refine from there. Doesn't that just make sense?

#59 Ender
March 26 2009, 11:03AM

Jonathan Willis wrote:

Maybe I’m misinterpreting you, but you seem to be saying: ‘look, you have a small amount of variance and until you explain it the theory doesn’t hold water.’ [...] Doesn’t that just make sense?

You're right, you're misinterpreting me ;)

The issues that made Newtonian gravity inaccurate were not visible until he was dead anyway. Einstein brought new data to the 99-100% of the time point with Special Relativity, and pushed the purely theoretical General Relativity as well. New gravity probes are sending data that conflict. Ok, all that means is that someone needs to work to bring the theory up to 99-100% agreement with the data again. The theory must match the data or you have little to stand on.

What I'm saying is if you can't get at least 99% success, you need to narrow your scope. If you want to keep your scope where it is, you need to do more work to get that number up to 99%. If you don't want to do either of these things, stop bringing up the "Stats matter. Really!" subject, because you really don't have a leg to stand on.

Then again, when you have a dead horse, just:

http://www.youtube.com/watch?v=Uqxo1SKB0z8

Really now, I'm out.

#60 topshelf FMNF
March 26 2009, 11:04AM

I'm so lost..

Where is the game day thread for us simple folk?

#61 David S
March 26 2009, 11:40AM

Ender - Really well put.

Jonathan - Thanks for not responding in the usual condescending stats-guy tone. Your points were at least well considered.

#62 David S
March 26 2009, 11:40AM

"you're"

*cringe*

#63 Jonathan Willis
March 26 2009, 11:44AM

@ David S:

"your" ;)

#64 David S
March 26 2009, 11:54AM

Jonathan Willis wrote:

@ David S: “your”

*facepalm*

#65 Sean
March 30 2009, 02:27PM

Excellent read, JW! I have been overseas and am behind. The following quote was interesting:

They aren’t meant to replace visual observation; they’re meant to augment it, focus it and refine it.

I'd agree with this except for Dustin Penner. Each time I watch him he's frustrating, but the numbers keep telling me he is doing well. So in essence the numbers are replacing my opinion of him.

Anyway, if you read this, I'd be curious to hear your thoughts.

Comments are closed for this article.