Tuesday, December 30, 2008

The Black-Ink Test

Update: While working on the Gray-Ink Test numbers, I noticed that the Innings Pitched and Games Played leaderboards weren't quite accurate, which caused some of the Black-Ink Test scores to change. The changes can be seen here. The biggest changes are at the top of the list, where a number of pitchers jump higher (because the IP category is worth 3 points). Also, the number one season is no longer held solely by Walter Johnson, as Christy Mathewson's 1908 season and Joe Medwick's 1937 season are now tied with the Big Train's 1913 season. Take a look at the list for more changes, or check out the Gray-Ink Test numbers for a more thorough look at season leaders. I apologize for the error.

In an interesting exploration of Triple Crown seasons and OBP, Tom Stone over at Seamheads.com brought up a statistic that I had never heard of:
"And that got me thinking: what single-season has the highest Black Ink or Gray Ink score? These measures don’t actually include OBP, though they do include total walks. Even so, it would be an interesting ranking I think. So I went over to Baseball-Reference.com, assuming I’d find the answer there. But I can’t seem to locate single-season Black Ink or Gray Ink numbers, or an all-time single-season leaderboard for these measures (they only list career totals it seems). If you can find this analysis online somewhere, please let me know."
According to the definition of the Black-Ink Test that Tom linked to above, the statistic measures how often a player led the league in certain categories. Each category is given a point value between 1 and 4, with the Triple Crown categories having the highest point values and games played/appearances, at-bats, etc. having the lowest. It's a simple little statistic and it does have some weaknesses (it penalizes modern players for competing in a bigger pool, it does not allow for positional or ballpark adjustments, etc.), but it does get directly at what a lot of people wonder about anyway: what did he lead the league in that year?
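
As a rough sketch of how the scoring works, here is the tally for a single player-season in Python. The category abbreviations and the function name are just placeholders of mine, and the point values are the ones in the Baseball-Reference definition as I understand it, so treat this as an illustration rather than the official formula:

```python
# Point values for leading the league in each category.
# Assumed from the Baseball-Reference definition; abbreviations are illustrative.
BATTING_POINTS = {
    "HR": 4, "RBI": 4, "AVG": 4,
    "R": 3, "H": 3, "SLG": 3,
    "2B": 2, "BB": 2, "SB": 2,
    "G": 1, "AB": 1, "3B": 1,
}
PITCHING_POINTS = {
    "W": 4, "ERA": 4, "SO": 4,
    "IP": 3, "WPCT": 3, "SV": 3,
    "CG": 2, "BB9": 2, "H9": 2,
    "GP": 1, "GS": 1, "SHO": 1,
}


def black_ink(categories_led, points):
    """Sum the point values of every category a player led in one season."""
    return sum(points[cat] for cat in categories_led)


# A hitter who led only the three Triple Crown categories scores 4 + 4 + 4.
print(black_ink(["HR", "RBI", "AVG"], BATTING_POINTS))  # 12
```

The whole statistic is just this sum; all of the real work is in figuring out who actually led each category.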

Because it was such a simple statistic, in principle, and because it promised to return some interesting results, I figured I'd tackle it on my own. Now, someone else may have already run these numbers somewhere, so I may just be repeating someone else's work, but it seemed like a good exercise anyway. Actually, the Gray-Ink Test would probably return even more interesting results, since it accounts for Top-10 finishes instead of just league-leading finishes, but that will have to wait since it requires a little more SQL magic than I felt like doing yesterday.

Anyhow, I ran through the work required for the Black-Ink Test yesterday, and I did my best to account for all the little nuances of the minimum requirements needed to lead a stat like batting average. I feel pretty confident that my numbers match those shown at places like Baseball-Reference.com.

The full list of the top 150 or so seasons, as rated by the Black-Ink Test, can be found here, along with a selected list of categories led for some player-seasons. Before I highlight some of these seasons, a couple of notes:
  • The leaders are broken down by league (i.e., AL vs NL), so players who change leagues during the season are penalized. Think Mark McGwire in 1997, who led the majors with 58 home runs, but led neither the AL nor NL individually. This, I believe, is consistent with MLB's record books, so I don't feel too bad about this.
  • The list I provided is only since 1901. It's already top-heavy with dead-ball era players, and it didn't seem worth including players from the 1880s as well.
  • I haven't checked thoroughly, but there is a small possibility that a player is incorrectly placed at the top of a rate-stat list, such as batting average, because of rounding errors. I spot-checked as best I could, but I couldn't check all of them.
  • Players who share a lead in a category (such as Jose Canseco and Cecil Fielder each hitting 44 home runs in 1991) are both credited with the points for the Black-Ink Test.
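
To make the league split and the shared-lead rule concrete, here is a minimal Python sketch for a counting stat. It is not the SQL I actually ran; the function name and row layout are made up for illustration, the handful of home run totals are typed in by hand, and it assumes a player's totals have already been split by league. Rate stats like batting average would also need the minimum playing-time checks, which this skips:

```python
from collections import defaultdict


def league_leaders(rows, stat):
    """Return {(year, league): [players]} for one counting stat.

    Every player tied for the top value in a league-season counts as a leader.
    """
    best = {}
    leaders = defaultdict(list)
    for row in rows:
        key = (row["year"], row["league"])
        value = row[stat]
        if key not in best or value > best[key]:
            # New high for this league-season: start the leader list over.
            best[key] = value
            leaders[key] = [row["player"]]
        elif value == best[key]:
            # Tie for the lead: both players get credit.
            leaders[key].append(row["player"])
    return dict(leaders)


# Hand-entered sample rows (illustrative, not a full leaderboard).
rows = [
    {"player": "Jose Canseco", "year": 1991, "league": "AL", "HR": 44},
    {"player": "Cecil Fielder", "year": 1991, "league": "AL", "HR": 44},
    {"player": "Mark McGwire", "year": 1997, "league": "AL", "HR": 34},
    {"player": "Mark McGwire", "year": 1997, "league": "NL", "HR": 24},
    {"player": "Ken Griffey Jr.", "year": 1997, "league": "AL", "HR": 56},
    {"player": "Larry Walker", "year": 1997, "league": "NL", "HR": 49},
]

leaders = league_leaders(rows, "HR")
for key in sorted(leaders):
    print(key, leaders[key])
```

Because McGwire's 58 home runs are split into 34 and 24 across the two leagues, he leads neither list, while Canseco and Fielder both show up as 1991 AL leaders.
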
With all of that said, here are some notable seasons near the top of the list:

#1 - Walter Johnson, 1913 (25 points): The Big Train won the MVP this season, after leading the league in: Wins, ERA, Strikeouts, IP, Win PCT, Complete Games, BB/9 innings, Hits/9 innings, and Shutouts (that's 9 of the 12 measured categories).

#2 - Joe Medwick, 1937
(24 points): The St. Louis outfielder also won the MVP in this Triple Crown season, after leading the league in: HR, RBI, Average, Runs, Hits, Slugging, Doubles, and AB.

#3 - Ty Cobb, 1909; Nap Lajoie, 1901; and Rogers Hornsby, 1922
(23 points): All three players won the Triple Crown in these years, while also leading in Runs, Hits, and Slugging. Lajoie and Hornsby also led the league in Doubles, but Cobb got his additional points from leading the league in Stolen Bases.

#10 - Carl Yastrzemski, 1967
(21 points): Yaz's Triple Crown year is the highest-ranking "modern" season; the only post-World War II season ranking above it is Ted Williams' 1949 season. Yaz ranks this high by leading the league in: HR, RBI, Average, Runs, Hits, and Slugging.

#11 - Randy Johnson, 2002
(20 points - tied with 8 others, including two Triple Crown years from Ted Williams): The Big Unit's final Cy Young season in the desert is, by far, the highest ranking season of anyone who has played in the last 10 years. He managed this by leading the league in: Wins, ERA, Strikeouts, IP, Win PCT, and Complete Games.

#20 - Sandy Koufax, 1965 and 1966
(19 points - tied with 6 others): Koufax won the "pitching Triple Crown" in back-to-back years, placing in the top-20 best Black-Ink seasons each time. Besides Wins, ERA, and Strikeouts, Koufax led the league in Win PCT, Complete Games and Hits/9 innings in 1965 and IP, Complete Games, Games Started and Shutouts in 1966.

#28 - Roger Clemens, 1997
(18 points - tied with 5 others, including Triple Crown years from Mickey Mantle and Frank Robinson): Clemens is the next highest player from the last 10 years to make the list, leading the league in Wins, ERA, Strikeouts, IP, Complete Games, and Shutouts.

#34 - Pedro Martinez, 1999; Jim Rice, 1978
(17 points - tied with 7 others): In Pedro's dominating year (when he really, really should have won the MVP), he led the league in Wins, ERA, Strikeouts, Win PCT, and Hits/9 innings. Then there's perennial HOF argument Jim Rice, who had a phenomenal year in 1978. That season, Rice led the league in HR, RBI, Hits, Slugging, Games, AB, and Triples. It was the highest ranking season of any player from the 1970s.

#43 - Greg Maddux, 1994 and 1995; Albert Belle, 1995; Todd Helton, 2000
(16 points - tied with 15 others): This is where I'll stop. Maddux's two best seasons, by the Black-Ink Test, were back-to-back, just like Koufax's. He led the league each year in Wins, ERA, Complete Games, and Shutouts. He also led the league in IP and Hits/9 innings in 1994 and in Win PCT and BB/9 innings in 1995. Helton and Belle are the next highest players of the last ten years to appear on the list. Belle led the league in 1995 in HR, RBI, Runs, Slugging, and Doubles, while Helton led the league in 2000 in RBI, Average, Hits, Slugging, and Doubles.

Those are the players I found to be most interesting. The original post by Tom at Seamheads was dealing with OBP, so it's interesting to note that, of the players that I checked in the top 20, only Ted Williams, in his 1942 and 1949 seasons, also led the league in walks.

I'll probably follow this up at some point this weekend with leaders in the Gray-Ink Test. Like I said, though, that might require a little more SQL magic than I'd like.

3 comments:

Anonymous said...

Interesting that Ruth is so far down the list!

Thomas R. Stone said...

I've thought about this a bit more. Grey-ink is definitely a crude metric. That is not so bad when you are looking at all-time numbers, but single-season it would be pretty crude I think. So I wonder if better would be a modified version of Grey-Ink... where instead of getting the points no matter where in the top ten you are, you get some sort of graded score based on being 1st, 2nd,... 10th? And then multiplied by the grey-ink score factor of 1, 2, 3, 4, which indicates the relative importance of each stat? So if coming in first place in BA were worth 10 x 4 = 40, then coming in 10th place in BA would give you 1 x 4 = 4. Compare that with coming in first place for AB which is 10 x 1 = 10, or tenth place in AB which is 1 x 1 = 1. Still a crude thing -- is it better to lead the league in AB than be tenth in BA? Is it better by a ratio of 10 to 4? Who knows. But at first glance, this seems better -- at least for comparing single seasons -- than giving 4 points to the guy who wins the BA title and 4 points to the guy who comes in 10th. SO... if you do the SQL work to get these results... maybe do it with and without this additional "weighting" factor? What do you think?

lar said...

Tom,

That's exactly what I was thinking too. When I get around to the grey-ink stuff sometime over this holiday weekend, I figure on doing just that. Maybe granting the full 4 points to the HR leader and then .9*4 to #2 and .8*4 to #3 and so on. I'll definitely let you know.
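
For what it's worth, here is a rough Python sketch of the two weightings. The function names are just placeholders, and I'm assuming "and so on" means the multiplier keeps dropping by a tenth down to .1 for 10th place:

```python
def graded_points(rank, category_points):
    """Tom's version: 10 for 1st place down to 1 for 10th, times the category value."""
    if not 1 <= rank <= 10:
        return 0  # outside the top ten earns nothing
    return (11 - rank) * category_points


def scaled_points(rank, category_points):
    """Full category value for 1st, 90% for 2nd, 80% for 3rd, and so on (assumed)."""
    if not 1 <= rank <= 10:
        return 0.0
    return (11 - rank) / 10 * category_points


# Tom's examples: batting average is a 4-point category, at-bats a 1-point one.
print(graded_points(1, 4), graded_points(10, 4))  # 40 4
print(graded_points(1, 1), graded_points(10, 1))  # 10 1
```

The two versions differ only by a constant factor of 10, so they should rank seasons the same way.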