The book Moneyball is often misunderstood, usually by people who didn't read it or who think it is about on-base percentage. One of the key themes in the book (hint: the key to the book was finding players who were not correctly valued by other teams, with David Justice, Chad Bradford, Ricardo Rincon, and Scott Hatteberg being the MLB examples) was the draft, and how the Athletics front office believed that college statistics were an underutilized tool they could take advantage of in the draft (the book was about the draft as much as it was about the MLB club, but the former doesn't really show up in the movie). Of course, that was over a decade ago, and that draft, outside of Nick Swisher, was pretty ineffective, but college statistics are still used by MLB teams to help evaluate players for the draft.

In the past, I developed a translation method to predict MLB success using college statistics, but it had a survivor bias and the linear method was pretty simplistic, so I don't really use it. The bats in college have also changed, which makes comparing OPS difficult, since offensive numbers are different than they were just a few years ago. So I mainly rely on scouting, which I do a lot of at my own site. However, when writing about players I have watched in college, I found myself peeking at OPS, along with things like steals and homers, to see if we can get a glimpse of how the players' tools are playing in games. I thought it might be wise to make a metric we can grade college players with, and then test whether or not it is predictive. I call it Tool WAA. Here are its components:
I will use the Simple Speed Score that I used when designing my KBO WAR, which is just the stolen base/caught stealing component of the full statistic. I then convert the amount over or under average to a percentage: if a batter has a 6.5 speed score, that becomes 1.5% (since 5 is considered average).
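The post doesn't spell out the formula, but a common form of the stolen-base component of Bill James's Speed Score is sketched below; the conversion to a percentage-point value over average is as described above. Treat the exact formula as an assumption.

```python
def sb_speed_score(sb, cs):
    """Stolen-base component of Bill James's Speed Score.
    Assumption: this is the formula behind the post's "Simple Speed Score";
    the post does not state it explicitly."""
    return ((sb + 3) / (sb + cs + 7) - 0.4) * 20

def speed_pct(sb, cs):
    # 5 is considered average, so express over/under average in percentage points
    return sb_speed_score(sb, cs) - 5.0
```

For example, a hitter with 10 steals and 3 caught stealing scores exactly 5.0, i.e. a 0.0% speed component.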
I wanted to add a defensive component, but range factor is the only real number we have for college players, which is fine, except the main NCAA baseball website doesn't keep it, and individual teams only started really archiving their statistics rather recently (so for some schools you can go way back, and for some you can't even really look at 2010). Instead, I just used a positional adjustment: I changed the traditional run values (the ones used by Tango and others) to percentages. As I did for all the numbers, I used The Baseball Cube's college statistics and information. If they don't specify which outfield position a player played, we use -5; if a player has no specific infield position and is listed as just infield, we use 0.
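One way to read this adjustment is as a lookup table, assuming Tango's standard per-season positional run values map one-to-one to percentage points; only the -5 for unspecified outfielders and 0 for unspecified infielders are stated in the post, so the rest of the values are an assumption.

```python
# One reading of the positional adjustment, assuming Tango's standard
# per-season run values map directly to percentage points.
POSITIONAL_ADJ_PCT = {
    "C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5, "DH": -17.5,
    "OF": -5.0,  # no specific outfield spot listed (per the post)
    "IF": 0.0,   # listed as just "infield" (per the post)
}
```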
The third component measures plate discipline: K% minus BB%. MLB average is usually ~20% for strikeouts and 8% for walks, so we will consider 12% average. You will see that the vast majority of the college players have a better K%-BB% than that, but since they are being compared to each other, it isn't a big deal that most of them come out positive here.
Finally, I used HR%. 2.6% is usually MLB average. While that is not the NCAA average, it seemed like a solid measuring stick, and because we are comparing the players to each other, the number itself isn't that important; what matters is where they rank in comparison. You then add all the percentages together. The result isn't a run value, but it should give you an idea of how much better (or worse) the players are, expressed as a percentage.
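Putting the four components together, the calculation can be sketched as below. One assumption worth flagging: the post doesn't state the sign of the discipline component, so this sketch uses average-minus-actual (12 - (K% - BB%)), so that better-than-average discipline scores positive, which matches the positive component values in the tables later in the post.

```python
def tool_waa(speed_score, pos_adj_pct, k_pct, bb_pct, hr_pct):
    """Sketch of the Tool WAA calculation under the post's stated averages.
    Assumption: the plate-discipline component is 12 - (K% - BB%)."""
    speed = speed_score - 5.0             # 5 is an average speed score
    discipline = 12.0 - (k_pct - bb_pct)  # 12% K-BB taken as average
    power = hr_pct - 2.6                  # 2.6% HR is the MLB-average yardstick
    return speed + pos_adj_pct + discipline + power
```

For example, a hitter with a 6.5 speed score, no positional adjustment, a 15% strikeout rate, a 10% walk rate, and a 3.6% home run rate comes out to 1.5 + 0 + 7 + 1 = 9.5.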
Just for fun, Mike Trout's Tool WAA from 2012 in the Majors (using the regular speed score instead of the simple speed score) was 8.89, while Miguel Cabrera's was 11.21. To test the metric and see if it had any predictive validity, I looked at the Big 12 (a major NCAA conference that gives us a sample of over 90 players, from far enough back that we pretty much know whether or not the players were going to succeed) in 2006, took every hitter with over 100 at-bats, and calculated their Tool WAAs below along with the highest level they played at:
| Name | Speed Score % | Positional Adj. % | K%-BB% | HR % | Tool WAA | Highest Level | Tool WAA -PA | Rank | OPS | Draft |
|---|---|---|---|---|---|---|---|---|---|---|
Average Tool WAA by highest level reached:

- NCAA only (25 players): 3.8
- Independent (5): 4.17
- A- (8): 7.66 (the one rookie-ball player had a 2.3 Tool WAA)
- A+ (8): 7.09
- AA (12): 3.43
- AAA (18): 4.88
- MLB (7): 10.19
So there is not a lot of correlation in the middle, but the MLB players were much better than the rest. The NCAA-only and Independent players were worse than most, though the Double-A players strangely turned out to be the worst. When you sort by Tool WAA, we don't see much correlation: out of the top 20, only 2 players turned out to be MLBers (10% versus the 7.6% base rate, so better than picking players at random, but not great). In fact, one of the worst 20 players made the Majors (Andrew Brown). Speed score alone seemed to be a better predictor, as 5 of its top 20 made the Majors, with Andrew Brown making the Majors despite having the 2nd-worst score. K%-BB% wasn't a good predictor, with just 2 of its top 20 making the Majors, and Andrew Brown made it despite being in the worst 20. HR% had 4 MLBers in its top 20, with Kevin Russo making the Majors despite being in the bottom 20.
As I was inputting the data, it seemed that the positional values were being weighted too heavily, especially for catchers: 15 players were catchers, none of them made the Majors, and just 2 made AAA. To test this, I took the positional adjustment out and created a 2nd Tool WAA (Tool WAA -PA). This seems to help, as 5 of the 7 MLB players were in the top 21. However, there were still 6 players in the top 21 who didn't play past college, which is a higher rate than in the sample as a whole, and just 2 AAA players. So overall, the MLB players tended to have good Tool WAAs in college, but the metric doesn't seem to be all that predictive. I wanted to introduce some control groups, and I thought of 3, all easily found on The Baseball Cube. The first was the draft: was Tool WAA a better predictor of which players made the Majors than draft position? 2 of the 3 first-round picks made the Majors, with the other maxing out at AAA, and 5 of the 7 MLB players were drafted in the first 7 rounds (the first 16 players drafted). This is better correlation than either the Tool WAA or the adjusted Tool WAA.
The 2nd was OPS for that season. I mentioned above that I used to use OPS, but I didn't really like the linear approach, and the change in bats has made it harder to judge current-day players. Did Tool WAA fare better? 11 players had an OPS over 1.000; 3 of them were MLBers, with 2 AAA guys mixed in (perhaps most importantly, all 11 would play in affiliated baseball). The top 20 as a whole (.946 OPS or better) had 4 MLBers. This is worse than speed score, the draft, and the adjusted Tool WAA.
The 3rd was The Baseball Cube's hitter rankings, which rank the full-time hitters based on a formula. The rankings predicted 3 MLB players perfectly, as the top 3 all made the Majors. However, you have to go down to 29th to find the next MLB player. This seems weaker than OPS as a whole.
I want to go back to the draft and see if we can separate the 11 players drafted in the top 7 rounds who failed to make the Majors from the 5 who did. This may give us an idea of which college numbers are more predictive.
- Speed Score % of the 5 MLB players: 0.722
- Speed Score % of the 11 non-MLB players: -1.8
- Tool WAA (-PA) of the 5 MLB players: 9.44
- Tool WAA (-PA) of the 11 non-MLB players: 3.34
- OPS of the 5 MLB players: .973
- OPS of the 11 non-MLB players: .899
- Rank of the 5 MLB players: 32 (unranked players were assigned a rank of 93)
- Rank of the 11 non-MLB players: 47
All of them were predictive, but it seemed that Tool WAA was the best predictor.
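The comparison above is just a grouped average of each metric over the MLB and non-MLB draftees, which can be sketched as below. The rows here are hypothetical, for illustration only, not the actual 16 drafted players.

```python
# Hypothetical rows for illustration -- NOT the actual 16 drafted players.
players = [
    {"mlb": True,  "tool_waa_nopa": 11.0, "ops": 0.990},
    {"mlb": True,  "tool_waa_nopa": 8.0,  "ops": 0.955},
    {"mlb": False, "tool_waa_nopa": 4.0,  "ops": 0.910},
    {"mlb": False, "tool_waa_nopa": 2.5,  "ops": 0.880},
]

def group_mean(rows, stat, made_mlb):
    """Average of a stat over the players who did (or didn't) make the Majors."""
    vals = [r[stat] for r in rows if r["mlb"] == made_mlb]
    return sum(vals) / len(vals)

print(group_mean(players, "tool_waa_nopa", True))   # MLB group average
print(group_mean(players, "tool_waa_nopa", False))  # non-MLB group average
```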
So here I will look at the 2012 draft, examining all the college position players taken in the first 15 rounds, and list their adjusted (no positional adjustment) college career Tool WAAs.
| Name | Speed Score % | HR % | K%-BB% | Tool WAA | Team | Round |
|---|---|---|---|---|---|---|
| Deven Marrero | 1.47 | -0.5 | 7.45 | 8.42 | Red Sox | 1 |
| Joey DeMichele | 0.1 | 1.15 | 5.75 | 7 | White Sox | 3 |
The problem is that it doesn't adjust for competition and conference. Which stats were the most predictive of draft position?
- The top 10 Speed Score players were drafted in round 2.2 on average; the worst 10 in round 2.9.
- The top 10 HR% players were drafted in round 2.4 on average; the worst 10 in round 3.1.
- The top 10 K%-BB% players were drafted in round 3.0 on average; the worst 10 in round 2.9.
- The top 10 Tool WAA players were drafted in round 2.1 on average; the worst 10 in round 2.7.
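The top-10/bottom-10 comparison is a sort-and-average, sketched below; it assumes every stat is oriented so that higher is better, and the sample rows are hypothetical, not the actual 2012 class.

```python
def avg_draft_round(players, stat, n=10, best=True):
    """Average draft round of the n best (or worst) players by a stat.
    Assumes the stat is oriented so that higher = better."""
    ranked = sorted(players, key=lambda p: p[stat], reverse=best)
    picked = ranked[:n]
    return sum(p["round"] for p in picked) / len(picked)

# Hypothetical draftees for illustration -- not the actual 2012 class.
sample = [
    {"tool_waa": 8.4,  "round": 1},
    {"tool_waa": 7.0,  "round": 3},
    {"tool_waa": 2.1,  "round": 5},
    {"tool_waa": -1.0, "round": 9},
]
print(avg_draft_round(sample, "tool_waa", n=2, best=True))   # best two: rounds 1 and 3
print(avg_draft_round(sample, "tool_waa", n=2, best=False))  # worst two: rounds 9 and 5
```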
It is tough to see a great correlation there, but the top Tool WAA players were the most likely to be drafted high, though the worst Tool WAA players were also drafted higher than the worst players by any other metric. Players who did not hit homers were the least likely to be drafted high. K%-BB% clearly had the worst correlation, which makes sense since it didn't predict MLBers in the 2006 sample either.
The Padres and Cardinals each had 4 college players in the top 5 rounds, and the Padres had a 5th who went to a community college (and thus didn't have statistics on The Baseball Cube). The only player the Braves picked had a negative Tool WAA, as did the Royals' only pick. The Astros had the highest average Tool WAA, with both of their picks posting extremely high numbers, but the Rays weren't far behind, with two players with very high Tool WAAs. The Athletics' only pick had a 10.95 Tool WAA, the Mariners' 3 picks averaged a Tool WAA of 10, and the Orioles' only pick had a Tool WAA of nearly 16.
So it seems we have found a group of statistics that may help us predict college players' success in professional ball a little better. As we get closer to the draft, we will talk more about Tool WAA and look at current college players.