Predicting MLB Success from College Players-Off the Radar


June 21, 2011; Omaha, NE, USA; A general view to the entrance of TD Ameritrade Park prior to the game between the Virginia Cavaliers and the South Carolina Gamecocks. Mandatory Credit: Brace Hemmelgarn-USA TODAY Sports

The book Moneyball is often misunderstood, usually by people who didn’t read it or who think it is about on-base percentage. The key to the book was finding players that other teams were not valuing correctly, with David Justice, Chad Bradford, Ricardo Rincon, and Scott Hatteberg being the MLB examples. One of its main themes was the draft, and how the Athletics front office believed that college statistics were an underutilized tool they could take advantage of (the book was about the draft as much as it was about the MLB club, but the draft doesn’t really show up in the movie). Of course, that was over a decade ago, and that draft, outside of Nick Swisher, was pretty ineffective, but college statistics are still used by MLB teams to help evaluate players for the draft.

In the past, I developed a translation method to predict MLB success using college statistics, but it had a survivor bias in it and the linear method was pretty simplistic, so I don’t really use it. Also, the bats in college have changed, which makes comparing OPS difficult, since offensive numbers are different than they were just a few years ago. So I mainly use scouting, which I do a lot of at my own site.

However, when writing about players I have watched in college, I found myself peeking at OPS, along with things like steals and homers, to get a glimpse of how the players’ tools are playing in games. I thought it might be wise to make a metric to grade college players with, and then test whether it is predictive. I call it Tool WAA. Here are its components:

First, I use the Simple Speed Score that I used when designing my KBO WAR, which is just the stolen base/caught stealing component of the statistic. I then convert the amount over or under average to a percentage: a batter with a 6.5 speed score gets 1.5%, since 5 is considered average.

I wanted to add a defensive component, but range factor is the only real number we have for college players. That would be fine, except the main NCAA baseball website doesn’t keep it, and individual teams only started archiving their statistics rather recently (so for some schools you can go way back, and for some you can’t even really look at 2010). Instead, I just used a positional adjustment: I changed the run value (the traditional one used by Tango and others) to a percentage. As I did for all the numbers, I used The Baseball Cube’s college statistics and information. If it doesn’t specify which outfield position the player played, we will use -5. If a player has no specific infield position and is listed as just infield, we will use 0.

The third component measures plate discipline: K%-BB%. MLB average is usually ~20% for Ks and 8% for BBs, so we will consider 12% average. You will see that the vast majority of the college players have a better K%-BB% than that, but they are compared to each other, so it isn’t a big deal if most of them are positive here.

Finally, I used HR%. 2.6% is usually MLB average. While that is not the NCAA average, it seemed like a solid measuring stick, and because we are comparing the players to each other, the number itself isn’t that important; what matters is where they rank in comparison. You then add all the percentages together. The total isn’t a run value, but it should give you an idea of how much better (or worse) the players are based on percentage.
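To make the arithmetic concrete, here is a minimal sketch of how the four components could be combined. Several things in it are my assumptions rather than anything spelled out above: the function and label names are made up for illustration, the discipline component is assumed to be 12 minus (K% minus BB%) so that better-than-average hitters come out positive, and the positional values other than -5 and 0 are the traditional run values as I read them from the table.

```python
# Baselines quoted in the article (MLB averages, used only as a yardstick).
AVG_SPEED_SCORE = 5.0
AVG_K_MINUS_BB = 12.0  # ~20% K rate minus ~8% BB rate
AVG_HR_PCT = 2.6

# Traditional positional run values, read directly as percentages.
# The position labels are hypothetical names chosen for this sketch.
POSITIONAL_ADJ = {
    "C": 12.5, "SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5, "1B": -12.5,
    "OF": -5.0,  # outfield spot not specified
    "IF": 0.0,   # listed only as "infield"
}


def speed_pct(speed_score):
    """Points above or below an average (5.0) speed score."""
    return speed_score - AVG_SPEED_SCORE


def discipline_pct(k_pct, bb_pct):
    """Assumed sign: a K%-BB% gap smaller than 12 scores as positive."""
    return AVG_K_MINUS_BB - (k_pct - bb_pct)


def power_pct(hr_pct):
    """Home run rate above or below the 2.6% baseline."""
    return hr_pct - AVG_HR_PCT


def tool_waa(speed, positional, discipline, power):
    """Tool WAA is the plain sum of the four percentage components."""
    return speed + positional + discipline + power


# Drew Stubbs' component values from the 2006 Big 12 table.
stubbs = tool_waa(1.87, 2.5, 4.1, 2.55)
print(round(stubbs, 2))  # 11.02
```

Dropping the positional term from the sum gives the "Tool WAA - PA" variant used later in the article.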

Just for fun, Mike Trout’s Tool WAA from 2012 in the Majors (using the regular speed score instead of the simple speed score) was 8.89, while Miguel Cabrera’s was 11.21. To test the tool and see if it had any predictive validity, I looked at the Big 12 from 2006 (every hitter with over 100 at-bats). It is a major NCAA conference that gives us a sample of over 90 players, and enough time has passed that we pretty much know whether or not the players are going to succeed. I calculated their Tool WAAs below, along with the highest level each played in:

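| Name | Speed Score % | Positional Adj. % | K% - BB% | HR % | Tool WAA | Highest Level | Tool WAA - PA | Rank | OPS | Draft Round |
|---|---|---|---|---|---|---|---|---|---|---|
| Drew Stubbs | 1.87 | 2.5 | 4.1 | 2.55 | 11.02 | MLB | 8.52 | 2 | 1019 | 1 |
| Corey Brown | 1.16 | 2.5 | 5.06 | 3.42 | 12.14 | MLB | 9.64 | 3 | 1106 | 1 |
| Jackson Williams | -1.57 | 12.5 | 10.2 | -1.11 | 19.84 | AAA | 7.34 | 57 | 801 | 1 |
| Jordy Mercer | 1.29 | 7.5 | 1.5 | 0.4 | 10.69 | MLB | 3.19 | | 726 | 3 |
| Roger Kieschnick | -0.06 | -7.5 | 4.83 | 1.2 | -1.53 | AAA | 5.97 | 22 | 966 | 3 |
| Kyle Russell | -1.24 | -7.5 | -11.3 | 3.5 | -16.54 | AAA | -9.04 | 32 | 911 | 3 |
| Shelby Ford | -2.47 | 2.5 | 5.9 | 3.03 | 8.96 | AAA | 6.46 | 28 | 979 | 3 |
| Tyler Mach | -1.89 | 2.5 | 8 | 3.8 | 12.41 | A- | 9.91 | 5 | 1044 | 4 |
| Bradley Suttle | -3 | 2.5 | 9.35 | -0.8 | 8.05 | AA | 5.55 | 47 | 795 | 4 |
| Tyler Reves | -5.5 | 12.5 | 4.71 | 2.66 | 14.37 | A+ | 1.87 | 37 | 869 | 4 |
| Joe Dunigan | -0.27 | -7.5 | -9.19 | 0.79 | -16.17 | AA | -8.67 | | 825 | 5 |
| Jeff Christy | -0.78 | 12.5 | 3.04 | 1.38 | 16.11 | AAA | 3.61 | | 813 | 6 |
| Ryan Rohlinger | -2 | 2.5 | 15.52 | 2.48 | 18.5 | MLB | 16 | 1 | 1067 | 6 |
| Jordan Danks | 1.29 | 2.5 | 9.42 | -0.88 | 12.33 | MLB | 9.83 | 61 | 946 | 7 |
| Ty Wright | -1 | -7.5 | 5.42 | 0.03 | -3.05 | AAA | 4.45 | 87 | 812 | 7 |
| Luke Gorsett | -2 | -5 | 6.69 | 4.65 | 4.33 | AA | 9.33 | 13 | 1071 | 7 |
| Beamer Weems | -2.23 | 7.5 | 3.81 | 0.85 | 9.93 | AAA | 2.43 | 26 | 874 | 8 |
| Jared Goedert | 0.85 | -5 | 19.61 | 3.92 | 19.38 | AAA | 24.38 | 11 | 1074 | 9 |
| Evan Frey | 0.33 | 2.5 | 9.89 | -2.6 | 10.12 | AAA | 7.62 | 40 | 847 | 10 |
| Seth Fortenberry | 1.29 | -5 | 0.99 | 1.36 | -1.36 | AAA | 3.64 | 10 | 993 | 11 |
| Jacob Priday | 0.33 | -5 | -2.59 | 2.12 | -5.14 | A | -0.14 | 36 | 848 | 11 |
| Kyle Colligan | -0.27 | -7.5 | -7.16 | -0.1 | -15.03 | A+ | -7.53 | | 792 | 12 |
| Craig Stinson | -3 | 12.5 | 9.74 | -1.1 | 18.14 | A- | 5.64 | 45 | 681 | 12 |
| Jake Opitz | -3 | 2.5 | 8.1 | -1.62 | 5.98 | AAA | 3.48 | 84 | 780 | 12 |
| Blake Stouffer | -0.096 | 2.5 | 8.55 | -0.88 | 10.07 | A | 7.57 | 39 | 751 | 13 |
| Gus Milner | 0.85 | -5 | 2.19 | 0.042 | -1.92 | AAA | 3.08 | 7 | 915 | 14 |
| Carson Kainer | -1.34 | -5 | 5.51 | -0.87 | -1.7 | AA | 3.3 | 20 | 975 | 14 |
| Matt Smith | -5.5 | 12.5 | 13.64 | 1.23 | 21.87 | A+ | 9.37 | 17 | 1059 | 14 |
| Kody Kaiser | 0.6 | -5 | 1.38 | 0.26 | -2.75 | AA | -2.25 | 8 | 894 | 15 |
| Hunter Mense | 0.33 | -5 | 3.55 | -1.66 | -2.78 | AAA | 2.22 | 54 | 717 | 17 |
| Chance Wheeless | -3.59 | -12.5 | 6.98 | -1.69 | -10.8 | A | 1.7 | 82 | 716 | 17 |
| Deik Scram | -0.06 | -7.5 | 2.5 | 0.97 | -4.09 | AAA | 3.41 | 15 | 1077 | 18 |
| Ritchie Price | -2.23 | 0 | 4.41 | -1.91 | 0.27 | A- | 0.27 | 18 | 704 | 18 |
| Andrew Brown | -6.3 | -7.5 | -0.06 | 3.78 | -10.08 | MLB | -2.58 | 59 | 897 | 18 |
| Nick Peoples | 2.17 | 2.5 | 4.5 | -1.28 | 7.89 | A | 5.39 | 16 | 791 | 19 |
| Brandon Buckman | -2.09 | -12.5 | 15.04 | 3.4 | 3.85 | AA | 16.35 | 24 | 980 | 19 |
| Ryan Wehrle | 3.19 | 0 | 8.9 | 0.94 | 13.03 | A- | 13.03 | 12 | 1026 | 20 |
| Kevin Russo | 2.65 | 2.5 | 13.29 | -1.74 | 16.7 | MLB | 14.2 | 29 | 736 | 20 |
| Zach Dillon | -1 | 12.5 | 27.53 | -0.66 | 38.37 | AA | 25.87 | 14 | 996 | 20 |
| Matt Clarkson | -4.43 | 12.5 | -0.14 | -0.46 | 7.47 | A- | -5.03 | | 755 | 20 |
| Joey Callender | 0.55 | 2.5 | 13.2 | -2.2 | 14.05 | A+ | 11.55 | 34 | 764 | 21 |
| Aaron Reza | -1.42 | 7.5 | 1.86 | -2.14 | 5.8 | A+ | -1.7 | 27 | 826 | 21 |
| Byron Wiley | -1.24 | -5 | 11.31 | 0.85 | 5.92 | A+ | 0.92 | 64 | 907 | 22 |
| Brock Bond | -1.33 | 2.5 | 9.01 | -1.6 | 8.58 | AAA | 6.08 | 41 | 858 | 24 |
| Keanon Smith | 0.85 | -5 | 2.91 | 0.13 | -1.11 | A | 3.89 | 31 | 910 | 25 |
| Rebel Ridling | -4.43 | -12.5 | 4.6 | -0.75 | -13.08 | AA | -0.58 | | 868 | 25 |
| Cristen Tapia | -4.43 | -12.5 | 5.9 | -0.31 | -11.34 | NCAA | 1.16 | 76 | 781 | 28 |
| Parker Dalton | 1.4 | 0 | -3.03 | -1.95 | -3.58 | A | -3.58 | 69 | 617 | 29 |
| Kyle Martin | -3.91 | 0 | 5.02 | 2.05 | 3.16 | A | 3.16 | | 885 | 29 |
| Brian Capps | -2.57 | 0 | 6.04 | -1.68 | 1.79 | A- | 1.79 | 66 | 875 | 30 |
| Jared Schweitzer | -3 | 2.5 | 12.41 | 1.96 | 13.87 | A | 11.37 | 6 | 1018 | 30 |
| Preston Clark | -4.1 | 12.5 | 4.6 | -0.29 | 12.71 | NCAA | 0.21 | 42 | 749 | 33 |
| Ryan Lollis | -1.89 | 2.5 | 6.06 | -1.61 | 5.06 | AAA | 2.56 | | 820 | 37 |
| Chuckie Caufield | 0.33 | -7.5 | 6.46 | 0.72 | 0.015 | AAA | 7.515 | 4 | 958 | 39 |
| Eli Rumler | -2.09 | 7.5 | 1.4 | -2.6 | 4.21 | AA | -3.29 | 65 | 678 | 39 |
| Kevin Smith | -3.91 | 2.5 | 5.08 | 0.86 | 4.53 | AA | 2.03 | 9 | 899 | 39 |
| Erik Morrison | 1.29 | 2.5 | -1.06 | 3.11 | 5.84 | AA | 3.34 | 21 | 867 | 46 |
| Brock Simpson | -1.57 | -5 | 9.07 | -0.16 | 2.34 | A- | 2.66 | 63 | 823 | 46 |
| Willie Rueda | 3 | 0 | 19.39 | -2.03 | 20.36 | NCAA | 20.36 | 35 | 786 | |
| Brandon Farr | 1.55 | 12.5 | 21.04 | -2 | 33.09 | NCAA | 20.59 | 33 | 821 | |
| Joe Roundy | 1.44 | -5 | 4.42 | 1.44 | 2.3 | Rk | 7.3 | 19 | 1042 | |
| Nick Jaros | 1.44 | -5 | 4.63 | -0.49 | 0.58 | Ind | 5.58 | 88 | 890 | |
| John Allman | 1.29 | 12.5 | 2.3 | -0.07 | 16.02 | A+ | 3.52 | 23 | 906 | |
| Barrett Rice | 1.29 | 0 | 0.42 | -1.02 | 0.69 | Ind | 0.069 | 43 | 938 | |
| Russell Daley | 1.29 | 0 | 5.82 | -1.5 | 5.63 | AA | 5.63 | 58 | 700 | |
| Aaron Ivey | 1.2 | -5 | 14.88 | -1.64 | 9.36 | NCAA | 14.36 | 56 | 725 | |
| Bryce Nimmo | 1.07 | -5 | 12.49 | -1.14 | 7.42 | NCAA | 12.42 | 46 | 729 | |
| Chase Gerdes | 0.85 | -5 | 4.31 | -1.57 | -1.41 | Ind | 3.59 | 44 | 811 | |
| Trevor Helms | 0.85 | 0 | 8.53 | -1.44 | 7.94 | NCAA | 7.94 | 74 | 709 | |
| Matt Sodolak | -0.06 | 12.5 | 3.18 | -2.6 | 13.02 | NCAA | 0.52 | 83 | 651 | |
| Gary Arndt | -0.14 | 0 | 6.04 | -0.31 | 5.59 | Ind | 5.59 | 80 | 718 | |
| Jake Mort | -0.37 | 0 | 3.91 | -2.6 | 0.94 | NCAA | 0.94 | 77 | 691 | |
| Austin Boggs | -0.5 | 2.5 | 1.9 | -2.6 | 1.3 | NCAA | -1.2 | 51 | 690 | |
| Jose Salazar | -0.62 | 0 | 13.65 | 2.05 | 15.08 | AAA | 15.08 | 50 | 740 | |
| Zane Taylor | -0.86 | 0 | 12.41 | -1.78 | 9.77 | NCAA | 9.77 | 30 | 799 | |
| Ben Booker | -1 | -5 | -3.24 | -1.38 | -10.62 | NCAA | 5.62 | | | |
| Blair Wilkins | -1.89 | 0 | 15.01 | -3.04 | 10.08 | NCAA | 10.08 | 78 | 894 | |
| John Infante | -2 | -5 | 7.42 | -1.84 | -1.42 | NCAA | 3.58 | | 644 | |
| Preston Land | -2.14 | 0 | -5.61 | 3.06 | -4.7 | NCAA | -4.7 | 38 | 994 | |
| John McKee | -2.14 | 2.5 | -4 | 0.4 | -3.25 | NCAA | -0.75 | 62 | 842 | |
| J.C. Field | -2.14 | 12.5 | 5.17 | -0.12 | 15.41 | Ind | 2.91 | 71 | 712 | |
| Buck Afenir | -2.14 | 12.5 | -5.59 | 1.1 | 5.87 | A- | -6.63 | | 738 | |
| Jake Vazquez | -2.23 | 12.5 | 2.25 | 1.47 | 13.99 | NCAA | 1.49 | 86 | 783 | |
| Tyler Link | -2.33 | 0 | 6.5 | -1.68 | 2.49 | NCAA | 2.49 | | 751 | |
| Kevin Sevigny | -2.41 | -5 | 2.87 | -0.77 | -5.31 | NCAA | -0.31 | 60 | 771 | |
| Matt Baty | -3 | -5 | 13 | -1.6 | 3.4 | NCAA | 8.4 | 49 | 816 | |
| Chais Fuller | -3 | 2.5 | 5.6 | -2.6 | 2.5 | A+ | 0 | 55 | 611 | |
| Freddy Rodriguez | -3 | 0 | 10.45 | -2.08 | 5.37 | NCAA | 5.37 | 67 | 720 | |
| Ryan Hill | -3 | -5 | 11.07 | -1.67 | 1.4 | NCAA | 6.4 | 91 | 702 | |
| Derek Chambers | -4.11 | -12.5 | 8.86 | 0.09 | -7.66 | NCAA | 4.84 | 48 | 818 | |
| Tim Jackson | -4.43 | 2.5 | 1.17 | -2.6 | -3.36 | NCAA | -0.86 | 70 | 645 | |
| Andy Gerch | -6.33 | -5 | 2.84 | 0.45 | -8.04 | NCAA | -3.04 | 90 | 842 | |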

25 players didn’t play at a higher level than the NCAA. They had an average Tool WAA of 3.8.

Independent (5): 4.17

A- (8): 7.66 (the one rookie ball player had a 2.3 Tool WAA)

A (8): 1.8

A+ (8): 7.09

AA (12): 3.43

AAA (18): 4.88

7 players made the Majors. They had an average Tool WAA of 10.19.
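As a rough check on these averages, here is a small sketch that groups Tool WAA by highest level and takes the mean. It only includes two of the groups, with values copied from the table above as I read them; the variable and function names are just illustrative.

```python
from collections import defaultdict
from statistics import mean

# (highest level, Tool WAA) pairs copied from the 2006 Big 12 table.
# Only the MLB and A-ball groups are included to keep the sketch short.
ROWS = [
    ("MLB", 11.02), ("MLB", 12.14), ("MLB", 10.69), ("MLB", 18.5),
    ("MLB", 12.33), ("MLB", -10.08), ("MLB", 16.7),
    ("A", -5.14), ("A", 10.07), ("A", -10.8), ("A", 7.89),
    ("A", -1.11), ("A", -3.58), ("A", 3.16), ("A", 13.87),
]


def average_by_level(rows):
    """Return {level: mean Tool WAA} for the given (level, value) pairs."""
    groups = defaultdict(list)
    for level, value in rows:
        groups[level].append(value)
    return {level: mean(values) for level, values in groups.items()}


averages = average_by_level(ROWS)
for level, avg in averages.items():
    print(f"{level} ({sum(1 for lv, _ in ROWS if lv == level)}): {avg:.2f}")
```

The two results land on the 10.19 and 1.8 figures listed above, up to rounding.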

So there is not a lot of correlation in the middle, but the MLB players were much better than the rest. The NCAA and Independent players were worse than most of the other groups, but the A-ball players strangely turned out to be the worst. When you sort by Tool WAA, we don’t see much correlation: out of the top 20, only 2 players turned out to be MLBers (10% versus 7.6% for the sample as a whole, so better than picking players at random, but not a great correlation). In fact, one of the worst 20 players made the Majors (Andrew Brown). Speed score alone seemed to be a better predictor, as 5 of the top 20 made the Majors, although Andrew Brown had the 2nd-worst speed score and still made it. K%-BB% wasn’t a good predictor, with just 2 of the top 20 making the Majors, and Andrew Brown made it despite being in the worst 20. HR% had 4 MLBers in the top 20, with Kevin Russo making the Majors despite being in the bottom 20.
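One way to put those top-20 counts on the same scale is to divide each hit rate by the base rate of the whole sample (7 MLB players out of the 92 in the table). The sketch below does only that; the "lift" framing and the names are mine, not part of Tool WAA itself.

```python
TOTAL_PLAYERS = 92  # hitters in the 2006 Big 12 table
TOTAL_MLB = 7       # of those, how many reached the Majors
TOP_N = 20

# MLB players found in the top 20 of each metric, as counted above.
MLB_IN_TOP_20 = {
    "Tool WAA": 2,
    "Speed Score": 5,
    "K%-BB%": 2,
    "HR%": 4,
}


def lift_over_base_rate(hits, n, base_rate):
    """How many times better than random the top-n group did."""
    return (hits / n) / base_rate


base_rate = TOTAL_MLB / TOTAL_PLAYERS
lifts = {
    metric: lift_over_base_rate(hits, TOP_N, base_rate)
    for metric, hits in MLB_IN_TOP_20.items()
}

print(f"base rate: {base_rate:.1%}")
for metric, lift in lifts.items():
    print(f"{metric}: {lift:.2f}x the base rate")
```

A lift near 1 means the metric's top 20 did about as well as picking 20 names at random.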

As I was inputting the data, it seemed that the positional values were being weighted too heavily, especially with catchers. 15 players were catchers; none of them made the Majors, and just 2 made AAA. Just to test this out, I took the positional adjustment out and created a 2nd Tool WAA (Tool WAA - PA). This seems to help, as 5 of the MLB players were in the top 21. However, there were still 6 players in the top 21 who didn’t play past college (6 of 21, a higher share than the 25 of 92 in the sample as a whole). There were also just 2 AAA players. So overall, we are seeing that the MLB players tended to have good Tool WAAs in college, but it doesn’t seem to be that predictive.

I wanted to introduce some control groups, and I thought of 3, all easily found on The Baseball Cube. The first was the draft. Was Tool WAA a better predictor of which players made the Majors than the draft? 2 of the 3 first-round picks made the Majors, with the other maxing out at AAA. 5 of the 7 MLB players were drafted in the first 7 rounds (the first 16 players). This is a better correlation than either the Tool WAA or the adjusted Tool WAA.

The 2nd was OPS for that season. As I said above, I used to use OPS, but I didn’t really like the linear approach, and the change of bats made it harder to judge current-day players. Did Tool WAA fare better than this? 11 players had an OPS over 1000; 3 of them were MLBers, with 2 AAA guys mixed in (perhaps most importantly, all 11 would play in affiliated baseball). The top 20 as a whole (946 OPS or better) had 4 MLBers. This is worse than speed score, the draft, and the adjusted Tool WAA.

The 3rd was the Baseball Cube hitter rankings, which rank the full-time hitters based on a formula. The rankings predicted 3 MLB players perfectly, as the top 3 in the rankings all made the Majors. However, you have to go down to No. 29 to find the next MLB player. This seems weaker than OPS as a whole.

I want to go back to the draft and see if we can separate the 5 players taken in the top 7 rounds who made the Majors from the 11 who did not. This may give us an idea of which college numbers are more predictive.

Speed Score % of the 5 MLB Players: .722

Speed Score % of the 11 nonMLB players: -1.8

Tool WAA (-PA) of the 5 MLB Players: 9.44

Tool WAA (-PA) of the 11 nonMLB Players: 3.34

OPS of the 5 MLB Players: 973

OPS of the 11 nonMLB Players: 899

Rank of 5 MLB Players: 32 (for non ranked players, I gave them the rank of 93)

Rank of the 11 nonMLB Players: 47
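For anyone who wants to reproduce the MLB side of this comparison, here is a short sketch using the five MLB players drafted in the first 7 rounds, with their values taken from the Big 12 table as I read it. The tuple layout and names are just for illustration; the only rule it encodes is the one noted above, that an unranked player is treated as rank 93.

```python
from statistics import mean

UNRANKED_RANK = 93  # stand-in rank for players missing from the rankings

# (name, speed score %, Tool WAA - PA, OPS, rank or None if unranked)
MLB_DRAFTEES = [
    ("Drew Stubbs", 1.87, 8.52, 1019, 2),
    ("Corey Brown", 1.16, 9.64, 1106, 3),
    ("Jordy Mercer", 1.29, 3.19, 726, None),
    ("Ryan Rohlinger", -2.0, 16.0, 1067, 1),
    ("Jordan Danks", 1.29, 9.83, 946, 61),
]


def group_means(players):
    """Average each metric, filling in missing ranks with UNRANKED_RANK."""
    ranks = [UNRANKED_RANK if p[4] is None else p[4] for p in players]
    return {
        "speed": mean(p[1] for p in players),
        "tool_waa_no_pa": mean(p[2] for p in players),
        "ops": mean(p[3] for p in players),
        "rank": mean(ranks),
    }


means = group_means(MLB_DRAFTEES)
for metric, value in means.items():
    print(f"{metric}: {value:.3f}")
```

Running the same function over the 11 drafted players who did not reach the Majors gives the other half of the comparison.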

All of them were predictive, but it seemed that Tool WAA was the best predictor.

So here I will look at the 2012 draft, taking all the college position players selected in the first 5 rounds and listing their adjusted (no positional adjustment) college career Tool WAA.

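| Name | Speed Score % | HR % | K%-BB% | Tool WAA | Team | Round |
|---|---|---|---|---|---|---|
| Alex Yarbrough | -0.5 | -0.61 | 6.03 | 4.92 | Angels | 4 |
| Nolan Fontana | 1.04 | -0.22 | 21.37 | 22.19 | Astros | 2 |
| Andrew Aplin | -1.13 | -1.25 | 18.07 | 15.69 | Astros | 5 |
| Max Muncy | -0.27 | 1.19 | 10.03 | 10.95 | Athletics | 5 |
| Blake Brown | 0.85 | 0.42 | -2.84 | -1.57 | Braves | 5 |
| Victor Roache | -1.7 | 7.45 | 9.49 | 15.24 | Brewers | 1 |
| Mitch Haniger | -3.3 | 1.89 | 10.5 | 9.09 | Brewers | 1 |
| James Ramsey | 1.69 | 1.83 | 12.13 | 15.65 | Cardinals | 1 |
| Stephen Piscotty | -0.27 | -0.86 | 11.13 | 10 | Cardinals | 1 |
| Patrick Wisdom | -1 | 2.36 | 4.14 | 5.5 | Cardinals | 1 |
| Alex Mejia | -2.55 | -2.04 | 6.36 | 1.77 | Cardinals | 4 |
| Ronnie Freeman | -1 | 1.2 | 8.35 | 8.55 | Dbacks | 5 |
| Mac Williamson | 1.07 | 3.65 | -3.63 | 1.09 | Giants | 3 |
| Tyler Naquin | -3 | -1.58 | 6.18 | 1.6 | Indians | 1 |
| Chris Taylor | 1.29 | 1.39 | 9.24 | 11.92 | Mariners | 5 |
| Patrick Kivlehan | 2.43 | 4.81 | 2.47 | 9.71 | Mariners | 4 |
| Mike Zunino | 1.59 | 4.26 | 2.51 | 8.36 | Mariners | 1 |
| Austin Nola | -1.33 | -0.84 | 10.37 | 8.2 | Marlins | 5 |
| Kevin Plawecki | -3.43 | 0.06 | 15.6 | 12.2 | Mets | 1 |
| Matt Reynolds | 1.58 | -0.4 | 9.6 | 10.78 | Mets | 2 |
| Tony Renda | 2.77 | -1.03 | 9.15 | 10.89 | Nationals | 2 |
| Spencer Kieboom | -4.43 | -1.67 | 14.32 | 8.22 | Nationals | 5 |
| Brandon Miller | -2.09 | 5.81 | 2.19 | 5.91 | Nationals | 4 |
| Christian Walker | -2.09 | 1.47 | 16.48 | 15.86 | Orioles | 4 |
| Jeremy Baltz | 2.11 | 2.91 | 10.55 | 15.57 | Padres | 2 |
| Travis Jankowski | 3.91 | -1.34 | 9.3 | 11.87 | Padres | 1 |
| Fernando Perez | 0.33 | -0.83 | 8.47 | 7.97 | Padres | 3 |
| Dane Phillips | -3 | -1 | 0.81 | -3.19 | Padres | 2 |
| Chris Serritella | 0.85 | 2.27 | 2.75 | 5.87 | Phillies | 4 |
| Barrett Barnes | 2.82 | 2.58 | 7.13 | 12.53 | Pirates | 1 |
| Brandon Thomas | 2 | -1 | 7 | 8 | Pirates | 4 |
| Pat Cantwell | 0.62 | -1.66 | 10.66 | 9.62 | Rangers | 3 |
| Preston Beck | -3.34 | 0.54 | 12.3 | 9.5 | Rangers | 5 |
| Richie Shaffer | 2 | 2.3 | 9.23 | 13.53 | Rays | 1 |
| Andrew Toles | 1.93 | -0.98 | 11.73 | 12.68 | Rays | 3 |
| Deven Marrero | 1.47 | -0.5 | 7.45 | 8.42 | Red Sox | 1 |
| Jeffrey Gelalich | 2.32 | 0.36 | 1.72 | 4.4 | Reds | 1 |
| Tom Murphy | -0.27 | 2.4 | 5.7 | 7.83 | Rockies | 3 |
| Kenny Diekroeger | -4 | -1.23 | -0.63 | -5.86 | Royals | 4 |
| Adam Walker | 4.92 | 3.77 | -2.9 | 5.79 | Twins | 3 |
| Joey Demichle | 0.1 | 1.15 | 5.75 | 7 | White Sox | 3 |
| Peter O’Brien | -4.58 | 4.1 | 2.22 | 1.74 | Yankees | 2 |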

One problem is that Tool WAA doesn’t adjust for competition and conference.

Which stats were more predictive for draft position?

The top 10 Speed Score players were drafted in round 2.2 on average. The worst 10 were drafted in round 2.9 on average.

The top 10 HR% players were drafted in round 2.4 on average. The worst 10 were drafted in round 3.1 on average.

The top 10 K%-BB% players were drafted in round 3 on average. The worst 10 were drafted in round 2.9 on average.

The top 10 Tool WAA players were drafted in round 2.1 on average. The worst 10 were drafted in round 2.7 on average.
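The Tool WAA line can be reproduced from the 2012 table with a few lines of code. This is only a sketch: the pairs below are the Tool WAA and round columns as I read them from the table, and the function name is made up for the example.

```python
# (Tool WAA, draft round) for the 42 players in the 2012 table above.
PICKS = [
    (4.92, 4), (22.19, 2), (15.69, 5), (10.95, 5), (-1.57, 5), (15.24, 1),
    (9.09, 1), (15.65, 1), (10, 1), (5.5, 1), (1.77, 4), (8.55, 5),
    (1.09, 3), (1.6, 1), (11.92, 5), (9.71, 4), (8.36, 1), (8.2, 5),
    (12.2, 1), (10.78, 2), (10.89, 2), (8.22, 5), (5.91, 4), (15.86, 4),
    (15.57, 2), (11.87, 1), (7.97, 3), (-3.19, 2), (5.87, 4), (12.53, 1),
    (8, 4), (9.62, 3), (9.5, 5), (13.53, 1), (12.68, 3), (8.42, 1),
    (4.4, 1), (7.83, 3), (-5.86, 4), (5.79, 3), (7, 3), (1.74, 2),
]


def average_round(picks, n, best=True):
    """Average draft round of the n best (or n worst) players by metric."""
    ranked = sorted(picks, key=lambda pick: pick[0], reverse=best)
    return sum(rnd for _, rnd in ranked[:n]) / n


print(average_round(PICKS, 10, best=True))   # 2.1
print(average_round(PICKS, 10, best=False))  # 2.7
```

Swapping the first element of each pair for Speed Score, HR%, or K%-BB% would give the other three lines the same way.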

It is tough to see a great correlation there, but top Tool WAA players were the most likely to be drafted the highest, though the worst Tool WAA players were drafted higher than the worst of any of the other metrics. Players that did not hit homers were the least likely to be drafted high. K-BB % clearly had the worst correlation, which makes sense since it didn’t predict MLBers in the sample we looked at.

The Padres and Cardinals both had 4 college players in the top 5 rounds, and the Padres had a 5th who went to a community college (and thus didn’t have statistics on The Baseball Cube). The only college player the Braves picked had a negative Tool WAA, as did the Royals’ only pick. The Astros had the highest average Tool WAA, with both of their picks posting extremely high numbers, but the Rays weren’t that far behind, with two players who had very high Tool WAAs. The Athletics’ only pick had a 10.95 Tool WAA, the Mariners’ 3 picks averaged a 10 Tool WAA, and the Orioles’ only pick had a Tool WAA of nearly 16.

So it seems we have found a group of statistics that may help us predict college players’ success in professional ball a little better. As we get closer to the draft, we will talk more about Tool WAA and look at current college players.