SPARKMAN: A PROJECTION SYSTEM FOR MINOR LEAGUE PITCHERS

I evolved from a regular old baseball fan into something well beyond that the same way a lot of other people do nowadays – with fantasy baseball. I’d been playing most of my life, but in 2014 I dove into my first league that involved prospects. I didn’t really know what I was doing, but I ended up doing pretty well, I think, rounding up Corey Seager, Josh Bell, Michael Conforto, and Touki Toussaint in the first 7 rounds, and even managed to scoop a 17-year-old Luiz Gohara in the 22nd. Though any success I had past Seager was blind luck, really.

As the number of available names from the public “Top 100” lists began to slowly creep to 0, I didn’t know who to pick. So I stuck with the numbers. In the 10th round, I picked a 2013 draftee who, in his first full season, pitched 120+ IP at A+, struck out 117, walked just 25, allowed only 21 ER, and gave up just 2 HR all year. This, to me at the time, was a great pick.

His name was Glenn Sparkman, pitcher for the Kansas City Royals.

Of course, there was a lot I didn’t know at the time. I didn’t know that 22-year-olds in High-A should [more or less] be pitching exactly that well, because the competition below AA is sometimes underdeveloped. I didn’t know that his 11 appearances out of the bullpen were noteworthy, as they would inflate his K totals while deflating his ERA. I didn’t know that his home park (Wilmington) suppresses HR like crazy and that his measly 1.4% HR/FB could in no way be a skill.

This worthless little anecdote always stuck with me, and that’s how Sparkman, the projection system that I’ve developed for pitching prospects, got both its name and a little bit of inspiration.

WHAT DOES SPARKMAN DO?

Sparkman projects the Major League impact of a pitcher through his 20’s using his age and stats at whatever level he pitched at. The projections come in the form of a total FanGraphs WAR for those years and are based on historical minor league seasons from 2007 forward. [1]

This will be the first of many times I say that Sparkman is not intended in any way to replace or be better than actual scouting. That’s not its purpose. When analyzing a prospect, pitching or hitting, visit the actual scouting reports of Prospects Live, FanGraphs, Baseball America, or wherever else first, as the numbers alone will never come close to telling the whole story. What Sparkman can hopefully do is contextualize certain things such as age/level/risk to perhaps shed some light on pitching performances that were either underappreciated or not as impressive as they seemed on the surface.

[1] Learning to web scrape is very much on the top of my to-do list, but until then, only the years 2007 and beyond for Minor League Baseball data are [free and] available in one nice, neat location for me to download and analyze. On one hand, that’s disappointing; more data is never a bad thing, and I would love to go into the 90s or beyond. On the other hand, the game has changed so much over the last 10 – 20 years, I’m not sure if this data would be more helpful than what I already have. It could even be making the model worse. I won’t know until I get my hands on it. For now, however, I was happy with the data I had for an initial roll out, which was almost 40,000 individual data points from Low-A to AAA over those years.

METHODOLOGY

What Sparkman sets out to do is, for each pitching prospect who pitched in the last year, project the percent chance that said pitcher will reach certain career “milestones” before the age of 30. Then, based on those percentages, it calculates an expected WAR for each pitcher over that same time period. To do this, Sparkman takes a player’s stats and uses various logistic regressions that change depending on level and milestone.

Logistic regressions, for those unfamiliar, model an outcome that can take one of two values:

0 (or “False”, “Negative”, “No”, etc.)

1 (or “True”, “Positive”, “Yes”, etc.)

All historical seasons in the Minors have a 0 or a 1 associated with them for each milestone at the Major League level. Here’s James Paxton in his 20’s (13.6 total fWAR), for example:

Plotted in red along Y=1 (“Yes”) are the players who were worth 2+ WAR in their 20’s. These are dots, as each player has only one outcome, but they may look like a line in most places due to the high density of data (there’s only so much you can spread out thousands of data points across one line). Plotted in purple at Y=0 (“No”) are the players who didn’t hit that milestone. In light blue, we have the percent chance that a player will reach said milestone, as predicted by the logistic regression given his league-adjusted K%. As you can see, very few players with a K%+ of 80 or below in AA reached this 2 WAR milestone. Meanwhile, the chance of reaching 2+ WAR more than doubles from a 100 K%+ to a 140 K%+. It’s hard to tell from a plot such as this, but the density of Y=1 dots past a 110 K%+ is much higher than the density of dots along Y=0 at this point, which is why the curve bends upward towards Y=1 as the K%+ increases. This is just one example of how a stat can impact Sparkman’s projection of a pitcher.
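For anyone who wants to see the mechanics, here is a minimal sketch of a one-variable logistic regression of a milestone on K%+. It is not Sparkman's code or data: the seasons are simulated, the "true" coefficients are made up, and the names `fit_logistic` and `milestone_prob` are hypothetical. It only shows how a pile of 0/1 outcomes turns into a smooth probability curve.

```python
import math
import random


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, steps=1000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by batch gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # predicted probability minus 0/1 outcome
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Simulated seasons (NOT real data): K%+ centered on 100, with a made-up
# relationship between K%+ and the chance of reaching the milestone.
random.seed(0)
TRUE_B0, TRUE_B1 = -1.5, 1.0  # hypothetical coefficients used only to simulate
k_plus = [random.gauss(100, 20) for _ in range(400)]
xs = [(k - 100) / 20 for k in k_plus]  # rescale so gradient descent behaves
ys = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0 for x in xs]

b0, b1 = fit_logistic(xs, ys)


def milestone_prob(k):
    """Fitted chance of reaching the milestone at a given K%+."""
    return sigmoid(b0 + b1 * (k - 100) / 20)


for k in (80, 100, 140):
    print(k, round(milestone_prob(k), 2))
```

The printed probabilities rise with K%+, which is the same shape as the light blue curve described above. A real model would add age, GS%, and the other inputs as extra terms inside the `sigmoid`.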

If at this point you’re thinking to yourself, “this sounds a bit like Chris Mitchell’s KATOH”, you wouldn’t be off base. Sparkman and KATOH are built similarly. Each uses binary output models to predict milestones. KATOH was built using a probit model, whereas Sparkman was built on logit models. They’re very similar, and I stuck with logistic regression solely because I was much more familiar with it. Also, Sparkman is (from what I can tell from reading Mitchell’s work/writing) a bit more “dynamic” than KATOH in that each milestone at each level doesn’t necessarily take the same inputs as the milestone before it. For example, BB% might be helpful in predicting the “Make MLB” milestone in A-ball, but it might not be helpful whatsoever in predicting the “7 WAR” milestone. In that case, BB% was accounted for only until it began losing predictive value.
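To put a number on "very similar", the sketch below compares the two link functions directly. It uses only the Python standard library. The scaling constant 1.702 is the conventional value for lining up a logistic curve with the normal CDF, and it is an addition here, not something taken from Sparkman or KATOH.

```python
import math


def logit_cdf(z):
    """Logistic CDF, the link behind a logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_cdf(z):
    """Standard normal CDF, the link behind a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


SCALE = 1.702  # conventional constant that rescales the logistic to match the normal

# Compare the two curves on a grid from -5 to 5.
grid = [i / 100 for i in range(-500, 501)]
max_gap = max(abs(probit_cdf(z) - logit_cdf(SCALE * z)) for z in grid)
print(f"largest gap between the two curves: {max_gap:.4f}")
```

After rescaling, the two curves never differ by as much as one percentage point, so the choice between probit and logit mostly comes down to familiarity.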

The rundown of which stats at which level I found to be predictive of which milestones would honestly be an entire article in and of itself, so I’m not going to go into too much detail, but for the most part, the most important factors were unsurprisingly age, rates of games started vs. relief appearances (GS%), and K%+ (adjusted to league average). BB%+ was helpful quite a bit, but not always, and HR/BBE+ (used as a proxy for “hard” contact) and (GB+IFFB)/BBE+ (used as a proxy for “weak” contact) were helpful in a few cases, but were unhelpful and thus unused more often than not.

K%+, BB%+, HR/BBE+, and (GB+IFFB)/BBE+ were (as you can tell by the “+”) centered around league averages of the league in question during that year. Park factors for HR/BBE+ were also input based on HR park factors recently produced by Sam Dykstra at MiLB.com. Wilmington has a 53 (!!) park factor for home runs over the last 3 years, so it’s no wonder Glenn Sparkman thrived there way back when.
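As a rough illustration of what a “+” stat and a park adjustment can look like, here is a small sketch. The index formula (100 = league average) is the usual convention for “+” stats, but the park adjustment below is an assumption: it blends the park factor with a neutral 100 on the guess that half of a pitcher's games are at home. The league rates in the example are made up, and only the Wilmington factor of 53 comes from the text.

```python
def plus_stat(player_rate, league_rate):
    """Index a rate to league average, where 100 means exactly league average."""
    return 100.0 * player_rate / league_rate


def park_adjusted_rate(rate, park_factor, home_share=0.5):
    """Remove the park effect from a rate (ASSUMED method, not Sparkman's).

    park_factor uses 100 as neutral. home_share is the assumed fraction of
    the pitcher's batted balls that came in his home park.
    """
    blended = home_share * park_factor + (1.0 - home_share) * 100.0
    return rate / (blended / 100.0)


# Hypothetical pitcher: 26% strikeout rate in a league that averages 22%.
print(round(plus_stat(0.26, 0.22), 1))

# Hypothetical HR/BBE of 0.6% in a league averaging 2.0%, pitching half his
# games in a park with a 53 HR factor (the Wilmington figure quoted above).
adjusted = park_adjusted_rate(0.006, park_factor=53)
print(round(plus_stat(adjusted, 0.02), 1))
```

The second number is still well below 100, but it is noticeably higher than the unadjusted index would be, which is the point of correcting for a park like Wilmington.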

All 4 of these were also regressed back towards league average depending on the sample size at hand. If a pitcher faced 50 batters and walked 1 person, his BB% that went into the model wasn’t 2% (or whatever BB%+ resulted from a 2% mark). Instead, uncertainty was measured based on 50 TBF and regressed upwards due to sample size that was small relative to the “reliability” point of BB% (~170 TBF). [3]

[3] My theory as to why HR/BBE wasn’t more impactful is the large “reliability” point of home run rate, which I estimated at around 800 batted balls based on a few things. When all the numbers were run, very few HR marks strayed far from the 90-110 range: 800 batted balls in a single year is impossible to reach, so there are no “stable” HR/BBE numbers, only regressed ones.
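A common way to do this kind of regression is to pad the observed sample with league-average results, with the amount of padding equal to the stat's reliability point. The sketch below uses that padding method as an assumption and is not necessarily the exact calculation behind Sparkman. The reliability points (~170 TBF for BB%, ~800 batted balls for HR) come from the text, while the league rates are made up for the example.

```python
def regress_to_league(successes, opportunities, league_rate, reliability_point):
    """Shrink an observed rate toward league average.

    Adds `reliability_point` phantom opportunities at the league rate, so a
    small sample stays close to average and a large one mostly keeps its own rate.
    """
    padded_successes = successes + league_rate * reliability_point
    padded_opportunities = opportunities + reliability_point
    return padded_successes / padded_opportunities


LEAGUE_BB_RATE = 0.085  # hypothetical league walk rate
LEAGUE_HR_RATE = 0.02   # hypothetical league HR per batted ball

# The example from the text: 1 walk in 50 batters faced (a raw 2% walk rate).
bb_regressed = regress_to_league(1, 50, LEAGUE_BB_RATE, reliability_point=170)
print(f"raw BB%: {1 / 50:.3f}, regressed BB%: {bb_regressed:.3f}")

# A hypothetical full season: 2 HR on 350 batted balls.
hr_regressed = regress_to_league(2, 350, LEAGUE_HR_RATE, reliability_point=800)
print(f"raw HR/BBE: {2 / 350:.4f}, regressed HR/BBE: {hr_regressed:.4f}")
```

In both cases the regressed rate sits much closer to league average than the raw one. With an 800 batted ball reliability point, even a full season of HR data gets pulled most of the way back, which fits the footnote's observation that HR marks rarely stray far from average.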

RESULTS

  1. Remember, all numbers you see below are projections before the age of 30.
  2. FV values are loosely based on projected WAR, both historically (FG) and by the distribution of xWAR totals that the model has output in the past. Don’t take them too seriously; it’s just a way to quickly group similar outputs in a way that people understand.
  3. The model doesn’t know who has already reached MLB, so there are (by design) players who have already debuted with a “Make MLB %” below 100%. Don’t worry too much about that.
  4. If on mobile, turn your device to landscape to see the results in table form.
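
| Rank | Name | Team | Age | Make MLB | 2 WAR | 4 WAR | 7 WAR | 10 WAR | 14 WAR | 18 WAR | Expected WAR | FV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Jesus Luzardo | Athletics | 21 | 0.94 | 0.66 | 0.64 | 0.46 | 0.34 | 0.2 | 0.11 | 7.8 | 60 |
| 2 | Deivi Garcia | Yankees | 20 | 0.84 | 0.53 | 0.5 | 0.41 | 0.28 | 0.18 | 0.12 | 6.9 | 55 |
| 3 | Bryse Wilson | Braves | 21 | 0.93 | 0.65 | 0.62 | 0.37 | 0.27 | 0.15 | 0.08 | 6.7 | 55 |
| 4 | MacKenzie Gore | Padres | 20 | 0.86 | 0.58 | 0.5 | 0.4 | 0.27 | 0.15 | 0.09 | 6.5 | 55 |
| 5 | Simeon Woods Richardson | Blue Jays | 18 | 0.83 | 0.59 | 0.56 | 0.5 | 0.25 | 0.13 | 0.05 | 6.4 | 55 |
| 7 | Ian Anderson | Braves | 21 | 0.86 | 0.55 | 0.45 | 0.39 | 0.23 | 0.13 | 0.08 | 6 | 55 |
| 8 | Lewis Thorpe | Twins | 23 | 0.93 | 0.56 | 0.52 | 0.34 | 0.2 | 0.11 | 0.06 | 5.9 | 55 |
| 9 | Adrian Morejon | Padres | 20 | 0.92 | 0.67 | 0.5 | 0.31 | 0.22 | 0.09 | 0.02 | 5.6 | 55 |
| 10 | Matt Manning | Tigers | 21 | 0.88 | 0.59 | 0.52 | 0.29 | 0.17 | 0.09 | 0.06 | 5.6 | 55 |
| 11 | Luis Patino | Padres | 19 | 0.83 | 0.52 | 0.45 | 0.35 | 0.23 | 0.13 | 0.06 | 5.6 | 55 |
| 12 | Joey Cantillo | Padres | 19 | 0.78 | 0.51 | 0.48 | 0.39 | 0.21 | 0.1 | 0.04 | 5.5 | 55 |
| 13 | Mitch Keller | Pirates | 23 | 0.94 | 0.57 | 0.53 | 0.33 | 0.17 | 0.09 | 0.05 | 5.3 | 50 |
| 14 | Sixto Sanchez | Marlins | 20 | 0.87 | 0.6 | 0.52 | 0.24 | 0.14 | 0.08 | 0.06 | 5.2 | 50 |
| 15 | Adbert Alzolay | Cubs | 24 | 0.92 | 0.47 | 0.44 | 0.38 | 0.17 | 0.09 | 0.04 | 5 | 50 |
| 16 | Grayson Rodriguez | Orioles | 19 | 0.76 | 0.48 | 0.45 | 0.38 | 0.18 | 0.09 | 0.04 | 5 | 50 |
| 17 | Jordan Balazovic | Twins | 20 | 0.8 | 0.45 | 0.37 | 0.25 | 0.16 | 0.09 | 0.05 | 4.6 | 50 |
| 18 | Reggie Lawson | Padres | 21 | 0.81 | 0.46 | 0.39 | 0.28 | 0.17 | 0.09 | 0.05 | 4.6 | 50 |
| 19 | Dustin May | Dodgers | 21 | 0.88 | 0.55 | 0.45 | 0.23 | 0.13 | 0.07 | 0.04 | 4.6 | 50 |
| 20 | Brendan McKay | Rays | 23 | 0.89 | 0.49 | 0.41 | 0.26 | 0.14 | 0.08 | 0.04 | 4.6 | 50 |
| 21 | Kyle Wright | Braves | 23 | 0.92 | 0.53 | 0.43 | 0.26 | 0.13 | 0.07 | 0.03 | 4.5 | 50 |
| 22 | Tarik Skubal | Tigers | 22 | 0.78 | 0.4 | 0.34 | 0.26 | 0.16 | 0.09 | 0.05 | 4.3 | 50 |
| 23 | Genesis Cabrera | Cardinals | 22 | 0.86 | 0.43 | 0.38 | 0.25 | 0.13 | 0.07 | 0.04 | 4.3 | 50 |
| 24 | Logan Allen | Indians | 22 | 0.88 | 0.43 | 0.39 | 0.24 | 0.11 | 0.06 | 0.03 | 4 | 50 |
| 25 | Bryan Mata | Red Sox | 20 | 0.81 | 0.48 | 0.29 | 0.2 | 0.13 | 0.07 | 0.04 | 4 | 50 |
| 26 | Ethan Small | Brewers | 22 | 0.55 | 0.26 | 0.24 | 0.23 | 0.18 | 0.16 | 0.07 | 3.9 | 50 |
| 27 | Patrick Sandoval | Angels | 22 | 0.89 | 0.44 | 0.36 | 0.23 | 0.1 | 0.05 | 0.02 | 3.8 | 50 |
| 28 | Taylor Hearn | Rangers | 24 | 0.9 | 0.42 | 0.36 | 0.24 | 0.08 | 0.04 | 0.02 | 3.7 | 50 |
| 29 | Spencer Howard | Phillies | 22 | 0.83 | 0.43 | 0.33 | 0.2 | 0.11 | 0.06 | 0.03 | 3.7 | 50 |
| 30 | Corbin Martin | Diamondbacks | 23 | 0.92 | 0.45 | 0.38 | 0.21 | 0.09 | 0.05 | 0.02 | 3.7 | 50 |
| 31 | Edward Cabrera | Marlins | 21 | 0.81 | 0.44 | 0.32 | 0.21 | 0.12 | 0.07 | 0.03 | 3.7 | 50 |
| 32 | Forrest Whitley | Astros | 21 | 0.78 | 0.35 | 0.28 | 0.24 | 0.14 | 0.08 | 0.04 | 3.7 | 50 |
| 33 | Nate Pearson | Blue Jays | 22 | 0.84 | 0.43 | 0.33 | 0.19 | 0.1 | 0.05 | 0.02 | 3.6 | 50 |
| 34 | Jose Urquidy | Astros | 24 | 0.84 | 0.36 | 0.33 | 0.21 | 0.11 | 0.06 | 0.03 | 3.6 | 50 |
| 35 | Huascar Ynoa | Braves | 21 | 0.76 | 0.34 | 0.31 | 0.21 | 0.12 | 0.06 | 0.03 | 3.5 | 50 |
| 36 | Kyle Muller | Braves | 21 | 0.73 | 0.33 | 0.23 | 0.21 | 0.12 | 0.07 | 0.04 | 3.4 | 50 |
| 37 | Zack Thompson | Cardinals | 21 | 0.55 | 0.26 | 0.21 | 0.2 | 0.14 | 0.12 | 0.05 | 3.4 | 50 |
| 38 | John Doxakis | Rays | 20 | 0.58 | 0.32 | 0.21 | 0.19 | 0.13 | 0.11 | 0.04 | 3.3 | 50 |
| 39 | DL Hall | Orioles | 20 | 0.71 | 0.34 | 0.32 | 0.2 | 0.1 | 0.06 | 0.01 | 3.3 | 50 |
| 40 | Alek Manoah | Blue Jays | 21 | 0.61 | 0.34 | 0.26 | 0.15 | 0.1 | 0.07 | 0.02 | 3.2 | 50 |
| 41 | Brusdar Graterol | Twins | 20 | 0.81 | 0.47 | 0.24 | 0.14 | 0.08 | 0.05 | 0.03 | 3.2 | 50 |
| 42 | Jhoan Duran | Twins | 21 | 0.78 | 0.41 | 0.28 | 0.16 | 0.09 | 0.05 | 0.02 | 3.2 | 50 |
| 43 | Brock Burke | Rangers | 22 | 0.84 | 0.43 | 0.31 | 0.16 | 0.08 | 0.04 | 0.02 | 3.2 | 50 |
| 44 | Joey Wentz | Tigers | 21 | 0.65 | 0.31 | 0.24 | 0.19 | 0.11 | 0.06 | 0.04 | 3.1 | 50 |
| 45 | Devin Smeltzer | Twins | 23 | 0.86 | 0.4 | 0.32 | 0.14 | 0.06 | 0.03 | 0.01 | 3.1 | 50 |
| 46 | Drew Rom | Orioles | 19 | 0.64 | 0.32 | 0.27 | 0.22 | 0.08 | 0.04 | 0.03 | 3.1 | 50 |
| 47 | Kyle Young | Phillies | 21 | 0.78 | 0.37 | 0.23 | 0.15 | 0.09 | 0.05 | 0.04 | 3.1 | 50 |
| 49 | Trevor Rogers | Marlins | 21 | 0.71 | 0.3 | 0.23 | 0.15 | 0.09 | 0.05 | 0.04 | 3 | 50 |
| 50 | Tony Gonsolin | Dodgers | 25 | 0.94 | 0.42 | 0.27 | 0.16 | 0.04 | 0.02 | 0.01 | 3 | 50 |
| 51 | Nick Lodolo | Reds | 21 | 0.6 | 0.34 | 0.27 | 0.15 | 0.1 | 0.07 | 0.02 | 3 | 50 |
| 52 | Beau Burrows | Tigers | 22 | 0.85 | 0.34 | 0.31 | 0.17 | 0.07 | 0.04 | 0.02 | 3 | 50 |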
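
One way to picture how a row of milestone percentages could roll up into a single expected WAR is to treat them as points on a "chance of at least this much WAR" curve and take the area under it. The sketch below does that. It is a rough illustration, not Sparkman's actual formula: it assumes "Make MLB" can stand in for a 0 WAR threshold, that the curve is linear between milestones, and that pitchers who clear 18 WAR average a hypothetical 25 (`tail_mean`).

```python
THRESHOLDS = [0, 2, 4, 7, 10, 14, 18]  # "Make MLB" treated as 0 WAR (assumption)


def expected_war(milestone_probs, tail_mean=25.0):
    """Approximate expected WAR as the area under the P(WAR >= t) curve.

    milestone_probs: chances of reaching each threshold, in THRESHOLDS order.
    tail_mean: ASSUMED average WAR for pitchers who clear the last threshold.
    """
    if len(milestone_probs) != len(THRESHOLDS):
        raise ValueError("need one probability per threshold")
    total = 0.0
    # Trapezoid rule between consecutive milestones.
    for i in range(len(THRESHOLDS) - 1):
        width = THRESHOLDS[i + 1] - THRESHOLDS[i]
        total += width * (milestone_probs[i] + milestone_probs[i + 1]) / 2.0
    # Area beyond the last milestone.
    total += milestone_probs[-1] * (tail_mean - THRESHOLDS[-1])
    return total


# Jesus Luzardo's milestone probabilities from the table above.
luzardo = [0.94, 0.66, 0.64, 0.46, 0.34, 0.2, 0.11]
print(round(expected_war(luzardo), 2))
```

For Luzardo this lands a little above 8, against the 7.8 in the table. That puts it in the right neighborhood, but it is clearly not the exact roll-up the model uses.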