Working with Minor League Similarity Scores
Using the Minor League Baseball database to find each player's most similar comparables.
"There are some people who see baseball players that way - each one is unique, absolutely not interchangeable with another. I don't deny the validity of that approach - but if you take that tack, then you can't turn around and argue that your player should be in the Hall of Fame because his numbers are just as good as this other player's. 'Similarity' is a complex concept, and two players who are not statistically similar may be profoundly similar in some other way... players who have similar primary characteristics will tend to have similar secondary characteristics as well." Bill James, in Whatever Happened to the Hall of Fame?
Similarity scores were created by Bill James to compare the careers of Hall of Fame eligible players. In the most basic sense, similarity scores use aggregated performance statistics to weigh one player's case for induction into the Hall of Fame against another's. Projection systems follow from this method: Steamer, PECOTA, Marcel, ZiPS, and others use some combination of a player's recent performance, usually the last 3-4 seasons, to project future performance. Depending on the method, a player's base statistics are then modified using typical aging curves, linear weights, regression, and numerous other factors. Notably, PECOTA uses three-year performance statistics of comparable players, found through nearest-neighbors analysis, to forecast a player's future performance.
"The PECOTA similarity scores are based primarily on looking at a three-year window of a pitcher's performance. Thus, we might look at what a pitcher did from ages 35-37, and compare that against the most similar age 35-37 performances, after adjusting for parks, league effects, and a whole host of other things. This is different from the similarity scores you might see at baseball-reference.com or in other places, which attempt to evaluate the totality of a player's career up to a given age." Nate Silver
One of Silver's first explanations of PECOTA's forecasting method details the value of projecting a minor league player's future career from the career performance of his comparables. Teams would be remiss not to consider what a player's future statistics might look like based on his previous performance. PECOTA models minor league players better than its competitors by using this comparable-players approach. Let's use our minor league database to investigate minor league similarity scores and create projections for a notable minor league player.
Bill James Similarity Scores
Similarities - Career
James' Similarity Score model was designed for major league careers, but let's see how the model holds for minor league careers. The dataset includes minor league statistics from 2000-2014. The SS column is the similarity score to Kris Bryant, who scores 1000 against himself:
Name | Age | LevEq | G | PA | AB | R | H | X2B | X3B | HR | RBI | SB | CS | BB | SO | BA | OBP | SLG | OPS | TB | SS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kris Bryant | 22 | 1.58 | 174 | 740 | 620 | 140 | 203 | 48 | 3 | 52 | 142 | 16 | 4 | 97 | 197 | 0.327 | 0.428 | 0.666 | 1.094 | 413 | 1000 |
Ryan Braun | 23 | 1.67 | 199 | 864 | 767 | 131 | 240 | 61 | 6 | 42 | 144 | 34 | 12 | 70 | 151 | 0.313 | 0.375 | 0.572 | 0.947 | 439 | 926 |
Alex Gordon* | 26 | 1.89 | 235 | 1061 | 867 | 199 | 278 | 69 | 5 | 48 | 170 | 30 | 5 | 155 | 212 | 0.321 | 0.438 | 0.578 | 1.016 | 501 | 906 |
Kelvin Diaz | 21 | 0 | 182 | 765 | 639 | 117 | 213 | 47 | 7 | 22 | 147 | 23 | 12 | 74 | 92 | 0.333 | 0.426 | 0.532 | 0.958 | 340 | 903 |
Jake Lamb* | 23 | 1 | 244 | 1079 | 920 | 158 | 295 | 83 | 10 | 37 | 193 | 10 | 2 | 127 | 229 | 0.321 | 0.406 | 0.553 | 0.959 | 509 | 902 |
D.J. Peterson | 22 | 1.25 | 178 | 777 | 703 | 119 | 210 | 42 | 2 | 44 | 158 | 8 | 2 | 65 | 158 | 0.299 | 0.362 | 0.552 | 0.914 | 388 | 896 |
Matt Williams | 36 | 2.17 | 21 | 71 | 65 | 11 | 22 | 5 | 0 | 5 | 15 | 1 | 0 | 6 | 5 | 0.338 | 0.394 | 0.646 | 1.04 | 42 | 894 |
Evan Longoria | 26 | 2.37 | 219 | 937 | 803 | 145 | 238 | 43 | 1 | 47 | 160 | 8 | 2 | 104 | 170 | 0.296 | 0.385 | 0.528 | 0.913 | 424 | 890 |
Jose Fernandez | 26 | 3 | 255 | 1052 | 920 | 169 | 287 | 74 | 5 | 41 | 182 | 19 | 10 | 104 | 184 | 0.312 | 0.389 | 0.537 | 0.926 | 494 | 889 |
Albert Pujols | 20 | 1.67 | 133 | 544 | 490 | 74 | 154 | 41 | 7 | 19 | 96 | 4 | 5 | 46 | 47 | 0.314 | 0.378 | 0.543 | 0.921 | 266 | 887 |
Pedro Feliz | 36 | 2.5 | 156 | 646 | 606 | 96 | 174 | 39 | 2 | 38 | 119 | 1 | 2 | 31 | 110 | 0.287 | 0.321 | 0.546 | 0.867 | 331 | 883 |
You might have heard of a few of those names. The issue with these similarities is that they encompass a player's career minor league statistics; we're more interested in the performance of Bryant's same-aged peers.
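For readers who want to see how a score like the SS column could be computed, here is a minimal Python sketch. It assumes the commonly cited Bill James batter weights (one point per 20 games, per 75 at-bats, per 2 home runs, per .001 of batting average, and so on) and skips James' positional adjustment. This page does not state the exact weights or rounding behind its SS values, so the function and constant names are illustrative and the results will not necessarily match the table.

```python
# Units of difference that cost one similarity point. These follow the
# commonly cited Bill James batter formula and are an ASSUMPTION here:
# the page does not list the weights behind its SS column.
COUNTING_UNITS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "X2B": 5, "X3B": 4,
    "HR": 2, "RBI": 10, "BB": 25, "SO": 150, "SB": 20,
}
RATE_UNITS = {"BA": 0.001, "SLG": 0.002}


def similarity_score(a, b):
    """Start at 1000 and subtract points for every statistical difference."""
    score = 1000.0
    for stat, unit in {**COUNTING_UNITS, **RATE_UNITS}.items():
        score -= abs(a[stat] - b[stat]) / unit
    return score


def most_similar(target, candidates, n=10):
    """Return the n (score, name) pairs closest to the target, best first."""
    scored = [(similarity_score(target, c), c["Name"]) for c in candidates]
    return sorted(scored, reverse=True)[:n]


# Career minor league lines copied from the table above.
bryant = {"Name": "Kris Bryant", "G": 174, "AB": 620, "R": 140, "H": 203,
          "X2B": 48, "X3B": 3, "HR": 52, "RBI": 142, "SB": 16, "BB": 97,
          "SO": 197, "BA": 0.327, "SLG": 0.666}
braun = {"Name": "Ryan Braun", "G": 199, "AB": 767, "R": 131, "H": 240,
         "X2B": 61, "X3B": 6, "HR": 42, "RBI": 144, "SB": 34, "BB": 70,
         "SO": 151, "BA": 0.313, "SLG": 0.572}

bryant_vs_braun = similarity_score(bryant, braun)
ranking = most_similar(bryant, [braun, bryant], n=2)
```

With these assumed weights, Bryant against Braun comes out at roughly 922, close to but not equal to the 926 in the table, which suggests the page's calculation differs somewhere in its rounding or weights. To compare same-aged peers instead of careers, the same function can be applied to lines restricted to a single age.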
Similarities - Age
Let's see how he compares to other 22-year-old players:
Name | Age | LevEq | G | PA | AB | R | H | X2B | X3B | HR | RBI | SB | CS | BB | SO | BA | OBP | SLG | OPS | TB | SS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kris Bryant | 22 | 2.5 | 174 | 740 | 620 | 140 | 203 | 48 | 3 | 52 | 142 | 16 | 4 | 97 | 197 | 0.327 | 0.428 | 0.666 | 1.095 | 413 | 1000 |
Alex Gordon* | 22 | 2 | 130 | 576 | 486 | 111 | 158 | 39 | 1 | 29 | 101 | 22 | 3 | 72 | 113 | 0.325 | 0.427 | 0.588 | 1.016 | 286 | 932 |
Corey Dickerson* | 22 | 1 | 175 | 743 | 659 | 132 | 204 | 49 | 14 | 45 | 148 | 21 | 12 | 67 | 150 | 0.31 | 0.38 | 0.631 | 1.011 | 416 | 924 |
Nick Akins | 22 | 0 | 126 | 548 | 472 | 94 | 152 | 40 | 7 | 32 | 120 | 5 | 5 | 58 | 135 | 0.322 | 0.407 | 0.64 | 1.047 | 302 | 921 |
Kevin Mench | 22 | 1 | 132 | 583 | 491 | 118 | 164 | 39 | 9 | 27 | 121 | 19 | 7 | 78 | 72 | 0.334 | 0.427 | 0.615 | 1.042 | 302 | 909 |
Ryan Braun | 22 | 1.5 | 165 | 730 | 650 | 103 | 200 | 49 | 6 | 32 | 122 | 30 | 9 | 55 | 140 | 0.308 | 0.367 | 0.549 | 0.917 | 357 | 907 |
Mark Teixeira# | 22 | 1.5 | 86 | 375 | 321 | 63 | 102 | 21 | 5 | 19 | 69 | 5 | 2 | 46 | 60 | 0.318 | 0.413 | 0.592 | 1.005 | 190 | 904 |
Jake Lamb* | 22 | 0.5 | 136 | 619 | 528 | 95 | 167 | 44 | 5 | 22 | 109 | 8 | 2 | 74 | 126 | 0.316 | 0.405 | 0.544 | 0.949 | 287 | 902 |
James Darnell | 22 | 1 | 142 | 630 | 524 | 89 | 167 | 41 | 5 | 22 | 96 | 9 | 7 | 98 | 101 | 0.319 | 0.428 | 0.542 | 0.97 | 284 | 901 |
Jedd Gyorko | 22 | 1.5 | 208 | 945 | 844 | 154 | 273 | 64 | 2 | 32 | 155 | 14 | 4 | 92 | 171 | 0.323 | 0.392 | 0.518 | 0.909 | 437 | 900 |
Hunter Pence | 22 | 1.0 | 172 | 737 | 652 | 119 | 207 | 40 | 5 | 39 | 127 | 12 | 10 | 79 | 120 | 0.317 | 0.391 | 0.574 | 0.964 | 374 | 898 |
Still a very impressive list. Using these similar players, let's take a play out of the PECOTA playbook and generate some basic projections without adjusting for outside effects (park factors, leagues, league-wide performance shifts, etc.). By simply calculating the mean of these top 10 comparables in each statistical category, we can get a general idea of Bryant's future performance.
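The averaging step can be sketched in a few lines of Python. This is an illustration under assumptions, not the code behind this page: `project_from_comparables` is a hypothetical helper, the input rows are toy placeholder values and not real player data, and recomputing BA from the averaged H and AB is one possible choice (the page does not say whether rate stats are averaged directly). The `Count` field records how many comparables have a season at each age, which is the presumed meaning of the Count column in the projection table.

```python
from collections import defaultdict
from statistics import mean


def project_from_comparables(seasons, stats):
    """Average the comparables' seasons at each age.

    seasons: iterable of dicts, one per comparable player-season, each
             holding an "Age" key plus every stat named in `stats`.
    Returns {age: {"Count": n, stat: rounded mean, ...}}, ordered by age.
    """
    by_age = defaultdict(list)
    for season in seasons:
        by_age[season["Age"]].append(season)

    projection = {}
    for age in sorted(by_age):
        rows = by_age[age]
        means = {stat: mean(row[stat] for row in rows) for stat in stats}
        line = {"Count": len(rows)}
        line.update({stat: round(value) for stat, value in means.items()})
        # Recompute batting average from the unrounded averaged totals.
        if "H" in means and "AB" in means and means["AB"] > 0:
            line["BA"] = round(means["H"] / means["AB"], 3)
        projection[age] = line
    return projection


# Toy placeholder rows (NOT real data): two comparables at age 23,
# only one of whom also has an age-24 season.
toy_seasons = [
    {"Name": "Comp A", "Age": 23, "AB": 500, "H": 150, "HR": 20},
    {"Name": "Comp B", "Age": 23, "AB": 400, "H": 100, "HR": 30},
    {"Name": "Comp A", "Age": 24, "AB": 600, "H": 180, "HR": 25},
]

projection = project_from_comparables(toy_seasons, stats=("AB", "H", "HR"))
```

Because comparables drop out as they age or leave the dataset, later rows rest on fewer players, so a row built from two comparables deserves far less confidence than one built from ten.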
10 Year Projection - Kris Bryant
Year | Age | LevEq | G | PA | AB | R | H | X2B | X3B | HR | RBI | SB | CS | BB | SO | BA | OBP | SLG | OPS | Count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2015 | 23 | 3 | 130 | 548 | 489 | 76 | 138 | 32 | 5 | 24 | 83 | 9 | 4 | 47 | 105 | 0.282 | 0.35 | 0.515 | 0.865 | 10 |
2016 | 24 | 4 | 132 | 551 | 492 | 77 | 140 | 32 | 5 | 23 | 77 | 8 | 4 | 49 | 107 | 0.285 | 0.354 | 0.51 | 0.864 | 9 |
2017 | 25 | 4 | 115 | 476 | 424 | 66 | 121 | 26 | 3 | 20 | 72 | 7 | 4 | 43 | 89 | 0.285 | 0.356 | 0.502 | 0.859 | 8 |
2018 | 26 | 4 | 129 | 543 | 478 | 76 | 135 | 31 | 3 | 22 | 73 | 7 | 4 | 56 | 93 | 0.282 | 0.363 | 0.498 | 0.861 | 6 |
2019 | 27 | 4 | 148 | 635 | 569 | 92 | 169 | 36 | 4 | 28 | 94 | 15 | 6 | 58 | 104 | 0.297 | 0.365 | 0.522 | 0.887 | 5 |
2020 | 28 | 4 | 151 | 647 | 574 | 87 | 174 | 38 | 3 | 25 | 94 | 11 | 3 | 64 | 109 | 0.303 | 0.376 | 0.51 | 0.886 | 5 |
2021 | 29 | 4 | 127 | 532 | 475 | 70 | 130 | 26 | 4 | 20 | 77 | 5 | 3 | 47 | 96 | 0.274 | 0.343 | 0.472 | 0.815 | 5 |
2022 | 30 | 4 | 143 | 591 | 525 | 81 | 141 | 32 | 3 | 21 | 81 | 10 | 3 | 56 | 104 | 0.269 | 0.345 | 0.461 | 0.806 | 5 |
2023 | 31 | 4 | 159 | 696 | 620 | 98 | 163 | 28 | 6 | 30 | 93 | 9 | 4 | 64 | 120 | 0.263 | 0.336 | 0.473 | 0.808 | 2 |
2024 | 32 | 4 | 118 | 440 | 380 | 48 | 93 | 22 | 1 | 14 | 63 | 3 | 1 | 46 | 60 | 0.245 | 0.327 | 0.418 | 0.745 | 2 |