We all know what a home run looks like. It’s one of baseball’s most common, though perhaps not best, types of highlights.
It has always seemed to me that batters are more likely to swing at the first pitch after a home run.
This plot shows the likelihood of the first pitch after a home run being a called strike plotted against the likelihood that the first pitch of any plate appearance is a called strike.
Bullpens are notoriously difficult to predict. Baseball is volatile by nature, and relievers even more so. Despite this, many teams invested heavily in relievers over the offseason. Here’s what bullpens are doing so far, one quarter through the season.
First, we’ll compare how each team’s bullpen performed last year with how one would expect them to perform based on batted ball data. If a team lies above the line, then their actual results were not as bad as one would expect, and they were lucky.
Probably not.
The Mets are not healthy. Their five best starters would combine to make one of the better starting rotations in recent history. Unfortunately, it seems increasingly unlikely that all five will pitch at the same time again. Steven Matz finished 2016 with surgery to remove a bone spur in his elbow. He hasn’t pitched yet this season. Matt Harvey had season-ending surgery to alleviate thoracic outlet syndrome after a disappointing start to the season.
Spin rate is one of the most-discussed of the newer Statcast metrics. It is especially relevant when looking at four-seam fastballs, as that pitch tends to be the most straightforward and therefore the easiest to predict movement-wise. We know spin has a positive effect on fastballs, but what is the extent of that effect? Are there any potential cons to throwing a ball with such a high spin rate?
Here are the four-seam spin kings from 2015-2017:
Every pitcher performs differently in different scenarios. Some starters can’t finish innings. More specifically, some starters are pretty good at getting two outs, but much worse at getting the third. In looking at potential ways to identify starters who may work better as relievers, I came across Baseball Reference’s number of outs splits. To find good candidates for this analysis, I looked at Fangraphs’ splits leaderboard for worst ERA with two outs.
Kris Bryant was very good last year. The Reds were…not great last year. In fact, the Reds’ bullpen was historically bad. Early in the season they set a record for consecutive games allowing at least one run, and later in the season they set a record for home runs allowed.
Kris Bryant, of course, was the National League MVP.
This is what the MVP did against normal pitching in 699 plate appearances:
In an attempt to sharpen my data visualization skills, I decided to look at some of the top starters from the 2016 season by inning. The pitchers were selected by a combination of innings pitched, complete games, and personal preference. Each pitcher pitched at least 150 innings last season. One thing to keep in mind with all of these is that as the game goes on, the sample size decreases. Starters rarely made it through the ninth inning last season, so late-game results should be taken with a grain of salt.
This post was featured on the Fangraphs community blog here
This week I played around with the baseballr package, which provides easy access to FanGraphs, Baseball Reference, and Statcast data in R.
I’ve been particularly intrigued recently by Carl Edwards Jr., a Cubs reliever who got called up last season. He had always seemed to be surprisingly good, but I wasn’t aware quite how good he was until I calculated wOBA allowed by pitchers in 2016 and found that he had the third lowest in the league, behind only Kenley Jansen and Zach Britton, and in front of Clayton Kershaw, Aroldis Chapman, and Andrew Miller.
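For reference, wOBA allowed can be computed from the events a pitcher gives up, roughly as in the sketch below. This is a minimal Python illustration, not the R code used for the post. The function name and the stat line in the usage example are hypothetical, and the linear weights are approximate 2016 values that should be checked against the constants FanGraphs publishes.

```python
# Approximate 2016 linear weights. Assumption: verify these against the
# constants published by FanGraphs before relying on the output.
WEIGHTS_2016 = {
    "ubb": 0.691,
    "hbp": 0.721,
    "single": 0.878,
    "double": 1.242,
    "triple": 1.569,
    "hr": 2.015,
}


def woba_allowed(ab, bb, ibb, hbp, sf, singles, doubles, triples, hr,
                 weights=WEIGHTS_2016):
    """Return wOBA allowed, or None when there are no qualifying plate appearances."""
    ubb = bb - ibb  # intentional walks are excluded from wOBA
    denominator = ab + ubb + sf + hbp
    if denominator == 0:
        return None
    numerator = (
        weights["ubb"] * ubb
        + weights["hbp"] * hbp
        + weights["single"] * singles
        + weights["double"] * doubles
        + weights["triple"] * triples
        + weights["hr"] * hr
    )
    return numerator / denominator


# Hypothetical stat line, for illustration only (not a real pitcher's totals).
example = woba_allowed(ab=120, bb=11, ibb=1, hbp=1, sf=1,
                       singles=15, doubles=4, triples=0, hr=2)
print(round(example, 3))
```

Passing the weights as a parameter keeps the season-specific constants separate from the formula, so the same function works for other years.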
This week I focused on improving the usability of the Shiny app. I used Shiny’s renderUI function to dynamically generate the options for the game select dropdown. I also set up a baseball API here, and added another API endpoint to get the list of games that happened on any given day.
One of the nicer things I added to the Shiny app was the use of the tryCatch function to hide confusing error messages from the user, like this:
This week I made a Shiny interface to the win probability and description services. It can be viewed here. To do this, I had to set up hosting for both Shiny and my Elixir backend service. I also switched from embedding/posting the plots on Plotly to embedding them in the new Shiny application, which should help reduce costs. Learning more about Shiny made implementing the interface relatively simple. The generation of the graphs is now abstracted into a function that grabs JSON data from the API server I set up, which makes it much easier to generate graphs for any desired game.
Some resources used for the posts on this site:
Fangraphs: Analytical articles and statistical database
Baseball-Reference: Very thorough statistical database
Baseball Savant: Statcast/advanced metrics search and leaderboards
Brooks Baseball: PitchFX analysis and granular data

From a technical side:

Plot.ly
Hugo
RStudio
BaseballR package
This week I mostly focused on experimenting with the Retrosheet and MLB Gameday API formats. Using the baseball umbrella application combined with R and Plotly, I generated a win expectancy graph for a random game.
In any given situation, the win expectancy is equivalent to the percentage of games that teams in that exact situation went on to win. To calculate the win probability of a situation, the application converts the situation into a hash using the number of outs, inning, number and position of base runners, and the current score, then looks it up in a table of historical win probabilities.
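The lookup described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the application's actual code: the `Situation` fields, the use of a home-minus-away score differential, the half-inning flag, and the function names are all hypothetical, and the sample history is made up.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Situation(NamedTuple):
    """Hashable game state used as the lookup key (field choice is an assumption)."""
    inning: int
    is_bottom: bool                    # assumption: half-inning is part of the key
    outs: int
    runners: Tuple[bool, bool, bool]   # first, second, third base occupied
    score_diff: int                    # assumption: home score minus away score


def build_table(history):
    """Build {situation: home win rate} from (situation, home_won) pairs."""
    counts = defaultdict(lambda: [0, 0])  # situation -> [home wins, occurrences]
    for situation, home_won in history:
        counts[situation][0] += int(home_won)
        counts[situation][1] += 1
    return {s: wins / total for s, (wins, total) in counts.items()}


def win_expectancy(table, situation):
    """Return the historical home win rate, or None if the state was never seen."""
    return table.get(situation)


# Made-up history: the same state occurred three times and the home team won twice.
state = Situation(inning=9, is_bottom=True, outs=2,
                  runners=(True, False, False), score_diff=-1)
table = build_table([(state, True), (state, False), (state, True)])
print(win_expectancy(table, state))
```

Because a named tuple of plain values is hashable, the situation itself serves as the dictionary key, which is the same idea as hashing the game state before looking it up. Returning `None` for an unseen state leaves it to the caller to decide how to handle rare situations.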