Friday 7 July 2017

Batted ball analysis - ballpark factors

I am currently reading George Will's Men at Work, and at some point it mentions how hitters' successes are influenced by, or dependent on, the ballparks they play the majority of their games in. It made me wonder how many of the home runs (HRs) batters hit in a specific ballpark would be HRs in the other parks as well. I created a dashboard in Tableau to visualize this concept, and this post is about how I did it.

DATA
Scope: To keep the size of the data sources somewhat manageable, I used 2017 season data from March until June.

Batted balls: I used Baseball Savant for batted ball data (see my previous post on how to get Baseball Savant data into Tableau). I filtered it to records with Type = "X" (batted balls only) and a non-null Distance, as a distance value is required for the analysis.

Ballpark dimensions: I used the general park dimensions from Andrew Clem's website. Although these are limited to LF, LC, CF, RC and RF, they give a generic sense of the ballpark, and that would have to be good enough for my analysis. At some point I want to refine these dimensions by calculating the spray / horizontal angle at which every dimension change occurs, but to be really accurate I would then also have to consider the wall height, so for now I accepted the general dimensions as a starting point. Here are all ballpark outlines based on these assumptions, with Fenway Park highlighted:

I divided the 90 degrees of fair territory into 4 areas of 22.5 degrees each (L/RF, LC, CF, RC), and split the L/RF area into LF and RF (11.25 degrees each, one at each foul line). I then assigned the provided values to these field zones, resulting in the outline per ballpark, as shown on the right.
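The zone split can be sketched in a few lines of Python. This is only an illustration, not the Tableau calculation behind the dashboard: the names ZONES and zone_for_angle are made up for this example, and it assumes that negative angles point to the left field side and that the foul lines sit at -45 and +45 degrees.

```python
# Zone boundaries in degrees of spray angle; 0 = straight center field.
# Assumption: negative angles are the left field side, foul lines at +/-45.
ZONES = [
    ("LF", -45.0, -33.75),   # 11.25 degrees along the left field line
    ("LC", -33.75, -11.25),  # 22.5 degrees
    ("CF", -11.25, 11.25),   # 22.5 degrees
    ("RC", 11.25, 33.75),    # 22.5 degrees
    ("RF", 33.75, 45.0),     # 11.25 degrees along the right field line
]


def zone_for_angle(spray_angle_deg):
    """Return the field zone for a spray angle, or None for foul territory."""
    if not -45.0 <= spray_angle_deg <= 45.0:
        return None
    for name, low, high in ZONES:
        if low <= spray_angle_deg < high:
            return name
    return "RF"  # exactly +45 degrees, on the right field line


print(zone_for_angle(0.0))
print(zone_for_angle(40.0))
```

A ball that lands exactly on a boundary is assigned to the zone on its right-hand side here; that is an arbitrary choice for the sketch.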


Calculations: Based on the X and Y of where the batted ball landed (fields Hc X and Hc Y) and the distance provided, I calculated the horizontal or spray angle, with straight center field being 0 degrees. Whether or not a batted ball would have been a HR in a ballpark is calculated by comparing the batted ball distance and spray angle to the general ballpark outlines.
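As a rough Python sketch of this calculation (not the actual Tableau calculated fields): the home plate position of (125.42, 198.27) in Hc X / Hc Y units is a commonly used community approximation and an assumption here, as is treating a ball that travels at least the fence distance as a HR. The function names and the park dimensions in the example are made up for illustration.

```python
import math

# Assumption: approximate home plate position in Hc X / Hc Y units.
HOME_X, HOME_Y = 125.42, 198.27

# (zone, lower bound in degrees); the upper bound is the next lower bound.
ZONE_EDGES = [("LF", -45.0), ("LC", -33.75), ("CF", -11.25),
              ("RC", 11.25), ("RF", 33.75)]


def spray_angle(hc_x, hc_y):
    """Horizontal angle in degrees: 0 = center field, negative = left field."""
    # Hc Y decreases toward the outfield, hence HOME_Y - hc_y.
    return math.degrees(math.atan2(hc_x - HOME_X, HOME_Y - hc_y))


def fence_distance(park, angle_deg):
    """Fence distance in feet at this angle, or None in foul territory."""
    if not -45.0 <= angle_deg <= 45.0:
        return None
    zone = "LF"
    for name, lower in ZONE_EDGES:
        if angle_deg >= lower:
            zone = name
    return park[zone]


def would_be_hr(distance_ft, angle_deg, park):
    """True if the ball travels at least as far as the fence at that angle."""
    fence = fence_distance(park, angle_deg)
    return fence is not None and distance_ft >= fence


# Hypothetical park with made-up dimensions, not a real ballpark.
example_park = {"LF": 330, "LC": 375, "CF": 400, "RC": 375, "RF": 330}
print(would_be_hr(376, -14.5, example_park))
print(would_be_hr(376, 0.0, example_park))
```

The same 376 ft. ball clears the made-up fence in left center but not in center field, which is the whole idea behind comparing distance and spray angle to a park outline.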


CAVEATS
This dashboard does not include weather, elevation / air pressure, opposition, ball seam height and such, but strictly looks at general ballpark dimensions. It also disregards fence height (and the possibility that an outfielder could have caught the ball). It is definitely my plan to continue working on this, but including these factors significantly complicates the calculations (yet also makes the result more relevant).

DASHBOARD
After bringing the data into Tableau, I used calculated fields for actual events (results of the batted ball in the park it was hit in) and potential events (results had the ball been hit in another park, with the caveats mentioned above).
Tab 1 (batted balls dashboard) shows the actual HR top 30 and, for a selected player, the individual batted ball results in every ballpark, ball locations, HR ratios and HR ratios per ballpark.
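One way to think about the "would have been a HR in x% of all ballparks" figure is as the share of parks whose fence the ball clears in its zone. Below is a minimal Python sketch of that idea, not the dashboard's actual calculated field: the function names are made up, the four parks and their dimensions are hypothetical, and the sign convention (negative = left field) is an assumption.

```python
# (zone, lower bound in degrees); the upper bound is the next lower bound.
ZONE_EDGES = [("LF", -45.0), ("LC", -33.75), ("CF", -11.25),
              ("RC", 11.25), ("RF", 33.75)]

# Hypothetical parks with made-up dimensions in feet, for illustration only.
PARKS = {
    "Park A": {"LF": 330, "LC": 370, "CF": 400, "RC": 370, "RF": 330},
    "Park B": {"LF": 340, "LC": 380, "CF": 410, "RC": 380, "RF": 340},
    "Park C": {"LF": 320, "LC": 365, "CF": 395, "RC": 365, "RF": 320},
    "Park D": {"LF": 335, "LC": 390, "CF": 405, "RC": 385, "RF": 325},
}


def zone_for_angle(angle_deg):
    """Field zone for a spray angle, or None for foul territory."""
    if not -45.0 <= angle_deg <= 45.0:
        return None
    zone = "LF"
    for name, lower in ZONE_EDGES:
        if angle_deg >= lower:
            zone = name
    return zone


def hr_share(distance_ft, angle_deg, parks):
    """Fraction of parks in which the ball reaches the fence in its zone."""
    zone = zone_for_angle(angle_deg)
    if zone is None or not parks:
        return 0.0
    cleared = sum(1 for dims in parks.values() if distance_ft >= dims[zone])
    return cleared / len(parks)


print(hr_share(376, -14.5, PARKS))
```

With these made-up parks, a 376 ft. ball at -14.5 degrees clears the left center fence in two of the four, so the share is 0.5.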

For example, Paul Goldschmidt hit a sharp fly ball to LF (376 ft., -14.5 degree spray angle) that was caught for an out. This ball would have been a HR in 63% of all ballparks. The HR he hit on April 5th would have been a HR in 90% of all ballparks:

Tab 2 (ballpark dimensions) explains more about the dimensions used, and shows the batted balls of a selected player in a selected ballpark that traveled at least a selected distance. Yes, this dashboard has a number of filters that can be changed by the user.
Sticking with Goldschmidt, it shows that his actual HRs would all have been HRs in Fenway, and that 17 of his 149 "Outs & errors" batted balls would have been HRs as well, as would 4 of his doubles:

The last tab, potential HR hitters 2017, ranks every player by his batted ball results had he played in a specific stadium the whole time; had Joey Votto played all his games in Fenway, he would have had 44 HRs. The first non-Fenway park on the list? Minute Maid Park, again with Votto.

CONCLUSION
The work I did on this dashboard mostly gave me a better understanding of the influence of ballpark dimensions on batted ball outcomes. The caveats reduce the reliability of these outcomes to the point that they are highly debatable, which is why further research is required. Using more detailed park dimensions, including fence height, and perhaps finding a way to incorporate realistic schedules (rather than 162 games in one park) will improve the significance of this analysis. But hey, if StatCast can get away with excluding walls (for now), I should be able to do so as well, no?

I do believe it gives some insight into how trades could impact player values based on the ballpark they would move to, even if the impact is not as large as these dashboards suggest.

If you like this dashboard, or have any comments or suggestions, look me up on Twitter: @rjweise
Cheers!
