Is the Explosion of Big Data in Major League Baseball a Good or Bad Thing?

Andrew Simon authored an article exploring the top players in each of the “five tools” that baseball scouts evaluate: power, hitting, speed, arm strength, and fielding. While all the players mentioned are stars, what’s more interesting to me is what Simon says before the meat of the article: “Love the stats, love the game.”

For power, you don’t need StatCast to tell you that Aaron Judge, he of the American League record 62 home runs in 2022, is the best power hitter in baseball. Vladimir Guerrero Jr. is the son of a literal Hall of Fame baseball player who always hit the ball hard, so you don’t need stats to tell you that he is capable of hitting the ball hard more consistently than anyone else.

You don’t need StatCast to tell you that now-Phillies shortstop Trea Turner is the most consistently fast and efficient baserunner currently in the game. Why else would the Phillies have committed a huge chunk of their payroll to him for the next decade-plus?

On defense, though, StatCast does highlight players that already look really good, but backs these looks up with actual data. Oneil Cruz demonstrated the most powerful non-pitcher arm in baseball in the field, which is something scouts all seem to agree on, but it’s good to see that his infield throws at max effort truly are as fast as they appear.

It also isn’t surprising that J.T. Realmuto is considered the most valuable defensive player in the game behind the plate thanks to his incredible pop time, that is, the time it takes him to get the ball from his glove to the fielder covering a base on a stolen base attempt. Whereas defensive metrics have long been considered highly variable and inconsistent, even when based on pretty good data, StatCast tracks every single play in real time.

Now, here’s where I need to ask the question: is having all of this publicly available data good for baseball fans? My personal feeling is that, yes, having this data allows fans to better understand why certain players are as good as they look and even discover players who are better than they at first might look. This data also makes some stars look considerably less good, which some fans feel is counter-productive to enjoying the game.

The way I see it, if you don’t enjoy the data, then don’t look at it. Yes, many TV, radio, and streaming baseball broadcasts now include lots of StatCast data. Why shouldn’t they? It is a way to explain what’s happening, backed up by simple end-user benchmarks like exit velocity, launch angle, sprint speed, and Outs Above Average.

For me, the numbers actually improve my enjoyment of the game. If nothing else, the StatCast data proves just how much luck goes into a lot of hitting. The data shows how much bad luck good pitchers can get, and how mediocre pitchers can look better than they really are, based on the quality of contact hitters make on their pitches. The data is both good and bad for players, who can either use it to up their game in deficient areas or go all in on the skills they already excel at.

If Big Data has ruined your enjoyment of baseball or sports in general, I understand why. If you feel that way, you obviously long for the days when you could simply watch a game without all of these advanced metrics being thrown at you left and right. While I agree that perhaps people push the numbers a bit too hard for the casual fan to digest, the fact remains that because StatCast exists, MLB and its teams are going to get every bit they can out of it to make their players look better. But the data is also publicly available, so it’s not like they can use these numbers to paint an inaccurate picture.

My take is that Big Data is here to stay whether we like it or not. I choose to embrace it, especially when it’s provided in easily digestible statistical outputs. For hitters, these include expected wOBA, expected batting average, expected slugging percentage, exit velocity, and launch angle. The StatCast outputs for pitchers are the same, with the addition of expected ERA. It’s great that we can even better quantify things like fielding, arm strength, and running speed.

In general, sports may be leaning too hard on Big Data, but it’s one of those things that if you don’t use it or appreciate it for the stories you can tell with it, then you’re missing out. While it does seem that teams are now overvaluing players for top-percentile StatCast outputs rather than more balanced and “average” ball players, keep in mind that no matter what, there will always be something the market of any field overvalues. It’s up to somebody smart to see the data for what it really is, a tool, and use it to build a better team while using the data to help their current players improve.

On balance, I like Big Data in baseball. It’s not going anywhere. If you don’t like it, I’d like to hear why, because there are plenty of reasons not to like Big Data. I just don’t personally agree with many of them. Still, I’ll happily hear your opinions on this topic and regard them respectfully, as your opinion matters, too.

Writing words, spreading love, Amelia Desertsong primarily writes creative nonfiction articles, as well as dabbling in baseball, Pokemon, Magic the Gathering, and whatever else tickles her fancy.