April 1, 1997
The Baseball Simulator FAQ
The problems and limitations of this simulation.
A Virtual Ball Game
Simulating Away Baseball's Ennui
The Simulator
Test how different slugging percentages and times spent at bat affect the length of a baseball game with the baseball simulator.
The Code
Explore the Java code that runs the simulator.
Last Time
For more on computer simulations, see New-Media Tools for Online Journalism, published Oct. 9, 1996.
This baseball simulator tries to determine how long a game takes to play when the batting average changes. It can answer some questions about the major culprits behind the long ball game, and it may be a useful tool in dispelling some notions. But the process of building a simulator involves making many assumptions that may raise more questions than it answers. Here are answers to some such questions:
- How do the simulated players hit?
- What's wrong with this approximation?
- Is it an accurate approximation for what happened?
- How do the base runners advance?
- What about fielder's choices?
- What about errors?
- What about double plays?
- How do these omissions affect the results?
- What about the pitchers?
- What about relief pitchers?
- What about the pitches themselves?
- Why do some games go long?
- What sources did you use?
How do the simulated players hit?
The simulator converts the season's batting statistics into percentages. First, the total number of trips to the plate is calculated by adding the number of official at-bats to the number of walks. Major league baseball's statisticians don't count a walk as an at-bat today, although they did for brief periods in the past. This total is not completely accurate, because some subtle features of the game have been recorded differently over the years. Batters who hit sacrifice flies, for instance, were credited with an at-bat in some years and not in others.
The chances of a home run, a triple or a double are computed by dividing the number of each by the number of trips to the plate. The number of singles is computed by subtracting the number of extra-base hits from the total hits.
When a hypothetical player steps to the plate, a random number between 0 and 1 is chosen. This is used to select what happens. For instance, if there were 10,000 trips to the plate and 900 of them resulted in home runs, then a home run would correspond to the numbers between 0 and 0.09. If the random number turns out to be 0.0342, then the simulated player would hit a home run.
If it is a larger number, say 0.243, then it would correspond to another action, like a triple, a double or a single. Other ranges correspond to walks or outs. The computer does not bother to distinguish between the types of outs -- a strike out is the same as a pop fly.
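The selection step described above can be sketched in a few lines of Java. This is only an illustration, not the applet's actual code: the class name, method names and the order of the ranges after the home run are assumptions, and the season totals in `main` are made up so that 900 of 10,000 trips are home runs, as in the example.

```java
import java.util.Random;

// Hypothetical sketch of the hitting model; names are not taken from the applet.
class PlateOutcomeSketch {
    enum Outcome { HOME_RUN, TRIPLE, DOUBLE, SINGLE, WALK, OUT }

    /** Builds cumulative upper bounds for every outcome except OUT. */
    static double[] thresholds(int atBats, int walks, int hits,
                               int doubles, int triples, int homeRuns) {
        double trips = atBats + walks;                      // trips to the plate
        int singles = hits - doubles - triples - homeRuns;  // hits minus extra-base hits
        // Assumed order of the ranges: home run first, then triple, double, single, walk.
        double[] chances = { homeRuns / trips, triples / trips, doubles / trips,
                             singles / trips, walks / trips };
        double[] bounds = new double[chances.length];
        double running = 0.0;
        for (int i = 0; i < chances.length; i++) {
            running += chances[i];
            bounds[i] = running;
        }
        return bounds;
    }

    /** Maps a random number in [0, 1) to an outcome; anything past the last bound is an out. */
    static Outcome pick(double[] bounds, double r) {
        Outcome[] all = Outcome.values();
        for (int i = 0; i < bounds.length; i++) {
            if (r < bounds[i]) {
                return all[i];
            }
        }
        return Outcome.OUT;  // no distinction between strikeouts, pop flies, etc.
    }

    public static void main(String[] args) {
        // Made-up totals: 9,000 at-bats + 1,000 walks = 10,000 trips, 900 home runs.
        double[] bounds = thresholds(9000, 1000, 2500, 400, 100, 900);
        double r = new Random().nextDouble();
        System.out.println(r + " -> " + pick(bounds, r));
    }
}
```

With these made-up totals, the home run range runs from 0 to 0.09, so a draw of 0.0342 selects a home run, as in the example above.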
What's wrong with this approximation?
Plenty. Many batters don't hit the same way each trip to the plate. A .300 hitter doesn't get a hit exactly 3 out of every 10 times at bat. Some players react very well to pressure situations when the team is behind and the tying run is in scoring position. Reggie Jackson earned his nickname, "Mr. October," because he played even better in the post-season.
Is it an accurate approximation for what happened?
No, but it is close. Sacrifice flies and batters hit by a pitch are treated in many different ways. In 1931, sacrifice flies were "abolished" in the scoring changes. In 1939, sacrifices returned, and the rules were adjusted many times.
Many sources do not regularly state the number of sacrifices, so sacrifices were left out of this simulation. For the same reason, the simulator does not factor in the number of batters hit by a pitch.
How do the base runners advance?
The simulator's rules are wooden, rigid, and perhaps a bit too conservative. For instance, if a single is hit, a runner on third will automatically score, even though that might not happen every time in real life. A runner on second will score 50 percent of the time and hold up at third the rest of the time. A runner on first will end up on second.
If a double is hit, runners on second and third will automatically score. Any runner on first will score 50 percent of the time and end up on third 50 percent of the time. If a triple is hit, everyone on base will score.
Clearly, these numbers are only approximations. In reality, the action depends upon where the ball is hit and how well the fielder handles it. A runner might score from second on a single that reaches the warning track but not on a single that was bobbled by the third baseman.
There is no way for you to input these values directly into the applet running on the Web page. The interface was already confusing enough. If you want to change the behavior, you must recompile the code.
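For readers who do want to recompile with different values, the advancement rules in this section might look something like the following in Java. This is a hypothetical sketch rather than the applet's own code: the class, method and constant names are invented, and it assumes that a runner from second who does not score on a single stops at third. It covers only singles, doubles and triples, since those are the cases described here.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of the base-running rules; names are not taken from the applet.
class BaseRunningSketch {
    // Chance that a runner takes the extra base and scores (the 50 percent in the text).
    static final double SCORE_CHANCE = 0.5;

    /**
     * Applies a hit to the bases and returns the runs scored.
     * bases[0] = first, bases[1] = second, bases[2] = third; the array is updated in place.
     * hitBases is 1 for a single, 2 for a double, 3 for a triple.
     * r is a random number in [0, 1) used for the one 50-50 decision per hit.
     */
    static int advance(boolean[] bases, int hitBases, double r) {
        boolean first = bases[0], second = bases[1], third = bases[2];
        Arrays.fill(bases, false);
        int runs = 0;
        switch (hitBases) {
            case 1:
                if (third) runs++;                       // always scores
                if (second) {
                    if (r < SCORE_CHANCE) runs++;        // scores half the time
                    else bases[2] = true;                // assumption: otherwise holds at third
                }
                if (first) bases[1] = true;              // first to second
                bases[0] = true;                         // batter on first
                break;
            case 2:
                if (third) runs++;
                if (second) runs++;
                if (first) {
                    if (r < SCORE_CHANCE) runs++;        // scores half the time
                    else bases[2] = true;                // otherwise ends up on third
                }
                bases[1] = true;                         // batter on second
                break;
            case 3:
                if (third) runs++;
                if (second) runs++;
                if (first) runs++;
                bases[2] = true;                         // batter on third
                break;
            default:
                throw new IllegalArgumentException("hitBases must be 1, 2 or 3");
        }
        return runs;
    }

    public static void main(String[] args) {
        boolean[] bases = { true, true, true };          // bases loaded
        int runs = advance(bases, 1, new Random().nextDouble());
        System.out.println("Runs on a single: " + runs + ", bases now " + Arrays.toString(bases));
    }
}
```

Passing the random number in as an argument, rather than drawing it inside the method, is a design choice made here so the rules can be checked with fixed values.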
What about fielder's choices?
If a fielder decides to throw out one runner rather than another, the play may be recorded as a fielder's choice. Scorers record these plays, but they are not generally available in any easy-to-use statistics, so the simulation ignores them.
What about errors?
Errors may be quite important in real life, but everyone knows that "computers don't make errors." The simulation could be programmed to randomly inject errors into the play, but no good statistics were available, and errors may be hard to summarize with statistics. One bobbled home run call might not make a difference, but another could change the outcome of a season. To paraphrase Tolstoy, every home run is the same, but every error is different in its own way.
What about double plays?
If a batter makes an out while a runner is on first base, then there is a 20 percent chance that there will be a double play. This is only a rough estimate of reality, because some hitters are more likely than others to hit into double plays. Also, double plays can happen when runners are on any base. This is another area where the simulator errs through approximation.
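As a rough illustration, the double-play rule could be written as below. The names are hypothetical, and two details are assumptions that the description above does not settle: that the runner from first is the second out, and that no double play is attempted when there are already two outs.

```java
import java.util.Random;

// Hypothetical sketch of the double-play rule; names are not taken from the applet.
class DoublePlaySketch {
    static final double DOUBLE_PLAY_CHANCE = 0.20;   // the 20 percent in the text

    /**
     * Returns the number of outs recorded when the batter makes an out.
     * bases[0] = first base; the array is updated if the runner is doubled off.
     * r is a random number in [0, 1).
     */
    static int outsRecorded(boolean[] bases, int outsSoFar, double r) {
        // Assumption: with two outs already, the batter's out ends the inning anyway.
        boolean doublePlay = bases[0] && outsSoFar < 2 && r < DOUBLE_PLAY_CHANCE;
        if (doublePlay) {
            bases[0] = false;   // assumption: the runner from first is the second out
            return 2;
        }
        return 1;
    }

    public static void main(String[] args) {
        boolean[] bases = { true, false, false };    // runner on first
        int outs = outsRecorded(bases, 0, new Random().nextDouble());
        System.out.println("Outs recorded on the play: " + outs);
    }
}
```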
How do these omissions affect the results?
It's hard to be certain, but many events like a dropped third strike are relatively rare. Players are sometimes hit by a pitch or reach base safely because of an error, but this doesn't happen many times in a single game. On average, the simulator may be missing one or two at-bats per game. If the average player spends 90 seconds at bat, then the estimated total time is short by a few minutes at most.
This mismeasurement is also counteracted by other omissions. In this simulator, no base runner is ever thrown out sliding into home when someone hits a single. In reality, it happens. That means there are probably several possible outs that the simulator doesn't record each game. Another batter gets to hit in the simulator when the side might have been retired. So in these cases, the estimated total time may be long by a few minutes.
It is unclear whether these effects perfectly counteract each other. They may or they may not. A better simulation would take them into account. If you can find statistics, then you're welcome to extend it. Baseball is like the "X-Files" in this regard -- the truth is out there.
Of course, these details might not be that important on average. If increasing the batting average from .230 to .266 adds only a few minutes to the game, then a few errors and outs won't make much difference.
What about the pitchers?
The simulator makes no attempt to include any details about the pitcher in calculating whether or not a hit occurs. The earned run average of the pitcher doesn't affect the results.
But the quality of the pitching is already reflected in the batting averages. If the league's pitchers are doing well, the league-wide average declines. The simulator takes account of the quality of the pitching in the league by following the skill of the batters.
What about relief pitchers?
One of the current trends in baseball is to use more and more pitchers. When a new pitcher is brought into a game in the middle of the inning, the game is interrupted because the new pitcher gets to throw several warm-up pitches.
A relief pitcher is brought in either because the starting pitcher is tired or because there may be some strategic advantage. It is now common in high-pressure situations for a relief pitcher to be brought in to face one batter. Anticipating these strategic moves would require a simulator of much greater sophistication than this one. It may never be possible to convert all of the logic that a manager uses into computer code.
But some rules are easier to understand. Starting pitchers will often leave after they cross a threshold of about 110 pitches. Ideally, the simulator would take this into account. If the batting average rises, the pitchers will face more batters, throw more pitches and wear out their arm sooner. This type of pitching replacement could be predicted and included in the simulator.
The simulator, however, doesn't need to add more complexity to incorporate these details. The amount of time spent replacing a pitcher can be amortized over the amount of time the batters spend at the plate. If each batter takes 6 pitches on average, then the starter will go through 114 pitches to retire the first 19 players. If changing the pitcher adds 190 seconds to the game, then the cost of changing the pitcher can be included merely by adding 10 seconds to the time of the average trip to the plate. This amortized solution is far from ideal, but it may be just as accurate as any complicated set of logical equations that tried to anticipate when a manager might change a pitcher.
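The amortization arithmetic above can be checked with a short calculation. The sketch below uses hypothetical names and assumes the starter leaves after the batter whose trip carries him past the pitch threshold, which is how 110 pitches at 6 per batter becomes 19 batters.

```java
// Hypothetical sketch of the amortized pitching-change cost; names are not from the applet.
class PitchingChangeSketch {
    /**
     * Spreads the time cost of one pitching change over the batters the starter faces.
     * Assumes the starter finishes the trip to the plate in which he crosses the threshold.
     */
    static double extraSecondsPerTrip(int pitchesPerBatter, int pitchThreshold,
                                      double changeSeconds) {
        // Round up: 110 pitches at 6 per batter is reached during the 19th batter.
        int battersFaced = (pitchThreshold + pitchesPerBatter - 1) / pitchesPerBatter;
        return changeSeconds / battersFaced;
    }

    public static void main(String[] args) {
        // 6 pitches per batter, a threshold of about 110 pitches, 190 seconds per change.
        System.out.println(extraSecondsPerTrip(6, 110, 190.0));   // prints 10.0
    }
}
```

Changing any of the three inputs shows how sensitive the 10-second figure is to the assumed pitch count and the length of the interruption.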
What about the pitches themselves?
The greatest weakness of this simulator is that it isn't able to offer much insight into the battle between the pitcher and the batter. A trip to the plate can end quickly if the pitcher is able to throw one or two strikes at the beginning, in which case the hitter's options are reduced and he can't simply wait for the pitcher to throw a hittable pitch.
On the other hand, if the pitcher throws several balls at the beginning, then the hitter can relax and be choosy.
This simulator does not attempt to simulate the individual pitches themselves, in part because there are no easily accessible statistics. A better simulator might be able to include this information in its model and incorporate its effects into the answer.
Why do some games go long?
After running this simulator several thousand times, it seems as if a high percentage of games go long into extra innings. This effect hasn't been measured, but it seems to be there.
This might happen because real baseball players don't bat the same way in extra innings, when hits are even more crucial. In the simulator, the robotic players always do the same thing. Perhaps in real life the closeness of the game drives the better team to redouble its efforts and win the game faster.
This effect could be important because it obviously controls the length of the longest game and adds to the average.
What sources did you use?
Total Baseball: The Ultimate Encyclopedia of Baseball, edited by John Thorn and Pete Palmer (Harper Collins, 1993).
Baseball Guide from the Sporting News. Annual editions.