Mlb Regression Analysis Data
By: Steve • Essay • 1,171 Words • November 29, 2009
Data
log(Attendance) = B0 + B1*wins + B2*FCI + B3*tktprice + B4*payroll + B5*state + B6*earnspop + u
where B0 is the intercept and u is the error term.
To explain the effect that winning percentage has on attendance, I have specified the adjusted economic model above. To test this model, I compiled data on each of its variables for the 2003 to 2005 seasons.
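The essay does not say which software was used to estimate the model. As a minimal sketch only, the regression can be run by ordinary least squares on the log of attendance; the pure-Python code below solves the normal equations directly. The function names `ols`, `fit_log_attendance` and `percent_effect` are hypothetical, and the code assumes each row is one team-season with every regressor (including the state variable) already coded as a number. Because the dependent variable is in logs, a coefficient b means a one-unit rise in that regressor changes attendance by roughly 100*b percent; the exact figure is 100*(exp(b) - 1).

```python
import math


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows that already include a leading 1.0 for the
    intercept; y is the list of outcomes. Returns the coefficient list.
    """
    k = len(X[0])
    # Build X'X (k x k) and X'y (length k).
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Pivot is effectively zero: regressors are (near) perfectly collinear.
            raise ValueError("X'X is singular or nearly so")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # aug is now diagonal; divide through to get each coefficient.
    return [aug[i][k] / aug[i][i] for i in range(k)]


def fit_log_attendance(regressors, attendance):
    """Regress log(attendance) on the regressors plus an intercept (B0 first)."""
    X = [[1.0] + list(row) for row in regressors]
    y = [math.log(a) for a in attendance]
    return ols(X, y)


def percent_effect(coef, delta=1.0):
    """Exact % change in attendance implied by a log-level coefficient
    when its regressor rises by `delta` units: 100 * (exp(coef*delta) - 1)."""
    return 100.0 * (math.exp(coef * delta) - 1.0)
```

Calling `fit_log_attendance(rows, attendance)` returns the intercept first and then one coefficient per regressor, in the same order as the columns of each row. This sketch gives point estimates only; a statistics package would also report the standard errors needed to test whether the coefficient on wins is significant.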
The question that I will answer in my regression analysis is whether wins have an effect on attendance in Major League Baseball (MLB). I want to know whether wins, controlling for the other variables associated with attendance, have a positive impact on a team's attendance. The y variable in my analysis is attendance for each baseball team. I collected each team's attendance for 2003-2005 from an internet site entitled www.baseballreference.com. The summary statistics for this variable show a mean season attendance over 2003-2005 of 2,395,300, with a standard deviation of 686,650, a minimum of 749,550, and a maximum of 4,090,700. I am taking the log of attendance so that the coefficients on the independent variables can be read as approximate percent changes in attendance.
The main independent variable that I am going to look at, as stated above, is winning percentage and its effect on a team's attendance. I expect winning percentage to be positively related to attendance, because a team with a higher winning percentage is more likely to attract fans than a team with a low winning percentage. Fans want to see their team perform well during the season and are therefore more likely to attend games when it does. The summary statistics for this variable show that the mean winning percentage for all MLB teams is 50.4 percent, with a standard deviation of 7.6 percent, a minimum of 27 percent, and a maximum of 65 percent.
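Summary statistics like the ones quoted for attendance and winning percentage can be reproduced with Python's standard `statistics` module. This is a sketch, and `summarize` is a hypothetical helper. It also computes the standard error of the mean (the standard deviation divided by the square root of n) to show that it is a different and much smaller quantity than the standard deviation, which is the measure of spread a summary table reports.

```python
import math
import statistics


def summarize(values):
    """Return n, mean, sample standard deviation, min and max for one variable,
    plus the standard error of the mean for comparison."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "se_mean": sd / math.sqrt(n),  # precision of the mean, not the spread of the data
        "min": min(values),
        "max": max(values),
    }
```

Passing the list of team-season winning percentages (or attendance figures) to `summarize` returns all of the reported statistics in one dictionary.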
There are several other variables that I will use in my analysis to help decide whether winning percentage affects attendance. The first of these is the fan cost index (FCI) for each team. The fan cost index measures how much it costs a family of four to attend a game, including tickets and concessions. I expect the FCI to have a negative effect on attendance: if a team's FCI is low, attendance should be higher, all else equal, because a person is more likely to attend a home game when going to that game is relatively cheap. I obtained my data for this variable from www.baseballreference.com. The summary statistics for 2003-2005 show an average FCI of $157.08 per game, with a standard deviation of $29.65. The highest FCI for a team over this period was $276.24 and the lowest was $100.13.
Another variable that I will use is ticket price. Like the fan cost index, ticket price should have a negative effect on attendance: if the average ticket price is low, this should have a positive influence on attendance. However, teams with the lowest winning percentages tend to have cheaper tickets, so ticket price is also correlated with winning percentage. I will use a weighted average over all seats in the stadium as the ticket price, as I feel this gives the most accurate description of price. The data for this variable come from a site entitled www.rodneyfort.com/SportsData. The average ticket price for this period was $20.01, with a standard deviation of $5.88. The minimum and maximum average ticket prices were $10.08 and $44.56 respectively.
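A seat-weighted average ticket price multiplies each price level by the number of seats sold at that price, sums the results, and divides by the total number of seats. A minimal sketch follows; the function name and the seating tiers in the example are hypothetical and are not figures from www.rodneyfort.com/SportsData.

```python
def weighted_average_price(prices, seats):
    """Average ticket price weighted by the number of seats at each price level."""
    if len(prices) != len(seats) or sum(seats) == 0:
        raise ValueError("need matching, non-empty price and seat lists")
    total_revenue = sum(p * s for p, s in zip(prices, seats))
    return total_revenue / sum(seats)
```

With hypothetical tiers of $10, $25 and $60 holding 20,000, 15,000 and 5,000 seats, the weighted average is $875,000 / 40,000 = $21.875, while a simple average of the three prices would be about $31.67, overstating what the typical seat costs.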
There is a multicollinearity problem with the ticket price variable. The correlation coefficient between ticket price and FCI is .9577, which suggests that these two variables are very closely related and may be measuring the same thing. This is not surprising, since the cost of tickets is itself part of the FCI. Because ticket price is almost a linear function of the Fan Cost Index, I may want to leave ticket price out of my regression analysis.
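One common way to gauge how serious this multicollinearity is uses the variance inflation factor, VIF = 1 / (1 - R^2), where R^2 comes from regressing ticket price on the other regressors. With FCI as the only other regressor, R^2 is just the squared correlation, so the .9577 correlation in the essay gives a VIF of about 12.08, above the rule-of-thumb cutoff of 10 that is often cited. Adding the remaining regressors can only raise that R^2, so in the full model the VIF for ticket price would be at least this large. The helper names below are hypothetical; this is a sketch, not the essay's own calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_from_r(r):
    """VIF = 1 / (1 - R^2); with a single other regressor, R^2 = r^2."""
    return 1.0 / (1.0 - r * r)


# Correlation between ticket price and FCI reported in the essay.
print(round(vif_from_r(0.9577), 2))  # prints 12.08
```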
Payroll is another variable that I will be taking into consideration