Today:
 Review
Next Time:
 Final: Monday, March 19th at 8:00 a.m.
Posted by Mark Thoma on Wednesday, March 14, 2012 at 05:49 PM in Lectures, Winter 2012  Permalink  Comments (1)
We covered the following topics in the course:
1. Assumptions required for estimates to be BLUE
2. Hypothesis testing:
a. t-tests (both one-sided and two-sided)
b. Joint hypotheses (F-tests, Chi-square tests, etc.)
i. Exclusion restrictions
ii. Linear combinations of parameters
3. Heteroskedasticity
a. What is heteroskedasticity?
b. How does heteroskedasticity occur?
c. The consequences of estimating a heteroskedastic model with OLS
d. Tests
i. Lagrange Multiplier tests (Models 1, 2, and 3)
Model 1:
Model 2:
Model 3:
ii. Goldfeld-Quandt test
iii. White's test
e. Corrections/Estimation procedures
i. Feasible GLS
Model 1:
Model 2:
Model 3:
iii. White's correction
8. Autocorrelation
a. What is it and why might it occur?
b. Consequences of ignoring serial correlation and estimating with OLS
i. Model with serially correlated errors (model 1 in class)
ii. Model including a lagged dependent variable (model 2 in class)
iii. Model with both a lagged dependent variable and serially correlated errors (model 3 in class)
c. Tests for serial correlation
i. The Durbin-Watson test.
ii. Durbin's h-test.
iii. The Breusch-Godfrey test for higher-order serial correlation.
d. Corrections
i. Nonlinear estimation
9. Testing for ARCH errors
10. Stochastic Regressors and Measurement Errors
a. Assessing the bias and consistency of an estimator
b. Errors in variables
i. Consequences of estimating with OLS when there are errors in measuring the right-hand side variables (i.e. errors in measuring the independent variables, the X's).
ii. Consequences of estimating with OLS when there are errors in measuring the dependent variable (i.e. in the measurement of Y).
iii. Application of errors in variables: Friedman's Permanent Income Hypothesis.
c. Instrumental variable estimation
i. What is an instrument?
ii. How is IV performed?
iii. Show how IV estimation can solve the problem of correlation of the right-hand side variables with the error term.
11. Simultaneous equation models
a. Structural equations (behavioral, identities, equilibrium conditions, technical equations) and reduced form equations. Endogenous, exogenous, and predetermined variables.
b. Consequences of ignoring simultaneity, i.e. demonstrate simultaneity bias.
c. Underidentified models, exactly identified models, and overidentified models
d. Estimation by 2SLS
12. Multicollinearity
a. What is multicollinearity and how does it affect OLS estimates and standard errors?
b. Detection of multicollinearity
c. What to do for perfect and imperfect multicollinearity.
13. Specification tests
a. LM test for adding a variable to a model
b. LM test for adding a variable in a system of equations
c. LM test for serial correlation in a system of equations
d. AIC and SIC criteria
14. Qualitative and limited dependent variables
a. Linear probability model
i. description of model, problems, and estimation
b. Probit model
i. description of model and estimation
c. Logit model
i. description of model, attractive properties, and estimation
d. Limited dependent variables
i. description of the model when the dependent variable is limited, problems with OLS, and estimation
15. Maximum likelihood
a. Brief description of what maximum likelihood estimation does.
b. Properties of maximum likelihood estimators.
Posted by Mark Thoma on Tuesday, March 13, 2012 at 01:20 PM in Review, Winter 2012  Permalink  Comments (0)
1. [Practice problem] (a) After the data have been read in:
Do OLS on
CO_{t} = β_{0} + β_{1}YD_{t} + β_{2}CO_{t-1} + ε_{1t}
by running
with the result
Next, do OLS on
I_{t} = β_{3} + β_{4}Y_{t} + β_{5}r_{t-1} + ε_{2t}
by running
with the result
(b) Use 2SLS to estimate
CO_{t} = β_{0} + β_{1}YD_{t} + β_{2}CO_{t-1} + ε_{1t}
by running
with the result
Next, use 2SLS to estimate
I_{t} = β_{3} + β_{4}Y_{t} + β_{5}r_{t-1} + ε_{2t}
by running
with the result
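The EViews commands and output for this problem are not reproduced here. As a rough illustration of what the 2SLS step does, here is a minimal Python sketch on simulated data (not the macro data set). The function names, the simulated variables, and the true slope of 2 are all assumptions made for the example. The idea carries over to the consumption equation: the endogenous right-hand side variable is first regressed on all of the exogenous and predetermined variables, and its fitted value then replaces it in the second-stage OLS regression.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y = X b + e (least-squares solution)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, X_exog, X_endog, Z_excluded):
    """2SLS with included exogenous regressors X_exog (constant included),
    endogenous regressors X_endog, and excluded instruments Z_excluded."""
    Z = np.column_stack([X_exog, Z_excluded])        # full instrument set
    X_endog_hat = Z @ ols(Z, X_endog)                # stage 1: fitted values
    X_hat = np.column_stack([X_exog, X_endog_hat])   # stage 2 regressors
    return ols(X_hat, y)

# Simulated data (hypothetical): x is endogenous because it shares the
# shock `common` with the error term e; z is a valid instrument.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
common = rng.normal(size=n)
x = z + common + rng.normal(size=n)
e = common + rng.normal(size=n)
y = 1.0 + 2.0 * x + e                                # true slope is 2

const = np.ones(n)
b_ols = ols(np.column_stack([const, x]), y)
b_2sls = two_stage_least_squares(y, const, x, z)
print("OLS slope: ", b_ols[1])    # pulled away from 2 by the endogeneity
print("2SLS slope:", b_2sls[1])   # should be close to 2
```

Note that running the second stage by hand like this gives the right coefficients but not the right standard errors; packaged 2SLS routines correct for that.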
Posted by Mark Thoma on Tuesday, March 13, 2012 at 01:17 PM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Monday, March 12, 2012 at 08:17 PM in Lectures, Winter 2012  Permalink  Comments (0)
1. See here, second problem.
2. (a) Q equation: underidentified, (b) P equation: exactly identified, (c) Y equation: overidentified.
3. Graded individually.
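The classification in answer 2 follows from the order condition: compare the number of exogenous variables excluded from an equation with the number of endogenous variables on its right-hand side. A small sketch of that bookkeeping is below. The dictionary layout and function name are made up for the illustration, the variable lists are taken from the model in the homework, and the order condition is only a necessary condition (the rank condition is not checked).

```python
def order_condition(n_excluded_exog, n_rhs_endog):
    """Classify an equation by the order condition for identification."""
    if n_excluded_exog < n_rhs_endog:
        return "underidentified"
    if n_excluded_exog == n_rhs_endog:
        return "exactly identified"
    return "overidentified"

# Exogenous variables in the system (the constant appears in every
# equation, so it never counts as excluded).
exogenous = {"X", "Z", "W"}

# Right-hand side variables of each equation in the homework model.
equations = {
    "Q": {"endog": {"P", "Y"}, "exog": {"X", "Z"}},
    "P": {"endog": {"Q", "Y"}, "exog": {"W"}},
    "Y": {"endog": {"P"}, "exog": {"W"}},
}

results = {
    name: order_condition(len(exogenous - eq["exog"]), len(eq["endog"]))
    for name, eq in equations.items()
}
for name, status in results.items():
    print(f"{name} equation: {status}")
```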
Posted by Mark Thoma on Thursday, March 08, 2012 at 10:22 PM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Wednesday, March 07, 2012 at 08:53 PM in Lectures, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Monday, March 05, 2012 at 08:13 PM in Lectures, Winter 2012  Permalink  Comments (0)
1. Finish your project. It is due at the beginning of class on Thursday, March 15.
Practice problem (this will NOT be graded): Consider the following simple Keynesian macroeconomic model of the U.S. economy. [Macro data set]
Y_{t} = CO_{t} + I_{t} + G_{t} + NX_{t}
CO_{t} = β_{0} + β_{1}YD_{t} + β_{2}CO_{t-1} + ε_{1t}
YD_{t} = Y_{t} – T_{t}
I_{t} = β_{3} + β_{4}Y_{t} + β_{5}r_{t-1} + ε_{2t}
r_{t} = β_{6} + β_{7}Y_{t} + β_{8}M_{t} + ε_{3t}
where:
Y_{t} = gross domestic product (GDP) in year t
CO_{t} = total personal consumption in year t
I_{t} = total gross private domestic investment in year t
G_{t} = government purchases of goods and services in year t
NX_{t} = net exports of goods and services (exports - imports) in year t
T_{t} = taxes in year t
r_{t} = the interest rate in year t
M_{t} = the money supply in year t
YD_{t} = disposable income in year t
Endogenous variables: Y_{t}, YD_{t}, CO_{t}, I_{t}, r_{t}
Exogenous and predetermined variables: G_{t}, NX_{t}, T_{t}, M_{t}, CO_{t-1}, and r_{t-1}
(a) Using OLS, estimate equations for CO_{t} and I_{t}.
(b) Using 2SLS, estimate equations for CO_{t} and I_{t}.
Posted by Mark Thoma on Monday, March 05, 2012 at 03:36 PM in Homework, Winter 2012  Permalink  Comments (0)
Here are a few general guidelines to help with the write-up of your empirical project. Let me stress once again that your main goal for the project is to show that you understand how to use the tools and techniques we learned in class:
1. Introduction
Introduce the problem and discuss the question you are trying to answer with your empirical project.
2. Theory and Hypotheses
Discuss the theory underlying your model and state the hypotheses you are going to test. You should also state the significance levels you will use in your tests.
3. Empirical Model and Data
Present the empirical model you are using to test your hypotheses. This is where specification issues should be addressed. For example, did you log your data? Did you include squared terms or interactions? Are there any important omitted variables? If so, what are the consequences? Did you use tests to see if variables you weren’t sure about belong in the model? You should also discuss the data and data sources in this section.
4. Violations of Assumptions
At this point, you have the basic empirical model specified and you have discussed specification issues. You should now discuss potential violations of the Gauss-Markov conditions. The goal is to test for heteroskedasticity or serial correlation, and then either correct your model for the problem if it exists, or describe how you would have corrected the model had you found a problem. There are direct tests for heteroskedasticity and autocorrelation, but you should also discuss any other notable violations of the assumptions that may be present in your model and how those will be handled or accounted for. For example, are measurement errors a problem? Do you need to use instrumental variables to solve any endogeneity problems?
5. Results
Now that you have described the specification of the model and how you checked and corrected for any problems that exist, you are ready to present estimates of your final model. After presenting the final estimates, you should discuss the overall fit of the model and interpret the coefficients. What do the coefficients tell you? This is also the section where you should present the test results for the hypotheses you are examining, and then discuss the results.
6. Conclusion
What did you learn? Did the data support your hypotheses? How could you improve the model? What could you do in a follow-up study to learn more about this topic?
Posted by Mark Thoma on Friday, March 02, 2012 at 02:26 PM in Empirical Project, Winter 2012  Permalink  Comments (0)
Posted by Mark Thoma on Wednesday, February 29, 2012 at 08:05 PM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Wednesday, February 29, 2012 at 07:34 PM in Lectures, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Monday, February 27, 2012 at 08:03 PM in Lectures, Winter 2012  Permalink  Comments (0)
This is due in lab on either 3/5 or 3/7.
1. Problem 9.4 on page 338 (problem 9.3, page 276 of the previous edition).
2. For the model
Q_{t} = a_{0} + a_{1}P_{t} + a_{2}Y_{t} + a_{3}X_{t} + a_{4}Z_{t} + u_{t}
P_{t} = b_{0} + b_{1}Q_{t} + b_{2}Y_{t} + b_{3}W_{t} + v_{t}
Y_{t} = c_{0} + c_{1}P_{t} + c_{2}W_{t} + w_{t}
Determine whether each equation is under, exactly, or over identified. Assume that Q, P, and Y are endogenous, and that the constant, X, Z, and W are exogenous.
3. Answer the following questions about your project:
(i) Do you expect any measurement problems, i.e. do you expect to have errors-in-variables problems? If so, what effect will that have on your estimates and test statistics? (If you don't think this will be a problem, say so and explain why, and then describe the difficulties it would cause if you did have the problem, including its effect on the estimates and test statistics.) How can the problem be fixed?
(ii) Are there any important omitted variables? If so, what effect would the omitted variables have on the estimates and test statistics? (And again, even if you think you have every important variable, show that you understand this issue by explaining what types of problems it causes).
(iii) Do you expect problems with endogeneity bias (endogenous variables on the right-hand side of the equation that are correlated with the error term)? Think hard about this one, and if you do have this problem, what is the solution?
Posted by Mark Thoma on Monday, February 27, 2012 at 03:18 PM in Homework, Winter 2012  Permalink  Comments (0)
1. There is no general solution to this one; each answer was graded individually based upon how well you completed the steps in the project outline.
2. The answer to problem 8.3 is in this pdf file.
3. What are the three requirements for a good instrumental variable?
An instrument should be (i) uncorrelated with the error term, (ii) correlated with the variable it is instrumenting for, and (iii) not an explanatory variable in the equation itself.
Posted by Mark Thoma on Friday, February 24, 2012 at 11:58 AM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Wednesday, February 22, 2012 at 08:03 PM in Lectures, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Monday, February 20, 2012 at 07:00 PM in Lectures, Winter 2012  Permalink  Comments (0)
Posted by Mark Thoma on Monday, February 20, 2012 at 02:14 PM in Midterms, Winter 2012  Permalink  Comments (0)
1. Problem 8.5 on page 313 (page 253 in the previous edition), part 1 only.
2. Consider the following simple Keynesian macroeconomic model of the U.S. economy.
Y_{t} = C_{t} + I_{t} + G_{t} + NX_{t}
C_{t} = β_{0} + β_{1}YD_{t} + β_{2}C_{t-1} + ε_{1t}
YD_{t} = Y_{t} – T_{t}
I_{t} = β_{3} + β_{4}Y_{t} + β_{5}r_{t-1} + ε_{2t}
r_{t} = β_{6} + β_{7}Y_{t} + β_{8}M_{t} + ε_{3t}
where:
Y_{t} = gross domestic product (GDP) in year t
C_{t} = total personal consumption in year t
I_{t} = total gross private domestic investment in year t
G_{t} = government purchases of goods and services in year t
NX_{t} = net exports of goods and services (exports - imports) in year t
T_{t} = taxes in year t
r_{t} = the interest rate in year t
M_{t} = the money supply in year t
YD_{t} = disposable income in year t
The endogenous variables are Y_{t}, C_{t}, I_{t}, YD_{t}, and r_{t}. The exogenous and predetermined variables are G_{t}, NX_{t}, C_{t-1}, T_{t}, r_{t-1}, and M_{t}. Find the reduced form equations for this model.
3. (a) For your project, what econometric model do you plan to estimate and what hypothesis or hypotheses do you plan to test? (b) Depending upon whether your data are time-series or cross-sectional, test the model for autocorrelation or heteroskedasticity. (c) If you find a problem with either, explain explicitly how you plan to correct for it. If the tests do not indicate a problem, explain how you would have corrected for the problem had the test come out the other way. That is, no matter how the test comes out, explain how to correct for the problem of heteroskedasticity or autocorrelation as appropriate for your model. You don't have to actually do the correction for this homework (though if the problem is present, you would do the correction for the project); just explain how to do it.
Posted by Mark Thoma on Monday, February 20, 2012 at 11:33 AM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Wednesday, February 15, 2012 at 11:55 PM in Lectures, Winter 2012  Permalink  Comments (0)
Homework 5
Due in lab week 7 (the week of Feb. 20th)
1. Complete steps 1 through 3 of the Empirical Project Outline (as discussed in class).
2. Problem 8.3 on page 312 (page 253 in the previous edition).
3. What are the three requirements for a good instrumental variable?
[Note: The *next* homework will ask: (a) For your project, what econometric model do you plan to estimate and what hypothesis or hypotheses do you plan to test? (b) Depending upon whether your data are time-series or cross-sectional, test the model for autocorrelation or heteroskedasticity. (c) If you find a problem with either, explain explicitly how you plan to correct for it. If the tests do not indicate a problem, explain how you would have corrected for the problem had the test come out the other way (that is, no matter how the test comes out, explain how to correct for the problem of heteroskedasticity or autocorrelation as appropriate for your model).]
Posted by Mark Thoma on Wednesday, February 15, 2012 at 03:56 PM in Empirical Project, Homework, Winter 2012  Permalink  Comments (0)
Midterm today
Posted by Mark Thoma on Tuesday, February 14, 2012 at 12:00 PM in Lectures, Winter 2012  Permalink  Comments (0)
Economics 421/521
Winter 2012
Solution to Homework #4
1. Perform a Durbin-Watson test at the 5% level of significance for positive first-order autocorrelation using the following regression output (standard errors in parentheses):
Y_{t} = 2.0 + 3.7*X_{1t} - 4.4*X_{2t}, T = 42
(.7) (1.1) (2.8) DW = 1.22
At the 5% level of significance, the critical values of the DW statistic are (approximately, with interpolation) d_{L} = 1.41 at the lower end and d_{U} = 1.61 at the upper end. Since 1.22 is smaller than d_{L} = 1.41, we reject the null hypothesis that ρ=0 in favor of positive first-order autocorrelation.
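The decision rule used here can be written out as a small helper. This is only a sketch: the function name is made up, and the critical values have to be looked up in a Durbin-Watson table (the ones below are the values quoted in these solutions).

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """One-sided test of H0: rho = 0 against positive first-order autocorrelation."""
    if d < d_lower:
        return "reject"
    if d > d_upper:
        return "do not reject"
    return "inconclusive"

def durbin_watson_decision_negative(d, d_lower, d_upper):
    """Same rule for negative autocorrelation, applied to 4 - d."""
    return durbin_watson_decision(4.0 - d, d_lower, d_upper)

# Problem 1: DW = 1.22 with d_L = 1.41 and d_U = 1.61
print(durbin_watson_decision(1.22, 1.41, 1.61))

# Problem 3: DW = 2.28 with d_L = 1.65 and d_U = 1.69
print(durbin_watson_decision(2.28, 1.65, 1.69))
print(durbin_watson_decision_negative(2.28, 1.65, 1.69))
```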
2. Recall the model from homework 1:
Given data on M2, real GDP, and the T-bill rate, estimate the following regression...:
M_{t} = β_{0} + β_{1}RGDP_{t} + β_{2}Tbill_{t} + e_{t}
Don't be surprised if the fit is very good - we'll explain why that may be misleading later in the course.
Here's the output:
Does the model suffer from serial correlation? Use a Durbin-Watson test to answer the question.
Yes, definitely. The value of d_{L} (5%) is approximately 1.63. The test statistic of .02891 is far below this value, so we reject the null hypothesis that ρ=0.
Is the fit as good as the R^{2} and t-statistics indicate?
The t-statistics are biased upward, so the fit is not nearly as good as they might lead you to believe (which would be evident if we corrected for the serial correlation). This is due to biased estimates of the standard errors. So the R^{2} and t-statistics give a misleading picture of how well the model fits the data.
3. Regress the change in the log of real consumption (C) on the change in the log of real disposable income (DI) and test for serial correlation using a Durbin-Watson test. The data are here (the data are quarterly, and span the time period 1947:Q1 - 2007:Q3).
The first step is to log both consumption and disposable income, then difference. The transformed variables are named dlogc and dlogy below (e.g. dlogy = log(di) - log(di(-1))). Then, regress dlogc on a constant and dlogy:
Here is the regression output:
The critical values of the Durbin-Watson statistic are approximately d_{L} = 1.65 and d_{U} = 1.69 (there are tables that go beyond T=100, but the values don't change much after 100 observations, so I used the values for 100 observations from the table in the text). For a test of negative serial correlation the values are 4.00 - d_{L} = 4.00 - 1.65 = 2.35 and 4.00 - d_{U} = 4.00 - 1.69 = 2.31. The null of no positive serial correlation would not be rejected since the test statistic, 2.28, is above d_{U} = 1.69 (and so well above the rejection point, d_{L} = 1.65). Similarly, the null of no negative serial correlation cannot be rejected since 2.28 is less than 2.31, but it's a close call.
4. Explain why the Durbin-Watson statistic is always between 0 and 4. Also explain why the Durbin-Watson statistic is between 0 and 2 when there is positive serial correlation, between 2 and 4 when there is negative serial correlation, and equal to 2 when there is no correlation at all.
Start with the demonstration that d, the Durbin-Watson statistic, is approximately (2 - 2ρ) in large samples:
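The figure with the derivation is not reproduced in this text. A standard version of the argument, written with \hat{u}_t for the OLS residuals and \hat{\rho} for the estimated first-order autocorrelation (notation assumed here, not taken from the figure), runs as follows. The approximation step uses the fact that in large samples the sums of \hat{u}_t^2 and \hat{u}_{t-1}^2 from t = 2 to T are each nearly equal to the full sum of squared residuals.

```latex
d = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{T} \hat{u}_t^2}
  = \frac{\sum_{t=2}^{T} \hat{u}_t^2 + \sum_{t=2}^{T} \hat{u}_{t-1}^2
          - 2 \sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=1}^{T} \hat{u}_t^2}
  \approx 1 + 1 - 2\hat{\rho} = 2 - 2\hat{\rho},
\qquad
\hat{\rho} = \frac{\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=1}^{T} \hat{u}_t^2}.
```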
Now, since ρ must lie between -1 and 1, the Durbin-Watson statistic must lie between 0 and 4 [since 2 - 2(1) = 0 and 2 - 2(-1) = 4].
To see the last part, note that when ρ=0, d=2. Then, as ρ increases from 0 to 1, d moves from 2 to 0, and as ρ moves from 0 to -1, d moves from 2 to 4.
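The relationship between d and ρ can also be checked numerically. The sketch below simulates AR(1) series directly and treats them as if they were regression residuals, which is a simplification made for the illustration; the function names, seed, and sample size are arbitrary choices.

```python
import random

def durbin_watson(resid):
    """d = sum of squared first differences / sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(u ** 2 for u in resid)
    return num / den

def rho_hat(resid):
    """Estimated first-order autocorrelation of the series."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(u ** 2 for u in resid)
    return num / den

def simulate_ar1(rho, T, seed):
    """u_t = rho * u_{t-1} + e_t with standard normal e_t."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0)]
    for _ in range(T - 1):
        u.append(rho * u[-1] + rng.gauss(0.0, 1.0))
    return u

for rho in (0.8, 0.0, -0.8):
    u = simulate_ar1(rho, T=500, seed=1)
    # d should sit close to 2 - 2*rho_hat: below 2 for positive rho,
    # near 2 for rho = 0, and above 2 for negative rho.
    print(rho, round(durbin_watson(u), 3), round(2 - 2 * rho_hat(u), 3))
```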
5. Continuing with the model we used in problem 2, test for the presence of fourth-order serial correlation.
To do this problem, first regress m2 on a constant, rgdp, and tbillrate:
Save the residuals (I saved the resid series as uhat). Regress the estimated residual on four lags of the estimated residual and the other variables in the model (no constant), i.e. regress uhat_{t} on uhat_{t-1}, uhat_{t-2}, uhat_{t-3}, uhat_{t-4}, tbillrate, and rgdp:
Finally, form the test statistic (T-P)R^{2}, where T is the number of observations and P is the number of lags. This is distributed χ^{2}(4). The critical value for this test at the 5% level of significance is 9.49.
The test statistic is (191)(.972914) = 185.83, so we reject the null hypothesis that there is no serial correlation in the model.
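For readers working outside EViews, the same sequence of steps can be sketched in Python. This is an illustration on simulated data, not the M2 data set, and the function names are invented for the example. One difference from the steps above is flagged in the comments: the auxiliary regression here includes every column of the original regressor matrix, constant included.

```python
import numpy as np

def r_squared(X, y):
    """Centered R^2 from an OLS regression of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def breusch_godfrey_stat(y, X, p):
    """(T - p) * R^2 from regressing the OLS residuals on p of their own
    lags plus the original regressors (assumption: the constant in X is
    kept in the auxiliary regression)."""
    uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    T = len(uhat)
    # Column j holds uhat lagged j periods, aligned with uhat[p:].
    lags = np.column_stack([uhat[p - j:T - j] for j in range(1, p + 1)])
    aux_X = np.column_stack([lags, X[p:]])
    return (T - p) * r_squared(aux_X, uhat[p:])

# Simulated regression with AR(1) errors (hypothetical data).
rng = np.random.default_rng(0)
T, rho = 200, 0.8
x = rng.normal(size=T)
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(T), x])

stat = breusch_godfrey_stat(y, X, p=4)
print(stat, "reject" if stat > 9.49 else "do not reject")  # 9.49 = 5% chi-square(4)
```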
6. Continuing with the model we used in problem 3, use the AR(1) procedure in EViews to correct the model for the presence of first-order serial correlation.
First, note that the test above did not indicate the presence of serial correlation, so technically there is no need for a correction. However, it's still worthwhile to do as a practice exercise.
To do this problem, just add the ar(1) command to the estimation shown in problem 3:
With the result:
Posted by Mark Thoma on Thursday, February 09, 2012 at 10:25 PM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Wednesday, February 08, 2012 at 07:31 PM in Lectures, Winter 2012  Permalink  Comments (0)
[Note: Previous midterms are here.]
1. Assumptions required for estimates to be BLUE
2. Hypothesis testing:
a. t-tests (both one-sided and two-sided)
b. Joint hypotheses (F-tests, Chi-square tests, etc.)
i. Exclusion restrictions
ii. Linear combinations of parameters
3. Heteroskedasticity
a. What is heteroskedasticity?
b. How does heteroskedasticity occur?
c. The consequences of estimating a heteroskedastic model with OLS
d. Tests
i. Lagrange Multiplier tests (Models 1, 2, and 3)
Model 1:
Model 2:
Model 3:
ii. Goldfeld-Quandt test
iii. White's test
e. Corrections/Estimation procedures
i. Feasible GLS
Model 1:
Model 2:
Model 3:
iii. White's correction
8. Autocorrelation
a. What is it and why might it occur?
b. Consequences of ignoring serial correlation and estimating with OLS
i. Model with serially correlated errors (model 1 in class)
ii. Model including a lagged dependent variable (model 2 in class)
iii. Model with both a lagged dependent variable and serially correlated errors (model 3 in class)
c. Tests for serial correlation
i. The Durbin-Watson test.
ii. Durbin's h-test.
iii. The Breusch-Godfrey test for higher-order serial correlation.
d. Corrections
i. Nonlinear estimation
9. Testing for ARCH errors
10. Stochastic Regressors and Measurement Errors
a. Assessing the bias and consistency of an estimator
b. Errors in variables
i. Consequences of estimating with OLS when there are errors in measuring the right-hand side variables (i.e. errors in measuring the independent variables, the X's).
ii. Consequences of estimating with OLS when there are errors in measuring the dependent variable (i.e. in the measurement of Y).
Posted by Mark Thoma on Wednesday, February 08, 2012 at 07:31 PM in Review, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Monday, February 06, 2012 at 07:39 PM in Lectures, Winter 2012  Permalink  Comments (0)
Economics 421/521
Winter 2012
Solution to Homework 3
1. Using this data set, repeat the example from class for the first of the three cases we discussed, i.e. first regress the log of salary on a constant and the two variables proxying for experience, years and years^{2}:
log(salary) = β_{0} + β_{1}*years + β_{2}*years^{2} + u_{t}
Then, form the estimated residual squared (resid^{2}) and perform the LM test for heteroskedasticity (note: resid is the estimated value of u_{t}).
To do this problem, first read the data in from the Excel spreadsheet:
Then, transform the data to get the log of salary and years squared:
Regress the log of salary on a constant, years, and years squared:
To do the test, we need the square of the residuals from this regression:
Regress the squared residuals on a constant, years, and years squared:
The results are:
The test statistic is
NR^{2} = 222*.0747 = 16.59
This is distributed Chi-square with 2 degrees of freedom, so the critical value (5%) is 5.99. Therefore, the null of no heteroskedasticity is rejected.
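The screenshots of the EViews steps are not reproduced here, so the following Python sketch walks through the same LM test on simulated data. The salary data are not used: the variable names, coefficients, and the error process (standard deviation proportional to years) are all assumptions made for the example.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from an OLS regression of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ beta

def lm_heteroskedasticity_stat(y, X, Z):
    """N * R^2 from regressing the squared OLS residuals on Z
    (Z must include a constant)."""
    uhat_sq = ols_residuals(X, y) ** 2
    aux_resid = ols_residuals(Z, uhat_sq)
    r2 = 1.0 - (aux_resid @ aux_resid) / np.sum((uhat_sq - uhat_sq.mean()) ** 2)
    return len(y) * r2

# Simulated data (hypothetical): the error standard deviation grows with years.
rng = np.random.default_rng(0)
n = 1000
years = rng.uniform(1.0, 30.0, size=n)
u = 0.02 * years * rng.normal(size=n)
log_salary = 10.0 + 0.05 * years - 0.001 * years ** 2 + u

# Same regressors in the model and in the variance equation:
# a constant, years, and years squared.
X = np.column_stack([np.ones(n), years, years ** 2])
stat = lm_heteroskedasticity_stat(log_salary, X, X)
print(stat, "reject" if stat > 5.99 else "do not reject")  # 5.99 = 5% chi-square(2)
```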
2. To do this problem, first regress the log of salary on a constant, years, and years squared:
Next, square the residuals to get an estimate of the variance for each observation, uhatsq:
Regress this on a constant, years, and years squared (this is the model of the variance):
Next, we need the predicted value of the variance. To get the predicted value, we can simply subtract the estimated residuals from the left-hand side variable (this uses that actual Y = predicted Y + predicted error):
Or, you can get exactly the same values by using the forecast button on the regression output:
The next step is to ensure that all the estimates of the variance are positive. If any are negative, they should be replaced by their absolute values:
We need the square root of this value to use in transforming the original data:
Use this value to transform the data:
Use the transformed values to obtain BLUE estimates:
The result is:
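The EViews screens for these steps are not shown in this text. The same feasible GLS sequence can be sketched in Python as below, again on simulated data rather than the salary data; the function names, the data-generating process, and the true slope of 2 are assumptions for the illustration. The sketch follows the steps above, including the replacement of any negative variance estimates by their absolute values.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def feasible_gls(y, X, Z):
    """Feasible GLS with the variance modeled as a linear function of Z."""
    uhat = y - X @ ols(X, y)                   # OLS residuals
    var_hat = np.abs(Z @ ols(Z, uhat ** 2))    # predicted variance, made positive
    sd_hat = np.sqrt(var_hat)                  # square root used as the weight
    X_star = X / sd_hat[:, None]               # transform every regressor,
    y_star = y / sd_hat                        # including the constant
    return ols(X_star, y_star), var_hat

# Simulated data (hypothetical): error standard deviation proportional to x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(5.0, 10.0, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)     # true slope is 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, x ** 2])   # variance equation regressors
b_fgls, var_hat = feasible_gls(y, X, Z)
print("FGLS estimates:", b_fgls)
```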
3. The test statistic is N*R^{2}. In this case, N=30 and R^{2}=.9878 so that the test statistic is 29.634. The 5% critical value for this test is 11.07 (the test is Chi-square with 5 degrees of freedom; 5 because, if the error is homoskedastic, the coefficients on G, Y, their squared values, and their cross-product must all be zero). Because the test statistic exceeds the critical value, the null of homoskedasticity is rejected.
4. Here is the uncorrected regression from the last homework:
To do White's test, click the view button, then as follows:
Here's the result:
Looking at the line with Obs*R-squared, we see that the probability is less than .05, the significance level; hence the null of no heteroskedasticity is rejected.
To correct the model, run a regression as usual:
Click on options to bring up this screen and click the boxes as shown:
Hit OK to return to this screen:
Hit OK to get this output:
Notice that the coefficient estimates are identical (compare to the original values given above). White's correction fixes the standard errors after the regression is estimated, but it doesn't change the estimates.
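To see why the estimates do not change, here is a minimal Python sketch of White's heteroskedasticity-consistent standard errors on simulated data. It uses the basic form of the estimator with no degrees-of-freedom adjustment, which may differ slightly from what EViews reports; the function name and the simulated data are assumptions for the example. The coefficient vector is computed once, by ordinary OLS, and only the variance formula differs.

```python
import numpy as np

def ols_with_white_se(X, y):
    """OLS coefficients with conventional and White (basic form) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                       # the usual OLS estimates
    uhat = y - X @ beta
    # Conventional standard errors: s^2 * (X'X)^{-1}
    se_ols = np.sqrt(np.diag(XtX_inv) * (uhat @ uhat) / (n - k))
    # White: (X'X)^{-1} [sum of uhat_i^2 * x_i x_i'] (X'X)^{-1}
    meat = X.T @ (X * (uhat ** 2)[:, None])
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_ols, se_white

# Simulated heteroskedastic data (hypothetical).
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1.0, 10.0, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta, se_ols, se_white = ols_with_white_se(X, y)
print("estimates:      ", beta)
print("conventional se:", se_ols)
print("White se:       ", se_white)
```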
5. What are the consequences of estimating an autoregressive model using OLS?
The coefficients remain unbiased, but OLS is inefficient, and OLS results in biased estimates of the standard errors (so the test statistics, e.g. t's and F's, are wrong).
Posted by Mark Thoma on Sunday, February 05, 2012 at 11:37 AM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Wednesday, February 01, 2012 at 07:06 PM in Lectures, Winter 2012  Permalink  Comments (0)
Here is a brief outline of the project. We will talk more about this in class:
1. Statement of theory and hypothesis
2. Specification of the econometric model
3. Obtain data
4. Estimation of the econometric model and diagnostic tests
5. Test hypotheses
6. Forecasting or prediction
It will take longer than you think to do the estimation stage, so give yourself plenty of time. When the project is finished, it may or may not turn out the way you hoped. That's okay, you will not be graded on how clever you are at finding an interesting hypothesis to investigate, or on whether you find out anything particularly noteworthy when you are done, though you might. The goal is for you to illustrate that you know how to use the tools and techniques that we learn in class, and that is the basis for the evaluation of the projects.
Posted by Mark Thoma on Wednesday, February 01, 2012 at 07:06 PM in Empirical Project, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Monday, January 30, 2012 at 07:35 PM in Lectures, Winter 2012  Permalink  Comments (0)
Economics 421/521
Winter 2012
Solution to Homework #1
Part I. Hypothesis Testing
1. Suppose that you estimate a model of house prices to determine the impact of having beach frontage on the value of a house. You do some research, and you decide to use the size of the lot instead of the size of the house for a number of theoretical and data availability reasons. Your results (standard errors in parentheses) are:
PRICEi = 40 + 35.0 LOTi – 2.0 AGEi + 10.0 BEDi – 4.0 FIREi + 100 BEACHi
(29) (5.0) (1.1) (10.0) (3.0) (9.0)
n = 30, R^{2} = .63
where,
PRICEi = the price of the ith house (in thousands of dollars)
LOTi = the size of the lot of the ith house (in thousands of square feet)
AGEi = the age of the ith house in years
BEDi = the number of bedrooms in the ith house
FIREi = a dummy variable for a fireplace (1 = yes for the ith house)
BEACHi = a dummy for having beach frontage (1 = yes for the ith house)
a) You expect the variables LOT, BED, and BEACH to have positive coefficients. Create and test the appropriate hypotheses to evaluate these expectations at the 5 percent level.
For LOT;
Ho: βLOT = 0
Ha: βLOT > 0
t-score: (35.0) / (5.0) = 7.0
t-critical: 1.711 because d.f. is 24 and 5% level of significance.
Since 7.0 > 1.711, we can reject the null hypothesis that the true coefficient of LOT is not positive.
For BED;
Ho: βBED = 0
Ha: βBED > 0
t-score: (10.0) / (10.0) = 1.0
t-critical: 1.711 because d.f. is 24 and 5% level of significance.
Since 1.0 < 1.711, we cannot reject the null hypothesis that the true coefficient of BED is not positive.
For BEACH;
Ho: βBEACH = 0
Ha: βBEACH > 0
t-score: (100) / (9.0) = 11.1
t-critical: 1.711 because d.f. is 24 and 5% level of significance.
Since 11.1 > 1.711, we can reject the null hypothesis that the true coefficient of BEACH is not positive.
b) You expect AGE to have a negative coefficient. Create and test the appropriate hypothesis to evaluate these expectations at the 10 percent level.
For AGE;
Ho: βAGE = 0
Ha: βAGE < 0
t-score: (-2.0) / (1.1) = -1.82
t-critical: 1.318 because d.f. is 24 and 10% level of significance.
Since |-1.82| > 1.318, we can reject the null hypothesis that the true coefficient of AGE is not negative.
c) At first you expect FIRE to have a positive coefficient, but one of your friends says that fireplaces are messy and are a pain to keep clean, so you are not sure. Run a two-sided t-test around zero to test these expectations at the 5 percent level.
For FIRE;
Ho: βFIRE = 0
Ha: βFIRE ≠ 0
t-score: (-4.0) / (3.0) = -1.3
t-critical: 2.064 because d.f. is 24 and 5% level of significance.
Since |-1.3| < 2.064, we cannot reject the null hypothesis that the true coefficient of FIRE is not different from zero.
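The five tests above all follow the same recipe, so they can be collected in a few lines of Python. This is just a sketch of the arithmetic: the coefficients and standard errors come from the regression output in the problem, the critical values are the table values quoted in the solution, and the function and dictionary names are made up for the example.

```python
# (coefficient, standard error) pairs from the regression output
estimates = {
    "LOT": (35.0, 5.0),
    "AGE": (-2.0, 1.1),
    "BED": (10.0, 10.0),
    "FIRE": (-4.0, 3.0),
    "BEACH": (100.0, 9.0),
}

# (alternative hypothesis, critical value) with 24 degrees of freedom
tests = {
    "LOT": ("right", 1.711),    # one-sided, 5%
    "BED": ("right", 1.711),    # one-sided, 5%
    "BEACH": ("right", 1.711),  # one-sided, 5%
    "AGE": ("left", 1.318),     # one-sided, 10%
    "FIRE": ("two", 2.064),     # two-sided, 5%
}

def reject_null(coef, se, tail, t_crit):
    """Return True if the null hypothesis is rejected."""
    t = coef / se
    if tail == "right":
        return t > t_crit
    if tail == "left":
        return t < -t_crit
    return abs(t) > t_crit

results = {name: reject_null(*estimates[name], *tests[name]) for name in tests}
for name, rejected in results.items():
    coef, se = estimates[name]
    print(f"{name}: t = {coef / se:.2f}, reject H0: {rejected}")
```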
2. Consider the following regression:
log(Qci) = 921.6 – 1.3log(Pci) + 0.7log(Pai) + 11.4log(Inci)
(121) (0.3) (0.05) (2.8)
n = 30, R^{2} = 0.82
where,
Qci = the total sales of CAMRY in the ith city, the year of 2003
Pci = the price of a CAMRY in the ith city, the year of 2003 (in thousands)
Pai = the price of an ACCORD in the ith city, the year of 2003 (in thousands)
Inci = the average income in the ith city, the year of 2003 (in thousands)
Numbers in the parentheses are standard errors.
a) What does the constant term (= 921.6) mean?
The constant term is the value of the dependent variable when all the independent variables are zero.
b) How would you interpret the coefficient on log(Pci). Be explicit and explain, in terms of economic theory, the importance of its magnitude.
Totally differentiate the estimated equation w.r.t. Qci and Pci to obtain the elasticity:
d log(Qci) / d log(Pci) = (dQci/Qci) / (dPci/Pci) = -1.3
The elasticity is -1.3. Since |-1.3| > 1, demand is elastic: if the price of a CAMRY increases by 1%, then total sales of CAMRYs decrease by 1.3%. (Thus, the coefficient gives the elasticity.)
c) Compute the t-values for each of the coefficients in the regression. Are all of our coefficients statistically significant at the 5% level of significance? How about at the 1% level of significance?
          Pci      Pai     Inci
t-score   -4.33    14.0    4.07
Two-tailed test at the 5% level of significance: t-critical = 2.056 because d.f. is 26.
Two-tailed test at the 1% level of significance: t-critical = 2.779 because d.f. is 26.
Therefore, all of our coefficients are statistically significant at both the 1% and 5% levels of significance.
d) Interpret R^{2}. What does it mean?
R^{2} tells how well the sample regression line fits the data.
R^{2} = ESS/TSS = 1 - RSS/TSS (TSS = ESS + RSS)
Thus, it is the ratio of the explained variation in the dependent variable to the total variation, and hence measures the percentage of the total variation that is explained by the variables in the model.
Part II. Short Answer
1. State the Gauss-Markov Theorem and explain the term BLUE.
Given the classical assumptions (model is linear and correctly specified, X's are exogenous, no perfect multicollinearity, error has zero mean, homoskedasticity, errors are independent, errors and X's are uncorrelated, errors normally distributed), the OLS estimator is the minimum variance estimator from among the set of all linear unbiased estimators. OLS is BLUE. This means it is the Best (minimum variance) Linear Unbiased Estimator.
Part III. Estimation
1. Given data on M2, real GDP, and the T-bill rate, estimate the following regression and test whether the coefficients differ from zero. Do the coefficients have the expected signs?
M_{t} = β_{0} + β_{1}RGDP_{t} + β_{2}Tbill_{t} + e_{t}
To do this problem, first create a new workfile by following these steps (some of the figures are popups that bring up larger, clearer versions):
Next, read in the data set (save it to your computer first by right clicking on the link in the homework set):
Finally, run the regression of M2 on a constant, RGDP and the TBillRate:
Running this regression gives:
The critical value at the 1% level is 2.576, thus the coefficients are significant. They also have the expected signs: people tend to demand more money as income (RGDP) increases. They also tend to demand less money as the T-bill rate increases – this is because as the T-bill rate goes up, the opportunity cost of holding money also goes up.
Economics 421/521
Winter 2012
Solution to homework #2
1. Using the EAEF data set, regress LGEARN on S, EXP, and ASVABC. Use F-tests to determine whether the coefficients on S and EXP are (a) jointly significant, and (b) equal. [Parts (a) and (b) are two separate tests.]
First, read in the data:
Then, take the log of earnings:
Next, estimate the unrestricted (UR) model. This is needed for both parts (a) and (b). To estimate the UR model, regress lnearnings on a constant, s, exper, and asvabc:
The results are:
(a) The restricted model for the null hypothesis that the coefficients on s and exper are zero is:
F = [(149.31 - 130.35)/2] / [130.35/(540 - 4)] = 38.98
The critical value for this test is F(2,536) = 3.05 (approximately), so reject the null hypothesis that both coefficients are zero.
(b) For this part, the null hypothesis is that the coefficients on s and exper are equal. To impose this on the model, start with:
lnearnings = b_{1} + b_{2}*s + b_{3}*exper + b_{4}*asvabc + u
Then impose that b_{2} = b_{3}:
lnearnings = b_{1} + b_{2}*s + b_{2}*exper + b_{4}*asvabc + u
Group terms:
lnearnings = b_{1} + b_{2}*(s + exper) + b_{4}*asvabc + u
This is the model we need to estimate. Thus, the first step is to obtain data on the sum of s and exper:
Run the restricted regression:
Here are the results:
Finally, use these to calculate the F-statistic:
F = [(138.29 - 130.35)/1] / [130.35/(540 - 4)] = 32.65
The critical value for this test is F(1,536) = 3.90 (approximately), so reject the null hypothesis that the coefficients are equal.
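Both F-statistics in this problem use the same formula, F = [(RSS_R - RSS_UR)/q] / [RSS_UR/(n - k)], where q is the number of restrictions. The sketch below recomputes them from the residual sums of squares reported above; the function name `f_statistic` and its argument names are illustrative, not part of EViews.

```python
def f_statistic(rss_restricted, rss_unrestricted, num_restrictions, n_obs, n_params):
    """F-test for linear restrictions based on restricted and unrestricted RSS."""
    numerator = (rss_restricted - rss_unrestricted) / num_restrictions
    denominator = rss_unrestricted / (n_obs - n_params)
    return numerator / denominator

# Part (a): coefficients on s and exper are both zero (2 restrictions).
f_joint = f_statistic(149.31, 130.35, num_restrictions=2, n_obs=540, n_params=4)

# Part (b): coefficients on s and exper are equal (1 restriction).
f_equal = f_statistic(138.29, 130.35, num_restrictions=1, n_obs=540, n_params=4)

print(f"(a) F = {f_joint:.2f}")
print(f"(b) F = {f_equal:.2f}")
```

Each value is then compared with the critical value for the matching degrees of freedom, (2, 536) in part (a) and (1, 536) in part (b).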
2. Problem 7.1 in the text.
In this case the F statistic is simply the ratio of the residual sums of squares. More generally, each rss is divided by n - k, i.e. F = [rss_{2}/(n_{2} - k)] / [rss_{1}/(n_{1} - k)], where n_{1} and n_{2} are the number of observations in each sample, but when n_{1} = n_{2} these terms cancel:
F = 28,101/321 = 87.54
The 5% critical value for an F(8,8) is 3.44, so the null of homoskedasticity is rejected.
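The general form of this ratio can be sketched as follows. The subsample size of 10 and k = 2 are assumptions chosen only so that each subsample has the 8 degrees of freedom implied by F(8,8); the two residual sums of squares are the values used above.

```python
def goldfeld_quandt_f(rss_high, rss_low, n_high, n_low, k):
    """Ratio of the two subsample residual variances, larger variance on top."""
    return (rss_high / (n_high - k)) / (rss_low / (n_low - k))

F_CRITICAL_5PCT = 3.44  # F(8,8) critical value quoted in the solution

# n = 10 and k = 2 are assumed here; with equal subsample sizes the
# degrees of freedom cancel and the statistic reduces to rss_high / rss_low.
f_gq = goldfeld_quandt_f(28101.0, 321.0, n_high=10, n_low=10, k=2)

print(f"F = {f_gq:.2f}, reject homoskedasticity: {f_gq > F_CRITICAL_5PCT}")
```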
3. Moved to the next homework.
Posted by Mark Thoma on Monday, January 30, 2012 at 07:29 PM in Homework, Winter 2012  Permalink  Comments (0)
Economics 421/521
Winter 2012
Homework #4
Due in lab next week
1. Perform a Durbin-Watson test at the 5% level of significance for positive first-order autocorrelation using the following regression output (standard errors in parentheses):
Y_{t} = 2.0 + 3.7*X_{1t} - 4.4*X_{2t},   T = 42,   DW = 1.22
       (0.7)  (1.1)        (2.8)
2. Recall the model from homework 1:
Given data on M2, real GDP, and the T-bill rate, estimate the following regression...:
M_{t} = β_{0} + β_{1}RGDP_{t} + β_{2}Tbill_{t} + e_{t}
Don't be surprised if the fit is very good; we'll explain why that may be misleading later in the course.
Does the model suffer from serial correlation? Use a Durbin-Watson test to answer the question. Is the fit as good as the R^{2} and t-statistics indicate?
3. Regress the change in the log of real consumption (C) on the change in the log of real disposable income (DI) and test for serial correlation using a Durbin-Watson test. The data are here (the data are quarterly and span the period 1947:Q1 to 2007:Q3).
4. Explain why the Durbin-Watson statistic is always between 0 and 4. Also explain why the Durbin-Watson statistic is between 0 and 2 when there is positive serial correlation, between 2 and 4 when there is negative serial correlation, and equal to 2 when there is no serial correlation at all.
5. Continuing with the model we used in problem 2, test for the presence of fourth-order serial correlation.
6. Continuing with the model we used in problem 3, use the AR(1) procedure in EViews to correct the model for the presence of first-order serial correlation.
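For reference, the Durbin-Watson statistic used in these problems is d = Σ(e_{t} - e_{t-1})^{2} / Σ e_{t}^{2}, computed from the OLS residuals. The sketch below is one way to compute it by hand; the residual series are hypothetical and chosen only to show one positively and one negatively correlated pattern. In practice EViews reports the statistic with the regression output.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared first differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals: long runs of the same sign (positive correlation).
d_positive = durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# Hypothetical residuals: sign flips every period (negative correlation).
d_negative = durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

print(f"positively correlated residuals: d = {d_positive:.2f}")
print(f"negatively correlated residuals: d = {d_negative:.2f}")
```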
Posted by Mark Thoma on Friday, January 27, 2012 at 11:51 AM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Wednesday, January 25, 2012 at 10:13 PM in Lectures, Winter 2012  Permalink  Comments (0)
Here is an outline of the LM tests for Heteroskedasticity:
Posted by Mark Thoma on Wednesday, January 25, 2012 at 02:26 AM in Review, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Monday, January 23, 2012 at 09:14 PM in Lectures, Winter 2012  Permalink  Comments (1)
[Note: The assignment will be discussed in lab this week, and will be due in lab next week.]
Economics 421/521
Winter 2012
Homework #3
1. Problem 3 from Homework 2 (the problem that was canceled on the last set).
2. Using the first model of heteroskedasticity, i.e. that resid^{2} = α_{0} + α_{1}*years + α_{2}*years^{2}, correct the salary model in problem 3 from Homework 2 for heteroskedasticity and re-estimate.
3. Problem 7.2 in the text.
4. Test the salary model in problem 3 from Homework 2 for heteroskedasticity using White's test. Correct the standard errors using White's correction. How do the coefficients and corrected standard errors compare to those obtained in problem 3 of Homework 2?
5. What are the consequences of estimating an autoregressive model using OLS?
[Note: pdf of problems 7.1 and 7.2, problem 7.1 was on the last homework.]
Posted by Mark Thoma on Monday, January 23, 2012 at 03:15 PM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Thursday, January 19, 2012 at 10:50 AM in Lectures, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Monday, January 16, 2012 at 06:07 PM in Lectures, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Wednesday, January 11, 2012 at 07:18 PM in Lectures, Winter 2012  Permalink  Comments (0)
[Note: Homework 1 and homework 2 will both be due in lab in week 3.]
Economics 421/521
Winter 2012
Homework #1
Part I. Hypothesis Testing
1. Suppose that you estimate a model of house prices to determine the impact of having beach frontage on the value of a house. After researching the problem, you decide to use the size of the lot instead of the size of the house as your explanatory variable for a number of theoretical and data availability reasons. The results (standard errors in parentheses) are:
PRICE_{i} = 40 + 35.0 LOT_{i} – 2.0 AGE_{i} + 10.0 BED_{i} – 4.0 FIRE_{i} + 100 BEACH_{i}
           (29)  (5.0)         (1.1)        (10.0)        (3.0)          (9.0)
where n = 30, R^{2} = .63, and PRICE_{i} = the price of the i^{th} house (in thousands of dollars), LOT_{i} = the size of the lot of the i^{th} house (in thousands of square feet), AGE_{i} = the age of the i^{th} house in years, BED_{i} = the number of bedrooms in the i^{th} house, FIRE_{i} = a dummy variable for a fireplace (1 = yes for the i^{th} house), and BEACH_{i} = a dummy for having beach frontage (1 = yes for the i^{th} house).
a) You expect the variables LOT, BED, and BEACH to have positive coefficients. Test each of these hypotheses at the 5 percent level.
b) You expect AGE to have a negative coefficient. Test this hypothesis at the 10 percent level.
c) At first you expect FIRE to have a positive coefficient, but one of your friends says that fireplaces are messy and are a pain to keep clean, so you are not sure. Run a two-sided t-test around zero at the 5 percent level.
2. Consider the following regression:
log(Qc_{i}) = 921.6 – 1.3 log(Pc_{i}) + 0.7 log(Pa_{i}) + 11.4 log(Inc_{i})
             (121)   (0.3)             (0.05)            (2.8)
where n = 30, R^{2} = 0.82, and where Qc_{i} = the total sales of CAMRY in the i^{th} city in 2003, Pc_{i} = the price of a CAMRY in the i^{th} city in 2003 (in thousands), Pa_{i} = the price of an ACCORD in the i^{th} city in 2003 (in thousands), and Inc_{i} = the average income in the i^{th} city in 2003 (in thousands). The numbers in parentheses are standard errors.
a) How is the constant term interpreted?
b) How would you interpret the coefficient on log(Pc_{i})? Be explicit and explain, in terms of economic theory, the importance of its magnitude.
c) Get t-values for the coefficients in the regression. Are all of our coefficients statistically significant at the 5% level of significance? How about at the 1% level of significance?
d) Interpret R^{2}. Can we have a negative R^{2}?
Part II. Short Answer
1. State the Gauss-Markov Theorem and explain the term BLUE.
Part III. Estimation
1. Given data on M2, real GDP, and the T-bill rate, estimate the following regression and test whether the coefficients differ from zero. Do the coefficients have the expected signs?
M_{t} = β_{0} + β_{1}RGDP_{t} + β_{2}Tbill_{t} + e_{t}
Don't be surprised if the fit is very good. We'll explain why the good fit is misleading in this model later in the course.
Economics 421/521
Winter 2012
Homework #2
1. Using the EAEF data set, regress LGEARN on S, EXP, and ASVABC. Use F-tests to determine whether the coefficients on S and EXP are (a) jointly significant, and (b) equal. [Parts (a) and (b) are two separate tests.]
Note: Here are the variable definitions (see pp. 443-444 in Appendix B of the text):
2. Problem 7.1 in the text.
3. Using this data set, repeat the example from class for the first of the three cases we discussed, i.e. first regress the log of salary on a constant and the two variables proxying for experience, years and years^{2}:
log(salary) = β_{0} + β_{1}*years + β_{2}*years^{2} + u_{t}
Then, form the estimated residual squared (resid^{2}) and perform the LM test for heteroskedasticity (note: resid is the estimated value of u_{t}).
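The final step of the LM test can be sketched as follows, assuming the common n·R^{2} form of the statistic: regress resid^{2} on a constant, years, and years^{2}, take the R^{2} from that auxiliary regression, and compare n·R^{2} with a chi-square critical value with 2 degrees of freedom (one per auxiliary regressor). The sample size and auxiliary R^{2} below are hypothetical placeholders, not results from the salary data.

```python
CHI2_CRITICAL_5PCT_2DF = 5.991  # chi-square, 2 degrees of freedom, 5% level

def lm_statistic(n_obs, aux_r_squared):
    """LM statistic in n * R^2 form, using the R^2 from the auxiliary regression."""
    return n_obs * aux_r_squared

# Hypothetical inputs for illustration; replace with the values from
# the auxiliary regression of resid^2 on years and years^2.
n_obs = 100
aux_r_squared = 0.08

lm = lm_statistic(n_obs, aux_r_squared)
reject_homoskedasticity = lm > CHI2_CRITICAL_5PCT_2DF

print(f"LM = {lm:.2f}, reject homoskedasticity at 5%: {reject_homoskedasticity}")
```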
Posted by Mark Thoma on Wednesday, January 11, 2012 at 09:15 AM in Homework, Winter 2012  Permalink  Comments (0)
Today:
Next Time:
Posted by Mark Thoma on Monday, January 09, 2012 at 06:44 PM in Lectures, Winter 2012  Permalink  Comments (0)
Introduction to Econometrics
Course: Economics 421/521
Professor: Mark Thoma
Office/Hours: PLC 471 on T/Th 3:30-4:30 p.m.
Phone/Email: (541) 346-4673, [email protected]
Web Page: http://economistsview.typepad.com/economics421/
Course Description: This course is a continuation of the econometrics sequence. The first course, EC 420/520, introduces the linear regression model and discusses estimation and testing under (mostly) ideal conditions. This course looks at what happens when the conditions are less than ideal due to departures from the assumptions necessary for ordinary least squares to be the best linear unbiased estimator, and then provides alternative regression techniques that address problems arising from the violations of the basic assumptions.
Text: Dougherty, Christopher, Introduction to Econometrics (Oxford: Oxford University Press)
Prerequisites: Economics 320 or the equivalent.
GTFs, Office Hours, Location, and Email Address:
Gulcan Cil | M 1-3 | PLC 431 | [email protected]
Colin Corbett | T/Th 11-11:50 | PLC 504 | [email protected]
Lab Times:
CRN | Time | Day | Room | GTF
22318 | 1600-1720 | M | 442 MCK | Gulcan Cil
22319 | 1730-1850 | M | 442 MCK | Gulcan Cil
22320 | 1600-1720 | W | 442 MCK | Gulcan Cil
22321 | 1730-1850 | W | 442 MCK | Gulcan Cil
Tests and Grading: There will be a midterm exam and a final. The midterm will be given Tuesday, February 14th. The final will be given on Monday, March 19th at 8:00 a.m. No makeup exams will be given. The midterm is worth 30% and the final is worth 40%. Grades will be assigned according to your relative standing in the class.
Empirical Project: There will be an empirical paper that will comprise 15% of your grade. The paper is due no later than Thursday, March 15 at the beginning of class. Details will be given during lecture.
Computer Labs: The statistical software package EViews will be used for estimation and testing. Labs will consist of instruction and examples helpful in completing the homework assignments, and other activities. The homework is worth 15% of your grade.
*Tentative* Course Outline:
We will cover the following chapters:
Review of Multiple Regression and Hypothesis Testing
Heteroscedasticity | Ch. 7
Autocorrelation | Ch. 12
Stochastic Regressors and Measurement Errors | Ch. 8
Simultaneous Equations Estimation | Ch. 9
And, as time permits:
Binary Choice Models and Maximum Likelihood Estimation | Ch. 10
Models Using Time Series Data | Ch. 11
More details on the readings, homework, homework due dates, etc. will be posted here on an ongoing basis, so please check back regularly.
Posted by Mark Thoma on Monday, January 09, 2012 at 11:19 AM in Syllabus, Winter 2012  Permalink  Comments (0)
Course materials for the last time the course was offered (Winter 2010) are available here.
Posted by Mark Thoma on Monday, January 09, 2012 at 09:40 AM in Winter 2012  Permalink  Comments (0)