Chapter 4  Multiple Regression: Estimation and Hypothesis Testing

Chapter 4  Multiple Regression Analysis: Estimation and Inference

y = b0 + b1x1 + b2x2 + ... + bkxk + u

Parallels with Simple Regression
Yi = b0 + b1Xi1 + b2Xi2 + ... + bkXik + ui
- b0 is still the intercept.
- b1 through bk are all called slope parameters; they are also called partial regression coefficients. A coefficient bj measures the change in Y associated with a change in Xj, holding all the other independent variables fixed.
- u is still the error term (or disturbance).
- We still minimize the sum of squared residuals, so there are k+1 first-order conditions.

Obtaining OLS Estimates
The estimated equation obtained this way is called the OLS regression line or the sample regression function (SRF). The SRF is only an estimate, not the true relationship; the true relationship is the population regression function, which we do not observe and can only estimate. Using a different sample, we would get a different estimated equation.
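
To make the estimation step concrete, here is a minimal sketch in Python of fitting a multiple regression by least squares; the data are simulated and the library calls are my own choice, not part of the original slides.

```python
import numpy as np

# Simulated sample (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + u   # true population model

# OLS: minimize the sum of squared residuals over (b0, b1, b2)
X = np.column_stack([np.ones(n), x1, x2])          # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves the k+1 first-order conditions
print("estimated coefficients:", beta_hat)         # sample regression function (SRF)
```

A different seed (a different sample) would give different estimates, while the population coefficients stay fixed at (1.0, 2.0, -0.5).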

Interpreting Multiple Regression

An Example (Wooldridge, p. 76): the determination of wage (dollars per hour)
- educ: years of education
- exper: years of labor market experience
- tenure: years with the current employer
The relationship between wage and educ, exper, and tenure:
wage = b0 + b1 educ + b2 exper + b3 tenure + u
The estimated equation is:
wage = -2.873 + 0.599 educ + 0.022 exper + 0.169 tenure

A "Partialling Out" Interpretation

"Partialling Out" continued
The previous equation implies that regressing Y on X1 and X2 gives the same effect of X1 as regressing Y on the residuals from a regression of X1 on X2. In other words, only the part of Xi1 that is uncorrelated with Xi2 is being related to Yi, so we are estimating the effect of X1 on Y after X2 has been "partialled out".

The wage determination example again
The estimated equation was:
wage = -2.873 + 0.599 educ + 0.022 exper + 0.169 tenure
Now, first regress educ on exper and tenure to partial out the effects of exper and tenure (the fitted coefficient on tenure is 0.048), and denote the residuals by resid. Then regress wage on those residuals:
wage = 5.896 + 0.599 resid
Do we get the same result? Yes: the coefficient on resid is the same as the coefficient on educ in the first estimated equation, and the same is true in the second equation. A numerical sketch of this result follows.
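
Below is a small sketch of the partialling-out (Frisch-Waugh) result on simulated data; the variable names x1 and x2 are placeholders, not the wage-equation variables.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)                 # x1 and x2 are correlated
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full regression of y on x1 and x2
b_full = ols(np.column_stack([x1, x2]), y)

# Partialling out: residuals of x1 after regressing it on x2
g = ols(x2.reshape(-1, 1), x1)
resid = x1 - (g[0] + g[1] * x2)

# Regressing y on those residuals reproduces the coefficient on x1
b_partial = ols(resid.reshape(-1, 1), y)
print(b_full[1], b_partial[1])   # the two slope estimates coincide
```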

Goodness-of-Fit: R²
How do we think about how well our sample regression line fits our sample data? We can compute the fraction of the total sum of squares (TSS) that is explained by the model, and call this the R-squared of the regression:
R² = ESS/TSS = 1 - RSS/TSS

More about R-squared
- R² can never decrease when another independent variable is added to a regression, and it will usually increase.
- Because R² usually increases with the number of independent variables, it is not a good way to compare models.

An Example
Using the wage determination model, one can check that adding a new independent variable increases the value of R².

Adjusted R-Squared
- R² is simply an estimate of how much of the variation in Y is explained by X1, X2, ..., Xk.
- Recall that R² always (weakly) increases as more variables are added to the model.
- The adjusted R² takes the number of variables in a model into account, and it may decrease.

Adjusted R-Squared (cont.)
- Most packages report both R² and adj-R².
- You can compare the fit of two models (with the same Y) by comparing the adj-R²:
  wage = -3.391 + 0.644 educ + 0.070 exper,   adj-R² = 0.2222
  wage = -2.222 + 0.569 educ + 0.190 tenure,  adj-R² = 0.2992
- You cannot use the adj-R² to compare models with different dependent variables (e.g. Y vs. log(Y)):
  wage = -3.391 + 0.644 educ + 0.070 exper,        adj-R² = 0.2222
  log(wage) = 0.404 + 0.087 educ + 0.026 exper,    adj-R² = 0.3059
  Because the variances of the two dependent variables differ, the comparison between them makes no sense.
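
A short sketch, on made-up data, of how R² and adj-R² respond when an irrelevant regressor is added; the formulas used are the standard ones (R² = 1 - RSS/TSS, adj-R² = 1 - [RSS/(n-k-1)]/[TSS/(n-1)]).

```python
import numpy as np

def fit_and_r2(y, *regressors):
    """OLS fit; return (R-squared, adjusted R-squared)."""
    n = len(y)
    X = np.column_stack([np.ones(n), *regressors])
    k = X.shape[1] - 1                         # number of slope coefficients
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
    return r2, adj_r2

rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 1 + 0.5 * x1 + rng.normal(size=n)          # x2 is actually irrelevant

print(fit_and_r2(y, x1))        # smaller model
print(fit_and_r2(y, x1, x2))    # adding x2: R² rises, adj-R² may fall
```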

Assumptions for Unbiasedness
- The population model is linear in parameters: Y = b0 + b1X1 + b2X2 + ... + bkXk + u.
- We use a sample of size n, {(Xi1, Xi2, ..., Xik, Yi): i = 1, 2, ..., n}, drawn from the population model, so that the sample model is Yi = b0 + b1Xi1 + b2Xi2 + ... + bkXik + ui.
- Zero conditional mean: E(u | X1, X2, ..., Xk) = 0, implying that all of the explanatory variables are exogenous. Writing X = (X1, X2, ..., Xk), this is E(u | X) = 0, which reduces to E(u) = 0 if the independent variables are not random variables; it implies Cov(u, Xi) = 0 and E(u·Xi) = 0 for i = 1, 2, ..., n.
- No perfect collinearity: none of the Xs is constant, and there are no exact linear relationships among them. This is the new additional assumption relative to simple regression.

About multicollinearity
The no-perfect-collinearity assumption does allow the independent variables to be correlated; they just cannot be perfectly linearly correlated. For example, the following models are fine:
- Student performance: colGPA = b0 + b1 hsGPA + b2 ACT + b3 skipped + u
- Consumption function: consum = b0 + b1 inc + b2 inc² + u
But the following is invalid:
log(consum) = b0 + b1 log(inc) + b2 log(inc²) + u
because log(inc²) = 2·log(inc), so the two regressors are perfectly collinear and we cannot estimate the coefficients b1 and b2 separately.
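
A quick numerical sketch (with hypothetical income data) of why the last specification fails: the design matrix loses full column rank, so OLS cannot separate b1 from b2.

```python
import numpy as np

rng = np.random.default_rng(3)
inc = rng.uniform(1.0, 10.0, size=100)
log_inc = np.log(inc)
log_inc_sq = np.log(inc ** 2)               # equals 2 * log(inc) exactly

X = np.column_stack([np.ones(100), log_inc, log_inc_sq])
print(np.linalg.matrix_rank(X))             # rank 2, not 3: perfect collinearity

# By contrast, inc and inc**2 are correlated but not perfectly collinear
X_ok = np.column_stack([np.ones(100), inc, inc ** 2])
print(np.linalg.matrix_rank(X_ok))          # rank 3: coefficients are identified
```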

Unbiasedness of OLS Estimation
Under the assumptions above, the OLS estimators are unbiased for the population parameters.

Too Many or Too Few Variables
- What happens if we include variables in our specification that do not belong? Including them does not bias the parameter estimates; OLS remains unbiased.
- What happens if we exclude a variable from our specification that does belong? OLS will usually be biased.

Omitted Variable Bias
Suppose the true model contains X1 and X2, but we omit X2 and regress Y on X1 alone, obtaining the estimator b1-tilde. There are two cases where this estimated parameter is still unbiased:
- If b2 = 0, so that X2 does not appear in the true model.
- If δ1-tilde = 0, where δ1-tilde is the slope from regressing X2 on X1 (i.e., X1 and X2 are uncorrelated in the sample); then b1-tilde is unbiased for b1.

Summary of Direction of Bias
             Corr(X1, X2) > 0    Corr(X1, X2) < 0
  b2 > 0     positive bias       negative bias
  b2 < 0     negative bias       positive bias

One-Sided Alternatives
For testing H0: bj = 0 against H1: bj > 0, we reject H0 when the t statistic exceeds the critical value c that leaves probability a in the upper tail of the t distribution; otherwise we fail to reject.
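
A small simulation sketch of omitted variable bias; the coefficients and the correlation between x1 and x2 are made up, chosen so that b2 > 0 and Corr(x1, x2) > 0, which by the table above should produce positive bias.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 2000
b1, b2 = 1.0, 2.0                     # true parameters, b2 > 0
short_slopes = []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)        # Corr(x1, x2) > 0
    y = b1 * x1 + b2 * x2 + rng.normal(size=n)
    # "short" regression that wrongly omits x2
    X = np.column_stack([np.ones(n), x1])
    short_slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

print(np.mean(short_slopes))   # well above the true b1 = 1.0: positive bias
```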

An Example: Hourly Wage Equation
Wage determination (Wooldridge, p. 123):
log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
            (0.104) (0.007)      (0.0017)       (0.003)
n = 526, R² = 0.316
Is the return to exper, controlling for educ and tenure, zero in the population, against the alternative that it is positive?
H0: b_exper = 0  vs.  H1: b_exper > 0
The t statistic is t = 0.0041/0.0017 ≈ 2.41.
The degrees of freedom are df = n - k - 1 = 526 - 3 - 1 = 522, and the 5% one-sided critical value is 1.645.
The t statistic exceeds the critical value (2.41 > 1.645), so we reject the null hypothesis: b_exper is indeed positive.
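
A sketch of this one-sided test in Python, plugging in the coefficient and standard error reported above; the use of scipy for the critical value and p-value is my own choice, not part of the slides.

```python
from scipy import stats

b_exper, se_exper = 0.0041, 0.0017
df = 526 - 3 - 1                      # n - k - 1 = 522

t_stat = b_exper / se_exper           # about 2.41
crit = stats.t.ppf(0.95, df)          # 5% one-sided critical value, about 1.645
p_one_sided = stats.t.sf(t_stat, df)  # upper-tail probability

print(t_stat, crit, p_one_sided)      # reject H0 at the 5% level since t_stat > crit
```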

Another Example: Student Performance and School Size
Does school size have an effect on student performance?
- math10: math test score, measuring student performance
- totcomp: average annual teacher compensation
- staff: number of staff per one thousand students
- enroll: student enrollment, measuring school size
The model equation:
math10 = b0 + b1 totcomp + b2 staff + b3 enroll + u
H0: b3 = 0, H1: b3 < 0
The t statistic (about -0.91) is greater than the 5% one-sided critical value of -1.645, so we cannot reject the null hypothesis.

One-Sided vs. Two-Sided
- Because the t distribution is symmetric, testing H1: bj < 0 is straightforward: the critical value is just the negative of the one used before. We reject the null if the t statistic < -c; if the t statistic > -c, we fail to reject the null.
- For a two-sided test, we set the critical value based on a/2 and reject H0: bj = 0 if the absolute value of the t statistic exceeds c.

Two-Sided Alternatives
In the model yi = b0 + b1Xi1 + ... + bkXik + ui we test H0: bj = 0 against H1: bj ≠ 0; the rejection region consists of both tails of the t distribution, each with area a/2, beyond -c and c.

Summary for H0: bj = 0
- Unless otherwise stated, the alternative is assumed to be two-sided.
- If we reject the null, we typically say "Xj is statistically significant at the 100a% level".
- If we fail to reject the null, we typically say "Xj is statistically insignificant at the 100a% level".

An Example: Determinants of College GPA (Wooldridge, p. 128)
Variables:
- colGPA: college GPA
- skipped: average number of lectures missed per week
- ACT: achievement test score
- hsGPA: high school GPA
The estimated model:
colGPA = 1.39 + 0.412 hsGPA + 0.015 ACT - 0.083 skipped
         (0.33) (0.094)       (0.011)    (0.026)
n = 141, R² = 0.234
H0: b_skipped = 0, H1: b_skipped ≠ 0
df = n - k - 1 = 137, so the two-sided 5% critical value is about t_137 = 1.96.
The t statistic is |-0.083/0.026| = 3.19 > 1.96, so we reject the null hypothesis: b_skipped is significantly different from zero.

Testing Other Hypotheses
A more general form of the t statistic recognizes that we may want to test something like H0: bj = aj. In this case, the appropriate t statistic is
t = (bj-hat - aj) / se(bj-hat).

An Example: Campus Crime and Enrollment (Wooldridge, p. 129)
Variables:
- crime: the annual number of crimes on a college campus
- enroll: student enrollment, measuring the size of the college
The regression model:
log(crime) = b0 + b1 log(enroll) + u
We want to test whether b1 = 1, that is, H0: b1 = 1 against H1: b1 > 1.
The estimated equation:
log(crime) = -6.63 + 1.27 log(enroll)
             (1.03)  (0.11)
n = 97, R² = 0.585
df = n - k - 1 = 95, and the 5% one-sided critical value is about t_95 = 1.645.
The t statistic is (1.27 - 1)/0.11 ≈ 2.45 > 1.645, so we reject the null hypothesis; the evidence suggests that b1 > 1.

Confidence Intervals
Another way to use classical statistical testing is to construct a confidence interval, using the same critical value as for a two-sided test. A 100(1 - a)% confidence interval is
bj-hat ± c · se(bj-hat),
where c is the (1 - a/2) percentile of the t distribution with n - k - 1 degrees of freedom.

Computing p-values for t Tests
An alternative to the classical approach is to ask: "what is the smallest significance level at which the null would be rejected?" So we compute the t statistic and then look up what percentile it corresponds to in the appropriate t distribution; this is the p-value. The p-value is the probability of observing a t statistic as extreme as the one we did, if the null were true.
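
A sketch of the two-sided p-value and 95% confidence interval for a single coefficient, reusing the skipped coefficient from the college GPA example; scipy is assumed to be available.

```python
from scipy import stats

b_hat, se, df = -0.083, 0.026, 137     # coefficient on skipped, its SE, n - k - 1

t_stat = b_hat / se
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)   # smallest level at which H0 is rejected

c = stats.t.ppf(0.975, df)                      # critical value for a 95% CI
ci = (b_hat - c * se, b_hat + c * se)
print(t_stat, p_two_sided, ci)                  # the CI excludes 0, matching the rejection of H0
```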

Stata and p-values, t Tests, etc.
- Most computer packages will compute the p-value for you, assuming a two-sided test.
- If you really want a one-sided alternative, just divide the two-sided p-value by 2.
- Stata provides the t statistic, p-value, and 95% confidence interval for H0: bj = 0 in the columns labeled "t", "P>|t|" and "95% Conf. Interval", respectively.

Testing a Linear Combination
Suppose that instead of testing whether b1 equals a constant, you want to test whether it equals another parameter, that is, H0: b1 = b2, or b1 - b2 = 0. We use the same basic procedure for forming a t statistic.

Testing a Linear Combination (cont.)
- To use the formula for the standard error of b1-hat - b2-hat, we need s12, the covariance of the two estimates, which standard output does not report.
- Many packages have an option to obtain it, or will simply perform the test for you. In Stata, after "reg Y X1 X2 ... Xk" you would type "test X1 = X2" to get a p-value for the test.
- More generally, you can always restate the problem to get the test you want.

Example
Suppose you are interested in the effect of campaign expenditures on election outcomes. The model is
voteA = b0 + b1 log(expendA) + b2 log(expendB) + b3 prtystrA + u
and the null is H0: b1 = -b2, or H0: q1 = b1 + b2 = 0.
Since b1 = q1 - b2, substitute in and rearrange:
voteA = b0 + q1 log(expendA) + b2 [log(expendB) - log(expendA)] + b3 prtystrA + u
This is the same model as originally, but now you get a standard error for b1 + b2 = q1 directly from the basic regression. Any linear combination of parameters can be tested in a similar manner.
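
A sketch of the reparameterization trick on simulated data (the variable names follow the voting model, but the numbers are invented): regressing voteA on log(expendA), the difference log(expendB) - log(expendA), and prtystrA makes the reported coefficient and standard error on log(expendA) refer to q1 = b1 + b2. The statsmodels package is assumed.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
log_expendA = rng.normal(size=n)
log_expendB = rng.normal(size=n)
prtystrA = rng.normal(size=n)
voteA = 50 + 6 * log_expendA - 6 * log_expendB + 2 * prtystrA + rng.normal(size=n)

# Reparameterized regression: the coefficient on log_expendA is q1 = b1 + b2
X = sm.add_constant(np.column_stack([log_expendA,
                                     log_expendB - log_expendA,
                                     prtystrA]))
res = sm.OLS(voteA, X).fit()
q1_hat, q1_se = res.params[1], res.bse[1]
print(q1_hat, q1_se)      # estimate and standard error of b1 + b2
```

With the made-up coefficients above, b1 + b2 = 0, so q1_hat should be close to zero and its t statistic small.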

Other examples of hypotheses about a single linear combination of parameters: b1 = 1 + b2; b1 = 5·b2; b1 = -(1/2)·b2; etc.

Multiple Linear Restrictions
Everything we have done so far has involved testing a single linear restriction (e.g. b1 = 0 or b1 = b2). However, we may want to jointly test multiple hypotheses about our parameters. A typical example is testing "exclusion restrictions": we want to know whether a group of parameters are all equal to zero.

Testing Exclusion Restrictions
Now the null hypothesis might be something like H0: b_{k-q+1} = 0, ..., bk = 0, and the alternative is just H1: H0 is not true. We cannot just check each t statistic separately, because we want to know whether the q parameters are jointly significant at a given level; it is possible for none of them to be individually significant at that level.

Exclusion Restrictions (cont.)
To do the test, we need to estimate the "restricted model" without X_{k-q+1}, ..., Xk included, as well as the "unrestricted model" with all of the Xs included. Intuitively, we want to know whether the change in RSS is big enough to warrant the inclusion of X_{k-q+1}, ..., Xk.

The F Statistic
F = [(RSS_r - RSS_ur)/q] / [RSS_ur/(n - k - 1)]
- The F statistic is always positive, since the RSS from the restricted model cannot be less than the RSS from the unrestricted model. Essentially, the F statistic measures the relative increase in RSS when moving from the unrestricted to the restricted model.
- q = number of restrictions = df_r - df_ur, and n - k - 1 = df_ur.

The F Statistic (cont.)
To decide whether the increase in RSS when we move to the restricted model is "big enough" to reject the exclusions, we need to know the sampling distribution of our F statistic. Not surprisingly, F ~ F(q, n-k-1), where q is called the numerator degrees of freedom and n - k - 1 the denominator degrees of freedom. We reject H0 at the chosen significance level if F > c, the corresponding critical value of the F(q, n-k-1) distribution.

Example: the Determination of Major League Baseball Players' Salaries (Wooldridge, p. 143)
The regression model:
log(salary) = b0 + b1 years + b2 gamesyr + b3 bavg + b4 hrunsyr + b5 rbisyr + u
- salary: the 1993 total salary
- years: years in the league
- gamesyr: average games played per year
- bavg: career batting average
- hrunsyr: home runs per year
- rbisyr: runs batted in per year
The null hypothesis is H0: b3 = 0, b4 = 0, b5 = 0, which is called a multiple (or joint) hypothesis test. The alternative hypothesis is H1: H0 is not true.
The unrestricted model:
log(salary) = 11.19 + 0.0689 years + 0.0126 gamesyr + 0.00098 bavg + 0.0144 hrunsyr + 0.0108 rbisyr
              (0.29)  (0.0689)       (0.0026)         (0.00110)      (0.0161)         (0.0072)
n = 353, RSS = 183.186, R² = 0.6278
The restricted model:
log(salary) = 11.22 + 0.0713 years + 0.0202 gamesyr
              (0.11)  (0.0125)       (0.0013)
n = 353, RSS = 198.311, R² = 0.5971

Example: Baseball Players' Salaries (cont.)
The number of restrictions (the numerator degrees of freedom) is q = 3; the degrees of freedom of the unrestricted model are 353 - 5 - 1 = 347. The F statistic is therefore
F = [(198.311 - 183.186)/3] / [183.186/347] ≈ 9.55.
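
A sketch of this F computation and the comparison with the F(q, n-k-1) distribution, plugging in the RSS values reported above; scipy is assumed.

```python
from scipy import stats

rss_ur, rss_r = 183.186, 198.311      # unrestricted and restricted RSS
q, df_ur = 3, 353 - 5 - 1             # number of restrictions, n - k - 1

F = ((rss_r - rss_ur) / q) / (rss_ur / df_ur)      # about 9.55
crit_5pct = stats.f.ppf(0.95, q, df_ur)            # roughly 2.6
p_value = stats.f.sf(F, q, df_ur)
print(F, crit_5pct, p_value)          # F far exceeds the critical value: reject H0
```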

The R² Form of the F Statistic
Because the RSS values may be large and unwieldy, an alternative form of the formula is useful. Using the fact that RSS = TSS(1 - R²) for any regression, we can substitute for RSS_r and RSS_ur to obtain
F = [(R²_ur - R²_r)/q] / [(1 - R²_ur)/(n - k - 1)].

Example: Parents' Education in a Birth Weight Equation (Wooldridge, p. 150)
Variables:
- bwght: birth weight in pounds
- cigs: average number of cigarettes the mother smoked per day during pregnancy
- parity: the birth order of this child
- faminc: annual family income
- motheduc: years of schooling of the mother
- fatheduc: years of schooling of the father
Model: bwght = b0 + b1 cigs + b2 parity + b3 faminc + b4 motheduc + b5 fatheduc + u
Does the parents' education have any effect on birth weight? This is stated as H0: b4 = 0, b5 = 0, so q = 2.
The unrestricted model (the intercept and the cigs coefficient are not shown; standard errors in parentheses):
bwght = ... cigs + 1.788 parity + 0.056 faminc - 0.370 motheduc + 0.472 fatheduc
        (3.728) (0.110) (0.659)   (0.037)        (0.320)          (0.283)
n = 1191, R² = 0.0387
The restricted model:
bwght = ... cigs + 1.832 parity + 0.067 faminc
        (1.656) (0.109) (0.658)   (0.032)
n = 1191, R² = 0.0364
The F statistic is F = [(0.0387 - 0.0364)/2] / [(1 - 0.0387)/(1191 - 5 - 1)] ≈ 1.42, which is below the 5% critical value of the F(2, 1185) distribution, so we fail to reject H0. In other words, motheduc and fatheduc are jointly insignificant in the birth weight equation.

Overall Significance
A special case of exclusion restrictions is testing H0: b1 = b2 = ... = bk = 0. Since the R² from a model with only an intercept is zero, the F statistic is simply
F = [R²/k] / [(1 - R²)/(n - k - 1)].
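
The same birth weight test in its R² form, plugging in the R² values reported above (a sketch; scipy assumed).

```python
from scipy import stats

r2_ur, r2_r = 0.0387, 0.0364          # unrestricted and restricted R-squared
q, df_ur = 2, 1191 - 5 - 1            # restrictions and n - k - 1

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)   # about 1.42
p_value = stats.f.sf(F, q, df_ur)
print(F, p_value)   # F is small and the p-value large, so H0 is not rejected
```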

General Linear Restrictions
The basic form of the F statistic works for any set of linear restrictions. First estimate the unrestricted model, then estimate the restricted model, and in each case make note of the RSS. Imposing the restrictions can be tricky: you will likely have to redefine variables again.

Example
Use the same voting model as before:
voteA = b0 + b1 log(expendA) + b2 log(expendB) + b3 prtystrA + u
Now the null is H0: b1 = 1, b3 = 0. Substituting in the restrictions gives voteA = b0 + log(expendA) + b2 log(expendB) + u, so use
voteA - log(expendA) = b0 + b2 log(expendB) + u
as the restricted model.

F Statistic Summary
- Just as with t statistics, p-values can be calculated by looking up the percentile in the appropriate F distribution.
- Stata will do this by entering: display fprob(q, n - k - 1, F), where the appropriate values of F, q, and n - k - 1 are used.
- If only one exclusion restriction is being tested, then F = t², and the p-values will be the same.

Summary of the Chapter
- The meaning of the partial regression coefficient bi
- The six CLM assumptions
- The variance of the OLS estimators
- R² and adjusted R²
- The F test
