Statistics for Managers Using Microsoft Excel, 5th Edition
Chapter 14: Introduction to Multiple Regression

Learning Objectives

In this chapter, you learn:
- How to develop a multiple regression model
- How to interpret the regression coefficients
- How to determine which independent variables to include in the regression model
- How to determine which independent variables are most important in predicting a dependent variable
- How to use categorical variables in a regression model

The Multiple Regression Model

Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).

Multiple regression model with k independent variables:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

where β0 is the Y-intercept, β1, ..., βk are the population slopes, and εi is the random error.

Multiple Regression Equation

The coefficients of the multiple regression model are estimated using sample data. Multiple regression equation with k independent variables:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, ..., bk are the estimated slope coefficients.

In this chapter we will always use Excel to obtain the regression slope coefficients and other regression summary measures.

Example with two independent variables:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$

[Figure: fitted regression surface over the X1 and X2 axes, with the slope for variable X1 and the slope for variable X2 marked.]

Multiple Regression Equation: 2 Variable Example

A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
- Dependent variable: pie sales (units per week)
- Independent variables: price (in $) and advertising (in $100s)

Data are collected for 15 weeks.

Sales = b0 + b1(Price) + b2(Advertising), that is, Sales = b0 + b1X1 + b2X2, where X1 = Price and X2 = Advertising.

Week   Pie Sales   Price ($)   Advertising ($100s)
  1      350         5.50        3.3
  2      460         7.50        3.3
  3      350         8.00        3.0
  4      430         8.00        4.5
  5      350         6.80        3.0
  6      380         7.50        4.0
  7      430         4.50        3.0
  8      470         6.40        3.7
  9      450         7.00        3.5
 10      490         5.00        4.0
 11      340         7.20        3.5
 12      300         7.90        3.2
 13      440         5.90        4.0
 14      450         5.00        3.5
 15      300         7.00        2.7

Multiple Regression Equation: 2 Variable Example, Excel

Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error      47.46341
Observations        15

ANOVA
             df   SS           MS           F         Significance F
Regression    2   29460.027    14730.013    6.53861   0.01201
Residual     12   27033.306     2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389         2.68285   0.01993    57.58835   555.46404
Price         -24.97509       10.83213        -2.30565   0.03979   -48.57626    -1.37392
Advertising    74.13096       25.96732         2.85478   0.01449    17.55303   130.70888

Multiple regression equation:

$$\widehat{\text{Sales}} = 306.526 - 24.975(\text{Price}) + 74.131(\text{Advertising})$$

where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.

- b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
- b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

$$\widehat{\text{Sales}} = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62$$

Predicted sales is 428.62 pies. Note that Advertising is in $100s, so $350 means that X2 = 3.5.

Coefficient of Multiple Determination

Reports the proportion of total variation in Y explained by all X variables taken together:

$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$

In the Excel output for the pie sales example, R Square = 0.52148.
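The chapter obtains the coefficients from Excel. As a cross-check outside Excel, the following minimal sketch fits the same two-variable model by ordinary least squares using only the Python standard library, then repeats the prediction and the r² calculation. The function names `fit_two_predictors` and `r_squared` are illustrative choices, not part of the original material; the data are the 15 weeks listed in the example.

```python
# Pie sales data from the example: (sales, price in $, advertising in $100s).
DATA = [
    (350, 5.50, 3.3), (460, 7.50, 3.3), (350, 8.00, 3.0), (430, 8.00, 4.5),
    (350, 6.80, 3.0), (380, 7.50, 4.0), (430, 4.50, 3.0), (470, 6.40, 3.7),
    (450, 7.00, 3.5), (490, 5.00, 4.0), (340, 7.20, 3.5), (300, 7.90, 3.2),
    (440, 5.90, 4.0), (450, 5.00, 3.5), (300, 7.00, 2.7),
]


def fit_two_predictors(data):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 from centered sums."""
    n = len(data)
    y_bar = sum(y for y, _, _ in data) / n
    x1_bar = sum(x1 for _, x1, _ in data) / n
    x2_bar = sum(x2 for _, _, x2 in data) / n
    # Centered sums of squares and cross products.
    s11 = sum((x1 - x1_bar) ** 2 for _, x1, _ in data)
    s22 = sum((x2 - x2_bar) ** 2 for _, _, x2 in data)
    s12 = sum((x1 - x1_bar) * (x2 - x2_bar) for _, x1, x2 in data)
    s1y = sum((x1 - x1_bar) * (y - y_bar) for y, x1, _ in data)
    s2y = sum((x2 - x2_bar) * (y - y_bar) for y, _, x2 in data)
    # Solve the 2x2 normal equations for the slopes, then back out b0.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2


def r_squared(data, coefs):
    """Coefficient of multiple determination, computed as 1 - SSE/SST."""
    b0, b1, b2 = coefs
    y_bar = sum(y for y, _, _ in data) / len(data)
    sst = sum((y - y_bar) ** 2 for y, _, _ in data)
    sse = sum((y - (b0 + b1 * x1 + b2 * x2)) ** 2 for y, x1, x2 in data)
    return 1 - sse / sst


b0, b1, b2 = fit_two_predictors(DATA)
predicted = b0 + b1 * 5.50 + b2 * 3.5  # $350 of advertising means X2 = 3.5
r2 = r_squared(DATA, (b0, b1, b2))
print(f"b0={b0:.3f}  b1={b1:.3f}  b2={b2:.3f}")
print(f"predicted sales={predicted:.2f}  r2={r2:.5f}")
```

The sketch is specific to two independent variables; for a general k-variable model a matrix-based solver would be the more natural choice.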
From the ANOVA table of the Excel output, SSR = 29460.027 and SST = 56493.333, so

$$r^2 = \frac{SSR}{SST} = \frac{29460.027}{56493.333} = 0.52148$$

52.1% of the variation in pie sales is explained by the variation in price and advertising.

Adjusted r²

- r² never decreases when a new X variable is added to the model. This can be a disadvantage when comparing models.
- What is the net effect of adding a new variable? We lose a degree of freedom when a new X variable is added. Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted r² shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

$$r^2_{adj} = 1 - \left[(1 - r^2)\left(\frac{n-1}{n-k-1}\right)\right]$$

(where n = sample size, k = number of independent variables)

- Penalizes excessive use of unimportant independent variables
- Smaller than r²
- Useful in comparing models
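The adjustment can be sketched in a few lines. The function name `adjusted_r_squared` is an illustrative choice; the inputs for the first call are the pie sales values (r² = 0.52148, n = 15, k = 2). The second call is a hypothetical scenario, not from the chapter, in which a third variable is added that leaves r² unchanged, to show the penalty.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted r^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Pie sales example: two independent variables, 15 weeks.
adj_two = adjusted_r_squared(0.52148, n=15, k=2)

# Hypothetical: a third variable that adds no explanatory power (same r^2).
adj_three = adjusted_r_squared(0.52148, n=15, k=3)

print(f"k=2: {adj_two:.4f}   k=3 with same r2: {adj_three:.4f}")
```

With r² held fixed, the adjusted value drops when k increases, which is the sense in which the measure penalizes unimportant variables.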
In the Excel output for the pie sales example, Adjusted R Square = 0.44172.

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.

F-Test for Overall Significance

- F test for overall significance of the model
- Shows if there is a linear relationship between all of the X variables considered together and Y
- Uses the F test statistic

Hypotheses:
- H0: β1 = β2 = ... = βk = 0 (no linear relationship)
- H1: at least one βi ≠ 0 (at least one independent variable affects Y)

Test statistic:

$$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$$

where F has (numerator) = k and (denominator) = (n - k - 1) degrees of freedom.

In the Excel output for the pie sales example, F = MSR/MSE = 14730.013 / 2252.776 = 6.5386, and Significance F = 0.01201 is the p-value for the F test.
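The overall F statistic can be recomputed directly from the ANOVA sums of squares. This is a minimal sketch: the function name `overall_f` is illustrative, and the critical value 3.885 for 2 and 12 degrees of freedom at α = .05 is taken from the chapter rather than computed, since the Python standard library has no F distribution.

```python
def overall_f(ssr, sse, n, k):
    """F statistic for H0: all k slopes are zero."""
    msr = ssr / k                # mean square regression
    mse = sse / (n - k - 1)      # mean square error
    return msr / mse


# Sums of squares from the pie sales ANOVA table.
f_stat = overall_f(ssr=29460.027, sse=27033.306, n=15, k=2)

F_CRITICAL = 3.885  # F(.05; 2, 12) as quoted in the chapter, not computed here
reject_h0 = f_stat > F_CRITICAL

print(f"F = {f_stat:.4f}, reject H0: {reject_h0}")
```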
For the pie sales example:

- H0: β1 = β2 = 0
- H1: β1 and β2 not both zero
- α = .05, df1 = 2, df2 = 12
- Critical value: F.05 = 3.885
- Test statistic: F = MSR/MSE = 6.5386
- Decision: since the F test statistic is in the rejection region (p-value < .05), reject H0.
- Conclusion: there is evidence that at least one independent variable affects Y.

[Figure: F distribution with the rejection region of area α = .05 to the right of F.05 = 3.89; "Do not reject H0" to the left, "Reject H0" to the right.]

Residuals in Multiple Regression

Two variable model:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$

[Figure: a sample observation Yi and its fitted value Ŷi above the point (x1i, x2i) on the X1 and X2 axes; the vertical distance between them is the residual.]

- Residual = ei = (Yi - Ŷi)
- The best fitting linear regression equation, Ŷ, is found by minimizing the sum of squared errors, Σe².

Multiple Regression Assumptions

Errors (residuals) from the regression model: ei = (Yi - Ŷi)

Assumptions:
- The errors are independent
- The errors are normally distributed
- The errors have an equal variance

These residual plots are used in multiple regression:
- Residuals vs. Ŷi
- Residuals vs. X1i
- Residuals vs. X2i
- Residuals vs. time (if time series data)

Use the residual plots to check for violations of regression assumptions.

Individual Variables Tests of Hypothesis

- Use t-tests of individual variable slopes
- Shows if there is a linear relationship between the variable Xj and Y

Hypotheses:
- H0: βj = 0 (no linear relationship)
- H1: βj ≠ 0 (linear relationship does exist between Xj and Y)

Test statistic:

$$t = \frac{b_j - 0}{S_{b_j}} \qquad (df = n - k - 1)$$
In the Excel output for the pie sales example:
- t-value for Price is t = -2.306, with p-value .0398
- t-value for Advertising is t = 2.855, with p-value .0145
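Each t statistic is the coefficient divided by its standard error. The sketch below recomputes both values from the Excel output and compares them with the two-tail critical value. The critical value 2.1788 (α = .05, 12 d.f.) is the one used in this chapter and is hard-coded as an assumption, because the Python standard library does not provide a t distribution.

```python
# (coefficient, standard error) pairs from the Excel output.
COEFFICIENTS = {
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}
T_CRITICAL = 2.1788  # t(.025; 12 d.f.), quoted from the chapter

# t statistic for H0: beta_j = 0 is b_j / S_bj.
t_stats = {name: b / se for name, (b, se) in COEFFICIENTS.items()}

# Two-tail test: reject H0 when |t| exceeds the critical value.
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}, reject H0: {significant[name]}")
```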
For each variable, H0: βj = 0 and H1: βj ≠ 0, with d.f. = 15 - 2 - 1 = 12, α = .05, and tα/2 = 2.1788.

              Coefficients   Standard Error   t Stat     P-value
Price         -24.97509      10.83213         -2.30565   0.03979
Advertising    74.13096      25.96732          2.85478   0.01449

[Figure: t distribution with rejection regions of area α/2 = .025 in each tail, below -tα/2 and above tα/2; "Do not reject H0" between them.]

- Decision: reject H0 for each variable. The test statistic for each variable falls in the rejection region (p-values < .05).
- Conclusion: there is evidence that both Price and Advertising affect pie sales at α = .05.

Confidence Interval Estimate for the Slope

Confidence interval for the population slope βj:

$$b_j \pm t_{\alpha/2}\, S_{b_j}$$

where t has (n - k - 1) d.f. Here, t has (15 - 2 - 1) = 12 d.f.

Example: form a 95% confidence interval for the effect of changes in price (X1) on pie sales, holding constant the effects of advertising:

-24.975 ± (2.1788)(10.832)

So the interval is (-48.576, -1.374).

Excel output also reports these interval endpoints:

              Coefficients   Standard Error   Lower 95%   Upper 95%
Intercept     306.52619      114.25389         57.58835   555.46404
Price         -24.97509       10.83213        -48.57626    -1.37392
Advertising    74.13096       25.96732         17.55303   130.70888

Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of $1 in the selling price, holding constant the effects of advertising.
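The interval arithmetic for the price slope can be sketched as follows. The function name `slope_interval` is illustrative, and the critical value 2.1788 is again taken from the chapter rather than computed.

```python
def slope_interval(b, se, t_crit):
    """Confidence interval b +/- t_crit * se for a population slope."""
    half_width = t_crit * se
    return b - half_width, b + half_width


# Price slope and standard error from the Excel output; t(.025; 12 d.f.).
low, high = slope_interval(b=-24.97509, se=10.83213, t_crit=2.1788)

# An interval that excludes zero agrees with rejecting H0: beta_1 = 0.
excludes_zero = not (low <= 0.0 <= high)

print(f"95% CI for the price slope: ({low:.3f}, {high:.3f})")
```

The endpoints differ from Excel's in the fourth decimal place because the critical value is rounded to four decimals here.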
Testing Portions of the Multiple Regression Model

Contribution of a single independent variable Xj:

SSR(Xj | all variables except Xj) = SSR(all variables) - SSR(all variables except Xj)

This measures the contribution of Xj in explaining the total variation in Y (SST).

Contribution of a single independent variable Xj, assuming all other variables are already included (consider here a 3-variable model):

SSR(X1 | X2 and X3) = SSR(all variables) - SSR(X2 and X3)

- SSR(all variables) comes from the ANOVA section of the regression that includes X1, X2, and X3.
- SSR(X2 and X3) comes from the ANOVA section of the regression that includes only X2 and X3.

This measures the contribution of X1 in explaining SST.

The Partial F-Test Statistic

Consider the hypothesis test:
- H0: variable Xj does not significantly improve the model after all other variables are included
- H1: variable Xj significantly improves the model after all other variables are included

Test using the F-test statistic (with 1 and n - k - 1 d.f.):

$$F = \frac{SSR(X_j \mid \text{all variables except } X_j)}{MSE}$$

Testing Portions of Model: Example

Example: frozen dessert pies. Test at the α = .05 level to determine whether the price variable significantly improves the model given that advertising is included.

- H0: X1 (price) does not improve the model with X2 (advertising) included
- H1: X1 does improve the model
- α = .05, df = 1 and 12
- F critical value = 4.75

ANOVA (for X1 and X2)
             df   SS            MS
Regression    2   29460.02687   14730.01343
Residual     12   27033.30647    2252.775539
Total        14   56493.33333

ANOVA (for X2 only)
             df   SS
Regression    1   17484.22249
Residual     13   39009.11085
Total        14   56493.33333

$$F = \frac{SSR(X_1 \mid X_2)}{MSE(\text{all})} = \frac{29460.027 - 17484.222}{2252.776} = 5.316$$

Conclusion: since 5.316 > 4.75, reject H0; adding X1 does improve the model.
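The partial F calculation for the pie sales example can be sketched as below. The function name `partial_f` is illustrative; the sums of squares and MSE are copied from the two ANOVA tables, and the critical value 4.75 is the one quoted in the example. As a side check, the sketch also squares the t statistic for Price from the Excel output (-2.30565) to compare it with the partial F value.

```python
def partial_f(ssr_full, ssr_reduced, mse_full):
    """Partial F for one added variable: its SSR contribution over the full-model MSE."""
    contribution = ssr_full - ssr_reduced
    return contribution, contribution / mse_full


contribution, f_price = partial_f(
    ssr_full=29460.02687,     # regression SS with price and advertising
    ssr_reduced=17484.22249,  # regression SS with advertising only
    mse_full=2252.775539,     # MSE of the model with both variables
)

F_CRITICAL = 4.75             # F(.05; 1, 12), quoted from the example
reject_h0 = f_price > F_CRITICAL

t_price = -2.30565            # t Stat for Price in the Excel output
print(f"SSR(X1 | X2) = {contribution:.3f}, partial F = {f_price:.3f}")
print(f"t^2 for Price = {t_price ** 2:.3f}, reject H0: {reject_h0}")
```

Up to rounding, the partial F for a single variable matches the square of that variable's t statistic.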
Relationship Between Test Statistics

The partial F test statistic developed in this section and the t test statistic are both used to determine the contribution of an independent variable to a multiple regression model. The hypothesis tests associated with these two statistics always result in the same decision (that is, the p-values are identical).

$$t_a^2 = F_{1,a}$$

where a = degrees of freedom.

Coefficient of Partial Determination for k Variable Model

$$r^2_{Yj.(\text{all variables except } j)} = \frac{SSR(X_j \mid \text{all variables except } X_j)}{SST - SSR(\text{all variables}) + SSR(X_j \mid \text{all variables except } X_j)}$$

Measures the proportion of variation in the dependent variable that is explained by Xj while controlling for (holding constant) the other independent variables.

Using Dummy Variables

- A dummy variable is a categorical independent variable with two levels (yes or no, on or off, male or female), coded as 0 or 1
- Assumes equal slopes for other variables
- If there are more than two levels, the number of dummy variables needed is (number of levels - 1)
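The "(number of levels - 1)" rule can be sketched as a small encoder that emits one 0/1 indicator for every level except a chosen baseline. The function name `dummy_code` and the choice of baseline are illustrative assumptions; the levels used here are a yes/no holiday flag and three house styles.

```python
def dummy_code(value, levels, baseline):
    """Return 0/1 indicators for every level except the baseline level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    if baseline not in levels:
        raise ValueError(f"unknown baseline: {baseline!r}")
    return [1 if value == level else 0 for level in levels if level != baseline]


# Two levels need one dummy variable.
HOLIDAY = ["yes", "no"]
holiday_week = dummy_code("yes", HOLIDAY, baseline="no")

# Three levels need two dummy variables; the baseline is coded as all zeros.
STYLES = ["ranch", "split level", "colonial"]
ranch = dummy_code("ranch", STYLES, baseline="colonial")
colonial = dummy_code("colonial", STYLES, baseline="colonial")

print(holiday_week, ranch, colonial)
```

Leaving out the baseline level avoids a redundant column: the baseline is already identified by all dummies being zero.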
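Dummy Variable Example

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$

Let:
- Y = pie sales
- X1 = price
- X2 = holiday (X2 = 1 if a holiday occurred during the week; X2 = 0 if there was no holiday that week)

Then:
- Holiday (X2 = 1): Ŷ = b0 + b1X1 + b2(1) = (b0 + b2) + b1X1
- No holiday (X2 = 0): Ŷ = b0 + b1X1 + b2(0) = b0 + b1X1

The two lines have different intercepts and the same slope.

[Figure: Y (sales) against X1 (price); two parallel lines with intercepts b0 + b2 (Holiday) and b0 (No Holiday).]

If H0: β2 = 0 is rejected, then "Holiday" has a significant effect on pie sales.

In the fitted model:
- Sales: number of pies sold per week
- Price: pie price in $
- Holiday: 1 if a holiday occurred during the week, 0 if no holiday occurred

b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.

Dummy Variable Model: More Than Two Levels

The number of dummy variables is one less than the number of levels.

Example: Y = house price; X1 = square feet. If the style of the house is also thought to matter, Style = ranch, split level, colonial. There are three levels, so two dummy variables are needed.

Let "colonial" be the default category, and let X2 and X3 be used for the other two categories:
- Y = house price
- X1 = square feet
- X2 = 1 if ranch, 0 otherwise
- X3 = 1 if split level, 0 otherwise

The multiple regression equation is:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$$

Consider a fitted regression equation in which b2 = 23.53 and b3 = 18.84 (house price in thousands of dollars):
- For a colonial: X2 = X3 = 0, so Ŷ = b0 + b1X1
- For a ranch: X2 = 1, X3 = 0, so Ŷ = b0 + b1X1 + 23.53
- For a split level: X2 = 0, X3 = 1, so Ŷ = b0 + b1X1 + 18.84

With the same square feet, a ranch will have an estimated average price of 23.53 thousand dollars more than a colonial. With the same square feet, a split-level will have an estimated average price of 18.84 thousand dollars more than a colonial.

Interaction Between Independent Variables

- Hypothesizes interaction between pairs of X variables
- Response to one X variable may vary at different levels of another X variable
- Contains a two-way cross product term:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 = b_0 + b_1 X_1 + b_2 X_2 + b_3 (X_1 X_2)$$

Effect of Interaction

Given:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$$

- Without the interaction term, the effect of X1 on Y is measured by β1
- With the interaction term, the effect of X1 on Y is measured by β1 + β3X2
- The effect changes as X2 changes

Interaction Example

Suppose X2 is a dummy variable and the estimated regression equation is Ŷ = 1 + 2X1 + 3X2 + 4X1X2.

- X2 = 1: Ŷ = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
- X2 = 0: Ŷ = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1

Slopes are different if the effect of X1 on Y depends on the X2 value.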
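The interaction example can be checked numerically. The sketch below hard-codes the example's coefficients (1, 2, 3, 4) as defaults; the function names `predict` and `slope_in_x1` are illustrative.

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=3.0, b3=4.0):
    """Fitted value for Y-hat = b0 + b1*X1 + b2*X2 + b3*X1*X2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b1=2.0, b3=4.0):
    """Effect of a one-unit change in X1 at a given X2: b1 + b3*X2."""
    return b1 + b3 * x2


for x2 in (0, 1):
    intercept = predict(0.0, x2)
    slope = slope_in_x1(x2)
    print(f"X2 = {x2}: Y-hat = {intercept:g} + {slope:g} * X1")
```

The slope in X1 equals the change in the fitted value when X1 increases by one unit at a fixed X2, which is why it takes the form b1 + b3X2.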
[Figure: Ŷ = 1 + 2X1 + 3X2 + 4X1X2 plotted against X1 (0 to 1.5) with Y from 0 to 12, one line for X2 = 0 and a steeper line for X2 = 1.]

Significance of Interaction Term

- Can perform a partial F-test for the contribution of a variable to see if the addition of an interaction term improves the model
- Multiple interaction terms can be included; use a partial F-test for the simultaneous contribution of multiple variables to the model

Simultaneous Contribution of Independent Variables

Use a partial F-test for the simultaneous contribution of multiple variables to the model.

- Let m variables be an additional set of variables added simultaneously
- To test the hypothesis that the set of m variables improves the model:

$$F = \frac{[SSR(\text{all}) - SSR(\text{all except the new set of } m \text{ variables})]/m}{MSE(\text{all})}$$

(where F has m and n - k - 1 d.f.)

Chapter Summary

In this chapter, we have:
- Developed the multiple regression model
- Tested the significance of the multiple regression model
- Discussed adjusted r²
- Discussed using residual plots to check model assumptions
- Tested individual regression coefficients
- Tested portions of the regression model
- Used dummy variables
- Evaluated interaction effects