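Experiment 8: Multiple Linear Regression and Stepwise Regression (2 class hours)

I. Objectives and Requirements

1. Understand the idea and method of stepwise regression, and learn how to use the stepwise command in Matlab.

II. Experiment Content

1. Main statements

The stepwise regression command stepwise provides an interactive window in which variables can be freely selected for statistical analysis. Its format is

stepwise(X,Y,in,penter,premove)

X is the independent-variable data and Y is the dependent-variable data, each given as a matrix. in is a vector of column indices of X that specifies the subset of variables included in the initial model; by default no independent variable is in the model. penter is the significance level for a variable to enter the model (default 0.05), and premove is the significance level for a variable to be removed (default 0.10). While stepwise is running, the program keeps prompting either to add a variable to the regression equation (Move in) or to remove a variable from it (Move out).

Note: when using stepwise, the data matrix X does not need a manually added first column of ones; the program computes the constant term (intercept) of the regression equation automatically.

2. Experimental data and tasks

National statistics for 1989-2003 are used. The independent variables considered are: x1 - gross industrial output (100 million yuan); x2 - gross agricultural output (100 million yuan); x3 - gross construction output (100 million yuan); x4 - total retail sales of consumer goods (100 million yuan); x5 - total population (10,000 persons); x6 - disaster-affected area; y - national fiscal revenue (100 million yuan).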
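The call format described in item 1 can be sketched as follows. This is only a minimal illustration under stated assumptions: the data are hypothetical toy data generated on the spot (not the 1989-2003 statistics), the variable names are chosen for the example, and the Statistics Toolbox that provides stepwise is assumed to be installed.

```matlab
% Hypothetical toy data (NOT the experiment data): 20 cases, 3 candidate predictors.
rng(0);                                  % fix the random seed so the example is repeatable
X = randn(20, 3);                        % one column per predictor, no column of ones
Y = 2 + 3*X(:,1) - 1.5*X(:,3) + 0.1*randn(20, 1);   % response built from columns 1 and 3

in      = 1;       % initial model contains only column 1 of X
penter  = 0.05;    % significance level for a variable to enter
premove = 0.10;    % significance level for a variable to be removed

stepwise(X, Y, in, penter, premove);     % opens the interactive stepwise window
```

Because stepwise is interactive, the final model depends on which Move in / Move out suggestions are accepted in the window.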
The data are given in Table 3-20.

(1) Build the multiple regression model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \varepsilon$ and estimate the regression parameters;
(2) Test the above regression model and the regression coefficients (write out the test statistics);
(3) Use stepwise regression to find the regression relation between y and the 6 factors.

Table 3-20  Statistical data, 1989-2003

Year   X1         X2         X3        X4         X5         X6         y
1989   6484.00    4100.60    794.00    8101.40    112704.0   46991.00   2664.90
1990   6858.00    4954.30    859.40    8300.10    114333.0   38474.00   2937.10
1991   8087.10    5146.40    1015.10   9415.60    115823.0   55472.00   3149.48
1992   10284.50   5588.00    1415.00   10993.70   117171.0   51333.00   3483.37
1993   14143.80   6605.10    2284.70   12462.10   118517.0   48829.00   4348.95
1994   19359.60   9169.20    3012.60   16264.70   119850.0   55043.00   5218.10
1995   24718.30   11884.60   3819.60   20620.00   121121.0   45821.00   6242.20
1996   29082.60   13539.80   4530.50   24774.10   122389.0   46989.00   7407.99
1997   32412.10   13852.50   4810.60   27298.90   123626.0   53429.00   8651.14
1998   33387.90   14241.90   5231.40   29152.50   124761.0   50145.00   9875.95
1999   35087.20   14106.20   5470.60   31134.70   125786.0   49981.00   11444.08
2000   39047.30   13873.60   5888.00   34152.60   126743.0   54688.00   13395.23
2001   42374.60   14462.80   6375.40   37595.20   127627.0   52215.00   16386.04
2002   45975.20   14931.50   7005.00   42027.10   128453.0   47119.00   18903.64
2003   53092.90   14870.10   8181.30   45842.00   129227.0   54506.00   21715.25

Solution:

(1) Build the multiple regression model

Build the multiple linear regression model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \varepsilon$.

1) Program:

data=[1989 6484.00 4100.60 794.00 8101.40 112704.0 46991.00 2664.90
1990 6858.00 4954.30 859.40 8300.10 114333.0 38474.00 2937.10
1991 8087.10 5146.40 1015.10 9415.60 115823.0 55472.00 3149.48
1992 10284.50 5588.00 1415.00 10993.70 117171.0 51333.00 3483.37
1993 14143.80 6605.10 2284.70 12462.10 118517.0 48829.00 4348.95
1994 19359.60 9169.20 3012.60 16264.70 119850.0 55043.00 5218.10
1995 24718.30 11884.60 3819.60 20620.00 121121.0 45821.00 6242.20
1996 29082.60 13539.80 4530.50 24774.10 122389.0 46989.00 7407.99
1997 32412.10 13852.50 4810.60 27298.90 123626.0 53429.00 8651.14
1998 33387.90 14241.90 5231.40 29152.50 124761.0 50145.00 9875.95
1999 35087.20 14106.20 5470.60 31134.70 125786.0 49981.00 11444.08
2000 39047.30 13873.60 5888.00 34152.60 126743.0 54688.00 13395.23
2001 42374.60 14462.80 6375.40 37595.20 127627.0 52215.00 16386.04
2002 45975.20 14931.50 7005.00 42027.10 128453.0 47119.00 18903.64
2003 53092.90 14870.10 8181.30 45842.00 129227.0 54506.00 21715.25];
[n,p]=size(data);
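% n is the number of rows (the sample size); p is the number of columns of data (8: year, x1 to x6, y)
x=[ones(n,1),data(:,2:7)];   % build the design matrix; the first column is all ones
y=data(:,8);                 % read Y
[b,bint,r,rint,stats]=regress(y,x);   % fit the linear regression model; outputs: regression parameters b, confidence intervals bint of b, residuals r, confidence intervals rint of r, and several statistics stats

Display the results: b,bint,r,rint,stats

Results:

Estimated regression parameters:

b =
   1.0e+03 *
   -6.9226
    0.0001
   -0.0009
    0.0000
    0.0006
    0.0001
   -0.0000

That is, $\hat\beta = 10^3 \times (-6.9226,\ 0.0001,\ -0.0009,\ 0.0000,\ 0.0006,\ 0.0001,\ -0.0000)^T$.

Confidence intervals of the regression parameters:

bint =
   1.0e+04 *
   -4.1630    2.7785
   -0.0001    0.0001
   -0.0001   -0.0001
   -0.0003    0.0003
    0.0000    0.0001
   -0.0000    0.0000
   -0.0000    0.0000

The confidence intervals of the regression parameters $\beta$ are as shown above.

Residuals:

r =
 -228.1801
  132.3052
  382.5207
 -382.0969
 -164.3261
  413.2697
  235.0416
  -64.6531
 -215.4275
  -83.5491
 -101.2389
 -476.3158
  462.6145
   92.4558
   -2.4199

This gives the residual vector $\hat\varepsilon = (-228.1801,\ 132.3052,\ 382.5207,\ -382.0969,\ -164.3261,\ 413.2697,\ 235.0416,\ -64.6531,\ -215.4275,\ -83.5491,\ -101.2389,\ -476.3158,\ 462.6145,\ 92.4558,\ -2.4199)^T$.

Confidence intervals for the random error terms $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T$:

rint =
   1.0e+03 *
   -0.7047    0.2484
   -0.3835    0.6481
   -0.1704    0.9354
   -1.0651    0.3009
   -0.7159    0.3872
   -0.2525    1.0791
   -0.4882    0.9583
   -0.8089    0.6795
   -0.7999    0.3690
   -0.8192    0.6521
   -0.8553    0.6528
   -1.1540    0.2014
   -0.2046    1.1299
   -0.5519    0.7368
   -0.4023    0.3974

Statistics:

stats =
   1.0e+05 *
    0.0000    0.0062    0.0000    1.4152

$R^2 = 0.99785$ is close to 1, so the linear correlation is strong. For the F test,

$F_0 = \dfrac{SSR/p}{SSE/(n-p-1)} = 620.56071 > F_{0.05}(6,\ 15-6-1)$, and $p = P\{F(p,\ n-p-1) > F_0\} = 3.1586 \times 10^{-10} < 0.05$.

Both results show that the linear relationship between the independent variables and y is significant.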
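As a cross-check on the statistics above, $F_0$ can be recovered from $R^2$ alone, because $SSR = R^2 \cdot SST$ and $SSE = (1 - R^2) \cdot SST$. The short worked computation below assumes $n = 15$, $p = 6$ and the rounded value $R^2 \approx 0.997856$ taken from the regress output; it is a sanity check, not part of the original program.

```latex
\[
F_0 = \frac{SSR/p}{SSE/(n-p-1)}
    = \frac{R^2/p}{(1-R^2)/(n-p-1)}
    \approx \frac{0.997856/6}{0.002144/8}
    \approx 620.6
\]
```

This agrees with the F value returned in stats up to rounding.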
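The estimate of the error variance is $\hat\sigma^2 = 141524.01212$.

Note: the commands for displaying stats in long format, and the result:

format long g
stats

Result:

0.997856012083978    620.56071903951    3.15856319868346e-10    141524.012129742

Command for drawing the residual plot:

rcoplot(r,rint)

[Figure: Residual Case Order Plot. Horizontal axis: Case Number; vertical axis: Residuals.]
Residual plot

The residual plot shows no outliers.

(2) Significance tests for the regression model and the regression coefficients

% Compute the coefficient of determination and test the correlation; y is the dependent-variable data
p=size(x,2)-1;                     % redefine p as the number of independent variables (6); size(data,2) also counts the year and y columns
TSS=y'*(eye(n)-1/n*ones(n,n))*y;   % total sum of squares
H=x*inv(x'*x)*x';                  % hat matrix
ESS=y'*(eye(n)-H)*y;               % residual sum of squares
RSS=y'*(H-1/n*ones(n,n))*y;        % regression sum of squares
MSE=ESS/(n-p-1);                   % mean squared error (residual sum of squares over its degrees of freedom)
R2=RSS/TSS;                        % sample coefficient of determination RSS/TSS, for the correlation test
% F test: significance of the regression equation
F0=(RSS/p)/(ESS/(n-p-1));          % compute F0
Fa=finv(0.95, p,n-p。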