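Chapter 3 Sampling Error of the Sample Mean and Confidence Intervals

Links:

Data / variable            Distribution over discrete points or intervals    Distribution characteristics    Application
Sample data x              Frequency table, frequency plot                   Descriptive measures ( )        Reference range
Random variable X, error   Probability table, probability plot               Population parameters ( ) ( )   Confidence interval

3.1 Distribution of the Sample Mean

When several samples are drawn independently from the same population, their means usually differ from one another, which shows that the sample mean is subject to variation. Computer experiments help reveal the pattern of this variation.

I. Distribution of the sample mean for a normal population

Experiment 3.1 Sampling from a normally distributed population

Suppose the red blood cell count of normal men follows the normal distribution N(4.6602, 0.5746²). Draw 1000 random samples, each containing n = 5 individuals. The sample mean is itself a random variable, and:

(1) the individual sample means are not necessarily equal to the population mean (error?);
(2) the sample means differ from one another (variation);
(3) the distribution of the sample means is highly regular: it is centered on the population mean, dense in the middle and sparse in the tails, and roughly symmetric (symmetric, normal?);
(4) the range of variation of the sample means is much narrower than that of the original variable;
(5) as the sample size increases, the range of variation of the sample means gradually narrows.

Figure 3.1 Results of the sampling experiment from a normal population. The original normal population is N(4.6602, 0.5746²); the histograms show the distribution of the sample means. Panels: (a) n = 5, (b) n = 10, (c) n = 30; the horizontal axis of each panel runs from 3.7 to 5.7. (Luo: the horizontal axis here is the sample mean; if it is replaced by the deviation of the sample mean from the population mean, the plot becomes the distribution of the error, and its shape is unchanged.)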
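Experiment 3.1 can be repeated on any computer. The following is a minimal sketch, not the program used to produce Figure 3.1: it assumes Python's standard library random number generator, an arbitrary fixed seed, and hypothetical helper names (`simulate_sample_means`, `summarize`). It draws 1000 samples for each of n = 5, 10, 30 from N(4.6602, 0.5746²) and compares the observed standard deviation of the sample means with σ/√n, the value implied by the sampling distribution N(μ, σ²/n) of the mean of a normal sample.

```python
import math
import random
import statistics

MU, SIGMA = 4.6602, 0.5746   # population mean and SD from Experiment 3.1
N_SAMPLES = 1000             # number of independent samples per sample size


def simulate_sample_means(n, n_samples=N_SAMPLES, seed=0):
    """Draw n_samples samples of size n from N(MU, SIGMA^2) and return their means.

    The seed is an arbitrary choice made only so that runs are repeatable.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(n_samples)
    ]


def summarize(n):
    """Summarize the simulated sample means for sample size n."""
    means = simulate_sample_means(n)
    return {
        "n": n,
        "mean_of_means": statistics.fmean(means),    # should be close to MU
        "sd_of_means": statistics.stdev(means),      # observed spread of the means
        "sigma_over_sqrt_n": SIGMA / math.sqrt(n),   # theoretical spread
        "min": min(means),
        "max": max(means),
    }


if __name__ == "__main__":
    for n in (5, 10, 30):
        s = summarize(n)
        print(
            f"n={s['n']:2d}  mean of means={s['mean_of_means']:.4f}  "
            f"SD of means={s['sd_of_means']:.4f}  "
            f"sigma/sqrt(n)={s['sigma_over_sqrt_n']:.4f}  "
            f"range=[{s['min']:.2f}, {s['max']:.2f}]"
        )
```

Running the script prints one line per sample size. The observed standard deviation of the means should fall as n grows and stay close to σ/√n, which corresponds to observations (4) and (5) above; the exact figures depend on the seed.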
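Table 3.1 Means, standard deviations, and 95% confidence intervals for the population mean from 100 independent samples of size 5 drawn at random from N(4.6602, 0.5746²) (unit: 10^12/L)

Sample  Mean  SD      95% CI            Sample  Mean  SD      95% CI
  1     5.00  0.5688  4.2939, 5.7062     51     4.48  0.4006  3.9827, 4.9773
  2     4.72  0.3470  4.2891, 5.1509     52     4.32  0.5487  3.6388, 5.0012
  3     4.24  0.5763  3.5246, 4.9554     53     4.88  0.3732  4.4167, 5.3434
  4     4.64  0.5949  3.9014, 5.3786     54     4.68  0.3524  4.2425, 5.1175
  5     4.60  0.4005  4.1028, 5.0972     55     4.80  0.5866  4.0717, 5.5283
  6     4.80  0.8186  3.7837, 5.8163     56     4.52  0.3504  4.0850, 4.9550
  7     4.68  0.4502  4.1211, 5.2389     57     4.88  0.6869  4.0272, 5.7328
  8     4.32  0.8225  3.2989, 5.3411     58     4.80  0.5232  4.1505, 5.4495
  9     4.72  0.5964  3.9796, 5.4604     59     4.80  0.2794  4.4531, 5.1469
 10     4.40  0.4496  3.8418, 4.9582     60     4.76  0.5823  4.0371, 5.4830
 11     4.60  0.5683  3.8944, 5.3056     61     4.76  0.7083  3.8807, 5.6394
 12     4.60  0.3401  4.1778, 5.0222     62     4.12  0.5793  3.4008, 4.8392
 13     4.60  0.6648  3.7746, 5.4254     63     4.72  0.4419  4.1714, 5.2686
 14     4.76  0.6274  3.9811, 5.5389     64     4.44  0.2818  4.0902, 4.7898
 15     4.20  0.6886  3.3451, 5.0549     65     4.92  1.0267  3.6454, 6.1947
 16     4.64  0.3091  4.2562, 5.0238     66     4.80  0.7191  3.9073, 5.6927
 17     4.96  0.4223  4.4357, 5.4843     67     4.72  0.4361  4.1786, 5.2614
 18     4.96  0.4083  4.4532, 5.4669     68     4.84  0.5873  4.1109, 5.5691
 19     4.68  0.5875  3.9506, 5.4094     69     4.36  0.4892  3.7527, 4.9673
 20     4.84  0.5340  4.1771, 5.5030     70     4.76  0.3353  4.3437, 5.1763
 21     4.92  0.2852  4.5659, 5.2741     71     4.40  0.4309  3.8650, 4.9350
 22     4.60  0.4517  4.0392, 5.1608     72     4.68  0.6880  3.8259, 5.5341
 23     4.44  0.4333  3.9021, 4.9779     73     4.60  0.4301  4.0661, 5.1339
 24     4.96  0.3711  4.4993, 5.4207     74     4.48  0.6411  3.6841, 5.2759
 25     4.64  0.4742  4.0513, 5.2287     75*    4.16  0.3927  3.6724, 4.6476
 26     4.96  0.5349  4.2959, 5.6241     76     4.52  0.5487  3.8388, 5.2012
 27     4.48  0.4778  3.8868, 5.0732     77     4.36  0.3930  3.8721, 4.8479
 28     4.68  0.3818  4.2061, 5.1539     78*    5.04  0.2052  4.7853, 5.2947
 29     4.68  0.6289  3.8992, 5.4608     79     4.56  0.9963  3.3231, 5.7969
 30     5.28  0.6467  4.4771, 6.0829     80     4.80  0.6243  4.0249, 5.5751
 31     4.84  0.6724  4.0053, 5.6747     81*    4.00  0.2090  3.7405, 4.2595
 32     4.52  0.3203  4.1224, 4.9176     82     4.64  0.3414  4.2162, 5.0638
 33     4.76  0.5841  4.0348, 5.4852     83     5.04  0.4050  4.5372, 5.5428
 34     4.48  0.2084  4.2213, 4.7388     84     4.52  0.5353  3.8555, 5.1845
 35     5.04  0.6646  4.2149, 5.8651     85     4.44  0.3276  4.0333, 4.8467
 36     4.56  0.3912  4.0743, 5.0457     86     4.60  0.3797  4.1287, 5.0713
 37     4.68  0.5183  4.0366, 5.3234     87     4.48  0.2801  4.1322, 4.8278
 38     4.80  0.7445  3.8758, 5.7242     88     4.64  0.2473  4.3330, 4.9471
 39     4.72  0.7260  3.8187, 5.6213     89*    5.32  0.3982  4.8256, 5.8144
 40     4.68  0.8567  3.6165, 5.7435     90     4.92  0.3473  4.4888, 5.3512
 41     4.56  1.0241  3.2887, 5.8313     91     4.72  0.2941  4.3548, 5.0852
 42     4.76  0.6786  3.9175, 5.6025     92     4.44  0.4273  3.9096, 4.9704
 43     5.04  0.5176  4.3974, 5.6826     93     4.48  0.3594  4.0338, 4.9262
 44     4.52  0.3658  4.0659, 4.9741     94     4.92  0.4456  4.3668, 5.4732
 45     4.52  0.5944  3.7821, 5.2580     95     4.64  0.4758  4.0494, 5.2306
 46     4.72  0.5024  4.0963, 5.3437     96     4.76  0.8516  3.7027, 5.8173
 47     5.12  0.6354  4.3312, 5.9088     97     4.64  0.4560  4.0739, 5.2061
 48     4.76  0.5837  4.0354, 5.4846     98     4.36  0.3368  3.9419, 4.7781
 49*    4.04  0.3595  3.5937, 4.4863     99     4.56  0.6197  3.7907, 5.3293
 50     4.52  0.6094  3.7634, 5.2766    100     4.60  0.4566  4.0331, 5.1669

* The 95% confidence interval estimated from this sample does not in fact cover the population mean.

Table 3.2 Frequency distribution of the means of 1000 independent samples drawn at random from N(4.6602, 0.5746²)

Class lower limit (10^12/L)  Frequency  Relative frequency (%)  Cumulative frequency (%)
3.60-                             1              0.1                     0.1
3.80-                             5              0.5                     0.6
4.00-                            32              3.2                     3.8
4.20-                           117             11.7                    15.5
4.40-                           229             22.9                    38.4
4.60-                           304             30.4                    68.8
4.80-                           218             21.8                    90.6
5.00-                            76              7.6                    98.2
5.20-                            15              1.5                    99.7
5.40-                             3              0.3                   100.0
Total                          1000            100.0

It can be proved theoretically that, when a sample of size n is drawn at random from a population with normal distribution $N(\mu, \sigma^2)$, the sample mean follows $\bar{X} \sim N(\mu, \sigma^2/n)$. The standard deviation of the sample mean is customarily called the standard error of the sample mean, or simply the standard error. The following general rule is worth noting:

$\sigma_{\bar{X}}^2 = \sigma^2/n$ or $\sigma_{\bar{X}} = \sigma/\sqrt{n}$    (3.1)

In practice the population standard deviation $\sigma$ is usually unknown, so the sample standard deviation $S$ has to be used in its place, which gives an estimate of $\sigma_{\bar{X}}$:

$S_{\bar{X}} = S/\sqrt{n}$    (3.2)

For convenience, $\sigma_{\bar{X}}$ may be called the theoretical standard error and $S_{\bar{X}}$ the sample standard error.

II. Distribution of the sample mean for a non-normal population

Experiment 3.2 Sampling from a positively skewed population

(1) As the sample size increases, the symmetry of the distribution of the sample mean gradually improves; at a sample size of 30, the distribution of the sample mean is close to normal;
(2) As the sample size increases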