Applied Statistics: Chi-Square Tests

Uploaded by: hs****ma | Document no.: 585319244 | Upload date: 2024-09-02 | Format: PPT | Pages: 33 | Size: 1.06 MB

BEO2255 Applied Statistics for Business
Week Six: Analysing categorical data: chi-squared tests

This week's lecture covers:
- Analysing categorical (nominal) data
- The chi-square test of differences between proportions
- The chi-square test of independence

SPSS one-sample nonparametric test: chi-square test of a population distribution
(1) Purpose: use sample data to infer whether the population distribution differs significantly from a known (theoretical) distribution. This is a goodness-of-fit test, used for inference about categorical data.
(2) Hypothesis: H0: the population distribution does not differ significantly from the theoretical distribution.
(3) Method: from the known population proportions, compute the expected frequency of each category in the sample, then measure the gap between the observed and expected frequencies by computing the chi-square statistic. A small chi-square value means the observed and expected frequencies are close. If p > α, H0 cannot be rejected, and we conclude that the population distribution does not differ significantly from the known distribution; otherwise H0 is rejected.
(4) Procedure:
- Menu: Analyze > Nonparametric Tests > Chi-Square.
- Move the variable to be tested into the Test Variable List box.
- Set the range of category values to test (Expected Range): "Get from data" uses all values in the sample; "Use specified range" lets the user define the range.
- Specify the expected frequencies (Expected Values): "All categories equal" gives every category the same proportion; "Values" lets the user enter the proportions.

Categorical variables
Categorical variables describe categories of entities, and we deal with them all the time in statistics, for example:
- Making comparisons among categories: do consumers prefer a particular brand of a product over competing brands?
- Checking whether two categorical variables are related: is preference for a product independent of gender?

Chi-square test for differences between proportions
This test uses nominal data produced by a multinomial experiment, which generalises the binomial experiment from two outcomes to more than two. It tests the null hypothesis that the data in the target population follow a particular probability distribution.

Example 1
We might test whether consumers are indifferent among four materials (glass, plastic, steel or aluminium) that could be used to make soft-drink containers. The null hypothesis is that they are indifferent, i.e. equal proportions of consumers prefer glass, plastic, steel and aluminium.

Example 1: data and hypotheses
Let pG be the probability that a randomly selected individual nominates glass when required to choose; define pP (plastic), pS (steel) and pA (aluminium) similarly.
$H_0: p_G = p_P = p_S = p_A = 0.25$
$H_A$: at least one $p_i \ne 0.25$
The alternative is that at least one material is more (or less) preferred than the others.

Example 1 (cont.): procedure
Select a random sample of, say, 100 consumers and record their preferences. Under the null hypothesis we expect 25 consumers to nominate each of glass, plastic, steel and aluminium. These are the expected frequencies, $E_i = n p_i$. We compare them with the sample results, the observed frequencies $O_i$. If $O_i \approx E_i$ for every group, the data are consistent with H0 and we would not reject it.

Example 1 (cont.): chi-square statistic
We need a test statistic to decide whether the difference is large enough to reject the null hypothesis. We use
$\chi^2 = \sum_{i=1}^{G} \frac{(O_i - E_i)^2}{E_i}$
with G - 1 degrees of freedom, where G is the number of groups.
Suppose that in our sample 39 prefer glass, 16 plastic, 20 steel and 25 aluminium. Recall that the expected frequencies were all 25, so
$\chi^2_3 = \frac{(39-25)^2}{25} + \frac{(16-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(25-25)^2}{25} = 7.84 + 3.24 + 1.00 + 0 = 12.08$
Critical value: at the 5% significance level with 3 d.f., the critical value is $\chi^2_3 = 7.82$ (Table E4, page 742, Berenson et al. 2013); that is, if H0 is true there is only a 5% chance that $\chi^2_3 \ge 7.82$.
Comparison: $\chi^2_3 = 12.08 > 7.82$, so reject H0.
Conclusion: at the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities $p_i$ differs from 0.25: the sample results indicate that consumers in the target population do not prefer the materials equally. Thus, at least
two materials differ in preference.

Chi-square test using SPSS
Example: suppose we want to test whether customers have a colour preference for packaging. Three colours are considered: Blue, Green and Purple. The null hypothesis is that customers have no colour preference.
Use Analyze > Nonparametric Tests > Chi-Square. By default, SPSS assumes that all category probabilities are equal.

Main display colour
           Observed N   Expected N   Residual
Blue           26          30.0        -4.0
Green          37          30.0         7.0
Purple         27          30.0        -3.0
Total          90

We test the null hypothesis that consumers in the target population have no preference among the three packaging colours. Observed N is the number of consumers actually choosing each colour; Expected N is the number expected to choose it if the null hypothesis is true (90 / 3 = 30). The counts differ, but are they different enough to reject the null hypothesis?

Test Statistics (Main display colour)
Chi-Square     2.467(a)
df             2
Asymp. Sig.    .291
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.

Chi-Square is the test statistic; df = number of groups - 1 = 2; Asymp. Sig. is the p-value used to test the null hypothesis.
H0: consumers in the target population have no preference among the three packaging colours.
H1: consumers in the target population prefer at least one of the three colours over the others.
Decision: compare the Sig. value with 0.05. We cannot reject the null
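(H0) that all three colours are equally preferred, because Sig. = .291 > 0.05.
Conclusion: at the 5% significance level there is insufficient evidence to conclude that consumers in the target population prefer at least one of the three packaging colours over the others.

Tests of independence: chi-squared test of a contingency table
This test serves two different problem objectives:
- Are two nominal variables related?
- Are there differences among two or more populations of a nominal variable?
Consider three features: height in centimetres, weight in kilograms and eye colour. Although some people are tall and thin, on average taller people weigh more than shorter people, so weight and height are not independent. It seems unlikely that people with blue eyes weigh more, on average, than people with brown eyes, so weight and eye colour are almost certainly independent.

Frequency analysis of cross-classified data (crosstabs)
Purpose: to see how the data are distributed across the levels of different variables. Examples: is academic performance related to gender? (two variables) Are occupation, gender and enjoyment of shopping related? (three variables)
Main steps: produce the contingency table, then analyse the relationship between the variables in it.

Producing the contingency table
[Figure: what a contingency table is, showing the row variable, the column variable, a control (layer) variable such as region, and the cell frequencies.]
Procedure:
(1) Menu: Analyze > Descriptive Statistics > Crosstabs.
(2) Move one variable into the Row(s) box as the row variable.
(3) Move one variable into the Column(s) box as the column variable.
(4) Optionally, move one or more variables into the Layer box as control variables. Control variables in the same layer produce a number of sub-tables equal to the sum of their numbers of levels; control variables in different layers produce the product of their numbers of levels.
(5) Choose whether to display clustered bar charts for each group (Display clustered bar charts).
Further calculations: the Cells option selects which percentages appear in the table: Row (row percentages, Row pct), Column (column percentages, Col pct) and Total (percentages of the grand total, Tot pct).

Analysing the relationship between the variables in the table
Purpose: use the contingency table to test whether the row and column variables are independent.
Method: the chi-square test, which measures the association between categorical (qualitative) variables.
Example: two cross-tabulations of age group by wage income.

              Low    Medium   High
Young         400       0        0
Middle-aged     0     500        0
Old             0       0      600

              Low    Medium   High
Young           0       0      500
Middle-aged     0     600        0
Old           400       0        0

In both tables each age group falls entirely into one income level, so age and income are clearly associated.

Chi-square test: basic steps
(1) H0: the row and column variables are not associated (they are independent).
(2) Construct the chi-square statistic
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
which under H0 follows a chi-square distribution with (r - 1)(c - 1) degrees of freedom.
In the SPSS output, Count is the observed (actual) frequency; Expected Count is the expected frequency, which describes how the data would be distributed if H0 were true; Residual is the observed frequency minus the expected frequency.

Judging visually whether two categorical variables are related:

              No lung cancer   Lung cancer   Total
Non-smoker         7775             42        7817
Smoker             2099             49        2148
Total              9874             91        9965

[Figures: 3-D column chart and 2-D bar chart of smoking status by lung cancer.]
The 3-D column chart shows the relative sizes of the cell frequencies clearly. The 2-D bar chart shows that the proportion with lung cancer is higher among smokers (49/2148 ≈ 2.3%) than among non-smokers (42/7817 ≈ 0.5%).

Tests of independence (cont.): Example 2
Suppose we interviewed 400 people and asked which of three age groups they are in (under 25, 25 to 60, over 60). We also recorded their response to the statement "All imports of automobiles should be banned in order to protect the local industry" (agree, no view either way, disagree).

                Attitude towards banning imports
Age group     Agree   No view   Disagree   Total
Under 25        19       53        25        97
25-60           46       94        47       187
Over 60         30       56        30       116
Total           95      203       102       400

Example 2 (cont.)
Null hypothesis: the answers to the two questions are independent. Under the null, by the multiplication rule for independent events,
$P(\text{over 60 and agree}) = P(\text{over 60}) \times P(\text{agree})$
so the expected frequency is $P(\text{over 60}) \times P(\text{agree}) \times n$.
Procedure: set up a cross-tabulation of the observed frequencies of answers to the two questions, then calculate the expected frequencies.
Test: the test is based on a comparison of the observed and expected frequencies.
Short-cut for expected frequencies: $E_{ij} = \frac{R_i \times C_j}{n}$, where $R_i$ is the row total and $C_j$ the column total.

Age * Attitude to banning imports Crosstabulation
                            Agree   No view   Disagree   Total
Under 25   Count             19.0     53.0      25.0      97.0
           Expected Count    23.0     49.2      24.7      97.0
25-60      Count             46.0     94.0      47.0     187.0
           Expected Count    44.4     94.9      47.7     187.0
Over 60    Count             30.0     56.0      30.0     116.0
           Expected Count    27.6     58.9      29.6     116.0
Total      Count             95.0    203.0     102.0     400.0
           Expected Count    95.0    203.0     102.0     400.0

Calculation of the expected frequency for (over 60, agree): 95 × 116 / 400 = 27.55, shown as 27.6 in the table.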
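
The short-cut E_ij = R_i × C_j / n can be checked for every cell with a minimal Python sketch. It assumes the observed counts are typed in as a list of rows in the same order as the Example 2 table; the helper name `expected_counts` is hypothetical and not part of SPSS.

```python
# Hypothetical helper: expected counts under independence, E_ij = R_i * C_j / n
def expected_counts(observed):
    row_totals = [sum(row) for row in observed]          # R_i
    col_totals = [sum(col) for col in zip(*observed)]    # C_j
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Example 2 observed counts
# rows: under 25, 25-60, over 60; columns: agree, no view, disagree
observed = [
    [19, 53, 25],
    [46, 94, 47],
    [30, 56, 30],
]

expected = expected_counts(observed)
for label, row in zip(["Under 25", "25-60", "Over 60"], expected):
    print(f"{label:<9}" + "".join(f"{e:9.2f}" for e in row))
```

The over-60/agree cell comes out as 95 × 116 / 400 = 27.55, and because each cell is the row total scaled by a column share, every row of expected counts sums to the observed row total (97, 187, 116).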

The observed and expected counts differ, but are they different enough to reject the null hypothesis?

Chi-squared test for independence
Rationale: if $O_{ij} \approx E_{ij}$ in every cell, the data are consistent with H0.
Test statistic: we need a test statistic to decide whether the differences are large enough to reject the null hypothesis. SPSS reports the Pearson chi-square statistic with (rows - 1)(columns - 1) = (3 - 1)(3 - 1) = 4 degrees of freedom.

Chi-Square Tests
                                 Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square               1.438(a)    4   .837
Likelihood Ratio                 1.517       4   .805
Linear-by-Linear Association     1.307       1   .758
N of Valid Cases                 400
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 23.0.

H0: attitudes and age are independent.
H1: attitudes and age are dependent.
Decision: we cannot reject the null hypothesis that attitude and age are independent, because Sig. = .837 > 0.05.
Conclusion: at the 5% significance level, there is insufficient evidence to conclude that age and attitudes towards banning automobile imports are dependent.
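
The Pearson test on the Example 2 counts can be reproduced without SPSS. The sketch below is a minimal standard-library version under stated assumptions: the helper names `chi2_sf` and `pearson_independence` are hypothetical, and the p-value comes from the closed-form upper tail of the chi-square distribution for integer degrees of freedom rather than from a printed table. (In SciPy, `scipy.stats.chi2_contingency` is the usual library route; it is not used here.)

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def pearson_independence(observed):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count E_ij
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Example 2: rows = under 25, 25-60, over 60; columns = agree, no view, disagree
observed = [
    [19, 53, 25],
    [46, 94, 47],
    [30, 56, 30],
]
stat, df, p = pearson_independence(observed)
print(f"Pearson chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
print("reject H0" if p < 0.05 else "cannot reject H0")  # 5% significance level
```

The statistic, degrees of freedom and p-value should agree with the Pearson Chi-Square row of the SPSS output (1.438, 4, .837), leading to the same decision not to reject independence.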

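Returning to the goodness-of-fit test of Example 1 and the packaging-colour example, the same ideas give a short standard-library sketch that computes $E_i = n p_i$, the chi-square statistic with G - 1 degrees of freedom, its p-value, and the 5% critical value. The helper names `chi2_sf`, `chi2_critical` and `goodness_of_fit` are hypothetical; the critical value is found by bisection on the upper-tail probability instead of being read from Table E4.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def chi2_critical(alpha, df):
    """Critical value c with P(X >= c) = alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until the tail is small enough
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def goodness_of_fit(observed, probs):
    """Chi-square goodness-of-fit test: E_i = n * p_i, statistic with G - 1 df."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


# Example 1: glass, plastic, steel, aluminium; H0: every p_i = 0.25
stat1, df1, p1 = goodness_of_fit([39, 16, 20, 25], [0.25] * 4)
crit1 = chi2_critical(0.05, df1)
print(f"Example 1: chi2 = {stat1:.2f}, df = {df1}, critical = {crit1:.2f}, p = {p1:.4f}")

# Packaging colours: blue, green, purple; H0: every p_i = 1/3
stat2, df2, p2 = goodness_of_fit([26, 37, 27], [1 / 3] * 3)
print(f"Colours:   chi2 = {stat2:.3f}, df = {df2}, p = {p2:.3f}")
```

With these counts the sketch should reproduce the hand calculation for Example 1 (12.08 against a critical value of about 7.82 with 3 d.f., so H0 is rejected) and the SPSS Test Statistics for the colour example (2.467 with 2 d.f., Sig. .291, so H0 is not rejected).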