The R Book: Count Data in Tables

Uploaded by: aa****6 | Document ID: 37107370 | Uploaded: 2018-04-07 | Format: PDF | Pages: 29 | Size: 300.25 KB


15 Count Data in Tables

The analysis of count data with categorical explanatory variables comes under the heading of contingency tables. The general method of analysis for contingency tables involves log-linear modelling, but the simplest contingency tables are often analysed by Pearson's chi-squared, Fisher's exact test or tests of binomial proportions (see p. 365).

15.1 A two-class table of counts

You count 47 animals and find that 29 of them are males and 18 are females. Are these data sufficiently male-biased to reject the null hypothesis of an even sex ratio? With an even sex ratio the expected number of males and females is 47/2 = 23.5. The simplest test is Pearson's chi-squared, in which we calculate

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}.$$

Substituting our observed and expected values, we get

$$\chi^2 = \frac{(29 - 23.5)^2 + (18 - 23.5)^2}{23.5} = 2.574468.$$

This is less than the critical value for chi-squared with 1 degree of freedom (3.841), so we conclude that the sex ratio is not significantly different from 50:50. There is a built-in function for this:

    observed [...]

[...]

                Estimate Std. Error z value Pr(>|z|)
    (Intercept)   3.1570     0.1459   21.64 [...]

[...]

      Resid. Df Resid. Dev Df Deviance P(>|Chi|)
    1         0  1.021e-14
    2         1    0.11715 -1 -0.11715   0.73215

There is no interaction between seed colour and seed shape (p = 0.73215), so we conclude that the two traits are independent and the phenotypes are distributed 9:3:3:1 as predicted. The p value is slightly different because the ratios of the two dominant traits are not exactly 3:1 in the data: round to wrinkled is exp(1.08904) = 2.97142 and yellow to green is exp(1.15702) = 3.180441:

    summary(model2)

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  4.60027    0.09013   51.04 [...]

[...]

      Resid. Df Resid. Dev Df    Deviance Pr(>Chi)
    1         0 0.00000000
    2         1 0.00079137 -1 -0.00079137   0.9776

This shows very clearly that the interaction between aphid attack and leaf holing does not differ from tree to tree (p = 0.97756). Note that if this interaction had been significant, then we would have stopped the modelling at this stage. But it was not, so we leave it out and continue. What about the main question? Is there an interaction between aphid attack and leaf holing? To test this we delete the Caterpillar by Aphid interaction from the model, and assess the results using anova:

    model3 [...]

      Resid. Df Resid. Dev Df Deviance Pr(>Chi)
    1         2  0.0040853
    2         1  0.0007914  1 0.003294   0.9542

There is absolutely no hint of an interaction (p = 0.954). The interpretation is clear: this work provides no evidence at all for induced defences caused by early season caterpillar feeding.
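The simplification sequence described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the book's own listing: the data frame name `induced`, the factor level labels, and the eight counts are all assumptions made for the example (the counts were chosen so that the per-tree proportions of holed leaves agree with the values 0.0196 and 0.0818 quoted in the text), and the model names `model1` to `model3` are illustrative.

```r
# Illustrative counts only (an assumption, not read from the source file).
# expand.grid varies the first factor fastest, so Count is listed in that order.
induced <- expand.grid(
  Caterpillar = c("holed", "not"),
  Aphid       = c("absent", "present"),
  Tree        = c("Tree1", "Tree2")
)
induced$Count <- c(35, 1750, 23, 1146, 146, 1642, 30, 333)

# Step 1: saturated log-linear model with all main effects and interactions.
model1 <- glm(Count ~ Tree * Aphid * Caterpillar,
              family = poisson, data = induced)

# Step 2: delete the three-way interaction and test whether that matters,
# i.e. whether the Aphid by Caterpillar interaction differs between trees.
model2 <- update(model1, . ~ . - Tree:Aphid:Caterpillar)
print(anova(model1, model2, test = "Chisq"))

# Step 3: delete the interaction of interest, keeping every term that
# involves the nuisance variable Tree, and compare again.
model3 <- update(model2, . ~ . - Aphid:Caterpillar)
print(anova(model2, model3, test = "Chisq"))
```

Each `anova` call compares two nested models by the change in residual deviance. A large p value at step 3 means the Aphid by Caterpillar term can be dropped, which is the "no evidence for induced defences" conclusion reached in the text.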

But look what happens when we do the modelling the wrong way. Suppose we went straight for the interaction of interest, Aphid by Caterpillar. We might proceed like this:

    wrong [...]

      Resid. Df Resid. Dev Df Deviance Pr(>Chi)
    1         4     550.19
    2         5     556.85 -1  -6.6594 0.009864 **

The Aphid by Caterpillar interaction is highly significant (p = 0.01), providing strong evidence for induced defences. This is wrong! By failing to include Tree in the model we have omitted an important explanatory variable. As it turns out, and as we should really have determined by more thorough preliminary analysis, the trees differ enormously in their average levels of leaf holing:

    as.vector(tapply(Count,list(Caterpillar,Tree),sum))[1]/tapply(Count,Tree,sum)[1]
         Tree1
    0.01963439

    as.vector(tapply(Count,list(Caterpillar,Tree),sum))[3]/tapply(Count,Tree,sum)[2]
         Tree2
    0.08182241

Tree2 has more than four times the proportion of its leaves holed by caterpillars. If we had been paying more attention when we did the modelling the wrong way, we should have noticed that the model containing only Aphid and Caterpillar had massive overdispersion (a residual deviance of about 550 on 4 degrees of freedom), and this should have alerted us that all was not well.

The moral is simple and clear. Always fit a saturated model first, containing all the variables of interest and all the interactions involving the nuisance variables (Tree in this case). Only delete from the model those interactions that involve the variables of interest (Aphid and Caterpillar in this case). Main effects are meaningless in contingency tables (they do nothing more than constrain the marginal totals), as are the model summaries. Always test for overdispersion. It will never be a problem if you follow the advice of simplifying down from a saturated model, because you only ever leave out non-significant terms, and you never delete terms involving any of the nuisance variables.

15.7 Quasi-Poisson and negative binomial models compared

The data on red blood cell counts are read from a file:

    data [...]

                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.18115    0.02167   8.360 [...]

                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  0.18115    0.02160   8.388 [...]

[...]

      Resid. Df Resid. Dev Df Deviance P(>|Chi|)
    1         0 -5.329e-15
    2         1    3.08230 -1 -3.08230   0.07915

The interaction is not significant (p = 0.079), indicating similar gender by discipline relationships in the two year groups. We finish the analysis at this point because we have answered the question that we were asked to address.

15.9 Schoener's lizards: A complex contingency table

In this section we are interested in whether lizards show any niche separation across various ecological factors and, in particular, whether there are any interactions: for example, whether they show different habitat separati
