A Comparative Study of Holistic and Analytic Scoring Methods for Writing

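Hunan University
Master's Thesis

Candidate: Luo Juan
Degree sought: Master's
Major: Foreign Linguistics and Applied Linguistics
Supervisor: Xiao Yunnan
Date: 2007-04-23

Abstract

As a form of performance testing, writing tests have received considerable attention in recent years and are widely used in both large-scale and small-scale examinations. Writing tests are known for their high validity, but because they are subjective tests, their scores are inevitably influenced by raters' subjective judgments during scoring, which undermines the reliability of the results.

Whichever scoring method is adopted, scoring reliability is a central concern in performance testing. The two methods used to score writing, holistic scoring and analytic scoring, remain highly controversial in the language testing community. Which method achieves higher scoring reliability is still an open question, and few empirical studies have addressed it.

In an experiment that applies both holistic and analytic rating scales, this thesis uses generalizability theory and the analysis tools GENOVA and SPSS 13.0 to compare the strengths and weaknesses of the two methods, in particular their scoring efficiency, their scoring reliability and their impact on test decisions, providing a qualitative and quantitative comparison of holistic and analytic essay scoring.

The study proceeded in three steps:
1. A writing test was administered, and 60 sample scripts were randomly drawn.
2. Four raters with extensive experience in scoring writing completed two sets of scoring experiments under strictly controlled conditions.
3. After the experiment, the raters completed a questionnaire.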
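The study estimates variance components with generalizability theory; the thesis itself uses GENOVA on a p×i×r design. As a minimal sketch of what a G study computes, the Python function below handles only the simpler one-facet, fully crossed persons × raters (p×r) design: it takes a balanced score matrix (one score per essay per rater) and solves the expected-mean-square equations of a two-way ANOVA without replication. The function name, the dictionary keys and the toy score matrix are illustrative assumptions, not taken from the thesis or from GENOVA.

```python
def p_by_r_variance_components(scores):
    """Estimate G-study variance components for a balanced persons x raters design.

    scores[p][r] is the score that rater r gave to essay p (exactly one score per cell).
    Returns sigma^2(p), sigma^2(r) and sigma^2(pr,e); negative estimates are set to zero.
    """
    n_p = len(scores)
    n_r = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]

    # Sums of squares for persons, raters, and the residual (interaction confounded with error).
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pr = ss_tot - ss_p - ss_r

    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Solve E(MS) equations: E(MS_p) = s2(pr,e) + n_r*s2(p), E(MS_r) = s2(pr,e) + n_p*s2(r).
    return {
        "p": max((ms_p - ms_pr) / n_r, 0.0),
        "r": max((ms_r - ms_pr) / n_p, 0.0),
        "pr,e": max(ms_pr, 0.0),
    }


# Toy data: three essays, two raters; the second rater is always one point more lenient.
print(p_by_r_variance_components([[4, 5], [2, 3], [6, 7]]))
# prints {'p': 4.0, 'r': 0.5, 'pr,e': 0.0}
```

In this toy matrix the raters differ only by a constant leniency shift, so all of the rater variance lands in σ²(r) and none in the person-by-rater interaction; a relative (rank-order) decision would be unaffected by that shift, while an absolute decision would not.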

The results show that: (1) holistic scoring is significantly more efficient than analytic scoring and has advantages in economy and practicality; (2) holistic scoring yields higher generalizability coefficients and dependability coefficients than analytic scoring; (3) therefore, holistic rating scales are more appropriate for scoring large-scale writing tests. These conclusions suggest how the two scoring methods can be better used in writing assessment.

The value of this study lies mainly in two aspects. First, building on the theories of Bachman and McNamara, it identifies the scoring method as one factor that affects writing test results. Second, it offers recommendations for administering writing scoring in language tests and, through its attention to the accuracy of scoring results, makes a new contribution to the fairness of test decisions. Finally, the author discusses the limitations of the study and the issues in this area that require further research.

Keywords: writing assessment; reliability; scoring method; comparative study

Abstract

As one important form of performance testing, writing has received considerable attention in recent years. It is widely used in large-scale as well as small-scale English examinations. Writing tests are known for their high validity but lack scoring reliability: because writing is a subjective test, the answers are open-ended and thus produce discrepancies between raters. Whatever the test method, the reliability of ratings is one of the major issues in performance-based assessment. The evaluation of second-language writing has become a major area of writing assessment research because the relative reliability of the two scoring methods, holistic scoring and analytic scoring, remains highly controversial, yet empirical studies of the question are comparatively rare.

This paper examines the strengths and weaknesses of holistic and analytic scoring, with emphasis on comparing the scoring efficiency, the scoring reliability and the testing decisions of the two methods. Using generalizability theory, the specially developed program GENOVA, and SPSS 13.0, the study was carried out with 60 writing samples and 4 raters. It proceeded in three steps: first, 60 writing papers were collected, randomly selected and ordered; second, the four raters marked the 60 papers during the same period of time; third, the raters completed a questionnaire in a language lab.
The results show that: (1) holistic scoring is far less time-consuming than analytic scoring and has significant advantages in economy and practicality; (2) the reliability coefficients (generalizability coefficients and dependability coefficients) derived from holistic scoring are significantly higher than those derived from analytic scoring; (3) given its higher reliability, holistic scoring is more appropriate for high-stakes assessment programs when detailed information about candidates' writing ability is not required. These results indicate how the two scoring methods can be applied to assess writing in tests more effectively. The value of this research lies in two aspects: (1) it extends the findings of Bachman and McNamara; (2) it offers suggestions for improving the administration of writing scoring and contributes to the current discussion of test fairness. Finally, the paper identifies the limitations of the study and the areas of scoring-method research that merit further attention.
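The reliability results are stated as generalizability (G) coefficients and dependability (Phi) coefficients, and the thesis's D studies vary the number of raters. For a fully crossed random p×i×r design (persons × i × raters, where reading i as the writing-task facet is an interpretation based on the table titles), standard generalizability theory defines the G coefficient from the relative error variance (effects that interact with persons) and Phi from the absolute error variance, which also counts the task, rater and task-by-rater effects; Phi therefore never exceeds G. The sketch below computes both for a chosen number of tasks and raters. The function name, the effect labels and the variance components in the D-study loop are hypothetical values chosen for illustration, not the thesis's estimates.

```python
def g_and_phi(vc, n_i, n_r):
    """Return (G coefficient, Phi coefficient) for a random p x i x r design.

    vc maps the effect labels "p", "i", "r", "pi", "pr", "ir", "pir,e" to
    G-study variance components; n_i and n_r are the D-study numbers of
    tasks and raters that each person's score is averaged over.
    """
    # Relative error: only the effects that interact with persons.
    rel_err = vc["pi"] / n_i + vc["pr"] / n_r + vc["pir,e"] / (n_i * n_r)
    # Absolute error: relative error plus task, rater and task-by-rater effects.
    abs_err = rel_err + vc["i"] / n_i + vc["r"] / n_r + vc["ir"] / (n_i * n_r)
    g = vc["p"] / (vc["p"] + rel_err)
    phi = vc["p"] / (vc["p"] + abs_err)
    return g, phi


# Hypothetical variance components, for illustration only.
components = {"p": 1.0, "i": 0.05, "r": 0.1, "pi": 0.2,
              "pr": 0.15, "ir": 0.02, "pir,e": 0.5}

# A small D study: two tasks, one to four raters.
for raters in range(1, 5):
    g, phi = g_and_phi(components, n_i=2, n_r=raters)
    print(f"raters={raters}  G={g:.3f}  Phi={phi:.3f}")
```

Because each error term is divided by the number of conditions it is averaged over, adding raters shrinks σ²(pr)/n_r and σ²(pir,e)/(n_i·n_r), which is why both coefficients rise as raters are added in a D study.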
Keywords: writing assessment; reliability; scoring method; comparative study

Tables

Table 2.1 Components of communicative language ability
Table 2.2 A comparison of holistic and analytic scales in terms of six qualities of test usefulness
Table 2.3 Experimental research on the comparison of two scoring methods
Table 3.1 Experiment schedule
Table 4.1 Data collected in the experiment
Table 4.2 Scoring time and cost with analytic and holistic methods
Table 4.3 Correlation with scores on speaking and reading
Table 4.4 Correlations between raters in the holistic scorings and analytic scorings
Table 4.5 Wilcoxon signed-rank test
Table 4.6 Spearman rank correlation between two scoring methods
Table 4.7 Wilcoxon signed-rank test between two tasks
Table 4.8 G study variance components for p×i×r design
Table 4.9 D study for p×i×r design
Table 4.10 Design of different testing contexts in D study
Table 4.11 Changes in G and Phi coefficients by raters in D study
Table 4.12 General perception of the two scoring methods

Figures

Figure 2.1 Factors in writing assessment
Figure 3.1 G study sources of variance for P×R×T design
Figure 4.1 Comparison of G coefficients of two scoring results in the same testing context
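Table 4.6 reports Spearman rank correlations between the two scoring methods. As a sketch, assuming each method yields one total score per essay, Spearman's rho can be computed as the Pearson correlation of the two score lists after converting them to ranks, with tied scores (common on short rating scales) given the mean of the ranks they occupy. The function names and example scores below are hypothetical; SciPy's `scipy.stats.spearmanr` performs the same average-rank computation.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # pos..end are 0-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical holistic and analytic totals for the same five essays.
holistic = [8, 11, 9, 13, 11]
analytic = [15, 18, 17, 24, 16]
print(round(spearman(holistic, analytic), 3))
```

Working on ranks rather than raw scores makes the comparison insensitive to the two methods using different score ranges, which matters here because holistic and analytic totals are on different scales.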
