Classical True Score Measurement Theory


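Measurement theories, together with their associated methods, that take true score theory as their core theoretical assumption are collectively called classical test theory (CTT), also known as classical true score (CTS) measurement theory. The true score (True Score) is the actual value of the examinee on the trait being measured (such as ability, knowledge, or personality). The value (reading) obtained directly from a measurement instrument (such as a test scale or a measuring device) is called the observed value or observed score (Observed Score). Because measurement error exists, the observed value is not equal to the true value of the trait; that is, the observed score contains both a true score and an error score (Error Score). To obtain the true score, the measurement error must be separated from the observed score.

Three assumptions and two inferences of true score theory

Assumption (1): The true score is invariant. In essence, the trait denoted by the true score must have some degree of stability: at least within the scope of the problem under discussion, or within a specific period of time, the individual's trait is a constant.

Assumption (2): Measurement error is completely random.

【Axiom 1】 Measurement error is a normally distributed random variable with a mean of zero. Over repeated measurements the error is sometimes positive and sometimes negative. When the error is positive, the observed score is higher than the actual score (the true score); when it is negative, the observed score is lower. Observed scores therefore fluctuate above and below the true score. Given enough repeated measurements, the positive and negative deviations cancel out and the mean of the measurement error is zero. In mathematical terms: $E(E) = 0$, where the outer $E$ denotes expectation and the inner $E$ the error score.

【Axiom 2】 Measurement error scores are independent of the trait being measured, that is, of the true score. Moreover, measurement errors are independent of one another and of variables other than the measured trait; in other words, the correlations among them are zero. 【Note: if such interactions are admitted, they can only be explained and computed with GT (generalizability theory).】

Assumption (3): The observed score is the sum of the true score and the error score: $X = T + E$. 【Meaning】 The relation between the observed score and the true score is linear, not of any other kind; the difference between them is the error score.

Inference (1): The true score equals the mean of the observed scores, $T = E(X)$ (Gulliksen, 1950). 【Meaning】 If a person's psychological trait could be measured repeatedly with parallel tests a sufficient number of times, the mean of the observed scores would approach the true score.

Inference (2): In a set of measured scores, the variance of the observed scores equals the variance of the true scores plus the variance of the error scores: $S_X^2 = S_T^2 + S_E^2$. This follows from Assumption (3) and Axiom 2: because $T$ and $E$ are uncorrelated, the covariance term vanishes. 【Note】 The error variance here is the variance of random error. Variance due to systematic error is contained in the true score variance, which can be understood as: true score variance = variance relevant to the purpose of measurement + systematic variance irrelevant to the purpose of measurement.

Classical measurement theory builds its theoretical edifice on the foundation of these true score assumptions. Its basic concepts include reliability, validity, item analysis, norms, and standardization.

Measurement Error

Measurement error (or error variance) is a term that describes the variance in scores on a test that is not directly related to the purpose of the test. The performances of students on any test will tend to vary from each other, but their performances can vary for a variety of reasons. These variables fall into two general sources of variance: (a) those creating variance related to the purpose of the test (called meaningful variance), and (b) those generating variance due to other extraneous sources (called measurement error, or error variance). In order to minimize all such undesirable variance in students' scores that is unrelated to the purpose of the test, test developers must use the following tables as carefully as possible.

To ensure valid sampling, one generally first selects a valid ability sample a from the target ability A, and then identifies the behaviours b that can represent this ability sample a. These behaviours should then be a valid sample of the full set of target behaviours (B).

Assumed proposition (1): B - A
Assumed proposition (2): a - A
Assumed proposition (3): b - a
Derived proposition (4): b - B. Once the assumed relations (1), (2), and (3) are established, the relation between b and B can be derived.
Derived proposition (5): b - A. The target ability is inferred from the sample of behaviour that was tested.

Does testing end there? Language measurement quantifies the attributes of language behaviour, so the measurement of the behaviour sample b must ultimately be expressed as a score or grade, that is, as measurement feedback F.

Assumed proposition (6): F - b. F is a correct representation of b.
Derived proposition (7): F - A

In general, test reliability is defined as the extent to which the results can be considered consistent or stable.

Personal attributes that are not related to language ability include:
- individual characteristics, such as cognitive style and knowledge of particular content areas
- group characteristics, such as sex, race, and ethnic background

Random factors are largely unpredictable and temporary, such as:
1) mental alertness or emotional state, and
2) uncontrolled differences in test method facets, e.g., changes of test environment from one day to the next.

The degree to which a test is consistent, or stable, can be estimated by calculating a reliability coefficient.

Two questions of principle: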

For reliability, the question to answer is: How much variance in test scores is due to measurement error?
For validity, the question to answer is: What specific abilities account for the reliable variance in test scores?

The point is that a test can be reliable without being valid. In other words, a test can consistently measure something other than that for which it was designed (this is because reliability is a property of the test scores themselves, whereas validity concerns the accuracy with which test scores are interpreted and used; the two are closely linked but different in nature). Hence test reliability and validity, though related, are different test characteristics. In fact, reliability can be viewed as a precondition for validity; that is, a test cannot be valid unless it is first reliable.

Validity is especially important when it is involved in the decisions that teachers regularly make about their students. Teachers certainly want to base their admissions, placement, achievement, and diagnostic decisions on tests that are actually testing what they claim to measure. Adopting, developing, and adapting tests for such decisions is difficult enough without having to also worry about whether the tests are measuring the wrong student characteristics, abilities, proficiencies, etc.

【Basic questions】 1) what attribute is measured; 2) the degree to which the intended attribute is actually measured.

(1) Validity refers to test results. The validity of a test is the degree to which its results are valid; it is not a property of the test itself.
(2) Validity is relative to the specific purpose of the test; it is not general. In evaluating a test's validity, one must therefore consider its particular use and state what it is valid for measuring.
(3) Validity differs only in degree. It is not a matter of "present" or "absent", and is described as "high", "medium", or "low".

Research on test validity does not examine the test content itself, nor the "validity" of the test scores themselves (test scores as such have no validity, only a reliability issue. -LP); it examines the validity of the ways in which test scores are interpreted and used.

Content Relevance involves the specification of the ability domain (Bachman, 1990:42-4, about operationally defining constructs) and requires the specification of the test method facets (ibid:119) (e.g., what it is that the test measures, the attributes of the stimuli that will be presented to the test-takers, the nature of the responses that the test taker is expected to make).

Content Coverage: we wish to have a well-defined domain that specifies the entire set, or population, of possible test tasks; then, we could follow a standard procedure
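Returning to the true score model: the two inferences (T = E(X), and observed variance = true variance + error variance) and the reliability question ("How much variance in test scores is due to measurement error?") can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not a procedure taken from the text. The function names (`simulate`, `repeated_mean`), the score scale (true scores with mean 50 and SD 10, errors with SD 5), and the sample sizes are all hypothetical choices. Expressing reliability as the ratio of true score variance to observed score variance, and estimating it by the correlation between two parallel forms, is the usual classical-test-theory convention and is likewise an assumption here, since the text does not give a formula.

```python
import random
import statistics


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def simulate(n_examinees=20000, true_mean=50.0, true_sd=10.0,
             error_sd=5.0, seed=0):
    """Simulate X = T + E for two parallel forms (all values hypothetical)."""
    rng = random.Random(seed)
    # True scores: fixed per examinee (Assumption 1).
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_examinees)]
    # Errors: zero-mean normal, drawn independently of T (Assumption 2).
    errors_a = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]
    errors_b = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]
    # Observed scores on each form: X = T + E (Assumption 3).
    form_a = [t + e for t, e in zip(true_scores, errors_a)]
    form_b = [t + e for t, e in zip(true_scores, errors_b)]

    var_x = statistics.pvariance(form_a)
    var_t = statistics.pvariance(true_scores)
    var_e = statistics.pvariance(errors_a)
    cov_te = covariance(true_scores, errors_a)
    # Pearson correlation between the two parallel forms.
    parallel_r = covariance(form_a, form_b) / (
        statistics.pstdev(form_a) * statistics.pstdev(form_b))
    return {
        "var_x": var_x,
        "var_t": var_t,
        "var_e": var_e,
        "cov_te": cov_te,  # close to 0 when T and E are independent
        "reliability_ratio": var_t / var_x,  # share of variance not due to error
        "parallel_forms_r": parallel_r,
    }


def repeated_mean(true_score=60.0, n_repeats=5000, error_sd=5.0, seed=1):
    """Mean of many repeated observed scores for one examinee (Inference 1)."""
    rng = random.Random(seed)
    return statistics.fmean(
        [true_score + rng.gauss(0.0, error_sd) for _ in range(n_repeats)])


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
    print(f"mean of repeated measurements: {repeated_mean():.3f}")
```

In any finite sample the identity var_x = var_t + var_e + 2 * cov_te holds exactly, so the decomposition in Inference (2) is only approximate to the extent that the sample covariance between true and error scores differs from zero. With the hypothetical settings above, the variance ratio and the parallel-forms correlation should both come out near 100 / 125 = 0.8.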
