大学英语四级考试网考语言能力结构初探 (A Preliminary Study of the Language Ability Structure of the Internet-Based CET-4)
multimedia innovative item types that are not feasible in the PBT format, and the ability to measure response time (Boo, Klein & Bennett, 2001, 2002; Parshall, Spray & Kalohn). Early studies (Swinton & Powers, 1980; Bachman & Palmer, 1981, 1982; Upshur & Homburg, 1983; Vollmer, 1983; Vollmer & Sang, 1983; Sang et al., 1986; Boldt, 1988), as well as specific studies on oral communication (Hinofotis, 1983) and pronunciation (Purcell, 1983), have confirmed that language proficiency is multi-componential, not unitary as proposed by Oller. In addition, subsequent studies using more powerful factor-analytic approaches likewise failed to support the unitary trait hypothesis that one general factor sufficiently accounts for all of the common variance in language tests (e.g., Bachman & Carroll, 1983; Bachman, Davidson, Ryan & Kunnan, 1995; Sasaki, 1996; Shin, 2005). Douglas (2000: 25) concluded that “As has become clear in recent years through empirical studies conducted by language testers and others, language knowledge is multi-componential; however, what is extremely unclear is precisely what those components may be and how they interact in actual language use.” Recent studies on the competence structure of language tests such as the computer-based TOEFL (for example, Stricker et al., 2005, 2008; Sawaki et al., 2008, 2009) identify specific first-order factors, such as listening, reading, speaking and writing factors corresponding to the four language skills, providing further support for the view that language ability is divisible.

The current consensus in the field of language testing is that second language ability is multi-componential, with a general factor as well as smaller factors (Oller, 1983; Carroll, 1983). In general, recent studies agree that language proficiency most probably consists of a general higher-order factor and several distinct first-order ability factors (e.g., Bachman & Carroll, 1983; Bachman & Davidson; Fouly et al., 1990; Bachman, Davidson & Ryan; Sasaki, 1996; Choi et al., 2003; Shin, 2005; Sawaki et al., 2008, 2009; Stricker et al., 2008). However, there is no consensus on the exact factor structures identified. Some studies found correlated first-order factors (e.g., Bachman; Sang et al., 1986; Kunnan, 1995; Stricker et al., 2005), while others found first-order factors together with a higher-order general factor (e.g., Bachman; Sasaki, 1996; Shin, 2005; Stricker et al., 2008; Sawaki et al., 2008, 2009; Bae; Parshall et al., 2002).

2.2.1 Comparability between PBT and CBT

Scores from conventional and computer administrations may be considered equivalent when (a) the rank orders of scores of individuals tested in the alternative modes closely approximate each other, and (b) the means, dispersions and shapes of the score distributions are approximately the same, or have been made approximately the same by rescaling the scores from the computer mode (Bugbee, 1996).

Since the 1990s, score comparability between
the paper-based test and the computer-based test has been extensively investigated. However, the results of previous studies are contradictory. Some studies indicated that CBT scores were equivalent to PBT scores (Bergstrom, 1992; Bugbee, 1996; Boo, Wang, Newman, Choi, Johnson & Godwin, 1999; Pommerich), while others, reviewed below, found that the two delivery modes produced different scores (e.g., Choi & Tinkler, 2002; Coon, McLeod & Thissen, 2002). For the purposes of this paper, the review of comparability studies between paper-based and computer-based tests focuses on those concerning the commonality of score meaning across delivery modes, that is, on paper or by computer.

Choi and Tinkler (2002) evaluated Oregon students in the third and tenth grades (800 students for each grade) with multiple-choice reading and mathematics items delivered on paper and by computer. They found that items presented on computer were more difficult than those presented on paper, and that this difference was more pronounced for third-grade students and for reading items. Coon, McLeod, and Thissen (2002) conducted a similar study on students from the North Carolina Department of Public Instruction. They assessed third graders on reading items and fifth graders on math items, all in multiple-choice form, with roughly 1,300 students in each grade taking paper test forms and 400 students taking the same test forms on computer. Results indicated that scores were not comparable for either grade; the scale scores of the paper tests were higher than those of the online tests. In addition, the mode differences were not the same across forms within grades or across subgroups (for example, a delivery-mode by ethnic-group interaction was found), which indicated that mode differences might vary among population groups. The authors further pointed out that this lack of consistency suggested that comparability between these particular tests in the two administration modes could not be achieved by a simple score equating using data from the total population.

Choi et al. (2003) addressed the issue of the comparability between the comp
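The equivalence criteria cited from Bugbee (1996) earlier in this section, rank-order agreement across modes and matched score distributions (with rescaling of computer-mode scores where needed), can be illustrated with a small statistical sketch. The examinee scores and function names below are hypothetical, invented purely for illustration; none of the reviewed studies used this exact procedure:

```python
import statistics

def ranks(xs):
    """1-based average ranks; tied values receive their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Criterion (a): rank-order agreement between the two modes."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

def rescale_to_paper(cbt, pbt):
    """Criterion (b): mean-sigma rescaling of computer-mode scores onto
    the paper-mode scale, matching the mean and dispersion."""
    m_c, s_c = statistics.mean(cbt), statistics.stdev(cbt)
    m_p, s_p = statistics.mean(pbt), statistics.stdev(pbt)
    return [m_p + (x - m_c) * (s_p / s_c) for x in cbt]

# Hypothetical paired scores for eight examinees who took both modes.
pbt = [52, 61, 58, 70, 66, 49, 75, 63]
cbt = [48, 57, 55, 66, 60, 45, 71, 58]  # computer mode looks uniformly harder

rho = spearman(pbt, cbt)               # 1.0: both modes rank examinees identically
adjusted = rescale_to_paper(cbt, pbt)  # now matches the paper-mode mean and SD
```

The sketch makes the logic of the two criteria concrete: criterion (a) concerns only the ordering of examinees, while criterion (b) concerns the score scale, so a rank correlation can be perfect even when raw means differ across modes, exactly the situation the rescaling in (b) is meant to repair. Operational comparability studies such as those reviewed here rely on much larger samples and formal equating designs.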