Zhejiang University, Xiao Zhonghua (Richard Xiao), Corpus Linguistics 4: lecture slides


Introducing Corpus Linguistics
Corpus Linguistics
Richard Xiao
Zhejiang University, Xiao Zhonghua, Corpus Linguistics (4)

Module description
Since the 1990s, the corpus methodology has revolutionized nearly all branches of linguistics
Corpus analysis can be illuminating in “virtually all branches of linguistics or language learning.” (Leech 1997)
One of the strengths of corpus data lies in its empirical and attested nature
  It pools together the intuitions of a great number of speakers
  It makes linguistic analysis more objective
This module
  introduces the theoretical and practical issues of using corpora in linguistic studies
  explores how the corpus-based approach and other methodologies can be combined in linguistic studies

Aims of the module
The module aims to
  provide an introduction to corpus linguistics;
  familiarise students with major corpus resources and tools;
  pass on essential knowledge and skills for building DIY corpora;
  keep students up to date with the latest developments in corpus research;
  develop students' ability in corpus-based language studies.

Contents
1) Introducing corpus linguistics
2) Corpus design and types of corpora
3) Data capture and markup
4) Corpus annotation
5) Making statistical claims
6) Corpus analysis (1): concordance and wordlist
7) Corpus analysis (2): keyword analysis
8) Corpora in lexicographic and lexical studies
9) Corpora in grammatical studies
10) Corpora in diachronic studies
11) Corpora in language variation research
12) Corpora in sociolinguistic studies
13) Corpora in language education
14) Corpora in literary and stylistic studies
15) Corpora in critical discourse analysis
16) Corpora in contrastive and translation studies

Learning outcomes
On successful completion of the module, students will be able to
  understand the major theoretical frameworks in corpus linguistics and formulate research questions that are amenable to corpus research;
  think critically about the strengths and weaknesses of the corpus methodology and decide when and how to interface it with other methodologies;
  get familiar with major corpus resources and tools and develop DIY corpora when necessary;
  apply the corpus-based approach in their own research.

Teaching/learning strategies
With a dual focus on the “why” and the “how to” of corpus-based language studies, this practical module will be delivered through a series of lectures and hands-on lab sessions
The module also engages students in extensive reading and interaction with corpus data outside of class

Assessment
Option A
  A 1,000-word essay that critically reviews a corpus exploration tool or a corpus-based study (40%)
  A 2,500-word project report (60%)
Option B
  One 3,500-word essay based on a research project of your own choice (100%)

Deadline:
Submission
  A Word copy as an email attachment

Reading list
Set text
  McEnery, A., Xiao, R. and Tono, Y. (2006) Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge.
  Wynne, M. (2005) Developing Linguistic Corpora. Oxford: Oxbow Books. Available online at
Recommended reading
  See the module syllabus at the course website

Outline of this session
Lecture: introducing key concepts and debates in corpus linguistics
  What is and is not a corpus?
  Why use corpora?
  Corpora vs. intuitions
  The corpus methodology
  A brief history of Corpus Linguistics
  Nature and applications of corpus-based studies
Lab: testing your intuitions + exploring online resources

What is a corpus?
The word corpus comes from Latin (“body”) and the plural is corpora
A corpus is a body of naturally occurring language
  but rarely a random collection of text
Corpora “are generally assembled with particular purposes in mind, and are often assembled to be (informally speaking) representative of some language or text type.” (Leech 1992)
“A corpus is a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language or language variety.” (MXT 2006: 5)

What is not a corpus?
A list of words is not a corpus
  Building blocks of language
A text archive is not a corpus
  A random collection of texts
A collection of citations is not a corpus
  A short quotation which contains a word or phrase that is the reason for its selection
A collection of quotations is not a corpus
  A short selection from a text chosen on internal criteria by human beings
A text is not a corpus
  Intended to be read in different ways
The Web is not a corpus
  Its dimensions are unknown, it is constantly changing, and it was not designed from a linguistic perspective
Sinclair (2005)

What is a corpus for?
A corpus is made for the study of language in a broad sense
  To test existing linguistic theory and hypotheses
  To generate and verify new linguistic hypotheses
The purpose is reflected in a well-designed corpus

Why use corpora?
Even expert speakers have only a partial knowledge of a language
  A corpus can be more comprehensive and balanced
Even expert speakers tend to notice the unusual and think of what is possible
  A corpus can show us what is common and typical
Even expert speakers cannot quantify their knowledge of language
  A corpus can readily give us accurate statistics

Why use corpora?
Even expert speakers cannot remember everything they know
  A corpus can store and recall all the information that has been stored in it
Even expert speakers cannot make up natural examples
  A corpus can provide us with a vast number of examples in real communication contexts
Even expert speakers have prejudices and preferences, and every language has cultural connotations and underlying ideology
  A corpus can give you more objective evidence

Why use corpora?
Even expert speakers are not always available to be consulted
  A corpus can be made permanently accessible to all
Even expert speakers cannot keep up with language change
  A constantly updated corpus can reflect even recent changes in the language
Even expert speakers lack authority: they can be challenged by other expert speakers
  A corpus can encompass the actual language use of many expert speakers

Intuitions as an alternative
Intuitions are always useful in linguistics
  To invent (grammatical, ungrammatical, or questionable) example sentences for linguistic analysis
  To make judgments about the acceptability / grammaticality or meaning of an expression
  To help with categorization

Intuitions as an alternative
Intuitions should be applied with caution
  Possibly biased, as they are likely to be influenced by one's dialect or sociolect
  Introspective data is artificial and may not represent typical language use, as one is consciously monitoring one's language production
  Introspective data is decontextualized because it exists in the analyst's mind rather than in any real linguistic context
  Intuitions are not observable and verifiable by everyone, as corpora are
  Excessive reliance on intuitions blinds the analyst to the realities of language usage, because we tend to notice the unusual but overlook the commonplace
  There are areas in linguistics where intuitions cannot be used reliably, e.g. language variation, historical linguistics, register and style, first and second language acquisition
  Human beings have only the vaguest notion of the frequency of a construct or a word

Benefits of corpus data
Corpus data is more reliable
  A corpus pools together linguistic intuitions of a range of language speakers, which offsets the potential biases in intuitions of individual speakers
Corpus data is more natural
  It is used in real communications instead of being invented specifically for linguistic analysis
Corpus data is contextualized
  Attested language use which has already occurred in real linguistic context
Corpus data is quantitative
  Corpora can provide frequencies and statistics readily
Corpus data can find differences that intuitions alone cannot perceive
  E.g. synonyms totally, absolutely, utterly, completely, entirely

Corpora vs. intuitions
Not necessarily antagonistic; they corroborate each other and can be gainfully viewed as complementary
Armchair linguists and corpus linguists “need each other. Or better, the two kinds of linguists, wherever possible, should exist in the same body.” (Fillmore 1992)
“Neither the corpus linguist of the 1950s, who rejected intuitions, nor the general linguist of the 1960s, who rejected corpus data, was able to achieve the interaction of data coverage and the insight that characterize the many successful corpus analyses of recent years.” (Leech 1991)
The key to using corpus data is to find the balance between the use of corpus data and the use of one's intuitions

The corpus methodology
It is debatable whether CL is a methodology or a branch of linguistics
  CL goes well beyond this methodological role and has become an independent discipline
  In spite of the name, CL is indeed a methodology rather than an independent branch of linguistics in the same sense as phonetics, syntax, semantics or pragmatics
These latter areas of linguistics describe, or explain, a certain aspect of language use
Corpus linguistics, in contrast, is not restricted to a particular aspect of language - it can be employed to explore almost any area of linguistic research

A brief history of CL
The term corpus linguistics first appeared only in the early 1980s, but corpus-based language study has a substantial history
The history of CL can be split into two periods: before and after Chomsky

A brief history of CL
Before Chomsky
  Field linguists and linguists of the structuralist tradition used “shoebox corpora”: shoeboxes filled with paper slips
  Their methodology was essentially “corpus-based” in the sense that it was empirical and based on observed data
  The work of early corpus linguistics was underpinned by two fundamental, yet flawed assumptions
    The sentences of a natural language are finite.
    The sentences of a natural language can be collected and enumerated.
  Most linguists saw the “corpus” as the only source of linguistic evidence in the formation of linguistic theories

A brief history of CL
Chomsky revolution: between 1957 and 1965 Chomsky changed the direction of linguistics from empiricism towards rationalism
  “Any natural corpus will be skewed. Some sentences won't occur because they are obvious, others because they are false, still others because they are impolite. The corpus, if natural, will be so wildly skewed that the description would be no more than a mere list.” (Chomsky 1962)
  Our internal knowledge of language in the human brain (competence, I-language) replaces observed data (performance, E-language)
  Intuitions started to be relied on as evidence
Xiao, R. (2008) “Theory-driven corpus research: using corpora to inform aspect theory”. In A. Lüdeling & M. Kytö (eds.) Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter

A brief history of CL
Revival of CL
  Corpus research was continued in a few centres (Brown, Lancaster) in the 60s-70s
    The Brown University Standard Corpus of Present-day American English (Brown corpus)
    Lancaster-Oslo-Bergen Corpus of BrE (LOB)
  The hardware still imposed some restrictions until the real development started in the 1980s
  The marriage of corpora with computer technology rekindled interest in the corpus methodology
  Since then, the number and size of corpora and corpus-based studies have increased dramatically
  Nowadays, the corpus methodology enjoys widespread popularity, and has opened up or foregrounded many new areas of research

Areas that have used corpora
  Lexicography
  Lexical studies
  Grammatical studies
  Register/genre analysis
  Language variation
  Contrastive analysis
  Translation studies
  Language change
  Language teaching
  Semantics
  Pragmatics
  Stylistics
  Literary study
  Sociolinguistics
  Discourse analysis
  Forensic linguistics
  Computational linguistics

Nature of corpus-based approach
It is empirical, analysing the actual patterns of use from natural texts
It utilises a large and principled collection of natural texts as the basis for analysis
It makes extensive use of computers for analysis, using both automatic and interactive techniques
It integrates both quantitative and qualitative analytical techniques
(Biber et al 1998: 4-5)

Why use computers?
Development of computer technology has revived CL
Machine-readability is a de facto attribute of modern corpora
Electronic corpora have advantages unavailable to their “shoebox” ancestors
It is the use of computerized corpora, together with computer programs which facilitate linguistic analysis, that distinguishes modern electronic corpora from early drawer-cum-slip corpora

Why use computers?
Computerized corpora can be processed and manipulated rapidly at minimal cost
  E.g. searching, selecting, sorting and formatting
Computers can process machine-readable data accurately and consistently
Computers can avoid human bias in an analysis, thus making the result more reliable
Machine-readability allows further automatic processing to be performed on the corpus so that corpus texts can be enriched with various metadata and linguistic analyses
  Corpus markup and corpus annotation

A question for Deep Thought
“Alright,” said the computer Deep Thought.

“The Answer to the Great Question ...”
“Yes ...!”
“Of Life, the Universe and Everything ...” said Deep Thought.
“Yes ...!”
“Is ...”
“Yes ...! ...?”
“Forty-two,” said Deep Thought, with infinite majesty and calm.
It was a long time before anyone spoke.
“Forty-two!” yelled someone in the audience. “Is that all you've got to show for seven and a half million years' work?”
“I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question is.”
The Hitchhiker's Guide to the Galaxy by Douglas Adams
What can we learn from this story?

What corpora cannot do
Corpora do not provide negative evidence
  Cannot tell us what is possible or not possible
  Can show what is central and typical in language
Corpora can yield findings but rarely provide explanations for what is observed
  Interfacing other methodologies
The use of corpora as a methodology also defines the boundaries of any given study
  Importance of amenable research questions
The findings based on a particular corpus only tell us what is true in that corpus
  Generalisation vs. representativeness
See Unit B2 for pros and cons of corpora

Ask corpora the right questions
Corpus linguistics as a methodology is only one of the (many) ways of “doing linguistics”
The usefulness of corpora depends upon the research question being investigated
  “They are invaluable for doing what they do, and what they do not do must be done in another way.” (Hunston 2002: 20)
The development of the corpus-based approach as a tool in language studies has been compared to the invention of telescopes in astronomy
  If it is ridiculous to criticize a telescope for not being a microscope, it is equally pointless to criticize the corpus-based approach for not doing what it is not intended to do
It is up to you to formulate research questions amenable to corpus-based investigation and to decide how to combine corpora with other resources

Testing your intuitions with VIEW

Most common noun in English
Search for n*
Top 10: time, people, way, years, year, work, government, day, man, world

Most common noun in adverts
Search for nn* in w-advert
Top 20: hotel, centre, time, world, holiday, day, service, year, house, facilities, range, club, bar, years, information, rooms, people, city, life, castle

Top 10 adj. in nonfiction vs. fiction
Top 10 in Nonfiction: aggregate, regulatory, offline, Keynesian, non-executive, macroeconomic, no-arbitrage, nationalised, short-run, pioneering
Top 10 in Fiction: Sabine, narrowed, unsmiling, flushed, clammy, navy-blue, sidelong, muttered, strangled, froggy

Distribution of phrasal verbs

“Talk” as a noun/verb in different registers

Synonyms: utter vs. sheer

Semantic prosody of “caused”
Noun collocates of “CAUSE”: problems, damage, death, trouble, harm, concern, injury, problem, difficulties, loss, confusion, pain

“data”: singular or plural?
Per million words:
Singular: 776
  academic: 21
  misc: 9.2
  spoken: 1.9
  newspaper: 1.6
  fiction: 0.3
Per million words:
Plural: 1,035
  academic: 42.5
  misc: 8.8
  spoken: 0.2
  fiction/news: 0.1

How are women and men described?

reason for vs. reason to

Extra Practice with BNC VIEW
1) What are the top 5 modal verbs in English?
2) Is there any difference between the verbs destroy, ruin, and demolish? If so, what is it?
3) Do you think the adjectives in “utterly + adjective” have anything in common? If so, what is that?
4) Can we use the plural form of research, as in “his researches”?

Where to find what
  BYU-BNC
  Bank of English (56M sample)
  David Lee's CL bookmarks
  Corpus linguistics, translation, and language learning
  Corpus4u Community
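
For readers without access to an online interface, the lab ideas above (wordlists, concordances, collocates, and frequencies per million words) can be tried on a toy scale. The following is a minimal sketch in Python, not the method used by VIEW or BYU-BNC. The function names (tokenize, per_million, concordance, collocates) are hypothetical, the tokenizer is deliberately crude, and the three sample sentences are invented for illustration rather than taken from any corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def per_million(count, corpus_size):
    """Normalise a raw count to a frequency per million words."""
    return count * 1_000_000 / corpus_size


def concordance(tokens, node, width=4):
    """Return KWIC lines with `width` tokens of context on each side of a hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def collocates(tokens, node, span=3):
    """Count words found within `span` tokens to the left or right of the node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


# Invented sample text (an assumption for illustration, not corpus data).
sample = (
    "The storm caused damage to the harbour. "
    "The delay caused problems for the crew. "
    "Nobody knew what caused the damage."
)

tokens = tokenize(sample)
wordlist = Counter(tokens)  # a simple frequency wordlist

print(wordlist.most_common(3))
for line in concordance(tokens, "caused", width=2):
    print(line)
print(collocates(tokens, "caused", span=3).most_common(5))
print(per_million(wordlist["caused"], len(tokens)))
```

Normalising to a per-million figure is what makes counts comparable across registers or corpora of different sizes. On a sample this small the normalised figure is of course meaningless, and a real study would also filter out function words such as "the" before interpreting collocates.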
