Enriching Word Alignment with Linguistic Tags

Linguistic Data Consortium, IBM
Xuansong Li, Niyu Ge, Stephen Grimes, Stephanie M. Strassel, Kazuaki Maeda
xuansong, sgrimes, strassel, maeda

Outline
- Motivations
- Approaches and methodologies
- Linguistic tags
- Inter-annotator agreement
- Conclusions

Motivations
- To improve automatic word alignment quality
- To reduce the amount of data needed for statistical models
- Supervised models outperform traditional models
- A part of GALE by DARPA: manually aligned and tagged data

Chinese-English WA: Unified Annotation Scheme
Diagram labels:
- alignment framework
- tagging framework
- minimum translation units
- linguistic tags
- attachment approach
- minimum match approach

Minimum Match Approach
Minimum translation units: atomic
- One to One: 我 买 鲜 花 。 / I buy fresh flowers .
- Many to One: 快乐 / Happy
- Many to Many: 春节 / Chinese New Year

Attachment Approach
- Unattach sentence-level/discourse-level unaligned words
  我们也 没有想去伤害他 / We didn't want to hurt him
- Attach phrase-level unaligned words
  他带了书 / He brought the books
Diagram labels (for unaligned words): unaligned, attached, unattached

Tagging Framework
- Tag unaligned words
- Tag aligned links

Methodologies: using linguistic tags
Goal: tackle insertion/deletion problems
- Tags for unattached words (2 types)
- Tags for attached words (12 types)
- Specific-feature links: Chinese DE 的 (3)
- Context-free links (2)
- Context-dependent links (3)

Context-free Links
- Semantic links: 学校 / school; 太行山 / Taihang Mountain; 屹立 / standing tall
- Function links: 在 / at; 于 / on

Context-dependent Links
- Grammatically inferred link: 把这项成果变成 / turn this success into
- Contextually inferred link: 欢迎收看 CCTV / Welcome to CCTV

Specific Links: 的 (DE)
Link types: DE-clause, DE-modifier, DE-possessive
Examples:
- 经历过战争的人 / those who have experienced wars
- 新技术的实质 / the essence of the new technology
- 将军的高度警惕 / great attention from the general

Aligned & Unaligned Word Tags
- Omni-func-preposition
- Tense/Passive
- Possessive
- Measure word
- Clause marker
- Rhetorical
- Sentence marker
- Co-reference
- Determiner
- TO-infinitive
- DE-modifier
- Local context
- Context-obligatory
- Non-context-obligatory

Examples: Word Tags
Word Tag                 Example
Possessive               the head of the branch
Measure-word             一根(one) 柱子(pillar) / one pillar
Tense/Passive            提交(submit)的报告(report) / report submitted
Context-obligatory       不(not)好(easy)掌握(control),凭(by)经验(experience) / It is not easy to control, you do by experience
Non-context-obligatory   他(he)都已经(already)签(sign)合同了(contract) / He already signed a contract

Inter-Annotator Agreement (1): Chinese-English Alignment
Data Source   Char Count   Precision   Recall   F-score
NW1           306          97.3%       95.7%    96.5%
NW2           185          95.3%       96.2%    95.7%
NW3           365          90.4%       91.2%    90.8%
NW4           431          90.8%       92.6%    91.2%

Inter-Annotator Agreement (2): Chinese-English Tagging
Data Source   Chi. Char   Eng. Word   Link Count   Same Tag   Agree
NW1           306         233         186          683        94.2%
NW2           185         131         105          392        93.1%

Conclusion
- Unified annotation scheme
- Manually aligned and tagged corpora at LDC
- Annotation guidelines available at: http://projects.ldc.upenn.edu/gale/task_specifications/
- Annotation toolkit available soon
- On-going project: more data in pipeline
- Acknowledgements to GALE of DARPA

Thank You!

Chinese-English Aligned and Tagged Corpora at LDC
Genre                    File    Char      Segment
Newswire                 579     225645    5015
Broadcast News           28      183400    6376
Broadcast Conversation   34      306497    12050
Weblog                   747     229799    9382
Total                    1388    945341    32823

Annotation Rate
- First pass alignment: 10,000w/10h
- Second pass alignment: 10,000w/6h
- First pass tagging: 10,000w/7h
- Second pass tagging: 10,000w/5h
Average skill, speed and difficulty level
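
Note on the agreement figures

The slides report precision, recall and F-score for inter-annotator agreement on alignment but do not say how the figures were computed. The sketch below shows one common way to obtain such numbers. It rests on two assumptions that are not taken from the slides: one annotator's links are treated as the reference, and two links count as matching only if they cover exactly the same source and target tokens. The names `Link`, `make_link` and `agreement` are hypothetical, as is the second annotator's grouping in the example.

```python
from typing import FrozenSet, Set, Tuple

# A link pairs a set of source-token indices with a set of target-token indices.
# Frozensets keep links hashable, so one-to-one, many-to-one and many-to-many
# links can all be stored in the same set.
Link = Tuple[FrozenSet[int], FrozenSet[int]]


def make_link(source: Tuple[int, ...], target: Tuple[int, ...]) -> Link:
    """Build a link from source and target token indices (hypothetical helper)."""
    return (frozenset(source), frozenset(target))


def agreement(reference: Set[Link], candidate: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) of candidate links against reference links.

    Assumption: a candidate link is correct only if an identical link
    (same source indices, same target indices) exists in the reference.
    """
    matched = len(reference & candidate)
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(reference) if reference else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Example sentence from the Minimum Match Approach slide:
#   source tokens: 我(0) 买(1) 鲜(2) 花(3) 。(4)
#   target tokens: I(0) buy(1) fresh(2) flowers(3) .(4)
# Annotator A links every token one to one, as on the slide.
annotator_a = {make_link((i,), (i,)) for i in range(5)}

# Annotator B (invented for illustration) groups 鲜花 / fresh flowers
# into a single many-to-many link.
annotator_b = {
    make_link((0,), (0,)),
    make_link((1,), (1,)),
    make_link((2, 3), (2, 3)),
    make_link((4,), (4,)),
}

p, r, f = agreement(reference=annotator_a, candidate=annotator_b)
print(f"precision={p:.3f} recall={r:.3f} f-score={f:.3f}")
# prints: precision=0.750 recall=0.600 f-score=0.667
```

Under exact matching, a single disagreement about grouping costs several links at once: annotator B's one many-to-many link fails to match, and annotator A's two one-to-one links go unmatched as well. A scheme that gives partial credit for overlapping links would produce higher figures from the same annotations.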