NAACL HLT 2009

Active Learning for Natural Language Processing (ALNLP-09)

Proceedings of the Workshop

June 5, 2009
Boulder, Colorado

Production and Manufacturing by
Omnipress Inc.
2600 Anderson Street
Madison, WI 53707
USA

Endorsed by the following ACL Special Interest Groups:
SIGNLL, Special Interest Group for Natural Language Learning
SIGANN, Special Interest Group for Annotation

©2009 The Association for Computational Linguistics

Order copies of this and other ACL proceedings from:

Association for Computational Linguistics (ACL)
209 N. Eighth Street
Stroudsburg, PA 18360
USA
Tel: +1-570-476-8006
Fax: +1-570-476-0860
acl@aclweb.org

ISBN 978-1-932432-40-4

Introduction

Welcome to the workshop on Active Learning for Natural Language Processing!

We started organizing this workshop in mid-2008 after strong encouragement in response to some of our own work in the area. As we gathered members of the program committee, the timeliness of the topic resonated with several of them: the growing body of knowledge on active learning, and on active learning for NLP in particular, makes this topic worth exploring in a focused workshop rather than in isolated papers at occasional, far-flung conferences.

Labeled data is a prerequisite for many popular algorithms in natural language processing and machine learning. While it is possible to obtain large amounts of annotated data for well-studied languages in well-studied domains and on well-studied problems, labeled data are rarely available for less common languages, domains, or problems. Unfortunately, obtaining human annotations for linguistic data is labor-intensive and typically the costliest part of acquiring an annotated corpus. It has been shown that active learning can reduce annotation costs without sacrificing quality. Yet, although diverse work over the past decade has demonstrated the advantages of active learning for corpus annotation and NLP applications, active learning is still not widely used in ongoing data annotation tasks. Much of the machine learning literature on the topic has focused on active learning for classification problems, with less attention devoted to the kinds of problems encountered in NLP. Related topics such as distributed "human computation", cost-sensitive machine learning, and semi-supervised learning of all kinds are also growing in number as we search for the best ways to overcome the data acquisition bottleneck.

We were interested in bringing together researchers to explore the challenges and opportunities of active learning for NLP tasks, language acquisition, and language learning, and we have been rewarded with excellent submissions and a promising program. The workshop received sixteen submissions, eight of which are included in the final program. Two of the accepted papers are short papers that address ongoing work and pertinent issues. We hope that this gathering and these proceedings begin to shed more light on active learning for NLP classification tasks, sequence labeling, parsing, semantics, and other more complex tasks. The papers in the program also begin to address issues involving the application of active learning in real annotation projects.

We are especially grateful to the diverse and helpful program committee, whose reviews were careful and thoughtful. We are also grateful to all of the researchers who submitted their work for consideration.

For the record, more information about the workshop is available online at http://nlp.cs.byu.edu/alnlp/.

Best regards,
Eric Ringger, Robbie Haertel, and Katrin Tomanek

Organizers:
Eric Ringger, Brigham Young University (USA)
Robbie Haertel, Brigham Young University (USA)
Katrin Tomanek, University of Jena (Germany)

Program Committee:
Shlomo Argamon, Illinois Institute of Technology (USA)
Jason Baldridge, University of Texas at Austin (USA)
Markus Becker, SPSS (UK)
Ken Church, Microsoft Research (USA)
Hal Daumé, University of Utah (USA)
Robbie Haertel, Brigham Young University (USA)
Ben Hachey, University of Edinburgh (UK)
Udo Hahn, University of Jena (Germany)
Eric Horvitz, Microsoft Research (USA)
Rebecca Hwa, University of Pittsburgh (USA)
Ashish Kapoor, Microsoft Research (USA)
Mark Liberman, University of Pennsylvania/LDC (USA)
Prem Melville, IBM T.J. Watson Research Center (USA)
Ray Mooney, University of Texas at Austin (USA)
Miles Osborne, University of Edinburgh (UK)
Eric Ringger, Brigham Young University (USA)
Kevin Seppi, Brigham Young University (USA)
Burr Settles, University of Wisconsin (USA)
Victor Sheng, New York University (USA)
Katrin Tomanek, University of Jena (Germany)
Jingbo Zhu, Northeastern University (China)

Invited Speakers:
Burr Settles, University of Wisconsin (USA)
Robbie Haertel, Brigham Young University (USA)

Table of Contents

Active Learning for Anaphora Resolution
Caroline Gasperin . . . 1

On Proper Unit Selection in Active Learning: Co-Selection Effects for Named Entity Recognition
Katrin Tomanek,
