
An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition

Kiyoshi Sudo, Satoshi Sekine, and Ralph Grishman
Department of Computer Science
New York University
715 Broadway, 7th Floor, New York, NY 10003 USA
{sudo,sekine,grishman}@cs.nyu.edu

Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, July 2003, pp. 224-231.

Abstract

Several approaches have been described for the automatic unsupervised acquisition of patterns for information extraction. Each approach is based on a particular model for the patterns to be acquired, such as a predicate-argument structure or a dependency chain. The effect of these alternative models has not been previously studied. In this paper, we compare the prior models and introduce a new model, the Subtree model, based on arbitrary subtrees of dependency trees. We describe a discovery procedure for this model and demonstrate experimentally an improvement in recall using Subtree patterns.

1 Introduction

Information Extraction (IE) is the process of identifying events or actions of interest and their participating entities from a text. As the field of IE has developed, the focus of study has moved towards automatic knowledge acquisition for information extraction, including domain-specific lexicons (Riloff, 1993; Riloff and Jones, 1999) and extraction patterns (Riloff, 1996; Yangarber et al., 2000; Sudo et al., 2001). In particular, methods have recently emerged for the acquisition of event extraction patterns without corpus annotation in view of the cost of manual labor for annotation. However, there has been little study of alternative representation models of extraction patterns for unsupervised acquisition.

In the prior work on extraction pattern acquisition, the representation model of the patterns was based on a fixed set of pattern templates (Riloff, 1996), or predicate-argument relations, such as subject-verb, and object-verb (Yangarber et al., 2000). The model of our previous work (Sudo et al., 2001) was based on the paths from predicate nodes in dependency trees.

In this paper, we discuss the limitations of prior extraction pattern representation models in relation to their ability to capture the participating entities in scenarios. We present an alternative model based on subtrees of dependency trees, so as to extract entities beyond direct predicate-argument relations. An evaluation on scenario-template tasks shows that the proposed Subtree model outperforms the previous models.

Section 2 describes the Subtree model for extraction pattern representation. Section 3 shows the method for automatic acquisition. Section 4 gives the experimental results of the comparison to other methods and Section 5 presents an analysis of these results. Finally, Section 6 provides some concluding remarks and perspective on future research.

2 Subtree model

Our research on improved representation models for extraction patterns is motivated by the limitations of the prior extraction pattern representations. In this section, we review two of the previous models in detail, namely the Predicate-Argument model (Yangarber et al., 2000) and the Chain model (Sudo et al., 2001).

The main cause of difficulty in finding entities by extraction patterns is the fact that the participating entities can appear not only as an argument of the predicate that describes the event type, but also in other places within the sentence or in the prior text. In the MUC-3 terrorism scenario, WEAPON entities occur in many different relations to event predicates in the documents. Even if WEAPON entities appear in the same sentence with the event predicate, they rarely serve as a direct argument of such predicates (e.g., “One person was killed as the result of a bomb explosion.”).

Predicate-Argument model. The Predicate-Argument model is based on a direct syntactic relation between a predicate and its arguments¹ (Yangarber et al., 2000). In general, a predicate provides a strong context for its arguments, which leads to good accuracy. However, this model has two major limitations in terms of its coverage, clausal boundaries and embedded entities inside a predicate's arguments.

¹ Since the case marking for a nominalized predicate is significantly different from the verbal predicate, which makes it hard to regularize the nominalized predicates automatically, the constraint for the

Figure 1² shows an example of an extraction task in the terrorism domain where the event template consists of perpetrator, date, location and victim. With the extraction patterns based on the Predicate-Argument model, only perpetrator and victim can be extracted. The location (downtown Jerusalem) is embedded as a modifier of the noun (heart) within the prepositional phrase, which is an adjunct of the main predicate, triggered³. Furthermore, it is not clear whether the extracted entities are related to the same event, because of the clausal boundaries.⁴
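
The coverage limitation of the Predicate-Argument model can be illustrated with a minimal sketch. The dependency tree below is hand-built and simplified for the example sentence quoted earlier (“One person was killed as the result of a bomb explosion.”). The Node class, the relation labels, and both helper functions are assumptions made for illustration; they are not the paper's implementation, nor the output of any parser.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One word in a dependency tree (hypothetical minimal structure)."""
    word: str
    relation: str = "root"  # relation to the parent node (labels are assumed)
    children: List["Node"] = field(default_factory=list)


def direct_arguments(predicate: Node) -> List[Tuple[str, str]]:
    """Predicate-Argument view: only the predicate's immediate dependents."""
    return [(child.relation, child.word) for child in predicate.children]


def depth_of(root: Node, target: str, depth: int = 0) -> int:
    """Return the depth of the first node whose word is `target`, or -1."""
    if root.word == target:
        return depth
    for child in root.children:
        found = depth_of(child, target, depth + 1)
        if found != -1:
            return found
    return -1


# Hand-built, simplified parse (an assumption, not parser output).
bomb = Node("bomb", "mod")
explosion = Node("explosion", "of", [bomb])
result = Node("result", "as", [explosion])
person = Node("person", "subj", [Node("One", "mod")])
killed = Node("killed", "root", [person, result])

print(direct_arguments(killed))  # [('subj', 'person'), ('as', 'result')]
print(depth_of(killed, "bomb"))  # 3
```

In this sketch the WEAPON entity (bomb) sits three levels below the event predicate, so a pattern restricted to the predicate's direct dependents never reaches it. A pattern representation that can include deeper parts of the tree could.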
