ImranElbassuoniCastillo - Extracting Information Nuggets from Disaster-Related Messages in Social Media (2013) - 0(1)


Imran, Elbassuoni, Castillo, Diaz and Meier 2013. Extracting Information Nuggets.
Proceedings of the 10th International ISCRAM Conference, Baden-Baden, Germany, May 2013. T. Comes, F. Fiedrich, S. Fortier, J. Geldermann and L. Yang, eds.

Extracting Information Nuggets from Disaster-Related Messages in Social Media

Muhammad Imran [1], University of Trento
Shady Elbassuoni, American Univ. of Beirut
Carlos Castillo, QCRI
Fernando Diaz, Microsoft Research
Patrick Meier, QCRI

ABSTRACT

Microblogging sites such as Twitter can play a vital role in spreading information during “natural” or man-made disasters. But the volume and velocity of tweets posted during crises today tend to be extremely high, making it hard for disaster-affected communities and professional emergency responders to process the information in a timely manner. Furthermore, posts tend to vary highly in terms of their subjects and usefulness; from messages that are entirely off-topic or personal in nature, to messages containing critical information that augments situational awareness. Finding actionable information can accelerate disaster response and alleviate both property and human losses. In this paper, we describe automatic methods for extracting information from microblog posts. Specifically, we focus on extracting valuable “information nuggets”, brief, self-contained information items relevant to disaster response. Our approach leverages machine learning methods for classifying posts and for information extraction. Our results, validated over one large disaster-related dataset, reveal that a careful design can yield an effective system, paving the way for more sophisticated data analysis and visualization systems.

Keywords

Supervised classification, Information Extraction, Social Media, Twitter

INTRODUCTION

Microblogging platforms have become an important way to share information on the Web, especially during time-critical events such as “natural” and man-made disasters. In recent years, Twitter [2] has been used to spread news about casualties and damages, donation efforts and alerts, including multimedia information such as videos and photos (Balana, 2012; Pew 2012; Blanchard, Carvin, Whitaker, Fitzgerald, Herman and Humphrey, 2010). Given the importance of on-topic tweets for time-critical situational awareness, disaster-affected communities and professional responders may benefit from using an automatic system to extract relevant information from the Twitter Firehose. [3]

An automatic system for disaster-related information extraction requires two components: Classification of tweets and Extraction from tweets. First, because the messages generated during a disaster vary greatly in value, an automatic system needs to filter out messages that do not contribute to situational awareness. These include those that are of personal nature and those not relevant to the disaster. As a result, we design a system for detecting informative messages. Once a system has detected tweets likely to contain relevant information, it must analyze candidate tweets to decide the type of information to extract (e.g. donation offers, casualty reports). The final system output consists of information nuggets, brief, self-contained pieces of information most likely to augment situational awareness. [4]

This paper is organized as follows. First, a short overview of the dataset is provided. Next, the ontology and process for generating training data for the automatic classifiers and extractors are described. The latter are then evaluated on a real-world dataset. The paper concludes by comparing the findings with those of previous research.

THE JOPLIN DATASET

The dataset consists of tweets posted during the Joplin 2011 tornado that struck Joplin, Missouri in the late afternoon of Sunday, May 22, 2011. The dataset was originally constructed by researchers at the University of Colorado at Boulder [5]. The 206,764 unique tweets were selected by monitoring the Twitter Streaming API using the hashtag #joplin a few hours after the tornado hit. This monitoring process continued until the number of tweets about the tornado became particularly sparse [6].

[1] Work done while the author was at QCRI.
[2] An online microblogging service that enables millions of users to share text-based short messages.
[3] http:/iR
[4] While we describe our system for the case of tweets, it can be applied to any sort of social media without any fundamental changes to the system components.

DISASTER-RELATED MESSAGE ONTOLOGY

The system needs to detect messages that may add situational awareness information, that is, tweets that provide “tactical, actionable information that can a
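
As a rough illustration of the two components named in the Introduction (first filter out messages that do not contribute to situational awareness, then decide the type of information in the remaining ones), the following Python sketch may help. It is not the authors' implementation: the paper relies on trained machine learning classifiers, whereas this sketch substitutes simple keyword rules so that it stays self-contained. The keyword lists, the function names, the `Nugget` type and the sample tweets are all hypothetical and invented for illustration; only the `#joplin` hashtag and the two example types (donation offers, casualty reports) come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Nugget:
    info_type: str  # illustrative label, e.g. "donation" or "casualty"
    text: str       # the tweet the nugget was taken from


# Hypothetical keyword lists standing in for trained classifiers.
TYPE_KEYWORDS = {
    "donation": ("donate", "donation", "volunteer"),
    "casualty": ("killed", "injured", "dead", "missing"),
}


def matches_hashtag(tweet: str, hashtag: str = "#joplin") -> bool:
    """Collection step: keep tweets that carry the monitored hashtag."""
    tokens = (tok.lower().rstrip(".,!?:;") for tok in tweet.split())
    return hashtag.lower() in tokens


def information_type(tweet: str) -> Optional[str]:
    """Type-detection stand-in: first type whose keywords appear, else None."""
    lowered = tweet.lower()
    for info_type, keywords in TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return info_type
    return None


def is_informative(tweet: str) -> bool:
    """Filtering stand-in: a tweet counts as informative if any type matches."""
    return information_type(tweet) is not None


def extract_nuggets(tweets: Iterable[str]) -> List[Nugget]:
    """Run collection filter, informativeness filter, then type detection."""
    nuggets = []
    for tweet in tweets:
        if not matches_hashtag(tweet) or not is_informative(tweet):
            continue
        nuggets.append(Nugget(information_type(tweet), tweet))
    return nuggets


# Invented sample messages, not taken from the Joplin dataset.
SAMPLE_TWEETS = [
    "Please donate blankets for tornado victims #joplin",
    "so bored today lol #joplin",
    "Reports of 3 people injured near the hospital #Joplin",
    "Great game last night!",
]

if __name__ == "__main__":
    for nugget in extract_nuggets(SAMPLE_TWEETS):
        print(nugget.info_type, "|", nugget.text)
```

In a system closer to the one described, `is_informative` and `information_type` would each be replaced by a supervised classifier trained on labeled tweets; the surrounding control flow would stay the same.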

