A Generic Framework for Event Detection in Various Video Domains

Tianzhu Zhang 1,2, Changsheng Xu 1,2, Guangyu Zhu 3, Si Liu 1,2,3, Hanqing Lu 1,2
1 National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China
2 China-Singapore Institute of Digital Media, Singapore 119615, Singapore
3 Department of Electrical and Computer Engineering, National University of Singapore, Singapore
tzzhang, csxu, sliu, elezhug@nus.edu.sg

ABSTRACT

Event detection is essential for the extensively studied video analysis and understanding area. Although various approaches have
been proposed for event detection, there is a lack of a generic event detection framework that can be applied to various video domains (e.g. sports, news, movies, surveillance). In this paper, we present a generic event detection approach based on semi-supervised learning and Internet vision. Concretely, a Graph-based Semi-Supervised Multiple Instance Learning (GSSMIL) algorithm is proposed to jointly explore small-scale expert labeled videos and large-scale unlabeled videos to train the event models to detect video event boundaries. The expert labeled videos are obtained from the analysis and alignment of well-structured video-related text (e.g. movie scripts, web-casting text, closed captions). The unlabeled data are obtained by querying related events from a video search engine (e.g. YouTube) in order to give more distributive information for event modeling. A critical issue of GSSMIL in constructing
a graph is the weight assignment, where the weight of an edge specifies the similarity between two data points. To tackle this problem, we propose a novel Multiple Instance Learning Induced Similarity (MILIS) measure by learning instance-sensitive classifiers. We perform thorough experiments in three
popular video domains: movies, sports and news. The results compared with the state-of-the-art are promising and demonstrate that our proposed approach is performance-effective.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Abstracting methods, Indexing methods

General Terms
Algorithms, Measurement, Performance, Experimentation

Keywords
Event Detection, Graph, Multiple Instance Learning, Semi-supervised Learning, Broadcast Video, Internet, Web-casting Text

Permission to make digital or hard copies of all or part of this work for personal
or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'10, October 25-29, 2010, Firenze, Italy.
Copyright 2010 ACM 978-1-60558-933-6/10/10 ...$10.00.

1. INTRODUCTION

With the explosive growth of multimedia content on broadcast and Internet, it is urgently required to make the unstructured multimedia data accessible and searchable with
great ease and flexibility. Event detection is particularly crucial to understanding video semantic concepts for video summarization, indexing and retrieval purposes. Therefore, extensive research efforts have been devoted to event detection for video analysis [23, 26, 33]. Most of the existing event detection approaches rely on video features and domain knowledge, and employ labeled samples to train event models. The semantic gap between low-level features and high-level events of different kinds of videos, the ambiguous video cues, background clutter and variant changes of camera motion, etc., further complicate the video analysis and impede the implementation of event detection systems. Moreover, due to the diverse domain knowledge in different video genres and insufficient training data, it is difficult to build a generic framework to unify event detection in different video domains (e.g.
sports, news, movies, surveillance) with a high accuracy.

To solve these issues, most techniques for event detection currently rely on video content and supervised learning in the form of labeled video clips for particular classes of events. It is necessary to label a large number of samples in the training process to achieve good detection performance. In order to reduce this human labeling effort, one can exploit the expert supervisory information in text sources [3, 15, 23], such as movie scripts, web-casting text and closed captions, which can provide useful information to locate possible events
in video sequences. However, it is very costly and time-consuming to collect large-scale training data by text analysis, and there are still many videos without the corresponding text information for use. The Internet, nevertheless, is a rich information source with many event videos taken under various conditions and roughly annotated. For example, the surrounding text is an important clue used by search engines. Our intuition is that it is convenient to obtain a large-scale collection
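The core idea behind GSSMIL as described in the abstract, propagating event labels from a small expert-labeled set to large unlabeled data over a similarity-weighted graph, follows the general pattern of graph-based semi-supervised learning. The sketch below is a minimal, generic label-propagation illustration in that spirit (Zhu and Ghahramani style), not the paper's GSSMIL algorithm: the RBF edge weights merely stand in for the learned MILIS measure, and `rbf_similarity`, `gamma`, and the toy two-cluster data are illustrative assumptions.

```python
import numpy as np

def rbf_similarity(X, gamma=1.0):
    """Edge weights W_ij = exp(-gamma * ||x_i - x_j||^2); W_ii = 0.

    A stand-in for the paper's MILIS similarity, which would instead be
    derived from instance-sensitive classifiers.
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq_dists)
    np.fill_diagonal(W, 0.0)
    return W

def label_propagation(X, y, n_iters=100):
    """Diffuse labels over the graph; y holds a class index for labeled
    points and -1 for unlabeled ones. Expert labels are clamped after
    every diffusion step so they are never overwritten."""
    classes = np.unique(y[y >= 0])
    labeled = y >= 0
    F = np.zeros((len(X), len(classes)))      # soft label matrix
    F[labeled, :] = np.eye(len(classes))[y[labeled]]
    W = rbf_similarity(X)
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    for _ in range(n_iters):
        F = P @ F                             # propagate along graph edges
        F[labeled, :] = np.eye(len(classes))[y[labeled]]  # clamp expert labels
    return F.argmax(axis=1)

# Toy example: two well-separated clusters, one "expert labeled" point each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(3, 0.3, (10, 2))])
y = -np.ones(20, dtype=int)
y[0], y[10] = 0, 1
pred = label_propagation(X, y)
print(pred)
```

A MILIS-style learned, instance-sensitive similarity would slot in exactly where `rbf_similarity` is called when building the graph.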
