Automatic Labeling of People in the News Based on Multiple Modalities


中国科技论文在线

Cross-Modality Based Face Naming for News Image Collection

Jinye Peng, Xueping Su, Xiaoyi Feng, Jun Wu, Jianping Fan
(School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 71019)

Foundations: Research Fund for the Doctoral Program of Higher Education of China (No. 20096102110025).
Brief author introduction: Jinye Peng, male, 1964, Professor. Research interests: image retrieval, face recognition, machine learning.

Abstract: To automatically mine the underlying relationships between famous persons in daily news, for example, building
a news-person network with faces as icons to facilitate face-based person finding, we need a tool that automatically labels faces in news images with their real names. This paper studies the problem of linking names with faces in large-scale collections of news images with captions. In our previous work, we proposed a method called Person-based Subset Clustering, which is mainly based on clustering all face images derived from the same name. The location where a name appears in a caption, as well as the visual structural information within a news image, provides informative cues as to who is really in the associated image. By combining the domain knowledge from the captions and the corresponding images, we propose a novel cross-modality approach to further improve the performance of linking names with faces. The experiments are performed on data sets containing approximately half
a million news images from Yahoo! News, and the results show that the proposed method achieves significant improvement over clustering-only methods.

Key words: Image processing; Cross-modality; Rank aggregation; Face naming

0 Introduction

Words and pictures are often naturally linked. Examples include collections of museum material, digital library collections, images collected from the web with their enclosing web pages, and captioned news images. The amount of multi-modal data accessible on the web is enormous and growing exponentially. With the growing popularity of
sites like Flickr, Google Video, and YouTube, the amount of visual data associated with some sort of text will increase in the coming years. News images, an important source for stories about people, are regarded as one of the most challenging data sets for face recognition. However, face recognition in news images (see Fig. 1) is difficult with traditional methods. Faces in news images are captured in real-life conditions, and low resolution, occlusion, nonrigid deformations, and a large variety of poses, illuminations, and expressions make face recognition unreliable. On the other hand, the context in news collections provides powerful cues as to exactly who is in the associated image. In general, a person appears visually when her/his name is mentioned in the caption. Therefore, the common approach to finding a person is to search for his/her name in the captions associated with news images [1].

Fig.1 Sample faces from news images

However, such a text-based approach is likely to yield incorrect results, since a name in the caption may have no corresponding face in the news image. A more difficult problem arises when multiple names in the caption correspond to multiple faces in the news image, since ambiguity arises in establishing the relation between names and faces (Fig. 2).

Fig.2 Sample news photograph and its associated caption (multiple faces are associated with multiple names).

Two important observations should be noted about the results of text-based systems:
(a) News images that share the same name in their associated captions often also share the same face.
(b) We found that, among the face images associated with a given name, images of the same face far outnumber those of other faces (Fig. 3).
We therefore assume that the faces in the largest cluster belong to the given name.

Moreover, news images and their associated captions provide complementary information:
(a) The location at which a name appears in a caption provides powerful cues as to who is in the associated news image. For example, the earlier the name appears, the higher the probability that its corresponding face appears in the news image.
(b) The visual structural and layout information provides powerful cues as to who is named in the associated caption [3]. For example, the larger the area of a face, the more likely it is that the person's name appears in the news caption.

In this paper, by exploiting cross-modality domain knowledge, we achieve much better results on a large-scale real-world dataset. It should be noted that the proposed method is not a solution to the general face recognition problem. Rather, on news image data sets that contain names and faces, it performs better than caption-only systems that ignore visual information entirely. Besides that, the only requirement of our proposed method is t
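
The heuristics stated in the introduction (faces in the largest cluster belong to the given name; earlier names and larger faces are more likely to correspond) can be illustrated with a minimal Python sketch. This is not the paper's rank-aggregation method, which the excerpt does not describe. The function names, the input layout (precomputed cluster ids, a dictionary of face areas), and the simple rank-by-rank pairing rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def largest_cluster(cluster_ids: List[int]) -> int:
    """Return the id of the most populated cluster.

    Assumption: cluster_ids[i] is the cluster assigned to the i-th face
    image collected for one name; the clustering itself is not shown.
    """
    return Counter(cluster_ids).most_common(1)[0][0]


def pair_names_with_faces(
    caption: str,
    names: List[str],
    face_areas: Dict[str, float],
) -> List[Tuple[str, str]]:
    """Pair names with faces by matching their ranks (hypothetical rule)."""
    # Keep only the names that actually occur in the caption.
    present = [name for name in names if name in caption]
    # Cue (a): a name mentioned earlier in the caption ranks higher.
    ranked_names = sorted(present, key=caption.index)
    # Cue (b): a face covering a larger area ranks higher.
    ranked_faces = sorted(face_areas, key=face_areas.get, reverse=True)
    # zip stops at the shorter list, so surplus names or faces stay unpaired.
    return list(zip(ranked_names, ranked_faces))


if __name__ == "__main__":
    # Made-up example data, not taken from the paper's data set.
    caption = "President Alice Smith greets Bob Jones at the summit."
    names = ["Bob Jones", "Alice Smith"]
    face_areas = {"face_0": 1200.0, "face_1": 4800.0}
    print(pair_names_with_faces(caption, names, face_areas))
    print(largest_cluster([0, 1, 1, 2, 1, 0]))
```

In this sketch the two cues are used only to order names and faces before pairing them one-to-one; a fuller system would presumably combine them with the face-clustering evidence rather than rely on ranks alone.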
