Visual Object Classification: Multiple Kernel Multiple Instance Learning


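University of Science and Technology of China, Master's Thesis
Visual Object Classification: Multiple Kernel Multiple Instance Learning
Name: Wang Mengyue (王孟月)
Degree applied for: Master
Major: Signal and Information Processing
Advisors: Chen Weidong (陈卫东); Song Yan (宋彦)
2011-05-06

摘要 (Abstract)

Visual object classification automatically classifies the objects in a set of images, or decides whether a given image belongs to a given category, and locates and extracts the objects of interest in the image. It is an active and difficult problem in computer vision and pattern recognition, and it matters for image content understanding and image retrieval. Real-world images vary widely in viewpoint, brightness, and scale, and their volume grows daily, so traditional manual extraction of visual objects is very difficult. Machine learning methods are therefore needed to classify and learn semantic concepts from the low-level visual features of images and to build complex visual object classification models. Image content is currently represented mostly by low-level visual features such as color, texture, shape, and the spatial relationships between objects, but a large "semantic gap" separates the visual features a computer can express from the actual semantics of an image.

This thesis studies visual object classification. It improves existing multiple instance learning algorithms to address two shortcomings: the time and labor that manual labeling costs traditional machine learning methods, and the limited semantic descriptive power of the "Bag of Words" image representation. The main contributions are as follows.

1. Multiple instance learning combined with segmented regions. Building on the MILES algorithm, this method combines multiple instance learning with segmentation for object detection and extraction. On top of the "Bag of Words" representation, it treats an image as a bag and the visual words representing the image as the instances in the bag, takes the visual vocabulary as the feature space, and maps each bag into that space by counting its instances. Because the 1-norm SVM yields sparse solutions, it is then used to select important features and classify images at the same time. To extract objects, the instances of images classified as positive are classified in turn, the locations of the positive instances serve as object "seed" points, and these seeds are combined with the image segmentation result to extract the object. Experiments on the Caltech 101 benchmark demonstrate the effectiveness of the algorithm.

2. Multiple instance learning based on visual phrases. In the "Bag of Words" model, visual words are produced by unsupervised clustering alone, which ignores the spatial information among visual words and leaves the representation with limited semantic descriptive power and weak discrimination. This chapter proposes a higher-order visual feature to replace visual words: visual phrases with semantic discriminative power, built from the spatial relationships among visual words, which can improve the accuracy of the "Bag of Words" representation. Because classifiers based on the traditional "Bag of Words" model are easily degraded by background, occlusion, and large scale changes, the thesis combines visual phrases with multiple instance learning and proposes a multiple visual phrase learning method for image classification, so that the final model reflects the regional characteristics of each image category. Experiments on the Caltech 101 and Scene 15 benchmarks show good classification performance, with relative accuracy gains of about 9% and 7% over existing algorithms.

3. Multiple kernel multiple instance learning. A visual object often needs several kinds of features to describe it, and classification with a single feature can be inaccurate. Multiple instance learning can handle weakly labeled images with high accuracy, but it usually describes each instance with only one feature. The thesis therefore uses a multiple kernel approach to introduce several features into multiple instance learning, and proposes a multiple kernel multiple instance learning framework for multi-feature learning in the multiple instance setting. The framework describes each instance with several features and learns the weight of each feature during training. It combines the strengths of the different features and achieves high classification accuracy. Experiments on the Caltech 101 benchmark show good classification performance.

Keywords: visual object classification; image classification; visual phrase; multiple instance learning; multiple kernel learning; multiple kernel multiple instance learning

ABSTRACT

Visual object classification automatically classifies visual objects or determines which category an image belongs to, and locates and extracts the region of interest in the image. This is an active and difficult problem in computer vision and pattern recognition, and it is of great significance for the analysis and understanding of image content. In real-world scenes, visual objects vary in viewpoint, brightness, and scale; in addition, the number of images grows day by day, which makes traditional manual object extraction difficult. Machine learning methods are therefore introduced to classify and learn semantic concepts from the low-level visual features of images, and to build complex visual object classification models. Low-level visual features such as color, texture, shape, and spatial relationships are commonly used to represent image content. However, a large semantic gap exists between the low-level features represented by computers and the high-level semantics understood by humans.

This thesis studies visual object classification. It mainly addresses the manual effort that traditional learning methods require and the limited discriminative ability of the bag-of-words model, and it improves existing multiple instance learning methods. The main contributions are as follows.

1. Multiple instance learning combined with segmentation. Based on the MILES algorithm, we propose a novel multiple instance learning approach that incorporates segmentation for object detection and extraction. The approach uses the "Bag of Words" model. The whole image is regarded as a multiple instance bag, and the visual words that represent the image are regarded as the instances in the bag. The approach maps each bag into a feature space defined by the visual vocabulary via the histogram over visual words. Next, a 1-norm SVM is applied to select important features and classify images simultaneously. We then classify the instances of the bags classified as positive and take the positive instances as object "seed" points. Finally, segmentation is combined with these seeds to extract the object. Experiments on the Caltech 101 dataset show that the approach is effective.

2. Multiple instance learning based on visual phrases. The bag of visual words has limited descriptive and discriminative ability, and traditional learning methods may suffer from background clutter and large appearance variations. We therefore propose MVPL (Multiple Visual Phrase Learning), a method for image classification. In MVPL, visual phrases are first generated from over-segmented image regions of homogeneous appearance and the visual words within each region, which may provide enhanced descriptive ability by introducing spatial coherency. A devised MIL algorithm is then applied to learn efficiently from the weakly labeled image data. Experimental results on the benchmark datasets Caltech 101 and Scene 15 show that the proposed method outperforms state-of-the-art algorithms by about 9% and 7%, respectively.

3. Multiple kernel multiple instance learning. A visual object is often associated with multiple visual measurements. If the object is represent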
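The bag mapping in the first contribution (an image as a bag, visual words as instances, a histogram over the vocabulary as the bag's feature vector) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the names `bag_histogram` and `nearest_word`, the two-dimensional toy descriptors, and the use of squared Euclidean distance to assign a descriptor to a visual word are all hypothetical. The 1-norm SVM step that would follow is omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, vocabulary: List[Vector]) -> int:
    """Index of the visual word closest to the descriptor (assumed assignment rule)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: squared_distance(descriptor, vocabulary[k]),
    )


def bag_histogram(bag: List[Vector], vocabulary: List[Vector]) -> List[int]:
    """Map a bag of instance descriptors to counts over the visual vocabulary."""
    counts = [0] * len(vocabulary)
    for descriptor in bag:
        counts[nearest_word(descriptor, vocabulary)] += 1
    return counts


# Toy data (hypothetical): a 3-word vocabulary and one image with 4 local descriptors.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
bag = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]

histogram = bag_histogram(bag, vocabulary)
print(histogram)  # [1, 2, 1]
```

Each image then becomes one fixed-length vector regardless of how many instances it contains, which is what lets a sparse linear classifier select vocabulary entries and classify images in the same step.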

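The idea behind the third contribution, describing each instance with several features and weighting them, can be illustrated with a weighted sum of base kernels, the basic building block of multiple kernel learning. The sketch below is an assumption-laden illustration rather than the thesis's framework: the feature names `color` and `texture`, the RBF base kernel, and the function `combined_kernel` are hypothetical, and the weights are fixed by hand here, whereas the thesis learns them during training.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two feature vectors of equal length."""
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared)


def combined_kernel(
    x: Dict[str, Vector],
    y: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Convex combination of one RBF kernel per feature type.

    Weights must be non-negative; they are normalized to sum to 1 so the
    result stays a valid kernel with values in (0, 1].
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("kernel weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one kernel weight must be positive")
    return sum(
        (w / total) * rbf_kernel(x[name], y[name])
        for name, w in weights.items()
    )


# Two instances, each described by two hypothetical feature types.
instance_a = {"color": (0.2, 0.4), "texture": (1.0, 0.0, 0.5)}
instance_b = {"color": (0.3, 0.1), "texture": (0.8, 0.1, 0.5)}

# Hand-picked weights (an assumption); a learner would estimate these.
weights = {"color": 0.7, "texture": 0.3}

print(combined_kernel(instance_a, instance_b, weights))
```

Keeping the weights non-negative and normalized means the combination of positive semidefinite base kernels is itself positive semidefinite, so it can be passed to any kernel classifier in place of a single-feature kernel.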