
Understanding deep features with computer-generated imagery

Mathieu Aubry (Ecole des Ponts ParisTech / UC Berkeley, mathieu.aubry@imagine.enpc.fr) and Bryan C. Russell (Adobe Research)

Abstract

We introduce an approach for analyzing the variation of features generated by convolutional neural networks (CNNs) with respect to scene factors that occur in natural images. Such factors may include object style, 3D viewpoint, color, and scene lighting configuration. Our approach analyzes CNN feature responses corresponding to different scene factors by controlling for them via rendering using a large database of 3D CAD models. The rendered images are presented to a trained CNN, and responses for different layers are studied with respect to the input scene factors. We perform a decomposition of the responses based on knowledge of the input scene factors and analyze the resulting components. In particular, we quantify their relative importance in the CNN responses and visualize them using principal component analysis. We show qualitative and quantitative results of our study on three CNNs trained on large image datasets: AlexNet [18], Places [40], and Oxford VGG [8]. We observe important differences across the networks and CNN layers for different scene factors and object categories. Finally, we demonstrate that our analysis based on computer-generated imagery translates to the network representation of natural images.

1. Introduction

The success of convolutional neural networks (CNNs) [18, 21] raises fundamental questions on how their learned representations encode variations in visual data. For example, how are different layers in a deep network influenced by different scene factors, the task for which the network was trained, or the choice of network architecture? These questions are important, as CNNs with different architectures, trained or fine-tuned for different tasks, have been shown to perform differently [17, 40] or to have different feature response characteristics [39]. An analysis of the features may help with understanding the trade-offs across different trained networks and may inform the design of new architectures. It may also help with the choice of CNN features for tasks where training or fine-tuning a network is not possible, e.g. due to lack of labeled data.

Prior work has focused on a part-based analysis of the learned convolutional filters. Examples include associating filters with input image patches having maximal response [12], deconvolution starting from a given filter response [38], or masking the input to recover the receptive field of a given filter [39] to generate "simplified images" [6, 31]. Such visualizations typically reveal the parts of an object [38] (e.g. the "eye" of a cat) or scene [39] (e.g. a "toilet" in a bathroom). While these visualizations reveal the nature of the learned filters, they largely ignore the question of the dependence of the CNN representation on continuous factors that may influence the depicted scene, such as 3D viewpoint, scene lighting configuration, and object style.

In this paper, we study systematically how different scene factors that arise in natural images are represented in a trained CNN. Example factors may include those intrinsic to an object or scene, such as category, style, and color, and extrinsic ones, such as 3D viewpoint and scene lighting configuration. Studying the variations associated with such factors is a nontrivial task, as it requires (i) input data where the factors can be independently controlled and (ii) a procedure for detecting, visualizing, and quantifying each factor in a trained CNN.

To overcome the challenges associated with obtaining input data, we leverage computer-generated (CG) imagery to study trained CNNs. CG images offer several benefits. First, there are large stores of 3D content online (e.g. Trimble 3D Warehouse), with ongoing efforts to curate and organize the data for research purposes (e.g. ModelNet [35]). Such data spans many different object categories and styles. Moreover, in generating CG images we have control over all rendering parameters, which allows us to systematically and densely sample images for any given factor. Databases of natural images captured in controlled conditions and spanning different factors of variation, e.g. the NORB [22], ETH-80 [23], and RGB-D object [20] datasets, where different objects are rotated on a turntable and lighting is varied during image capture, are difficult and costly to collect. Moreover, they do not offer the same variety of object styles present in 3D model collections, nor the flexibility given by rendering.

Given a set of rendered images generated by varying one or more factors, we analyze the responses of a layer for a trained CNN (e.g. "pool5" of AlexNet [18]). We perform a decomposition of the responses based on knowledge of the input scene factors, which allows us to quantify the relative importance of each factor in the representation. Moreover, we visualize the responses via principal component analysis (PCA).

Contributions. Our tech
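The decomposition and PCA visualization described above can be sketched in code. This is a minimal illustration assuming layer responses have already been extracted for a grid of rendered images varying two factors (here, object style and 3D viewpoint); the array shapes, factor names, and random placeholder features are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of an ANOVA-style decomposition of CNN responses by scene factor,
# in the spirit of the analysis described above. Features are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_styles, n_views, n_dims = 10, 36, 256   # object styles x 3D viewpoints x feature dims
# Stand-in for layer responses (e.g. "pool5") of each rendered image:
F = rng.standard_normal((n_styles, n_views, n_dims))

mean = F.mean(axis=(0, 1))                       # grand mean response
style_part = F.mean(axis=1) - mean               # style component, (n_styles, n_dims)
view_part = F.mean(axis=0) - mean                # viewpoint component, (n_views, n_dims)
residual = F - mean - style_part[:, None, :] - view_part[None, :, :]

# Relative importance of each factor: its share of the total variance.
# Cross terms vanish because each component sums to zero over its factor.
total = ((F - mean) ** 2).sum()
importance = {
    "style": n_views * (style_part ** 2).sum() / total,
    "viewpoint": n_styles * (view_part ** 2).sum() / total,
    "residual": (residual ** 2).sum() / total,
}

# PCA (via SVD) of one factor's component for visualization:
# plotting the first two columns of the embedding traces the viewpoint path.
U, S, Vt = np.linalg.svd(view_part, full_matrices=False)
view_embedding = U * S
```

With real features, `F` would hold responses to renders of the same models under controlled factor sweeps; the three importance shares always sum to one, making networks and layers directly comparable.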
