Linear Discriminant Analysis Based on Clustering Regularization


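Linear Discriminant Analysis Based on Clustering Regularization

Discipline: Information and Communication Engineering
Graduate student: Wang Shuang (王爽)
Advisor: Prof. Pang Yanwei (庞彦伟)
School of Electronic Information Engineering, Tianjin University
December 2013

Declaration of Originality

I declare that the thesis submitted is the result of research work I carried out under the guidance of my advisor. Except where specifically marked and acknowledged in the text, the thesis contains no research results previously published or written by others, nor any material used to obtain a degree or certificate from Tianjin University or any other educational institution. Any contribution made to this research by colleagues who worked with me has been clearly stated and acknowledged in the thesis.

Signature of thesis author:          Date of signature (year / month / day):

Authorization for Use of Thesis Copyright

The author of this thesis fully understands the regulations of Tianjin University on retaining and using theses. Tianjin University is hereby authorized to include all or part of the thesis in relevant databases for retrieval, and to preserve and compile it by photocopying, reduced-size printing, scanning, or other means of reproduction for consultation and borrowing. The author agrees that the university may submit photocopies and disk copies of the thesis to the relevant national departments or institutions. (A confidential thesis is subject to this authorization after declassification.)

Signature of thesis author:          Signature of advisor:
Date of signature (year / month / day):          Date of signature (year / month / day):

Abstract (摘要)

In recent years, the rapid development of multimedia and network technology has driven a sharp increase in the volume of image data, so obtaining useful information from image data quickly and accurately has become an urgent problem. Dimensionality reduction, as one solution, is now a very active research direction. To date, the two most representative methods are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).

LDA is a supervised dimensionality reduction method. Its basic idea is to find an optimal projection direction such that, for the sample data projected onto it, the between-class scatter is maximized while the within-class scatter is minimized. When each class has only a few training samples, however, LDA overfits severely. The main cause is that the between-class and within-class scatter matrices computed from a limited number of training samples deviate substantially from the ideal between-class and within-class scatter matrices.

To address this problem, this thesis proposes making full use of the structural information of the given training data itself, without increasing the number of training samples. The k-means clustering algorithm is first applied to form new sample data, and the between-class and within-class scatter matrices of the new sample data are computed. The between-class scatter matrix of the new sample data is then used to regularize the between-class scatter matrix of the original sample data, and the within-class scatter matrix of the new sample data is likewise used to regularize the within-class scatter matrix of the original sample data. The contribution of the new between-class and within-class scatter matrices to the result is inversely proportional to the number of training samples in each class.

Finally, extensive experiments on the AR face database, the FERET face database, and the Carreira-Perpinan ear database demonstrate the advantages of the proposed algorithm.

Keywords: LDA, dimensionality reduction, feature extraction, face recognition

ABSTRACT

In recent years, multimedia and network techniques have developed rapidly, causing the amount of image data to increase at an amazing rate. How to obtain useful information quickly and accurately from large amounts of image data has therefore become an urgent problem, and dimension reduction, as one kind of solution, has become a very hot research topic. So far, there are two important methods: Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).

LDA is a supervised dimensionality reduction technique. Its main idea is to find an optimal projection direction and then project the sample data onto this direction so that the new between-class dispersion is largest and the new within-class dispersion is smallest simultaneously. However, when the number of training samples per class is small, LDA has a serious overfitting problem. The main reason is that the between-class and within-class scatter matrices computed from the limited number of training samples deviate greatly from the underlying ones.

To solve the above problem without increasing the number of training samples, we propose making use of the structure of the given training data. The k-means algorithm is used to generate new clustered data, the between-class and within-class scatter matrices of the new clustered data are calculated, and these are used to regularize the original between-class and within-class scatter matrices, respectively. The contributions are inversely proportional to the number of training samples per class. The advantages of the proposed method become more remarkable as the number of training samples per class decreases.

Experimental results on the AR face database, the FERET face database, and the Carreira-Perpinan ear database demonstrate the effectiveness of the proposed method.

KEY WORDS: LDA, cluster, dimension reduction, feature extraction, face recognition

Contents

Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Research Status at Home and Abroad
1.3 Research Content and Organization of This Thesis
1.4 Chapter Summary
Chapter 2 Common Image Preprocessing Algorithms
2.1 Introduction
2.2 Image Grayscale Conversion
2.3 z-score Standardization [18]
2.4 Histogram Equalization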
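The regularization idea summarized in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the construction details, so several choices below are assumptions made only for illustration and should not be read as the thesis's actual method: the "new sample data" are taken to be per-class k-means centroids, each regularized matrix is a convex combination of the original and the clustered scatter matrix, and the weight is `alpha = min(1, c / n)`, where `n` is the mean number of training samples per class. The function names and the parameters `k`, `c`, and `ridge` are hypothetical.

```python
import numpy as np


def scatter_matrices(X, y):
    """Between-class and within-class scatter of X (n, d) with labels y (n,).

    Both matrices are divided by n so that matrices computed from data sets
    of different sizes are on a comparable scale (a choice made for this sketch).
    """
    n, d = X.shape
    mean_all = X.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for cls in np.unique(y):
        Xc = X[y == cls]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        centered = Xc - mean_c
        S_w += centered.T @ centered
    return S_b / n, S_w / n


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def cluster_regularized_scatter(X, y, k=2, c=1.0):
    """Blend the original scatter matrices with those of clustered data."""
    S_b, S_w = scatter_matrices(X, y)
    classes = np.unique(y)
    new_X, new_y = [], []
    for cls in classes:
        Xc = X[y == cls]
        # ASSUMPTION: the new samples are the k-means centroids of each class.
        cents = kmeans(Xc, min(k, len(Xc)))
        new_X.append(cents)
        new_y.append(np.full(len(cents), cls))
    S_b_new, S_w_new = scatter_matrices(np.vstack(new_X), np.concatenate(new_y))
    # ASSUMPTION: the weight is inversely proportional to the mean class size.
    n_per_class = np.mean([np.sum(y == cls) for cls in classes])
    alpha = min(1.0, c / n_per_class)
    S_b_reg = (1 - alpha) * S_b + alpha * S_b_new
    S_w_reg = (1 - alpha) * S_w + alpha * S_w_new
    return S_b_reg, S_w_reg, alpha


def lda_directions(S_b, S_w, n_components, ridge=1e-6):
    """Leading eigenvectors of (S_w + ridge * I)^{-1} S_b as projection directions."""
    d = S_w.shape[0]
    M = np.linalg.solve(S_w + ridge * np.eye(d), S_b)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)
    return vecs[:, order[:n_components]].real
```

A possible use would be to call `cluster_regularized_scatter(X, y)` on the training features and pass the two returned matrices to `lda_directions`. With fewer samples per class, `alpha` grows, so the clustered scatter matrices contribute more, which mirrors the inverse proportionality stated in the abstract.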
