Research on Selective Ensemble Transfer Algorithms (Degree Thesis, Engineering)


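Institution code: 10701
Student ID: 1011120727
Classification number: TP391
Security level: Public

Title (Chinese): 选择性集成迁移算法研究
Title (English): Research on Selective Ensemble Transfer Algorithms
Author: Liu Tingting (刘婷婷)
Supervisor: Prof. Wang Shuang (王爽)
Discipline category: Engineering
Discipline and major: Circuits and Systems
Date of submission: March 2013

Xidian University (西安电子科技大学) Declaration of Originality of the Degree Thesis

In keeping with the university's rigorous academic style and high standards of scientific ethics, I declare that the thesis submitted here presents the research work I carried out, and the results I obtained, under the guidance of my supervisor. To the best of my knowledge, except where specifically marked in the text and listed in the acknowledgements, the thesis contains no research results already published or written by others, nor any material used to obtain a degree or certificate from Xidian University or any other educational institution. Every contribution made to this research by colleagues who worked with me is clearly stated and acknowledged in the thesis. If anything in the thesis or the application materials is untrue, I accept full legal responsibility.

Author's signature:        Date:

Xidian University Statement on Authorization for Use of the Thesis

I fully understand the regulations of Xidian University on the retention and use of degree theses: the intellectual property of thesis work carried out by a graduate student while studying for the degree belongs to Xidian University. The university has the right to retain copies of the submitted thesis and to allow it to be consulted and borrowed; it may publish all or part of the thesis and may preserve it by photocopying, reduced-size printing, or other means of reproduction. I also guarantee that any article I write after graduation that builds on the research topic of this thesis will name Xidian University as the affiliated institution.

Author's signature:        Date:
Supervisor's signature:        Date:

Abstract (摘要)

Most traditional machine learning classification algorithms require the training data and the test data to follow the same distribution: a classification model is built from the existing labeled samples and then used to classify the test samples. In practice, this same-distribution assumption often cannot be met. When the data distribution changes, the learned model no longer applies well to the test data, so traditional algorithms have to start from scratch and relabel a large amount of training data. Labeling new data costs a great deal of money, labor, and material resources, and discarding the large amount of outdated training data drawn from a different distribution is also a waste of resources. Transfer learning becomes especially important in this setting, because it can extract knowledge from the data of one task to help learn a new task. Transfer learning is a new machine learning framework that aims to apply knowledge from one environment to the domains and tasks of a new environment, and it therefore does not depend on the same-distribution assumption. This thesis gives a fairly comprehensive overview of research on transfer learning and, in combination with ensemble learning, makes the following contributions to transfer algorithms:

(1) A selective ensemble transfer learning algorithm is proposed. The source domain is first screened using the information in the labeled target-domain samples. The screened source domain is then randomly sampled to obtain N source subsets, and these subsets are further selected according to their empirical error on the target-domain training samples. Finally, each selected source subset is combined with the target-domain training samples to form one of several training sets, a classifier is trained on each, and the target-domain test set is predicted by majority vote. The algorithm makes full use of the useful information and the diversity in the source domain, and restructuring the training sets raises the contribution of the target-domain training samples within each training set; it achieves a certain degree of effectiveness.

(2) An ensemble transfer learning algorithm based on dimensionality reduction is proposed. For classifying samples with a very large number of features, reducing the feature dimension can narrow the difference between the source and target domains, so that the label information of the source domain transfers better, and it also lowers the time complexity of the algorithm. The new algorithm draws T bootstrap samples from the source-domain data and combines each with the target-domain test data to form T data sets. Each data set is reduced in dimension by SVD, the target test data are predicted by K-nearest neighbors in each of the resulting low-dimensional spaces, and the predictions are finally combined by ensemble voting. With ensemble learning and dimensionality reduction combined, the new algorithm shows fairly good performance.

(3) An ensemble transfer learning algorithm based on semi-supervised learning is proposed. The first two algorithms both build their models from labeled training data only and then use them to predict the test data. The new algorithm brings unlabeled samples from the target domain into training, adopts a semi-supervised self-training scheme, and uses the idea of an ensemble built by dynamically restructuring the data sets. It prunes the source-domain samples that differ greatly from the target domain while adding unlabeled target-domain samples to enlarge the target training set. Compared with some existing transfer algorithms, it improves classification performance to a certain extent.

Keywords: transfer learning; ensemble learning; selective; dimensionality reduction; semi-supervised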
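The selection steps of algorithm (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the base classifier or the screening rule, so the sketch assumes a nearest-centroid classifier and screens out source samples whose labels disagree with a model trained on the labeled target samples. The function names and the parameters `n_subsets`, `n_keep`, and `subset_frac` are hypothetical, and class labels are assumed to be non-negative integers.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid classifier (assumed base learner): one mean per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    """Assign each row of X to the class of its closest centroid."""
    classes, centroids = model
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]

def majority_vote(votes):
    """votes has shape (n_classifiers, n_samples); labels are non-negative ints."""
    return np.array([np.bincount(col).argmax() for col in votes.T])

def selective_ensemble_transfer(Xs, ys, Xt, yt, X_test,
                                n_subsets=10, n_keep=5, subset_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)

    # Step 1 (assumed screening rule): keep source samples whose label agrees
    # with a classifier trained only on the labeled target samples.
    keep = predict_centroids(fit_centroids(Xt, yt), Xs) == ys
    if not keep.any():          # fall back to the full source set
        keep[:] = True
    Xs, ys = Xs[keep], ys[keep]

    # Step 2: draw N random subsets of the screened source data.
    size = max(1, int(subset_frac * len(Xs)))
    subsets = [rng.choice(len(Xs), size=size, replace=False)
               for _ in range(n_subsets)]

    # Step 3: rank the subsets by empirical error on the target training data.
    errors = [np.mean(predict_centroids(fit_centroids(Xs[i], ys[i]), Xt) != yt)
              for i in subsets]
    best = np.argsort(errors)[:n_keep]

    # Step 4: combine each selected subset with the target training data,
    # train one classifier per combined set, and predict the test data.
    votes = []
    for b in best:
        i = subsets[b]
        model = fit_centroids(np.vstack([Xs[i], Xt]), np.concatenate([ys[i], yt]))
        votes.append(predict_centroids(model, X_test))

    # Step 5: majority vote over the selected classifiers.
    return majority_vote(np.array(votes))
```

Calling `selective_ensemble_transfer(Xs, ys, Xt, yt, X_test)` returns one predicted label per row of `X_test`. An odd `n_keep` avoids ties in two-class problems; with ties, `np.bincount(...).argmax()` picks the smaller label.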

Abstract

Most traditional machine learning algorithms assume that the training and test data lie in the same feature space and follow the same distribution. A classification model is built from the existing labeled training samples and then used to predict the test data. In practice, however, this same-distribution assumption often cannot be satisfied. When the data distribution changes, the learned model may no longer apply well to the test data, and traditional machine learning algorithms then have to start from scratch and label a large amount of training data again, which is expensive and time-consuming. This leaves plenty of labeled examples that follow a similar but different distribution, and throwing them away entirely is a waste. In this situation, transfer learning becomes important because it can leverage the knowledge in these existing data.

Transfer learning, as a new learning framework, aims at building a system that applies knowledge and skills learned in previous tasks to novel tasks. Thus, transfer learning does not make the identical-distribution assumption that traditional machine learning algorithms make. In this thesis, we comprehensively introduce the state of research on transfer learning and combine it with ensemble learning to propose three different methods. The achievements are as follows:

(1) We propose a selective ensemble transfer algorithm. An initial screening of the source data, guided by the labeled target samples, removes the source samples that differ greatly from the target domain. N source subsets are then obtained by random sampling from the screened source data, and subsets are chosen according to their errors on the target training data. Finally, each chosen subset is combined with the target training data to compose a new training set, the corresponding classifiers are trained, and the test data are predicted by majority voting. The algorithm makes full use of the useful information and the diversity in the source domain data, and restructuring the training sets raises the contribution of the target training samples; it achieves a certain degree of classification effectiveness.

(2) We propose an ensemble transfer algorithm based on dimension reduction, for the situation where there are many labeled source domain samples but no labeled target samples. For high-dimensional samples, dimension reduction not only reduces the difference between the source data and the target data, so that the label information of the source domain transfers better, but also decreases the time complexity. The algorithm first obtains T source subsets by bootstrap sampling and combines each subset with the test data to compose T new data sets. Each data set is reduced in dimension by SVD, k-Nearest Neighbor predicts the test data in each corresponding low-dimensional space, and the final result is obtained by a majority vote of the T classifiers.

(3) Both of the former two algorithms build their models from labeled training data to predict the test data. We propose an ensemble transfer algorithm based on semi-supervised learning, which introduces unlabeled target samples
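The bootstrap, SVD, and k-Nearest Neighbor pipeline of algorithm (2) can be sketched in Python roughly as follows. This is an illustration under stated assumptions rather than the thesis code: it uses an uncentered truncated SVD, Euclidean distance, and non-negative integer labels, and the function names and the parameters `T`, `k_dims`, and `n_neighbors` are hypothetical choices.

```python
import numpy as np

def svd_reduce(X, k_dims):
    """Project the rows of X onto the top-k right singular vectors (no centering)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k_dims].T

def knn_predict(X_train, y_train, X_query, n_neighbors):
    """Plain k-NN with Euclidean distance; labels are non-negative ints."""
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])

def svd_knn_ensemble(Xs, ys, X_test, T=11, k_dims=2, n_neighbors=5, seed=0):
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(T):
        # Bootstrap sample of the source data (drawn with replacement).
        idx = rng.integers(0, len(Xs), size=len(Xs))

        # Combine the bootstrap sample with the unlabeled test data and reduce
        # both together, so they share one low-dimensional space.
        Z = svd_reduce(np.vstack([Xs[idx], X_test]), k_dims)
        Z_src, Z_test = Z[:len(idx)], Z[len(idx):]

        # k-NN in the low-dimensional space, trained on the source labels only.
        votes.append(knn_predict(Z_src, ys[idx], Z_test, n_neighbors))

    # Majority vote across the T classifiers.
    votes = np.array(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Reducing the source sample and the test data jointly is what lets the source labels carry over: both sets are expressed in the same basis, so distances between them are meaningful. No target labels are used at any step, which matches the setting described for algorithm (2).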
