Research on Handwritten Numeral Recognition Based on Principal Component Analysis and Multi-Classifier Combination


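Summary

With the advent of computers and the rapid development of information technology, character recognition has grown from a reading aid into an important means of processing data and information, and it is now very widely used. Handwritten numeral recognition, a branch of optical character recognition, still has no satisfactory solution. Building on principal component analysis and multi-classifier combination, this thesis studies image preprocessing, feature extraction, and classifier design for handwritten numeral recognition.

Preprocessing of handwritten numeral images comprises binarization, numeral string segmentation, character slant correction, and normalization. The thesis discusses the gray-level histogram method, which effectively determines the gray threshold used in binarization; introduces the vertical projection method, which segments a numeral string at split points found by a valley-point search algorithm; analyzes the recursive slant-correction method, which effectively corrects characters slanted by less than a right angle; and studies position normalization, size normalization, and stroke-width normalization. Stroke-width normalization is based on mathematical morphology and is carried out by skeletonizing a character and then dilating it.

The handwritten numeral features are the principal component feature, the character height-to-width ratio, and the Euler feature. Features of different types capture different aspects of a numeral character. The study shows that the principal component feature describes statistical information about the structure of a numeral, and that the dimension of the feature vector can be determined from the mean squared reconstruction error of each numeral class or from the eigenvalues of its covariance matrix; that the height-to-width ratio after slant correction effectively separates the digit “1” from the other digits; and that the Euler feature reflects the topology of a handwritten numeral and effectively distinguishes the digit “3” from the digit “8”. The thesis also estimates the probability density of the height-to-width ratio for every numeral class with the Parzen window method and the maximum-likelihood method.

The handwritten numeral classifiers are of two kinds: single classifiers and combined classifiers. The thesis introduces the Bayes classifier, built around the Bayes decision rule, which suits cases where the feature dimension is low and the probability density is easy to estimate; studies the minimum reconstruction bias classifier, which corresponds to principal component analysis and reaches a recognition rate of 87.90%; analyzes the three-layer feedforward neural network classifier, which has good nonlinear mapping ability and whose performance depends closely on the number of hidden units and the training stop error; and proposes principles and methods for combining multiple classifiers. The central question of the combination principle is how to resolve the conflict when the member classifiers disagree. Classifiers of the same type are combined by voting or by linear combination, with a correct recognition rate of up to 87.73%; classifiers of different types are combined by multiplication, with a correct recognition rate of up to 90.73%. The thesis also discusses rejecting numerals whose class is ambiguous, which meets the demand for high recognition accuracy in practical applications.

Keywords: optical character recognition; handwritten numeral recognition; principal component analysis; multi-classifier combination

Abstract

With the emergence of computers and the rapid development of information technology, character recognition has changed from a reading aid into an important means of processing data and information, and it has been applied widely in practice. Handwritten numeral recognition, one domain of Optical Character Recognition (OCR), is still an open problem. Based on Principal Component Analysis (PCA) and multi-classifier combination, this thesis studies image preprocessing, feature extraction, and classifier design for handwritten numeral recognition.

Handwritten numeral image preprocessing comprises binarization, numeral string segmentation, character slant correction, and normalization. This thesis discusses the gray histogram method, which effectively determines the gray threshold for binarizing a handwritten numeral image; introduces the vertical projection approach, which finds segmentation points with a valley-searching algorithm in order to segment a numeral string; analyzes the recursive slant-correction algorithm, which effectively corrects slanted handwritten numerals whose slant angles are smaller than a right angle; and studies location normalization, size normalization, and stroke-thickness normalization. Stroke-thickness normalization, based on mathematical morphology, first skeletonizes the numerals and then dilates them.

Handwritten numeral features comprise the PCA feature, the character height-width ratio, and the Euler feature. Features of different types represent different aspects of a numeral character. The research shows that the PCA feature describes statistical information about the structure of a numeral, and that the dimension of the feature vector can be determined from the reconstruction square error or from the eigenvalues of the covariance matrix of each numeral class; that the height-width ratio of a numeral after slant correction effectively distinguishes the digit “1” from the other digits; and that the Euler feature describes the topological characteristics of a handwritten numeral and effectively separates the digit “3” from the digit “8”. This thesis applies the Parzen-window approach and maximum-likelihood estimation to estimate the probability density of the height-width ratio in every numeral class.

Handwritten numeral classifiers comprise single classifiers and combined classifiers. This thesis introduces the Bayes classifier, whose kernel is the Bayes decision rule and which can be applied when the feature dimension is low and the density is easy to estimate; studies the minimum reconstruction bias classifier, which corresponds to PCA and has a recognition rate of 87.90%; analyzes the three-layer feedforward neural network, which has a good capacity for nonlinear mapping and whose classification performance is closely related to the number of hidden neurons and the training stop error; and puts forward the principle and approach of multi-classifier combination. The central problem of the combination principle is how to resolve the conflict when component classifiers disagree. The combining methods for component classifiers of the same type are majority voting and linear combination, with a maximum recognition rate of 87.73%. For component classifiers of different types, the combining method is multiplication, with a maximum recognition rate of 90.73%. This thesis also discusses having the recognition system refuse to classify a numeral whose class attribute is not definite, an approach that satisfies the need for high reliability in practical applications.

Key words: optical character recognition; handwritten numeral recognition; principal component analysis; multi-classifier combination

China Three Gorges University Degree Thesis: Declaration of Originality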
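The vertical projection method named in the abstract can be illustrated with a small sketch. The abstract does not give the valley-search details, so the code below assumes the simplest reading: a column counts as a valley when its foreground pixel count is at or below a threshold. The function names and the `valley_max` parameter are hypothetical.

```python
from typing import List, Tuple


def vertical_projection(image: List[List[int]]) -> List[int]:
    """Count foreground pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    return [sum(row[x] for row in image) for x in range(len(image[0]))]


def segment_columns(image: List[List[int]], valley_max: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, one per character.

    Assumption: a column is a valley when its projection is <= valley_max.
    """
    profile = vertical_projection(image)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > valley_max and start is None:
            start = x                      # a character begins here
        elif count <= valley_max and start is not None:
            segments.append((start, x))    # a valley closes the character
            start = None
    if start is not None:                  # character touching the right edge
        segments.append((start, len(profile)))
    return segments
```

Touching or overlapping digits produce no empty columns, which is presumably why the thesis searches for valley points rather than only for gaps; raising `valley_max` is a crude stand-in for that search.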
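The abstract states that the Euler feature separates “3” from “8”. A minimal sketch of why: the Euler number of a binary image is the number of connected foreground components minus the number of holes, so a clean “3” gives 1 and a clean “8” gives -1. The connectivity choice (8-connected foreground, 4-connected background) and all names below are assumptions, not details taken from the thesis.

```python
from typing import List, Tuple

N4: List[Tuple[int, int]] = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8: List[Tuple[int, int]] = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def _count_regions(grid: List[List[int]], target: int,
                   neighbours: List[Tuple[int, int]]) -> int:
    """Count connected regions of cells equal to `target` by iterative flood fill."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] != target or seen[y][x]:
                continue
            regions += 1
            seen[y][x] = True
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for dy, dx in neighbours:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return regions


def euler_number(image: List[List[int]]) -> int:
    """Foreground components (8-connected) minus holes (4-connected background)."""
    w = len(image[0])
    # Pad with background so everything outside the character is one region.
    padded = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in image] + [[0] * (w + 2)]
    components = _count_regions(padded, 1, N8)
    holes = _count_regions(padded, 0, N4) - 1   # drop the outer background
    return components - holes
```

Real handwriting can break a loop or close a gap, so the feature is only as reliable as the preprocessing that precedes it.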
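The abstract pairs PCA with what it calls the minimum reconstruction bias classifier. The sketch below assumes this means fitting one PCA subspace per digit class and assigning a sample to the class whose subspace reconstructs it with the smallest squared error. It uses toy 2-D vectors and a pure-Python power iteration in place of a linear algebra library; the names `fit`, `reconstruction_error`, and `classify` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Model = Tuple[Vector, List[Vector]]  # (class mean, orthonormal principal axes)


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _top_axes(cov: List[Vector], k: int, iters: int = 200) -> List[Vector]:
    """Leading eigenvectors of a symmetric matrix by power iteration with deflation."""
    d = len(cov)
    mat = [row[:] for row in cov]
    axes: List[Vector] = []
    for _ in range(min(k, d)):
        # Deterministic, non-uniform start; a start exactly orthogonal to the
        # leading eigenvector would defeat plain power iteration.
        v = [1.0 / (i + 1) for i in range(d)]
        for _ in range(iters):
            w = [_dot(row, v) for row in mat]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:          # no variance left to explain
                return axes
            v = [x / norm for x in w]
        eigenvalue = _dot(v, [_dot(row, v) for row in mat])
        axes.append(v)
        # Deflate: remove the component just found before searching for the next.
        mat = [[mat[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return axes


def fit(train: Dict[str, List[Vector]], k: int) -> Dict[str, Model]:
    """Fit one mean and up to k principal axes per class."""
    models: Dict[str, Model] = {}
    for label, samples in train.items():
        n = len(samples)
        mean = [sum(col) / n for col in zip(*samples)]
        centered = [[x - m for x, m in zip(s, mean)] for s in samples]
        d = len(mean)
        cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
        models[label] = (mean, _top_axes(cov, k))
    return models


def reconstruction_error(x: Vector, model: Model) -> float:
    """Squared distance between x and its projection onto the class subspace."""
    mean, axes = model
    centered = [a - m for a, m in zip(x, mean)]
    residual = centered[:]
    for v in axes:
        coeff = _dot(centered, v)
        residual = [r - coeff * vi for r, vi in zip(residual, v)]
    return _dot(residual, residual)


def classify(x: Vector, models: Dict[str, Model]) -> str:
    """Pick the class whose subspace reconstructs x with the smallest error."""
    return min(models, key=lambda label: reconstruction_error(x, models[label]))
```

For real digit images the vectors would be flattened, normalized bitmaps and `k` would be chosen from the reconstruction error or the eigenvalue spectrum, as the abstract describes; a library eigen-solver would replace the power iteration.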
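The combination rules listed in the abstract (voting, linear combination, multiplication, and rejection of ambiguous numerals) can be sketched as follows. The sketch assumes each member classifier outputs a per-class score that behaves like a posterior probability; the function names, the weights, and the `reject_below` threshold are hypothetical, and the thesis may define its rejection criterion differently.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

Scores = Dict[str, float]  # class label -> score from one classifier


def majority_vote(outputs: List[Scores]) -> Optional[str]:
    """Each classifier votes for its top class; return None on a tie."""
    votes = Counter(max(s, key=s.get) for s in outputs)
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def linear_combination(outputs: List[Scores], weights: List[float]) -> Scores:
    """Weighted sum of the member scores for each class."""
    return {c: sum(w * s[c] for w, s in zip(weights, outputs)) for c in outputs[0]}


def product_rule(outputs: List[Scores]) -> Scores:
    """Multiply the member scores per class, then renormalize to sum to 1."""
    raw = {c: math.prod(s[c] for s in outputs) for c in outputs[0]}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()} if total > 0 else raw


def decide(scores: Scores, reject_below: float) -> Optional[str]:
    """Return the best class, or None (reject) when its score is too low."""
    label = max(scores, key=scores.get)
    return label if scores[label] >= reject_below else None


# Three classifiers disagree about whether a sample is a "3" or an "8".
outputs = [{"3": 0.6, "8": 0.4}, {"3": 0.3, "8": 0.7}, {"3": 0.55, "8": 0.45}]
print(majority_vote(outputs))                           # vote-based decision
print(decide(product_rule(outputs), reject_below=0.9))  # strict threshold rejects
```

In this toy case the vote favors “3” while the product rule favors “8”, which is the kind of conflict the combination principle has to resolve; a strict rejection threshold declines to answer at all.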
