Implementation of a Chinese Pinyin Input Method with Non-Coding Candidate Words (无编码候选词汉语拼音输入法的实现)

Uploaded by: E**** | Document ID: 118097758 | Uploaded: 2019-12-11 | Format: PDF | Pages: 74 | Size: 2.52 MB

摘要 (Abstract)

Chinese information processing, the automatic processing of Chinese-language information by computer, is one of the most important technical pillars of China's information industry. Many ways of entering Chinese characters have been introduced, such as handwriting pens and speech input, but keyboard input remains the most widespread means of entering Chinese characters into a computer and is an important topic in Chinese information processing. As intelligent processing techniques have been applied in computing, Chinese input methods have developed further. Building on the basic framework of Chinese pinyin input, this thesis introduces automatic Chinese word segmentation, builds a bigram statistical language model, and proposes a way to implement a pinyin input method with non-coding candidate words. Non-coding candidate input means that, when the candidate list is generated, the system uses the words already entered to assemble the words likely to come next into a candidate list for the user to choose from, without requiring any external code (外码) to be typed. As long as the user's target word appears in the non-coding candidate list, candidates keep being generated in this way.

The thesis covers the following:

(1) It introduces the research background and the problems currently facing Chinese pinyin input methods, and briefly explains the principle by which non-coding candidate words are generated.
(2) For the pinyin input adopted here, the pinyin stream is preprocessed so that it is split into independent syllables. The sample corpus is segmented automatically by mechanical word segmentation, and the intersection-type (overlapping) ambiguities that arise during segmentation are resolved. A bigram statistical language model is built from the segmentation results and then optimized.
(3) A data structure for coded candidate words is established and a concrete generation algorithm is given. On the basis of this preparatory work, the implementation of the non-coding candidate input method is verified both in theory and in algorithm.
(4) The input method system's self-learning function. The bigram statistical language model is optimized a second time according to the user's input, mainly by adjusting word frequencies and supporting a user-defined dictionary.
(5) The non-coding candidate pinyin input method is implemented with IMM-IME (Input Method Manager, Input Method Editor) provided by the Windows operating system. The thesis analyzes in detail the composition, the interfaces, and several important design ideas of the IMM-IME-based input method. Programming is done in the currently popular VC++.NET, and in particular the input method interface API functions provided by Windows are rewritten.

Keywords: non-coding candidate words; automatic word segmentation; statistical language model

Abstract

Chinese information processing, which uses computers to process Chinese information automatically, is one of the most important pillars of our country's information industry. Several methods already exist for entering Chinese characters, such as the handwriting pen and speech input. However, the keyboard remains the most popular means of input, and keyboard input is an important subject in Chinese information processing. With the application of intelligent processing technology in computing, Chinese character input methods have also improved greatly. Based on the basic framework of Chinese pinyin input, this thesis introduces automatic Chinese word segmentation and a bi-gram statistical language model, and puts forward an implementation of Chinese pinyin input with non-coding candidate words. Non-coding candidate word input means that, while generating the candidate list, the computer uses the words already entered to generate a list of the words likely to occur next, with no outer code required. As long as the target word the user wants is within the non-coding candidate list, this kind of candidate generation continues.

The thesis consists of the following parts.

In the first part, the author introduces the background and the problems of the present research, and briefly describes the basic principle by which non-coding candidate words are generated.

In the second part, the author preprocesses the pinyin string used in this thesis and segments it into independent syllables. In addition, the author segments the sample corpus automatically by mechanical means and resolves the intersection ambiguities that occur during segmentation. Based on the segmentation results, the author establishes a bi-gram statistical language model and optimizes it.

In the third part, the author establishes a data structure for coded candidate words and presents the specific generation algorithm. Based on this preparatory work, the author verifies the implementation of Chinese pinyin input with non-coding candidate words both in theory and in algorithm.

In the fourth part, to enable the input system to learn by itself, the author optimizes the bi-gram statistical language model again based on the user's input, chiefly by readjusting word frequencies and letting users edit the dictionary.

In the last part, the author implements Chinese pinyin input with non-coding candidate words through IMM-IME, which is provided by the Windows operating system. The author also elaborates on the composition of the IMM-IME-based input method and on other important design concepts. Finally, the author programs in the popular VC++.NET, and in particular rewrites the API functions provided by the Windows operating system.

Key Words: non-coding candidate words; automatic segmentation; statistical language model

Contents

Abstract (Chinese)
Abstract (English)
Contents
Chapter 1 Introduction
  1.1 Research background
  1.2 State of research in China and abroad
  1.3 Overview of the pinyin input method with non-coding candidate words
Chapter 2 Preprocessing of the pinyin stream
  2.1 Overview of the pinyin stream
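The non-coding candidate idea summarized in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the thesis's VC++.NET implementation: the class name BigramCandidateModel, its methods, the boost parameter, and the tiny corpus are all hypothetical, and the ranking uses raw bigram counts, whereas the thesis describes an optimized model whose details are not given in this excerpt.

```python
from collections import Counter, defaultdict


class BigramCandidateModel:
    """Toy bigram model mapping a previous word to counts of following words."""

    def __init__(self):
        self.next_counts = defaultdict(Counter)

    def train(self, segmented_sentences):
        # Each sentence is a list of words that has already been segmented.
        for words in segmented_sentences:
            for prev_word, next_word in zip(words, words[1:]):
                self.next_counts[prev_word][next_word] += 1

    def candidates(self, prev_word, limit=5):
        # No pinyin is consulted here: ranking uses only the previous word.
        counts = self.next_counts.get(prev_word, Counter())
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return [word for word, _ in ranked[:limit]]

    def learn(self, prev_word, chosen_word, boost=1):
        # Assumed self-learning step: raise the count of the pair the user chose.
        self.next_counts[prev_word][chosen_word] += boost


# Hypothetical, already segmented sample corpus.
corpus = [
    ["中文", "信息", "处理"],
    ["信息", "处理", "技术"],
    ["信息", "产业"],
    ["信息", "处理"],
]
model = BigramCandidateModel()
model.train(corpus)

before = model.candidates("信息")
model.learn("信息", "产业", boost=3)
after = model.candidates("信息")
print(before, after)
```

In this sketch, each time the user commits a word the input method would call candidates() with that word. If the target word is in the returned list the user selects it and the cycle repeats, and otherwise the user would presumably fall back to ordinary pinyin entry.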
