Lecture on Speech Recognition Decoding


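6 Search and Decoding
Li Ta (lita)
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences
2014-04-10

Outline
- Overview
- Search space
    - Knowledge sources
    - Search-space construction
    - Search-space optimization
- Decoding and word-graph generation
    - Decoding algorithms
    - Word graphs
- WFST-based speech recognition
- Multi-pass fusion decoding systems

6.1 Overview
- The speech recognition problem
- Pattern recognition
- Language model
- Acoustic model
- Pronunciation dictionary
- Features
- The search problem (decoding)
- Given the models, find the text that matches the features
- Decoding techniques
- Fast
- Accurate
- What was W? Knowing O, the prior knowledge sources are:
    - language model (LM): P(W) (often an M-gram)
    - dictionary: words = triphone sequences (W = PH)
    - acoustic model: p(O|PH), HMMs

[Figure: source-channel view of speech recognition. Language generation produces the word sequence W = (w1 ... wN); pronunciation turns it into the triphone sequence PH = (ph1 ... phN); speech production yields sound waves; after transmission, SR front-end processing produces the observation vectors O = (o1 ... oT). The unknown block "?" recovers W using LM: P(W), the dictionary, and AM: p(O|S).]

- Bayesian framework and the Viterbi approximation

6.2 Search space
- What is a search space?
    - Text search
    - Information retrieval
    - Multimedia retrieval
- What is the search space in speech recognition?
    - The space of the speech recognition knowledge sources
    - Complete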

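    - Compact

6.2.1 Knowledge sources
- Language model
    - Word graph / finite-state graph
    - N-gram statistical language model
- Pronunciation dictionary
- Acoustic context
- Acoustic model
    - Hidden Markov model (HMM)
- Features
    - MFCC, PLP, LPCC

6.2.2 Search-space construction
- Pronunciation dictionary (linear)
- Pronunciation dictionary (prefix tree)
  [Figure: pronunciation prefix tree starting at "root", with phone arcs s, ih, ae, t, d leading to the words sit, sat, and sad, followed by a transition to the next word]
- WithinWord and CrossWord
- Isolated-word recognition
    - Word boundaries are fixed
- Constrained word-graph recognition
- Large-vocabulary continuous speech recognition (LVCSR)
    - Characteristics
        - Large vocabulary
        - Continuous speech
        - Language-model delay
    - Difficulties
        - The search space takes a lot of memory
        - Search is slow
- LVCSR search space: unigram
- LVCSR search space: bigram
- LVCSR search space: back-off
- Handling silence inside a sentence

6.2.3 Search-space optimization
- Optimization criteria
    - Do not change the completeness of the space
    - Reduce duplicate paths
    - Couple the knowledge-source information tightly
- Goals
    - Reduce the memory used by the search space
    - Increase search speed
- Forward-backward merging search algorithm
- Example: constrained word graph
    - Original word graph
    - Word graph to phoneme network
    - Triphone network
    - Optimized network
- LVCSR search space
    - Dynamic (for example, a lexicon tree)
        - Knowledge sources are loaded dynamically
        - Decoding has to query the knowledge sources
        - Small memory footprint, fast construction
        - Decoding is not fast enough
    - Static (for example, WFST (GL))
        - Knowledge sources are compiled in advance
        - Decoding becomes a search problem over an FST
        - Large memory footprint, slow construction
        - Fast decoding
- Initial network construction: how to reduce redundancy?
- Forward-backward merging on the LVCSR search space
- Moving the WI nodes earlier
    - Makes forward-backward merging more effective
    - Brings language model information into decoding as early as possible
- Layered decoding space: FI (fan-in), WE (word end), WI (word id), AW (aboveWI), DW (downWI)
    - Path management
    - Clear structure
    - Independent pruning per layer
    - Integrated language model

6.3 Decoding and word-graph generation
- Decoding algorithms
    - Quickly find the most likely path in the search space
- Word graph
    - Intermediate recognition result
    - Rich in information

6.3.1 Decoding algorithms
- Dynamic programming
- Dynamic time warping (DTW)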

- Basic graph search algorithms
- Tree search space
    - Depth-first search
    - Breadth-first search
- Time-synchronous search: breadth-first
    - Beam search
    - Tree search
- Time-asynchronous search: depth-first
    - A* (stack decoder)
    - Uses the forward algorithm
- The mathematics is simple, but the implementation is the most challenging part in terms of programming.
- Viterbi algorithm
    - Finds the most likely state sequence
    - Beam search
- Forward algorithm
    - Evaluates the probability of the input observations
- Beam search pruning
    - Beam pruning
        - if (p < MaxProb - beam), prune the hypothesis
    - Histogram pruning
        - Limits the number of active hypotheses
        - Cannot be achieved by sorting
        - Uses the histogram of scores to find the appropriate pruning threshold
- Number of state hypotheses
- Word-end pruning
    - Since the dynamic range of the language model probability is relatively small, a narrower beam can be applied to word-end nodes without incurring additional search errors.
- Language model lookahead
- Phoneme lookahead
    - Each time a hypothesis is formed about a new phoneme arc to be started in the search process, it is first checked whether this new phone hypothesis is likely to survive the upcoming time frames.
- Time-asynchronous search
    - Stack decoder
    - A* search: heuristic function
- Time-asynchronous search example
    - Recognize "if music be the food of love"
- Token passing
    - Token passing is a framework for connected speech recognition (Young et al., Cambridge, 1989).
    - Each node holds a token containing:
        - Accumulated cost up to the end of the word
        - Pointer to the token of the previous word
        - End time of the current word
        - Name of the current word
    - Tokens are passed from node to node; when searching, the token with the best score at time (t-1) is passed to the node at time t.
- Token passing strategy
    - Decoding means finding the paths through the search space that have the maximum log probability.
    - A token represents a partial path through the search space.
    - At each time step, tokens are propagated along the transitions in the network. At the same time, each token's log probability is accumulated from the corresponding transition and emission probabilities.
    - As each token passes through the network, it must maintain a history recording its route.
    - When a node has multiple exits, the token is copied so that all possible paths are explored in parallel.
- Token data structure
- Search algorithm

6.3.2 Word graphs
- N-best and word graph, compared with the one-best result:
    - Higher recognition accuracy
    - Richer information
    - Convenient for second-pass search
- N-best
    - many of the different word sequen
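The token passing strategy and the two pruning rules from section 6.3.1 can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not the decoder used in the lecture: the names `Token`, `beam_prune`, `histogram_threshold`, and `decode`, the 32-bin histogram, and the three-state toy network are all hypothetical. State 0 is assumed to be a non-emitting entry state, and tokens store states rather than word identities to keep the example short.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    """A partial path: accumulated log probability plus a back-pointer."""
    log_prob: float
    state: int
    prev: Optional["Token"] = None


def beam_prune(tokens: Dict[int, Token], beam: float) -> Dict[int, Token]:
    """Beam pruning: drop every token with log_prob < best - beam."""
    best = max(t.log_prob for t in tokens.values())
    return {s: t for s, t in tokens.items() if t.log_prob >= best - beam}


def histogram_threshold(scores: List[float], max_active: int,
                        n_bins: int = 32) -> float:
    """Histogram pruning: bin the scores and return a threshold that keeps
    roughly `max_active` of the best ones, without sorting them."""
    hi, lo = max(scores), min(scores)
    if len(scores) <= max_active or hi == lo:
        return lo  # nothing to prune
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in scores:
        # Bin 0 holds the best scores, the last bin the worst.
        counts[min(int((hi - s) / width), n_bins - 1)] += 1
    kept = 0
    for b, c in enumerate(counts):
        kept += c
        if kept >= max_active:
            return hi - (b + 1) * width  # keep bins 0..b
    return lo


def decode(frames: List[Dict[int, float]],
           trans: Dict[int, List[Tuple[int, float]]],
           start: int, beam: float, max_active: int) -> Tuple[float, List[int]]:
    """Time-synchronous token passing with beam and histogram pruning.

    frames[t][s] is the emission log probability of state s at frame t;
    trans[s] lists (next_state, transition log probability) pairs.
    """
    tokens = {start: Token(0.0, start)}
    for frame in frames:
        new_tokens: Dict[int, Token] = {}
        for tok in tokens.values():
            # A token is copied along every exit of its node.
            for nxt, log_a in trans[tok.state]:
                score = tok.log_prob + log_a + frame[nxt]
                # Only the best token arriving at a node survives.
                if nxt not in new_tokens or score > new_tokens[nxt].log_prob:
                    new_tokens[nxt] = Token(score, nxt, tok)
        tokens = beam_prune(new_tokens, beam)
        thr = histogram_threshold([t.log_prob for t in tokens.values()], max_active)
        tokens = {s: t for s, t in tokens.items() if t.log_prob >= thr}
    best = max(tokens.values(), key=lambda t: t.log_prob)
    path, tok = [], best
    while tok.prev is not None:  # follow the back-pointers to the start token
        path.append(tok.state)
        tok = tok.prev
    return best.log_prob, path[::-1]


# Toy network (assumed values): entry state 0, emitting states 1 and 2.
log = math.log
trans = {0: [(1, log(0.6)), (2, log(0.4))],
         1: [(1, log(0.7)), (2, log(0.3))],
         2: [(2, log(1.0))]}
frames = [{1: log(0.9), 2: log(0.1)},
          {1: log(0.8), 2: log(0.2)},
          {1: log(0.1), 2: log(0.9)}]
log_p, path = decode(frames, trans, start=0, beam=10.0, max_active=100)
print(path, math.exp(log_p))
```

Keying the active tokens by state is one simple way to enforce the rule that only the best-scoring token reaches each node at each frame. The histogram step may keep slightly more than `max_active` tokens, because it can only cut at bin boundaries; that imprecision is the price of avoiding a sort over all active hypotheses.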
