Speech Experiment 1: Endpoint Detection


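Experiment 1: Speech Signal Endpoint Detection

I. Objectives

1. Learn to use MATLAB and master MATLAB programming methods.
2. Master the basic concepts, theory, and methods of speech processing.
3. Implement endpoint detection for noisy speech signals in MATLAB.
4. Learn to analyze and process signals with MATLAB.
5. Learn to detect the endpoints of a speech signal using the short-time zero-crossing rate and the short-time energy.

II. Equipment and Software

HP D538, MATLAB

III. Principles

Endpoint detection is a very important step in speech signal processing; its accuracy directly affects the speed and the results of the processing that follows. This experiment uses an endpoint detection algorithm that combines the short-time zero-crossing rate with the short-time energy: the zero-crossing rate detects unvoiced sounds and the energy detects voiced sounds, and together they achieve endpoint detection when the signal-to-noise ratio is relatively high.

The detection process for an input signal has two parts, short-time energy detection and short-time zero-crossing-rate detection. Energy detection is the primary test and zero-crossing-rate detection is the auxiliary one. According to the statistical properties of speech, a recording can be divided into three kinds of segments: unvoiced, voiced, and silence (including background noise). In this algorithm, short-time energy separates voiced sounds from silence well. Unvoiced sounds have low energy, so energy detection alone would misclassify them as silence because they fall below the energy threshold; the short-time zero-crossing rate, however, can separate unvoiced sounds from silence. Combining the two tests detects the speech segments (unvoiced and voiced) and the silent segments.

1. Short-time energy

The short-time average energy of a speech signal s at time n is defined as

$$E_n = \sum_{m=n-N+1}^{n} \left[ s(m)\, w(n-m) \right]^2$$

where w is the window function and N is the window length. The short-time average energy is thus the sum of the squares of the sample values in one frame. In particular, when the window is rectangular,

$$E_n = \sum_{m=n-N+1}^{n} s^2(m)$$

2. Short-time zero-crossing rate

A zero crossing occurs when the signal passes through zero, and the zero-crossing rate is the number of times per second that the signal value passes through zero. For a discrete-time sequence, a zero crossing means that consecutive samples change sign, and the zero-crossing rate is the number of sign changes per sample. For speech, it is the number of times the waveform crosses the horizontal axis (zero level) within one frame, which can be computed by counting sign changes between adjacent samples. If the window starts at n = 0, the short-time zero-crossing rate Z is

$$Z = \frac{1}{2} \sum_{n=1}^{N-1} \left| \operatorname{sgn}[s(n)] - \operatorname{sgn}[s(n-1)] \right|$$

where sgn[·] is the sign function, so Z counts the number of times the waveform crosses the horizontal axis (zero level).

The short-time zero-crossing rate can be regarded as a simple measure of signal frequency. Voiced sounds have the largest short-time average magnitude and silence has the smallest. Unvoiced sounds have the highest short-time zero-crossing rate, silence is in the middle, and voiced sounds have the lowest.

3. Short-time autocorrelation function

The short-time autocorrelation function R(k) is an even function; if s(n) is periodic, then R(k) is also periodic; it can be used for pitch-period estimation and linear prediction analysis.

4. Determining the start and end of a speech signal

The short-time average magnitude and the short-time zero-crossing rate can be used to determine where a speech signal starts and ends. Endpoint detection can use feature parameters of the test signal such as the short-time energy or short-time log energy together with the zero-crossing rate, and apply a double-threshold decision: the zero-crossing rate detects unvoiced sounds, the short-time energy detects voiced sounds, and the two work together. First, two thresholds are chosen for each of the short-time energy and the zero-crossing rate. One is a low threshold with a small value; it is sensitive to changes in the signal and is easily exceeded. The other is a high threshold with a larger value. Exceeding the low threshold does not necessarily mark the start of speech, because a very short burst of noise can cause it. When the high threshold is exceeded and the excess persists over the user-defined interval that follows, the segment is taken to be speech.

IV. Procedure and Program

(1) Procedure:

1. Take a recording as the audio sample.
2. Using the formulas, write programs that compute the short-time energy and the short-time zero-crossing rate of the speech signal, then plot each curve.
3. Adjust the energy thresholds.
4. Normalize the amplitude and set parameters such as the frame length, the short-time energy thresholds, and the zero-crossing-rate thresholds.
5. Write the program that performs speech endpoint detection.
6. Obtain the final endpoint detection plot.

(2) Flowchart of the speech endpoint detection program:

Input speech signal -> Amplitude normalization -> Set parameters -> Compute short-time energy and zero-crossing rate -> Adjust energy thresholds -> Start endpoint detection -> Output endpoint detection plot of the sample

Figure 1.1 Flowchart of the speech endpoint detection program

(3) Source program for the speech endpoint detection experiment:

clc; clear;
[x, fs] = audioread('2.wav');        % the original listing called wavread, which newer MATLAB releases have removed
% y = end_point(x);
% f0 = pitch_sift(x, 0.38, fs);
% plot(f0);
e_x = frame(x, 'lpc_spectrum', fs);  % e_x is needed by the plot below; lpc_spectrum is not listed here and must be on the path
% plot(e_x(2,:));                    % one dimension as it varies over time
plot(e_x(:,89));                     % variation across dimensions within one frame
hold on;
c = melcepst(x, fs);
plot(c(89,:), 'k');

Definition of frame:

% function y = frame(x,func,SAMP_FREQ,l,step)
% where y is output on a frame by frame basis, x is input speech,
% and l is the window size. l and step are optional parameters,
% by default SAMP_FREQ is 16000, l is SAMP_FREQ/40, and step is l/2.
% func is a string e.g. 'pitch' that determines a function that you want
% to apply to x on a short-time basis.
% Written by: Levent Arslan Apr. 11, 1994
%
function yy = frame(x,func,SAMP_FREQ,l,step)
[m,n] = size(x);
if m > n
    n = m;                           % x is already a column vector
else
    x = x';                          % turn a row vector into a column vector
end
if nargin < 3, SAMP_FREQ = 16000; end
if nargin < 4, l = SAMP_FREQ/40; end
if nargin < 5, step = l/2; end
num_frames = ceil(n/step);           % NUMBER OF FRAMES
x(n+1:n+2*l) = zeros(2*l,1);         % ADD ZEROS AT THE END OF THE SPEECH SIGNAL
i = (0:step:num_frames*step)';       % frame start offsets, an arithmetic progression with difference step
j = i*ones(1,l);
i = j + ones(num_frames+1,1)*(1:l);  % sample indices, one frame per row
y = reshape(x(i),num_frames+1,l)';   % one frame per column
y = (hanning(l)*ones(1,num_frames+1)).*y;   % apply a Hanning window to every frame
for i = 1:num_frames
    yy(:,i) = feval(func, y(:,i));   % apply the function named by func to frame i
end

Definition of melcepst:

function c=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
%
% Simple use: c=melcepst(s,fs)          % calculate mel cepstrum with 12 coefs, 256 sample frames
%             c=melcepst(s,fs,'e0dD')   % include log energy, 0th cepstral coef, delta and delta-delta coefs
%
% Inputs:
%     s    speech signal
%     fs   sample rate in Hz (default 11025)
%     nc   number of cepstral coefficients excluding 0th coefficient (default 12)
%     n    length of frame (default power of 2 <30 ms)
%     p    number of filters in filterbank (default floor(3*log(fs)) )
%     inc  frame increment (default n/2)
%     fl   low end of the lowest filter as a fraction of fs (default = 0)
%     fh   high end of highest filter as a fraction of fs (default = 0.5)
%
%     w    any sensible combination of the following:
%            'R'  rectangular window in time domain
%            'N'  Hanning window in time domain
%            'M'  Hamming window in time domain (default)
%            't'  triangular shaped filters in mel domain (default)
%            'n'  hanning shaped filters in mel domain
%            'm'  hamming shaped filters in mel domain
%            'p'  filters act in the power domain
%            'a'  filters act in the absolute magnitude domain (default)
%            '0'  include 0th order cepstral coefficient
%            'e'  include log energy
%            'd'  include delta coefficients (dc/dt)
%            'D'  include delta-delta coefficients (d^2c/dt^2)
%            'z'  highest and lowest filters taper down to zero (default)
%            'y'  lowest filter remains at 1 down to 0 frequency and
%                 highest filter remains at 1 up to nyquist frequency
%
%          If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: c  mel cepstrum output: one frame per row
%
% Copyright (C) Mike Brookes 1997
%
% Last modified Thu Jun 15 09:14:48
%
% VOICEBOX is a MATLAB toolbox for speech processing. Home page is at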
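
As a minimal sketch of step 2 of the procedure, the short-time energy and the short-time zero-crossing rate can be computed frame by frame with a rectangular window, following the definitions in the principles section. The synthetic test signal, the frame length of 200 samples, the frame shift of 100 samples, and every variable name are assumptions chosen for illustration; none of them come from the original program.

```matlab
% Sketch: per-frame short-time energy and zero-crossing rate (rectangular window).
% Signal, frame length and frame shift are illustrative assumptions.
fs = 8000;                               % assumed sampling rate in Hz
t  = (0:3999)'/fs;                       % time stamps for a 0.5 s tone
x  = [zeros(2000,1); sin(2*pi*200*t); zeros(2000,1)];   % silence, 200 Hz tone, silence
x  = x / max(abs(x));                    % amplitude normalization

frame_len  = 200;                        % N, window length in samples (assumed)
frame_step = 100;                        % frame shift in samples (assumed)
num_frames = floor((length(x) - frame_len) / frame_step) + 1;

energy = zeros(num_frames, 1);
zcr    = zeros(num_frames, 1);
for k = 1:num_frames
    first = (k-1)*frame_step + 1;
    seg   = x(first : first + frame_len - 1);
    energy(k) = sum(seg.^2);             % sum of squared samples in the frame
    sgn = ones(size(seg));               % sign function with sgn(0) = +1
    sgn(seg < 0) = -1;
    zcr(k) = 0.5 * sum(abs(diff(sgn)));  % number of sign changes in the frame
end
```

Plotting `energy` and `zcr` against the frame index (for example with `subplot` and `plot`) gives the two curves asked for in step 2. For this test signal the silent frames have zero energy and no zero crossings, while a frame inside the tone has an energy close to 100 and about 10 crossings.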

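
The use of the short-time autocorrelation function for pitch-period estimation, mentioned in the principles section, can be illustrated with a small sketch. It assumes a synthetic 100 Hz frame, a frame length of 400 samples, and a search range of 50 to 400 Hz for the pitch; these values and all variable names are hypothetical and are not part of the original program.

```matlab
% Sketch: pitch-period estimate from the short-time autocorrelation of one frame.
% Frame length, test frequency and search range are illustrative assumptions.
fs = 8000;                               % assumed sampling rate in Hz
N  = 400;                                % frame length in samples (assumed)
n  = (0:N-1)';
s  = sin(2*pi*100*n/fs);                 % synthetic periodic frame, period 80 samples

max_lag = 200;
R = zeros(max_lag+1, 1);                 % R(k+1) stores the autocorrelation at lag k
for k = 0:max_lag
    R(k+1) = sum(s(1:N-k) .* s(1+k:N));  % rectangular-window autocorrelation
end

lag_min = round(fs/400);                 % shortest period searched (20 samples)
lag_max = round(fs/50);                  % longest period searched (160 samples)
[~, idx] = max(R(lag_min+1 : lag_max+1));
pitch_period = lag_min + idx - 1;        % estimated period in samples
f0 = fs / pitch_period;                  % estimated fundamental frequency in Hz
```

Because the frame is periodic, R(k) peaks again at the lag that equals the period, and the strongest peak inside the search range gives the estimate. Restricting the search to lags above `lag_min` keeps the trivial maximum at lag 0 out of the result.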
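
The double-threshold decision described in the principles section can be sketched as follows for a recording that contains a single utterance. This is a simplified illustration under stated assumptions, not the original end_point routine: the synthetic signal with a weak onset, the threshold values, the minimum duration, and all variable names are hypothetical.

```matlab
% Sketch: double-threshold endpoint detection for one utterance.
% Signal, thresholds and minimum duration are illustrative assumptions.
fs   = 8000;                                   % assumed sampling rate in Hz
tone = sin(2*pi*200*(0:3999)'/fs);
x    = [zeros(2000,1); 0.1*tone(1:1000); tone(1001:4000); zeros(2000,1)];
x    = x / max(abs(x));                        % amplitude normalization

frame_len = 200; frame_step = 100;             % assumed frame length and shift
num_frames = floor((length(x) - frame_len) / frame_step) + 1;
energy = zeros(num_frames,1); zcr = zeros(num_frames,1);
for k = 1:num_frames
    seg = x((k-1)*frame_step + (1:frame_len));
    energy(k) = sum(seg.^2);                   % short-time energy
    sgn = ones(size(seg)); sgn(seg < 0) = -1;
    zcr(k) = 0.5 * sum(abs(diff(sgn)));        % short-time zero-crossing count
end

high_thr = 10;     % high energy threshold (assumed)
low_thr  = 0.2;    % low energy threshold (assumed)
zcr_thr  = 5;      % zero-crossing threshold (assumed)
min_len  = 5;      % minimum speech duration in frames (assumed)

above_high = find(energy > high_thr);          % frames that are almost surely speech
if isempty(above_high)
    start_frame = 0; end_frame = 0; is_speech = false;
else
    start_frame = above_high(1);
    end_frame   = above_high(end);
    % Extend backwards while the low energy or the zero-crossing threshold is exceeded
    while start_frame > 1 && ...
          (energy(start_frame-1) > low_thr || zcr(start_frame-1) > zcr_thr)
        start_frame = start_frame - 1;
    end
    % Extend forwards in the same way
    while end_frame < num_frames && ...
          (energy(end_frame+1) > low_thr || zcr(end_frame+1) > zcr_thr)
        end_frame = end_frame + 1;
    end
    is_speech = (end_frame - start_frame + 1) >= min_len;   % reject very short bursts
    start_sample = (start_frame-1)*frame_step + 1;
    end_sample   = (end_frame-1)*frame_step + frame_len;
end
```

In this example the high threshold is first exceeded only where the loud part of the tone begins, and the low threshold then pulls the start point back to the weak onset. The detected range of samples 1901 to 6100 brackets the true segment (samples 2001 to 6000) to within one frame shift.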