Improving Speech Coding with a Psychoacoustic Model Based on a Cochlear Filter Bank


IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 10, NO. 7, OCTOBER 2002, p. 495

Improved Audio Coding Using a Psychoacoustic Model Based on a Cochlear Filter Bank

Frank Baumgarte

Abstract: Perceptual audio coders use an estimated masked threshold for the determination of the maximum permissible just-inaudible noise level introduced by quantization. This estimate is derived from a psychoacoustic model mimicking the properties of masking. Most psychoacoustic models for coding applications use a uniform (equal bandwidth) spectral decomposition as a first step to approximate the frequency selectivity of the human auditory system. However, the equal filter properties of the uniform subbands do not match the nonuniform characteristics of cochlear filters and reduce the precision of psychoacoustic modeling. Even so, uniform filter banks are applied because they are computationally efficient. This paper presents a psychoacoustic model based on an efficient nonuniform cochlear filter bank and a simple masked threshold estimation. The novel filter-bank structure employs cascaded low-order IIR filters and appropriate down-sampling to increase efficiency. The filter responses are optimized for the modeling of auditory masking effects. Results of the new psychoacoustic model applied to audio coding show better performance in terms of bit rate and/or quality of the new model in comparison with other state-of-the-art models using a uniform spectral decomposition. The low delay of the new model is particularly suitable for low-delay coders.

Index Terms: Audio coding, filter bank, masked threshold, model of masking, perceptual model.

Manuscript received June 20, 2001; revised July 18, 2002. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Peter Vary. The author is with the Media Signal Processing Research Department, Agere Systems, Berkeley Heights, NJ 07922 USA (e-mail: ). Digital Object Identifier 10.1109/TSA.2002.804536

I. INTRODUCTION

In perceptual audio coding [1], the audio signal is treated as a masker for distortions introduced by lossy data compression. For this purpose, the masked threshold for the distortions is approximated by a psychoacoustic model. The masked threshold is the time and frequency-dependent maximum level that marks the boundary for distortions being inaudible if superimposed to the audio signal. The initial audio signal processing within the psychoacoustic model consists of a spectral decomposition to account for the frequency selectivity of the auditory system. However, the auditory system performs a nonuniform (nonequal bandwidths) spectral decomposition of the acoustic signal in the cochlea. This first stage of cochlear sound processing already determines basic properties of masking, e.g., the frequency spread of masking which is related to the frequency response of the human cochlear filters. Above 1 kHz, the cochlear filter bandwidths increase almost proportionally to the center frequency. These bandwidths determine both the spectral width of energy integration associated with a band and the range of spectral components that can interact within a band, e.g., two sinusoids creating a beating effect. This interaction plays a crucial role in the perception of whether a sound is noise-like, which in turn corresponds to a significantly more efficient masking compared with a tone-like signal [2]. The noise or tone-like character is basically determined by the amount of envelope fluctuations at the cochlear filter outputs, which widely depend on the interaction of the spectral components in the pass-band of the filter.

Many existing psychoacoustic models, e.g., [1], [3], and [4], employ an FFT-based transform to derive a spectral decomposition of the audio signal into uniform subbands with equal bandwidths. The nonuniform spectral resolution of the auditory system is taken into account by summing up the energies of the appropriate number of neighboring FFT frequency subbands. Consequently, the phase relation between the spectral components of the different subbands within a cochlear filter band is not taken into account. Since the cochlear filter slopes are less steep than the subband slopes, they must be approximated by spreading the subband energies across several bands. This way of mapping the uniform subbands to cochlear filter bands produces envelopes of the output signal that are different from those measured at the output of the cochlea. The temporal resolution of the spectral decomposition is determined by the transform size, i.e., FFT length, and thus, is constant across all center frequencies. For high center frequencies this results in a significantly lower temporal resolution in comparison with that of the corresponding cochlear filters. All the described mismatches contribute to an inaccurate modeling of masking that causes suboptimal coder compression performance.

To overcome the mismatch between uniform filter banks and the spectral decomposition of the cochlea, a linear nonuni
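
As a rough illustration of the conventional FFT-based mapping described in the introduction (not the cochlear filter bank the paper proposes), the sketch below sums the energies of neighboring uniform frequency bins into nonuniform bands. All names and numeric choices here are assumptions made for illustration only: the 100 Hz band width below 1 kHz, the growth ratio of 1.2 above 1 kHz, the 16 kHz sample rate, and the 256-sample frame do not come from the paper. A naive DFT stands in for an FFT to keep the example free of dependencies.

```python
import cmath
import math


def band_edges_hz(f_max, f_split=1000.0, low_width=100.0, ratio=1.2):
    """Illustrative band edges (assumed values, not taken from the paper):
    equal widths below f_split, widths proportional to frequency above it."""
    edges = [0.0]
    while edges[-1] < f_max:
        last = edges[-1]
        nxt = last + low_width if last < f_split else last * ratio
        edges.append(min(nxt, f_max))
    return edges


def power_spectrum(frame):
    """Power of DFT bins 0..N/2, computed naively (stand-in for an FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energies(power, sample_rate, edges):
    """Sum uniform-bin energies into nonuniform bands.

    Only energies are added, so the phase relation between the bins
    that fall into one band is discarded."""
    n_fft = 2 * (len(power) - 1)
    n_bands = len(edges) - 1
    energies = [0.0] * n_bands
    for k, p in enumerate(power):
        f = k * sample_rate / n_fft
        for b in range(n_bands):
            is_last = b == n_bands - 1
            # The last band also takes the bin exactly at the upper edge.
            if edges[b] <= f < edges[b + 1] or (is_last and f == edges[-1]):
                energies[b] += p
                break
    return energies


if __name__ == "__main__":
    sample_rate = 16000   # assumed for the example
    n = 256               # frame length; fixes the time resolution of every band
    frame = [math.sin(2 * math.pi * 2000 * i / sample_rate) for i in range(n)]
    edges = band_edges_hz(sample_rate / 2)
    energies = band_energies(power_spectrum(frame), sample_rate, edges)
    peak = energies.index(max(energies))
    print("band with most energy: %.1f Hz to %.1f Hz" % (edges[peak], edges[peak + 1]))
```

Because every band is derived from the same fixed-length frame, all bands share one temporal resolution regardless of center frequency, and the summation keeps no phase information. These are two of the mismatches the introduction attributes to uniform decompositions.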
