SpeechOcean Mandarin Chinese Speech Recognition Corpus: Single Sentences (Recording Studio), 200 Speakers
DELIVERABLE IDENTIFICATION
Identification number: King-ASR-043
Type: Technical Report
Title: Definition of Corpus, scripts, standards and Specifications of environment/speaker coverage for Mandarin
Status:
Date: 2010-5-22
Version: 1.0
Number of pages: 14
Author: Ke LI, Xin CHEN, Dr Yufeng HAO
Project contact point:
Ke LI, Project Manager
Yufeng Hao, Chief Technologist
Beijing Haitian Ruisheng Science Technology Ltd.
Address: D-801, U-center Building, No.28 Chengfu Road, Haidian District, Beijing, China.
Postcode: 100083
Phone: +86-(10)-62660053, Fax: +86-(10)-62660053 ext. 8103
E-mail: ;
Supplementary notes:
Key words:
Desktop speech, ASR database, contents, design, description, speaker coverage.

Abstract
This document specifies the contents, speaker coverage, and environment coverage of the speech database to be collected over the desktop for the Mandarin language.
Status of the abstract: Public

Contents
1. INTRODUCTION
1.1 SPEECH FILE FORMAT
1.2 DIRECTORY STRUCTURE
1.3 SCRIPT FILES
2. DATABASE DESIGN AND COLLECTION
2.1 RECORDING PLATFORMS
2.2 SPEAKER RECRUITMENT
3. DATABASE CONTENTS DEFINITION
4. SPEAKER DEMOGRAPHIC INFORMATION
4.1 GENDER BALANCE
4.2 AGE DISTRIBUTION
4.3 DIALECTAL REGIONS
5. REFERENCE

1. Introduction
This is a high-quality two-channel speech database collected and owned by SpeechOcean. All recordings were made in a professional recording room environment. The corpus contains the recordings
of 16000 utterances of Mandarin speech data from 200 speakers. The pure recording time of speech is about 41.6 hours.

1.1 Speech File Format
The utterance waveforms of each channel are stored uncompressed at 44.1 kHz, 16 bit, dual channel. All the wave files are stored in the wave directory.

1.2 Directory Structure
The three-level directory structure is defined as
wave/Channel<N>/input<NNN>
where <N> is a number from 1 to 2 and <NNN> is a number from 000 to 199.
The wave files are under the input<NNN> folder, and their names are defined as <SSSSUUUU>.wav. The first four digits represent the speaker ID and the last four digits represent the utterance ID.
In addition to the structure above, further directories store other files, as defined in Table 1-1.

/          README file with overview of database, and COPYRIGHT file
/doc       Documentation
/script    Scripts and transcriptions
Table 1-1 Non-speech related directory structure

1.3 Script Files
Each script file contains 40 long sentences, all selected from news.

2. Database Design and Collection

2.1 Recording Platforms
The recordings were made on Windows XP SP2 with AUDIOREC, a recording program developed by SpeechOcean. The recording equipment is listed in Table 2-1. (More details can be found at http:/)

Equipment                Type           No.
Sound card               MAYA 1010      1
Microphone, Channel 1    SHURE SM58     1
Microphone, Channel 2    SHURE SM10A    1
Table 2-1 Recording platform

2.2 Speaker Recruitment
Each speaker was to record 40 utterances in 30 minutes.
The entire collection was performed in Beijing. Several recruitment methods were adopted: Posters spread in the Unive