AUTOMATIC CONSTRUCTION OF ACOUSTIC INVENTORY FOR THE CONCATENATIVE SPEECH SYNTHESIS FOR POLISH

Artur Janicki
Warsaw University of Technology, Institute of Telecommunications, Division of Teletransmission Systems

ABSTRACT

This paper describes the automatic construction of an acoustic inventory for the development of a speech synthesis system for the Polish language. A very efficient speech segmentation algorithm is proposed, and an original reduction process is described. Results of classification using the LBG algorithm and a brief description of the created inventory are presented.

1. INTRODUCTION

The concatenative method of synthesis has been chosen, so a very careful design of the database of acoustic units is required. The type of acoustic unit has not been postulated a priori; instead, an automatic method has been proposed to choose the best set of units based on analysis of the natural speech signal.

The process of designing the acoustic inventory consists of segmentation, reduction and classification algorithms. The successive steps are described below. The whole process is iterative and was repeated many times until a satisfactory result was achieved. To check this, a method of verifying the quality of the solution has been proposed. It is based on re-synthesis of the original speech from the newly selected acoustic units, using the TD-PSOLA algorithm and copying the prosodic parameters from the original speech. The quality of the output speech is then
compared with that of the original, indicating, for example, whether the number of segments or the number of classes used in the iteration was sufficient.

2. PROCESS OF CONSTRUCTION OF THE SPEECH SYNTHESIS SYSTEM

Generation of the acoustic inventory fits into a global concept of designing the speech synthesis system for Polish. For the latter, the so-called smooth transition method has been proposed. It starts from a natural speech corpus and gradually passes from simple re-synthesis of the speech to a fully functional speech synthesis system. The algorithm consists of
the following steps:

1. Development of the analysis block, which extracts phonetic units and the related spectral and prosodic parameters and feeds them directly to the synthesis block.
2. Identifying segments within the speech signal and reducing the signal to segment representatives.
3. Classification of segments into classes and identification of the class representatives, which form the acoustic inventory.
4. Development of the prosody control unit.
5. Development of the phonetic transcription unit.

Going through steps 1-5 means gradually discarding original information (original acoustic units, original prosodic parameters, etc.) and replacing it with selected acoustic elements (representatives of segments, then of classes), generated prosody, etc. This inevitably decreases speech quality and damages the naturalness of the synthesised speech. The proposed
idea of smooth transition implies maximum effort to minimise the error incurred in going from step n to step n+1. This is done by continuously monitoring the performance of the successive steps: the speech signal is re-synthesised after each block and its quality is compared with that of the original speech.

The whole process is iterative, so whenever the quality is unsatisfactory one may go back and, for example, increase the number of segments or the number of output classes. This paper describes research related to steps 2 and 3.

3. SEGMENTATION

A very efficient segmentation algorithm, called 3 into 4, has been proposed. It is a mixture of the exhaustive search and segment bisection algorithms. It consists of the following actions:

1. Define an error measure E.
2. Find the segment with the maximum error E.
3. If this error does not exceed Ethr, go to step 7.
4. Take this segment together with its two neighbouring segments.
5. Perform an exhaustive search over the resulting span, looking for the best division into 4 segments.
6. Go to step 2.
7. End.

Here Ethr is a threshold error value, the maximum dispersion allowed. The sum of the dispersions of 12 mel-cepstrum coefficients has been used as the error measure.

The computational load of the algorithm is not
high; the number of dispersion computations in one iteration equals:

[Equation (1) is missing from the source]

where n is the length of the segment (number of data points).

3.1. Comparison with Other Segmentation Methods

The 3 into 4 segmentation has a great advantage over other algorithms, such as segment bisection or exhaustive search. The segment bisection algorithm also looks for the worst segment and places a new boundary to minimise the error, but it does not modify any boundaries set before. Therefore, although the method is computationally very attractive, it is not optimal. A new boundary strongly improves the dispersion of the 2 new sub-segments, but it may stand in sharp contrast
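
The seven actions listed in Section 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The function names (dispersion, best_division, segment_3_into_4) are hypothetical. The sketch assumes that the dispersion of a segment is the sum of the per-coefficient variances, that the best division is the one with the smallest total dispersion, that segmentation starts from a single segment, and that a worst segment lying at the edge of the signal is merged with whichever neighbours exist. None of these details is stated in the paper. The toy data uses one coefficient per frame instead of 12 mel-cepstrum coefficients.

```python
from itertools import combinations


def dispersion(frames, start, end):
    """Error measure E (assumed): sum over coefficients of the variance
    of frames[start:end]."""
    n = end - start
    total = 0.0
    for d in range(len(frames[0])):
        column = [frames[i][d] for i in range(start, end)]
        mean = sum(column) / n
        total += sum((x - mean) ** 2 for x in column) / n
    return total


def best_division(frames, start, end, parts):
    """Exhaustive search: try every way of cutting frames[start:end] into
    `parts` non-empty segments and keep the one with the smallest total
    dispersion (assumed optimality criterion)."""
    best_cost, best_bounds = None, None
    for cuts in combinations(range(start + 1, end), parts - 1):
        bounds = [start, *cuts, end]
        cost = sum(dispersion(frames, a, b) for a, b in zip(bounds, bounds[1:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_bounds = cost, bounds
    return best_bounds


def segment_3_into_4(frames, e_thr):
    """Return segment boundaries [0, ..., len(frames)] such that no segment
    has a dispersion above e_thr (which must be non-negative)."""
    if e_thr < 0:
        raise ValueError("e_thr must be non-negative")
    bounds = [0, len(frames)]  # assumption: start from one segment
    while True:
        segments = list(zip(bounds, bounds[1:]))
        errors = [dispersion(frames, a, b) for a, b in segments]
        # Step 2: find the segment with the maximum error.
        worst = max(range(len(segments)), key=errors.__getitem__)
        # Step 3: stop when even the worst segment is within the threshold.
        if errors[worst] <= e_thr:
            return bounds
        # Step 4: merge the worst segment with its existing neighbours.
        lo = max(worst - 1, 0)
        hi = min(worst + 1, len(segments) - 1)
        start, end = segments[lo][0], segments[hi][1]
        # Step 5: re-divide the merged span into one more segment than before
        # (3 into 4 in the interior of the signal).
        new_bounds = best_division(frames, start, end, hi - lo + 2)
        # Step 6: splice the new boundaries in and repeat.
        bounds = bounds[:lo] + new_bounds + bounds[hi + 2:]


if __name__ == "__main__":
    # Toy signal: four constant stretches of three frames each.
    toy = [[v] for v in [0.0] * 3 + [5.0] * 3 + [9.0] * 3 + [2.0] * 3]
    print(segment_3_into_4(toy, e_thr=0.01))  # [0, 3, 6, 9, 12]
```

On the toy signal the third iteration merges three segments and re-divides them into four, which moves the boundary placed earlier at frame 9 of the intermediate result only if a better division exists. This is the property that distinguishes the approach from plain bisection, where earlier boundaries stay fixed.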