Using Genetic Algorithms to Improve the Performance of Speech Recognition Based on Artificial Neural Network

Min-Lun Lan, Shing-Tai Pan, Chih-Chin Lai

Reviewer

Ashoksundar Umayarpatham (auma003@aucklanduni.ac.nz)

Reference

1. Shiao Chun Wang, Voice Signal Processing, Chun Hwa Publication, 2004.

2. Shi Hong Chu, Combination of GA and SDM to Improve ANN Training Efficiency, Shu-Te University, MS Thesis, 2003.

3. Shi Chun Chen, Use of GA in CSD Coded Finite Impulse Digital Filter (FIR), Shu-Te University, MS Thesis, 2003.

4. Min Yin Chen, PC Computer Voice Operation, Chi Biao Publication, 1994.

5. Sung Lin Chen, Speech Recognition Based on Artificial Neural Network, National Sun Yat-sen University, MS Thesis, 2002.

6. Yi Chen Yeh, Production and Application of Artificial Neural Network, Ru Lin Publication, 1993.

7. W.C. Chu, Speech Coding Algorithms, John Wiley & Sons, 2003.

Keywords

back-propagation neural network, genetic algorithm, non-specific speaker speech recognition

Related Papers

1. Ganesh K. Venayagamoorthy, Viresh Moonasar and Kumbes Sandrasegaran, "Voice recognition using neural networks"

2. El-Ramly, S.H., Abdel-Kader, N.S., El-Adawi, R., Ain Shams University, Cairo, Egypt, "Neural networks used for speech recognition"

3. Pacnik, G., Benkic, K., Brecko, B., "Voice operated intelligent wheelchair - VOIC"

Summary

The paper begins by explaining the drawbacks of the Steepest Descent Method (SDM) for training artificial neural networks applied to speech recognition. It then proposes using a Genetic Algorithm (GA) to improve the speech recognition rate.

The paper explains the speech recognition process step by step and introduces the speech pre-processing stages clearly. The pre-processing stages discussed are frame size limit, point detection, Hamming window, and feature capture. For each stage, the authors made the following choices:

1. Frame size limit: determines how the speech signal is divided into frames. The authors use dynamic-size frames so that every utterance yields a fixed number of frames.

2. Point detection method: used to separate the speech segment from the silence segments in the voice signal. A time-domain point detection method was selected.

3. Hamming window: applied to each frame to prevent discontinuities at the frame edges.

4. Feature capture: there are generally three types of features: Linear Predictive Coding (LPC), Linear Predictive Cepstral Coefficients (LPCC), and Mel-Frequency Cepstral Coefficients (MFCC). MFCC extraction has three steps: a) use the FFT to obtain the power spectrum of the speech signal, b) apply a mel-spaced filter bank to the power spectrum and take the logarithm of each filter's energy, c) perform a Discrete Cosine Transform of the log filter bank energies to obtain the MFCCs.
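To make steps 3 and 4 concrete, the following is a minimal sketch of a Hamming window followed by the three MFCC steps for a single frame. It is not the authors' implementation: the function names, the 20-filter bank, the common 2595/700 mel formula, and the unnormalized DCT-II are all assumptions chosen for illustration, and a naive DFT stands in for the FFT so that the sketch needs only the Python standard library.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    """Step a: power spectrum via a naive DFT (an FFT gives the same values faster)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def mfcc(frame, sample_rate=8000, n_filters=20, n_ceps=10):
    """Window one frame, then apply steps a, b and c; returns n_ceps coefficients."""
    windowed = [x * w for x, w in zip(frame, hamming(len(frame)))]
    power = power_spectrum(windowed)                                # step a
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energy = [math.log(max(sum(f * p for f, p in zip(filt, power)), 1e-12))
                  for filt in bank]                                 # step b
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energy))
            for c in range(n_ceps)]                                 # step c (DCT-II)

# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz (illustrative input).
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coefficients = mfcc(frame)
```

With 10 coefficients per frame, this matches the feature count per frame reported in the experiment; whether the paper keeps or drops the zeroth coefficient is not stated in this review.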

The paper then explains the methodologies used for speech recognition: the back-propagation neural network (BPNN) and the Genetic Algorithm. The authors use a three-layer structure (input layer, hidden layer, and output layer) as the basic framework for speech recognition. For the genetic algorithm, tournament selection was used.
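For readers unfamiliar with tournament selection, a minimal sketch follows. The tournament size `k=2` and the function name are assumptions made for illustration; the paper's actual GA parameters are not reported.

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Draw k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]

# Usage with four hypothetical individuals and their fitness values.
population = ["a", "b", "c", "d"]
fitness = [0.1, 0.9, 0.4, 0.2]
parent = tournament_select(population, fitness, k=2, rng=random.Random(1))
```

Because the contenders are distinct, the least fit individual can never win a tournament of size two, while every other individual keeps some chance of being selected. This preserves more diversity than always picking the single best individual.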

The authors experimented with voice recordings from ten people, each speaking the Chinese numerals (1-9). Each person recorded four sets, three of which were used for training and one for testing.

Experiment constraints and initial values: The recordings use a sampling frequency of 8 kHz, a single (mono) channel, and 16 bits per sample. Each speech segment was divided into 20 fixed frames, each containing 10 features. Other values that had to be calculated before the experiments were the frame sampling point (for a fixed overlap rate) and the threshold value (for point detection). The number of MFCC levels in the experiment was 10. After the features were obtained, they were input to the recognition platform to start speech recognition.
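The frame sampling point calculation can be illustrated as follows. This is a sketch under assumptions: the 50% overlap rate, the function name, and the rounding down are not taken from the paper. It solves frame_len + (n_frames - 1) * frame_len * (1 - overlap) = n_samples for the frame length, so that a speech segment of any length yields exactly 20 frames.

```python
def frame_layout(n_samples, n_frames=20, overlap=0.5):
    """Return (frame_len, hop, starts) so that n_frames overlapping frames fit in n_samples."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Total span covered: one full frame plus (n_frames - 1) hops of frame_len * (1 - overlap).
    frame_len = int(n_samples / (1 + (n_frames - 1) * (1 - overlap)))
    hop = int(frame_len * (1 - overlap))
    if frame_len < 1 or hop < 1:
        raise ValueError("segment too short for the requested layout")
    starts = [i * hop for i in range(n_frames)]
    return frame_len, hop, starts

# Usage: a 0.5 s speech segment at 8 kHz (4000 samples).
frame_len, hop, starts = frame_layout(4000)
print(frame_len, hop, len(starts))  # prints: 380 190 20
```

With 20 frames and 10 features per frame, each utterance produces 20 x 10 = 200 values, which is the size of the network's input layer.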

The experiment was run with the back-propagation neural network alone and with the Genetic Algorithm added, and the results were compared. The BPNN had 200 inputs, 30 neurons in the hidden layer, and 9 outputs. Once training passes 1500 generations, it fails to break through, and the recognition rate is 91%. Even if training continues, the root mean square error does not decrease further and the recognition rate does not improve. At this stage, the converged weights and biases were entered into the genetic algorithm. This improved the initial speed and also helped SDM out of the local optimum. After 3000 generations of GA, the recognition rate increased to 95%.
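The idea of handing converged BPNN weights to a GA can be sketched as below. This is an illustration only, not the authors' procedure: the population size, mutation rate, Gaussian mutation, uniform crossover, and elitism are all assumed, since the paper does not report its GA parameters. The weights and biases are treated as one flat list, and `fitness_fn` is a hypothetical callback (for a real network it could be the negative training error).

```python
import random

def n_params(n_in, n_hidden, n_out):
    """Number of weights plus biases in a fully connected three-layer network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def ga_refine(seed, fitness_fn, pop_size=20, generations=50,
              mutation_rate=0.1, sigma=0.05, rng=None):
    """Evolve a population seeded with `seed`; higher fitness is better."""
    rng = rng or random.Random(0)

    def mutate(ind):
        # Perturb each gene with probability mutation_rate.
        return [w + rng.gauss(0.0, sigma) if rng.random() < mutation_rate else w
                for w in ind]

    def tournament(pop, fits):
        # Binary tournament: the fitter of two distinct random individuals wins.
        i, j = rng.sample(range(len(pop)), 2)
        return pop[i] if fits[i] >= fits[j] else pop[j]

    # The seed itself plus mutated copies, so the search starts near the BPNN solution.
    population = [list(seed)] + [mutate(seed) for _ in range(pop_size - 1)]
    for _ in range(generations):
        fits = [fitness_fn(ind) for ind in population]
        elite = population[max(range(len(population)), key=fits.__getitem__)]
        children = [list(elite)]  # elitism: the best individual always survives
        while len(children) < pop_size:
            a, b = tournament(population, fits), tournament(population, fits)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            children.append(mutate(child))
        population = children
    return max(population, key=fitness_fn)

# Usage on a toy problem: the "weights" should move toward zero.
seed = [1.0, -2.0, 0.5]
fitness = lambda w: -sum(x * x for x in w)
best = ga_refine(seed, fitness)
```

Because the seed is part of the initial population and the best individual is copied unchanged into every generation, the result can never be worse than the weights the GA started from. For the 200-30-9 network described above, `n_params(200, 30, 9)` gives 6309 values per individual.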

Hence, the authors succeeded in improving the speech recognition rate from 91% to 95% by adding the Genetic Algorithm to the BPNN.

Evaluation

1. This paper is a clear accept. The objectives are clear and the concepts are explained well. The paper is technically sound. It also does a good job of comparing its results with those of the BPNN.

2. The defects of the current system and the way it has been modified are clearly explained. Hence, the paper can be categorized as a balanced theory and practical paper.

3. The length of the paper is appropriate: it concisely explains the defects of the existing BPNN-based system and how the GA is used to address them.

Comments to the author:

The paper is well structured and organized. The introduction clearly explains the current issue and the aim of improving on it. The authors have succeeded in showing that their system with GA is a better solution than the existing BPNN-based speech recognition system. The experimental results also show that speech recognition with GA outperforms speech recognition with BPNN alone. The following points could be addressed to further improve the paper:

1. Neither figure is clearly legible.

2. Although the experimental results are impressive, the authors do not mention which sigmoid function they used for the BPNN.

3. The parameters of the Genetic Algorithm are not mentioned.