Convolutional Neural Networks for Accent Classification
Grigoriadis, Stavros (2019)
Description
Thesis full text in PDF format.
Abstract
Speech recognition systems have improved considerably over the years; accent classification, however, remains a highly challenging task. Accent classification technology can greatly benefit automatic speech recognition applications, telephony-based service centres, immigration offices and military operations. Applying convolutional neural networks has proved to be an efficient and effective way to tackle the accent recognition problem.
In this thesis, the accent classification task is approached by applying two convolutional neural networks that differ in their activation functions. The work uses a dataset of native speakers of four languages (Chinese, Spanish, English and Arabic) who read a specific elicitation paragraph in English. The chosen paragraph consists of common English words that together cover most of the sounds of the English language. Feature extraction is based on Mel-Frequency Cepstral Coefficients (MFCCs); specifically, the first 13 coefficients are used. MFCCs have proved to be one of the best representations of the human voice in audio signal processing. The convolutional neural networks process each speaker's audio signal as a two-dimensional image, which makes them an effective approach to accent classification. The thesis presents in detail the accuracy, validation loss and confusion matrices for each case across the training and test samples, together with the results of each model, so that the reader can compare the models and decide which one to apply in a similar application. Appendix 1 contains the original and modified source code implementing the proposed convolutional neural networks for the accent classification problem.
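To illustrate how a recording becomes the 13-row "image" that a CNN can consume, the following is a minimal NumPy sketch of the standard MFCC pipeline: pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, and DCT, keeping the first 13 coefficients. The parameter values (16 kHz sampling rate, 25 ms frames, 10 ms hop, 512-point FFT, 26 mel filters, 0.97 pre-emphasis) are common defaults assumed here for illustration; they are not taken from the thesis, whose Appendix 1 code may use a library implementation instead.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii_ortho(n_in, n_out):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc(signal, sr=16000, n_mfcc=13, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26):
    """Return an (n_mfcc, n_frames) matrix: a 2-D 'image' of the utterance."""
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    flen = int(round(frame_len * sr))
    fstep = int(round(frame_step * sr))
    n_frames = 1 + max(0, (len(emphasized) - flen) // fstep)
    pad = max(0, (n_frames - 1) * fstep + flen - len(emphasized))  # complete the last frame
    emphasized = np.append(emphasized, np.zeros(pad))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # avoid log(0)
    coeffs = log_energies @ dct_ii_ortho(n_filters, n_mfcc).T
    return coeffs.T

# One second of a 440 Hz tone stands in for a speech recording.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
features = mfcc(tone, sr=sr)
print(features.shape)  # (13, 98): 13 coefficients x 98 frames
```

Each column describes one 25 ms frame, so a longer reading of the elicitation paragraph simply yields a wider image; in practice such matrices are typically cropped or split into fixed-width segments before being fed to a CNN.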
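Since the two networks are described as differing only in their activation functions, the sketch below shows where such a swap takes place in a CNN forward pass over an MFCC image. It is a hypothetical, untrained single-layer network written in NumPy (one convolution with 8 filters of size 3 x 3, global average pooling, a dense layer and softmax over the four accents); the layer sizes, the pooling head and the choice of ReLU versus tanh are assumptions for illustration, since the abstract does not specify the architectures or which activations were compared.

```python
import numpy as np

# Hypothetical pair of activations; the abstract does not name the ones compared.
ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
}

def conv2d_valid(image, kernels, bias):
    """Naive 'valid' 2-D cross-correlation, as computed by a CNN conv layer.

    image: (H, W), kernels: (C, kh, kw), bias: (C,) -> (C, H-kh+1, W-kw+1)
    """
    c, kh, kw = kernels.shape
    h, w = image.shape
    out = np.empty((c, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return out

def forward(image, params, activation):
    """Conv -> activation -> global average pooling -> dense -> softmax."""
    act = ACTIVATIONS[activation]          # the only line that differs between models
    maps = act(conv2d_valid(image, params["kernels"], params["bias"]))
    pooled = maps.mean(axis=(1, 2))                 # one value per filter
    logits = params["W"] @ pooled + params["b"]     # one score per accent
    exp = np.exp(logits - logits.max())             # numerically stable softmax
    return exp / exp.sum()

ACCENTS = ["Chinese", "Spanish", "English", "Arabic"]
rng = np.random.default_rng(0)
params = {
    "kernels": rng.normal(scale=0.1, size=(8, 3, 3)),
    "bias": np.zeros(8),
    "W": rng.normal(scale=0.1, size=(len(ACCENTS), 8)),
    "b": np.zeros(len(ACCENTS)),
}
mfcc_image = rng.normal(size=(13, 98))  # random stand-in for a 13 x T MFCC matrix

for name in ACTIVATIONS:
    probs = forward(mfcc_image, params, name)
    print(name, dict(zip(ACCENTS, probs.round(3))))
```

Because the weights are random, the printed probabilities carry no meaning; the point is that both models share the same structure and parameters shape, so any difference in trained accuracy can be attributed to the activation function. Training (for example with a deep learning framework) is outside the scope of this sketch.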
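The confusion matrices mentioned above tabulate, for every true accent, how often each accent was predicted; accuracy is the share of counts on the diagonal. The short sketch below computes both from integer labels. The labels are made-up toy values for illustration, not results from the thesis, and the class order simply follows the list of languages in the abstract.

```python
import numpy as np

ACCENTS = ["Chinese", "Spanish", "English", "Arabic"]

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true accent, columns = predicted accent."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()

# Toy labels: indices into ACCENTS (hypothetical, for illustration only).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]

cm = confusion_matrix(y_true, y_pred, len(ACCENTS))
print(cm)
print(accuracy(cm))  # 0.75
per_class_recall = cm.diagonal() / cm.sum(axis=1)  # correct / total per true accent
```

Reading the matrix row by row shows which accents are confused with which (here, one Chinese sample predicted as Spanish and one English sample predicted as Chinese), information that a single accuracy figure hides.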