Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems

de Wet, Febe; Kleynhans, Neil; van Compernolle, Dirk; Sahraeian, Reza


For automatic speech recognition in under-resourced environments, techniques for sharing acoustic data between closely related or similar languages become important. Donor languages with abundant resources can potentially be used to increase the recognition accuracy of speech systems developed in the resource-poor target language. The assumption is that adding more data will increase the robustness of the statistical estimates captured by the acoustic models. In this study we investigated data sharing between Afrikaans and Flemish, an under-resourced and a well-resourced language, respectively. Our approach focused on exploring model adaptation and refinement techniques associated with hidden Markov model-based speech recognition systems to improve the benefit of sharing data. Specifically, we focused on the use of currently available techniques, some possible combinations of them, and exactly how the techniques are applied during the acoustic model development process. Our findings show that simply using standard approaches to adaptation and refinement does not yield any benefit when Flemish data are added to the Afrikaans training pool. The only observed improvement was achieved when developing acoustic models on all available data but estimating model refinements and adaptations on the target data only.

Significance: Acoustic modelling for under-resourced languages; automatic speech recognition for Afrikaans; data sharing between Flemish and Afrikaans to improve acoustic modelling for Afrikaans.


Bibliographic Details
Main Authors:	de Wet, Febe; Kleynhans, Neil; van Compernolle, Dirk; Sahraeian, Reza
Format:	Digital journal
Language:	English
Published:	Academy of Science of South Africa 2017
Online Access:	http://www.scielo.org.za/scielo.php?script=sci_arttext&pid=S0038-23532017000100009

id	oai:scielo:S0038-23532017000100009
record_format	ojs
spelling	oai:scielo:S0038-23532017000100009; 2017-02-23; Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems; de Wet, Febe; Kleynhans, Neil; van Compernolle, Dirk; Sahraeian, Reza; keywords: acoustic modelling, Afrikaans, Flemish, automatic speech recognition; Academy of Science of South Africa; South African Journal of Science v.113 n.1-2 2017; 2017-02-01; journal article; text/html; http://www.scielo.org.za/scielo.php?script=sci_arttext&pid=S0038-23532017000100009; en
institution	SCIELO
collection	OJS
country	South Africa
countrycode	ZA
component	Journal
access	Online
databasecode	rev-scielo-za
tag	journal
region	Southern Africa
libraryname	SciELO
language	English
format	Digital
author	de Wet,Febe Kleynhans,Neil van Compernolle,Dirk Sahraeian,Reza
author_sort	de Wet,Febe
title	Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems
publisher	Academy of Science of South Africa
publishDate	2017
url	http://www.scielo.org.za/scielo.php?script=sci_arttext&pid=S0038-23532017000100009
