Speech Intelligibility Measurement on the basis of ITU-T Recommendation P.863
Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent Systems´ laboratory.
2012 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Speech Intelligibility Measurement on the basis of ITU-T Recommendation P.863 (English)
Abstract [en]

Objective speech intelligibility measurement techniques such as the AI (Articulation Index) and the AI-based STI (Speech Transmission Index) fail to assess speech intelligibility in modern telecommunication networks, which apply several non-linear processing steps to enhance speech. Moreover, these techniques cannot predict intelligibility scores for individual CVC (Consonant Vowel Consonant) words. The ITU-T P.863 standard [1], which was developed for assessing speech quality, is used as a starting point for developing a simple new model that predicts the subjective speech intelligibility of individual CVC words. Subjective intelligibility measurements were carried out for a large set of speech degradations. The subjective test presents single CVC words in an eight-alternative closed-response-set experiment. Subjects assess individual degraded CVC words, and the average rate of correct recognition is used as the intelligibility score for a particular CVC word. The first subjective database uses CVC words that vary in the first consonant, i.e. /C/ous (represented as "kæʊs" using the International Phonetic Alphabet). This database is used for developing the objective model, while a new database based on VC (Vowel Consonant) words that vary in the second consonant (a/C/, e.g. aH, aL) is used for validating the model.
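The scoring rule described above (the average of correct recognitions per degraded word) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `intelligibility_scores`, the stimulus identifiers, and the response data are all hypothetical, and the example words merely follow the /C/ous pattern.

```python
from collections import defaultdict


def intelligibility_scores(responses):
    """Return the fraction of correct recognitions per stimulus.

    `responses` is an iterable of (stimulus_id, presented_word, chosen_word)
    tuples, one per subject judgement. In a closed-response-set test the
    chosen word is one of the fixed alternatives offered to the subject.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for stimulus_id, presented, chosen in responses:
        total[stimulus_id] += 1
        if chosen == presented:
            correct[stimulus_id] += 1
    return {stimulus: correct[stimulus] / total[stimulus] for stimulus in total}


# Hypothetical judgements (made-up data, for illustration only).
responses = [
    ("kous_noise", "kous", "kous"),
    ("kous_noise", "kous", "tous"),
    ("kous_noise", "kous", "kous"),
    ("kous_noise", "kous", "kous"),
    ("pous_clip", "pous", "bous"),
    ("pous_clip", "pous", "pous"),
]
scores = intelligibility_scores(responses)
```

Each degraded word receives a score between 0 and 1; these per-word scores are the subjective values that an objective model would then be compared against.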

ITU-T P.863 shows very poor results on the first subjective database, with a correlation of 0.30. A first extension to make P.863 suitable for intelligibility prediction restructures the speech material to meet the temporal structure requirements (speech+silence+speech) set for standard P.863 measurements. The restructuring concatenates every original and degraded CVC word with itself. Using P.863 on the restructured first subjective database, where the speech material meets the temporal requirements, gives no significant improvement in correlation (0.34). In this thesis a simple model based on P.863 is developed for assessing the intelligibility of individual CVC words. The model uses a linear combination of a simple time clipping indicator (missing speech parts) and a “Good frame count” indicator, which is based on the local perceptual (frame-by-frame) signal-to-noise ratio. Using this model on the restructured first database, a reasonably good correlation of 0.81 is seen between the subjective scores and the model output values. For the validation database, a correlation of around 0.76 is obtained. Further validation on an existing database at TNO, which uses time clipping degradation only, shows an excellent correlation of 0.98.
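The restructuring step, the linear combination of the two indicators, and the correlation check can be sketched roughly as below. Everything specific here is an assumption made for illustration: the silence length, the weights and offset, the indicator values, and the subjective scores are invented, and the abstract does not state how the indicators are computed inside P.863 or whether the model includes an offset term.

```python
import math


def restructure(word_samples, silence_len):
    """Concatenate a word with itself, separated by silence.

    Produces the speech+silence+speech layout; the silence length
    (in samples) is an assumed parameter, not taken from the thesis.
    """
    word = list(word_samples)
    return word + [0.0] * silence_len + word


def predict(clipping, good_frames, w_clip, w_good, offset):
    """Linear combination of the two indicators (weights are hypothetical)."""
    return w_clip * clipping + w_good * good_frames + offset


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up indicator values for four degraded words.
clipping_values = [0.0, 0.2, 0.5, 0.1]      # share of missing speech
good_frame_values = [0.9, 0.7, 0.3, 0.8]    # share of "good" frames
subjective = [0.88, 0.63, 0.25, 0.75]       # invented listener scores

predicted = [
    predict(c, g, w_clip=-0.5, w_good=0.6, offset=0.3)
    for c, g in zip(clipping_values, good_frame_values)
]
r = pearson(predicted, subjective)
restructured = restructure([1.0, 2.0], silence_len=3)
```

In practice the weights would be fitted on the development database and the correlation then reported on a separate validation database, as the abstract describes.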

Although a reasonably good correlation is seen on the first database and the validation database, it is too low for reliable measurements. Further validation and development are required; nevertheless, the results show that a perception-based technique that uses internal representations of signals can be used for predicting subjective intelligibility scores of individual CVC words.

Place, publisher, year, edition, pages
2012. 66 p.
Keyword [en]
Speech Intelligibility, POLQA
National Category
Engineering and Technology Signal Processing
Identifiers
URN: urn:nbn:se:hh:diva-20023
Local ID: IDE1271
OAI: oai:DiVA.org:hh-20023
DiVA: diva2:571613
External cooperation
TNO, The Netherlands
Subject / course
Computer science and engineering
Presentation
2012-11-06, Halmstad, 15:15 (English)
Uppsok
Technology
Supervisors
Examiners
Available from: 2012-11-30 Created: 2012-11-23 Last updated: 2012-11-30 Bibliographically approved

Open Access in DiVA

Speech Intelligibility Measurement on the basis of ITU-T Recommendation P.863 (1357 kB), 753 downloads
File information
File name: FULLTEXT01.pdf
File size: 1357 kB
Checksum (SHA-512): 1b76a5c9b4818bc599b01cdcec92a7d5556b1114497f5e971e736595a3cdf0e3df0164ef85fcb2a8d23a79758faddaa8e7af85b9e9ba0fc6e76e20bee38e12af
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
GHIMIRE, SWATANTRA