Digitala Vetenskapliga Arkivet

A semi-polar grid strategy for the three-dimensional finite element simulation of vowel-vowel sequences
GTM Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Barcelona, Spain.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-8991-1016
GTM Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Barcelona, Spain.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-4532-014X
2017 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, The International Speech Communication Association (ISCA), 2017, Vol. 2017, p. 3477-3481. Conference paper, Published paper (Refereed)
Abstract [en]

Three-dimensional computational acoustic models need very detailed 3D vocal tract geometries to generate high-quality sounds. Static geometries can be obtained from Magnetic Resonance Imaging (MRI), but it is not currently possible to capture dynamic MRI-based geometries with sufficient spatial and temporal resolution. One possible solution is to interpolate between static geometries, but this is a complex task. We instead propose using a semi-polar grid to extract 2D cross-sections from the static 3D geometries, and then interpolating them to obtain the vocal tract dynamics. Other approaches, such as the adaptive grid, have also been explored. In the adaptive grid method, cross-sections are defined perpendicular to the vocal tract midline, as is typically done in 1D to obtain the vocal tract area functions. However, adjacent cross-sections may intersect during the interpolation process, especially when the vocal tract midline changes its orientation quickly. In contrast, the semi-polar grid prevents these intersections because the plane orientations are fixed over time. Finite element simulations of static vowels are first conducted, showing that 3D acoustic wave propagation is not significantly altered when the semi-polar grid is used instead of the adaptive grid. Finally, the vowel-vowel sequence [ɑi] is simulated to demonstrate the method.
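
The contrast between the two grids can be illustrated with a minimal 2D (midsagittal) sketch. This is not the authors' implementation: the function names, the circular-arc midline, the half-width, and the grid dimensions are all illustrative assumptions. Cross-sections are reduced to line segments, and the sketch counts how many neighbouring segments cross while a gently bent midline is linearly interpolated towards a sharply bent one.

```python
import numpy as np

def cross2(u, v):
    """z-component of the 2D cross product."""
    return u[0] * v[1] - u[1] * v[0]

def segments_cross(a0, a1, b0, b1):
    """True if segments a0-a1 and b0-b1 cross at an interior point of both."""
    d1 = cross2(b1 - b0, a0 - b0)
    d2 = cross2(b1 - b0, a1 - b0)
    d3 = cross2(a1 - a0, b0 - a0)
    d4 = cross2(a1 - a0, b1 - a0)
    return bool(d1 * d2 < 0 and d3 * d4 < 0)

def count_adjacent_crossings(segments):
    """Number of neighbouring cross-section pairs that intersect."""
    return sum(
        segments_cross(*segments[i], *segments[i + 1])
        for i in range(len(segments) - 1)
    )

def adaptive_grid(midline, half_width):
    """Segments perpendicular to the midline, one per midline point."""
    tangent = np.gradient(midline, axis=0)
    tangent /= np.linalg.norm(tangent, axis=1, keepdims=True)
    normal = np.column_stack([-tangent[:, 1], tangent[:, 0]])
    return [(p - half_width * n, p + half_width * n)
            for p, n in zip(midline, normal)]

def semi_polar_grid(n_par=4, n_rad=9, step=0.5, r_min=0.2, r_max=3.0):
    """Fixed grid (assumed layout): parallel lines, radial lines, parallel lines."""
    segs = []
    for k in range(n_par, 0, -1):            # lower part: horizontal lines
        y = -k * step
        segs.append((np.array([r_min, y]), np.array([r_max, y])))
    for ang in np.linspace(0.0, np.pi / 2, n_rad):  # bend: rays from a fixed pole
        d = np.array([np.cos(ang), np.sin(ang)])
        segs.append((r_min * d, r_max * d))
    for k in range(1, n_par + 1):            # front part: vertical lines
        x = -k * step
        segs.append((np.array([x, r_min]), np.array([x, r_max])))
    return segs

def arc_midline(radius, n=21):
    """Hypothetical midline: a quarter circle around the origin."""
    ang = np.linspace(0.0, np.pi / 2, n)
    return radius * np.column_stack([np.cos(ang), np.sin(ang)])

gentle, sharp = arc_midline(2.0), arc_midline(1.0)
half_width = 1.5  # assumed half-width of every cross-section

# Adaptive grid: recomputed from the interpolated midline at every step.
adaptive_counts = [
    count_adjacent_crossings(adaptive_grid((1 - t) * gentle + t * sharp, half_width))
    for t in np.linspace(0.0, 1.0, 4)
]

# Semi-polar grid: independent of the midline, so it is the same at every step.
semi_polar_count = count_adjacent_crossings(semi_polar_grid())

print("adaptive grid crossings per step:", adaptive_counts)
print("semi-polar grid crossings:", semi_polar_count)
```

In this toy setup the perpendicular segments begin to cross once the midline's radius of curvature drops below the assumed half-width, whereas the fixed grid cannot develop crossings because its lines never move.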

Place, publisher, year, edition, pages
The International Speech Communication Association (ISCA), 2017. Vol. 2017, p. 3477-3481
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X ; 2017
National subject category
Language Technology (Computational Linguistics)
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-212994
DOI: 10.21437/Interspeech.2017-448
ISI: 000457505000724
Scopus ID: 2-s2.0-85039147985
OAI: oai:DiVA.org:kth-212994
DiVA, id: diva2:1136224
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20 August 2017 through 24 August 2017
Note

QC 20170828

Available from: 2017-08-25 Created: 2017-08-25 Last updated: 2019-09-24 Bibliographically approved
Part of thesis
1. Computational Modeling of the Vocal Tract: Applications to Speech Production
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Human speech production is a complex process involving neuromuscular control signals, the effects of the articulators' biomechanical properties, and acoustic wave propagation in a vocal tract tube of intricate shape. Modeling these phenomena may play an important role in advancing our understanding of the mechanisms involved, and may also have future medical applications, e.g., guiding doctors in diagnosis, treatment planning, and surgery prediction for related disorders such as oral cancer, cleft palate, obstructive sleep apnea, and dysphagia.

A more complete understanding requires models that represent the phenomena as faithfully as possible. Because such modeling is complex, simplifications have nevertheless been used extensively in speech production research: phonetic descriptors (such as the position and degree of the most constricted part of the vocal tract) are used as control signals, the articulators are represented as two-dimensional geometrical models, the vocal tract is treated as a smooth tube, plane wave propagation is assumed, etc.

This thesis aims first to investigate the consequences of such simplifications, and second to contribute to unified modeling of the speech production process by connecting three-dimensional biomechanical modeling of the upper airway with three-dimensional acoustic simulations. The investigation of the simplifying assumptions demonstrated that vocal tract geometry features (such as shape representation, bending, and lip shape) influence its acoustic characteristics, and that the type of modeling (geometrical or biomechanical) affects the spatial trajectories of the articulators as well as the formant frequency transitions in the spectrogram.

The unification of biomechanical and acoustic modeling in three dimensions makes it possible to realistically control the acoustic output of dynamic sounds, such as vowel-vowel utterances, by contracting the relevant muscles. This moves and shapes the speech articulators, which in turn define the vocal tract tube in which the wave propagation occurs. The main contribution of the thesis in this line of work is a novel and complex method that automatically reconstructs the shape of the vocal tract from the biomechanical model. This step is essential to link biomechanical and acoustic simulations, since the vocal tract, which anatomically is a cavity enclosed by different structures, is only implicitly defined in a biomechanical model made up of several distinct articulators.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2018. p. 105
Series
TRITA-EECS-AVL ; 2018:90
Keywords
vocal tract, upper airway, speech production, biomechanical model, acoustic model, vocal tract reconstruction
National subject category
Computer Science
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-239071 (URN)
978-91-7873-021-6 (ISBN)
Public defence
2018-12-07, D2, Lindstedtsvägen 5, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20181116

Available from: 2018-11-16 Created: 2018-11-16 Last updated: 2018-11-16 Bibliographically approved

Open Access in DiVA

Full text (647 kB), 161 downloads
File information
File name: FULLTEXT01.pdf
File size: 647 kB
Checksum (SHA-512):
230d79403abe5df7ec971582588efe6b146eecd6f813187ace215df72c0a9e8edca31c92324fd34f1491c527134bf9bbddb62eb85fa64ae852835d82cec93924
Type: fulltext
MIME type: application/pdf


Search further in DiVA

By author/editor
Dabbaghchian, Saeed; Engwall, Olov
Total: 161 downloads
The number of downloads is the sum of downloads for all full texts. It may include, e.g., previous versions that are no longer available.

Total: 351 hits