Generating Facial Animation With Emotions In A Neural Text-To-Speech Pipeline
Linköping University, Department of Science and Technology, Media and Information Technology. Linköping University, The Institute of Technology.
2019 (English). Independent thesis, Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

This thesis presents work on incorporating facial animation with emotions into a neural text-to-speech pipeline. The project aims to let a digital human utter sentences given only text, removing the need for video input. Our solution consists of a neural network that generates blend shape weights from speech and is placed in a neural text-to-speech pipeline. We build on ideas from previous work and implement a recurrent neural network using four LSTM layers, and later extend this implementation by incorporating emotions. The emotions are learned by the network itself via the emotion layer and used at inference to produce the desired emotion. While using LSTMs for speech-driven facial animation is not a new idea, it has not yet been combined with the idea of using emotional states that are learned by the network itself. Previous approaches are either only two-dimensional, of complicated design, or require manual work on the emotional states. Thus, we implement a network of simple design that takes advantage of the sequence processing ability of LSTMs and combines it with the idea of emotional states. We trained several variations of the network on data captured using a head-mounted camera, and the results of the best-performing model were used in a subjective evaluation. During the evaluation, the participants were shown several videos and asked to rate the naturalness of the face uttering the sentence. The results showed that the naturalness of the face greatly depends on which emotion vector was used, as some vectors limited the mobility of the face. However, our best-performing emotion vector was rated at the same level of naturalness as the ground truth, which indicates that our method is successful. The purpose of the thesis was fulfilled, as our implementation demonstrates one possibility of incorporating facial animation into a text-to-speech pipeline.
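
The data flow described in the abstract (per-frame speech features in, blend shape weights out, four stacked LSTM layers, an emotion vector chosen at inference) can be sketched roughly as follows. This is only an illustration with untrained random weights, not the thesis implementation. The abstract does not state the audio features, the layer sizes, or how the emotion layer is wired in, so the names `LSTMLayer`, `build_model` and `speech_to_blend_shapes`, all dimensions, the concatenation of the emotion vector to every audio frame, and the sigmoid output range are assumptions made for this sketch.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMLayer:
    """One LSTM layer with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # Rows are stacked as input, forget, candidate and output gates.
        # Each row covers the input, the previous hidden state and a bias.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden + 1)]
                  for _ in range(4 * n_hidden)]

    def run(self, sequence):
        n = self.n_hidden
        h = [0.0] * n  # hidden state
        c = [0.0] * n  # cell state
        outputs = []
        for x in sequence:
            z = x + h + [1.0]  # input, previous hidden state, bias term
            pre = [sum(wi * zi for wi, zi in zip(row, z)) for row in self.w]
            i = [sigmoid(v) for v in pre[0:n]]
            f = [sigmoid(v) for v in pre[n:2 * n]]
            g = [math.tanh(v) for v in pre[2 * n:3 * n]]
            o = [sigmoid(v) for v in pre[3 * n:4 * n]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def build_model(n_audio, n_emotion, n_hidden, n_blend, seed=0):
    """Four stacked LSTM layers plus a linear output layer (sizes are hypothetical)."""
    rng = random.Random(seed)
    layers = [LSTMLayer(n_audio + n_emotion, n_hidden, rng)]
    layers += [LSTMLayer(n_hidden, n_hidden, rng) for _ in range(3)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
             for _ in range(n_blend)]
    return layers, w_out


def speech_to_blend_shapes(frames, emotion, layers, w_out):
    """Map a sequence of audio feature frames to per-frame blend shape weights."""
    # Assumption: the emotion vector is concatenated to every audio frame.
    seq = [frame + emotion for frame in frames]
    for layer in layers:
        seq = layer.run(seq)
    # Assumption: a sigmoid keeps each blend shape weight between 0 and 1.
    return [[sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
            for h in seq]


# Synthetic stand-in for 20 frames of 13-dimensional speech features.
frames = [[math.sin(0.1 * t * (k + 1)) for k in range(13)] for t in range(20)]
emotion = [0.5, -0.2, 0.0, 0.3]  # hypothetical emotion vector picked at inference
layers, w_out = build_model(n_audio=13, n_emotion=4, n_hidden=8, n_blend=5)
weights = speech_to_blend_shapes(frames, emotion, layers, w_out)
print(len(weights), len(weights[0]))
```

Swapping in a different `emotion` vector while keeping `frames` fixed changes the predicted weights, which mirrors the abstract's observation that the rated naturalness depended on which emotion vector was used.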

Place, publisher, year, edition, pages
2019. p. 31
Keywords [en]
facial animation, neural networks, text-to-speech, speech-driven animation, emotional states
National Category
Media and Communication Technology
Identifiers
URN: urn:nbn:se:liu:diva-160535
ISRN: LIU-ITN-TEK-A-019/048--SE
OAI: oai:DiVA.org:liu-160535
DiVA, id: diva2:1354626
Subject / course
Media Technology
Uppsok
Technology
Available from: 2019-09-25. Created: 2019-09-25. Last updated: 2019-09-25. Bibliographically approved.

Open Access in DiVA

Generating Facial Animation With Emotions In A Neural Text-To-Speech Pipeline (1026 kB), 10 downloads
File information
File name: FULLTEXT01.pdf
File size: 1026 kB
Checksum (SHA-512): 28fcff6343bcebadf712a83a8ba8bf3b623f0fcc7ce04c6637b441cb07d3007db00e61180ca5cb2da81dbab570ce0f81e884f8289932c821460d6102c403925b
Type: fulltext
Mimetype: application/pdf

Author
Igeland, Viktor