  • 1. AAl Abdulsalam, Abdulrahman
    et al.
    Velupillai, Sumithra
    Meystre, Stephane
    UtahBMI at SemEval-2016 Task 12: Extracting Temporal Information from Clinical Text, 2016. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, 2016, 1256-1262 p. Conference paper (Refereed)
    Abstract [en]

    The 2016 Clinical TempEval continued the 2015 shared task on temporal information extraction with a new evaluation test set. Our team, UtahBMI, participated in all subtasks using machine learning approaches with ClearTK (LIBLINEAR), CRF++ and CRFsuite packages. Our experiments show that CRF-based classifiers yield, in general, higher recall for multi-word spans, while SVM-based classifiers are better at predicting correct attributes of TIMEX3. In addition, we show that an ensemble-based approach for TIMEX3 could yield improved results. Our team achieved competitive results in each subtask with an F1 75.4% for TIMEX3, F1 89.2% for EVENT, F1 84.4% for event relations with document time (DocTimeRel), and F1 51.1% for narrative container (CONTAINS) relations.
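
The ensemble-based approach for TIMEX3 mentioned above can be illustrated, very roughly, as a per-token vote over BIO labels from the two kinds of taggers the abstract compares (CRF-based and SVM-based). The sketch below is a hypothetical illustration of that idea, not the UtahBMI implementation; the tokens, labels and tie-breaking rule are invented.

```python
# Minimal sketch of a token-level voting ensemble over BIO labels from two
# taggers (e.g. a CRF-based and an SVM-based system). Hypothetical data and
# tie-breaking rule; this illustrates the idea, not the UtahBMI system.

def ensemble_bio(crf_labels, svm_labels, prefer="crf"):
    """Merge two equal-length BIO label sequences by agreement.

    With only two voters, disagreements are resolved in favour of the
    preferred system (here the CRF, which the abstract reports as having
    higher recall on multi-word spans).
    """
    assert len(crf_labels) == len(svm_labels)
    return [c if (c == s or prefer == "crf") else s
            for c, s in zip(crf_labels, svm_labels)]

if __name__ == "__main__":
    tokens = ["admitted", "on", "March", "3", ",", "2014"]
    crf = ["O", "O", "B-TIMEX3", "I-TIMEX3", "I-TIMEX3", "I-TIMEX3"]
    svm = ["O", "O", "B-TIMEX3", "I-TIMEX3", "O", "B-TIMEX3"]
    print(list(zip(tokens, ensemble_bio(crf, svm))))
```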

  • 2.
    Abou Zliekha, M.
    et al.
    Damascus University/Faculty of Information Technology.
    Al Moubayed, Samer
    Damascus University/Faculty of Information Technology.
    Al Dakkak, O.
    Higher Institute of Applied Science and Technology (HIAST).
    Ghneim, N.
    Higher Institute of Applied Science and Technology (HIAST).
    Emotional Audio-Visual Arabic Text to Speech, 2006. In: Proceedings of the XIV European Signal Processing Conference (EUSIPCO), Florence, Italy, 2006. Conference paper (Refereed)
    Abstract [en]

    The goal of this paper is to present an emotional audio-visual text-to-speech system for the Arabic language. The system is based on two entities: an emotional audio text-to-speech system, which generates speech depending on the input text and the desired emotion type, and an emotional visual model, which generates the talking head by forming the corresponding visemes. The phoneme-to-viseme mapping and the emotion shaping use a 3-parametric face model based on the Abstract Muscle Model. We have thirteen viseme models and five emotions as parameters to the face model. The TTS produces the phonemes corresponding to the input text and the speech with the suitable prosody to include the prescribed emotion. In parallel, the system generates the visemes and sends the controls to the facial model to obtain the animation of the talking head in real time.

  • 3. Abrahamsson, M.
    et al.
    Sundberg, Johan
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics.
    Subglottal pressure variation in actors’ stage speech, 2007. In: Voice and Gender Journal for the Voice and Speech Trainers Association / [ed] Rees, M., VASTA Publishing, 2007, 343-347 p. Chapter in book (Refereed)
  • 4.
    Abrahamsson, Peder
    Linköping University, Department of Computer and Information Science.
    Mer lättläst: Påbyggnad av ett automatiskt omskrivningsverktyg till lätt svenska, 2011. Independent thesis Basic level (degree of Bachelor), 12 credits / 18 HE credits. Student thesis
    Abstract [sv]

    The Swedish language should be accessible to everyone who lives and works in Sweden. It is therefore important that easy-to-read alternatives exist for those who have difficulty reading Swedish text. This work builds on earlier results showing that it is possible to create an automatic rewriting program that makes texts easier to read. It is based on CogFLUX, a tool for automatic rewriting into easy-to-read Swedish. CogFLUX contains functions for syntactically rewriting texts into more readable Swedish, using rewriting rules developed in an earlier project. In this work, additional rewriting rules are implemented, together with a new module for handling synonyms. With these new rules and the synonym module, the work investigates whether it is possible to create a system that produces more readable text according to established readability measures such as LIX, OVIX and the nominal ratio. The rewriting rules and the synonym handler are tested on three different texts with a total length of roughly one hundred thousand words. The work shows that both the LIX value and the nominal ratio can be lowered significantly with the help of rewriting rules and a synonym handler. It also shows that more remains to be done to produce a really good program for automatic rewriting into easy-to-read Swedish.
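
The readability measures named above are established formulas for Swedish text: LIX combines average sentence length with the share of long words, and OVIX measures lexical variation. The sketch below shows one common way to compute them; it is an illustration only (the nominal ratio additionally needs part-of-speech counts from a tagger, so it is omitted), and the sample text is invented.

```python
# Sketch of two readability measures mentioned in the abstract, using their
# commonly cited formulas. Illustration only, not code from the thesis.

import math
import re

def lix(text):
    """LIX = words/sentences + 100 * long_words/words (long = more than 6 chars)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

def ovix(text):
    """OVIX = log(tokens) / log(2 - log(types)/log(tokens))."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    tokens, types = len(words), len(set(words))
    return math.log(tokens) / math.log(2 - math.log(types) / math.log(tokens))

if __name__ == "__main__":
    sample = ("Det svenska språket ska finnas tillgängligt för alla. "
              "Lättlästa alternativ är viktiga för många läsare.")
    print(round(lix(sample), 1), round(ovix(sample), 1))
```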

  • 5.
    Adesam, Yvonne
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    The Multilingual Forest: Investigating High-quality Parallel Corpus Development, 2012. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    This thesis explores the development of parallel treebanks, collections of language data consisting of texts and their translations, with syntactic annotation and alignment, linking words, phrases, and sentences to show translation equivalence. We describe the semi-manual annotation of the SMULTRON parallel treebank, consisting of 1,000 sentences in English, German and Swedish. This description is the starting point for answering the first of two questions in this thesis.

    • What issues need to be considered to achieve a high-quality, consistent, parallel treebank?

    The units of annotation and the choice of annotation schemes are crucial for quality, and some automated processing is necessary to increase the size. Automatic quality checks and evaluation are essential, but manual quality control is still needed to achieve high quality.

    Additionally, we explore improving the automatically created annotation for one language, using information available from the annotation of the other languages. This leads us to the second of the two questions in this thesis.

    • Can we improve automatic annotation by projecting information available in the other languages?

    Experiments with automatic alignment, which is projected from two language pairs, L1–L2 and L1–L3, onto the third pair, L2–L3, show an improvement in precision, in particular if the projected alignment is intersected with the system alignment. We also construct a test collection for experiments on annotation projection to resolve prepositional phrase attachment ambiguities. While majority vote projection improves the annotation, compared to the basic automatic annotation, using linguistic clues to correct the annotation before majority vote projection is even better, although more laborious. However, some structural errors cannot be corrected by projection at all, as different languages have different wording, and thus different structures.
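
The projection step described above can be sketched as a composition of alignments through the shared language: an L2–L3 alignment is induced from the L1–L2 and L1–L3 alignments and then intersected with an independently produced system alignment. The index-pair sets below are hypothetical simplifications, not the SMULTRON data.

```python
# Sketch of projecting a word alignment onto a new language pair through a
# pivot language, then intersecting with a direct system alignment.
# Alignments are simplified to sets of (source index, target index) pairs.

def project(l1_l2, l1_l3):
    """Compose two alignments through their shared L1 side."""
    by_l1 = {}
    for i, j in l1_l2:
        by_l1.setdefault(i, set()).add(j)
    return {(j, k) for i, k in l1_l3 for j in by_l1.get(i, ())}

if __name__ == "__main__":
    l1_l2 = {(0, 0), (1, 1), (2, 3)}          # e.g. English -> German
    l1_l3 = {(0, 0), (1, 2), (2, 3)}          # e.g. English -> Swedish
    system_l2_l3 = {(0, 0), (1, 2), (3, 1)}   # direct German -> Swedish
    projected = project(l1_l2, l1_l3)
    print("projected:  ", sorted(projected))
    print("intersected:", sorted(projected & system_l2_l3))
```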

  • 6. Agić, Zeljko
    et al.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Merkler, Danijela
    Krek, Simon
    Dobrovoljc, Kaja
    Moze, Sara
    Cross-lingual Dependency Parsing of Related Languages with Rich Morphosyntactic Tagsets, 2014. In: Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants, 2014, 13-24 p. Conference paper (Refereed)
  • 7.
    Ahlbom, Viktoria
    et al.
    Uppsala University, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Faculty of Languages, Department of Linguistics.
    Sågvall Hein, Anna
    Test Suites Covering the Functional Specifications of the Sub-components of the Swedish Prototype, 1999. In: Working Papers in Computational Linguistics & Language Engineering; 13, ISSN 1401-923X, no 13, 28- p. Article in journal (Other scientific)
  • 8.
    Ahlbom, Viktoria
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Sågvall Hein, Anna
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Test Suites Covering the Functional Specifications of the Sub-components of the Swedish Prototype, 1999. In: Working Papers in Computational Linguistics & Language Engineering; 13, ISSN 1401-923X, no 13, 28- p. Article in journal (Other academic)
  • 9.
    Ahltorp, Magnus
    et al.
    Stockholm.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai, Stockholm.
    Kitajima, Shiho
    Hokkaido Univ, Japan.
    Henriksson, Aron
    Stockholm University.
    Rzepka, Rafal
    Hokkaido Univ, Japan.
    Araki, Kenji
    Hokkaido Univ, Japan.
    Expansion of medical vocabularies using distributional semantics on Japanese patient blogs, 2016. In: Journal of Biomedical Semantics, ISSN 2041-1480, E-ISSN 2041-1480, Vol. 7, 58. Article in journal (Refereed)
    Abstract [en]

    Background: Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to a limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, essential also for text mining from corpora written in other languages than English and belonging to a variety of medical genres. The aim of this study was therefore to evaluate medical vocabulary expansion using a corpus very different from those previously used, in terms of grammar and orthographics, as well as in terms of text genre. This was carried out by applying a method based on distributional semantics to the task of extracting medical vocabulary terms from a large corpus of Japanese patient blogs. Methods: Distributional properties of terms were modelled with random indexing, followed by agglomerative hierarchical clustering of 3x100 seed terms from existing vocabularies, belonging to three semantic categories: Medical Finding, Pharmaceutical Drug and Body Part. By automatically extracting unknown terms close to the centroids of the created clusters, candidates for new terms to include in the vocabulary were suggested. The method was evaluated for its ability to retrieve the remaining n terms in existing medical vocabularies. Results: Removing case particles and using a context window size of 1 + 1 was a successful strategy for Medical Finding and Pharmaceutical Drug, while retaining case particles and using a window size of 8 + 8 was better for Body Part. For a 10n long candidate list, the use of different cluster sizes affected the result for Pharmaceutical Drug, while the effect was only marginal for the other two categories. For a list of top n candidates for Body Part, however, clusters with a size of up to two terms were slightly more useful than larger clusters. For Pharmaceutical Drug, the best settings resulted in a recall of 25 % for a candidate list of top n terms and a recall of 68 % for top 10n. For a candidate list of top 10n candidates, the second best results were obtained for Medical Finding: a recall of 58 %, compared to 46 % for Body Part. Only taking the top n candidates into account, however, resulted in a recall of 23 % for Body Part, compared to 16 % for Medical Finding. Conclusions: Different settings for corpus pre-processing, window sizes and cluster sizes were suitable for different semantic categories and for different lengths of candidate lists, showing the need to adapt parameters, not only to the language and text genre used, but also to the semantic category for which the vocabulary is to be expanded. The results show, however, that the investigated choices for pre-processing and parameter settings were successful, and that a Japanese blog corpus, which in many ways differs from those used in previous studies, can be a useful resource for medical vocabulary expansion.
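
As a rough, much-simplified illustration of the pipeline in the abstract (distributional vectors built with random indexing, a centroid over seed terms, and unknown terms ranked by their distance to that centroid), consider the sketch below. The toy corpus, window size, seed terms and the omission of the clustering step are all simplifications, not the published setup.

```python
# Toy sketch of the pipeline described above: random indexing of term
# contexts, a centroid over seed terms, and ranking of unknown terms by
# cosine similarity to the centroid. Data and parameters are hypothetical.

import numpy as np

DIM, NONZERO = 300, 6
rng = np.random.default_rng(0)

def index_vector():
    """Sparse ternary random index vector (a few +1/-1 entries)."""
    v = np.zeros(DIM)
    positions = rng.choice(DIM, NONZERO, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], NONZERO)
    return v

def context_vectors(sentences, window=1):
    """Sum the index vectors of each word's neighbours within the window."""
    index, ctx = {}, {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx.setdefault(w, np.zeros(DIM))
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    index.setdefault(sent[j], index_vector())
                    ctx[w] += index[sent[j]]
    return ctx

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

if __name__ == "__main__":
    sents = [["headache", "and", "fever", "today"],
             ["took", "ibuprofen", "for", "fever"],
             ["mild", "nausea", "and", "headache"],
             ["took", "aspirin", "for", "nausea"]]
    ctx = context_vectors(sents)
    seeds = ["headache", "fever"]                    # known seed terms
    centroid = sum(ctx[s] for s in seeds) / len(seeds)
    candidates = [w for w in ctx if w not in seeds]
    ranked = sorted(candidates, key=lambda w: cosine(ctx[w], centroid), reverse=True)
    print(ranked[:3])                                # suggested new terms
```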

  • 10.
    Ahmady, Tobias
    et al.
    KTH, School of Technology and Health (STH), Medical Engineering, Computer and Electronic Engineering.
    Klein Rosmar, Sander
    KTH, School of Technology and Health (STH), Medical Engineering, Computer and Electronic Engineering.
    Translation of keywords between English and Swedish, 2014. Independent thesis Basic level (university diploma), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    In this project, we have investigated how to perform rule-based machine translation of sets of keywords between two languages. The goal was to translate an input set, which contains one or more keywords in a source language, to a corresponding set of keywords, with the same number of elements, in the target language. However, some words in the source language may have several senses and may be translated to several, or no, words in the target language. If ambiguous translations occur, the best translation of the keyword should be chosen with respect to the context. In traditional machine translation, a word's context is determined by a phrase or sentences where the word occurs. In this project, the set of keywords represents the context.

    By investigating traditional approaches to machine translation (MT), we designed and described models for the specific purpose of keyword translation. We have proposed a solution, based on direct translation, for translating keywords between English and Swedish. In the proposed solution, we also introduced a simple graph-based model for solving ambiguous translations.
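
The graph-based handling of ambiguous translations mentioned above can be illustrated, in a very reduced form, as follows: each ambiguous source keyword has several candidate translations, and the candidate most related to the other keywords' candidates is preferred. The bilingual dictionary and the relatedness scores below are invented placeholders, not the thesis resources.

```python
# Reduced sketch of context-based selection among ambiguous keyword
# translations: pick, for each keyword, the candidate translation that is
# most related to the other keywords' candidates. All data are placeholders.

BILINGUAL = {                     # English -> Swedish translation candidates
    "bank": ["bank", "strand"],   # financial institution vs. river bank
    "interest": ["ränta", "intresse"],
    "loan": ["lån"],
}

RELATED = {                       # toy symmetric relatedness scores
    frozenset({"bank", "ränta"}): 1.0,
    frozenset({"bank", "lån"}): 1.0,
    frozenset({"ränta", "lån"}): 1.0,
    frozenset({"strand", "intresse"}): 0.1,
}

def relatedness(a, b):
    return RELATED.get(frozenset({a, b}), 0.0)

def translate_keywords(keywords):
    result = {}
    for kw in keywords:
        others = [c for other in keywords if other != kw for c in BILINGUAL[other]]
        # Score each candidate by its total relatedness to the other
        # keywords' candidates and keep the best-scoring one.
        result[kw] = max(BILINGUAL[kw],
                         key=lambda cand: sum(relatedness(cand, o) for o in others))
    return result

if __name__ == "__main__":
    print(translate_keywords(["bank", "interest", "loan"]))
```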

  • 11.
    Ahrenberg, Lars
    Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
    A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts, 1998. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL'98) / [ed] Pierre Isabelle, Stroudsburg, PA, USA: The Association for Computational Linguistics, 1998, 29-35 p. Conference paper (Refereed)
  • 12.
    Ahrenberg, Lars
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering.
    Towards a research infrastructure for translation studies, 2014. Conference paper (Other academic)
    Abstract [en]

    In principle the CLARIN research infrastructure provides a good environment to support research on translation. In reality, the progress within CLARIN in this area seems to be fairly slow. In this paper I will give examples of the resources currently available, and suggest what is needed to achieve a relevant research infrastructure for translation studies. Also, I argue that translation studies has more to gain from language technology, and statistical machine translation in particular, than what is generally assumed, and give some examples.

  • 13.
    Ahrenberg, Lars
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering.
    Converting an English-Swedish Parallel Treebank to Universal Dependencies, 2015. In: Proceedings of the Third International Conference on Dependency Linguistics (DepLing 2015), Association for Computational Linguistics, 2015, 10-19 p., W15-2103. Conference paper (Refereed)
    Abstract [en]

    The paper reports experiences of automatically converting the dependency analysis of the LinES English-Swedish parallel treebank to universal dependencies (UD). The most tangible result is a version of the treebank that actually employs the relations and parts-of-speech categories required by UD, and no other. It is also more complete in that punctuation marks have received dependencies, which is not the case in the original version. We discuss our method in the light of problems that arise from the desire to keep the syntactic analyses of a parallel treebank internally consistent, while available monolingual UD treebanks for English and Swedish diverge somewhat in their use of UD annotations. Finally, we compare the output from the conversion program with the existing UD treebanks.
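
A conversion of the kind described above largely amounts to mapping the original dependency relations and part-of-speech tags onto the UD inventories and giving heads to previously unattached tokens such as punctuation. The label mappings and token format below are hypothetical illustrations, not the actual LinES conversion rules.

```python
# Toy illustration of relabelling a dependency analysis into UD-style
# categories and attaching punctuation. Mappings and data are hypothetical,
# not the LinES conversion program.

DEPREL_MAP = {"SS": "nsubj", "OO": "obj", "DT": "det", "IP": "punct", "root": "root"}
POS_MAP = {"NN": "NOUN", "VB": "VERB", "DT": "DET", "MAD": "PUNCT"}

def convert(tokens):
    """tokens: list of dicts with 'form', 'pos', 'head' (0 = root), 'deprel'."""
    converted = []
    for i, tok in enumerate(tokens, start=1):
        head = tok["head"]
        deprel = DEPREL_MAP.get(tok["deprel"], "dep")
        if deprel == "punct" and head == 0:
            head = i - 1  # attach unattached punctuation to the preceding token
        converted.append({"form": tok["form"],
                          "upos": POS_MAP.get(tok["pos"], "X"),
                          "head": head,
                          "deprel": deprel})
    return converted

if __name__ == "__main__":
    sent = [{"form": "Hunden", "pos": "NN", "head": 2, "deprel": "SS"},
            {"form": "sover", "pos": "VB", "head": 0, "deprel": "root"},
            {"form": ".", "pos": "MAD", "head": 0, "deprel": "IP"}]
    for row in convert(sent):
        print(row)
```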

  • 14.
    Ahrenberg, Lars
    Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
    Alignment-based profiling of Europarl data in an English-Swedish parallel corpus, 2010. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10) / [ed] Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias, Paris, France: European Language Resources Association (ELRA), 2010, 3398-3404 p. Conference paper (Refereed)
    Abstract [en]

    This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method for comparison, which is based on alignments, both at the token level and the structural level. Although two of the other subcorpora contain fiction, it is found that the Europarl part is the one having the highest proportion of many types of restructurings, including additions, deletions and long-distance reorderings. We explain this by the fact that the majority of Europarl segments are parallel translations.

  • 15.
    Ahrenberg, Lars and Merkel, Magnus and Ridings, Daniel and Sågvall Hein, Anna and Tiedemann, Jörg
    Uppsala University, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Faculty of Languages, Department of Linguistics.
    Automatic processing of parallel corpora: A Swedish perspective, 1999. Report (Other scientific)
    Abstract [en]

    As empirical methods have come to the fore in language technology and translation studies, the processing of parallel texts and parallel corpora has become a major issue. In this article we review the state of the art in alignment and data extraction techniques.

  • 16.
    Ahrenberg, Lars
    et al.
    Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
    Merkel, Magnus
    Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
    Correspondence measures for MT evaluation, 2000. In: Proceedings of the Second International Conference on Linguistic Resources and Evaluation (LREC-2000), Paris, France: European Language Resources Association (ELRA), 2000, 41-46 p. Conference paper (Refereed)
  • 17.
    Ahrenberg, Lars
    et al.
    Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
    Merkel, Magnus
    Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
    A knowledge-lite approach to word alignment, 2000. In: Parallel Text Processing: Alignment and Use of Translation Corpora / [ed] Jean Veronis, Dordrecht, The Netherlands: Kluwer Academic Publishers, 2000, 97-116 p. Chapter in book (Other academic)
    Abstract [en]

    The most promising approach to word alignment is to combine statistical methods with non-statistical information sources. Some of the proposed non-statistical sources, including bilingual dictionaries, POS-taggers and lemmatizers, rely on considerable linguistic knowledge, while other knowledge-lite sources such as cognate heuristics and word order heuristics can be implemented relatively easily. While knowledge-heavy sources might be expected to give better performance, knowledge-lite systems are easier to port to new language pairs and text types, and they can give sufficiently good results for many purposes, e.g. if the output is to be used by a human user for the creation of a complete word-aligned bitext. In this paper we describe the current status of the Linköping Word Aligner (LWA), which combines the use of statistical measures of co-occurrence with four knowledge-lite modules for (i) word categorization, (ii) morphological variation, (iii) word order, and (iv) phrase recognition. We demonstrate the portability of the system (from English-Swedish texts to French-English texts) and present results for these two language pairs. Finally, we report observations from an error analysis of system output, and identify the major strengths and weaknesses of the system.
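
The combination described above of a statistical co-occurrence measure with knowledge-lite heuristics can be sketched roughly as a weighted score per candidate word pair, here using the Dice coefficient for co-occurrence and a crude string-similarity ratio as a cognate heuristic. The weights, the similarity function and the toy bitext are assumptions, not the actual LWA configuration.

```python
# Rough sketch of scoring word-pair alignment candidates by combining a
# co-occurrence statistic (Dice) with a knowledge-lite cognate heuristic.
# Weights and data are hypothetical, not the Linköping Word Aligner setup.

from difflib import SequenceMatcher

def dice(src_word, trg_word, bitext):
    """Dice coefficient over sentence-pair co-occurrence counts."""
    src = sum(src_word in s for s, _ in bitext)
    trg = sum(trg_word in t for _, t in bitext)
    both = sum(src_word in s and trg_word in t for s, t in bitext)
    return 2 * both / (src + trg) if src + trg else 0.0

def cognate(src_word, trg_word):
    """Crude cognate heuristic based on string similarity."""
    return SequenceMatcher(None, src_word.lower(), trg_word.lower()).ratio()

def score(src_word, trg_word, bitext, w_stat=0.7, w_cog=0.3):
    return w_stat * dice(src_word, trg_word, bitext) + w_cog * cognate(src_word, trg_word)

if __name__ == "__main__":
    bitext = [(["the", "parliament", "decided"], ["parlamentet", "beslutade"]),
              (["the", "parliament", "opened"], ["parlamentet", "öppnade"]),
              (["the", "court", "decided"], ["domstolen", "beslutade"])]
    for trg in ["parlamentet", "beslutade", "domstolen"]:
        print("parliament ->", trg, round(score("parliament", trg, bitext), 3))
```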

  • 18. Ahrenberg, Lars
    et al.
    Merkel, Magnus
    Sågvall Hein, Anna
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Tiedemann, Jörg
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics.
    Evaluation of LWA and UWA, 1999. Report (Other academic)
  • 19.
    Ahrenberg, Lars
    et al.
    Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
    Merkel, Magnus
    Linköping University, Department of Computer and Information Science, NLPLAB - Natural Language Processing Laboratory. Linköping University, The Institute of Technology.
    Sågvall Hein, Anna
    Institutionen för lingvistik, Uppsala universitet.
    Tiedemann, Jörg
    Institutionen för lingvistik, Uppsala universitet.
    Evaluation of word alignment systems, 2000. In: Proceedings of the Second International Conference on Linguistic Resources and Evaluation (LREC-2000), Paris, France: European Language Resources Association (ELRA), 2000, 1255-1261 p. Conference paper (Refereed)
  • 20.
    Al Dakkak, O.
    et al.
    Higher Institute of Applied Science and Technology (HIAST).
    Ghneim, N.
    Higher Institute of Applied Science and Technology (HIAST).
    Abou Zliekha, M.
    Damascus University/Faculty of Information Technology.
    Al Moubayed, Samer
    Damascus University/Faculty of Information Technology.
    Emotional Inclusion in An Arabic Text-To-Speech, 2005. In: Proceedings of the 13th European Signal Processing Conference (EUSIPCO), Antalya, Turkey, 2005. Conference paper (Refereed)
    Abstract [en]

    The goal of this paper is to present an emotional audio-visual text-to-speech system for the Arabic language. The system is based on two entities: an emotional audio text-to-speech system, which generates speech depending on the input text and the desired emotion type, and an emotional visual model, which generates the talking head by forming the corresponding visemes. The phoneme-to-viseme mapping and the emotion shaping use a 3-parametric face model based on the Abstract Muscle Model. We have thirteen viseme models and five emotions as parameters to the face model. The TTS produces the phonemes corresponding to the input text and the speech with the suitable prosody to include the prescribed emotion. In parallel, the system generates the visemes and sends the controls to the facial model to obtain the animation of the talking head in real time.

  • 21.
    Al Dakkak, O.
    et al.
    HIAST, Damascus, Syria.
    Ghneim, N.
    HIAST, Damascus, Syria.
    Abou Zliekha, M.
    Damascus University.
    Al Moubayed, Samer
    Damascus University.
    Prosodic Feature Introduction and Emotion Incorporation in an Arabic TTS, 2006. In: Proceedings of IEEE International Conference on Information and Communication Technologies, Damascus, Syria, 2006, 1317-1322 p. Conference paper (Refereed)
    Abstract [en]

    Text-to-speech is a crucial part of many man-machine communication applications, such as phone booking and banking, vocal e-mail, and many others, in addition to applications for impaired persons, such as reading machines for the blind and talking machines for persons with speech difficulties. However, the main drawback of most speech synthesizers in such talking machines is their metallic sound. In order to sound natural, we have to incorporate prosodic features that are as close as possible to natural prosody; this helps to improve the quality of the synthetic speech. Current research worldwide is directed towards better "automatic prosody generation".

  • 22.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Prosodic Disambiguation in Spoken Systems Output, 2009. In: Proceedings of Diaholmia'09: 2009 Workshop on the Semantics and Pragmatics of Dialogue / [ed] Jens Edlund, Joakim Gustafson, Anna Hjalmarsson, Gabriel Skantze, Stockholm, Sweden, 2009, 131-132 p. Conference paper (Refereed)
    Abstract [en]

    This paper presents work on using prosody in the output of spoken dialogue systems to resolve possible structural ambiguity of output utterances. An algorithm is proposed to discover ambiguous parses of an utterance and to add prosodic disambiguation events to deliver the intended structure. By conducting a pilot experiment, the automatic prosodic grouping applied to ambiguous sentences shows the ability to deliver the intended interpretation of the sentences.

  • 23.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Towards rich multimodal behavior in spoken dialogues with embodied agents, 2013. In: 4th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2013 - Proceedings, IEEE Computer Society, 2013, 817-822 p. Conference paper (Refereed)
    Abstract [en]

    Spoken dialogue frameworks have traditionally been designed to handle a single stream of data - the speech signal. Research on human-human communication has provided extensive evidence of, and quantified the effects and importance of, a multitude of other multimodal nonverbal signals that people use in their communication and that shape and regulate their interaction. Driven by findings from multimodal human spoken interaction, and by advancements in capture devices, robotics and animation technologies, new possibilities are arising for the development of multimodal human-machine interaction that is more affective, social, and engaging. In such face-to-face interaction scenarios, dialogue systems can have a large set of signals at their disposal to infer context and to enhance and regulate the interaction through the generation of verbal and nonverbal facial signals. This paper summarizes several design decisions, and experiments that we have carried out in attempts to build rich and fluent multimodal interactive systems using a newly developed hybrid robotic head called Furhat, and discusses issues and challenges that this effort is facing.

  • 24.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A robotic head using projected animated faces, 2011. In: Proceedings of the International Conference on Audio-Visual Speech Processing 2011 / [ed] Salvi, G.; Beskow, J.; Engwall, O.; Al Moubayed, S., Stockholm: KTH Royal Institute of Technology, 2011, 71- p. Conference paper (Refereed)
    Abstract [en]

    This paper presents a setup which employs virtual animated agents for robotic heads. The system uses a laser projector to project animated faces onto a three-dimensional face mask. This approach of projecting animated faces onto a three-dimensional head surface, as an alternative to using flat, two-dimensional surfaces, eliminates several deteriorating effects and illusions that come with flat surfaces for interaction purposes, affecting for example exclusive mutual gaze and situated, multi-partner dialogues. In addition, it provides robotic heads with a flexible solution for facial animation which takes advantage of the advancements of facial animation using computer graphics over mechanically controlled heads.

  • 25.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Ananthakrishnan, Gopal
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Acoustic-to-Articulatory Inversion based on Local Regression, 2010. In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Japan, 2010, 937-940 p. Conference paper (Refereed)
    Abstract [en]

    This paper presents an acoustic-to-articulatory inversion method based on local regression. Two types of local regression, a non-parametric and a local linear regression, have been applied to a corpus containing simultaneous recordings of positions of articulators and the corresponding acoustics. A maximum likelihood trajectory smoothing using the estimated dynamics of the articulators is also applied to the regression estimates. The average root mean square error in estimating articulatory positions, given the acoustics, is 1.56 mm for the non-parametric regression and 1.52 mm for the local linear regression. The local linear regression is found to perform significantly better than regression using Gaussian Mixture Models using the same acoustic and articulatory features.
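
The non-parametric local regression mentioned above can be illustrated with a small k-nearest-neighbour sketch: the articulator positions for a new acoustic frame are estimated from the positions of the acoustically most similar training frames, and the estimate is evaluated with RMSE. The random toy data stand in for the acoustic and articulatory recordings, and the trajectory smoothing step is omitted.

```python
# Minimal k-nearest-neighbour (non-parametric local regression) sketch for
# acoustic-to-articulatory inversion, evaluated with RMSE. Random toy data
# replace the real recordings; the maximum likelihood smoothing is omitted.

import numpy as np

def knn_regress(train_x, train_y, query_x, k=5):
    """Predict articulator positions as the mean over the k nearest frames."""
    preds = []
    for q in query_x:
        dists = np.linalg.norm(train_x - q, axis=1)
        nearest = np.argsort(dists)[:k]
        preds.append(train_y[nearest].mean(axis=0))
    return np.array(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acoustics = rng.normal(size=(500, 12))   # 12 acoustic features per frame (toy)
    positions = acoustics[:, :4] @ rng.normal(size=(4, 6)) \
        + 0.1 * rng.normal(size=(500, 6))    # 6 articulator coordinates (toy)
    train_x, test_x = acoustics[:400], acoustics[400:]
    train_y, test_y = positions[:400], positions[400:]
    pred = knn_regress(train_x, train_y, test_x, k=5)
    print("RMSE:", round(float(np.sqrt(np.mean((pred - test_y) ** 2))), 3))
```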

  • 26.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Ananthakrishnan, Gopal
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Enflo, Laura
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Music Acoustics. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Automatic Prominence Classification in Swedish, 2010. In: Proceedings of Speech Prosody 2010, Workshop on Prosodic Prominence, Chicago, USA, 2010. Conference paper (Refereed)
    Abstract [en]

    This study aims at automatically classifying levels of acoustic prominence on a dataset of 200 Swedish sentences of read speech by one male native speaker. Each word in the sentences was categorized by four speech experts into one of three groups depending on the level of prominence perceived. Six acoustic features at the syllable level and seven features at the word level were used. Two machine learning algorithms, namely Support Vector Machines (SVM) and Memory-Based Learning (MBL), were trained to classify the sentences into their respective classes. The MBL gave an average word-level accuracy of 69.08% and the SVM gave an average accuracy of 65.17% on the test set. These values were comparable with the average accuracy of the human annotators with respect to the average annotations. In this study, word duration was found to be the most important feature required for classifying prominence in Swedish read speech.
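
A classifier setup of the kind described above (a handful of acoustic features per word, three prominence levels, an SVM) could be sketched as below; the synthetic features and labels, and the use of scikit-learn, are assumptions for illustration, not the dataset or tooling of the study.

```python
# Sketch of a word-level prominence classifier in the spirit of the
# abstract: per-word acoustic features, three prominence levels, an SVM.
# Synthetic data and scikit-learn are stand-ins, not the study's setup.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_words = 400
features = rng.normal(size=(n_words, 7))   # e.g. duration, mean F0, F0 range, energy, ...
# Make the first feature ("duration") informative, mirroring the finding
# that word duration mattered most for Swedish read speech.
labels = np.digitize(features[:, 0] + 0.3 * rng.normal(size=n_words), [-0.5, 0.5])

x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(x_train, y_train)
print("word-level accuracy:", round(clf.score(x_test, y_test), 3))
```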

  • 27.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Effects of Visual Prominence Cues on Speech Intelligibility, 2009. In: Proceedings of Auditory-Visual Speech Processing AVSP'09, Norwich, England, 2009. Conference paper (Refereed)
    Abstract [en]

    This study reports experimental results on the effect of visual prominence, presented as gestures, on speech intelligibility. 30 acoustically vocoded sentences, permuted into different gestural conditions, were presented audio-visually to 12 subjects. The analysis of correct word recognition shows a significant increase in intelligibility when focally-accented (prominent) words are supplemented with head-nods or with eyebrow-raise gestures. The paper also examines coupling other acoustic phenomena to brow-raise gestures. As a result, the paper introduces new evidence on the ability of non-verbal movements in the visual modality to support audio-visual speech perception.

  • 28.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Perception of Nonverbal Gestures of Prominence in Visual Speech Animation, 2010. In: Proceedings of the ACM/SSPNET 2nd International Symposium on Facial Analysis and Animation, Edinburgh, UK, 2010, 25- p. Conference paper (Refereed)
    Abstract [en]

    It has long been recognized that visual speech information is important for speech perception [McGurk and MacDonald 1976] [Summerfield 1992]. Recently there has been an increasing interest in the verbal and non-verbal interaction between the visual and the acoustic modalities from production and perception perspectives. One of the prosodic phenomena which attracts much focus is prominence. Prominence is defined as when a linguistic segment is made salient in its context.

  • 29.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A novel Skype interface using SynFace for virtual speech reading support, 2011. In: Proceedings from Fonetik 2011, June 8 - June 10, 2011: Speech, Music and Hearing, Quarterly Progress and Status Report, TMH-QPSR, Volume 51, Stockholm, Sweden, 2011, 33-36 p. Conference paper (Other academic)
    Abstract [en]

    We describe in this paper a support client interface to the IP telephony application Skype. The system uses a variant of SynFace, a real-time speech reading support system using facial animation. The new interface is designed for use by elderly persons, and tailored for use in systems supporting touch screens. The SynFace real-time facial animation system has previously shown the ability to enhance speech comprehension for hearing impaired persons. In this study we employ at-home field studies on five subjects in the EU project MonAMI. We present insights from interviews with the test subjects on the advantages of the system, and on the limitations of such real-time speech reading technology in reaching the homes of the elderly and the hard of hearing.

  • 30.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Prominence Detection in Swedish Using Syllable Correlates, 2010. In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Japan, 2010, 1784-1787 p. Conference paper (Refereed)
    Abstract [en]

    This paper presents an approach to estimating word-level prominence in Swedish using syllable-level features. The paper discusses the mismatch problem of annotations between word-level perceptual prominence and its acoustic correlates, context, and data scarcity. 200 sentences are annotated by 4 speech experts with prominence on 3 levels. A linear model for feature extraction is proposed over syllable-level features, and the weights of these features are optimized to match word-level annotations. We show that using syllable-level features and estimating weights for the acoustic correlates to minimize the word-level estimation error gives better detection accuracy compared to word-level features, and that both feature sets exceed the baseline accuracy.
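
The linear model mentioned above, in which weights over syllable-level features are optimized so that the combined score matches word-level prominence annotations, can be sketched as an ordinary least-squares fit. Pooling a word's syllables by taking the maximum and the synthetic data are simplifying assumptions, not the paper's exact formulation.

```python
# Sketch of fitting weights for syllable-level features so that a linear
# combination matches word-level prominence annotations (least squares).
# Max-pooling over syllables and the synthetic data are simplifications.

import numpy as np

rng = np.random.default_rng(0)
n_words, n_feats = 200, 6
# Each word has 1-3 syllables, each syllable a feature vector (toy data).
words = [rng.normal(size=(rng.integers(1, 4), n_feats)) for _ in range(n_words)]
hidden_w = rng.normal(size=n_feats)
# Word-level prominence annotations, here generated from a hidden weighting.
annotations = np.array([(syls @ hidden_w).max() for syls in words])

# Pool each word's syllable features, then solve for the feature weights
# that minimize the word-level estimation error.
pooled = np.array([syls.max(axis=0) for syls in words])
weights, *_ = np.linalg.lstsq(pooled, annotations, rcond=None)

predicted = pooled @ weights
rmse = float(np.sqrt(np.mean((predicted - annotations) ** 2)))
print("word-level RMSE:", round(rmse, 3))
```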

  • 31.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Blomberg, Mats
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Mirning, N.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Talking with Furhat - multi-party interaction with a back-projected robot head, 2012. In: Proceedings of Fonetik 2012, Gothenburg, Sweden, 2012, 109-112 p. Conference paper (Other academic)
    Abstract [en]

    This is a condensed presentation of some recent work on a back-projected robotic head for multi-party interaction in public settings. We describe some of the design strategies and give a preliminary analysis of an interaction database collected at the Robotville exhibition at the London Science Museum.

  • 32.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Animated Faces for Robotic Heads: Gaze and Beyond, 2011. In: Analysis of Verbal and Nonverbal Communication and Enactment: The Processing Issues / [ed] Anna Esposito, Alessandro Vinciarelli, Klára Vicsi, Catherine Pelachaud and Anton Nijholt, Springer Berlin/Heidelberg, 2011, 19-35 p. Conference paper (Refereed)
    Abstract [en]

    We introduce an approach to using animated faces for robotics where a static physical object is used as a projection surface for an animation. The talking head is projected onto a 3D physical head model. In this chapter we discuss the different benefits this approach adds over mechanical heads. After that, we investigate a phenomenon commonly referred to as the Mona Lisa gaze effect. This effect results from the use of 2D surfaces to display 3D images and causes the gaze of a portrait to seemingly follow the observer no matter where it is viewed from. The experiment investigates the perception of gaze direction by observers. The analysis shows that the 3D model eliminates the effect, and provides an accurate perception of gaze direction. We discuss at the end the different requirements of gaze in interactive systems, and explore the different settings these findings give access to.

  • 33.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Auditory visual prominence: From intelligibility to behavior, 2009. In: Journal on Multimodal User Interfaces, ISSN 1783-7677, E-ISSN 1783-8738, Vol. 3, no 4, 299-309 p. Article in journal (Refereed)
    Abstract [en]

    Auditory prominence is defined as when an acoustic segment is made salient in its context. Prominence is one of the prosodic functions that has been shown to be strongly correlated with facial movements. In this work, we investigate the effects of facial prominence cues, in terms of gestures, when synthesized on animated talking heads. In the first study, a speech intelligibility experiment is conducted: speech quality is acoustically degraded and the fundamental frequency is removed from the signal, and the speech is then presented to 12 subjects through a lip-synchronized talking head carrying head-nod and eyebrow-raise gestures, which are synchronized with the auditory prominence. The experiment shows that presenting prominence as facial gestures significantly increases speech intelligibility compared to when these gestures are randomly added to speech. We also present a follow-up study examining the perception of the behavior of the talking heads when gestures are added over pitch accents. Using eye-gaze tracking technology and questionnaires with 10 moderately hearing impaired subjects, the gaze data show that users look at the face in a similar fashion to when they look at a natural face when gestures are coupled with pitch accents, as opposed to when the face carries no gestures. The questionnaires also show that these gestures significantly increase the naturalness and the understanding of the talking head.

  • 34.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Mirning, Nicole
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tscheligi, Manfred
    Furhat goes to Robotville: a large-scale multiparty human-robot interaction data collection in a public space, 2012. In: Proc of LREC Workshop on Multimodal Corpora, Istanbul, Turkey, 2012. Conference paper (Refereed)
    Abstract [en]

    In the four days of the Robotville exhibition at the London Science Museum, UK, during which the back-projected head Furhat in a situated spoken dialogue system was seen by almost 8 000 visitors, we collected a database of 10 000 utterances spoken to Furhat in situated interaction. The data collection is an example of a particular kind of corpus collection of human-machine dialogues in public spaces that has several interesting and specific characteristics, both with respect to the technical details of the collection and with respect to the resulting corpus contents. In this paper, we take the Furhat data collection as a starting point for a discussion of the motives for this type of data collection, its technical peculiarities and prerequisites, and the characteristics of the resulting corpus.

  • 35.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    SynFace Phone Recognizer for Swedish Wideband and Narrowband Speech, 2008. In: Proceedings of The Second Swedish Language Technology Conference (SLTC), Stockholm, Sweden, 2008, 3-6 p. Conference paper (Other academic)
    Abstract [en]

    In this paper, we present new results and comparisons of the real-time lip-synchronized talking head SynFace on different Swedish databases and bandwidths. The work involves training SynFace on narrow-band telephone speech from the Swedish SpeechDat, and on the narrow-band and wide-band Speecon corpus. Auditory perceptual tests are being established for SynFace as an audio-visual hearing support for the hearing impaired. Preliminary results show high recognition accuracy compared to other languages.

  • 36.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    The Furhat Social Companion Talking Head, 2013. In: Interspeech 2013 - Show and Tell, 2013, 747-749 p. Conference paper (Refereed)
    Abstract [en]

    In this demonstrator we present the Furhat robot head. Furhat is a highly human-like robot head in terms of dynamics, thanks to its use of back-projected facial animation. Furhat also takes advantage of a complex and advanced dialogue toolkit designed to facilitate rich and fluent multimodal, multiparty, situated spoken human-machine dialogue. The demonstrator will present a social dialogue system with Furhat that allows for several simultaneous interlocutors, takes advantage of several verbal and nonverbal input signals such as speech input, real-time multi-face tracking, and facial analysis, and communicates with its users in a mixed-initiative dialogue, using state-of-the-art speech synthesis with rich prosody, lip-animated facial synthesis, eye and head movements, and gestures.

  • 37.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Spontaneous spoken dialogues with the Furhat human-like robot head, 2014. In: HRI '14 Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction, Bielefeld, Germany, 2014, 326- p. Conference paper (Refereed)
    Abstract [en]

    We will show in this demonstrator an advanced multimodal and multiparty spoken conversational system using Furhat, a robot head based on projected facial animation. Furhat is an anthropomorphic robot head that utilizes facial animation for physical robot heads using back-projection. In the system, multimodality is enabled using speech and rich visual input signals such as multi-person real-time face tracking and microphone tracking. The demonstrator will showcase a system that is able to carry out social dialogue with multiple interlocutors simultaneously, with rich output signals such as eye and head coordination, lip-synchronized speech synthesis, and non-verbal facial gestures used to regulate fluent and expressive multiparty conversations. The dialogue design is performed using the IrisTK [4] dialogue authoring toolkit developed at KTH. The system is also able to act as a moderator in a quiz game, showing different strategies for regulating situated spoken interactions.

  • 38.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Öster, Anne-Marie
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Salvi, Giampiero
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    van Son, Nic
    Viataal, Nijmegen, The Netherlands.
    Ormel, Ellen
    Viataal, Nijmegen, The Netherlands.
    Herzke, Tobias
    HörTech gGmbH, Germany.
    Studies on Using the SynFace Talking Head for the Hearing Impaired, 2009. In: Proceedings of Fonetik'09: The XXIIth Swedish Phonetics Conference, June 10-12, 2009 / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, 140-143 p. Conference paper (Other academic)
    Abstract [en]

    SynFace is a lip-synchronized talking agent which is optimized as a visual reading support for the hearing impaired. In this paper we present the large-scale hearing impaired user studies carried out for three languages in the Hearing at Home project. The user tests focus on measuring the gain in Speech Reception Threshold in Noise and the effort scaling when SynFace is used by hearing impaired people, where groups of hearing impaired subjects with different impairment levels, from mild to severe and cochlear implants, are tested. Preliminary analysis of the results does not show a significant gain in SRT or in effort scaling. But looking at the large cross-subject variability in both tests, it is clear that many subjects benefit from SynFace, especially with speech in stereo babble.

  • 39.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Taming Mona Lisa: communicating gaze faithfully in 2D and 3D facial projections, 2012. In: ACM Transactions on Interactive Intelligent Systems, ISSN 2160-6455, Vol. 1, no 2, 25- p., article 11. Article in journal (Refereed)
    Abstract [en]

    The perception of gaze plays a crucial role in human-human interaction. Gaze has been shown to matter for a number of aspects of communication and dialogue, especially for managing the flow of the dialogue and participant attention, for deictic referencing, and for the communication of attitude. When developing embodied conversational agents (ECAs) and talking heads, modeling and delivering accurate gaze targets is crucial. Traditionally, systems communicating through talking heads have been displayed to the human conversant using 2D displays, such as flat monitors. This approach introduces severe limitations for an accurate communication of gaze since 2D displays are associated with several powerful effects and illusions, most importantly the Mona Lisa gaze effect, where the gaze of the projected head appears to follow the observer regardless of viewing angle. We describe the Mona Lisa gaze effect and its consequences in the interaction loop, and propose a new approach for displaying talking heads using a 3D projection surface (a physical model of a human head) as an alternative to the traditional flat surface projection. We investigate and compare the accuracy of the perception of gaze direction and the Mona Lisa gaze effect in 2D and 3D projection surfaces in a five-subject gaze perception experiment. The experiment confirms that a 3D projection surface completely eliminates the Mona Lisa gaze effect and delivers very accurate gaze direction that is independent of the observer's viewing angle. Based on the data collected in this experiment, we rephrase the formulation of the Mona Lisa gaze effect. The data, when reinterpreted, confirms the predictions of the new model for both 2D and 3D projection surfaces. Finally, we discuss the requirements on different spatially interactive systems in terms of gaze direction, and propose new applications and experiments for interaction in human-ECA and human-robot settings made possible by this technology.

  • 40.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Analysis of gaze and speech patterns in three-party quiz game interaction, 2013. In: Interspeech 2013, 2013, 1126-1130 p. Conference paper (Refereed)
    Abstract [en]

    In order to understand and model the dynamics between interaction phenomena such as gaze and speech in face-to-face multiparty interaction between humans, we need large quantities of reliable, objective data of such interactions. To date, this type of data is in short supply. We present a data collection setup using automated, objective techniques in which we capture the gaze and speech patterns of triads deeply engaged in a high-stakes quiz game. The resulting corpus consists of five one-hour recordings, and is unique in that it makes use of three state-of-the-art gaze trackers (one per subject) in combination with a state-of-the-art conical microphone array designed to capture roundtable meetings. Several video channels are also included. In this paper we present the obstacles we encountered and the possibilities afforded by a synchronised, reliable combination of large-scale multi-party speech and gaze data, and an overview of the first analyses of the data. Index Terms: multimodal corpus, multiparty dialogue, gaze patterns, multiparty gaze.

  • 41.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Effects of 2D and 3D Displays on Turn-taking Behavior in Multiparty Human-Computer Dialog2011In: SemDial 2011: Proceedings of the 15th Workshop on the Semantics and Pragmatics of Dialogue / [ed] Ron Artstein, Mark Core, David DeVault, Kallirroi Georgila, Elsi Kaiser, Amanda Stent, Los Angeles, CA, 2011, 192-193 p.Conference paper (Refereed)
    Abstract [en]

    The perception of gaze from an animated agent on a 2D display has been shown to suffer from the Mona Lisa effect, which means that exclusive mutual gaze cannot be established if there is more than one observer. In this study, we investigate this effect when it comes to turn-taking control in a multi-party human-computer dialog setting, where a 2D display is compared to a 3D projection. The results show that the 2D setting results in longer response times and lower turn-taking accuracy.

  • 42.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Turn-taking Control Using Gaze in Multiparty Human-Computer Dialogue: Effects of 2D and 3D Displays2011In: Proceedings of the International Conference on Audio-Visual Speech Processing 2011, Stockholm: KTH Royal Institute of Technology, 2011, 99-102 p.Conference paper (Refereed)
    Abstract [en]

    In a previous experiment we found that the perception of gaze from an animated agent on a two-dimensional display suffers from the Mona Lisa effect, which means that exclusive mutual gaze cannot be established if there is more than one observer. By using a three-dimensional projection surface, this effect can be eliminated. In this study, we investigate whether this difference also holds for the turn-taking behaviour of subjects interacting with the animated agent in a multi-party dialogue. We present a Wizard-of-Oz experiment where five subjects talk to an animated agent in a route direction dialogue. The results show that the subjects to some extent can infer the intended target of the agent’s questions, in spite of the Mona Lisa effect, but that the accuracy of gaze when it comes to selecting an addressee is still significantly lower in the 2D condition, as compared to the 3D condition. The response time is also significantly longer in the 2D condition, indicating that the inference of intended gaze may require additional cognitive effort.

  • 43.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Lip-reading: Furhat audio visual intelligibility of a back projected animated face2012In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Berlin/Heidelberg, 2012, 196-203 p.Conference paper (Refereed)
    Abstract [en]

    Back projecting a computer animated face onto a three-dimensional static physical model of a face is a promising technology that is gaining ground as a solution to building situated, flexible and human-like robot heads. In this paper, we first briefly describe Furhat, a back projected robot head built for the purpose of multimodal multiparty human-machine interaction, and its benefits over virtual characters and robotic heads; and then motivate the need to investigate the contribution to speech intelligibility that Furhat's face offers. We present an audio-visual speech intelligibility experiment, in which 10 subjects listened to short sentences with a degraded speech signal. The experiment compares the gain in intelligibility between lip reading a face visualized on a 2D screen and a 3D back-projected face, from different viewing angles. The results show that the audio-visual speech intelligibility holds when the avatar is projected onto a static face model (in the case of Furhat), and even, rather surprisingly, exceeds it. This means that despite the movement limitations that back projected animated face models bring about, their audio-visual speech intelligibility is equal to, or even higher than, that of the same models shown on flat displays. At the end of the paper we discuss several hypotheses on how to interpret the results, and motivate future investigations to better explore the characteristics of visual speech perception in 3D projected faces.

  • 44.
    Albertsson, Sarah
    et al.
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. SICS East Swedish ICT AB, Linköping, Sweden.
    Rennes, Evelina
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering. SICS East Swedish ICT AB, Linköping, Sweden.
    Jönsson, Arne
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Arts and Sciences. SICS East Swedish ICT AB, Linköping, Sweden.
    Similarity-Based Alignment of Monolingual Corpora for Text Simplification2016In: CL4LC 2016 - Computational Linguistics for Linguistic Complexity: Proceedings of the Workshop, 2016, 154-163 p.Conference paper (Refereed)
    Abstract [en]

    Comparable or parallel corpora are beneficial for many NLP tasks. The automatic collection of corpora enables large-scale resources, even for less-resourced languages, which in turn can be useful for deducing rules and patterns for text rewriting algorithms, a subtask of automatic text simplification. We present two methods for the alignment of Swedish easy-to-read text segments to text segments from a reference corpus. The first method (M1) was originally developed for the task of text reuse detection, measuring sentence similarity by a modified version of a TF-IDF vector space model. A second method (M2), also accounting for part-of-speech tags, was developed, and the methods were compared. For evaluation, a crowdsourcing platform was built for human judgement data collection, and preliminary results showed that cosine similarity relates better to human ranks than the Dice coefficient. We also saw a tendency that including syntactic context in the TF-IDF vector space model is beneficial for this kind of paraphrase alignment task.
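
    A minimal sketch (not the authors' implementation) of the two similarity measures compared in the abstract above: cosine similarity over TF-IDF-weighted bags of words versus the Dice coefficient over shared word types. The whitespace tokenization, the smoothed IDF weighting, and the example sentences are simplifying assumptions for illustration only.

import math
from collections import Counter

def tfidf_vectors(sentences):
    """Build simple smoothed TF-IDF vectors, one dict per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def dice(u, v):
    """Dice coefficient over the word types of the two sentences."""
    a, b = set(u), set(v)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0

# Toy usage: rank candidate reference sentences against one easy-to-read sentence.
easy = "the cat sat on the mat"
candidates = ["a cat sat quietly on a mat", "the stock market fell sharply"]
vecs = tfidf_vectors([easy] + candidates)
for sent, vec in zip(candidates, vecs[1:]):
    print(f"{sent!r}: cosine={cosine(vecs[0], vec):.2f}, dice={dice(vecs[0], vec):.2f}")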

  • 45.
    Alemu Argaw, Atelach
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Asker, Lars
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
    Cöster, Rickard
    SICS.
    Karlgren, Jussi
    SICS.
    Sahlgren, Magnus
    SICS.
    Dictionary-based Amharic-French information retrieval2006In: Accessing multilingual information repositories: 6th workshop of the Cross-Language Evalution Forum, CLEF 2005, Vienna, Austria, 21-23 September, 2005, revised selected papers / [ed] Carol Peters, Fredric C. Gey, Julio Gonzalo, Henning Müller, Gareth J. F. Jones, Michael kluck, Bernardo Magnini, Maarten de Rijke, Berlin: Springer Berlin/Heidelberg, 2006, 83-92 p.Conference paper (Other academic)
    Abstract [en]

    We present four approaches to the Amharic-French bilingual track at CLEF 2005. All experiments use a dictionary-based approach to translate the Amharic queries into French bags-of-words, but while one approach uses word sense discrimination on the translated side of the queries, the other includes all senses of a translated word in the query for searching. We used two search engines: the SICS experimental engine and Lucene, hence four runs with the two approaches. Non-content-bearing words were removed both before and after the dictionary lookup. TF/IDF values supplemented by a heuristic function were used to remove the stop words from the Amharic queries, and two French stopword lists were used to remove them from the French translations. In our experiments, we found that the SICS search engine performs better than Lucene, and that using the word sense discriminated keywords produces a slightly better result than the full set of non-discriminated keywords.
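
    The following is an illustrative sketch, under stated assumptions, of the dictionary-based query translation described above: stop words are removed, each remaining query term is looked up in a bilingual dictionary, and the resulting French bag of words either keeps all senses or only one per term. The toy dictionary, the stop-word lists, and the first-sense heuristic are placeholders, not the resources or the sense discrimination method used in the paper.

# Toy bilingual dictionary: source term -> list of candidate French senses.
DICTIONARY = {
    "election": ["élection", "scrutin"],
    "result": ["résultat"],
}
STOPWORDS_SRC = {"the", "of"}        # placeholder source-language stop list
STOPWORDS_TGT = {"le", "la", "de"}   # placeholder French stop list

def translate_query(terms, keep_all_senses=True):
    """Translate a query term list into a French bag of words.

    keep_all_senses=True  -> include every dictionary sense of each term.
    keep_all_senses=False -> crude stand-in for sense discrimination:
                             keep only the first listed sense per term.
    """
    bag = []
    for term in terms:
        if term in STOPWORDS_SRC:
            continue                      # stop-word removal before lookup
        senses = DICTIONARY.get(term, [])
        chosen = senses if keep_all_senses else senses[:1]
        bag.extend(w for w in chosen if w not in STOPWORDS_TGT)
    return bag

print(translate_query(["the", "election", "result"], keep_all_senses=False))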

  • 46. Alemu, Atelach
    et al.
    Hulth, Anette
    Megyesi, Beata
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Datorlingvistik.
    General-Purpose Text Categorization Applied to the Medical Domain.2007Report (Other academic)
    Abstract [en]

    This paper presents work where a general-purpose text categorization method was applied to categorize medical free-texts. The purpose of the experiments was to examine how such a method performs without any domain-specific knowledge, hand-crafting or tuning. Additionally, we compare the results from the general-purpose method with results from runs in which a medical thesaurus as well as automatically extracted keywords were used when building the classifiers. We show that standard text categorization techniques using stemmed unigrams as the basis for learning can be applied directly to categorize medical reports, yielding an F-measure of 83.9, and outperforming the more sophisticated methods.
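
    As a rough sketch of the kind of general-purpose pipeline the abstract describes, the snippet below builds a classifier on stemmed unigrams. The choice of stemmer (Snowball, English) and learner (a linear SVM) are assumptions made for illustration; they are not necessarily the components the authors used.

from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

stemmer = SnowballStemmer("english")
base_tokenizer = CountVectorizer().build_tokenizer()

def stemmed_unigrams(text):
    """Tokenize a document and stem each token, yielding unigram features."""
    return [stemmer.stem(tok) for tok in base_tokenizer(text)]

# Stemmed unigram counts feeding a standard linear classifier.
classifier = Pipeline([
    ("unigrams", CountVectorizer(analyzer=stemmed_unigrams)),
    ("svm", LinearSVC()),
])

# Usage (with your own labelled reports):
# classifier.fit(train_texts, train_labels)
# predicted = classifier.predict(test_texts)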

  • 47.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Can Anybody Read Me? Motion Capture Recordings for an Adaptable Visual Speech Synthesizer2012In: In proceedings of The Listening Talker, Edinburgh, UK., 2012, 52-52 p.Conference paper (Refereed)
  • 48.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Animated Lombard speech: Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions2014In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 28, no 2, 607-618 p.Article in journal (Refereed)
    Abstract [en]

    In this paper we study the production and perception of speech in diverse conditions for the purposes of accurate, flexible and highly intelligible talking face animation. We recorded audio, video and facial motion capture data of a talker uttering a set of 180 short sentences, under three conditions: normal speech (in quiet), Lombard speech (in noise), and whispering. We then produced an animated 3D avatar with similar shape and appearance as the original talker and used an error minimization procedure to drive the animated version of the talker in a way that matched the original performance as closely as possible. In a perceptual intelligibility study with degraded audio we then compared the animated talker against the real talker and the audio alone, in terms of audio-visual word recognition rate across the three different production conditions. We found that the visual intelligibility of the animated talker was on par with the real talker for the Lombard and whisper conditions. In addition we created two incongruent conditions where normal speech audio was paired with animated Lombard speech or whispering. When compared to the congruent normal speech condition, Lombard animation yields a significant increase in intelligibility, despite the AV-incongruence. In a separate evaluation, we gathered subjective opinions on the different animations, and found that some degree of incongruence was generally accepted.

  • 49.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Towards Fully Automated Motion Capture of Signs -- Development and Evaluation of a Key Word Signing Avatar2015In: ACM Transactions on Accessible Computing, ISSN 1936-7228, Vol. 7, no 2, 7:1-7:17 p.Article in journal (Refereed)
    Abstract [en]

    Motion capture of signs provides unique challenges in the field of multimodal data collection. The dense packaging of visual information requires high fidelity and high bandwidth of the captured data. Even though marker-based optical motion capture provides many desirable features such as high accuracy, global fitting, and the ability to record body and face simultaneously, it is not widely used to record finger motion, especially not for articulated and syntactic motion such as signs. Instead, most signing avatar projects use costly instrumented gloves, which require long calibration procedures. In this article, we evaluate the data quality obtained from optical motion capture of isolated signs from Swedish sign language with a large number of low-cost cameras. We also present a novel dual-sensor approach to combine the data with low-cost, five-sensor instrumented gloves to provide a recording method with low manual postprocessing. Finally, we evaluate the collected data and the dual-sensor approach as transferred to a highly stylized avatar. The application of the avatar is a game-based environment for training Key Word Signing (KWS) as augmented and alternative communication (AAC), intended for children with communication disabilities.

  • 50.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Aspects of co-occurring syllables and head nods in spontaneous dialogue2013In: Proceedings of 12th International Conference on Auditory-Visual Speech Processing (AVSP2013), 2013, 169-172 p.Conference paper (Refereed)
    Abstract [en]

    This paper reports on the extraction and analysis of head nods taken from motion capture data of spontaneous dialogue in Swedish. The head nods were extracted automatically and then manually classified in terms of gestures having a beat function or multifunctional gestures. Prosodic features were extracted from syllables co-occurring with the beat gestures. While the peak rotation of the nod is on average aligned with the stressed syllable, the results show considerable variation in fine temporal synchronization. The syllables co-occurring with the gestures generally show greater intensity, higher F0, and greater F0 range when compared to the mean across the entire dialogue. A functional analysis shows that the majority of the syllables belong to words bearing a focal accent.
