Approaches to gathering realistic training data for speech translation systems
Telia Research AB, Haninge, SWEDEN. ORCID iD: 0000-0003-3734-0757
Telia Research AB, Haninge, SWEDEN.
Telia Research AB, Haninge, SWEDEN.
1996 (English). In: Proceedings of Third IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1996, Institute of Electrical and Electronics Engineers (IEEE), 1996, p. 97-100. Conference paper, Published paper (Refereed)
Abstract [en]

The Spoken Language Translator (SLT) is a multi-lingual speech-to-speech translation prototype supporting English, Swedish and French within the air traffic information system (ATIS) domain. The design of SLT is characterized by a strongly corpus-driven approach, which accentuates the need for cost-efficient collection procedures to obtain training data. This paper discusses various approaches to the data collection issue pursued within a speech translation framework. Original American English speech and language data have been collected using traditional Wizard-of-Oz (WOZ) techniques, a relatively costly procedure yielding high-quality results. The resulting corpus has been translated textually into Swedish by a large number of native speakers (427) and used as prompts for training the target language speech model. This "budget" collection method is compared to the accepted method, i.e., gathering data by means of a full-blown WOZ simulation. The results indicate that although translation in this case proved economical and produced considerable data, the method is not sensitive to certain features typical of spoken language, for which WOZ is superior.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 1996. p. 97-100
Keywords [en]
air traffic control; language translation; natural language interfaces; speech recognition; English; French; Spoken Language Translator; Swedish; Wizard-of-Oz techniques; air traffic information system; corpus-driven approach; cost-efficient collection procedures; data collection; multi-lingual speech-to-speech translation prototype; realistic training data gathering; speech translation systems; Costs; Frequency; Humans; Information systems; Large-scale systems; Natural languages; Prototypes; Speech recognition; Training data; Vocabulary
National Category
General Language Studies and Linguistics; Specific Languages; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:liu:diva-135316
DOI: 10.1109/IVTTA.1996.552770
ISBN: 0780332385 (print)
OAI: oai:DiVA.org:liu-135316
DiVA, id: diva2:1080633
Conference
Third IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 30 Sept.-1 Oct. 1996, Basking Ridge, NJ, USA
Available from: 2017-03-10 Created: 2017-03-10 Last updated: 2018-01-13 Bibliographically approved

By author/editor
Bretan, Ivan; Eklund, Robert