Using computational approaches to integrate endangered language legacy data into documentation corpora: Past experiences and challenges ahead
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Modern Languages.
Institute for the Languages of Finland.
University of Freiburg.
University of Freiburg.
2019 (English). In: Proceedings of the Workshop on Computational Methods for Endangered Languages: Honolulu, Hawai’i, USA, February 26–27, 2019 / [ed] Arppe, Antti et al., Boulder, Colorado: University of Colorado, 2019, Vol. 2, p. 24-30, article id 5. Conference paper, Published paper (Refereed)
Abstract [en]

The systematic integration of pre-digital published transcriptions of legacy language materials offers many possibilities to enrich documentary corpora with data that is often very comparable to contemporary collections and often originates from the same speech communities that researchers currently work with. Recent advances in text recognition technologies, in particular, make the reuse of old materials a very attractive and accessible task. However, the output of text recognition needs to be connected to further parts of the pipeline, namely forced alignment and speech recognition. The workflows discussed here attempt to reach a maximally useful situation where legacy data is transformed into a usable and comparable format, but not yet transformed into a time-aligned corpus.

Place, publisher, year, edition, pages
Boulder, Colorado: University of Colorado, 2019. Vol. 2, p. 24-30, article id 5
Keywords [en]
Zyrian Komi, endangered languages, computational linguistics, documentary linguistics
National Category
Specific Languages
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-378412
OAI: oai:DiVA.org:uu-378412
DiVA, id: diva2:1293959
Conference
Workshop on Computational Methods for Endangered Languages, Honolulu, Hawai’i, USA, February 26–27, 2019
Available from: 2019-03-06 Created: 2019-03-06 Last updated: 2019-03-06 Bibliographically approved

Open Access in DiVA

fulltext (207 kB), 24 downloads
File information
File name: FULLTEXT01.pdf
File size: 207 kB
Checksum (SHA-512): 1e7aa73cd3058766e95b243bfb868b064c263353d90fcc73e38dcb05816f1b0a770288de3aa611ae997c2cfa503ec44c8f376c25ccd8d03a3fa640a5d1705926
Type: fulltext
Mimetype: application/pdf

Other links

https://scholar.colorado.edu/scil-cmel/vol2/iss1/5

Search in DiVA

By author/editor
Blokland, Rogier
By organisation
Department of Modern Languages
Specific Languages
