What Do Language Representations Really Represent?
Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics. ORCID iD: 0000-0002-6027-4156
2019 (English). In: Computational Linguistics - Association for Computational Linguistics (Print), ISSN 0891-2017, E-ISSN 1530-9312, Vol. 45, no. 2, pp. 381-389. Article in journal (Refereed). Published.
Abstract [en]

A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English: the model picks up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on the one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, whereas genetic relationships (a convenient benchmark used for evaluation in previous work) appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.
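
As a rough illustration of the kind of analysis described above (a sketch only, not the paper's actual pipeline), the Python snippet below compares pairwise similarities between language representations with an external structural-similarity matrix using rank correlation. The language codes, embedding vectors, and structural-similarity values are all invented placeholders; in the paper, the representations would come from a multilingual neural language model and the structural similarities from typological data.

import numpy as np
from scipy.stats import spearmanr

# Toy stand-ins: four language codes with random 32-dimensional
# "language embeddings" (hypothetical; real ones would be learned).
languages = ["deu", "nld", "swe", "fin"]
rng = np.random.default_rng(0)
lang_vecs = {lang: rng.normal(size=32) for lang in languages}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Representation similarity for every unordered language pair.
pairs = [(a, b) for i, a in enumerate(languages) for b in languages[i + 1:]]
rep_sim = [cosine(lang_vecs[a], lang_vecs[b]) for a, b in pairs]

# Invented structural similarities for the same pairs, in the same order:
# (deu,nld), (deu,swe), (deu,fin), (nld,swe), (nld,fin), (swe,fin).
struct_sim = [0.9, 0.7, 0.1, 0.6, 0.1, 0.1]

# Rank correlation between the two lists; a high coefficient would
# suggest the representations encode structural similarity.
rho, p = spearmanr(rep_sim, struct_sim)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

Rank correlation is a natural choice here because only the relative ordering of similarities matters, not their absolute scale.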

Place, publisher, year, edition, pages
2019. Vol. 45, no. 2, pp. 381-389
Keywords [en]
representation learning, language representations, linguistic typology
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:su:diva-169784
DOI: 10.1162/coli_a_00351
OAI: oai:DiVA.org:su-169784
DiVA, id: diva2:1325807
Available from: 2019-06-17. Created: 2019-06-17. Last updated: 2019-08-09. Bibliographically approved.

Open Access in DiVA

fulltext (1149 kB), 63 downloads
File information
File name: FULLTEXT01.pdf
File size: 1149 kB
Checksum (SHA-512):
58d8baafb087b36be78f030594dfdbc2cb4184741e2c9fba54d2d0400976eebcdd917ef6364a0e996c4aa3f34a24487a2edc4ea21c5409a89fdeef0137025d52
Type: fulltext
Mimetype: application/pdf
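
To confirm the integrity of a downloaded full text, one can recompute the digest and compare it with the checksum above. A minimal sketch, assuming the file was saved as FULLTEXT01.pdf in the working directory:

import hashlib

# SHA-512 checksum as listed in the record above.
EXPECTED = ("58d8baafb087b36be78f030594dfdbc2cb4184741e2c9fba54d2d0400976eebc"
            "dd917ef6364a0e996c4aa3f34a24487a2edc4ea21c5409a89fdeef0137025d52")

h = hashlib.sha512()
with open("FULLTEXT01.pdf", "rb") as f:
    # Read in 64 KiB chunks so large PDFs do not need to fit in memory.
    for chunk in iter(lambda: f.read(1 << 16), b""):
        h.update(chunk)

print("checksum OK" if h.hexdigest() == EXPECTED else "checksum MISMATCH")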

Other links

Publisher's full text

Search in DiVA

By author/editor
Bjerva, Johannes; Östling, Robert; Tiedemann, Jörg; Augenstein, Isabelle
By organisation
Computational Linguistics
In the same journal
Computational Linguistics - Association for Computational Linguistics (Print)
Language Technology (Computational Linguistics)

Search outside of DiVA

Google, Google Scholar
Total: 63 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.
