Parsing the Past - Identification of Verb Constructions in Historical Text
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (computational linguistics)
2012 (English). In: Language Technology for Cultural Heritage, Social Sciences, and Humanities, 2012. Conference paper, Published paper (Refereed)
Abstract [en]

Even though NLP tools are widely used for contemporary text today, there is a lack of tools that can handle historical documents. Such tools could greatly facilitate the work of researchers dealing with large volumes of historical texts. In this paper we propose a method for extracting verbs and their complements from historical Swedish text, using NLP tools and dictionaries developed for contemporary Swedish and a set of normalisation rules that are applied before tagging and parsing the text. When evaluated on a sample of texts from the period 1550–1880, this method identifies verbs with an F-score of 77.2% and finds a partially or completely correct set of complements for 55.6% of the verbs. Although these results are in general lower than for contemporary Swedish, they are strong enough to make the approach useful for information extraction in historical research. Moreover, the exact match rate for complete verb constructions is in fact higher for historical texts than for contemporary texts (38.7% vs. 30.8%).

Place, publisher, year, edition, pages
2012.
Keyword [en]
automatic processing of historical texts
National Category
Humanities
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-189427
OAI: oai:DiVA.org:uu-189427
DiVA: diva2:581397
Conference
6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2012) in conjunction with the Thirteenth Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Apr 23–27, 2012, Avignon, France
Available from: 2013-01-09. Created: 2013-01-01. Last updated: 2017-01-25. Bibliographically approved.

Open Access in DiVA

fulltext (208 kB), 151 downloads
File information
File name: FULLTEXT01.pdf
File size: 208 kB
Checksum (SHA-512): 847354de3f35d6383c8bf0f2c941a1c6058bc286f987b9b8dce79cc66dc7dd1ce86d2c6301b4a131d89025b1c428d6362200cf9c52f8184d2812e4629e5f9cbf
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Pettersson, Eva; Megyesi, Beata; Nivre, Joakim
By organisation
Department of Linguistics and Philology
Humanities
