Supporting the Exploration of a Corpus of 17th-Century Scholarly Correspondences by Topic Modeling.
University of Borås, Swedish School of Library and Information Science.
2011 (English) Conference paper (Refereed)
Abstract [en]

This paper deals with the application of topic modeling to a corpus of 17th-century scholarly correspondences built up by the CKCC project. The topic modeling approaches considered are latent Dirichlet allocation (LDA), latent semantic analysis (LSA), and random indexing (RI). After describing the corpus and the topic modeling approaches, we present an experiment for the quantitative evaluation of the performance of the various topic modeling approaches in reproducing human-labeled words in a subset of the corpus. In our experiments random indexing shows the best performance, with scope for further improvement. Next we discuss the role of topic modeling in the CKCC Epistolarium, the virtual research environment that is being developed for exploring and analysing the CKCC corpus. The key feature of topic modeling is its ability to calculate similarities between words and texts. In an example we illustrate how such an approach may yield results that transcend a regular text search.

Place, publisher, year, edition, pages
University of Copenhagen, 2011.
Keyword [en]
topic modeling, latent semantic indexing, random projection
Keyword [sv]
text mining
National Category
Computer and Information Science
Language Technology (Computational Linguistics)
Research subject
Library and Information Science
URN: urn:nbn:se:hb:diva-6661
Local ID: 2320/9689
OAI: diva2:887360
SDH 2011 Supporting Digital Humanities: Answering the unaskable
Available from: 2015-12-22 Created: 2015-12-22

By author/editor
Wittek, Peter