Digitala Vetenskapliga Arkivet

Visually guided extraction of prevalent topics
Linnaeus University.
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0001-6745-4398
Linköping University.
Linnaeus University.
2025 (English). In: Information Visualization, ISSN 1473-8716, E-ISSN 1473-8724, Vol. 24, no 2, p. 179-198. Article in journal (Refereed). Published
Abstract [en]

Making sense of large sets of text documents is highly challenging for tasks such as obtaining a comprehensive overview or keeping up with the most important trends and topics. Although several established methods exist for condensing and summarizing large text corpora, many of them cannot account for differences in prevalence between the identified topics, which in turn impedes quantitative analysis. In this paper, we therefore propose a novel prevalence-aware method for topic extraction and show how it can be used to obtain important insights from two text corpora with very different content. We also implemented a prototype visual analytics tool that guides the user in the search for relevant insights and promotes trust in the results. We verified our application through a user study, as well as through a validation run on a data set with a previously known topic structure. The results show that our approach is suitable for text mining, that it can be used by non-experts, and that it offers features that make it an interesting candidate for use in several different analysis scenarios.

Place, publisher, year, edition, pages
Sage Publications, 2025. Vol. 24, no 2, p. 179-198
Keywords [en]
similarity calculations, text embedding, text mining, topic modeling, Visual analytics, Embeddings, Sense making, Similarity calculation, Text corpora, Text document, Text-mining, Topic extraction
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:bth-27447
DOI: 10.1177/14738716241312400
ISI: 001408697200001
Scopus ID: 2-s2.0-105001067590
OAI: oai:DiVA.org:bth-27447
DiVA, id: diva2:1936281
Projects
Rekrytering 21
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Knowledge Foundation, 20210077
Available from: 2025-02-10 Created: 2025-02-10 Last updated: 2025-04-04 Bibliographically approved

Open Access in DiVA

fulltext (4278 kB), 50 downloads
File information
File name: FULLTEXT01.pdf
File size: 4278 kB
Checksum SHA-512: b51ebc8b6dbb8c873e4bb7a4e30ba00c29ec34c5a7123e55d33341ab686d8be92b334f6b50e5e06db251e9ea2a75abbde1f3934e4ceb8548c95d300f45ecf06e
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Jusufi, Ilir