Towards a language independent Twitter bot detector
DISA. (DISA-DH) ORCID iD: 0000-0001-9775-4594
Linnaeus University, Faculty of Technology, Department of Mathematics. DISA. (DISA-DH) ORCID iD: 0000-0002-0510-6782
University of Eastern Finland, Finland. (DISA-DH) ORCID iD: 0000-0003-3123-6932
2019 (English). In: Proceedings of 4th Conference of The Association Digital Humanities in the Nordic Countries: Copenhagen, March 6-8 2019 / [ed] Navarretta Costanza et al., Copenhagen: Digital Humanities in the Nordic countries, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

This article describes our work in developing an application that recognizes automatically generated tweets. The objective of this machine learning application is to increase data accuracy in sociolinguistic studies that utilize Twitter by reducing skewed sampling and inaccuracies in linguistic data. Most previous machine learning attempts to exclude bot material have been language dependent, since they make use of monolingual Twitter text in their training phase. In this paper, we present a language-independent approach which classifies each individual tweet as either autogenerated (AGT) or human-generated (HGT). We define an AGT as a tweet in which all or part of the natural language content is generated automatically by a bot or other type of program. In other words, while AGT and HGT refer to individual messages, the term bot refers to non-personal, automated accounts that post content to online social networks. Our approach classifies a tweet using only the metadata that comes with every tweet, and we utilize those metadata parameters that are both language and country independent. The empirical part shows good success rates: using a bilingual training set of Finnish and Swedish tweets, we correctly classified about 98.2% of all tweets in a test set in a third language (English).

Place, publisher, year, edition, pages
Copenhagen: Digital Humanities in the Nordic countries, 2019.
Keywords [en]
Twitter, bots, bot detection, supervised machine learning
National Category
General Language Studies and Linguistics; Computer and Information Sciences
Research subject
Humanities, English; Computer and Information Sciences Computer Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-81663
OAI: oai:DiVA.org:lnu-81663
DiVA, id: diva2:1302270
Conference
4th Conference of The Association Digital Humanities in the Nordic Countries, Copenhagen, March 6-8 2019
Available from: 2019-04-04. Created: 2019-04-04. Last updated: 2019-04-29. Bibliographically approved.

By author/editor
Lundberg, Jonas; Nordqvist, Jonas; Laitinen, Mikko