Multi-domain alias matching using machine learning
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science. (Security)
Swedish Def Res Agcy FOI, Stockholm, Sweden.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Swedish Def Res Agcy FOI, Stockholm, Sweden. (Security)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (Security)
2016 (English). In: Proc. 3rd European Network Intelligence Conference, IEEE, 2016, p. 77-84. Conference paper, Published paper (Refereed)
Abstract [en]

We describe a methodology for linking aliases belonging to the same individual based on a user's writing style (stylometric features extracted from the user-generated content) and her time patterns (time-based features extracted from the publishing times of the user-generated content). While most previous research on social media identity linkage relies on matching usernames, our methodology can also be used for users who actively try to choose dissimilar usernames when creating their aliases. In our experiments on a discussion forum dataset and a Twitter dataset, we evaluate the performance of three different classifiers. We then use the best classifier (AdaBoost) to evaluate how well the approach works on different datasets using different features. Experiments show that combining stylometric and time-based features yields good results on our synthetic datasets, and a small-scale evaluation on real-world blog data confirms these results, yielding a precision over 95%. The use of emotion-related and Twitter-related features has no significant impact on the results.
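As an illustration of what a time-based feature could look like, the sketch below turns the publishing times of an alias into a 24-bin hour-of-day distribution and compares two aliases with an L1 distance. This is a minimal sketch under assumptions: the record does not specify the paper's actual time features, the function names `timeprint` and `timeprint_distance` are hypothetical, and the plain distance stands in for the trained classifiers (such as AdaBoost) mentioned in the abstract.

```python
from collections import Counter
from datetime import datetime


def timeprint(timestamps):
    """Return a 24-bin distribution of posting activity by hour of day.

    Assumption: hour-of-day bins are used only for illustration; the
    paper's real time-based features may differ.
    """
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [counts.get(hour, 0) / total for hour in range(24)]


def timeprint_distance(a, b):
    """L1 distance between two timeprints (0 = identical, 2 = disjoint)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical aliases: A and B post late in the evening, C in the morning.
alias_a = [datetime(2016, 1, day, 22, 15) for day in range(1, 5)]
alias_b = [datetime(2016, 2, day, 22, 40) for day in range(1, 4)]
alias_c = [datetime(2016, 1, day, 9, 5) for day in range(1, 5)]

print(timeprint_distance(timeprint(alias_a), timeprint(alias_b)))
print(timeprint_distance(timeprint(alias_a), timeprint(alias_c)))
```

A small distance only suggests similar activity patterns; on its own it is weak evidence, which is consistent with the abstract's point that time-based features are combined with stylometric ones.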

Place, publisher, year, edition, pages
IEEE, 2016. p. 77-84
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:uu:diva-306944
DOI: 10.1109/ENIC.2016.019
ISI: 000399097600011
ISBN: 9781509034550 (electronic)
OAI: oai:DiVA.org:uu-306944
DiVA, id: diva2:1044820
Conference
ENIC 2016, September 5–7, Wroclaw, Poland
Available from: 2017-02-02 Created: 2016-11-07 Last updated: 2019-03-22 Bibliographically approved
In thesis
1. Techniques for analyzing digital environments from a security perspective
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The development of the Internet and social media has exploded in the last couple of years. Digital environments such as social media and discussion forums provide an effective method of communication and are used by various groups in our societies. For example, violent extremist groups use social media platforms for recruiting, training, and communicating with their followers, supporters, and donors. Analyzing social media is an important task for law enforcement agencies in order to detect activity and individuals that might pose a threat to the security of society.

In this thesis, a set of different technologies that can be used to analyze digital environments from a security perspective is presented. Due to the nature of the problems that are studied, the research is interdisciplinary, and knowledge from terrorism research, psychology, and computer science is required. The research is divided into three different themes. Each theme summarizes the research that has been done in a specific area.

The first theme focuses on analyzing digital environments and phenomena. The theme consists of three different studies. The first study is about the possibilities of detecting propaganda from the Islamic State on Twitter. The second study focuses on identifying references to a narrative containing xenophobic and conspiratorial stereotypes in immigration-critical alternative media. In the third study, we have defined a set of linguistic features that we view as markers of radicalization.

A group consists of a set of individuals, and in some cases, individuals might be a threat to the security of society. The second theme focuses on the risk assessment of individuals based on their written communication. We use different technologies, including machine learning, to explore the possibilities of detecting potential lone offenders. Our risk assessment approach is implemented in the tool PRAT (Profile Risk Assessment Tool).

Internet users can use different aliases when they communicate, since this offers a degree of anonymity. In the third theme, we present a set of techniques that can be used to identify users with multiple aliases. Our research focuses on solving two different problems: author identification and alias matching. The technologies that we use are based on the idea that each author has a fairly unique writing style and that we can construct a writeprint that represents the author. In a similar manner, we also use information about when a user communicates to create a timeprint. By combining the writeprint and the timeprint, we can obtain a set of powerful features that can be used to identify users with multiple aliases.
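The idea of combining a writeprint with a timeprint can be sketched as follows. This is only an illustration under assumptions: the function-word list, the helper names `writeprint`, `combine`, and `cosine_similarity`, and the use of cosine similarity are all hypothetical choices made here, not the thesis's actual feature set or matching method.

```python
import math
import re

# Assumption: a tiny, illustrative function-word list. Real stylometric
# feature sets are typically far larger and include other feature types.
FUNCTION_WORDS = ("the", "and", "of", "to", "in", "that", "is", "it")


def writeprint(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return [0.0] * len(FUNCTION_WORDS)
    return [tokens.count(word) / len(tokens) for word in FUNCTION_WORDS]


def combine(writeprint_vec, timeprint_vec):
    """Concatenate stylometric and time-based features into one vector."""
    return list(writeprint_vec) + list(timeprint_vec)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical usage: the timeprint here is a placeholder 24-bin vector
# with all activity at hour 22.
evening = [0.0] * 24
evening[22] = 1.0
features = combine(writeprint("The cat and the dog"), evening)
print(len(features), cosine_similarity(features, features))
```

In practice the two feature groups would likely need scaling before being concatenated, since a raw concatenation lets whichever group has larger values dominate the comparison.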

To ensure that the technologies can be used in real scenarios, we have implemented and tested the techniques on data from social media. Several of the results are promising, but more studies are needed to determine how well they work in reality.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 64
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1786
Keywords
digital communities, machine learning, text analysis, linguistic features, linguistic analysis, warning behaviors, Internet, social media, extremism, terrorism, psychological state, author identification, alias matching
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-379605 (URN)
978-91-513-0605-6 (ISBN)
Public defence
2019-05-17, Room 2446, ITC, Lägerhyddsvägen 2, Uppsala, 10:15 (English)
Available from: 2019-04-24 Created: 2019-03-22 Last updated: 2019-06-18

Open Access in DiVA

fulltext (256 kB), 153 downloads
File information
File name: FULLTEXT01.pdf
File size: 256 kB
Checksum (SHA-512): 26272dd0de610b7e6f894db092d90d80b1165344d798bee55972f184e9b36e3ff0e33e42ee5cacfa6fea78246fb309b17f0e4e5f3964012964f06eef38423547
Type: fulltext
Mimetype: application/pdf

By author/editor
Ashcroft, Michael; Kaati, Lisa; Shrestha, Amendra