Techniques for analyzing digital environments from a security perspective
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
ORCID iD: 0000-0001-6553-4319
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The Internet and social media have grown explosively over the last few years. Digital environments such as social media and discussion forums provide an effective means of communication and are used by various groups in our societies. For example, violent extremist groups use social media platforms for recruiting, training, and communicating with their followers, supporters, and donors. Analyzing social media is therefore an important task for law enforcement agencies, which need to detect activity and individuals that might pose a threat to the security of society.

This thesis presents a set of techniques that can be used to analyze digital environments from a security perspective. Because of the nature of the problems studied, the research is interdisciplinary and draws on knowledge from terrorism research, psychology, and computer science. The research is divided into three themes, each of which summarizes the work done in a specific area.

The first theme focuses on analyzing digital environments and phenomena and consists of three studies. The first study examines the possibility of detecting Islamic State propaganda on Twitter. The second study focuses on identifying references to a narrative containing xenophobic and conspiratorial stereotypes in immigration-critical alternative media. In the third study, we define a set of linguistic features that we view as markers of a radicalized mind-set.

A group consists of a set of individuals, and in some cases individuals may pose a threat to the security of society. The second theme focuses on assessing the risk posed by individuals based on their written communication. We use different techniques, including machine learning, to explore the possibility of detecting potential lone offenders. Our risk assessment approach is implemented in the tool PRAT (Profile Risk Assessment Tool).

Internet users can communicate under different aliases, which gives them a degree of anonymity. In the third theme, we present a set of techniques for identifying users with multiple aliases. Our research addresses two problems: author identification and alias matching. The techniques are based on the idea that each author has a fairly unique writing style, from which we can construct a writeprint that represents the author. Similarly, we use information about when a user communicates to create a timeprint. Combining the writeprint and the timeprint yields a powerful set of features for identifying users with multiple aliases.
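
The record does not specify how the writeprint and timeprint are computed. As a minimal sketch, assuming a writeprint made of character trigram frequencies and a timeprint made of a posting-hour histogram (both assumptions, not the thesis's exact feature sets), the two can be merged into one sparse profile per alias and compared with cosine similarity. All function names here are hypothetical.

```python
import math
from collections import Counter

def writeprint(texts, n=3):
    """Relative frequencies of character n-grams over all posts written under one alias."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for i in range(len(lowered) - n + 1):
            counts[("char", lowered[i:i + n])] += 1
    total = sum(counts.values()) or 1
    return {key: c / total for key, c in counts.items()}

def timeprint(hours):
    """Sparse 24-bin histogram of posting hours (0-23), normalised so the bins sum to 1."""
    bins = Counter(h % 24 for h in hours)
    total = sum(bins.values()) or 1
    return {("hour", h): c / total for h, c in bins.items()}

def combined_profile(texts, hours):
    """Merge writeprint and timeprint into one sparse vector; tuple keys never collide."""
    profile = writeprint(texts)
    profile.update(timeprint(hours))
    return profile

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For example, profiles built from `["I think that is great!!"]` posted at hours `[22, 23, 23]` and `["I think that is great!"]` posted at `[22, 23, 0]` score higher with each other than either does with a differently worded alias active in the morning. In practice, a trained classifier rather than a raw similarity score would usually decide whether two profiles belong to the same person.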

To ensure that the techniques can be used in real scenarios, we have implemented them and tested them on data from social media. Several of the results are promising, but further studies are needed to determine how well the techniques work in practice.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 64
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1786
Keywords [en]
digital communities, machine learning, text analysis, linguistic features, linguistic analysis, warning behaviors, Internet, social media, extremism, terrorism, psychological state, author identification, alias matching
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-379605
ISBN: 978-91-513-0605-6 (print)
OAI: oai:DiVA.org:uu-379605
DiVA, id: diva2:1298150
Public defence
2019-05-17, Room 2446, ITC, Lägerhyddsvägen 2, Uppsala, 10:15 (English)
Opponent
Supervisors
Available from: 2019-04-24 Created: 2019-03-22 Last updated: 2019-06-18
List of papers
1. A Machine Learning Approach Towards Detecting Extreme Adopters in Digital Communities
2017 (English). In: 2017 28th International Workshop on Database and Expert Systems Applications (DEXA) / [ed] Tjoa, AM; Wagner, RR, IEEE, 2017, p. 1-5. Conference paper, Published paper (Other academic)
Abstract [en]

In this study, we try to identify extreme adopters on a discussion forum using machine learning. An extreme adopter is a user who has adopted a high level of community-specific jargon and can therefore be seen as having a high degree of identification with the community. The dataset consists of a Swedish xenophobic discussion forum, on which we use machine learning to identify extreme adopters from a number of linguistic features that are independent of the dataset and the community. The results indicate that it is possible to separate these extreme adopters from the rest of the forum's discussants with more than 80% accuracy. Since the linguistic features we use are highly domain-independent, the results suggest that such techniques could also be used to identify extreme adopters in other communities.
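
To make the notion of an extreme adopter concrete, the sketch below scores each user by the share of their tokens that belong to a community-jargon lexicon and labels the top fraction of users as extreme adopters. The lexicon, the tokenisation, and the `top_fraction` cut-off are hypothetical placeholders: the record gives neither the study's jargon list nor its threshold, and the subsequent step of training a classifier on domain-independent linguistic features is not shown.

```python
import re

TOKEN = re.compile(r"\w+")  # Unicode-aware, so Swedish letters such as å, ä, ö are kept

# Hypothetical lexicon standing in for the forum's community-specific jargon.
COMMUNITY_JARGON = {"noob", "pwned", "ftw"}

def jargon_rate(posts, lexicon):
    """Share of a user's tokens that belong to the jargon lexicon."""
    tokens = [t.lower() for post in posts for t in TOKEN.findall(post)]
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)

def extreme_adopters(users, lexicon, top_fraction=0.1):
    """Ids of users whose jargon rate is in the top `top_fraction` (always at least one user)."""
    ranked = sorted(users, key=lambda u: jargon_rate(users[u], lexicon), reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:k])
```

Ranking by a top fraction is only one way to operationalise "a high level" of adoption; an absolute threshold on the rate would work the same way.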

Place, publisher, year, edition, pages
IEEE, 2017
Series
International Workshop on Database and Expert Systems Applications-DEXA, ISSN 1529-4188
Keywords
Discussion forums, Support vector machines, Pragmatics, Manuals, Radio frequency, Electronic mail, Social network services
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-351187 (URN); 10.1109/DEXA.2017.17 (DOI); 000426078300001; 978-1-5386-1051-0 (ISBN)
Conference
28th International Workshop on Database and Expert Systems Applications (DEXA), August 28-31, 2017, Lyon 3 University, Lyon, France
Available from: 2018-05-23 Created: 2018-05-23 Last updated: 2019-03-22. Bibliographically approved
2. Identifying warning behaviors of violent lone offenders in written communication
2016 (English). In: Proc. 16th ICDM Workshops, IEEE Computer Society, 2016, p. 1053-1060. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE Computer Society, 2016
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-306943 (URN); 10.1109/ICDMW.2016.0152 (DOI); 978-1-5090-5910-2 (ISBN)
Conference
ICDM Workshop on Social Media and Risk, SOMERIS 2016, December 12, Barcelona, Spain
Available from: 2017-02-02 Created: 2016-11-07 Last updated: 2019-03-22. Bibliographically approved
3. Automatic detection of xenophobic narratives: A case study on Swedish alternative media
2016 (English). In: Proc. 14th International Conference on Intelligence and Security Informatics, IEEE, 2016, p. 121-126. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2016
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-306903 (URN); 10.1109/ISI.2016.7745454 (DOI); 000390129600021; 978-1-5090-3865-7 (ISBN)
Conference
ISI 2016, September 28–30, Tucson, AZ
Available from: 2016-11-17 Created: 2016-11-04 Last updated: 2019-03-22. Bibliographically approved
4. Linguistic analysis of lone offender manifestos
2016 (English). In: Proc. 4th International Conference on Cybercrime and Computer Forensics, IEEE, 2016. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2016
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-306941 (URN); 10.1109/ICCCF.2016.7740427 (DOI); 000390123800007; 978-1-5090-6096-2 (ISBN)
Conference
ICCCF 2016, June 12–14, Vancouver, Canada
Available from: 2016-11-17 Created: 2016-11-07 Last updated: 2019-03-22. Bibliographically approved
5. Detecting multipliers of jihadism on twitter
2015 (English). In: Proc. 15th ICDM Workshops, IEEE Computer Society, 2015, p. 954-960. Conference paper, Published paper (Refereed)
Abstract [en]

Detecting terrorism-related content on social media is a problem for law enforcement agencies because of the large amount of information available. In this paper, we describe a first step towards automatically classifying Twitter user accounts (tweeps) as supporters of jihadist groups who disseminate propaganda content online. We use a machine learning approach with two sets of features: data-dependent features and data-independent features. Data-dependent features are heavily influenced by the specific dataset, while data-independent features do not depend on the dataset and can be used on other datasets with similar results. We hope that this approach can serve as a baseline for classifying violent extremist content from different kinds of sources, since data-dependent features from various domains can be added.
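
The split between the two feature sets can be sketched as follows. The specific features (average word length, first-person plural ratio, exclamation marks per tweet) and the keyword list are illustrative assumptions, not the paper's actual features. The point of the design is that the data-independent part is reused unchanged, while only the keyword list is swapped when moving to a new domain.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")
# Hypothetical example of a data-independent lexical category.
FIRST_PERSON_PLURAL = {"we", "us", "our", "ours"}

def data_independent_features(tweets):
    """Features that do not depend on the particular dataset or topic."""
    words = [w.lower() for t in tweets for w in WORD.findall(t)]
    n_words = len(words) or 1
    n_tweets = len(tweets) or 1
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "first_person_plural_ratio": sum(w in FIRST_PERSON_PLURAL for w in words) / n_words,
        "exclamations_per_tweet": sum(t.count("!") for t in tweets) / n_tweets,
    }

def data_dependent_features(tweets, keywords):
    """Per-tweet frequency of dataset-specific terms (e.g. hashtags or names), whitespace-tokenised."""
    tokens = Counter(tok for t in tweets for tok in t.lower().split())
    n_tweets = len(tweets) or 1
    return {f"kw_{k}": tokens[k.lower()] / n_tweets for k in keywords}

def feature_vector(tweets, keywords):
    """Concatenate both feature sets; only `keywords` changes between domains."""
    features = data_independent_features(tweets)
    features.update(data_dependent_features(tweets, keywords))
    return features
```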

Place, publisher, year, edition, pages
IEEE Computer Society, 2015
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-272243 (URN); 10.1109/ICDMW.2015.9 (DOI); 000380556700127; 9781467384926 (ISBN)
External cooperation:
Conference
ICDM Workshop on Intelligence and Security Informatics, ISI-ICDM 2015, November 14, Atlantic City, NJ
Available from: 2015-11-14 Created: 2016-01-12 Last updated: 2019-03-22. Bibliographically approved
6. Detecting multiple aliases in social media
2013 (English). In: Proc. 5th International Conference on Advances in Social Networks Analysis and Mining, New York: ACM Press, 2013, p. 1004-1011. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
New York: ACM Press, 2013
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-216568 (URN); 10.1145/2492517.2500261 (DOI); 978-1-4503-2240-9 (ISBN)
Conference
ASONAM 2013, August 25-29, Niagara Falls, Canada
Funder
Vinnova
Available from: 2013-08-29 Created: 2014-01-23 Last updated: 2019-03-22. Bibliographically approved
7. Timeprints for identifying social media users with multiple aliases
2015 (English). In: Security Informatics, ISSN 2190-8532, Vol. 4, p. 7:1-11, article id 7. Article in journal (Refereed). Published
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-272242 (URN); 10.1186/s13388-015-0022-z (DOI)
Available from: 2015-09-24 Created: 2016-01-12 Last updated: 2019-03-22. Bibliographically approved
8. Multi-domain alias matching using machine learning
2016 (English). In: Proc. 3rd European Network Intelligence Conference, IEEE, 2016, p. 77-84. Conference paper, Published paper (Refereed)
Abstract [en]

We describe a methodology for linking aliases belonging to the same individual based on the user's writing style (stylometric features extracted from the user-generated content) and time patterns (time-based features extracted from the publishing times of the user-generated content). While most previous research on social media identity linkage relies on matching usernames, our methodology can also be used for users who actively try to choose dissimilar usernames when creating their aliases. In experiments on a discussion forum dataset and a Twitter dataset, we evaluate the performance of three different classifiers. We then use the best classifier (AdaBoost) to evaluate how well it works on different datasets with different features. The experiments show that combining stylometric and time-based features yields good results on our synthetic datasets, and a small-scale evaluation on real-world blog data confirms these results, with a precision above 95%. Emotion-related and Twitter-related features have no significant impact on the results.
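
The abstract names AdaBoost as the best of the three classifiers but does not say how a pair of aliases is presented to it. The sketch below assumes each alias is already summarised as a fixed-length numeric vector of stylometric and time-based features, encodes a candidate pair as the element-wise absolute difference of the two vectors (an assumption, though a common choice), and trains a textbook discrete AdaBoost with decision stumps written from scratch. Labels are +1 for "same person" and -1 otherwise; this is an illustration, not the authors' implementation.

```python
import math

def pair_features(profile_a, profile_b):
    """Element-wise absolute difference between two aliases' feature vectors."""
    return [abs(a - b) for a, b in zip(profile_a, profile_b)]

def _stump_predict(x, feature, threshold, polarity):
    """Decision stump: predict `polarity` if x[feature] <= threshold, else the opposite."""
    return polarity if x[feature] <= threshold else -polarity

def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps; every label in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best = None
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for polarity in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if _stump_predict(xi, j, t, polarity) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, t, polarity)
        err, j, t, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() and division finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, polarity))
        if err <= 1e-10:
            break  # a perfect stump: further rounds add nothing
        # Up-weight misclassified pairs, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * _stump_predict(xi, j, t, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps (+1 = same person)."""
    score = sum(alpha * _stump_predict(x, j, t, p) for alpha, j, t, p in ensemble)
    return 1 if score >= 0 else -1
```

With pairs of aliases whose feature vectors differ little labelled +1 and clearly different pairs labelled -1, a small new difference vector such as `[0.01, 0.01]` is classified as a match. A library implementation would normally replace the hand-written booster in practice.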

Place, publisher, year, edition, pages
IEEE, 2016
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:uu:diva-306944 (URN); 10.1109/ENIC.2016.019 (DOI); 000399097600011; 9781509034550 (ISBN)
Conference
ENIC 2016, September 5–7, Wroclaw, Poland
Available from: 2017-02-02 Created: 2016-11-07 Last updated: 2019-03-22. Bibliographically approved
9. Assessment of risk in written communication: Introducing the Profile Risk Assessment Tool (PRAT)
2018 (English). Report (Other academic)
Place, publisher, year, edition, pages
Belgium: EUROPOL, 2018. p. 24
National Category
Engineering and Technology
Identifiers
urn:nbn:se:uu:diva-367346 (URN)
Note

This paper was presented at the 2nd European Counter-Terrorism Centre (ECTC) Advisory Group conference, 17-18 April 2018, at Europol Headquarters, The Hague.

Available from: 2018-11-30 Created: 2018-11-30 Last updated: 2019-03-22. Bibliographically approved
10. Linguistic markers of a radicalized mind-set among extreme adopters
2017 (English). In: Proc. 10th ACM International Conference on Web Search and Data Mining, New York: ACM Press, 2017, p. 823-824. Conference paper, Published paper (Refereed)
Abstract [en]

The words we use when communicating in social media can reveal how we relate to ourselves and to others. For instance, within many online communities, the degree of adaptation to a community-specific jargon can serve as a marker of identification with the community. In this paper, we single out a group of so-called extreme adopters of community-specific jargon from the whole group of users of a Swedish discussion forum devoted to immigration and integration. The forum is characterized by a certain xenophobic jargon, and we hypothesize that extreme adopters of this jargon also exhibit certain linguistic features that we view as markers of a radicalized mind-set. Using a Swedish translation of LIWC (Linguistic Inquiry and Word Count), we find that the group of extreme adopters differs significantly from the whole group of forum users on six out of seven linguistic markers of a radicalized mind-set.
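
LIWC scores a text as the percentage of its words that fall into each dictionary category, where dictionary entries ending in `*` match any word with that prefix. A minimal sketch of that counting scheme follows. The three categories and their word lists are hypothetical English stand-ins, not entries from the Swedish LIWC dictionary or the seven markers used in the paper, and the statistical comparison between groups is not shown.

```python
import re

# Hypothetical LIWC-style mini-dictionary; "*" marks a prefix (wildcard) entry.
CATEGORIES = {
    "we": ["we", "us", "our*"],
    "they": ["they", "them", "their*"],
    "anger": ["hate*", "angry", "furious"],
}

WORD = re.compile(r"\w+")

def matches(word, entry):
    """Exact match, or prefix match when the dictionary entry ends with '*'."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry

def category_percentages(text, categories=CATEGORIES):
    """Percentage of words in `text` that fall into each category."""
    words = [w.lower() for w in WORD.findall(text)]
    total = len(words) or 1
    return {
        cat: 100.0 * sum(any(matches(w, e) for e in entries) for w in words) / total
        for cat, entries in categories.items()
    }
```

Averaging these percentages over each user's posts gives per-user scores that can then be compared between extreme adopters and the forum as a whole.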

Place, publisher, year, edition, pages
New York: ACM Press, 2017
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-379919 (URN); 10.1145/3018661.3022760 (DOI); 978-1-4503-4675-7 (ISBN)
Conference
WSDM 2017, 1st International Workshop on Cyber Deviance Detection
Available from: 2017-02-02 Created: 2019-03-21 Last updated: 2019-04-08. Bibliographically approved

Author
Shrestha, Amendra
Organisation
Computer Systems; Division of Computer Systems
