Digitala Vetenskapliga Arkivet

EXIST: sEXism Identification in Social Networks (CLEF 2024)
Halmstad University, School of Information Technology.
Halmstad University, School of Information Technology.
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Online harassment, especially gender-based abuse, is a serious global problem that disproportionately affects women and gender minorities. Reports from groups such as UN Women show high levels of online violence against these groups. For example, one in ten women in the European Union and 60 percent of female internet users in the Arab States face cyberbullying and online abuse. Addressing this issue is urgent, but detecting sexism online is not easy: it can appear in obvious ways, like rude comments, or in subtle ways, such as reinforcing gender stereotypes. Our work leverages the newly released BERT models, known for their state-of-the-art natural language processing capabilities, to address the challenges of detecting sexist comments on social networks, particularly on Twitter. To enhance the effectiveness of these models, we develop and integrate custom algorithms that implement hard-labeling strategies. These strategies assign definitive categorizations to ambiguous or subtly sexist content, providing clearer decision boundaries and enabling BERT models to tackle nuanced forms of online sexism. Along with hard labeling, we add knowledge injection with sexist libraries, data augmentation, data concatenation with older datasets, freezing of certain BERT layers to retain pre-trained knowledge while fine-tuning, and Optuna for hyperparameter tuning. This combination of advanced pre-trained models with tailored algorithms aims to create an efficient system for correctly detecting and categorizing sexist content. Unlike previous research that has explored similar questions using different models, our study fills a critical knowledge gap by integrating newly released BERT models with innovative hard-labeling techniques.

Our approach begins with determining whether a comment is sexist, as this initial classification is essential for further categorization, such as analyzing the author’s intent or the specific nature of the sexism. By evaluating the results of this novel approach, we aim to provide more effective solutions to combat online sexism, contributing to a safer and more equitable digital environment.
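
The abstract does not spell out how the hard-labeling strategies work. As a minimal sketch only, assuming each comment carries several annotator votes and that a hard label is obtained by strict majority vote, the idea could look like the following. The function name `hard_label` and the label strings "YES" and "NO" are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Optional, Sequence


def hard_label(votes: Sequence[str]) -> Optional[str]:
    """Collapse per-annotator votes into a single label by strict majority.

    Assumption (not from the thesis): ties and empty vote lists have no
    definitive label and yield None, so the caller decides how to treat them.
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    # A tie between the two most frequent labels means no clear majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical votes from six annotators on one comment.
    print(hard_label(["YES", "YES", "NO", "YES", "YES", "NO"]))
```

Comments for which the function returns None could be dropped or routed to a separate tie-breaking rule before fine-tuning; which option the thesis takes is not stated in the abstract.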

Place, publisher, year, edition, pages
2025. p. 110
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:hh:diva-55655
OAI: oai:DiVA.org:hh-55655
DiVA, id: diva2:1946008
Educational program
Master's Programme in Embedded and Intelligent Systems, 120 credits; Master's Programme in Information Technology, 120 credits
Presentation
2025-02-24, 14:00 (English)
Supervisors
Examiners
Available from: 2025-03-21 Created: 2025-03-20 Last updated: 2025-03-25 Bibliographically approved

Open Access in DiVA

fulltext (1289 kB), 44 downloads
File information
File name: FULLTEXT02.pdf
File size: 1289 kB
Checksum (SHA-512): 64ea3a98811d07c228e11175111e8020e2c647d28ddadb21e496c235f0c1c78ecc025dd5947eb05e5d8de1134322a44212865a7456430ac44d14e6a96020fd09
Type: fulltext
Mimetype: application/pdf

Total: 44 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 272 hits