EXIST: sEXism Identification in Social Networks (CLEF 2024)
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Online harassment, especially gender-based abuse, is a serious global problem that disproportionately affects women and gender minorities. Reports from organizations such as UN Women document high levels of online violence against these groups. For example, one in ten women in the European Union, and 60 percent of female internet users in the Arab States, face cyberbullying and online abuse. Addressing this issue is urgent, but detecting sexism online is difficult because it can appear overtly, as in insulting comments, or subtly, as in the reinforcement of gender stereotypes. Our work uses recently released BERT models, known for their state-of-the-art natural language processing performance, to detect sexist comments on social networks, particularly Twitter. To make these models more effective, we develop and integrate custom algorithms that implement hard-labeling strategies. These strategies assign a single definitive label to ambiguous or subtly sexist content, which gives the models clearer decision boundaries and helps them handle nuanced forms of online sexism. Alongside hard labeling, we apply knowledge injection from sexist-language libraries, data augmentation, concatenation of the training data with older datasets, freezing of selected BERT layers to retain pre-trained knowledge during fine-tuning, and hyperparameter tuning with Optuna. By combining advanced pre-trained models with these tailored algorithms, we aim to build an efficient system that accurately detects and categorizes sexist content. Whereas previous research has explored similar questions with other models, our study addresses a gap in the literature by combining newly released BERT models with novel hard-labeling techniques.
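The abstract does not specify the hard-labeling rules. The following is a minimal sketch that assumes each tweet carries several annotator votes, as in the EXIST datasets, and uses a simple majority vote with a configurable tie-break so that every item receives one definitive label. The functions `soft_label` and `hard_label` and the `tie_break` parameter are hypothetical illustrations, not the thesis's own code.

```python
from collections import Counter
from typing import Dict, Sequence


def soft_label(votes: Sequence[str]) -> Dict[str, float]:
    """Return the fraction of annotators who chose each label (a soft label)."""
    counts = Counter(votes)
    total = len(votes)
    return {label: count / total for label, count in counts.items()}


def hard_label(votes: Sequence[str], tie_break: str) -> str:
    """Collapse annotator votes into one definitive label by majority vote.

    Hypothetical rule: if several labels share the highest count, `tie_break`
    wins when it is among them; otherwise the alphabetically first tied label
    is returned, so every item always receives a single label.
    """
    if not votes:
        raise ValueError("at least one annotator vote is required")
    counts = Counter(votes)
    top = max(counts.values())
    tied = sorted(label for label, count in counts.items() if count == top)
    if len(tied) == 1:
        return tied[0]
    return tie_break if tie_break in tied else tied[0]
```

For six votes split three to three, `hard_label(["YES", "YES", "NO", "YES", "NO", "NO"], tie_break="YES")` returns `"YES"`. The tie-break direction is a design choice: resolving ties toward "YES" favors recall on borderline content at some cost to precision.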
Our approach begins by determining whether a comment is sexist, because this initial classification is a prerequisite for further categorization, such as identifying the author's intent or the specific type of sexism. By evaluating the results of this approach, we aim to provide more effective tools for combating online sexism and to contribute to a safer and more equitable digital environment.
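The two-stage design described above can be sketched as a cascade in which the categorization steps run only on comments first judged sexist. This is a minimal illustration under stated assumptions: `Prediction`, `classify_cascade`, and the three classifier callables are hypothetical placeholders standing in for fine-tuned BERT models, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Prediction:
    sexist: bool
    intention: Optional[str] = None  # author's intent; None if not sexist
    category: Optional[str] = None   # type of sexism; None if not sexist


def classify_cascade(
    text: str,
    is_sexist: Callable[[str], bool],
    intention_clf: Callable[[str], str],
    category_clf: Callable[[str], str],
) -> Prediction:
    """Run binary sexism detection first; categorize only positive comments.

    The three callables are hypothetical stand-ins for trained classifiers.
    """
    if not is_sexist(text):
        # Non-sexist comments stop here; downstream classifiers are never called.
        return Prediction(sexist=False)
    return Prediction(
        sexist=True,
        intention=intention_clf(text),
        category=category_clf(text),
    )
```

Because the later stages are gated on the first decision, errors in the binary step propagate downstream. This is one reason the accuracy of the initial sexist/not-sexist classification matters most.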
Place, publisher, year, edition, pages
2025, p. 110
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:hh:diva-55655
OAI: oai:DiVA.org:hh-55655
DiVA, id: diva2:1946008
Educational program
Master's Programme in Embedded and Intelligent Systems, 120 credits; Master's Programme in Information Technology, 120 credits
Presentation
2025-02-24, 14:00 (English)
Available from: 2025-03-21. Created: 2025-03-20. Last updated: 2025-03-25. Bibliographically approved