This paper describes the system used by the Machine Learning Group of LTU in subtask 1 of the SemEval-2022 Task 4: Patronizing and Condescending Language (PCL) Detection. Our system consists of finetuning a pretrained text-to-text transfer transformer (T5) and innovatively reducing its out-of-class predictions. The main contributions of this paper are 1) the description of the implementation details of the T5 model we used, 2) analysis of the successes and struggles of the model in this task, and 3) ablation studies beyond the official submission to ascertain the relative importance of the data split. Our model achieves an F1 score of 0.5452 on the official test set.
We introduce the Instruction Document Visual Question Answering (iDocVQA) dataset and the Large Language Document (LLaDoc) model, for training Language-Vision (LV) models for document analysis and predictions on document images, respectively. Usually, deep neural networks for the DocVQA task are trained on datasets lacking instructions. We show that using instruction-following datasets improves performance. We compare performance across document-related datasets using the recent state-of-the-art (SotA) Large Language and Vision Assistant (LLaVA) 1.5 as the base model. We also evaluate the performance of the derived models for object hallucination using the Polling-based Object Probing Evaluation (POPE) dataset. The results show that instruction-tuning performance ranges from 11x to 32x of zero-shot performance and from 0.1% to 4.2% over non-instruction (traditional task) finetuning. Despite the gains, these still fall short of human performance (94.36%), implying there is much room for improvement.
We investigate five English NLP benchmark datasets (on the SuperGLUE leaderboard) and two Swedish datasets for bias, along multiple axes. The datasets are the following: Boolean Questions (BoolQ), CommitmentBank (CB), Winograd Schema Challenge (WSC), Winogender diagnostic (AXg), Recognising Textual Entailment (RTE), Swedish CB, and SWEDN. Bias can be harmful, and it is known to be common in the data that ML models learn from. In order to mitigate bias in data, it is crucial to be able to estimate it objectively. We use bipol, a novel multi-axes bias metric with explainability, to estimate and explain how much bias exists in these datasets. Multilingual, multi-axes bias evaluation is not very common. Hence, we also contribute a new, large Swedish bias-labeled dataset (of 2 million samples), translated from the English version, and train the SotA mT5 model on it. In addition, we contribute new multi-axes lexica for bias detection in Swedish. We make the code, model, and new dataset publicly available.
We introduce bipol, a new metric with explainability, for estimating social bias in text data. Harmful bias is prevalent in many online sources of data that are used for training machine learning (ML) models. In a step to address this challenge, we create a novel metric that involves a two-step process: corpus-level evaluation based on model classification and sentence-level evaluation based on (sensitive) term frequency (TF). After creating new models to classify bias using SotA architectures, we evaluate two popular NLP datasets (COPA and SQuADv2) and the WinoBias dataset. As an additional contribution, we created a large English dataset (with almost 2 million labeled samples) for training models in bias classification and make it publicly available. We also make our code public.
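The two-step structure described above (a corpus-level score from a classifier, then a sentence-level score from sensitive term frequencies) could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the formula, so the scoring rules, the way the two scores are combined, the function names, and the toy lexicon are all hypothetical and are not claimed to match the published metric.

```python
from collections import Counter


def corpus_level(predictions):
    """Step 1 (assumed form): fraction of samples a classifier labels 'biased'."""
    if not predictions:
        return 0.0
    return sum(1 for p in predictions if p == "biased") / len(predictions)


def sentence_level(biased_texts, lexica):
    """Step 2 (assumed form): mean imbalance of sensitive-term counts.

    lexica maps an axis name (e.g. 'gender') to a dict of
    group name -> set of sensitive terms. For each biased text and axis,
    the imbalance is (max group count - min group count) / total count.
    Text/axis pairs with no sensitive terms are skipped.
    """
    scores = []
    for text in biased_texts:
        tokens = Counter(text.lower().split())
        for groups in lexica.values():
            counts = [sum(tokens[t] for t in terms) for terms in groups.values()]
            total = sum(counts)
            if total:
                scores.append((max(counts) - min(counts)) / total)
    return sum(scores) / len(scores) if scores else 0.0


def two_step_bias_score(predictions, biased_texts, lexica):
    """Combine both steps by multiplication (an assumption of this sketch)."""
    return corpus_level(predictions) * sentence_level(biased_texts, lexica)


# Hypothetical toy lexicon and classifier outputs, for illustration only.
toy_lexica = {"gender": {"male": {"he", "him"}, "female": {"she", "her"}}}
toy_predictions = ["biased", "unbiased", "biased", "unbiased"]
toy_biased_texts = ["he said he would call her", "she and he agreed"]
score = two_step_bias_score(toy_predictions, toy_biased_texts, toy_lexica)
```

Because the sentence-level step is driven by explicit term counts, the terms behind a high score can be listed directly, which is one way such a metric could offer explainability.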
The objective of this study is to explore the process of developing artificial intelligence and machine learning (ML) applications to establish an optimal support environment. The primary stages of ML include problem understanding, data management (DM), model building, model deployment, and maintenance. This paper specifically focuses on examining the DM stage of ML development and the challenges it presents, as it is crucial for achieving accurate end models. During this stage, the major obstacle encountered was the scarcity of adequate data for model training, particularly in domains where data confidentiality is a concern. The work aimed to construct and enhance a framework that would assist researchers and developers in addressing the insufficiency of data during the DM stage. The framework incorporates various data augmentation techniques, enabling the generation of new data from the original dataset along with all the required files for detection challenges. This augmentation process improves the overall performance of ML applications by increasing both the quantity and quality of available data, thereby providing the model with the best possible input.
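As a rough illustration of generating new detection data together with its accompanying files, the sketch below applies one common augmentation, a horizontal flip, to an image and to its bounding-box labels. The abstract does not name the techniques or file formats used by the framework, so the choice of flip, the YOLO-style normalised label format, and all function names here are assumptions.

```python
def hflip_image(image):
    """Mirror a row-major image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def hflip_boxes(boxes):
    """Mirror boxes given as (class_id, x_center, y_center, width, height).

    Coordinates are assumed to be normalised to [0, 1], so only the
    horizontal centre changes: x becomes 1 - x.
    """
    return [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in boxes]


def label_file_text(boxes):
    """Render boxes as one line per object, the text of an annotation file."""
    return "\n".join(
        f"{c} {x:.6f} {y:.6f} {w:.6f} {h:.6f}" for (c, x, y, w, h) in boxes
    )


def augment_sample(image, boxes):
    """Return the flipped image plus the matching annotation text."""
    flipped_boxes = hflip_boxes(boxes)
    return hflip_image(image), flipped_boxes, label_file_text(flipped_boxes)
```

The point of the sketch is that the labels must be transformed in step with the pixels; an augmented image with stale annotations would degrade rather than improve the training data.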
Digital water meter digit recognition from images of water meter readings is a challenging research problem. One key reason is the lack of publicly available datasets for developing such methods. Another is that the digits suffer from poor quality. In this work, we develop a dataset, called MR-AMR-v1, which comprises 10 different digits (0–9) that are commonly found in electrical and electronic water meter readings. Additionally, we generate a synthetic benchmarking dataset to make the proposed model robust. We propose a weighted probability averaging ensemble-based water meter digit recognition method applied to snapshots of the Fourier transformed convolution block attention module-aided combined ResNet50-InceptionV3 architecture. This benchmarking method achieves an accuracy of 88% on test set images (benchmarking data). Our model also achieves a high accuracy of 97.73% on the MNIST dataset. We benchmark the result on this dataset using the proposed method after performing an exhaustive set of experiments.
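The weighted probability averaging step mentioned above can be illustrated in a few lines. This is a minimal sketch, assuming each model snapshot outputs a probability vector over the classes for one image; how the weights are chosen is not stated in the abstract, so the weights and function names below are hypothetical.

```python
def weighted_average_probabilities(prob_sets, weights):
    """Combine per-snapshot class probabilities into one weighted average.

    prob_sets: one probability vector per snapshot, all of equal length.
    weights:   one non-negative weight per snapshot (assumed given).
    """
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per snapshot")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(prob_sets[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, prob_sets)) / total
        for k in range(n_classes)
    ]


def predict_class(prob_sets, weights):
    """Return the index of the class with the highest averaged probability."""
    averaged = weighted_average_probabilities(prob_sets, weights)
    return max(range(len(averaged)), key=averaged.__getitem__)
```

Dividing by the sum of the weights keeps the result a valid probability vector even when the weights are not normalised beforehand.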
Graph Neural Networks (GNNs) have garnered substantial interest across different fields, including the automotive sector, owing to their ability to model data characterized by intricate connections and arrangements. Within the automotive realm, GNNs can be applied in diverse ways to improve effectiveness, safety, and overall operation. This study assesses several GNN models and their potential performance within the automotive sector, using widely recognized datasets. The objective of the study was to raise awareness among researchers and developers working on vehicle intelligence systems (VIS) about the potential benefits of GNNs. These models could offer solutions to various challenges in this field, including comprehending complex scenes, managing diverse data from multiple sources, adapting to dynamic situations, and more. The research explores three distinct GNN models named ViG, Point-GNN, and Few-shot GNN. These models were evaluated using datasets such as KITTI, mini-ImageNet, and ILSVRC.
Intelligent surveillance systems are inherently computationally intensive. With their ever-expanding use, from small-scale home security applications to national-scale deployments, efficient computer vision processing is critical. To this end, we propose a framework that utilizes modern hardware by incorporating multi-threading and concurrency to facilitate the complex processes associated with object detection, tracking, and identification, enabling lower-powered systems to support such intelligent surveillance systems effectively. The proposed architecture provides an adaptable and robust processing pipeline, leveraging the thread pool design pattern. The developed method can achieve respectable throughput rates on low-powered or constrained compute platforms.
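The thread pool design pattern referred to above can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration and not the proposed framework: the `detect` and `identify` functions are hypothetical stand-ins for real vision stages, and the frame identifiers are placeholders for actual video frames.

```python
from concurrent.futures import ThreadPoolExecutor


def detect(frame_id):
    """Hypothetical stand-in for an object detector on one frame."""
    return [{"frame": frame_id, "label": "person", "box": (0, 0, 10, 10)}]


def identify(detection):
    """Hypothetical stand-in for identification: attaches an identity tag."""
    return {**detection, "identity": f"id-{detection['frame']}"}


def process_frame(frame_id):
    """Run the per-frame stages in sequence for a single frame."""
    return [identify(d) for d in detect(frame_id)]


def run_pipeline(frame_ids, max_workers=4):
    """Process frames on a fixed pool of worker threads.

    The pool is created once and reused for every frame, so the cost of
    starting threads is not paid per frame. Executor.map returns results
    in the same order as the input frames.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_frame, frame_ids))
```

Bounding the number of workers is the main design choice here: a fixed pool keeps memory and scheduling overhead predictable, which matters most on constrained hardware.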
For an intelligent transportation system, identifying license plate numbers in drone photos is difficult, yet it is needed in practical applications such as parking management, traffic management, and automatically organizing parking spots. The primary goal of this work is to demonstrate how to extract robust and invariant features from PCM that can withstand the difficulties posed by drone images. The work then uses a fully connected neural network to tackle the difficulties of fixing precise bounding boxes regardless of orientations, shapes, and text sizes. The proposed work is able to detect text in both license plate and natural scene images, which leads to a better recognition stage. Both our drone dataset (Mimos) and the benchmark license plate dataset (Medialab) are used to assess the effectiveness of the study. To show that the suggested system can detect text in natural scenes in a wide variety of situations, four benchmark datasets, namely SVT, MSRA-TD-500, ICDAR 2017 MLT, and Total Text, are used for the experimental results. We also describe trials that demonstrate robustness to varying height distances and angles. This work's code and data will be made publicly available on GitHub.
Targeting the current COVID-19 pandemic situation, this paper identifies the need for crowd management. It proposes an effective and efficient real-time human detection and counting solution specifically for shopping malls, delivered as a system with a graphical user interface and management functionalities. It also comprehensively reviews and compares the existing techniques and similar systems to select the ideal solution for this scenario. Specifically, it adopts deep learning computer vision techniques: YOLOv3 detects and classifies the human objects, the DeepSORT tracking algorithm tracks each detected human object, and counting is performed using intrusion line judgment. Additionally, it converts the pretrained YOLOv3 into TensorFlow format for better and faster real-time computation on a graphics processing unit instead of a central processing unit as the traditional target machine. The experimental results show this implementation combination to be 91.07% accurate and real-time capable on test videos from the internet that simulate the shopping mall entrance scenario.
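Counting by intrusion line judgment, as mentioned above, generally amounts to checking whether a tracked object moves from one side of a virtual line to the other. The sketch below is one possible way to do this, assuming each track is a list of centroid positions in frame order; the function names, the sign convention for "entered" versus "exited", and the treatment of the line as infinite are all assumptions of this illustration, not details from the paper.

```python
def side_of_line(point, line_start, line_end):
    """Return +1 or -1 for the side of the line a point lies on, 0 if on it.

    Uses the sign of the 2D cross product of (line_end - line_start)
    and (point - line_start).
    """
    (px, py), (ax, ay), (bx, by) = point, line_start, line_end
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return (cross > 0) - (cross < 0)


def count_crossings(tracks, line_start, line_end):
    """Count line crossings over all tracks.

    tracks maps a track id to its centroid positions in frame order.
    A change from the negative to the positive side counts as 'entered',
    the reverse as 'exited' (an arbitrary convention for this sketch).
    Positions exactly on the line are ignored.
    """
    entered = exited = 0
    for positions in tracks.values():
        sides = [side_of_line(p, line_start, line_end) for p in positions]
        sides = [s for s in sides if s != 0]
        for previous, current in zip(sides, sides[1:]):
            if previous < 0 < current:
                entered += 1
            elif current < 0 < previous:
                exited += 1
    return entered, exited
```

Because counting depends on a stable identity per person across frames, it relies on the tracker rather than the detector alone; a detector without tracking could not tell one person crossing from two.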
This paper addresses the critical need for advanced real-time vehicle detection methodologies in Vehicle Intelligence Systems (VIS), especially in the context of using Unmanned Aerial Vehicles (UAVs) for data acquisition in severe weather conditions, such as heavy snowfall typical of the Nordic region. Traditional vehicle detection techniques, which often rely on custom-engineered features and deterministic algorithms, fall short in adapting to diverse environmental challenges, leading to a demand for more precise and sophisticated methods. The limitations of current architectures, particularly when deployed in real-time on edge devices with restricted computational capabilities, are highlighted as significant hurdles in the development of efficient vehicle detection systems. To bridge this gap, our research focuses on the formulation of an innovative approach that combines the fractional B-spline wavelet transform with a tailored U-Net architecture, operational on a Raspberry Pi 4. This method aims to enhance vehicle detection and localization by leveraging the unique attributes of the NVD dataset, which comprises drone-captured imagery under the harsh winter conditions of northern Sweden. The dataset, featuring 8450 annotated frames with 26,313 vehicles, serves as the foundation for evaluating the proposed technique. The comparative analysis of the proposed method against state-of-the-art detectors, such as YOLO and Faster RCNN, in both accuracy and efficiency on constrained devices, emphasizes the capability of our method to balance the trade-off between speed and accuracy, thereby broadening its utility across various domains.
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AFRICOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
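The Spearman-rank correlation used above to compare a metric with human judgments can be computed as the Pearson correlation of the two rank vectors. The sketch below is a small self-contained illustration; the function names are hypothetical, tied values receive their average rank, and any scores passed in are the caller's own data rather than results from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman rank correlation between metric and human scores."""
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

Since only ranks enter the computation, the result is unchanged by any monotonic rescaling of either score, which suits comparisons between a learned metric and DA ratings on different scales.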