Digitala Vetenskapliga Arkivet

A Higher Order Mining Approach for the Analysis of Real-World Datasets
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0002-3010-8798
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0003-3128-191x
NODA Intelligent Systems AB, SWE.
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0001-9947-1088
2020 (English). In: Energies, E-ISSN 1996-1073, Vol. 13, no 21, article id 5781. Article in journal (Refereed), Published
Abstract [en]

In this study, we propose a higher order mining approach that can be used for the analysis of real-world datasets. The approach can be used to monitor and identify the deviating operational behaviour of the studied phenomenon in the absence of prior knowledge about the data. The proposed approach consists of several different data analysis techniques, such as sequential pattern mining, clustering analysis, consensus clustering and the minimum spanning tree (MST). Initially, a clustering analysis is performed on the extracted patterns to model the behavioural modes of the studied phenomenon for a given time interval. The generated clustering models, which correspond to every two consecutive time intervals, can further be assessed to determine changes in the monitored behaviour. In cases in which significant differences are observed, further analysis is performed by integrating the generated models into a consensus clustering and applying an MST to identify deviating behaviours. The validity and potential of the proposed approach are demonstrated on a real-world dataset originating from a network of district heating (DH) substations. The obtained results show that our approach is capable of detecting deviating and sub-optimal behaviours of DH substations.
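
The step of assessing the clustering models of two consecutive time intervals can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the comparison measure, so the Rand index, the 0.8 threshold, the function names, and the toy label lists are all assumptions made here for illustration. The sketch also assumes that both clusterings label the same set of items.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two
    items together, or both put them apart.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


def behaviour_changed(labels_prev, labels_curr, threshold=0.8):
    """Flag a change when agreement drops below an assumed threshold."""
    return rand_index(labels_prev, labels_curr) < threshold


# Hypothetical cluster labels for the same six items in three intervals.
week_1 = [0, 0, 1, 1, 2, 2]
week_2 = [0, 0, 1, 1, 2, 2]
week_3 = [0, 1, 0, 1, 0, 1]

print(behaviour_changed(week_1, week_2))  # identical clusterings
print(behaviour_changed(week_2, week_3))  # strongly different clusterings
```

Only interval pairs flagged in this way would be passed on to the more expensive consensus clustering and MST analysis described in the abstract.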

Place, publisher, year, edition, pages
MDPI, 2020. Vol. 13, no 21, article id 5781
Keywords [en]
outlier detection, fault detection, higher order mining, clustering analysis, minimum spanning tree, data mining, district heating substations
National Category
Energy Systems
Identifiers
URN: urn:nbn:se:bth-20453
DOI: 10.3390/en13215781
ISI: 000588863900001
OAI: oai:DiVA.org:bth-20453
DiVA, id: diva2:1469612
Part of project
Bigdata@BTH - Scalable resource-efficient systems for big data analytics, Knowledge Foundation
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2020-09-22 Created: 2020-09-22 Last updated: 2023-08-28 Bibliographically approved
In thesis
1. Data Mining Approaches for Outlier Detection Analysis
2020 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Outlier detection is studied and applied in many domains. Outliers arise for different reasons, such as fraudulent activities, structural defects, health problems, and mechanical issues. The detection of outliers is a challenging task that can reveal system faults and fraud and can save people's lives. Outlier detection techniques are often domain-specific. The main challenge in outlier detection relates to modelling the normal behaviour in order to identify abnormalities. The choice of model is important, i.e., an unsuitable data model can lead to poor results. This requires a good understanding and interpretation of the data, the constraints, and the requirements of the domain problem. Outlier detection is largely an unsupervised problem because labeled data is often unavailable and expensive to obtain.

In this thesis, we study and apply a combination of both machine learning and data mining techniques to build data-driven and domain-oriented outlier detection models. We focus on three real-world application domains: maritime surveillance, district heating, and online media and sequence datasets. We show the importance of data preprocessing as well as feature selection in building suitable methods for data modelling. We take advantage of both supervised and unsupervised techniques to create hybrid methods. 

More specifically, we propose a rule-based anomaly detection system using open data for the maritime surveillance domain. We exploit sequential pattern mining for identifying contextual and collective outliers in online media data. We propose a minimum spanning tree clustering technique for detection of groups of outliers in online media and sequence data. We develop a few higher order mining approaches for identifying manual changes and deviating behaviours in the heating systems at the building level. The proposed approaches are shown to be capable of explaining the underlying properties of the detected outliers. This can help domain experts narrow down the scope of analysis and understand the reasons for such anomalous behaviours. We also investigate the reproducibility of the proposed models in similar application domains.
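
The general idea behind minimum spanning tree clustering for finding groups of outliers can be sketched as follows. This is a generic illustration, not the technique proposed in the thesis: the cut rule (removing edges longer than `cut_factor` times the mean MST edge length), the `max_size` limit on what counts as an outlier group, the function names, and the toy 2-D points are all assumptions introduced here.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, distance) edges over point indices.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def small_components(points, cut_factor=2.0, max_size=2):
    """Cut long MST edges and return the small leftover groups.

    Assumed rule: an edge is cut when it is longer than cut_factor
    times the mean MST edge length; groups with at most max_size
    members are reported as candidate outlier groups.
    """
    edges = minimum_spanning_tree(points)
    if not edges:
        return []
    mean_len = sum(d for _, _, d in edges) / len(edges)

    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, d in edges:
        if d <= cut_factor * mean_len:
            parent[find(i)] = find(j)

    groups = {}
    for idx in range(len(points)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(g for g in groups.values() if len(g) <= max_size)


# Hypothetical data: a dense square plus a distant pair of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
print(small_components(points))
```

In this toy data the single long edge joining the square to the distant pair is cut, so the pair is reported as one small group rather than as two unrelated points, which is the property that makes MST cuts useful for collective outliers.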

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2020. p. 251
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 9
Keywords
outlier detection, data modelling, machine learning, clustering analysis, data stream mining
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:bth-20454 (URN)
9789172954090 (ISBN)
Public defence
2020-12-01, J1630, Karlskrona, 13:00 (English)
Funder
Knowledge Foundation, 20140032
Available from: 2020-10-16 Created: 2020-10-12 Last updated: 2020-12-14 Bibliographically approved

Open Access in DiVA

fulltext (531 kB), 262 downloads
File information
File name: FULLTEXT01.pdf
File size: 531 kB
Checksum (SHA-512):
81916f5d239ce8bcbc50d0ca37391efa8a68107568342cdf2da9570672fb1f457104bb93940dc9bcd8bb1f2705a519cee651a1b56913077d5926edc3517455f5
Type: fulltext
Mimetype: application/pdf

By author/editor
Abghari, Shahrooz; Boeva, Veselka; Grahn, Håkan
Total: 262 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 498 hits