Digitala Vetenskapliga Arkivet

  • 1.
    Abghari, S.
    et al.
    Department of Computer Science, Blekinge Institute of Technology, Sweden.
    Boeva, V.
    Department of Computer Science, Blekinge Institute of Technology, Sweden.
    Brage, J.
    Noda Intelligent Systems Ab, Sweden.
    Johansson, C.
    Noda Intelligent Systems Ab, Sweden.
    Grahn, H.
    Department of Computer Science, Blekinge Institute of Technology, Sweden.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Department of Computing, Jönköping AI Lab (JAIL).
    Higher order mining for monitoring district heating substations. 2019. In: Proceedings - 2019 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 382-391, article id 8964173. Conference paper (Refereed)
    Abstract [en]

    We propose a higher order mining (HOM) approach for modelling, monitoring and analyzing district heating (DH) substations' operational behaviour and performance. HOM is concerned with mining over patterns rather than primary or raw data. The proposed approach uses a combination of different data analysis techniques such as sequential pattern mining, clustering analysis, consensus clustering and minimum spanning tree (MST). Initially, a substation's operational behaviour is modeled by extracting weekly patterns and performing clustering analysis. The substation's performance is monitored by assessing its modeled behaviour for every two consecutive weeks. In case some significant difference is observed, further analysis is performed by integrating the built models into a consensus clustering and applying an MST for identifying deviating behaviours. The results of the study show that our method is robust for detecting deviating and sub-optimal behaviours of DH substations. In addition, the proposed method can facilitate domain experts in the interpretation and understanding of the substations' behaviour and performance by providing different data analysis and visualization techniques. 

  • 2.
    Abghari, Shahrooz
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Boeva, Veselka
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Grahn, Håkan
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Gustafsson, Jörgen
    Ericsson AB.
    Shaikh, Junaid
    Ericsson AB.
    Outlier Detection for Video Session Data Using Sequential Pattern Mining. 2018. In: ACM SIGKDD Workshop On Outlier Detection De-constructed, 2018. Conference paper (Refereed)
    Abstract [en]

    The growth of Internet video and over-the-top transmission techniques has enabled online video service providers to deliver high quality video content to viewers. To maintain and improve the quality of experience, video providers need to detect unexpected issues that can highly affect the viewers’ experience. This requires analyzing massive amounts of video session data in order to find unexpected sequences of events. In this paper we combine sequential pattern mining and clustering to discover such event sequences. The proposed approach applies sequential pattern mining to find frequent patterns by considering contextual and collective outliers. In order to distinguish between the normal and abnormal behavior of the system, we initially identify the most frequent patterns. Then a clustering algorithm is applied on the most frequent patterns. The generated clustering model together with Silhouette Index are used for further analysis of less frequent patterns and detection of potential outliers. Our results show that the proposed approach can detect outliers at the system level.

  • 3.
    Abghari, Shahrooz
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Boeva, Veselka
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Grahn, Håkan
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Ickin, Selim
    Ericsson, SWE.
    Gustafsson, Jörgen
    Ericsson, SWE.
    A Minimum Spanning Tree Clustering Approach for Outlier Detection in Event Sequences. 2018. In: The 17th IEEE International Conference on Machine Learning and Applications Special Session on Machine Learning Algorithms, Systems and Applications, IEEE, 2018. Conference paper (Refereed)
    Abstract [en]

    Outlier detection has been studied in many domains. Outliers arise due to different reasons such as mechanical issues, fraudulent behavior, and human error. In this paper, we propose an unsupervised approach for outlier detection in a sequence dataset. The proposed approach combines sequential pattern mining, cluster analysis, and a minimum spanning tree algorithm in order to identify clusters of outliers. Initially, the sequential pattern mining is used to extract frequent sequential patterns. Next, the extracted patterns are clustered into groups of similar patterns. Finally, the minimum spanning tree algorithm is used to find groups of outliers. The proposed approach has been evaluated on two different real datasets, i.e., smart meter data and video session data. The obtained results have shown that our approach can be applied to narrow down the space of events to a set of potential outliers and facilitate domain experts in further analysis and identification of system level issues.

  • 4.
    Abghari, Shahrooz
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    García Martín, Eva
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Johansson, Christian
    NODA Intelligent Systems AB, Sweden.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Grahn, Håkan
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Trend analysis to automatically identify heat program changes. 2017. In: Energy Procedia, Elsevier, 2017, p. 407-415. Conference paper (Refereed)
    Abstract [en]

    The aim of this study is to improve the monitoring and controlling of heating systems located at customer buildings through the use of a decision support system. To achieve this, the proposed system applies a two-step classifier to detect manual changes of the temperature of the heating system. We apply data from the Swedish company NODA, active in energy optimization and services for energy efficiency, to train and test the suggested system. The decision support system is evaluated through an experiment and the results are validated by experts at NODA. The results show that the decision support system can detect changes within three days after their occurrence and only by considering daily average measurements.

  • 5.
    Abghari, Shahrooz
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    García Martín, Eva
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Johansson, Christian
    NODA Intelligent Systems AB, SWE.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Trend analysis to automatically identify heat program changes. 2017. In: Energy Procedia, Elsevier, 2017, Vol. 116, p. 407-415. Conference paper (Refereed)
    Abstract [en]

    The aim of this study is to improve the monitoring and controlling of heating systems located at customer buildings through the use of a decision support system. To achieve this, the proposed system applies a two-step classifier to detect manual changes of the temperature of the heating system. We apply data from the Swedish company NODA, active in energy optimization and services for energy efficiency, to train and test the suggested system. The decision support system is evaluated through an experiment and the results are validated by experts at NODA. The results show that the decision support system can detect changes within three days after their occurrence and only by considering daily average measurements.

  • 6. Allahyari, Hiva
    et al.
    Lavesson, Niklas
    User-oriented Assessment of Classification Model Understandability. 2011. Conference paper (Refereed)
    Abstract [en]

    This paper reviews methods for evaluating and analyzing the understandability of classification models in the context of data mining. The motivation for this study is the fact that the majority of previous work has focused on increasing the accuracy of models, ignoring user-oriented properties such as comprehensibility and understandability. Approaches for analyzing the understandability of data mining models have been discussed on two different levels: one is regarding the type of the models’ presentation and the other is considering the structure of the models. In this study, we present a summary of existing assumptions regarding both approaches followed by an empirical work to examine the understandability from the user’s point of view through a survey. The results indicate that decision tree models are more understandable than rule-based models. Using the survey results regarding understandability of a number of models in conjunction with quantitative measurements of the complexity of the models, we are able to establish correlation between complexity and understandability of the models.

  • 7.
    Allahyari, Hiva
    et al.
    Blekinge Institute of Technology, Karlskrona, Sweden.
    Lavesson, Niklas
    Blekinge Institute of Technology, Karlskrona, Sweden.
    User-oriented Assessment of Classification Model Understandability. 2011. Conference paper (Refereed)
    Abstract [en]

    This paper reviews methods for evaluating and analyzing the understandability of classification models in the context of data mining. The motivation for this study is the fact that the majority of previous work has focused on increasing the accuracy of models, ignoring user-oriented properties such as comprehensibility and understandability. Approaches for analyzing the understandability of data mining models have been discussed on two different levels: one is regarding the type of the models’ presentation and the other is considering the structure of the models. In this study, we present a summary of existing assumptions regarding both approaches followed by an empirical work to examine the understandability from the user’s point of view through a survey. The results indicate that decision tree models are more understandable than rule-based models. Using the survey results regarding understandability of a number of models in conjunction with quantitative measurements of the complexity of the models, we are able to establish correlation between complexity and understandability of the models.

  • 8.
    Angelova, Milena
    et al.
    Technical University of Sofia, BUL.
    Vishnu Manasa, Devagiri
    Blekinge Tekniska Högskola, Institutionen för datavetenskap.
    Boeva, Veselka
    Blekinge Tekniska Högskola, Institutionen för datavetenskap.
    Linde, Peter
    Blekinge Tekniska Högskola, Biblioteket.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Department of Computing, Jönköping AI Lab (JAIL). Blekinge Tekniska Högskola, Institutionen för datavetenskap.
    An Expertise Recommender System based on Data from an Institutional Repository (DiVA). 2019. In: Connecting the Knowledge Common from Projects to sustainable Infrastructure: The 22nd International conference on Electronic Publishing - Revised Selected Papers / [ed] Leslie Chan & Pierre Mounier, OpenEdition Press, 2019, p. 135-149. Chapter in book (Refereed)
    Abstract [en]

    Finding experts in academics is an important practical problem, e.g. recruiting reviewers for reviewing conference, journal or project submissions, partner matching for research proposals, finding relevant M. Sc. or Ph. D. supervisors etc. In this work, we discuss an expertise recommender system that is built on data extracted from the Blekinge Institute of Technology (BTH) instance of the institutional repository system DiVA (Digital Scientific Archive). DiVA is a publication and archiving platform for research publications and student essays used by 46 publicly funded universities and authorities in Sweden and the rest of the Nordic countries (www.diva-portal.org). The DiVA classification system is based on the Swedish Higher Education Authority (UKÄ) and the Statistic Sweden's (SCB) three levels classification system. Using the classification terms associated with student M. Sc. and B. Sc. theses published in the DiVA platform, we have developed a prototype system which can be used to identify and recommend subject thesis supervisors in academy.

  • 9.
    Angelova, Milena
    et al.
    Technical University of Sofia-branch Plovdiv, BUL.
    Vishnu Manasa, Devagiri
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Linde, Peter
    Blekinge Institute of Technology, The Library.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    An expertise recommender system based on data from an institutional repository (DiVA). 2018. In: Proceedings of the 22nd edition of the International Conference on ELectronic PUBlishing: From Projects to Sustainable Infrastructure, ELPUB 2018 / [ed] Chan L., Mounier P., OpenEdition Press, 2018. Conference paper (Refereed)
    Abstract [en]

    Finding experts in academics is an important practical problem, e.g. recruiting reviewers for reviewing conference, journal or project submissions, partner matching for research proposals, finding relevant M. Sc. or Ph. D. supervisors etc. In this work, we discuss an expertise recommender system that is built on data extracted from the Blekinge Institute of Technology (BTH) instance of the institutional repository system DiVA (Digital Scientific Archive). DiVA is a publication and archiving platform for research publications and student essays used by 46 publicly funded universities and authorities in Sweden and the rest of the Nordic countries (www.diva-portal.org). The DiVA classification system is based on the Swedish Higher Education Authority (UKÄ) and the Statistic Sweden's (SCB) three levels classification system. Using the classification terms associated with student M. Sc. and B. Sc. theses published in the DiVA platform, we have developed a prototype system which can be used to identify and recommend subject thesis supervisors in academy.

  • 10.
    Angelova, Milena
    et al.
    Technical University of Sofia, BUL.
    Vishnu Manasa, Devagiri
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Linde, Peter
    Blekinge Institute of Technology, The Library.
    Lavesson, Niklas
    An Expertise Recommender System based on Data from an Institutional Repository (DiVA). 2019. In: Connecting the Knowledge Common from Projects to sustainable Infrastructure: The 22nd International conference on Electronic Publishing - Revised Selected Papers / [ed] Leslie Chan, Pierre Mounier, OpenEdition Press, 2019, p. 135-149. Chapter in book (Refereed)
    Abstract [en]

    Finding experts in academics is an important practical problem, e.g. recruiting reviewers for reviewing conference, journal or project submissions, partner matching for research proposals, finding relevant M. Sc. or Ph. D. supervisors etc. In this work, we discuss an expertise recommender system that is built on data extracted from the Blekinge Institute of Technology (BTH) instance of the institutional repository system DiVA (Digital Scientific Archive). DiVA is a publication and archiving platform for research publications and student essays used by 46 publicly funded universities and authorities in Sweden and the rest of the Nordic countries (www.diva-portal.org). The DiVA classification system is based on the Swedish Higher Education Authority (UKÄ) and the Statistic Sweden's (SCB) three levels classification system. Using the classification terms associated with student M. Sc. and B. Sc. theses published in the DiVA platform, we have developed a prototype system which can be used to identify and recommend subject thesis supervisors in academy.

  • 11.
    Angelova, Milena
    et al.
    Technical University of Sofia-branch Plovdiv, BUL.
    Vishnu Manasa, Devagiri
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Boeva, Veselka
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Linde, Peter
    Blekinge Tekniska Högskola, Biblioteket.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    An Expertise Recommender System Based on Data from an Institutional Repository (DiVA). 2018. In: Proceedings of the 22nd edition of the International Conference on ELectronic PUBlishing, 2018. Conference paper (Refereed)
    Abstract [en]

    Finding experts in academics is an important practical problem, e.g. recruiting reviewers for reviewing conference, journal or project submissions, partner matching for research proposals, finding relevant M. Sc. or Ph. D. supervisors etc. In this work, we discuss an expertise recommender system that is built on data extracted from the Blekinge Institute of Technology (BTH) instance of the institutional repository system DiVA (Digital Scientific Archive). DiVA is a publication and archiving platform for research publications and student essays used by 46 publicly funded universities and authorities in Sweden and the rest of the Nordic countries (www.diva-portal.org). The DiVA classification system is based on the Swedish Higher Education Authority (UKÄ) and the Statistic Sweden's (SCB) three levels classification system. Using the classification terms associated with student M. Sc. and B. Sc. theses published in the DiVA platform, we have developed a prototype system which can be used to identify and recommend subject thesis supervisors in academy.

  • 12. Annavarjula, Vaishnavi
    et al.
    Mbiydzenyu, Gideon
    Riveiro, Maria
    Jönköping University, School of Engineering, JTH, Department of Computing, Jönköping AI Lab (JAIL).
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Department of Computing, Jönköping AI Lab (JAIL).
    Implicit user data in fashion recommendation systems. 2020. In: Developments of artificial intelligence technologies in computation and robotics: Proceedings of the 14th International FLINS Conference (FLINS 2020) / [ed] Zhong Li, Chunrong Yuan, Jie Lu & Etienne E. Kerre, World Scientific, 2020, p. 614-621. Conference paper (Refereed)
    Abstract [en]

    Recommendation systems in fashion are used to provide recommendations to users on clothing items, matching styles, and size or fit. These recommendations are generated based on user actions such as ratings, reviews or general interaction with a seller. There is an increased adoption of implicit feedback in models aimed at providing recommendations in fashion. This paper aims to understand the nature of implicit user feedback in fashion recommendation systems by following guidelines to group user actions. Categories of user actions that characterize implicit feedback are examination, retention, reference, and annotation. Each category describes a specific set of actions a user takes. It is observed that fashion recommendations using implicit user feedback mostly rely on retention as a user action to provide recommendations.

  • 13. Baptista, Ana Alice
    et al.
    Linde, Peter
    Blekinge Institute of Technology, The Library.
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing.
    Brito, Miguel Abrunhosa de
    Social Shaping of Digital Publishing: Exploring the Interplay Between Culture and Technology - Proceedings of the 16th International Conference on Electronic Publishing. 2012. Collection (editor) (Other academic)
    Abstract [en]

    Since the advent of the Web, the processes and forms of electronic publishing have been changing. The open access movement has been a major driver of change in recent years with regard to scholarly communication; however, changes are also evident in other fields of application such as e-government and e-learning. In most cases these changes are driven by technological advances, but there are also cases where a change in social reality pushes technological development. Both the social and mobile web and linked data are currently shaping the edge of research in digital publishing. Liquid publishing is on the more daring agendas. Digital preservation is an issue that poses great challenges which are still far from being solved. The legal issues, security and trust continue to deserve our full attention. We need new visualization techniques and innovative interfaces that will keep pace with the global dimension of information. This is the current scenario, but what will follow? What are the technologies and social and communication paradigms that we will be discussing in ten or twenty years? ELPUB 2012 focuses on the social shaping of digital publishing, exploring the interplay between culture and technology. This makes the fact that it is being held in the European Capital of Culture for 2012, Guimarães, Portugal, all the more appropriate. 52 submissions were received for ELPUB 2012, from which 23 articles and 10 posters were accepted after peer review. Of the accepted articles, 11 were submitted as full articles and 12 as extended abstracts. These articles have been grouped into sessions on the following topics: Sessions 1 and 4 – Digital Scholarship & Publishing; Session 2 – Special Archives; Session 3 – Libraries & Repositories, Session 5 – Digital Texts & Readings, and Session 6 – Future Solutions & Innovations. The programme features two keynote speeches. 
    Kathleen Fitzpatrick's speech is entitled “Planned Obsolescence: Publishing, Technology, and the Future of the Academy”, that of Antonio Câmara is entitled “Publishing in 2021”. Finally we call your attention to the panel on e-books, which is entitled “Academic e-books – Technological hostage or cultural redeemer?”. We believe this is another great edition of the ELPUB conference. We would like to take this opportunity to thank both the members of the ELPUB executive committee and the members of the local advisory committee, for making it happen. Together they provided valuable advice and assistance during the entire organization process. Secondly we would like to mention our colleagues on the program committee, who assured the quality of the conference through the peer review process. Last but not least, we wish to thank the local organization team for ensuring that all this effort culminates in a very interesting scientific event on the 14th and 15th of June. Thank you all for helping us to maintain the quality of ELPUB and merit the trust of our authors and attendees. We wish you all a good conference and we say farewell, hoping to see you again in Sweden in 2013!

  • 14. Baptista, Ana Alice
    et al.
    Linde, Peter
    Blekinge Tekniska Högskola, Biblioteket.
    Lavesson, Niklas
    Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
    Brito, Miguel Abrunhosa de
    Social Shaping of Digital Publishing: Exploring the Interplay Between Culture and Technology - Proceedings of the 16th International Conference on Electronic Publishing. 2012. Collection (editor) (Other academic)
    Abstract [en]

    Since the advent of the Web, the processes and forms of electronic publishing have been changing. The open access movement has been a major driver of change in recent years with regard to scholarly communication; however, changes are also evident in other fields of application such as e-government and e-learning. In most cases these changes are driven by technological advances, but there are also cases where a change in social reality pushes technological development. Both the social and mobile web and linked data are currently shaping the edge of research in digital publishing. Liquid publishing is on the more daring agendas. Digital preservation is an issue that poses great challenges which are still far from being solved. The legal issues, security and trust continue to deserve our full attention. We need new visualization techniques and innovative interfaces that will keep pace with the global dimension of information. This is the current scenario, but what will follow? What are the technologies and social and communication paradigms that we will be discussing in ten or twenty years? ELPUB 2012 focuses on the social shaping of digital publishing, exploring the interplay between culture and technology. This makes the fact that it is being held in the European Capital of Culture for 2012, Guimarães, Portugal, all the more appropriate. 52 submissions were received for ELPUB 2012, from which 23 articles and 10 posters were accepted after peer review. Of the accepted articles, 11 were submitted as full articles and 12 as extended abstracts. These articles have been grouped into sessions on the following topics: Sessions 1 and 4 – Digital Scholarship & Publishing; Session 2 – Special Archives; Session 3 – Libraries & Repositories, Session 5 – Digital Texts & Readings, and Session 6 – Future Solutions & Innovations. The programme features two keynote speeches. 
    Kathleen Fitzpatrick's speech is entitled “Planned Obsolescence: Publishing, Technology, and the Future of the Academy”, that of Antonio Câmara is entitled “Publishing in 2021”. Finally we call your attention to the panel on e-books, which is entitled “Academic e-books – Technological hostage or cultural redeemer?”. We believe this is another great edition of the ELPUB conference. We would like to take this opportunity to thank both the members of the ELPUB executive committee and the members of the local advisory committee, for making it happen. Together they provided valuable advice and assistance during the entire organization process. Secondly we would like to mention our colleagues on the program committee, who assured the quality of the conference through the peer review process. Last but not least, we wish to thank the local organization team for ensuring that all this effort culminates in a very interesting scientific event on the 14th and 15th of June. Thank you all for helping us to maintain the quality of ELPUB and merit the trust of our authors and attendees. We wish you all a good conference and we say farewell, hoping to see you again in Sweden in 2013!

  • 15. Beyene, Ayne A.
    et al.
    Welemariam, Tewelle
    Persson, Marie
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Improved concept drift handling in surgery prediction and other applications. 2015. In: Knowledge and Information Systems, ISSN 0219-1377, Vol. 44, no 1, p. 177-196. Article in journal (Refereed)
    Abstract [en]

    The article presents a new algorithm for handling concept drift: the Trigger-based Ensemble (TBE) is designed to handle concept drift in surgery prediction but it is shown to perform well for other classification problems as well. At the primary care, queries about the need for surgical treatment are referred to a surgeon specialist. At the secondary care, referrals are reviewed by a team of specialists. The possible outcomes of this review are that the referral: (i) is canceled, (ii) needs to be complemented, or (iii) is predicted to lead to surgery. In the third case, the referred patient is scheduled for an appointment with a surgeon specialist. This article focuses on the binary prediction of case three (surgery prediction). The guidelines for the referral and the review of the referral are changed due to, e.g., scientific developments and clinical practices. Existing decision support is based on the expert systems approach, which usually requires manual updates when changes in clinical practice occur. In order to automatically revise decision rules, the occurrence of concept drift (CD) must be detected and handled. The existing CD handling techniques are often specialized; it is challenging to develop a more generic technique that performs well regardless of CD type. Experiments are conducted to measure the impact of CD on prediction performance and to reduce CD impact. The experiments evaluate and compare TBE to three existing CD handling methods (AWE, Active Classifier, and Learn++) on one real-world dataset and one artificial dataset. TBE significantly outperforms the other algorithms on both datasets but is less accurate on noisy synthetic variations of the real-world dataset.

  • 16. Beyene, Ayne A.
    et al.
    Welemariam, Tewelle
    Persson, Marie
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Improved concept drift handling in surgery prediction and other applications. 2015. In: Knowledge and Information Systems, ISSN 0219-1377, E-ISSN 0219-3116, Vol. 44, no 1, p. 177-196. Article in journal (Refereed)
    Abstract [en]

    The article presents a new algorithm for handling concept drift: the Trigger-based Ensemble (TBE) is designed to handle concept drift in surgery prediction but it is shown to perform well for other classification problems as well. At the primary care, queries about the need for surgical treatment are referred to a surgeon specialist. At the secondary care, referrals are reviewed by a team of specialists. The possible outcomes of this review are that the referral: (i) is canceled, (ii) needs to be complemented, or (iii) is predicted to lead to surgery. In the third case, the referred patient is scheduled for an appointment with a surgeon specialist. This article focuses on the binary prediction of case three (surgery prediction). The guidelines for the referral and the review of the referral are changed due to, e.g., scientific developments and clinical practices. Existing decision support is based on the expert systems approach, which usually requires manual updates when changes in clinical practice occur. In order to automatically revise decision rules, the occurrence of concept drift (CD) must be detected and handled. The existing CD handling techniques are often specialized; it is challenging to develop a more generic technique that performs well regardless of CD type. Experiments are conducted to measure the impact of CD on prediction performance and to reduce CD impact. The experiments evaluate and compare TBE to three existing CD handling methods (AWE, Active Classifier, and Learn++) on one real-world dataset and one artificial dataset. TBE significantly outperforms the other algorithms on both datasets but is less accurate on noisy synthetic variations of the real-world dataset.

  • 17. Bhattacharyya, Prantik
    et al.
    Rowe, Jeff
    Wu, Felix
    Haigh, Karen
    Lavesson, Niklas
    Johnson, Henric
    Your Best might not be Good enough: Ranking in Collaborative Social Search Engines. 2011. Conference paper (Refereed)
    Abstract [en]

    A relevant feature of online social networks like Facebook is the scope for users to share external information from the web with their friends by sharing a URL. The phenomenon of sharing has bridged the web graph with the social network graph and the shared knowledge in ego networks has become a source for relevant information for an individual user, leading to the emergence of social search as a powerful tool for information retrieval. Consideration of the social context has become an essential factor in the process of ranking results in response to queries in social search engines. In this work, we present InfoSearch, a social search engine built over the Facebook platform, which lets users search for information based on what their friends have shared. We identify and implement three distinct ranking factors based on the number of mutual friends, social group membership, and time stamp of shared documents to rank results for user searches. We perform user studies based on the Facebook feeds of two authors to understand the impact of each ranking factor on the result for two queries.

  • 18.
    Bhattacharyya, Prantik
    et al.
    Department of Computer Science, University of California, Davis, USA.
    Rowe, Jeff
    Department of Computer Science, University of California, Davis, USA.
    Wu, Felix
    Department of Computer Science, University of California, Davis, USA.
    Haigh, Karen
    Intelligent Distributed Computing Group, BBN Technologies, USA.
    Lavesson, Niklas
    School of Computing, Blekinge Institute of Technology, Sweden.
    Johnson, Henric
    School of Computing, Blekinge Institute of Technology, Sweden.
    Your Best might not be Good enough: Ranking in Collaborative Social Search Engines. 2011. Conference paper (Refereed)
    Abstract [en]

    A relevant feature of online social networks like Facebook is the scope for users to share external information from the web with their friends by sharing a URL. The phenomenon of sharing has bridged the web graph with the social network graph and the shared knowledge in ego networks has become a source for relevant information for an individual user, leading to the emergence of social search as a powerful tool for information retrieval. Consideration of the social context has become an essential factor in the process of ranking results in response to queries in social search engines. In this work, we present InfoSearch, a social search engine built over the Facebook platform, which lets users search for information based on what their friends have shared. We identify and implement three distinct ranking factors based on the number of mutual friends, social group membership, and time stamp of shared documents to rank results for user searches. We perform user studies based on the Facebook feeds of two authors to understand the impact of each ranking factor on the result for two queries.

  • 19.
    Boeva, Veselka
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Angelova, Milena
    Technical University Sofia, BUL.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Rosander, Oliver
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Tsiporkova, Elena
    Collective Center for the Belgian Technological Industry, BEL.
    Evolutionary clustering techniques for expertise mining scenarios. 2018. In: ICAART 2018 - Proceedings of the 10th International Conference on Agents and Artificial Intelligence, Volume 2 / [ed] van den Herik J., Rocha A.P., SciTePress, 2018, p. 523-530. Conference paper (Refereed)
    Abstract [en]

    The problem addressed in this article concerns the development of evolutionary clustering techniques that can be applied to adapt the existing clustering solution to a clustering of newly collected data elements. We are interested in clustering approaches that are specially suited for adapting clustering solutions in the expertise retrieval domain. This interest is inspired by practical applications such as expertise retrieval systems where the information available in the system database is periodically updated by extracting new data. The experts available in the system database are usually partitioned into a number of disjoint subject categories. It is becoming impractical to re-cluster this large volume of available information. Therefore, the objective is to update the existing expert partitioning by the clustering produced on the newly extracted experts. Three different evolutionary clustering techniques are considered to be suitable for this scenario. The proposed techniques are initially evaluated by applying the algorithms on data extracted from the PubMed repository. Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

  • 20.
    Boeva, Veselka
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Angelova, Milena
    Technical University Sofia, BUL.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Rosander, Oliver
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Tsiporkova, Elena
    Collective Center for the Belgian Technological Industry, BEL.
    Evolutionary clustering techniques for expertise mining scenarios. 2018. In: ICAART 2018 - Proceedings of the 10th International Conference on Agents and Artificial Intelligence, Volume 2 / [ed] van den Herik J., Rocha A.P., SciTePress, 2018, Vol. 2, p. 523-530. Conference paper (Refereed)
    Abstract [en]

    The problem addressed in this article concerns the development of evolutionary clustering techniques that can be applied to adapt the existing clustering solution to a clustering of newly collected data elements. We are interested in clustering approaches that are specially suited for adapting clustering solutions in the expertise retrieval domain. This interest is inspired by practical applications such as expertise retrieval systems where the information available in the system database is periodically updated by extracting new data. The experts available in the system database are usually partitioned into a number of disjoint subject categories. It is becoming impractical to re-cluster this large volume of available information. Therefore, the objective is to update the existing expert partitioning by the clustering produced on the newly extracted experts. Three different evolutionary clustering techniques are considered to be suitable for this scenario. The proposed techniques are initially evaluated by applying the algorithms on data extracted from the PubMed repository. Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

  • 21. Boeva, Veselka
    et al.
    Ivanova, Petia
    Lavesson, Niklas
    A Hybrid Computational Method for the Identification of Cell Cycle-regulated Genes. 2010. Conference paper (Refereed)
    Abstract [en]

    Gene expression microarrays are the most commonly available source of high-throughput biological data. They have been widely employed in recent years for the definition of cell cycle regulated (or periodically expressed) subsets of the genome in a number of different organisms. These have driven the development of various computational methods for identifying periodically expressed genes. However, the agreement is remarkably poor when different computational methods are applied to the same data. In view of this, we are motivated to propose herein a hybrid computational method targeting the identification of periodically expressed genes, which is based on a hybrid aggregation of estimations, generated by different computational methods. The proposed hybrid method is benchmarked against three other computational methods for the identification of periodically expressed genes: statistical tests for regulation and periodicity and a combined test for regulation and periodicity. The hybrid method is shown, together with the combined test, to statistically significantly outperform the statistical test for periodicity. However, the hybrid method is also demonstrated to be significantly better than the combined test for regulation and periodicity.

  • 22.
    Boeva, Veselka
    et al.
    Computer Systems and Technologies Department, Technical University of Sofia, branch Plovdiv, Plovdiv, Bulgaria.
    Ivanova, Petia
    Computer Systems and Technologies Department, Technical University of Sofia, branch Plovdiv, Plovdiv, Bulgaria.
    Lavesson, Niklas
    School of Computing, Blekinge Institute of Technology, Ronneby, Sweden.
    A Hybrid Computational Method for the Identification of Cell Cycle-regulated Genes. 2010. Conference paper (Refereed)
    Abstract [en]

    Gene expression microarrays are the most commonly available source of high-throughput biological data. They have been widely employed in recent years for the definition of cell cycle regulated (or periodically expressed) subsets of the genome in a number of different organisms. These have driven the development of various computational methods for identifying periodically expressed genes. However, the agreement is remarkably poor when different computational methods are applied to the same data. In view of this, we are motivated to propose herein a hybrid computational method targeting the identification of periodically expressed genes, which is based on a hybrid aggregation of estimations, generated by different computational methods. The proposed hybrid method is benchmarked against three other computational methods for the identification of periodically expressed genes: statistical tests for regulation and periodicity and a combined test for regulation and periodicity. The hybrid method is shown, together with the combined test, to statistically significantly outperform the statistical test for periodicity. However, the hybrid method is also demonstrated to be significantly better than the combined test for regulation and periodicity.

  • 23. Boldt, Martin
    et al.
    Jacobsson, Andreas
    Lavesson, Niklas
    Davidsson, Paul
    Automated Spyware Detection Using End User License Agreements. 2008. Conference paper (Refereed)
    Abstract [en]

    The amount of spyware increases rapidly over the Internet and it is usually hard for the average user to know if a software application hosts spyware. This paper investigates the hypothesis that it is possible to detect from the End User License Agreement (EULA) whether its associated software hosts spyware or not. We generated a data set by collecting 100 applications with EULAs and classifying each EULA as either good or bad. An experiment was conducted, in which 15 popular default-configured mining algorithms were applied to the data set. The results show that 13 algorithms are significantly better than random guessing; thus, we conclude that the hypothesis can be accepted. Moreover, 2 algorithms also perform significantly better than the current state-of-the-art EULA analysis method. Based on these results, we present a novel tool that can be used to prevent the installation of spyware.

  • 24.
    Boldt, Martin
    et al.
    Department of Systems and Software Engineering, School of Engineering, Blekinge Institute of Technology, Ronneby, Sweden.
    Jacobsson, Andreas
    Department of Systems and Software Engineering, School of Engineering, Blekinge Institute of Technology, Ronneby, Sweden.
    Lavesson, Niklas
    Department of Systems and Software Engineering, School of Engineering, Blekinge Institute of Technology, Ronneby, Sweden.
    Davidsson, Paul
    Department of Systems and Software Engineering, School of Engineering, Blekinge Institute of Technology, Ronneby, Sweden.
    Automated Spyware Detection Using End User License Agreements. 2008. Conference paper (Refereed)
    Abstract [en]

    The amount of spyware increases rapidly over the Internet and it is usually hard for the average user to know if a software application hosts spyware. This paper investigates the hypothesis that it is possible to detect from the End User License Agreement (EULA) whether its associated software hosts spyware or not. We generated a data set by collecting 100 applications with EULAs and classifying each EULA as either good or bad. An experiment was conducted, in which 15 popular default-configured mining algorithms were applied to the data set. The results show that 13 algorithms are significantly better than random guessing; thus, we conclude that the hypothesis can be accepted. Moreover, 2 algorithms also perform significantly better than the current state-of-the-art EULA analysis method. Based on these results, we present a novel tool that can be used to prevent the installation of spyware.

  • 25. Borg, Anton
    et al.
    Boldt, Martin
    Lavesson, Niklas
    Informed Software Installation through License Agreement Categorization. 2011. Conference paper (Refereed)
    Abstract [en]

    Spyware detection can be achieved by using machine learning techniques that identify patterns in the End User License Agreements (EULAs) presented by application installers. However, solutions have required manual input from the user with varying degrees of accuracy. We have implemented an automatic prototype for extraction and classification and used it to generate a large data set of EULAs. This data set is used to compare four different machine learning algorithms when classifying EULAs. Furthermore, the effect of feature selection is investigated and for the top two algorithms, we investigate optimizing the performance using parameter tuning. Our conclusion is that feature selection and performance tuning are of limited use in this context, providing limited performance gains. However, both the Bagging and the Random Forest algorithms show promising results, with Bagging reaching an AUC measure of 0.997 and a False Negative Rate of 0.062. This shows the applicability of License Agreement Categorization for realizing informed software installation.

  • 26. Borg, Anton
    et al.
    Boldt, Martin
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing, Karlskrona, Sweden.
    Informed Software Installation through License Agreement Categorization. 2011. Conference paper (Refereed)
    Abstract [en]

    Spyware detection can be achieved by using machine learning techniques that identify patterns in the End User License Agreements (EULAs) presented by application installers. However, solutions have required manual input from the user with varying degrees of accuracy. We have implemented an automatic prototype for extraction and classification and used it to generate a large data set of EULAs. This data set is used to compare four different machine learning algorithms when classifying EULAs. Furthermore, the effect of feature selection is investigated and for the top two algorithms, we investigate optimizing the performance using parameter tuning. Our conclusion is that feature selection and performance tuning are of limited use in this context, providing limited performance gains. However, both the Bagging and the Random Forest algorithms show promising results, with Bagging reaching an AUC measure of 0.997 and a False Negative Rate of 0.062. This shows the applicability of License Agreement Categorization for realizing informed software installation.

  • 27.
    Borg, Anton
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Boldt, Martin
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Melander, Ulf
    Boeva, Veselka
    Detecting serial residential burglaries using clustering. 2014. In: Expert Systems with Applications, ISSN 0957-4174, Vol. 41, no 11, p. 5252-5266. Article in journal (Refereed)
    Abstract [en]

    According to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the reported residential burglaries in 2012. Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders. Law enforcement agencies, consequently, are required to detect series of crimes, or linked crimes. Comparison of crime reports today is difficult as no systematic or structured way of reporting crimes exists, and no ability to search multiple crime reports exists. This study presents a systematic data collection method for residential burglaries. A decision support system for comparing and analysing residential burglaries is also presented. The decision support system consists of an advanced search tool and a plugin-based analytical framework. In order to find similar crimes, law enforcement officers have to review a large number of crimes. The potential use of the cut-clustering algorithm to group crimes to reduce the number of crimes to review for residential burglary analysis based on characteristics is investigated. The characteristics used are modus operandi, residential characteristics, stolen goods, spatial similarity, or temporal similarity. Clustering quality is measured using the modularity index and accuracy is measured using the rand index. The clustering solutions with the best quality performance scores were those based on residential characteristics, spatial proximity, and modus operandi, suggesting that the choice of which characteristic to use when grouping crimes can positively affect the end result. The results suggest that a high quality clustering solution performs significantly better than a random guesser. In terms of practical significance, the presented clustering approach is capable of reducing the number of cases to review while keeping most connected cases. While the approach might miss some connections, it is also capable of suggesting new connections. The results also suggest that while crime series clustering is feasible, further investigation is needed.

  • 28.
    Borg, Anton
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Boldt, Martin
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Melander, Ulf
    Boeva, Veselka
    Detecting serial residential burglaries using clustering. 2014. In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 41, no 11, p. 5252-5266. Article in journal (Refereed)
    Abstract [en]

    According to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the reported residential burglaries in 2012. Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders. Law enforcement agencies, consequently, are required to detect series of crimes, or linked crimes. Comparison of crime reports today is difficult as no systematic or structured way of reporting crimes exists, and no ability to search multiple crime reports exists. This study presents a systematic data collection method for residential burglaries. A decision support system for comparing and analysing residential burglaries is also presented. The decision support system consists of an advanced search tool and a plugin-based analytical framework. In order to find similar crimes, law enforcement officers have to review a large number of crimes. The potential use of the cut-clustering algorithm to group crimes to reduce the number of crimes to review for residential burglary analysis based on characteristics is investigated. The characteristics used are modus operandi, residential characteristics, stolen goods, spatial similarity, or temporal similarity. Clustering quality is measured using the modularity index and accuracy is measured using the rand index. The clustering solutions with the best quality performance scores were those based on residential characteristics, spatial proximity, and modus operandi, suggesting that the choice of which characteristic to use when grouping crimes can positively affect the end result. The results suggest that a high quality clustering solution performs significantly better than a random guesser. In terms of practical significance, the presented clustering approach is capable of reducing the number of cases to review while keeping most connected cases. While the approach might miss some connections, it is also capable of suggesting new connections. The results also suggest that while crime series clustering is feasible, further investigation is needed.

  • 29.
    Borg, Anton
    et al.
    Blekinge Institute of Technology, School of Computing.
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing.
    E-mail Classification using Social Network Information. 2012. Conference paper (Refereed)
    Abstract [en]

    A majority of E-mail is suspected to be spam. Traditional spam detection fails to differentiate between user needs and evolving social relationships. Online Social Networks (OSNs) contain more and more social information, contributed by users. OSN information may be used to improve spam detection. This paper presents a method that can use several social networks for detecting spam and a set of metrics for representing OSN data. The paper investigates the impact of using social network data extracted from an E-mail corpus to improve spam detection. The social data model is compared to traditional spam data models by generating and evaluating classifiers from both model types. The results show that accurate spam detectors can be generated from the low-dimensional social data model alone; however, spam detectors generated from combinations of the traditional and social models were more accurate than the detectors generated from either model in isolation.

  • 30.
    Borg, Anton
    et al.
    Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
    Lavesson, Niklas
    Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
    E-mail Classification using Social Network Information. 2012. Conference paper (Refereed)
    Abstract [en]

    A majority of E-mail is suspected to be spam. Traditional spam detection fails to differentiate between user needs and evolving social relationships. Online Social Networks (OSNs) contain more and more social information, contributed by users. OSN information may be used to improve spam detection. This paper presents a method that can use several social networks for detecting spam and a set of metrics for representing OSN data. The paper investigates the impact of using social network data extracted from an E-mail corpus to improve spam detection. The social data model is compared to traditional spam data models by generating and evaluating classifiers from both model types. The results show that accurate spam detectors can be generated from the low-dimensional social data model alone; however, spam detectors generated from combinations of the traditional and social models were more accurate than the detectors generated from either model in isolation.

  • 31.
    Borg, Anton
    et al.
    Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
    Lavesson, Niklas
    Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
    Boeva, Veselka
    Comparison of clustering approaches for gene expression data. 2013. In: Frontiers in Artificial Intelligence and Applications, IOS Press, 2013, p. 55-64. Conference paper (Refereed)
    Abstract [en]

    Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and expectation-maximization. The algorithms are benchmarked against each other. The performance of the four clustering algorithms is studied on time series expression data using Dynamic Time Warping distance in order to measure similarity between gene expression profiles. Four different cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a cluster method and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.

  • 32.
    Borg, Anton
    et al.
    Blekinge Institute of Technology, School of Computing.
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing.
    Boeva, Veselka
    Comparison of clustering approaches for gene expression data. 2013. In: Frontiers in Artificial Intelligence and Applications, IOS Press, 2013, Vol. 257, p. 55-64. Conference paper (Refereed)
    Abstract [en]

    Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, and subsequently indicates that the genes could possibly share a common biological role. In this paper, four clustering algorithms are investigated: k-means, cut-clustering, spectral and expectation-maximization. The algorithms are benchmarked against each other. The performance of the four clustering algorithms is studied on time series expression data using Dynamic Time Warping distance in order to measure similarity between gene expression profiles. Four different cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a cluster method and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.

  • 33.
    Dasari, Siva Krishna
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Andersson, Petter
    Engineering Method Development, GKN Aerospace Engine Systems Sweden.
    Persson, Marie
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Tree-Based Response Surface Analysis. 2015. Conference paper (Refereed)
    Abstract [en]

    Computer-simulated experiments have become a cost-effective way for engineers to replace real experiments in the area of product development. However, one single computer-simulated experiment can still take a significant amount of time. Hence, in order to minimize the number of simulations needed to investigate a certain design space, different approaches within the design of experiments area are used. One of the used approaches is to minimize the time consumption and simulations for design space exploration through response surface modeling. The traditional methods used for this purpose are linear regression, quadratic curve fitting and support vector machines. This paper analyses and compares the performance of four machine learning methods for the regression problem of response surface modeling. The four methods are linear regression, support vector machines, M5P and random forests. Experiments are conducted to compare the performance of tree models (M5P and random forests) with the performance of non-tree models (support vector machines and linear regression) on data that is typical for concept evaluation within the aerospace industry. The main finding is that comprehensible models (the tree models) perform at least as well as or better than traditional black-box models (the non-tree models). The first observation of this study is that engineers understand the functional behavior, and the relationship between inputs and outputs, for the concept selection tasks by using comprehensible models. The second observation is that engineers can also increase their knowledge about design concepts, and they can reduce the time for planning and conducting future experiments.

  • 34.
    Dasari, Siva Krishna
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Andersson, Petter
    Engineering Method Development, GKN Aerospace Engine Systems Sweden.
    Persson, Marie
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Tree-Based Response Surface Analysis. 2015. Conference paper (Refereed)
    Abstract [en]

    Computer-simulated experiments have become a cost-effective way for engineers to replace real experiments in product development. However, a single computer-simulated experiment can still take a significant amount of time. Hence, to minimize the number of simulations needed to investigate a given design space, different approaches from the design of experiments area are used. One such approach is to reduce the time and the number of simulations required for design space exploration through response surface modeling. The traditional methods used for this purpose are linear regression, quadratic curve fitting and support vector machines. This paper analyses and compares the performance of four machine learning methods for the regression problem of response surface modeling. The four methods are linear regression, support vector machines, M5P and random forests. Experiments are conducted to compare the performance of tree models (M5P and random forests) with the performance of non-tree models (support vector machines and linear regression) on data that is typical for concept evaluation within the aerospace industry. The main finding is that comprehensible models (the tree models) perform at least as well as traditional black-box models (the non-tree models). The first observation of this study is that engineers can understand the functional behavior, and the relationship between inputs and outputs, for the concept selection tasks by using comprehensible models. The second observation is that engineers can also increase their knowledge about design concepts and reduce the time for planning and conducting future experiments.

  • 35.
    Davidsson, Paul
    et al.
    Blekinge Institute of Technology, School of Computing.
    Gustafsson Friberger, Marie
    Lavesson, Niklas
    Blekinge Institute of Technology, School of Computing.
    Persson, Jan
    Blekinge Institute of Technology, School of Computing.
    Towards a Prediction Model for People Movements in Urban Areas. 2013. Conference paper (Refereed)
    Abstract [en]

    The aim of this work is to develop a new type of service for predicting and communicating urban activity. This service provides short-term predictions (hours to days), which can be used as a basis for different types of resource allocation and planning, e.g. concerning public transport, personnel, or marketing. The core of the service consists of a forecasting engine that, based on a prediction model, processes data at different levels of detail and from various providers. This paper explores the requirements and features of the forecasting engine. We conclude that agent-based modeling seems to be the most promising approach to meet these requirements. Finally, some examples of potential applications are described along with analyses of scientific and engineering issues that need to be addressed.

  • 36.
    Davidsson, Paul
    et al.
    Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
    Gustafsson Friberger, Marie
    Technology and Society, Malmö University, Malmö, Sweden.
    Lavesson, Niklas
    Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
    Persson, Jan
    Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
    Towards a Prediction Model for People Movements in Urban Areas. 2013. Conference paper (Refereed)
    Abstract [en]

    The aim of this work is to develop a new type of service for predicting and communicating urban activity. This service provides short-term predictions (hours to days), which can be used as a basis for different types of resource allocation and planning, e.g. concerning public transport, personnel, or marketing. The core of the service consists of a forecasting engine that, based on a prediction model, processes data at different levels of detail and from various providers. This paper explores the requirements and features of the forecasting engine. We conclude that agent-based modeling seems to be the most promising approach to meet these requirements. Finally, some examples of potential applications are described along with analyses of scientific and engineering issues that need to be addressed.

  • 37.
    Devagiri, Vishnu Manasa
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Abghari, Shahrooz
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Basiri, Fahrad
    iquest AB, SWE.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
    Multi-view data analysis techniques for monitoring smart building systems. 2021. In: Sensors, E-ISSN 1424-8220, Vol. 21, no 20, article id 6775. Article in journal (Refereed)
    Abstract [en]

    In smart buildings, many different systems work in coordination to accomplish their tasks. In this process, the sensors associated with these systems collect large amounts of data generated in a streaming fashion, which is prone to concept drift. Such data are heterogeneous due to the wide range of sensors collecting information about different characteristics of the monitored systems. All of this makes the monitoring task very challenging. Traditional clustering algorithms are not well equipped to address the mentioned challenges. In this work, we study the use of the MV Multi-Instance Clustering algorithm for multi-view analysis and mining of smart building systems’ sensor data. It is demonstrated how this algorithm can be used to perform contextual as well as integrated analysis of the systems. Various scenarios in which the algorithm can be used to analyze the data generated by the systems of a smart building are examined and discussed in this study. It is also shown how the extracted knowledge can be visualized to detect trends in the systems’ behavior and how it can aid domain experts in the systems’ maintenance. In the experiments conducted, the proposed approach was able to successfully detect the deviating behaviors known to have previously occurred and was also able to identify some new deviations during the monitored period. Based on the results obtained from the experiments, it can be concluded that the proposed algorithm can be used for monitoring, analysis, and detection of deviating behaviors of the systems in a smart building domain. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

  • 38.
    Flyckt, Jonatan
    et al.
    Jönköping University, SWE.
    Andersson, Filip
    Jönköping University, SWE.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
    Nilsson, Liselott
    Swedish Forest Agency, SWE.
    Ågren, Anneli M.
    Swedish University of Agricultural Sciences, SLU, SWE.
    Detecting ditches using supervised learning on high-resolution digital elevation models. 2022. In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 201, article id 116961. Article in journal (Refereed)
    Abstract [en]

    Drained wetlands can constitute a large source of greenhouse gas emissions, but the drainage networks in these wetlands are largely unmapped, and better maps are needed to aid in forest production and to better understand the climate consequences. We develop a method for detecting ditches in high-resolution digital elevation models derived from LiDAR scans. Thresholding methods using digital terrain indices can be used to detect ditches. However, a single threshold generally does not capture the variability in the landscape, and generates many false positives and negatives. We hypothesise that, by combining the digital terrain indices using supervised learning, we can improve ditch detection at a landscape scale. In addition to digital terrain indices, additional features are generated by transforming the data to include neighbouring cells for better ditch predictions. A Random Forests classifier is used to locate the ditches, and its probability output is processed to remove noise, and binarised to produce the final ditch prediction. Across the evaluation plots, the 95% confidence interval for Cohen's Kappa index is [0.655, 0.781]. The study demonstrates that combining information from a suite of digital terrain indices using machine learning provides an effective technique for automatic ditch detection at a landscape scale, aiding in both practical forest management and in combatting climate change. © 2022 The Authors

  • 39.
    Flyckt, Jonatan
    et al.
    Jonkoping University.
    Andersson, Filip
    Jonkoping University.
    Westphal, Florian
    Jonkoping University.
    Mansson, Andreas
    Saab AB, Training & Simulation, Huskvarna, Sweden.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
    Explaining rifle shooting factors through multi-sensor body tracking. 2023. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 27, no 2, p. 535-554. Article in journal (Refereed)
    Abstract [en]

    There is a lack of data-driven training instructions for sports shooters, as instruction has commonly been based on subjective assessments. Many studies have correlated body posture and balance to shooting performance in rifle shooting tasks, but have mostly focused on single aspects of postural control. This study has focused on finding relevant rifle shooting factors by examining the entire body over sequences of time. A data collection was performed with 13 human participants carrying out live rifle shooting scenarios while being recorded with multiple body tracking sensors. A pre-processing pipeline produced a novel skeleton sequence representation, which was used to train a transformer model. The predictions from this model could be explained on a per sample basis using the attention mechanism, and visualised in an interactive format for humans to interpret. It was possible to separate the different phases of a shooting scenario from body posture with a high classification accuracy (80%). Shooting performance could be detected to an extent by separating participants using their strong and weak shooting hand. The dataset and pre-processing pipeline, as well as the techniques for generating explainable predictions presented in this study have laid the groundwork for future research in the sports shooting domain.

  • 40.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Bifet, Albert
    Télécom ParisTech, FRA.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. Jönköping University, SWE.
    Energy Modeling of Hoeffding Tree Ensembles. 2021. In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 25, no 1, p. 81-104. Article in journal (Refereed)
    Abstract [en]

    Energy consumption reduction has been an increasing trend in machine learning over the past few years due to its socio-ecological importance. In new challenging areas such as edge computing, energy consumption and predictive accuracy are key variables during algorithm design and implementation. State-of-the-art ensemble stream mining algorithms are able to create highly accurate predictions at a substantial energy cost. This paper introduces the nmin adaptation method to ensembles of Hoeffding tree algorithms, to further reduce their energy consumption without sacrificing accuracy. We also present extensive theoretical energy models of such algorithms, detailing their energy patterns and how nmin adaptation affects their energy consumption. We have evaluated the energy efficiency and accuracy of the nmin adaptation method on five different ensembles of Hoeffding trees using 11 publicly available datasets. The results show that we are able to reduce the energy consumption significantly, by 21% on average, affecting accuracy by less than one percent on average. © 2021 - IOS Press. All rights reserved.

  • 41.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Is it ethical to avoid error analysis? 2017. Conference paper (Refereed)
    Abstract [en]

    Machine learning algorithms tend to create more accurate models with the availability of large datasets. In some cases, highly accurate models can hide the presence of bias in the data. Several published studies tackle the development of discrimination-aware machine learning algorithms. We center on the further evaluation of machine learning models by doing error analysis, to understand under what conditions the model is not working as expected. We focus on the ethical implications of avoiding error analysis, from a falsification of results and discrimination perspective. Finally, we show different ways to approach error analysis in non-interpretable machine learning algorithms such as deep learning.

  • 42.
    García Martín, Eva
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Grahn, Håkan
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Energy Efficiency Analysis of the Very Fast Decision Tree Algorithm. 2017. In: Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment / [ed] Rokia Missaoui, Talel Abdessalem, Matthieu Latapy, Cham, Switzerland: Springer, 2017, p. 229-252. Chapter in book (Refereed)
    Abstract [en]

    Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This paper introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. These results are compared with a theoretical analysis of the algorithm, indicating that energy consumption is affected by the parameter design and that it can be reduced significantly while maintaining accuracy.

  • 43.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Energy Efficiency Analysis of the Very Fast Decision Tree Algorithm. 2017. In: Trends in Social Network Analysis: Information Propagation, User Behavior Modeling, Forecasting, and Vulnerability Assessment / [ed] Rokia Missaoui, Talel Abdessalem, Matthieu Latapy, Cham, Switzerland: Springer, 2017, p. 229-252. Chapter in book (Refereed)
    Abstract [en]

    Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This paper introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. These results are compared with a theoretical analysis of the algorithm, indicating that energy consumption is affected by the parameter design and that it can be reduced significantly while maintaining accuracy.

  • 44.
    García Martín, Eva
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Grahn, Håkan
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Energy Efficiency in Data Stream Mining. 2015. In: ASONAM '15: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining / [ed] Jian Pei, Fabrizio Silvestri & Jie Tang, ACM Digital Library, 2015, p. 1125-1132. Conference paper (Refereed)
    Abstract [en]

    Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This paper introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We extended the CRISP (Cross Industry Standard Process for Data Mining) framework to include energy consumption analysis. Based on this framework, we conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. The results indicate that energy consumption can be reduced by up to 92.5% (557 J) while maintaining accuracy.

  • 45.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Energy Efficiency in Data Stream Mining. 2015. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015, p. 1125-1132. Conference paper (Refereed)
    Abstract [en]

    Data mining algorithms are usually designed to optimize a trade-off between predictive accuracy and computational efficiency. This paper introduces energy consumption and energy efficiency as important factors to consider during data mining algorithm analysis and evaluation. We extended the CRISP (Cross Industry Standard Process for Data Mining) framework to include energy consumption analysis. Based on this framework, we conducted an experiment to illustrate how energy consumption and accuracy are affected when varying the parameters of the Very Fast Decision Tree (VFDT) algorithm. The results indicate that energy consumption can be reduced by up to 92.5% (557 J) while maintaining accuracy.

  • 46.
    García Martín, Eva
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Grahn, Håkan
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Identification of Energy Hotspots: A Case Study of the Very Fast Decision Tree. 2017. In: GPC 2017: Green, Pervasive, and Cloud Computing / [ed] Au M., Castiglione A., Choo KK., Palmieri F., Li KC., Cham, Switzerland: Springer, 2017, p. 267-281. Conference paper (Refereed)
    Abstract [en]

    Large-scale data centers account for a significant share of the energy consumption in many countries. Machine learning technology requires intensive workloads and thus drives requirements for lots of power and cooling capacity in data centers. It is time to explore green machine learning. The aim of this paper is to profile a machine learning algorithm with respect to its energy consumption and to determine the causes behind this consumption. The first scalable machine learning algorithm able to handle large volumes of streaming data is the Very Fast Decision Tree (VFDT), which outputs competitive results in comparison to algorithms that analyze data from static datasets. Our objectives are to: (i) establish a methodology that profiles the energy consumption of decision trees at the function level, (ii) apply this methodology in an experiment to obtain the energy consumption of the VFDT, (iii) conduct a fine-grained analysis of the functions that consume most of the energy, providing an understanding of that consumption, and (iv) analyze how different parameter settings can significantly reduce the energy consumption. The results show that by addressing the most energy-intensive part of the VFDT, the energy consumption can be reduced by up to 74.3%.

  • 47.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Identification of Energy Hotspots: A Case Study of the Very Fast Decision Tree. 2017. In: GPC 2017: Green, Pervasive, and Cloud Computing / [ed] Au M., Castiglione A., Choo KK., Palmieri F., Li KC., Cham, Switzerland: Springer, 2017, Vol. 10232, p. 267-281. Conference paper (Refereed)
    Abstract [en]

    Large-scale data centers account for a significant share of the energy consumption in many countries. Machine learning technology requires intensive workloads and thus drives requirements for lots of power and cooling capacity in data centers. It is time to explore green machine learning. The aim of this paper is to profile a machine learning algorithm with respect to its energy consumption and to determine the causes behind this consumption. The first scalable machine learning algorithm able to handle large volumes of streaming data is the Very Fast Decision Tree (VFDT), which outputs competitive results in comparison to algorithms that analyze data from static datasets. Our objectives are to: (i) establish a methodology that profiles the energy consumption of decision trees at the function level, (ii) apply this methodology in an experiment to obtain the energy consumption of the VFDT, (iii) conduct a fine-grained analysis of the functions that consume most of the energy, providing an understanding of that consumption, and (iv) analyze how different parameter settings can significantly reduce the energy consumption. The results show that by addressing the most energy-intensive part of the VFDT, the energy consumption can be reduced by up to 74.3%.

  • 48.
    García Martín, Eva
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Lavesson, Niklas
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Grahn, Håkan
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Casalicchio, Emiliano
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Boeva, Veselka
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
    Energy-Aware Very Fast Decision Tree. 2021. In: International Journal of Data Science and Analytics, ISSN 2364-415X, Vol. 11, no 2, p. 105-126. Article in journal (Refereed)
    Abstract [en]

    Recently, machine learning researchers have been designing algorithms that can run on embedded and mobile devices, which introduces additional constraints compared to traditional algorithm design approaches. One of these constraints is energy consumption, which directly translates to battery capacity for these devices. Streaming algorithms, such as the Very Fast Decision Tree (VFDT), are designed to run on such devices due to their high velocity and low memory requirements. However, they have not been designed with an energy efficiency focus. This paper addresses this challenge by presenting the nmin adaptation method, which reduces the energy consumption of the VFDT algorithm with only minor effects on accuracy. nmin adaptation allows the algorithm to grow faster in those branches where there is more confidence to create a split, and delays the split on the less confident branches. This removes unnecessary computations related to checking for splits but maintains similar levels of accuracy. We have conducted extensive experiments on 29 public datasets, showing that the VFDT with nmin adaptation consumes up to 31% less energy than the original VFDT, and up to 96% less energy than the CVFDT (VFDT adapted for concept drift scenarios), trading off up to 1.7 percent of accuracy.

  • 49.
    García Martín, Eva
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Grahn, Håkan
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Casalicchio, Emiliano
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Boeva, Veselka
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Hoeffding Trees with nmin adaptation. Manuscript (preprint) (Other academic)
    Abstract [en]

    Machine learning software accounts for a significant amount of energy consumed in data centers. These algorithms are usually optimized towards predictive performance, i.e. accuracy, and scalability. This is the case of data stream mining algorithms. Although these algorithms are adaptive to the incoming data, they have fixed parameters from the beginning of the execution, which leads to energy hotspots. We present dynamic parameter adaptation for data stream mining algorithms to trade off energy efficiency against accuracy during runtime. To validate this approach, we introduce the nmin adaptation method to improve parameter adaptation in Hoeffding trees. This method dynamically adapts the number of instances needed to make a split (nmin) and thereby reduces the overall energy consumption. We created an experiment to compare the Very Fast Decision Tree algorithm (VFDT, original Hoeffding tree algorithm) with nmin adaptation and the standard VFDT. The results show that VFDT with nmin adaptation consumes up to 89% less energy than the standard VFDT, trading off a few percent of accuracy. Our approach can be used to trade off energy consumption with predictive and computational performance in striving towards resource-aware machine learning.

  • 50.
    García Martín, Eva
    et al.
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Lavesson, Niklas
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics, JTH, Jönköping AI Lab (JAIL). Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Grahn, Håkan
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Casalicchio, Emiliano
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Boeva, Veselka
    Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik.
    Hoeffding Trees with nmin adaptation. 2018. In: The 5th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018), IEEE, 2018, p. 70-79. Conference paper (Refereed)
    Abstract [en]

    Machine learning software accounts for a significant amount of energy consumed in data centers. These algorithms are usually optimized towards predictive performance, i.e. accuracy, and scalability. This is the case of data stream mining algorithms. Although these algorithms are adaptive to the incoming data, they have fixed parameters from the beginning of the execution. We have observed that having fixed parameters leads to unnecessary computations, thus making the algorithm energy inefficient. In this paper we present the nmin adaptation method for Hoeffding trees. This method adapts the value of the nmin parameter, which significantly affects the energy consumption of the algorithm. The method reduces unnecessary computations and memory accesses, thus reducing the energy, while the accuracy is only marginally affected. We experimentally compared VFDT (Very Fast Decision Tree, the first Hoeffding tree algorithm) and CVFDT (Concept-adapting VFDT) with the VFDT-nmin (VFDT with nmin adaptation). The results show that VFDT-nmin consumes up to 27% less energy than the standard VFDT, and up to 92% less energy than CVFDT, trading off a few percent of accuracy in a few datasets.
