Digitala Vetenskapliga Arkivet

Evaluating clustering techniques in financial time series
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control.
2023 (English). Independent thesis, advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This degree project investigates evaluation strategies for clustering methods applied to multivariate financial time series. Clustering is a data mining technique that partitions a data set so that data points are similar to others in the same cluster and dissimilar to those in other clusters. By clustering the time series of mutual fund returns, it is possible to help individuals select funds matching their current goals and portfolio. It is also possible to identify outliers. These outliers could be mutual funds that have not been classified accurately by the fund manager, or they could indicate potentially fraudulent practices.

To determine which clustering method is most appropriate for a given data set, it is important to be able to evaluate different techniques. Robust evaluation methods can also assist in choosing parameters that ensure optimal performance. The evaluation techniques investigated are conventional internal validation measures, stability measures, visualization methods, and evaluation using domain knowledge about the data. The conventional internal validation measures and stability measures were used to perform model selection to find viable clustering method candidates. These candidates were then evaluated using visualization techniques as well as qualitative analysis of the results. The conventional internal validation measures tested might not be appropriate for model selection with the clustering methods, distance metrics, or data sets tested. Their results often contradicted one another or suggested trivial clustering solutions, where the number of clusters is either 1 or equal to the number of data points in the data set. Similarly, a stability validation metric called the stability index typically favored clustering results containing as few clusters as possible. The only method used for model selection that consistently suggested clustering algorithms producing nontrivial solutions was the CLOSE score. The CLOSE score was specifically developed to evaluate clusters of time series by taking both stability over time and the quality of the clusters into account.
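As an illustration of what an internal validation measure computes, the following minimal sketch implements the silhouette coefficient, one widely used measure of this kind. The abstract does not name the specific measures tested, so the choice of silhouette, the Euclidean distance, the function name `silhouette`, and the toy return vectors are all assumptions made for illustration only.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean silhouette coefficient of a labelling, using Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        # The measure is undefined for the trivial one-cluster solution.
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = mean(dist(points[i], points[j]) for j in own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Hypothetical return vectors for four funds (three periods each).
returns = [
    (0.010, 0.020, 0.015),
    (0.012, 0.018, 0.016),
    (-0.030, 0.050, -0.020),
    (-0.028, 0.047, -0.022),
]
compact = [0, 0, 1, 1]  # groups similar funds together
mixed = [0, 1, 0, 1]    # splits similar funds apart

print(silhouette(returns, compact) > silhouette(returns, mixed))
```

In this sketch the one-cluster case is rejected outright and singleton clusters score 0 by convention, which shows why trivial solutions need special handling when such a measure drives model selection.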

Cluster visualizations are used to inspect the clusters. Scatter plots were produced by applying two methods of dimension reduction to the data: Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). Additionally, we use cluster evolution plots to display how the clusters evolve as different parts of the time series are used to perform the clustering, thus emphasizing the temporal aspect of time series clustering. Finally, the results indicate that a manual qualitative analysis of the clustering results is necessary to fine-tune the candidate clustering methods. Performing this analysis highlights flaws of the other validation methods and allows the user to select the best method out of a few candidates based on the use case and the reason for performing the clustering.
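One simple way to quantify how much a clustering changes between two parts of a time series is the Rand index, the fraction of pairs of items on which two labellings agree. The sketch below is a generic illustration under that assumption. It is not the CLOSE score or the stability index mentioned above, and the function name `rand_index` and the window labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs that both labellings treat the same way."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labellings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    # A pair agrees if it is grouped together in both labellings
    # or separated in both labellings.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster labels for five funds in two consecutive time windows.
window_1 = [0, 0, 1, 1, 2]
window_2 = [1, 1, 0, 0, 0]

print(rand_index(window_1, window_2))
```

Because the index compares pairs rather than label values, renaming the clusters between windows does not affect it. Only funds that actually move between groups lower the score.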

Place, publisher, year, edition, pages
2023, p. 82
Series
UPTEC F, ISSN 1401-5757 ; 23021
Keywords [en]
clustering, machine learning, financial time series, time series, unsupervised learning, cluster validation, cluster evaluation
Keywords [sv]
klustring, klusteranalys, finansiella tidsserier, maskininlärning, klustervalidering, evalueringsteknik
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:uu:diva-504591
OAI: oai:DiVA.org:uu-504591
DiVA, id: diva2:1767708
External cooperation
Kidbrooke Advisory AB
Subject / course
Statistics
Educational program
Master Programme in Engineering Physics
Supervisors
Examiners
Available from: 2023-06-15. Created: 2023-06-14. Last updated: 2023-06-15. Bibliographically approved.

Open Access in DiVA

Full text (3527 kB), 956 downloads
File information
File name: FULLTEXT01.pdf
File size: 3527 kB
Checksum (SHA-512): d8da496e460b0a91ca262d35a04df75f6e76e5e63648d0ea78141009b224009d1428e6008b71e74b1a4cc79650d583599b9069d581b9adbb58e219b71f874096
Type: fulltext
Mimetype: application/pdf

Total: 706 hits