Digitala Vetenskapliga Arkivet

Data Augmentation of Financial Time Series Using GANs
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2024 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This thesis addresses the challenges that limited data quality and quantity pose for financial time series analysis, particularly in emerging markets. Given the need for robust methods to handle such data limitations, the study investigates how effectively Generative Adversarial Networks (GANs) can augment financial market data to improve the predictive and classification capabilities of machine learning models. The thesis aims to answer two research questions: (1) How much does the correlation between features in simulated data differ from the correlation between features in real data? and (2) To what extent does the performance of classification models for sovereign yield spreads (an aggregation of bond prices) in emerging markets improve when generated data is added to the training set? Using an experiment as the research strategy, the performance of the GAN was assessed in three ways. The first measures the average distance between corresponding entries of the correlation matrices of the original and the synthetic data, which gave an average distance of 0.19 at the best-performing sequence length. The second measures the performance of a classification model, a feed-forward network that labels the data as an increase, a decrease, or an insignificant change, trained with different ratios of synthetic data in the training set; performance decreased when the augmented data was added. The third applies analyses such as PCA and t-SNE and plots the results for the original and generated data against each other to compare how the two are distributed. The study concludes with the limitations of GANs for data augmentation, such as hardware constraints and the types of data that can be generated, along with suggestions for future research to improve the classification model and the authenticity of the synthetic data.
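The first assessment in the abstract, an average distance between correlation matrices, can be illustrated with a minimal sketch. The abstract does not specify the exact distance, so this example assumes the mean absolute difference over each distinct pair of features; the function name `mean_correlation_distance` and the toy data are hypothetical and are not taken from the thesis. It also assumes the time series have already been flattened to a 2-D array of samples by features.

```python
import numpy as np


def mean_correlation_distance(real, synthetic):
    """Mean absolute difference between matching pairwise feature correlations.

    Assumption: both inputs are 2-D arrays of shape (n_samples, n_features).
    This is one plausible reading of "average distance between each
    correlation", not necessarily the metric used in the thesis.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    if real.shape[1] != synthetic.shape[1]:
        raise ValueError("both datasets must have the same number of features")

    # Columns are features, so rowvar=False.
    corr_real = np.corrcoef(real, rowvar=False)
    corr_syn = np.corrcoef(synthetic, rowvar=False)

    # Strict upper triangle: each feature pair once, diagonal excluded.
    rows, cols = np.triu_indices(real.shape[1], k=1)
    return float(np.mean(np.abs(corr_real[rows, cols] - corr_syn[rows, cols])))


# Toy data (hypothetical): correlated "real" features vs. independent noise.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
real[:, 1] += 0.8 * real[:, 0]
synthetic = rng.normal(size=(500, 4))

distance_self = mean_correlation_distance(real, real)
distance_other = mean_correlation_distance(real, synthetic)
```

Under this definition the distance is 0 when both datasets share the same correlation structure and can be at most 2, so a value such as the reported 0.19 would be read on that scale only if the thesis used a comparable definition.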

Place, publisher, year, edition, pages
2024.
Keywords [en]
GAN, TimeGAN, Emerging market, Synthetic data, Sovereign yield spreads
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:su:diva-242673
OAI: oai:DiVA.org:su-242673
DiVA, id: diva2:1955564
Available from: 2025-04-30 Created: 2025-04-30

Open Access in DiVA

fulltext (1256 kB), 23 downloads
File information
File name: FULLTEXT01.pdf, File size: 1256 kB, Checksum SHA-512:
da4216e781aab7bfa0c6a6b2a9f48a29a514f9f326cbb79b061f9c0afd1b2105f9b3f8071924cc8fc78fca0e7038b619538d267915af8c3adc35bf745349e695
Type: fulltext, Mimetype: application/pdf

Search in DiVA

By author/editor
Diaconu, Bogdan-Alexandru
By organisation
Department of Computer and Systems Sciences
Computer Sciences

Total: 23 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
