Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Inference of buffer queue times in data processing systems using Gaussian Processes: An introduction to latency prediction for dynamic software optimization in high-end trading systems
KTH, School of Computer Science and Communication (CSC).
2017 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesisAlternative title
Inferens av buffer-kötider i dataprocesseringssystem med hjälp av Gaussiska processer (Swedish)
Abstract [en]

This study investigates whether Gaussian Process Regression can be applied to evaluate buffer queue times in large scale data processing systems. It is additionally considered whether high-frequency data stream rates can be generalized into a small subset of the sample space. With the aim of providing basis for dynamic software optimization, a promising foundation for continued research is introduced.

The study is intended to contribute to Direct Market Access financial trading systems which processes immense amounts of market data daily. Due to certain limitations, we shoulder a naïve approach and model latencies as a function of only data throughput in eight small historical intervals. The training and test sets are represented from raw market data, and we resort to pruning operations to shrink the datasets by a factor of approximately 0.0005 in order to achieve computational feasibility. We further consider four different implementations of Gaussian Process Regression.

The resulting algorithms perform well on pruned datasets, with an average R2 statistic of 0.8399 over six test sets of approximately equal size as the training set. Testing on non-pruned datasets indicate shortcomings from the generalization procedure, where input vectors corresponding to low-latency target values are associated with less accuracy. We conclude that depending on application, the shortcomings may be make the model intractable. However for the purposes of this study it is found that buffer queue times can indeed be modelled by regression algorithms. We discuss several methods for improvements, both in regards to pruning procedures and Gaussian Processes, and open up for promising continued research. 

Abstract [sv]

Denna studie undersöker huruvida Gaussian Process Regression kan appliceras för att utvärdera buffer-kötider i storskaliga dataprocesseringssystem. Dessutom utforskas ifall dataströmsfrekvenser kan generaliseras till en liten delmängd av utfallsrymden. Medmålet att erhålla en grund för dynamisk mjukvaruoptimering introduceras en lovandestartpunkt för fortsatt forskning.

Studien riktas mot Direct Market Access system för handel på finansiella marknader, somprocesserar enorma mängder marknadsdata dagligen. På grund av vissa begränsningar axlas ett naivt tillvägagångssätt och väntetider modelleras som en funktion av enbartdatagenomströmning i åtta små historiska tidsinterval. Tränings- och testdataset representeras från ren marknadsdata och pruning-tekniker används för att krympa dataseten med en ungefärlig faktor om 0.0005, för att uppnå beräkningsmässig genomförbarhet. Vidare tas fyra olika implementationer av Gaussian Process Regression i beaktning.

De resulterande algorithmerna presterar bra på krympta dataset, med en medel R2 statisticpå 0.8399 över sex testdataset, alla av ungefär samma storlek som träningsdatasetet. Tester på icke krympta dataset indikerar vissa brister från pruning, där input vektorermotsvararande låga latenstider är associerade med mindre exakthet. Slutsatsen dras att beroende på applikation kan dessa brister göra modellen obrukbar. För studiens syftefinnes emellertid att latenstider kan sannerligen modelleras av regressionsalgoritmer. Slutligen diskuteras metoder för förbättrning med hänsyn till både pruning och GaussianProcess Regression, och det öppnas upp för lovande vidare forskning.

Place, publisher, year, edition, pages
2017. , p. 72
Keyword [en]
gaussian processes; latency prediction; buffer queue times; non-parametric models; machine learning; direct market access; trading systems; microstructure market data; dynamic software optimisation
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-214791OAI: oai:DiVA.org:kth-214791DiVA, id: diva2:1143261
External cooperation
Rolf Andersson
Subject / course
Computer Science; Computer Science
Educational program
Master of Science in Engineering - Industrial Engineering and Management; Master of Science - Machine Learning
Supervisors
Examiners
Available from: 2017-10-23 Created: 2017-09-21 Last updated: 2018-01-13Bibliographically approved

Open Access in DiVA

fulltext(4430 kB)13 downloads
File information
File name FULLTEXT01.pdfFile size 4430 kBChecksum SHA-512
f4f49910a789e018314931a62266ad7e61d7fa1014f5c994aa0163c374ca80f03be84969eaac4d30ede3aa33f076df070c4076f829d8aab0d0cb00baff2e263d
Type fulltextMimetype application/pdf

By organisation
School of Computer Science and Communication (CSC)
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 13 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 931 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf