Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Study of bitwise operations on non-scarce attribute based data structures in PostgreSQL
KTH, School of Electrical Engineering and Computer Science (EECS).
2018 (English)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
Abstract [en]

This report investigates the viability of bitwise operations on non-scarce attribute based data structures in PostgreSQL. For applications where computation can’t be avoided, it most probably can be optimized. In an attempt of bringing the computation closer to hardware and the underlying data, operations directly on the database system are explored, taking inspiration from the research field of comparative genomics. With the case-study of an online job platform in mind, where possible matchings between candidate and job descriptions are calculated by a matching engine, a binary encoding is proposed and the computational components identified. The ultimate goal was to evaluate the scalability of the bitwise strategy with respect to the current matching engine. Through an iterative approach, this report conducts quantitative experiments on the presented components. Most notably, an implementation of the population count in the form of a C extension was introduced. It was found, that even for large sequence lengths, the operation is highly efficient. Among the chosen algorithms Lookup Table, Hamming Weight, Intrinsic functions and Unrolled Inline Assembly, the 64 bit intrinsic function displayed the best performance. Benchmarks determined, that the proposed bitwise approach is an excellent strategy for the outlined use-case. Despite the tradeoff of additional complexity in the encoding and decoding of data, the speedup is so significant, that the targeted user base of 100000 can easily be managed and allows for the deprecation of caching mechanisms.

Abstract [sv]

Denna rapport undersöker gångbarheten för bitwise-operationer på icke-knappa attributbaserade datastrukturer i PostgreSQL. För applikationer där komputationen inte kan undvikas, kan den högst troligen optimeras. I ett försök att föra beräkningen närmare hårdvaran och den underliggande datan, undersöks operationer direkt på databasen med inspiration från forskningsområdet inom komparativ genomik. Med fallstudien av en online rekryteringsplattform i åtanke, där möjliga matchningar mellan kandidatoch arbetsbeskrivningar beräknas av en matchningsmotor, föreslås en binär kodning och komputationskomponenterna identifieras. Det slutgiltiga målet var att utvärdera skalbarheten hos bitwise-strategin med avseende till den aktuella matchningsmotorn. Genom ett iterativ tillvägagångssätt utför denna rapport kvantitativa experiment på de presenterade komponenterna. Framför allt infördes en implementering av population count i form av ett C-tillägg i databasen. Det visade sig att även för större sekvenslängder

är operationen mycket effektiv. Bland de utvalda algoritmerna Lookup Table, Hamming Weight, Intrinsic-funktioner och Unrolled Inline Assembly, visade 64-bitars Intrisicfunktionen den bästa prestandan. Experimenten fastställde att det föreslagna bitwisetillvägagångssättet är en utmärkt strategi för den valda fallstudien. Trots avvägningen med ytterligare komplexitet vid kodning och avkodning av data är hastigheten så signifikant att ett användarantal på 100000 enkelt kan hanteras och möjliggör uteslutning av cache-mekanismer.

Place, publisher, year, edition, pages
2018. , p. 34
Series
TRITA-EECS-EX ; 2018:152
Keywords [en]
matching engine; scaling; binary encoding; bitwise; attribute data; PostgreSQL; population count
Keywords [sv]
matchningsmotor; skalning; binär kodning; bitwise; attributdata; PostgreSQL; population count
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-232075OAI: oai:DiVA.org:kth-232075DiVA, id: diva2:1232119
Subject / course
Information and Communication Technology
Educational program
Bachelor of Science - Information and Communication Technology
Supervisors
Examiners
Available from: 2018-07-10 Created: 2018-07-10 Last updated: 2018-07-10Bibliographically approved

Open Access in DiVA

fulltext(550 kB)5 downloads
File information
File name FULLTEXT01.pdfFile size 550 kBChecksum SHA-512
aa4db7283502450606c160916eb6b35e00ccf79435e71e5a23440e7f629bfdc3b87f3dc51123c8579733ecc1cc19d2af14e87a6dc85fac485f029c912eb399a4
Type fulltextMimetype application/pdf

By organisation
School of Electrical Engineering and Computer Science (EECS)
Computer and Information Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 5 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 15 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf