Efficient reduction over threads
Independent thesis Basic level (university diploma), 20 credits / 30 HE creditsStudent thesis
The increasing number of cores in both desktops and servers leads to a demand for efficient parallel algorithms. This project focuses on the fundamental collective operation reduce, which merges several arrays into one by applying a binary operation element wise. Several reduce algorithms are evaluated in terms of performance and scalability and a novel algorithm is introduced that takes advantage of shared memory and exploits load imbalance. To do so, the concept of dynamic pair generation is introduced which implies constructing a binary reduce tree dynamically based on the order of thread arrival, where pairs are formed in a lock-free manner. We conclude that the dynamic algorithm, given enough spread in the arriving times, can outperform the reference algorithms for some or all array sizes.
Place, publisher, year, edition, pages
2011. , 43 p.
Trita-FYS, ISSN 0280-316X ; 57
Computer and Information Science
IdentifiersURN: urn:nbn:se:kth:diva-49818OAI: oai:DiVA.org:kth-49818DiVA: diva2:460393
Master of Science in Engineering - Computer Science and Technology