Optimizing a High Energy Physics (HEP) Toolkit on Heterogeneous Architectures
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2011 (English) Master's thesis (Masteroppgave), Student thesis
Abstract [en]

A desired trend within high energy physics is to increase particle accelerator luminosities, leading to the production of more collision data and a higher probability of finding interesting physics results. A central data analysis technique for determining whether results are interesting is the maximum likelihood method, whose evaluation of the negative log-likelihood can be computationally expensive. As the amount of data grows, it is important to take advantage of the parallelism in modern computers. In essence, this means exploiting vector registers and all available cores on CPUs, as well as utilizing co-processors such as GPUs. This thesis describes the work done to optimize and parallelize a prototype of a central data analysis tool within the high energy physics community. The work consists of optimizations for multicore processors and GPUs, as well as a mechanism to balance the load between CPUs and GPUs, with the aim of fully exploiting the power of modern commodity computers. We explore the OpenCL standard thoroughly and give an overview of its limitations when used in a large real-world software package. We reach a single-core speedup of ~7.8x compared to the original implementation of the toolkit for the physical model we use throughout this thesis. On top of that follows an increase of ~3.6x with 4 threads on a commodity Intel processor, as well as almost perfect scalability on NUMA systems when thread affinity is applied. GPUs give varying speedups depending on the complexity of the physics model used. With our model, price-comparable GPUs give a speedup of ~2.5x compared to a modern Intel CPU utilizing 8 SMT threads. The balancing mechanism is based on real timings of each device and works optimally for large workloads, when the API calls to the OpenCL implementation impose a small overhead, and when computation timings are accurate.
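
As a rough illustration of what a timing-based balancing mechanism can look like, the sketch below splits a batch of events between devices in proportion to their measured throughput. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function name `split_workload`, the device labels, and the timing values are all hypothetical placeholders, and a plain proportional split is assumed.

```python
def split_workload(n_events, seconds_per_event):
    """Split n_events across devices in proportion to measured throughput.

    seconds_per_event maps a (hypothetical) device label to the time per
    event observed in an earlier timed run on that device.
    """
    # Throughput is events per second; faster devices get a larger share.
    throughput = {dev: 1.0 / t for dev, t in seconds_per_event.items()}
    total = sum(throughput.values())
    shares = {dev: int(n_events * tp / total) for dev, tp in throughput.items()}
    # Rounding down can drop a few events; hand them to the fastest device.
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += n_events - sum(shares.values())
    return shares


# Placeholder timings, invented for illustration only.
timings = {"cpu": 2.5e-6, "gpu": 1.0e-6}
shares = split_workload(1_000_000, timings)
print(shares)
```

A split of this kind is only as good as the timings it is fed, which is consistent with the abstract's remark that balancing works best for large workloads with low API overhead and accurate computation timings.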

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap, 2011. 152 p.
Keyword [no]
ntnudaim:5795, MTDT datateknikk, Komplekse datasystemer
URN: urn:nbn:no:ntnu:diva-14478
Local ID: ntnudaim:5795
OAI: diva2:454090
Available from: 2011-11-04 Created: 2011-11-04
