OpenACC-based Snow Simulation
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2013 (English). Master's thesis (student thesis)
Abstract [en]

In recent years, the GPU platform has risen in popularity in high performance computing due to its cost effectiveness and the high computing power offered by its many parallel cores. The GPU's computing power can be harnessed using the low-level GPGPU programming APIs CUDA and OpenCL. While both CUDA and OpenCL give the programmer fine-grained control of a GPU's resources, they are generally considered difficult to use and can lead to complicated software design. To simplify GPGPU programming and gain more mainstream usage of GPUs, there is increased interest in moving the complexity of GPGPU programming over to the compiler. This has led to the development of OpenACC, a directive-based standard for heterogeneous computing supported by NVIDIA, Cray, PGI, CAPS and others. In this thesis, we explore using OpenACC on a high performance snow simulator code developed by the HPC-Lab at NTNU. The snow simulator consists of two main simulation components: the simulation of wind and the simulation of snow particle movement. The OpenACC version of the snow simulator is made by first updating the current CUDA version, porting it to a sequential CPU implementation, and then applying OpenACC directives to accelerate compute-intensive regions of the code. The OpenACC port is also optimized by reducing data movement between host and device using OpenACC library routines. Due to the heterogeneous nature of OpenACC, we show that the inability to explicitly use shared memory as temporary storage, and to use texture memory for hardware-based interpolation and 3D caching, are the largest performance bottlenecks when compared with the CUDA version. This is supported by benchmarks of the OpenACC implementation, which reaches only 40.6% of the performance of the CUDA version, with an average speedup of 3.2x, when scaling the number of snow particles simulated and using a balanced windfield dimension. When scaling the windfield with a constant number of snow particles, 58% of the CUDA performance is reached, with an average speedup of 4.84x. The best real-time performance is found at about 1.5M snow particles when using a balanced windfield with about 524K grid cells. Using OpenACC to accelerate high performance graphical simulations can be a viable option if the goal is high code portability; however, when the goal is to achieve the best possible performance, our experience shows that it is still better to use the lower-level alternatives CUDA or OpenCL.

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap, 2013. 122 p.
URN: urn:nbn:no:ntnu:diva-23000
Local ID: ntnudaim:9823
OAI: diva2:655634
Available from: 2013-10-12. Created: 2013-10-12. Last updated: 2013-10-12. Bibliographically approved.
