Multi-GPU Implementations of Parallel 3D Sweeping Algorithms with Application to Geological Folding
2015 (English) In: INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2015 COMPUTATIONAL SCIENCE AT THE GATES OF NATURE, ELSEVIER SCIENCE BV, 2015, Vol. 51, 1494-1503 p. Conference paper (Refereed)
This paper studies the CUDA programming challenges of using multiple GPUs inside a single machine to carry out plane-by-plane updates in parallel 3D sweeping algorithms. In particular, care must be taken to mask the overhead of the various data movements between the GPUs. Multiple OpenMP threads on the CPU side should be combined with multiple CUDA streams per GPU to hide the data transfer cost related to the halo computation on each 2D plane. Moreover, peer-to-peer data transfers can be used to reduce the impact of the 3D volumetric data shuffles that have to be done between mandatory changes of the grid partitioning. We have investigated the performance improvement of 2- and 4-GPU implementations that are applicable to 3D anisotropic front propagation computations related to geological folding. In comparison with a straightforward multi-GPU implementation, the overall performance improvement due to masking of data movements on four GPUs of the Fermi architecture was 23%. The corresponding improvement obtained on four Kepler GPUs was 47%.
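The combination of OpenMP threads and CUDA streams described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `update_rows`, `first_row`, `launch_rows` and `sweep`, the row-wise slab partitioning with one halo row on each side, and the trivial placeholder kernel are all assumptions made for illustration. The sketch further assumes that every GPU owns at least three rows, that the OpenMP runtime actually provides one thread per GPU, and it omits error checking.

```cuda
#include <cuda_runtime.h>
#include <omp.h>
#include <vector>

// Placeholder update (assumption): stands in for the real per-plane sweeping
// stencil, which the abstract does not specify. Touches local rows
// [row_begin, row_end) and does not read the halo rows.
__global__ void update_rows(float* slab, int nx, int row_begin, int row_end) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = row_begin + blockIdx.y;
    if (x < nx && y < row_end) slab[(size_t)y * nx + x] += 1.0f;
}

// First global row owned by GPU g when ny rows are split over ngpus GPUs.
inline int first_row(int g, int ngpus, int ny) {
    return (int)((long long)g * ny / ngpus);
}

static void launch_rows(float* slab, int nx, int begin, int end, cudaStream_t s) {
    if (end <= begin) return;  // an empty grid would be an invalid launch
    dim3 block(128), grid((nx + 127) / 128, end - begin);
    update_rows<<<grid, block, 0, s>>>(slab, nx, begin, end);
}

// One OpenMP thread drives each GPU; each GPU uses two streams so that the
// halo transfer overlaps with the interior computation of the same plane.
void sweep(int ngpus, int nx, int ny, int nplanes) {
    std::vector<float*> slab(ngpus);  // local rows: 0 = top halo, 1..n owned, n+1 = bottom halo
    std::vector<int> rows(ngpus);
    #pragma omp parallel num_threads(ngpus)
    {
        int g = omp_get_thread_num();
        cudaSetDevice(g);
        for (int p : {g - 1, g + 1}) {  // enable direct peer access where supported
            int ok = 0;
            if (p >= 0 && p < ngpus) cudaDeviceCanAccessPeer(&ok, g, p);
            if (ok) cudaDeviceEnablePeerAccess(p, 0);
        }
        int n = rows[g] = first_row(g + 1, ngpus, ny) - first_row(g, ngpus, ny);
        size_t row_bytes = (size_t)nx * sizeof(float);
        cudaMalloc(&slab[g], (n + 2) * row_bytes);
        cudaMemset(slab[g], 0, (n + 2) * row_bytes);
        cudaStream_t halo, interior;
        cudaStreamCreate(&halo);
        cudaStreamCreate(&interior);
        #pragma omp barrier  // every slab is allocated before any peer copy

        for (int k = 0; k < nplanes; ++k) {
            // Halo stream: update the two boundary rows, then send them to the neighbours.
            launch_rows(slab[g], nx, 1, 2, halo);
            launch_rows(slab[g], nx, n, n + 1, halo);
            if (g > 0)  // my first owned row becomes the bottom halo of GPU g-1
                cudaMemcpyPeerAsync(slab[g - 1] + (size_t)(rows[g - 1] + 1) * nx, g - 1,
                                    slab[g] + nx, g, row_bytes, halo);
            if (g + 1 < ngpus)  // my last owned row becomes the top halo of GPU g+1
                cudaMemcpyPeerAsync(slab[g + 1], g + 1,
                                    slab[g] + (size_t)n * nx, g, row_bytes, halo);
            // Interior stream: the remaining rows run concurrently with the copies.
            launch_rows(slab[g], nx, 2, n, interior);
            cudaStreamSynchronize(halo);
            cudaStreamSynchronize(interior);
            #pragma omp barrier  // all halos have arrived before the next plane
        }
        cudaStreamDestroy(halo);
        cudaStreamDestroy(interior);
        cudaFree(slab[g]);
    }
}
```

A call such as `sweep(4, 512, 512, 512)` would drive four GPUs; the file can be built with, for example, `nvcc -Xcompiler -fopenmp`. A real stencil that reads the halo rows would additionally have to make sure that a halo row is not overwritten while a neighbouring kernel still reads it, for instance by keeping separate storage per plane.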
Place, publisher, year, edition, pages
ELSEVIER SCIENCE BV, 2015. Vol. 51, 1494-1503 p.
Series: Procedia Computer Science, ISSN 1877-0509
NVIDIA GPU; CUDA programming; OpenMP; 3D sweeping; anisotropic front propagation
Computer and Information Science
Identifiers
URN: urn:nbn:se:liu:diva-127767
DOI: 10.1016/j.procs.2015.05.339
ISI: 000373939100152
OAI: oai:DiVA.org:liu-127767
DiVA: diva2:927460
15th Annual International Conference on Computational Science (ICCS)