Change search
ReferencesLink to record
Permanent link

Direct link
Heterogeneous FTDT for Seismic Processing
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2013 (English)MasteroppgaveStudent thesis
Abstract [en]

In the early days of computing, scientific calculations were done by specialized hardware. More recently, increasingly powerful CPUs took over and have been dominant for a long time. Now though, scientific computation is not only for the general CPU environment anymore. GPUs are specialized processors with their own memory hierarchy requiring more effort to program, but for suitable algorithms they may significantly outperform serially optimized CPUs. In recent years, these GPUs have become a lot more easily programmable, where they in the past had to be programmed through the abstraction of a graphics pipeline. EMGS in Trondheim is an oil-finding service working with analysis of seismic readings of the ocean floor, to provide information about possible oil reservoirs. Data-centers comprised of CPU nodes does all the work today, however GPU installations could be more cost effective and faster. In this thesis we look at the implementation of the main part of one of their data analysis algorithms. For this we use the FDTD method implemented in Yee bench by Ulf Andersson. We look at how to adapt it for GPU using CUDA, parallelize the CPU implementations and how to run this efficiently together heterogeneously. It is shown that this method has great potential for use on GPUs, speedups just short of 19x over single thread CPU are achieved in this work. The FDTD method we use does however have some erratic memory operations which limits our performance compared to great GPU implementations these days which can reach speedups of over 100x. However, many of them still compare to single CPU performance. The order in which we address memory is therefore even more important, we show that optimizing memory writes when half the memory reads will not coalesce still improves our performance considerably. We show that care is needed when scheduling jobs on both CPU and GPU on the same node to avoid the total performance going down. Using all available resources on the host may not be beneficial. Utilizing several parallel CUDA streams proves effective to hide a lot of overhead and delay caused by busy CPU and main memory. This work is not a final solution for EMGS? needs for this tool, other consid- erations and options than those discussed are also of interest. These topics are included in the future work section.

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap , 2013. , 95 p.
URN: urn:nbn:no:ntnu:diva-22994Local ID: ntnudaim:9486OAI: diva2:655628
Available from: 2013-10-12 Created: 2013-10-12 Last updated: 2013-10-12Bibliographically approved

Open Access in DiVA

fulltext(1084 kB)238 downloads
File information
File name FULLTEXT01.pdfFile size 1084 kBChecksum SHA-512
Type fulltextMimetype application/pdf
cover(184 kB)5 downloads
File information
File name COVER01.pdfFile size 184 kBChecksum SHA-512
Type coverMimetype application/pdf
attachment(21 kB)9 downloads
File information
File name ATTACHMENT01.zipFile size 21 kBChecksum SHA-512
Type attachmentMimetype application/zip

By organisation
Department of Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 238 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 42 hits
ReferencesLink to record
Permanent link

Direct link