Change search
ReferencesLink to record
Permanent link

Direct link
A Case Study of Parallel Bilateral Filtering on the GPU
Mälardalen University, School of Innovation, Design and Engineering.
2015 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Smoothing and noise reduction of images is often an important first step in image processing applications. Simple image smoothing algorithms like the Gaussian filter have the unfortunate side effect of blurring the image which could obfuscate important information and have a negative impact on the following applications. The bilateral filter is a well-used non-linear smoothing algorithm that seeks to preserve edges and contours while removing noise. The bilateral filter comes at a heavy cost in computational speed, especially when used on larger images, since the algorithm does a greater amount of work for each pixel in the image than some simpler smoothing algorithms. In applications where timing is important, this may be enough to encourage certain developers to choose a simpler filter, at the cost of quality. However, the time cost of the bilateral filter can be greatly reduced through parallelization, as the work for each pixel can theoretically be done simultaneously. This work uses Nvidia’s Compute Unified Device Architecture (CUDA) to implement and evaluate some of the most common and effective methods for parallelizing the bilateral filter on a Graphics processing unit (GPU). This includes use of the constant and shared memories, and a technique called 1 x N tiling. These techniques are evaluated on newer hardware and the results are compared to a sequential version, and a naive parallel version not using advanced techniques. This report also intends to give a detailed and comprehensible explanation to these techniques in the hopes that the reader may be able to use the information put forth to implement them on their own. The greatest speedup is achieved in the initial parallelizing step, where the algorithm is simply converted to run in parallel on a GPU. Storing some data in the constant memory provides a slight but reliable speedup for a small amount of work. Additional time can be gained by using shared memory. However, memory transactions did not account for as much of the execution time as was expected, and therefore the memory optimizations only yielded small improvements. Test results showed 1 x N tiling to be mostly non-beneficial for the hardware that was used in this work, but there might have been problems with the implementation.

Place, publisher, year, edition, pages
2015. , 52 p.
Keyword [en]
Bilateral Filter, Image filtering, Image processing, CUDA, GPU, GPGPU
National Category
Engineering and Technology
URN: urn:nbn:se:mdh:diva-29589OAI: diva2:872665
Subject / course
Computer Science
Available from: 2015-11-27 Created: 2015-11-19 Last updated: 2015-11-27Bibliographically approved

Open Access in DiVA

DVA502_JonasLarsson(17448 kB)30 downloads
File information
File name FULLTEXT01.pdfFile size 17448 kBChecksum SHA-512
Type fulltextMimetype application/pdf

Other links

Search in DiVA

By author/editor
Larsson, Jonas
By organisation
School of Innovation, Design and Engineering
Engineering and Technology

Search outside of DiVA

GoogleGoogle Scholar
Total: 30 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 43 hits
ReferencesLink to record
Permanent link

Direct link