A Case Study of Parallel Bilateral Filtering on the GPU
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Smoothing and noise reduction of images is often an important first step in image processing applications. Simple smoothing algorithms like the Gaussian filter have the unfortunate side effect of blurring the image, which can obscure important information and harm subsequent processing. The bilateral filter is a widely used non-linear smoothing algorithm that preserves edges and contours while removing noise. It comes at a heavy computational cost, especially on larger images, since it performs more work per pixel than simpler smoothing algorithms. In time-critical applications, this may be enough to push developers toward a simpler filter, at the cost of quality. However, the time cost of the bilateral filter can be greatly reduced through parallelization, since the work for each pixel can in principle be done simultaneously.

This work uses Nvidia's Compute Unified Device Architecture (CUDA) to implement and evaluate some of the most common and effective methods for parallelizing the bilateral filter on a graphics processing unit (GPU), including the use of constant and shared memory and a technique called 1 x N tiling. These techniques are evaluated on newer hardware, and the results are compared against a sequential version and a naive parallel version that uses no advanced techniques. The report also aims to explain these techniques in enough detail that readers can implement them on their own. The greatest speedup is achieved in the initial parallelization step, where the algorithm is simply converted to run in parallel on the GPU. Storing some data in constant memory provides a small but reliable speedup for little effort, and additional time can be gained by using shared memory.
However, memory transactions did not account for as much of the execution time as expected, so the memory optimizations yielded only small improvements. Test results showed 1 x N tiling to be mostly non-beneficial on the hardware used in this work, although this may have been due to problems with the implementation.
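As a minimal illustration of the per-pixel work the abstract describes, the bilateral filter weights each neighbour by the product of a spatial Gaussian (distance in the image plane) and a range Gaussian (difference in intensity), which is what lets it smooth noise while preserving edges. The sketch below is a brute-force NumPy version for grayscale images; the function name and parameter choices are illustrative assumptions, not the thesis implementation (which is a CUDA kernel), but the per-pixel loop body is exactly the work that each GPU thread would perform in the naive parallel version.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter on a 2-D grayscale image with values in [0, 1]."""
    h, w = img.shape
    out = np.zeros_like(img)
    # Spatial Gaussian over the (2r+1) x (2r+1) window; depends only on offsets,
    # so it can be precomputed once (in CUDA this is a candidate for constant memory).
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2.0 * sigma_s**2))
    padded = np.pad(img, radius, mode='edge')
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # Range Gaussian: neighbours with very different intensity get
            # near-zero weight, so edges are not averaged across.
            rng = np.exp(-((window - img[y, x])**2) / (2.0 * sigma_r**2))
            weights = spatial * rng
            out[y, x] = np.sum(weights * window) / np.sum(weights)
    return out
```

Each output pixel is independent of every other, which is why the initial GPU port (one thread per pixel) already yields the largest speedup reported in the abstract.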
Place, publisher, year, edition, pages
2015, 52 p.
Bilateral Filter, Image filtering, Image processing, CUDA, GPU, GPGPU
Engineering and Technology
Identifiers
URN: urn:nbn:se:mdh:diva-29589
OAI: oai:DiVA.org:mdh-29589
DiVA: diva2:872665