A salient image region is defined as an image part that is clearly different from its surround with respect to a number of attributes. In bottom-up processing, these attributes include contrast, color difference, brightness, and orientation. By measuring these attributes, visual saliency algorithms aim to predict the regions of an image that would attract our attention under free-viewing conditions, i.e., when the observer views an image without a specific task such as searching for an object. To quantify the interesting locations in a scene, the output of a visual saliency algorithm is usually expressed as a two-dimensional grayscale map in which brighter regions correspond to the highly salient regions of the original image. In addition to advancing our understanding of the human visual system, visual saliency models can be used in a number of computer vision applications, including image compression, computer graphics, image matching and recognition, design, and human-computer interaction.
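To make the idea concrete, the following is a minimal sketch of a bottom-up saliency map built from a single attribute (brightness) using a center-surround difference; the function name, the two Gaussian scales, and the toy input are illustrative assumptions, not the thesis's model:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(gray, sigma_center=2.0, sigma_surround=10.0):
    """Toy bottom-up saliency: center-surround difference on brightness.
    Bright output pixels mark regions that differ from their surround."""
    center = gaussian_filter(gray.astype(float), sigma_center)
    surround = gaussian_filter(gray.astype(float), sigma_surround)
    sal = np.abs(center - surround)
    # Normalize to [0, 1] so the map can be displayed as a grayscale image,
    # matching the usual presentation of saliency maps.
    rng = sal.max() - sal.min()
    return (sal - sal.min()) / rng if rng > 0 else sal

# A dark image with one bright square: the square stands out from its
# surround, so the map should be brightest in and around it.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
smap = saliency_map(img)
```

A full model such as Itti's combines several such feature maps (color, intensity, orientation) across scales; this sketch shows only the single-channel building block.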
The main contributions of this thesis can be outlined as follows. First, we present a method to inspect the performance of Itti's classic saliency algorithm in separating salient from non-salient image locations. Our results show that, although the model discriminates well between highly salient and non-salient regions, there is a large overlap among locations in the middle range of saliency. Second, we propose a new bottom-up visual saliency model for static two-dimensional images, in which saliency is calculated using the transformations associated with the dihedral group D4. Our results suggest that the proposed model outperforms many state-of-the-art saliency models. With the proposed methodology, the algorithm can be extended to calculate saliency in three-dimensional scenes, which we intend to implement in the future. Third, we propose a way to perform statistical analysis of the fixation data from different observers and different images, and based on this analysis we present a robust metric for judging the performance of visual saliency algorithms. Our results show that the proposed metric can indeed alleviate the problems pertaining to the evaluation of saliency models. Fourth, we introduce a new approach to compressing an image based on the salient locations predicted by saliency models. Our results show that the compressed images do not exhibit visual artifacts and appear very similar to the originals. Fifth, we outline a method to estimate depth from eye fixations in three-dimensional virtual scenes, which can be used to create so-called gaze maps for three-dimensional scenes. In the future, these can serve as ground truth for judging the performance of saliency algorithms on three-dimensional images.
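The dihedral group D4 consists of the eight symmetries of a square: four rotations and four reflected rotations. A minimal sketch of generating this orbit for a square image patch is shown below; the `asymmetry` score is an illustrative placeholder of our own, not the thesis's saliency measure, which is not specified in this summary:

```python
import numpy as np

def d4_orbit(patch):
    """Return the eight images of a square patch under the dihedral
    group D4: the four 90-degree rotations and their reflections."""
    assert patch.shape[0] == patch.shape[1], "D4 acts on square patches"
    rotations = [np.rot90(patch, k) for k in range(4)]
    reflections = [np.fliplr(r) for r in rotations]
    return rotations + reflections

def asymmetry(patch):
    """Toy score: mean absolute deviation of a patch from its D4 orbit.
    A patch invariant under the group scores 0; an asymmetric patch
    scores higher. (Illustrative only, not the thesis's measure.)"""
    orbit = d4_orbit(patch)
    return float(np.mean([np.abs(patch - g).mean() for g in orbit]))

symmetric_patch = np.ones((4, 4))                       # D4-invariant
asymmetric_patch = np.arange(16, dtype=float).reshape(4, 4)
```

Because `np.rot90` and `np.fliplr` return views or cheap copies, evaluating the full orbit per patch is inexpensive, which is one practical appeal of building a saliency cue on D4 transformations.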
We believe that our contributions can lead to a better understanding of saliency, address the major issues associated with the evaluation of saliency models, highlight the respective contributions of top-down and bottom-up processing through the analysis of a comprehensive eye-tracking dataset, promote the use of image-processing applications steered by human vision, and pave the way for calculating saliency in three-dimensional scenes.