On Fundamental Elements of Visual Navigation Systems
Siddiqui, Abujawad Rafid. Blekinge Tekniska Högskola, Department of Communication Systems (Institutionen för kommunikationssystem). ORCID iD: 0000-0003-4692-5415
2014 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Visual navigation is a ubiquitous yet complex task which is performed by many species for the purpose of survival. Although visual navigation is actively studied within the robotics community, determining the elemental constituents of a robust visual navigation system remains a challenge. Motion estimation is often mistakenly considered the sole ingredient of a robust autonomous visual navigation system, and efforts are therefore focused on improving the accuracy of motion estimates. On the contrary, there are other factors which are as important as motion and whose absence could preclude seamless visual navigation of the kind exhibited by humans. A general model of a visual navigation system is therefore needed, one that describes the system in terms of a set of elemental units. In this regard, this thesis suggests a set of visual navigation elements (i.e. spatial memory, motion memory, scene geometry, context and scene semantics) as building blocks of a visual navigation system. A set of methods is proposed to investigate the existence and role of these elements in a visual navigation system, and a quantitative research methodology, in the form of a series of systematic experiments, is applied to them. The thesis formulates, implements and analyzes the proposed methods in the context of the visual navigation elements, arranged into three major groupings: a) spatial memory, b) motion memory, and c) Manhattan structure, context and scene semantics. The investigations are carried out on multiple image datasets obtained by robot-mounted cameras (2D/3D) moving in different environments.

Spatial memory is investigated through the evaluation of proposed place recognition methods. The recognized places and inter-place associations are then used to represent a visited set of places in the form of a topological map. Such a representation of places and their spatial associations models the concept of spatial memory; it resembles the human ability to represent and map places in large environments (e.g. cities). Motion memory in a visual navigation system is analyzed by a thorough investigation of various motion estimation methods. This leads to proposals of direct motion estimation methods which compute accurate motion estimates by basing the estimation process on dominant surfaces. In the everyday world, planar surfaces, and ground planes in particular, are ubiquitous; the motion models are therefore built upon this constraint.
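As a rough illustration of the spatial memory idea, a topological map can be sketched as a graph whose nodes are recognized places and whose edges are traversals between them. The following Python sketch is not the thesis implementation; the appearance descriptors, cosine-similarity matching and the threshold value are assumptions made for the example.

```python
import numpy as np

class TopologicalMap:
    """Minimal sketch of a spatial-memory map: places are nodes,
    traversals between them are edges. Descriptor matching and the
    similarity threshold are illustrative assumptions."""

    def __init__(self, match_threshold=0.9):
        self.descriptors = []       # one appearance descriptor per place
        self.edges = set()          # inter-place associations
        self.match_threshold = match_threshold

    def recognize(self, descriptor):
        """Return the index of a previously visited place, or None."""
        for i, d in enumerate(self.descriptors):
            denom = np.linalg.norm(d) * np.linalg.norm(descriptor)
            sim = float(np.dot(d, descriptor) / denom) if denom > 0 else 0.0
            if sim > self.match_threshold:
                return i
        return None

    def observe(self, descriptor, previous_place=None):
        """Recognize or create a place, linking it to the previous one."""
        place = self.recognize(descriptor)
        if place is None:
            self.descriptors.append(np.asarray(descriptor, dtype=float))
            place = len(self.descriptors) - 1
        if previous_place is not None and previous_place != place:
            self.edges.add((previous_place, place))
        return place
```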

Manhattan structure provides geometrical cues which are helpful in solving navigation problems. Indoor environments are composed of a few characteristic geometric primitives (e.g. planes), so a plane detection method is proposed as a result of the investigations performed on scene structure. The method uses supervised learning to classify the segmented clusters in 3D point-cloud datasets. In addition to geometry, the context of a scene also plays an important role in the robustness of a visual navigation system. The context in which navigation is performed imposes a set of constraints on objects and sections of the scene, and enforcing these constraints enables the observer to robustly segment the scene and classify the various objects within it. A contextually aware scene segmentation method is proposed which classifies the image of a scene into a set of geometric classes. These geometric classes are sufficient for most navigation tasks; however, in order to facilitate cognitive visual decision making, the scene ought to be segmented semantically. The semantics of indoor scenes and of outdoor scenes are dealt with separately, and separate methods are proposed for visual mapping of each type of environment. An indoor scene consists of a corridor structure, which is modeled as a cubic space in order to build a map of the environment; a “flash-n-extend” strategy is proposed which controls the map update frequency. The semantics of outdoor scenes are also investigated, and a scene classification method is proposed which employs a Markov Random Field (MRF) based classification framework to generate a set of semantic maps.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Institute of Technology, 2014, p. 264
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 13
Keyword [en]
robot navigation, localization, visual mapping, scene understanding, semantic mapping
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:oru:diva-46484; ISBN: 978-91-7295-292-8 (print); OAI: oai:DiVA.org:oru-46484; DiVA id: diva2:869034
Available from: 2015-11-23. Created: 2015-11-12. Last updated: 2018-01-10. Bibliographically approved.
List of papers
1. Multi-Cue Based Place Learning for Mobile Robot Navigation
2012 (English). In: Autonomous and Intelligent Systems: Proceedings / [ed] Mohamed Kamel, Fakhri Karray, Hani Hagras, Springer, 2012, p. 50-58. Conference paper, Published paper (Refereed)
Abstract [en]

Place recognition is an important ability for autonomous navigation of mobile robots. Visual cues extracted from images provide a way to represent and recognize visited places. In this article, a multi-cue based place learning algorithm is proposed. The algorithm has been evaluated on a localization image database containing variations of scenes under different weather conditions, taken by moving a robot-mounted camera in an indoor environment. The results suggest that joining the features obtained from different cues provides a better representation than using a single feature cue.
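One simple way to join cues of the kind the abstract describes is to normalize each cue's feature vector and concatenate them into a single place descriptor. The sketch below is illustrative only; the choice of cues and the normalization scheme are assumptions, not the paper's exact design.

```python
import numpy as np

def fuse_cues(cue_features):
    """Concatenate L2-normalized feature vectors from several visual cues
    (e.g., a color histogram, texture statistics, edge statistics) into one
    place descriptor. The specific cues are illustrative assumptions."""
    normalized = []
    for f in cue_features:
        f = np.asarray(f, dtype=float)
        norm = np.linalg.norm(f)
        normalized.append(f / norm if norm > 0 else f)
    return np.concatenate(normalized)

# Usage: descriptor = fuse_cues([color_hist, texture_vec, edge_vec])
```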

Place, publisher, year, edition, pages
Springer, 2012
Series
Lecture notes in computer science, ISSN 0302-9743 ; 7326 LNAI
Keyword
place learning, visual cues, place recognition, robot navigation, localization.
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:oru:diva-46480 (URN); 10.1007/978-3-642-31368-4_7 (DOI); 2-s2.0-84864124357 (Scopus ID); 978-3-642-31367-7 (ISBN); 978-3-642-31368-4 (ISBN)
Conference
3rd International Conference on Autonomous and Intelligent Systems, AIS 2012, Aveiro, Portugal, June 25-27, 2012
Available from: 2012-12-03. Created: 2015-11-12. Last updated: 2018-01-10. Bibliographically approved.
2. Robust Place Recognition with an Application to Semantic Topological Mapping
2013 (English). In: Sixth International Conference on Machine Vision (ICMV 2013) / [ed] Branislav Vuksanovic; Jianhong Zhou; Antanas Verikas, SPIE - International Society for Optical Engineering, 2013. Conference paper, Published paper (Refereed)
Abstract [en]

The problem of robust and invariant representation of places is addressed. A place recognition technique is proposed, followed by an application to semantic topological mapping. The proposed technique is evaluated on a robot localization database which consists of a large set of images taken under various weather conditions. The results show that the proposed method can robustly recognize places and is invariant to geometric transformations, brightness changes and noise. A comparative analysis with state-of-the-art semantic place description methods shows that the method outperforms the competing methods and exhibits better average recognition rates.
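As a hedged illustration of brightness invariance, the sketch below builds a place descriptor from gradient orientations, which discard additive brightness offsets. It is not the paper's method; the histogram binning and magnitude weighting are assumptions made for the example.

```python
import numpy as np

def brightness_invariant_descriptor(gray_image, bins=64):
    """Illustrative place descriptor: a magnitude-weighted histogram of
    gradient orientations. Gradients remove any constant brightness offset,
    so the descriptor is insensitive to additive illumination changes."""
    gy, gx = np.gradient(gray_image.astype(float))
    angles = np.arctan2(gy, gx)
    magnitudes = np.hypot(gx, gy)
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi),
                           weights=magnitudes)
    total = hist.sum()
    return hist / total if total > 0 else hist
```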

Place, publisher, year, edition, pages
SPIE - International Society for Optical Engineering, 2013
Series
Proceedings of the Society of Photo-Optical Instrumentation Engineers, ISSN 0277-786X ; 9067
Keyword
Landmarks; Place recognition; Topological maps; Visual features; Visual navigation
National Category
Signal Processing; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:oru:diva-46477 (URN); 10.1117/12.2051490 (DOI); 000339220200048 (); 2-s2.0-84901305228 (Scopus ID); 9780819499967 (ISBN)
Conference
6th International Conference on Machine Vision (ICMV), London, England, November 16-17, 2013
Available from: 2015-05-25. Created: 2015-11-12. Last updated: 2018-02-26. Bibliographically approved.
3. Bio-inspired Metaheuristic based Visual Tracking and Ego-motion Estimation
2014 (English). In: Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods / [ed] Maria De Marsico, Antoine Tabbone and Ana Fred, SciTePress, 2014, p. 569-579. Conference paper, Published paper (Refereed)
Abstract [en]

The problem of robust extraction of ego-motion from a sequence of images for an eye-in-hand camera configuration is addressed. A novel approach to planar template-based tracking is proposed, which performs a non-linear image alignment and a planar similarity optimization to recover camera transformations from planar regions of a scene. The planar region tracking problem is cast as a motion optimization problem and solved by maximizing the similarity among the planar regions of a scene. The optimization process employs an evolutionary metaheuristic in order to address the large non-linear search space. The proposed method is validated on image sequences with real as well as synthetic image datasets and is found to successfully recover the ego-motion. A comparative analysis with various state-of-the-art methods reveals that the algorithm tracks the planar regions robustly and is comparable to the state of the art. Such an application of evolutionary metaheuristics to complex visual navigation problems provides a different perspective and could help improve existing methods.
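To make the metaheuristic idea concrete, the sketch below applies Particle Swarm Optimization (one of the paper's keywords) to template tracking, maximizing normalized cross-correlation over a pure 2D translation. The paper optimizes a richer planar warp; restricting the search space to (tx, ty) and the PSO constants chosen here are assumptions to keep the example short.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return (a * b).sum() / denom if denom > 0 else 0.0

def pso_track(template, image, n_particles=30, iters=50,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sketch of PSO-based template tracking over 2D translation only."""
    rng = np.random.default_rng(seed)
    th, tw = template.shape
    # Valid top-left corners: x in [0, W - tw], y in [0, H - th].
    bounds = np.array([image.shape[1] - tw, image.shape[0] - th], float)

    pos = rng.uniform(0, 1, (n_particles, 2)) * bounds   # particle positions
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                # per-particle bests

    def fitness(p):
        x, y = int(p[0]), int(p[1])
        return ncc(template, image[y:y + th, x:x + tw])

    best_fit = np.array([fitness(p) for p in pos])
    g = best_pos[np.argmax(best_fit)].copy()             # global best

    for _ in range(iters):
        r1, r2 = rng.uniform(size=(2, n_particles, 1))
        # Standard PSO velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, 0, bounds)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > best_fit
        best_pos[improved], best_fit[improved] = pos[improved], fit[improved]
        g = best_pos[np.argmax(best_fit)].copy()
    return g  # estimated top-left corner of the tracked region
```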

Place, publisher, year, edition, pages
SciTePress, 2014
Keyword
Camera Tracking, Visual Odometry, Planar Template based Tracking, Particle Swarm Optimization.
National Category
Signal Processing; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:oru:diva-46475 (URN); 10.5220/0004811105690579 (DOI); 2-s2.0-84902308101 (Scopus ID); 978-989758018-5 (ISBN)
Conference
3rd International Conference on Pattern Recognition Applications and Methods (ICPRAM 2014), Angers, Loire Valley, France, March 6-8, 2014
Available from: 2014-12-17. Created: 2015-11-12. Last updated: 2018-01-10. Bibliographically approved.
4. Robust visual odometry estimation of road vehicle from dominant surfaces for large-scale mapping
2015 (English). In: IET Intelligent Transport Systems, ISSN 1751-956X, E-ISSN 1751-9578, Vol. 9, no. 3, p. 314-322. Article in journal (Refereed). Published.
Abstract [en]

Every urban environment contains a rich set of dominant surfaces which can provide a solid foundation for visual odometry estimation. In this work, visual odometry is robustly estimated by computing the motion of a camera mounted on a vehicle. The proposed method first identifies a planar region and dynamically estimates the plane parameters. The candidate region and estimated plane parameters are then tracked in the subsequent images, and an incremental update of the visual odometry is obtained. The proposed method is evaluated on a navigation dataset of stereo images taken by a car-mounted camera driven through a large urban environment. The consistency and resilience of the method have also been evaluated on an indoor robot dataset. The results suggest that the proposed visual odometry estimation can robustly recover the motion by tracking a dominant planar surface in a Manhattan environment. In addition to the motion estimation solution, a set of strategies is discussed for mitigating the problematic factors arising from the unpredictable nature of the environment. The analysis of the results, as well as the strategies for dynamic environments, indicates a strong potential for the method to be part of an autonomous or semi-autonomous system.
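A minimal sketch of the plane parameter estimation step, assuming the dominant surface is available as a set of 3D points triangulated from stereo: fit a plane by least squares via SVD. This is a generic technique standing in for the paper's dynamic, tracked estimation, which is more involved.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit to an N x 3 array of 3D points (e.g., road
    surface points triangulated from stereo). Returns (normal, d) with the
    plane defined by normal . x + d = 0."""
    centroid = points.mean(axis=0)
    # The plane normal is the direction of least variance: the right
    # singular vector with the smallest singular value of the centered data.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    d = -normal.dot(centroid)
    return normal, d
```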

Place, publisher, year, edition, pages
The Institution of Engineering and Technology, 2015
Keyword
object tracking, road safety, distance measurement, stereo image processing, cameras, road vehicles, pose estimation, motion estimation
National Category
Signal Processing; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:oru:diva-46478 (URN); 10.1049/iet-its.2014.0100 (DOI); 000351633300009 (); 2-s2.0-84925379428 (Scopus ID)
Available from: 2015-05-26. Created: 2015-11-12. Last updated: 2018-01-10. Bibliographically approved.
5. A novel plane extraction approach using supervised learning
2013 (English). In: Machine Vision and Applications, ISSN 0932-8092, E-ISSN 1432-1769, Vol. 24, no. 6, p. 1229-1237. Article in journal (Refereed). Published.
Abstract [en]

This paper presents a novel approach to the classification of planar surfaces in unorganized point clouds. A feature-based planar surface detection method is proposed which classifies point cloud data into planar and non-planar points by learning a classification model from an example set of planes. The algorithm segments the scene by applying a graph partitioning approach with an improved representation of the association among graph nodes. The planarity of the points in a scene segment is then estimated by classifying input points as planar points which satisfy the planarity constraint imposed by the learned model. The resultant planes have potential application in solving the simultaneous localization and mapping problem for navigation of an unmanned air vehicle. The proposed method is validated on real and synthetic scenes. The real data consist of five datasets recorded by capturing three-dimensional (3D) point clouds while an RGBD camera was moved through five different indoor scenes. A set of synthetic 3D scenes was constructed containing planar and non-planar structures, and the synthetic data were contaminated with Gaussian and random structure noise. The results of the empirical evaluation on both the real and the simulated data suggest that the method provides a generalized solution for plane detection even in the presence of noise and non-planar objects in the scene. Furthermore, a comparative study has been performed between multiple plane extraction methods.
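The sketch below illustrates the supervised-learning step under stated assumptions: each segmented cluster is summarized by simple planarity features (plane-fit residual and the variance fraction explained by the plane), and an off-the-shelf SVM stands in for the learned model. Both the feature set and the classifier choice are assumptions, not the paper's exact design.

```python
import numpy as np
from sklearn.svm import SVC  # illustrative classifier choice, not the paper's

def segment_features(points):
    """Planarity features for one segmented cluster of 3D points (N x 3):
    RMS distance to the best-fit plane, and the fraction of variance
    captured by the plane. The feature set is an assumption."""
    centered = points - points.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    rms_residual = np.sqrt(np.mean(centered.dot(normal) ** 2))
    planarity = 1.0 - s[-1] ** 2 / (s ** 2).sum()
    return np.array([rms_residual, planarity])

def train_plane_classifier(segments, labels):
    """Learn planar vs. non-planar from example segments (label 1 = plane)."""
    X = np.array([segment_features(p) for p in segments])
    clf = SVC(kernel="rbf")
    return clf.fit(X, np.asarray(labels))
```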

Place, publisher, year, edition, pages
Springer, 2013
Keyword
Autonomous navigation, Planar surfaces, Point cloud, UAV navigation
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:oru:diva-46471 (URN); 10.1007/s00138-013-0482-4 (DOI); 000321871600009 (); 2-s2.0-84880793024 (Scopus ID)
Available from: 2013-09-10. Created: 2015-11-12. Last updated: 2018-01-10. Bibliographically approved.
6. Scene perception by context-aware dominant surfaces
2013 (English). In: Signal Processing and Communication Systems (ICSPCS): Proceedings / [ed] Tadeusz A. Wysocki & Beata J. Wysocki, IEEE, 2013. Conference paper, Published paper (Refereed)
Abstract [en]

Most computer vision algorithms operate pixel-wise and process the image in a small neighborhood for feature extraction. Such a feature extraction strategy ignores the context of an object in the real world. By taking geometric context into account while classifying various regions in a scene, similar features obtained from different regions can be discriminated with respect to their context. A geometric-context based scene decomposition method is proposed and applied in a context-aware Augmented Reality (AR) system. The proposed system segments a single image of a scene into a set of semantic classes representing the dominant surfaces in the scene. The classification method is evaluated on an urban driving sequence with labeled ground truth and is found to be robust in classifying the scene regions into a set of dominant surfaces. The classified dominant surfaces are used to generate a 3D scene, which provides the input to the AR system. The visual experience of the 3D scene through the context-aware AR system provides a solution for visual touring from single images, as well as an experimental tool for improving the understanding of human visual perception.
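As a toy illustration of geometric context, the rule below labels an image region using its vertical position in the frame and a crude color cue. The paper learns such classes from labeled data; the class set, thresholds and input conventions here are assumptions made purely for the example.

```python
# Geometric classes used for illustration; the paper's label set may differ.
CLASSES = ["ground", "vertical", "sky"]

def classify_region(mean_row, mean_color):
    """Toy context rule for one image region.
    mean_row: region centroid row divided by image height (0 = top).
    mean_color: (r, g, b) mean color of the region.
    Purely illustrative; the paper learns this mapping from data."""
    r, g, b = mean_color
    if mean_row < 0.35 and b > r and b > g:
        return "sky"        # upper part of the image, blue-dominant
    if mean_row > 0.65:
        return "ground"     # lower image regions tend to be ground plane
    return "vertical"       # building facades, obstacles, etc.
```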

Place, publisher, year, edition, pages
IEEE, 2013
Keyword
geometric cues, image segmentation, scene understanding, single image, surface extraction, visual perception
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:oru:diva-46479 (URN); 10.1109/ICSPCS.2013.6723977 (DOI); 000345766100076 (); 2-s2.0-84903834661 (Scopus ID); 978-1-4799-1319-0 (ISBN)
Conference
7th International Conference on Signal Processing and Communication Systems (ICSPCS), Gold Coast, Australia, December 16-18, 2013
Available from: 2015-05-25. Created: 2015-11-12. Last updated: 2018-01-10. Bibliographically approved.
7. Semantic indoor maps
2013 (English). In: Proceedings of 2013 28th International Conference of Image and Vision Computing New Zealand / [ed] Rhee, T; Rayudu, R; Hollitt, C; Lewis, J; Zhang, M, IEEE, 2013, p. 465-470. Conference paper, Published paper (Refereed)
Abstract [en]

The cumbersome process of constructing and incrementally updating large indoor maps can be simplified by semantic maps. A novel semantic mapping method for indoor environments is proposed which employs a flash-n-extend strategy for constructing and updating the map. At every flash event, a 3D snapshot of the environment is taken, which is then extended until the next flash event occurs. A flash event occurs at a motion state transition of the mobile robot, detected by decomposing the motion estimates. The proposed method is evaluated on a set of image sequences and is found to be robust in building indoor maps suitable for autonomous navigation. The constructed maps provide a simple representation of the environment, which makes them ideal for high-level reasoning tasks.
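A minimal sketch of the flash trigger, assuming inter-frame motion estimates are available as rotation/translation pairs: decompose each estimate into a coarse motion state and fire a flash whenever the state changes. The thresholds and state labels are assumptions for the example, not the paper's exact decomposition.

```python
import numpy as np

def motion_state(R, t, rot_thresh=np.deg2rad(5), trans_thresh=0.05):
    """Label one inter-frame motion estimate (rotation matrix R,
    translation vector t) as 'turning', 'moving' or 'still'."""
    # Rotation angle from the trace of a 3x3 rotation matrix.
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if angle > rot_thresh:
        return "turning"
    return "moving" if np.linalg.norm(t) > trans_thresh else "still"

def flash_events(motions):
    """Sketch of the flash trigger: a flash fires whenever the decomposed
    motion state changes (e.g., straight corridor travel begins or ends)."""
    events, prev = [], None
    for i, (R, t) in enumerate(motions):
        state = motion_state(R, t)
        if state != prev:
            events.append(i)   # take a new 3D snapshot at this frame
        prev = state
    return events
```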

Place, publisher, year, edition, pages
IEEE, 2013
Series
International Conference on Image and Vision Computing New Zealand, ISSN 2151-2191
Keyword
indoor, local awareness, Semantic, simultaneous localization and mapping, SLAM, visual navigation
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:oru:diva-46474 (URN); 10.1109/IVCNZ.2013.6727059 (DOI); 000350291600080 (); 2-s2.0-84894293773 (Scopus ID); 978-1-4799-0882-0 (ISBN); 978-1-4799-0883-7 (ISBN)
Conference
28th International Conference on Image and Vision Computing New Zealand (IVCNZ), Wellington, New Zealand, November 27-29, 2013
Available from: 2015-05-25. Created: 2015-11-12. Last updated: 2018-01-10. Bibliographically approved.
8. Semantic Urban Maps
2014 (English). In: 22nd International Conference on Pattern Recognition: Proceedings, IEEE conference proceedings, 2014, p. 4050-4055. Conference paper, Published paper (Refereed)
Abstract [en]

A novel region-based 3D semantic mapping method is proposed for urban scenes. The proposed Semantic Urban Maps (SUM) method labels the regions of segmented images with a set of geometric and semantic classes simultaneously by employing a Markov Random Field based classification framework. The pixels in the labeled images are back-projected into a set of 3D point clouds using stereo disparity. The point clouds are registered together by incorporating the motion estimates, and a coherent semantic map representation is obtained. SUM is evaluated on five urban benchmark sequences and is demonstrated to successfully retrieve both geometric and semantic labels. A comparison with a relevant state-of-the-art method reveals that SUM is competitive and achieves better average pixel-wise accuracy.
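The back-projection step follows the standard pinhole stereo model, sketched below. The parameter names (focal length, principal point, baseline) are generic assumptions; the paper's calibration details are not reproduced here.

```python
import numpy as np

def back_project(u, v, disparity, f, cx, cy, baseline):
    """Back-project a labeled pixel (u, v) into camera coordinates using
    stereo disparity (pixels, must be > 0), focal length f (pixels),
    principal point (cx, cy) and stereo baseline (meters)."""
    z = f * baseline / disparity   # depth from the disparity equation
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.array([x, y, z])
```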

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
Series
International Conference on Pattern Recognition, ISSN 1051-4651
Keyword
semantic classification; semantic mapping; visual navigation
National Category
Signal Processing; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:oru:diva-46476 (URN); 10.1109/ICPR.2014.694 (DOI); 000359818004031 (); 2-s2.0-84919934206 (Scopus ID); 978-1-4799-5208-3 (ISBN)
Conference
22nd International Conference on Pattern Recognition (ICPR), Stockholm, Sweden, August 24-28, 2014
Available from: 2014-12-17. Created: 2015-11-12. Last updated: 2018-01-10. Bibliographically approved.

Open Access in DiVA

Thesis (27448 kB), 167 downloads
File information
File name: FULLTEXT01.pdf. File size: 27448 kB. Checksum: SHA-512
2c282570d27f99dcd2cc37f8c9eeebecef6319b15109ec505edf8499de8d327ae1ff5a54137ae21b096693af9df177d01a19a4ce2a2ced3ebc1c66f42905ea5a
Type: fulltext. Mimetype: application/pdf
