Self-supervised language grounding by active sensing combined with Internet acquired images and text
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2017 (English). In: Proceedings of the Fourth International Workshop on Recognition and Action for Scene Understanding (REACTS2017) / [ed] Jorge Dias, George Azzopardi, Rebeca Marf. Málaga: REACTS, 2017, p. 71-83. Conference paper, Published paper (Refereed)
Abstract [en]

For natural and efficient verbal communication between a robot and humans, the robot should be able to learn the names and appearances of new objects it encounters. In this paper we present a solution combining active sensing of images with text-based and image-based search on the Internet. The approach allows the robot to learn both the object name and how to recognise similar objects in the future, all self-supervised without human assistance. One part of the solution is a novel iterative method to determine the object name using image classification, acquisition of images from additional viewpoints, and Internet search. In this paper, the algorithmic part of the proposed solution is presented together with evaluations using manually acquired camera images, while Internet data was acquired through direct and reverse image search with Google, Bing, and Yandex. Classification with a multi-class SVM and five different feature settings was evaluated. With five object classes, the best performing classifier used a combination of Pyramid of Histogram of Visual Words (PHOW) and Pyramid of Histogram of Oriented Gradient (PHOG) features, and reached a precision of 80% and a recall of 78%.
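The iterative naming method described in the abstract (classify the current view, gather candidate labels via Internet reverse image search, and acquire further viewpoints until the name is resolved) can be sketched roughly as follows. This is a minimal illustration only: the function names, the label-voting scheme, and the consensus threshold are assumptions for exposition, not the authors' actual implementation.

```python
from collections import Counter

def name_object(views, search_labels, min_votes=3):
    """Hypothetical sketch of the iterative naming loop.

    views: iterable of view identifiers, consumed one at a time
           (active sensing: each iteration corresponds to acquiring
           an image from an additional viewpoint).
    search_labels: callable mapping a view to a list of candidate
           label strings, standing in for Internet reverse image search.
    min_votes: illustrative consensus threshold for stopping early.
    """
    votes = Counter()
    for view in views:
        # Accumulate Internet-acquired candidate names for this viewpoint.
        votes.update(search_labels(view))
        label, count = votes.most_common(1)[0]
        if count >= min_votes:
            # A single label dominates across viewpoints: stop sensing.
            return label
    # No consensus reached; fall back to the most frequent label, if any.
    return votes.most_common(1)[0][0] if votes else None
```

A usage example with a stubbed search function: given three viewpoints whose candidate labels mostly agree on "mug", the loop returns "mug" once the threshold is met, without needing further views.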

Place, publisher, year, edition, pages
Málaga: REACTS , 2017. p. 71-83
National Category
Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:umu:diva-138290
ISBN: 978-84-608-8176-6 (print)
OAI: oai:DiVA.org:umu-138290
DiVA id: diva2:1133829
Conference
Fourth International Workshop on Recognition and Action for Scene Understanding (REACTS2017), August 25, 2017, Ystad, Sweden
Available from: 2017-08-17. Created: 2017-08-17. Last updated: 2018-06-09. Bibliographically approved.

Open Access in DiVA

fulltext (5286 kB), 38 downloads
File information
File name: FULLTEXT01.pdf
File size: 5286 kB
Checksum (SHA-512): cd5edd63e74155531c29f76d69a259dbbcd15557ec9560c596f9ac9bc018545bb3f23ffdf612b707cd372e30ccb4ce311e62ba5cc296140d1e16c6c2bc1ba5b7
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Bensch, Suna; Hellström, Thomas
By organisation
Department of Computing Science
Computer Sciences; Computer Vision and Robotics (Autonomous Systems)

