Scale coding bag of deep features for human attribute and action recognition
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
University of Autonoma Barcelona, Spain.
Aalto University, Finland.
University of Florence, Italy.
2018 (English). In: Machine Vision and Applications, ISSN 0932-8092, E-ISSN 1432-1769, Vol. 29, no 1, p. 55-71. Article in journal (Refereed), Published.
Abstract [en]

Most approaches to human attribute and action recognition in still images are based on image representations in which multi-scale local features are pooled across scale into a single, scale-invariant encoding. In both bag-of-words and the recently popular representations based on convolutional neural networks, local features are computed at multiple scales. However, these multi-scale convolutional features are pooled into a single scale-invariant representation. We argue that entirely scale-invariant image representations are sub-optimal and investigate approaches to scale coding within a bag of deep features framework. Our approach encodes multi-scale information explicitly during the image encoding stage. We propose two strategies to encode multi-scale information explicitly in the final image representation. We validate our two scale coding techniques on five datasets: Willow, PASCAL VOC 2010, PASCAL VOC 2012, Stanford-40 and Human Attributes (HAT-27). On all datasets, the proposed scale coding approaches outperform both the scale-invariant method and the standard deep features of the same network. Further, combining our scale coding approaches with standard deep features leads to consistent improvement over the state of the art.
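
The contrast the abstract draws between scale-invariant pooling and explicit scale coding can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the two strategies, so the grouping of scales, the use of mean pooling, and all names below (`features_by_scale`, `scale_groups`, `scale_coded_encoding`) are assumptions made purely for illustration. The sketch only shows how pooling per scale group and concatenating preserves information that pooling over all scales discards.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(features: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length feature vectors."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def scale_invariant_encoding(features_by_scale: Dict[float, List[Vector]]) -> Vector:
    """Pool all local features into one vector, ignoring their scale."""
    all_features = [f for feats in features_by_scale.values() for f in feats]
    return mean_pool(all_features)


def scale_coded_encoding(
    features_by_scale: Dict[float, List[Vector]],
    scale_groups: Sequence[Sequence[float]],
) -> Vector:
    """Pool features separately per scale group and concatenate the results.

    The grouping is a hypothetical choice for this sketch, not taken from the paper.
    """
    encoding: Vector = []
    for group in scale_groups:
        group_features = [f for s in group for f in features_by_scale[s]]
        encoding.extend(mean_pool(group_features))
    return encoding


# Toy local features (2-D) extracted at three hypothetical scales.
features_by_scale = {
    0.5: [[1.0, 0.0]],
    1.0: [[0.0, 1.0]],
    2.0: [[1.0, 1.0], [3.0, 1.0]],
}
scale_groups = [[0.5], [1.0], [2.0]]  # assumed grouping: small, medium, large

invariant = scale_invariant_encoding(features_by_scale)
coded = scale_coded_encoding(features_by_scale, scale_groups)
print(invariant)  # one 2-D vector for the whole image
print(coded)      # one 2-D vector per scale group, concatenated (6-D)
```

In this sketch the scale-coded vector is three times longer than the scale-invariant one, which is the price of keeping per-scale information in the final representation.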

Place, publisher, year, edition, pages
Springer, 2018. Vol. 29, no 1, p. 55-71
Keywords [en]
Action recognition; Attribute recognition; Bag of deep features
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-144448
DOI: 10.1007/s00138-017-0871-1
ISI: 000419583600005
OAI: oai:DiVA.org:liu-144448
DiVA, id: diva2:1176581
Note

Funding agencies: Spanish Ministry of Science; Catalan project [2014 SGR 221]; CHISTERA project [PCIN-2015-251]; SSF [EMC2]; VR through the Strategic Area for ICT research ELLIIT [2016-05543]; Academy of Finland [251170]; Nvidia; [TIN2013-41751]; [TIN2016-79717-R]

Available from: 2018-01-22 Created: 2018-01-22 Last updated: 2018-02-21

Open Access in DiVA

Full text (3740 kB), 44 downloads
File information
File name: FULLTEXT01.pdf
File size: 3740 kB
Checksum (SHA-512): cb5483d60411d1f4a396ae30324630878e0e8b3e315de18832c2de615e5b48a4df38de2c2736a3e6debfe7309fd178b8014183d094603d3a874276c7b969accb
Type: fulltext
Mimetype: application/pdf

By author/editor
Khan, Fahad; Felsberg, Michael