Digitala Vetenskapliga Arkivet

Confidence Propagation through CNNs for Guided Sparse Depth Regression
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0003-3292-7153
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-6096-3648
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
2020 (English). In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, Vol. 42, no. 10. Article in journal (Refereed), Published
Abstract [en]

Generally, convolutional neural networks (CNNs) process data on a regular grid, e.g., data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open research problem with numerous applications in autonomous driving, robotics, and surveillance. In this paper, we propose an algebraically constrained normalized convolution layer for CNNs with highly sparse input, which requires fewer network parameters than related work. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. We also propose an objective function that simultaneously minimizes the data error and maximizes the output confidence. To integrate structural information, we also investigate fusion strategies for combining depth and RGB information in our normalized convolution network framework. In addition, we introduce the use of the output confidence as auxiliary information to improve the results. The capabilities of our normalized convolution network framework are demonstrated on the problem of scene depth completion. Comprehensive experiments are performed on the KITTI-Depth and NYU-Depth-v2 datasets. The results demonstrate that the proposed approach achieves superior performance while requiring only about 1-5% of the parameters of state-of-the-art methods.
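
The objective described above pairs a data term with a confidence term. The following is a minimal sketch of one possible form, not the paper's exact formulation. It assumes a squared-error data term averaged only over pixels with valid ground truth (depth ground truth is often incomplete) and subtracts the mean output confidence scaled by a weight `lam`. Both `lam` and the function name `confidence_aware_loss` are hypothetical.

```python
def confidence_aware_loss(pred, conf_out, target, valid, lam=0.1):
    """Hypothetical objective: masked squared error minus a reward on output confidence.

    pred, conf_out, target, valid are equal-length lists for a flattened image;
    valid[i] is 1 where ground truth exists and 0 elsewhere.
    """
    n_valid = sum(valid)
    # Data error: mean squared error over pixels that have ground truth only.
    data_err = sum(v * (p - t) ** 2 for p, t, v in zip(pred, target, valid)) / max(n_valid, 1)
    # Confidence term: subtracting it means that minimizing the loss raises confidence.
    mean_conf = sum(conf_out) / len(conf_out)
    return data_err - lam * mean_conf

# The second pixel has no ground truth, so its large error is ignored here.
print(confidence_aware_loss([1.0, 2.0], [0.5, 1.0], [1.0, 4.0], [1, 0]))
```

The weight `lam` trades accuracy against confidence. If it is too large, the optimizer is rewarded for confident outputs regardless of error. If it is too small, the confidence term has little effect.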

Place, publisher, year, edition, pages
IEEE, 2020. Vol. 42, no 10
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-161086
DOI: 10.1109/TPAMI.2019.2929170
ISI: 000567471300008
OAI: oai:DiVA.org:liu-161086
DiVA, id: diva2:1362784
Note

Funding agencies: Vinnova through grant CYCLA; Swedish Research Council [2018-04673]; VR starting grant [2016-05543]

Available from: 2019-10-21 Created: 2019-10-21 Last updated: 2025-02-07
In thesis
1. Uncertainty-Aware Convolutional Neural Networks for Vision Tasks on Sparse Data
2021 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Early computer vision algorithms operated on dense 2D images captured with conventional grayscale or color sensors. These sensors are passive: they provide limited scene representations based on reflected light and operate only under adequate lighting conditions. These limitations hindered the development of many computer vision algorithms that require some knowledge of the scene structure under varying conditions. The emergence of active sensors such as Time-of-Flight (ToF) cameras helped mitigate these limitations. However, they gave rise to novel challenges, such as data sparsity stemming from multi-path interference and occlusion.

Many approaches have been proposed to alleviate these challenges, either by enhancing the acquisition process of ToF cameras or by post-processing their output. Nonetheless, these approaches are sensor- and model-specific and require individual tuning for each sensor. Alternatively, learning-based approaches are an attractive solution to these problems: they learn a mapping from the original sensor output to a refined version of it. Convolutional Neural Networks (CNNs) are one example of powerful machine learning approaches, and they have demonstrated remarkable success on many computer vision tasks. Unfortunately, CNNs naturally operate on dense data and cannot efficiently handle sparse data from ToF sensors.

In this thesis, we propose a novel variant of CNNs, called Normalized Convolutional Neural Networks, that can handle sparse data directly and efficiently. First, we formulate a differentiable normalized convolution layer that takes sparse data and a confidence map as input. The confidence map tells the normalized convolution layer which pixels are valid and which are missing. The missing values are then interpolated from their valid neighbors. We then propose a confidence propagation criterion that allows normalized convolution layers to be cascaded, as in standard CNNs. We evaluated our approach on unguided scene depth completion and achieved state-of-the-art results with an exceptionally small network.
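
The layer and the confidence propagation described above can be sketched on a 1D signal. This is a minimal single-channel illustration, not the thesis implementation, and it rests on three assumptions:

- The applicability (filter) weights are made non-negative with a softplus.
- The output is the confidence-weighted average Σ a·c·x / Σ a·c.
- The propagated confidence is Σ a·c / Σ a, which is the share of the filter's weight that falls on confident inputs.

The names `nconv1d` and `softplus` are illustrative.

```python
import math

def softplus(w):
    # Numerically stable softplus: maps any real weight to a positive value,
    # which keeps the applicability (filter) non-negative.
    return max(w, 0.0) + math.log1p(math.exp(-abs(w)))

def nconv1d(x, c, raw_weights, eps=1e-8):
    """One single-channel normalized convolution layer on a 1D signal.

    x: signal values (arbitrary where missing), c: confidences in [0, 1],
    raw_weights: unconstrained filter weights (odd length). Zero padding at borders.
    Returns (output signal, propagated confidence).
    """
    a = [softplus(w) for w in raw_weights]
    k = len(a) // 2
    n = len(x)
    y, c_out = [], []
    for i in range(n):
        num = den = 0.0
        for j, a_j in enumerate(a):
            t = i + j - k
            if 0 <= t < n:
                num += a_j * c[t] * x[t]   # confidence-weighted data
                den += a_j * c[t]          # confidence-weighted filter mass
        y.append(num / (den + eps))        # normalized output; 0 where nothing is known
        c_out.append(den / sum(a))         # share of filter mass on confident input
    return y, c_out

# Two measured samples in a 7-sample signal; 0 confidence marks missing values.
x = [0.0, 0.0, 3.0, 0.0, 0.0, 5.0, 0.0]
c = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
w = [0.0, 0.0, 0.0]          # softplus(0) = ln 2 for every tap: a uniform 3-tap filter
for _ in range(2):           # cascade two layers, feeding confidence forward
    x, c = nconv1d(x, c, w)
print([round(v, 2) for v in x])  # [3.0, 3.0, 3.0, 3.67, 4.33, 5.0, 5.0]
```

After two layers every position has a value. The propagated confidence stays lower where fewer measurements contributed, for example about 1/9 at the left border versus 1/3 in the interior.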

As a second contribution, we investigate the fusion of a normalized convolution network with standard CNNs that operate on RGB images. We study different fusion schemes and provide a thorough analysis of the different components of the network. With our best fusion strategy, we achieve state-of-the-art results on guided depth completion using a remarkably small network.

Thirdly, to provide a statistical interpretation of the confidences, we derive a probabilistic framework for normalized convolutional neural networks. This framework estimates the input confidence in a self-supervised manner and propagates it to produce a statistically valid output confidence. Compared with existing approaches for uncertainty estimation in CNNs, such as Bayesian deep learning, our probabilistic framework provides higher-quality uncertainty estimates at a significantly lower computational cost.
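
The textbook weighted-averaging argument below shows one way a propagated confidence can carry a statistical meaning. It is an illustration under stated assumptions, not the thesis's derivation. Assume each input is $x_i = \mu + \varepsilon_i$ with independent zero-mean noise of variance $\sigma^2 / c_i$, so that confidence acts as an inverse variance. Also assume non-negative applicability weights $a_i$. The normalized convolution estimate is then unbiased, with variance:

```latex
\hat{\mu} = \frac{\sum_i a_i c_i x_i}{\sum_i a_i c_i}, \qquad
\mathbb{E}[\hat{\mu}] = \mu, \qquad
\operatorname{Var}[\hat{\mu}]
  = \frac{\sum_i (a_i c_i)^2 \, \sigma^2 / c_i}{\left(\sum_i a_i c_i\right)^2}
  = \sigma^2 \, \frac{\sum_i a_i^2 c_i}{\left(\sum_i a_i c_i\right)^2}.
```

With uniform applicability ($a_i = 1$), the variance reduces to $\sigma^2 / \sum_i c_i$. The output confidence, read as an inverse variance in units of $\sigma^2$, is then simply the sum of the input confidences.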

Finally, we apply our framework to a common task in CNNs, namely upsampling. We formulate upsampling as a sparse problem and solve it with normalized convolutional neural networks. Compared with existing approaches, our proposed upsampler is structure-aware while remaining lightweight. We test our upsampler with various optical flow estimation networks and show that it consistently improves the results. When integrated with a recent optical flow network, it sets a new state of the art on the most challenging optical flow dataset.
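
Treating upsampling as a sparse problem can be illustrated in 1D with three steps:

1. Scatter the low-resolution samples onto the fine grid with confidence 1.
2. Give every other position confidence 0.
3. Fill the gaps with normalized convolution.

This sketch assumes fixed, hand-picked non-negative taps (`taps`). Unlike the structure-aware upsampler described above, it does not adapt to image content. `upsample_as_sparse` is a hypothetical name.

```python
def upsample_as_sparse(low, factor, taps):
    """Hypothetical 1D upsampler: scatter samples onto a fine grid, then fill gaps
    with normalized convolution using fixed non-negative taps (odd length)."""
    n = len(low) * factor
    x = [0.0] * n
    c = [0.0] * n
    for i, v in enumerate(low):
        x[i * factor] = v      # known sample
        c[i * factor] = 1.0    # full confidence where a sample exists, 0 elsewhere
    k = len(taps) // 2
    out = []
    for i in range(n):
        num = den = 0.0
        for j, a in enumerate(taps):
            t = i + j - k
            if 0 <= t < n:
                num += a * c[t] * x[t]
                den += a * c[t]
        out.append(num / den if den > 0 else 0.0)  # 0 only if no sample is in reach
    return out

print(upsample_as_sparse([1.0, 3.0], 2, [0.5, 1.0, 0.5]))  # [1.0, 2.0, 3.0, 3.0]
```

With triangular taps such as `[0.5, 1.0, 0.5]` and `factor=2`, this reduces to linear interpolation between samples, with the last sample repeated at the border. A learned, content-dependent applicability is what would make such an upsampler structure-aware.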

Abstract [sv], translated into English

Early computer vision algorithms worked with dense 2D images recorded in grayscale or with color cameras. These are passive image sensors that, under favorable lighting conditions, provide a limited scene representation based only on light flux. These limitations hampered the development of the many computer vision algorithms that require information about the scene structure under varying lighting conditions. The development of active sensors such as Time-of-Flight (ToF) cameras helped alleviate these limitations. However, these instead gave rise to many new challenges, such as processing sparse data resulting from multi-path interference and occlusion.

Attempts have been made to tackle these challenges by improving the acquisition process in ToF cameras or by post-processing their data. However, previously proposed methods have been sensor-specific or even model-specific, requiring each individual sensor to be tuned. An attractive alternative is learning-based methods, where one instead learns the relationship between the sensor data and an improved version of it. A powerful example of learning-based methods is convolutional neural networks (CNNs). These have been extremely successful in computer vision, but unfortunately they assume dense data and therefore cannot efficiently process the sparse data of ToF sensors.

In this thesis, we propose a new variant of convolutional networks that we call Normalized Convolutional Neural Networks, which can work directly with sparse data. First, we create a differentiable network layer based on normalized convolution that takes sparse data and a confidence map as input. The confidence map contains information about which pixels have measurements and which lack them. The module then interpolates the pixels lacking measurements from neighboring pixels for which measurements exist. Next, we propose a criterion for propagating confidence, which allows us to build a cascade of normalized convolution layers corresponding to the cascade of convolutional layers in a CNN. We evaluated the method on scene depth completion without color images and achieved state-of-the-art performance with a very small network.

As a second contribution, we investigated the fusion of normalized convolutional networks with conventional convolutional networks operating on ordinary color images. We examine different ways of fusing the networks and provide a thorough analysis of the different network components. The best fusion method achieves state-of-the-art performance on scene depth completion with color images, again with a very small network.

As a third contribution, we attempt to interpret the predictions of the normalized convolutional network statistically. For this purpose we derive a statistical framework in which the normalized convolutional network learns, through self-supervised learning, to estimate confidences and propagate them into a statistically correct probability. Compared with existing methods for predicting uncertainty in CNNs, for example Bayesian deep learning, our probabilistic framework gives better estimates at a lower computational cost.

Finally, we attempt to use our framework for a task commonly solved with ordinary CNNs, namely upsampling. We formulate the upsampling problem as if we had received sparse data and solve it with normalized convolutional networks. Compared with existing methods, the proposed method is both aware of local image structure and lightweight. We test our upsampler with various optical flow networks and show that it consistently improves the results. When we integrate it with a recently proposed optical flow network, we outperform all existing methods for optical flow estimation.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2021. p. 59
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2123
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:liu:diva-175307 (URN)
10.3384/diss.diva-175307 (DOI)
9789179297015 (ISBN)
Public defence
2021-06-18, Online through Zoom (contact carina.e.lindstrom@liu.se) and Ada Lovelace, B Building, Campus Valla, Linköping, 13:00 (English)
Funder
Swedish Research Council, 2018-04673
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2021-05-26 Created: 2021-04-28 Last updated: 2025-02-07 Bibliographically approved

Authors
Eldesokey, Abdelrahman; Felsberg, Michael; Khan, Fahad Shahbaz