Unsupervised monocular depth estimation with multi-scale structural similarity powered loss function / Ali Kohan


Bibliographic Details
Main Author: Ali, Kohan
Format: Thesis
Published: 2020
Subjects:
Online Access:http://studentsrepo.um.edu.my/14369/2/Ali_Kohan.pdf
http://studentsrepo.um.edu.my/14369/1/Ali_Kohan.pdf
http://studentsrepo.um.edu.my/14369/
Description
Summary: Depth estimation refers to a set of techniques and algorithms that aim to obtain a representation of the spatial information of a scene. Today, specialized hardware such as sensors, radars, and multiple-view recording cameras is used to acquire depth data of a scene. Modern approaches address this task with deep learning, attempting to learn depth information in a supervised manner. However, this approach requires a large amount of ground-truth data for a particular scene before a model can be trained successfully, and preparing ground-truth data for a range of environments is a challenging and expensive task. Most recent works have therefore proposed self-supervised learning approaches: they implicitly infer the target data from a stereo pair of images and use that self-obtained target to train a deep neural network to learn the disparities between the two views. The disparity between two horizontal views of the same object describes how far that object shifts along the horizontal line from one view to the other; predicting disparities therefore allows the depth of the scene to be calculated using simple geometric formulas. This approach, however, has shown flaws on specular and transparent surfaces, for which it tends to predict inconsistent depth. In this work a novel training objective is proposed, in which a deep convolutional neural network learns to predict depth from a single image while improving the quality of depth prediction for specular and transparent surfaces. The proposed method follows previous works that reconstruct the right view of a scene given the left one. In addition, having considered the importance of loss layers in the performance of neural networks, it introduces a new image reconstruction and matching loss function aimed at improving depth estimation consistency on specular and transparent surfaces.
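The "simple geometric formulas" mentioned above refer to the standard rectified-stereo relation, depth = f · B / d, where f is the focal length in pixels, B the camera baseline, and d the predicted disparity. A minimal sketch of that conversion follows; the focal length and baseline values in the example are illustrative KITTI-like numbers, not values taken from the thesis.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Convert a horizontal disparity map (in pixels) to metric depth.

    For a rectified stereo pair: depth = f * B / d, where f is the focal
    length in pixels, B the baseline in metres, and d the disparity.
    A small eps guards against division by zero on zero-disparity pixels.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    return focal_length_px * baseline_m / np.maximum(disparity, eps)

# Illustrative KITTI-like calibration: f ~ 721 px, B ~ 0.54 m.
# Larger disparities map to nearer (smaller) depths.
depth = disparity_to_depth(np.array([10.0, 50.0, 100.0]), 721.0, 0.54)
```

Note the inverse relationship: objects that shift more between the two views (large d) are closer to the cameras.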
The proposed loss function is perceptually motivated, modeling aspects of the human visual system, on the assumption that this will improve image reconstruction quality while preserving the key structures of a scene. The expectation is that this will directly benefit depth prediction and thereby resolve the aforementioned deficiencies of prior work.
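The abstract does not spell out the loss formulation, but the title points to a multi-scale structural similarity (MS-SSIM) term. As a hedged sketch of how such a perceptually motivated reconstruction loss is commonly assembled in self-supervised depth estimation, the snippet below blends a structural-similarity term with an L1 photometric term; the global (whole-image) SSIM window, and the weight alpha = 0.85, are assumptions for illustration rather than details from the thesis.

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed with a single global window, for images in [0, 1].

    Real MS-SSIM implementations use local (e.g. Gaussian) windows at
    several scales; this whole-image version only illustrates the
    structure of the SSIM formula.
    """
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

def reconstruction_loss(reconstructed, target, alpha=0.85):
    """Blend a structural-similarity term with an L1 photometric term.

    The weight alpha = 0.85 follows common self-supervised depth
    objectives and is an assumption, not a value stated in the abstract.
    """
    ssim_term = (1.0 - ssim_global(reconstructed, target)) / 2.0
    l1_term = np.abs(reconstructed - target).mean()
    return alpha * ssim_term + (1.0 - alpha) * l1_term
```

A perfect reconstruction yields SSIM = 1 and a loss of zero; degrading either pixel intensities or local structure increases the loss, which is what makes the term attractive for preserving scene structure during view reconstruction.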