Regularization of deep neural network with batch contrastive loss

Bibliographic Details
Main Authors: Tanveer, Muhammad, Tan, Hung-Khoon, Ng, Hui-Fuang, Leung, Maylor Karhang, Chuah, Joon Huang
Format: Article
Published: Institute of Electrical and Electronics Engineers 2021
Subjects:
Online Access:http://eprints.um.edu.my/28105/
Description
Summary: Neural networks have become deeper in recent years, and this has improved their capacity to handle more complex tasks. However, deeper networks have more parameters and are more prone to overfitting, especially when training samples are insufficient. In this paper, we present a new regularization technique called batch contrastive regularization to improve generalization performance. The loss function is based on contrastive loss, which enforces intra-class compactness and inter-class separability of batch samples. We explore three different contrastive losses: (1) the center contrastive loss, which regularizes based on distances between data points and their corresponding class centroids; (2) the sample contrastive loss, which is based on distances between sample pairs within a batch; and (3) the multicenter loss, which is similar to the center contrastive loss except that the cluster centers are discovered during training. The proposed network has two heads, one for classification and the other for regularization; the regularization head is discarded during inference. We also introduce bag sampling to ensure that all classes in a batch are well represented. The performance of the proposed architecture is evaluated on the CIFAR-10 and CIFAR-100 datasets. Our experiments show that networks regularized by batch contrastive loss display impressive generalization performance over a wide variety of classes, yielding more than 11% improvement for ResNet50 on CIFAR-100 when trained from scratch.
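
The abstract describes the center contrastive loss as regularizing the distance between each data point and its class centroid while keeping classes separable. The snippet below is a minimal sketch of that idea in a PyTorch setting, not the authors' implementation; the class name `CenterContrastiveLoss`, the hinge-style centroid-separation term, and the `margin` hyperparameter are illustrative assumptions.

```python
# Sketch of a center-contrastive regularizer (assumed formulation, not the paper's code):
# pull embeddings toward their own class centroid, push distinct centroids apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterContrastiveLoss(nn.Module):
    def __init__(self, num_classes, feat_dim, margin=1.0):
        super().__init__()
        # One learnable centroid per class.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin = margin

    def forward(self, features, labels):
        # Intra-class compactness: squared distance to the sample's own centroid.
        own_centers = self.centers[labels]                      # (B, D)
        intra = ((features - own_centers) ** 2).sum(dim=1).mean()

        # Inter-class separability: hinge keeping distinct centroids
        # at least `margin` apart (illustrative choice).
        dists = torch.cdist(self.centers, self.centers)         # (C, C)
        off_diag = dists[~torch.eye(len(self.centers), dtype=torch.bool)]
        inter = F.relu(self.margin - off_diag).mean()

        return intra + inter
```

In the two-head architecture the abstract outlines, such a term would be computed on the regularization head's embeddings and added to the classification head's cross-entropy loss during training, with the regularization head dropped at inference.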