Loop and distillation: Attention weights fusion transformer for fine‐grained representation

Bibliographic Details
Main Authors: Sun, Fayou, Ngo, Hea Choon, Meng, Zuqiang, Sek, Yong Wee
Format: Article
Language:English
Published: John Wiley & Sons Ltd 2023
Online Access:http://eprints.utem.edu.my/id/eprint/27758/2/0130221062024102412871.pdf
http://eprints.utem.edu.my/id/eprint/27758/
https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/cvi2.12181
https://doi.org/10.1049/cvi2.12181
Description
Summary: Learning subtle discriminative feature representations plays a significant role in Fine-Grained Visual Categorisation (FGVC). The vision transformer (ViT) achieves promising performance in traditional image classification thanks to its multi-head self-attention mechanism. Unfortunately, ViT cannot effectively capture the critical feature regions needed for FGVC, because it focuses only on the classification token and processes the image in a single pass. In addition, the benefit of attention-weight fusion has not been exploited in ViT. To improve the capture of vital regions for FGVC, the authors propose a novel model named RDTrans, which locates the most discriminative region with top priority in a recurrent learning manner. Specifically, the vital regions proposed at each scale are cropped and amplified to serve as the input to the next scale, ultimately localising the most discriminative region. Furthermore, a distillation learning method is employed to provide better supervision and improve generalisation. At the same time, RDTrans can easily be trained end-to-end in a weakly supervised fashion. Extensive experiments demonstrate that RDTrans yields state-of-the-art performance on four widely used fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and iNat2017.
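
Since this record only reproduces the abstract, the following is a minimal PyTorch-style sketch (not the authors' released code) of the recurrent crop-and-amplify loop with distillation supervision described above. The backbone interface (returning logits plus patch attention weights), the crop_and_zoom helper, and all hyper-parameters are assumptions for illustration only.

```python
# Hypothetical sketch of recurrent region cropping + distillation, assuming a
# ViT-like backbone that returns (logits, per-patch attention weights).
import torch
import torch.nn.functional as F

def crop_and_zoom(images, attn, patch_grid=14, keep_ratio=0.5):
    """Crop the most-attended square region and resize it back to full
    resolution so it can be fed to the next scale (assumed behaviour)."""
    b, c, h, w = images.shape
    # Reshape patch attention into a 2-D map and upsample to pixel resolution.
    attn_map = attn.reshape(b, 1, patch_grid, patch_grid)
    attn_map = F.interpolate(attn_map, size=(h, w), mode="bilinear", align_corners=False)
    side = int(h * keep_ratio)
    crops = []
    for i in range(b):
        # Centre the crop on the attention peak, clamped to the image borders.
        flat_idx = int(attn_map[i, 0].argmax())
        cy, cx = flat_idx // w, flat_idx % w
        y0 = min(max(cy - side // 2, 0), h - side)
        x0 = min(max(cx - side // 2, 0), w - side)
        crop = images[i:i + 1, :, y0:y0 + side, x0:x0 + side]
        crops.append(F.interpolate(crop, size=(h, w), mode="bilinear", align_corners=False))
    return torch.cat(crops, dim=0)

def recurrent_forward(backbone, images, num_scales=3):
    """Run the backbone at several scales; each scale's attention weights
    select the region that is amplified as the next scale's input."""
    logits_per_scale = []
    x = images
    for _ in range(num_scales):
        logits, attn = backbone(x)   # assumed interface: (logits, patch attention)
        logits_per_scale.append(logits)
        x = crop_and_zoom(x, attn)
    return logits_per_scale

def loss_fn(logits_per_scale, labels, temperature=3.0, alpha=0.5):
    """Cross-entropy at every scale plus a distillation term in which the
    final (most zoomed-in) prediction supervises the earlier scales."""
    ce = sum(F.cross_entropy(l, labels) for l in logits_per_scale)
    teacher = F.softmax(logits_per_scale[-1].detach() / temperature, dim=-1)
    kd = sum(
        F.kl_div(F.log_softmax(l / temperature, dim=-1), teacher, reduction="batchmean")
        for l in logits_per_scale[:-1]
    ) * temperature ** 2
    return ce + alpha * kd
```

This keeps the whole pipeline differentiable and trainable end-to-end from image-level labels alone, which is consistent with the weakly supervised training described in the abstract; the exact attention-fusion and region-selection details are given in the paper itself.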