Category-aware Allocation Transformer for Weakly Supervised Object Localization (ICCV 2023)

PyTorch implementation of "Category-aware Allocation Transformer for Weakly Supervised Object Localization".

📋 Table of Contents

  1. 📎 Paper Link
  2. 💡 Abstract
  3. 📖 Method
  4. 📃 Requirements
  5. ✏️ Usage
    1. Start
    2. Prepare Datasets
    3. Model Zoo
    4. Training
    5. Inference
  6. 🔍 Citation
  7. ❤️ Acknowledgement

📎 Paper Link

Category-aware Allocation Transformer for Weakly Supervised Object Localization (link)

  • Authors: Zhiwei Chen, Jinren Ding, Liujuan Cao, Yunhang Shen, Shengchuan Zhang, Guannan Jiang, Rongrong Ji
  • Institutions: Xiamen University, Xiamen, China; Tencent Youtu Lab, Shanghai, China; CATL, China

💡 Abstract

Weakly supervised object localization (WSOL) aims to localize objects based on only image-level labels as supervision. Recently, transformers have been introduced into WSOL, yielding impressive results. The self-attention mechanism and multilayer perceptron structure in transformers preserve long-range feature dependency, facilitating complete localization of the full object extent. However, current transformer-based methods predict bounding boxes using category-agnostic attention maps, which may lead to confused and noisy object localization. To address this issue, we propose a novel Category-aware Allocation TRansformer (CATR) that learns category-aware representations for specific objects and produces corresponding category-aware attention maps for object localization. First, we introduce a Category-aware Stimulation Module (CSM) to induce learnable category biases for self-attention maps, providing auxiliary supervision to guide the learning of more effective transformer representations. Second, we design an Object Constraint Module (OCM) to refine the object regions for the category-aware attention maps in a self-supervised manner. Extensive experiments on the CUB-200-2011 and ILSVRC datasets demonstrate that the proposed CATR achieves significant and consistent performance improvements over competing approaches.

📖 Method


The architecture of the proposed CATR. It consists of a vision transformer backbone, a category-aware stimulation module (CSM), and an object constraint module (OCM).
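
To make the category-aware idea concrete, below is a minimal PyTorch sketch of self-attention biased by a learnable category embedding, in the spirit of CSM. All names (CategoryAwareAttention, category_embed) are hypothetical and the logic is a simplification, not the authors' implementation: each key token is scored against an embedding of the image-level category, and that score shifts the attention logits so the resulting attention map depends on the category.

import torch
import torch.nn as nn

class CategoryAwareAttention(nn.Module):
    """Illustrative sketch (hypothetical names): self-attention whose logits
    are shifted by a learnable per-category embedding."""

    def __init__(self, dim=384, num_heads=6, num_classes=200):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learnable embedding per category, split across attention heads.
        self.category_embed = nn.Embedding(num_classes, dim)

    def forward(self, x, labels):
        # x: (B, N, C) patch tokens; labels: (B,) image-level class indices
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, N)
        # Score every key token against the labeled category's embedding and
        # add the result to all rows of the logits: tokens matching the
        # category receive more attention, making the map category-aware.
        c = self.category_embed(labels).reshape(B, self.num_heads, 1, self.head_dim)
        bias = (c @ k.transpose(-2, -1)) * self.scale  # (B, heads, 1, N)
        attn = (attn + bias).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out), attn  # attn can be pooled into a localization map

layer = CategoryAwareAttention()
x = torch.randn(2, 196, 384)  # 14x14 patch tokens from a 224x224 image
out, attn = layer(x, torch.tensor([3, 7]))

In CATR itself, the category-aware attention maps are additionally refined by OCM in a self-supervised manner before localization.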

📃 Requirements

  • PyTorch==1.10.1
  • torchvision==0.11.2
  • timm==0.4.12
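
Assuming a pip-based environment, the pinned versions can be installed with the following (choose a PyTorch build matching your CUDA version if needed):

pip install torch==1.10.1 torchvision==0.11.2 timm==0.4.12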

✏️ Usage

Start

git clone git@github.com:zhiweichen0012/CATR.git
cd CATR

Prepare Datasets

The directory structure follows the standard layout of torchvision's datasets.ImageFolder: the training and validation data are expected in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg

Model Zoo

We provide the trained CATR models.

Name                            Loc. Acc@1   Loc. Acc@5   URL
CATR_CUB (This repository)      80.066       91.992       model
CATR_ILSVRC (This repository)   56.976       66.794       model

Training

To train CATR on CUB with 4 GPUs, run:

bash scripts/train.sh deit_small_patch16_224_CATR_cub CUB 80 output_ckpt/CUB

To train CATR on ILSVRC with 4 GPUs, run:

bash scripts/train.sh deit_small_patch16_224_CATR_imnet IMNET 14 output_ckpt/IMNET

NOTE: Please check the paths to the "torchrun" command, the dataset, and the pre-training weights in the scripts/train.sh.

Inference

To test the CUB models, you can run:

bash scripts/test.sh deit_small_patch16_224_CATR_cub CUB /path/to/CATR_CUB_model

To test the ILSVRC models, you can run:

bash scripts/test.sh deit_small_patch16_224_CATR_imnet IMNET /path/to/CATR_IMNET_model

NOTE: Please check the paths to the "python" command and the dataset in the scripts/test.sh.

🔍 Citation

@inproceedings{chen2023category,
  title={Category-aware Allocation Transformer for Weakly Supervised Object Localization},
  author={Chen, Zhiwei and Ding, Jinren and Cao, Liujuan and Shen, Yunhang and Zhang, Shengchuan and Jiang, Guannan and Ji, Rongrong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={6643--6652},
  year={2023}
}

❤️ Acknowledgement

We use DeiT and its pre-trained weights as the backbone. Many thanks for this brilliant work!
