Category-aware Allocation Transformer for Weakly Supervised Object Localization (ICCV 2023)

PyTorch implementation of "Category-aware Allocation Transformer for Weakly Supervised Object Localization".

📋 Table of Contents

  1. 📎 Paper Link
  2. 💡 Abstract
  3. 📖 Method
  4. 📃 Requirements
  5. ✏️ Usage
    1. Start
    2. Prepare Datasets
    3. Model Zoo
    4. Training
    5. Inference
  6. 🔍 Citation
  7. ❤️ Acknowledgement

📎 Paper Link

Category-aware Allocation Transformer for Weakly Supervised Object Localization (link)

  • Authors: Zhiwei Chen, Jinren Ding, Liujuan Cao, Yunhang Shen, Shengchuan Zhang, Guannan Jiang, Rongrong Ji
  • Institutions: Xiamen University, Xiamen, China; Tencent Youtu Lab, Shanghai, China; CATL, China

💡 Abstract

Weakly supervised object localization (WSOL) aims to localize objects based on only image-level labels as supervision. Recently, transformers have been introduced into WSOL, yielding impressive results. The self-attention mechanism and multilayer perceptron structure in transformers preserve long-range feature dependency, facilitating complete localization of the full object extent. However, current transformer-based methods predict bounding boxes using category-agnostic attention maps, which may lead to confused and noisy object localization. To address this issue, we propose a novel Category-aware Allocation TRansformer (CATR) that learns category-aware representations for specific objects and produces corresponding category-aware attention maps for object localization. First, we introduce a Category-aware Stimulation Module (CSM) to induce learnable category biases for self-attention maps, providing auxiliary supervision to guide the learning of more effective transformer representations. Second, we design an Object Constraint Module (OCM) to refine the object regions for the category-aware attention maps in a self-supervised manner. Extensive experiments on the CUB-200-2011 and ILSVRC datasets demonstrate that the proposed CATR achieves significant and consistent performance improvements over competing approaches.

📖 Method


The architecture of the proposed CATR. It consists of a vision transformer backbone, a category-aware stimulation module (CSM), and an object constraint module (OCM).
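
To make the category-aware idea concrete, below is a minimal PyTorch sketch of self-attention biased by a learnable category embedding, in the spirit of CSM. All names (CategoryAwareAttention, category_embed) are hypothetical and the logic is a simplification, not the authors' implementation: each key token is scored against an embedding of the image-level category, and that score shifts the attention logits so the resulting attention map depends on the category.

import torch
import torch.nn as nn

class CategoryAwareAttention(nn.Module):
    """Illustrative sketch (hypothetical names): self-attention whose logits
    are shifted by a learnable per-category embedding."""

    def __init__(self, dim=384, num_heads=6, num_classes=200):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learnable embedding per category, split across attention heads.
        self.category_embed = nn.Embedding(num_classes, dim)

    def forward(self, x, labels):
        # x: (B, N, C) patch tokens; labels: (B,) image-level class indices
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, N)
        # Score every key token against the labeled category's embedding and
        # add the result to all rows of the logits: tokens matching the
        # category receive more attention, making the map category-aware.
        c = self.category_embed(labels).reshape(B, self.num_heads, 1, self.head_dim)
        bias = (c @ k.transpose(-2, -1)) * self.scale  # (B, heads, 1, N)
        attn = (attn + bias).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out), attn  # attn can be pooled into a localization map

layer = CategoryAwareAttention()
x = torch.randn(2, 196, 384)  # 14x14 patch tokens from a 224x224 image
out, attn = layer(x, torch.tensor([3, 7]))

In CATR itself, the category-aware attention maps are additionally refined by OCM in a self-supervised manner before localization.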

📃 Requirements

  • PyTorch==1.10.1
  • torchvision==0.11.2
  • timm==0.4.12
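
Assuming a pip-based environment, the pinned versions can be installed with the following (choose a PyTorch build matching your CUDA version if needed):

pip install torch==1.10.1 torchvision==0.11.2 timm==0.4.12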

✏️ Usage

Start

git clone git@github.com:zhiweichen0012/CATR.git
cd CATR

Prepare Datasets

The directory structure follows the standard layout of torchvision's datasets.ImageFolder: the training and validation data are expected in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg

Model Zoo

We provide the trained CATR models.

Name                            Loc. Acc@1   Loc. Acc@5   URL
CATR_CUB (This repository)      80.066       91.992       model
CATR_ILSVRC (This repository)   56.976       66.794       model

Training

To train CATR on CUB with 4 GPUs, run:

bash scripts/train.sh deit_small_patch16_224_CATR_cub CUB 80 output_ckpt/CUB

To train CATR on ILSVRC with 4 GPUs, run:

bash scripts/train.sh deit_small_patch16_224_CATR_imnet IMNET 14 output_ckpt/IMNET

NOTE: Please check the paths to the "torchrun" command, the dataset, and the pre-training weights in the scripts/train.sh.

Inference

To test the CUB models, you can run:

bash scripts/test.sh deit_small_patch16_224_CATR_cub CUB /path/to/CATR_CUB_model

To test the ILSVRC models, you can run:

bash scripts/test.sh deit_small_patch16_224_CATR_imnet IMNET /path/to/CATR_IMNET_model

NOTE: Please check the paths to the "python" command and the dataset in the scripts/test.sh.

🔍 Citation

@inproceedings{chen2023category,
  title={Category-aware Allocation Transformer for Weakly Supervised Object Localization},
  author={Chen, Zhiwei and Ding, Jinren and Cao, Liujuan and Shen, Yunhang and Zhang, Shengchuan and Jiang, Guannan and Ji, Rongrong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={6643--6652},
  year={2023}
}

❤️ Acknowledgement

We use DeiT and its pre-trained weights as the backbone. Many thanks for this brilliant work!
