Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception

This is the official implementation of our CVPR 2024 paper "Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception".

paper

Updates

[2024-05-15] We are preparing the code and expect to release it before June 19.
[2024-06-11] Initial code release.

Requirements

python=3.9
pytorch=2.1.0
lightning=2.1.0

conda create -n py39_pyt210_cu118 python==3.9 -y
conda activate py39_pyt210_cu118

# install pytorch==2.1.0
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia -y
# or, using pip:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118

# install MinkowskiEngine following https://github.com/NVIDIA/MinkowskiEngine
# for example:
pip install ninja
git clone https://github.com/NVIDIA/MinkowskiEngine
cd MinkowskiEngine
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas

# install torch-scatter
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.1.0+cu118.html

pip install -r requirements.txt
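After installation, a quick version check can catch a mismatched environment before a long pre-training run. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes the distributions are registered under the names `torch` and `lightning`.

```python
from importlib import metadata

# Versions listed in the Requirements section above.
# The distribution names used as keys are assumptions.
EXPECTED = {"torch": "2.1.0", "lightning": "2.1.0"}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore a local build tag such as "+cu118" when comparing.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        print(f"{name}: installed={installed} ok={ok}")
```

A package that is not installed is reported as `(None, False)` rather than raising, so the whole list is checked in one pass.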

Datasets

Download the nuScenes dataset from the official link and put it in {project_root}/datasets/nuscenes
Download the superpixel archive superpixels_dinov2_ade20k.zip from the BAIDU link and unzip it under {project_root}/superpixels/nuscenes

The project structure should look like this:

{project_root}
|--config
|--downstream
|--model
|--pretrain
|--utils
|--datasets
  |--nuscenes
    |--samples
    |--sweeps
    |--lidarseg
    |--nuScenes-panoptic-v1.0-all
    |--v1.0-trainval
|--superpixels
  |--nuscenes
    |--superpixels_dinov2_ade20k
|--...
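The layout above can be checked programmatically before launching any experiment. This is a minimal sketch under stated assumptions: the helper `missing_dirs` is hypothetical and not part of the repository, and it only tests that the listed directories exist, not that their contents are complete.

```python
from pathlib import Path

# Relative directories taken from the project tree above.
EXPECTED_DIRS = [
    "datasets/nuscenes/samples",
    "datasets/nuscenes/sweeps",
    "datasets/nuscenes/lidarseg",
    "datasets/nuscenes/nuScenes-panoptic-v1.0-all",
    "datasets/nuscenes/v1.0-trainval",
    "superpixels/nuscenes/superpixels_dinov2_ade20k",
]


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Run from the project root; prints one line per missing directory.
    for rel in missing_dirs("."):
        print(f"missing: {rel}")
```

An empty result means every directory in the tree was found.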

Experiments

3D Semantic Segmentation

# 1. pre-train the 3d backbone MinkUNet
CUDA_VISIBLE_DEVICES=0,1 python pretrain_cluster_prototype.py --cfg config/pretrain/csc_minkunet_dinov2_g2b16.yaml
# the pre-trained weights ({pretrain_weights_path}) are saved to `{project_root}/output/pretrain/nuscenes/cp/v1_1/{year}_{month}_{day}_{hour}_{minute}/final_model_cp_v1_1.pt`

# 2. fine-tune the 3d backbone using our provided script
sh downstream_semseg_finetune.sh 0,1 {pretrain_weights_path} csc_sem_seg
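Because each pre-training run writes to a timestamped directory, it can be convenient to look up the most recent weights file instead of copying the path by hand. The sketch below is an illustration only: `latest_run_weights` is a hypothetical helper, and it assumes the run directories are named with five numeric fields as in the `{year}_{month}_{day}_{hour}_{minute}` pattern above.

```python
from pathlib import Path


def latest_run_weights(run_root, filename="final_model_cp_v1_1.pt"):
    """Return the weights file from the most recent timestamped run, or None."""
    candidates = []
    for run_dir in Path(run_root).iterdir():
        parts = run_dir.name.split("_")
        # Keep only directories named like {year}_{month}_{day}_{hour}_{minute}.
        if run_dir.is_dir() and len(parts) == 5 and all(p.isdigit() for p in parts):
            weights = run_dir / filename
            if weights.is_file():
                # Compare timestamps numerically so zero padding does not matter.
                candidates.append((tuple(int(p) for p in parts), weights))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Assumed location, following the comment in step 1.
    root = Path("output/pretrain/nuscenes/cp/v1_1")
    print(latest_run_weights(root) if root.is_dir() else "no runs found")
```

The returned path can then be passed as `{pretrain_weights_path}` to the fine-tuning script. For other output layouts, pass a different `run_root` and `filename`.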

3D Object Detection

# 1. pre-train the 3d backbone VoxelNet
CUDA_VISIBLE_DEVICES=0,1 python pretrain_cluster_prototype.py --cfg config/pretrain/csc_voxelnet_dinov2_g2b16.yaml

# 2. fine-tune the VoxelNet using OpenPCDet: https://github.com/open-mmlab/OpenPCDet
# Please refer to TriCC: https://openaccess.thecvf.com/content/CVPR2023/html/Pang_Unsupervised_3D_Point_Cloud_Representation_Learning_by_Triangle_Constrained_Contrast_CVPR_2023_paper.html

3D Panoptic Segmentation

# 1. pre-train the 3d backbone Cylinder3D
CUDA_VISIBLE_DEVICES=0,1 python pretrain_cluster_prototype.py --cfg_file config/pretrain/csc_cylinder3d_dinov2_g2b16.yaml
# the pre-trained weights ({pretrain_weights_path}) are saved to `{project_root}/output/pretrain/cp_V1_1/panoptic_polarnet_cylinder3d/dinov2_ade20k/{year}_{month}_{day}_{hour}_{minute}/model.pt`

# 2. fine-tune the 3d backbone using our provided script
sh downstream_panseg_finetune.sh 0,1 {pretrain_weights_path} csc_pan_seg

Acknowledgement

The codebase is adapted from SLidR.

Citation

@InProceedings{chen2024building,
   title={Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception},
   author={Chen, Haoming and Zhang, Zhizhong and Qu, Yanyun and Zhang, Ruixin and Tan, Xin and Xie, Yuan},
   booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
   month = {June},
   year= {2024}
}
