- Oct-03-24: AgriCLIP paper, pretraining dataset, and the code are released.
We present AgriCLIP, a vision-language foundational model dedicated to the domain of agriculture and livestock. First, we propose a large-scale dataset, named ALive, that leverages a customized prompt generation strategy to overcome the scarcity of expert annotations. Our ALive dataset covers crops, livestock, and fisheries, with around 600,000 image-text pairs. Second, we propose a training pipeline that integrates both contrastive and self-supervised learning to learn both global semantic and local fine-grained domain-specialized features. Experiments on a diverse set of 20 downstream tasks demonstrate the effectiveness of the AgriCLIP framework.
- Our primary contribution is the creation of a large, diverse image-text dataset derived solely from vision-based agricultural datasets.
- Our second contribution is a training pipeline that combines image-text contrastive and image-only self-supervised learning to boost global semantic features with fine-grained visual details.
- We follow a three-stage training pipeline combining contrastive learning, DINO-based training, and encoder alignment to capture both global semantic and local fine-grained features.
- We conduct a comprehensive evaluation on diverse downstream tasks, demonstrating AgriCLIP's effectiveness in zero-shot settings.
We gather 25 training datasets across crops, fish, and livestock, creating the Agriculture and Livestock (ALive) dataset with 600k images covering a wide range of conditions. This includes various crop growth stages, classifications, and different farming environments for animals and fish. Next, we design a customized prompt generation strategy in which dataset- and class-level information is leveraged to provide context and fine-grained details for each image. For instance, instead of using a generic CLIP prompt like “a photo of a boron-deficient leaf,” we craft prompts like “a photo of a leaf with boron deficiency, characterized by yellow patches and curled edges.” We then use GPT-4 to generate diverse variations of these prompts.
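The class-level prompt composition can be sketched as below. The class names and symptom descriptions here are illustrative stand-ins; the released code and the GPT-4 paraphrasing step may differ.

```python
# Hypothetical mapping from class names to fine-grained symptom
# descriptions (illustrative, not the actual ALive metadata).
CLASS_DETAILS = {
    "boron deficiency": "yellow patches and curled edges",
    "potassium deficiency": "brown scorching along the leaf margins",
}

def build_prompt(class_name: str, subject: str = "leaf") -> str:
    """Compose a fine-grained prompt from class-level information."""
    detail = CLASS_DETAILS[class_name]
    return f"a photo of a {subject} with {class_name}, characterized by {detail}"

print(build_prompt("boron deficiency"))
# Each base prompt can then be passed to GPT-4 to generate diverse
# paraphrased variations per image.
```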
📥 Download the Pre-Training Dataset: Access our pre-training dataset: ALive Dataset.
To evaluate the performance of AgriCLIP, we assemble a set of 20 datasets (Downstream data) to test the model’s ability to generalize to unseen concepts. The evaluation set is entirely disjoint from the ALive pre-training set.
📥 Download the Downstream data: Access our downstream dataset: Downstream Dataset.
We recommend setting up a conda environment for the project:
conda create --name=agriclip python=3.10
conda activate agriclip
git clone https://github.com/umair1221/AgriCLIP.git
cd AgriCLIP
pip install -r requirements.txt
export PYTHONPATH="./:$PYTHONPATH"
1. Prepare data
Please download the dataset from ALive Dataset.
After downloading, the next step is to extract feature representations from both models, i.e., DINO and CLIP. Then run the following command to produce the aligned model, which is subsequently used for zero-shot evaluation.
python AgriCLIP_alignment/train_linear_aligner.py --data-path "/path/to/your/dataset" \
--dino-weights-path "/path/to/your/dino_pretrain.pth" \
--clip-weights-path "/path/to/your/clip_pretrain.pth" \
--path-dino-features "/path/to/your/dino_features.npy" \
--path-clip-features "/path/to/your/clip_features.npy" \
--output-model-path "./path/to/save/aligned_model.pth"
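Conceptually, the aligner learns a map from the DINO feature space into the CLIP embedding space. A minimal sketch with a closed-form least-squares fit is shown below; the actual `train_linear_aligner.py` may instead train a linear layer with gradient descent, and the feature dimensions here are illustrative.

```python
import numpy as np

# Random stand-ins for the precomputed feature files
# (dino_features.npy / clip_features.npy).
rng = np.random.default_rng(0)
dino_feats = rng.normal(size=(1000, 384))   # e.g. DINO ViT-S features
clip_feats = rng.normal(size=(1000, 512))   # e.g. CLIP ViT-B/32 features

# Solve min_W || dino_feats @ W - clip_feats ||^2 in closed form.
W, *_ = np.linalg.lstsq(dino_feats, clip_feats, rcond=None)

# Project DINO features into the CLIP embedding space.
aligned = dino_feats @ W
print(aligned.shape)  # (1000, 512)
```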
Downstream datasets can either be downloaded manually from Downstream-Data or with the script below:
pip install gdown
python Dataset/download_downstream.py --output-dir "/path/to/your/dataset/storage"
Use the command below to perform zero-shot inference with AgriCLIP.
python AgriCLIP_alignment/AgriClip_zeroshot.py --dataset-name "Banana Deficiency" \
--data-path "/path/to/dataset" \
--dino-path "Weights/dino_pretrain.pth" \
--aligner-path "Weights/Aligned_Models/Agri_Dino_aligner_DPT_CPT.pth" \
--batch-size 32 \
--num-workers 4
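Zero-shot classification then reduces to comparing the aligned image embedding against CLIP text embeddings of the class prompts via cosine similarity. The sketch below uses random arrays as stand-ins for the real encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
image_feat = rng.normal(size=(512,))     # aligned image embedding (stand-in)
text_feats = rng.normal(size=(5, 512))   # one CLIP text embedding per class prompt

def l2_normalize(x, axis=-1):
    """Normalize vectors to unit length for cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Cosine similarity between the image and each class prompt.
sims = l2_normalize(text_feats) @ l2_normalize(image_feat)
pred = int(np.argmax(sims))              # index of the best-matching class
print(pred)
```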
| Model Name | Weights |
|---|---|
| DINO | Feature representations of ALive data for alignment |
| CLIP | Feature representations of ALive data for alignment |
- Text2Concept: Our approach is inspired by this work. We thank the authors for their cross-model alignment code.
- DINO: Provides the self-supervised training capability.
- CLIP: A good resource for zero-shot classification using text prompts.
@misc{nawaz2024agriclip,
title={AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment},
author={Umair Nawaz and Muhammad Awais and Hanan Gani and Muzammal Naseer and Fahad Khan and Salman Khan and Rao Muhammad Anwer},
year={2024},
eprint={2410.01407},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.01407},
}