GitHub - GenLLMGuard/BackdoorDetection: This repository contains the code for the paper titled "GenLLMGuard: Detecting Backdoors in LLMs for Open-Ended Text Generation Through Trigger Inversion".

GenLLMGuard / BackdoorDetection Public

Notifications You must be signed in to change notification settings
Fork 0
Star 1

This repository contains the code for the paper titled "GenLLMGuard: Detecting Backdoors in LLMs for Open-Ended Text Generation Through Trigger Inversion".

1 star 0 forks Branches Tags Activity

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
LICENSE		LICENSE
README.md		README.md
Sentence_invert.py		Sentence_invert.py
requirements.txt		requirements.txt

Repository files navigation

Follow these steps:

Clone the repository: git clone https://github.com/GenLLMGuard/BackdoorDetection.git
Navigate to the directory: cd BackdoorDetection
Install dependencies (optional): pip install -r requirements.txt
Import the function: from Sentence_invert import run_sentence_invert
Run the function: best_sentence, intermediate_sents, best_sent_norm_attention_wghts = run_sentence_invert(model, tokenizer, user_prompt='User:')

Additional Information

The model and tokenizer need to be loaded first and provided to the function. If the LLM is fine-tuned to have a specific prompting structure (e.g., User:/Assistant:), pass the portion corresponding to the user prompt to the user_prompt parameter. By default, user_prompt is set to 'User:'.

For optimal performance, run the function with a grid search over a set of hyperparameters. Varying alpha_2 and alpha_3 should be sufficient, as this accommodates differences in dictionary size and attention mechanisms between LLMs. The remaining hyperparameters do not require adjustment. Recommended values for the hyperparameters are provided in the paper.

alpha_2: Weight of the diversity loss. Defaults to 0.5.

alpha_3: Weight of the attention loss. Defaults to 0.5.

len_opt: Number of tokens updated in each optimization iteration. Defaults to 50.

num_iterations: Total number of optimization iterations. Defaults to 200.

len_seq: Overall number of tokens in the trainable sentence. Defaults to 200.

About

This repository contains the code for the paper titled "GenLLMGuard: Detecting Backdoors in LLMs for Open-Ended Text Generation Through Trigger Inversion".

security llm backdoor-detection

Report repository

Languages

Python 100.0%