
candle-lora


LoRA (low-rank adaptation) implemented in Rust for use with Candle. This technique replaces a model's fully trainable layers with new LoRA layers, which wrap the original layers and freeze their weights. Because the LoRA layers contain far fewer trainable parameters, LoRA allows for more efficient fine-tuning.

However, using a fine-tuned LoRA model for inference normally adds overhead, because the original layer must still be used alongside the low-rank update to compute the outputs. candle-lora implements weight merging, which removes this added cost by folding the LoRA weights into the original weights. Weights may also be unmerged.
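For reference, in the standard LoRA formulation (this sketch does not restate candle-lora's exact scaling convention), merging folds the low-rank update into the base weight, and unmerging simply subtracts the same term back out:

```latex
W_{\text{merged}} = W + \frac{\alpha}{r}\,BA,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```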

Please see our recent paper X-LoRA, which introduces a MoE-inspired method for densely gating LoRA adapters, powered by a model self-reflection forward pass. For inference, we have created mistral.rs, which is written in Rust and enables inference of X-LoRA and other models, including quantized models.

Get started

  1. To install, run the following:
cargo add --git https://github.com/EricLBuehler/candle-lora.git candle-lora candle-lora-macro
  2. To allow candle-lora to swap layers, do the following for each model struct:
    • Derive AutoLoraConvert from candle-lora-macro.
    • Add the replace_layer_fields attribute macro.
  3. During instantiation of each model struct, call get_lora_model with the appropriate parameters to perform the conversion. A minimal sketch follows this list.
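Putting the three steps together, here is a minimal sketch rather than a drop-in snippet: the field types after macro expansion, the LoraConfig / LoraLinearConfig constructors, and the get_lora_model argument order shown below are assumptions based on the steps above and should be checked against the candle-lora documentation and examples.

```rust
use candle_core::{DType, Device, Module, Result, Tensor};
use candle_lora::{LoraConfig, LoraLinearConfig};
use candle_lora_macro::{replace_layer_fields, AutoLoraConvert};
use candle_nn::{init, Linear, VarBuilder, VarMap};

// Step 2: `replace_layer_fields` rewrites the field types so they can hold
// either the plain layer or its LoRA wrapper, and `AutoLoraConvert` derives
// `get_lora_model` (and `get_tensors`, used later for saving).
#[replace_layer_fields]
#[derive(AutoLoraConvert, Debug)]
struct Model {
    layer: Linear,
}

impl Module for Model {
    fn forward(&self, input: &Tensor) -> Result<Tensor> {
        self.layer.forward(input)
    }
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    let map = VarMap::new();
    let vb = VarBuilder::from_varmap(&map, DType::F32, &device);

    // Build the base model as usual. The attribute macro is assumed to turn
    // `layer` into a boxed trait object, hence the `Box::new`.
    let weight = map.get((10, 10), "layer.weight", init::DEFAULT_KAIMING_NORMAL, DType::F32, &device)?;
    let mut model = Model {
        layer: Box::new(Linear::new(weight, None)),
    };

    // Step 3: swap the frozen base layers for LoRA layers. The rank/alpha
    // values and the exact argument order are illustrative assumptions.
    let lora_config = LoraConfig::new(1, 1., None);
    let linear_config = LoraLinearConfig::new(10, 10);
    model.get_lora_model(
        lora_config,
        &vb,
        Some(linear_config),
        None, // Conv1d config
        None, // Conv2d config
        None, // Embedding config
    );

    // The converted model is used exactly like the original one.
    let out = model.forward(&Tensor::zeros((2, 10), DType::F32, &device)?)?;
    println!("{out:?}");
    Ok(())
}
```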

Features

  • Convert Linear, Conv1d, Conv2d, Embedding layers into LoRA layers
    • All conversions are implemented in accordance with HuggingFace's official LoRA implementation
  • Weight merging is implemented to improve inference performance
  • Weight unmerging
  • Easy-to-use APIs
  • Extensible trait-based layer swapping mechanism

Conversion Ergonomics

candle-lora-macro makes using candle-lora as simple as adding two macros to your model structs and calling a method!

It is inspired by the simplicity of the Python peft library's get_peft_model method. Together, these macros mean that candle-lora can be added to any candle model with minimal code changes!

LoRA transformers

See the Candle transformers with LoRA integrated in the candle-lora-transformers crate. Currently, the following transformers have been converted:

  • llama
  • mistral
  • falcon
  • bert
  • stable_lm
  • t5
  • dinov2
  • resnet
  • mpt
  • blip
  • starcoder

To use a LoRA transformer, simply replace the model from candle-transformers with its counterpart in candle-lora-transformers!
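For example, swapping the import is typically all that changes; the module paths below are illustrative assumptions, so check the crate for the exact ones.

```rust
// Before: the stock Candle implementation.
// use candle_transformers::models::llama::Llama;

// After: the LoRA-enabled counterpart from candle-lora-transformers
// (exact module path is an assumption; see the crate's documentation).
use candle_lora_transformers::llama::Llama;
```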

Saving and loading

candle_lora supports retrieving the weights of LoRA adapters via the get_tensors method, defined automatically in #[auto_layer_convert]. This function is meant to be used with candle_core::safetensors::save(). To load, construct a VarBuilder over the saved file and pass it to get_lora_model.
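A sketch of that flow, reusing the Model struct from the Get started example; the assumption here is that get_tensors returns a HashMap<String, Tensor>, and the adapter file name is arbitrary.

```rust
use candle_core::{DType, Device, Result};
use candle_nn::VarBuilder;

fn save_and_reload(model: &Model, device: &Device) -> Result<()> {
    // Save: collect the LoRA adapter tensors (assumed HashMap<String, Tensor>)
    // and write them out in safetensors format.
    let tensors = model.get_tensors();
    candle_core::safetensors::save(&tensors, "adapter.safetensors")?;

    // Load: build a VarBuilder over the saved file; this is what gets passed
    // to get_lora_model when the model is instantiated again.
    let vb = unsafe {
        VarBuilder::from_mmaped_safetensors(&["adapter.safetensors"], DType::F32, device)?
    };
    let _vb_for_get_lora_model = vb;
    Ok(())
}
```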

candle_lora's weight naming is not compatible with peft yet.

Resources

candle-lora's LoRA conversion implementations are based on HuggingFace's peft library. See the original paper here, as well as Microsoft's implementation.
