Tip: If you know of a relevant paper that is not included in this list, or have any clarification about how a paper is described, please contact us!
A comprehensive list of papers about 'Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities'.
Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. To address this gap, this survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions.
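As a rough illustration of what "merging" means in parameter space, the sketch below shows two of the simplest schemes that appear in the list: uniform weight averaging and task arithmetic (adding scaled differences between fine-tuned and pretrained weights back onto the pretrained model). It is a toy sketch, not the implementation from any listed paper. It assumes all models share one architecture and represents each parameter as a flat list of floats; the names `StateDict`, `average_weights`, `task_arithmetic`, and `scale` are hypothetical.

```python
from typing import Dict, List

# Assumption: a "model" is a mapping from parameter name to a flat list of floats.
StateDict = Dict[str, List[float]]


def average_weights(models: List[StateDict]) -> StateDict:
    """Uniform weight averaging: element-wise mean of every parameter."""
    if not models:
        raise ValueError("need at least one model")
    merged: StateDict = {}
    for name in models[0]:
        # zip(*...) groups the i-th entry of this parameter across all models.
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def task_arithmetic(
    pretrained: StateDict, finetuned: List[StateDict], scale: float = 1.0
) -> StateDict:
    """Add the scaled sum of task vectors (finetuned - pretrained) to the pretrained weights."""
    merged: StateDict = {}
    for name, base in pretrained.items():
        # Sum the per-task differences element by element.
        summed = [
            sum(ft[name][i] - base[i] for ft in finetuned)
            for i in range(len(base))
        ]
        merged[name] = [b + scale * d for b, d in zip(base, summed)]
    return merged


if __name__ == "__main__":
    base = {"w": [1.0, 2.0]}
    task_a = {"w": [2.0, 2.0]}
    task_b = {"w": [1.0, 4.0]}
    print(average_weights([task_a, task_b]))
    print(task_arithmetic(base, [task_a, task_b], scale=0.5))
```

Neither function touches training data or runs gradient steps, which is the property the abstract highlights. In practice `scale` would typically be tuned on held-out data.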
If you find our paper or this resource helpful, please consider citing:
@article{Survery_ModelMerging_2024,
title={Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities},
author={Yang, Enneng and Shen, Li and Guo, Guibing and Wang, Xingwei and Cao, Xiaochun and Zhang, Jie and Tao, Dacheng},
journal={arXiv preprint arXiv:2408.07666},
year={2024}
}
Thanks!
- Awesome-Model-Merging-Methods-Theories-Applications
- Advanced Methods
- Application of Model Merging in Foundation Models
- Application of Model Merging in Different Machine Learning Subfields
- Other Applications
Paper Title | Year | Conference/Journal |
---|---|---|
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic | 2024 | Arxiv |
Tangent Transformers for Composition, Privacy and Removal | 2024 | ICLR |
Parameter Efficient Multi-task Model Fusion with Partial Linearization | 2024 | ICLR |
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models | 2023 | NeurIPS |
Paper Title | Year | Conference/Journal |
---|---|---|
Knowledge fusion of large language models | 2024 | ICLR |
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | 2024 | Arxiv |
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks | 2023 | ICASSP |
GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV |
Paper Title | Year | Conference/Journal |
---|---|---|
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging | 2024 | Arxiv |
Paper Title | Year | Conference/Journal |
---|---|---|
Composing parameter-efficient modules with arithmetic operation | 2023 | NeurIPS |
Editing models with task arithmetic | 2023 | ICLR |
Model fusion via optimal transport | 2020 | NeurIPS |
Weight averaging for neural networks and local resampling schemes | 1996 | AAAI Workshop |
Paper Title | Year | Conference/Journal |
---|---|---|
Representation Surgery for Multi-Task Model Merging | 2024 | ICML |
Paper Title | Year | Conference/Journal |
---|---|---|
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | 2024 | AAAI |
Mitigating Social Biases in Language Models through Unlearning | 2024 | Arxiv |
Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models | 2024 | Arxiv |
Composing Parameter-Efficient Modules with Arithmetic Operation | 2023 | NeurIPS |
Editing models with task arithmetic | 2023 | ICLR |
Paper Title | Year | Conference/Journal |
---|---|---|
Strong Copyright Protection for Language Models via Adaptive Model Fusion | 2024 | ICML |
Avoiding Copyright Infringement via Machine Unlearning | 2024 | Arxiv |
Towards Safer Large Language Models through Machine Unlearning | 2024 | ACL |
Editing models with task arithmetic | 2023 | ICLR |
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Model | 2023 | Arxiv |
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion | 2023 | Arxiv |
Paper Title | Year | Conference/Journal |
---|---|---|
DEM: Distribution Edited Model for Training with Mixed Data Distributions | 2024 | Arxiv |
Checkpoint Merging via Bayesian Optimization in LLM Pretraining | 2024 | Arxiv |
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning | 2023 | ACL |
Early Weight Averaging meets High Learning Rates for LLM Pre-training | 2023 | NeurIPS Workshop |
Stop wasting my time! saving days of imagenet and bert training with latest weight averaging | 2022 | NeurIPS Workshop |
Fusing finetuned models for better pretraining | 2022 | Arxiv |
Paper Title | Year | Conference/Journal |
---|---|---|
Jointly training large autoregressive multimodal models | 2024 | ICLR |
Model Composition for Multimodal Large Language Models | 2024 | ACL |
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation | 2023 | ICML |
An Empirical Study of Multimodal Model Merging | 2023 | EMNLP |
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks | 2023 | TMLR |
Paper Title | Year | Conference/Journal |
---|---|---|
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification | 2024 | ICASSP Workshop |
Paper Title | Year | Conference/Journal |
---|---|---|
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models | 2024 | Arxiv |
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models | 2024 | Arxiv |
MoLE: Mixture of LoRA Experts | 2024 | ICLR |
Merging loras | 2023 | (github) |
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs | 2023 | Arxiv |
GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV |
Paper Title | Year | Conference/Journal |
---|---|---|
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better | 2024 | Arxiv |
A Unified Module for Accelerating STABLE-DIFFUSION: LCM-LORA | 2024 | Arxiv |
Paper Title | Year | Conference/Journal |
---|---|---|
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data | 2024 | Arxiv |
Paper Title | Year | Conference/Journal |
---|---|---|
Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer | 2024 | Arxiv |
Evolutionary optimization of model merging recipes | 2024 | Arxiv |
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | 2024 | ICML |
Representation Surgery for Multi-Task Model Merging | 2024 | ICML |
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | 2024 | ICML |
ZipIt! Merging Models from Different Tasks without Training | 2024 | ICLR |
AdaMerging: Adaptive Model Merging for Multi-Task Learning | 2024 | ICLR |
Resolving Interference When Merging Models | 2023 | NeurIPS |
Editing models with task arithmetic | 2023 | ICLR |
Paper Title | Year | Conference/Journal |
---|---|---|
You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging | 2024 | Arxiv |
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | 2024 | Arxiv |
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation | 2024 | Arxiv |
Paper Title | Year | Conference/Journal |
---|---|---|
DEM: Distribution Edited Model for Training with Mixed Data Distributions | 2024 | Arxiv |
Merging Vision Transformers from Different Tasks and Domains | 2023 | Arxiv |
Paper Title | Year | Conference/Journal |
---|---|---|
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning | 2023 | NeurIPS |
Paper Title | Year | Conference/Journal |
---|---|---|
Training-Free Model Merging for Multi-target Domain Adaptation | 2024 | Arxiv |
Ensemble of averages: Improving model selection and boosting performance in domain generalization | 2022 | NeurIPS |
Swad: Domain generalization by seeking flat minima | 2021 | NeurIPS |
Paper Title | Year | Conference/Journal |
---|---|---|
BadMerging: Backdoor Attacks Against Model Merging | 2024 | CCS |
LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario | 2024 | ACL |
Paper Title | Year | Conference/Journal |
---|---|---|
Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge | 2024 | ACL |
Merging Improves Self-Critique Against Jailbreak Attacks | 2024 | Arxiv |
Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging | 2024 | Arxiv |
Revisiting adapters with adversarial training | 2023 | ICLR |
Seasoning model soups for robustness to adversarial and natural distribution shifts | 2023 | CVPR |
We welcome all researchers to contribute to this repository on model merging in foundation models and machine learning.
If you have a related paper that was not added to the library, please contact us.