Awesome-Model-Merging-Methods-Theories-Applications

Tip

If you have a relevant paper that is not included in the library, or need any clarification about the content of a paper, please contact us!

A comprehensive list of papers about 'Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities'.

Abstract

Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. To address this gap, this survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions.

Citation

If you find our paper or this resource helpful, please consider citing:

@article{Survery_ModelMerging_2024,
  title={Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities},
  author={Yang, Enneng and Shen, Li and Guo, Guibing and Wang, Xingwei and Cao, Xiaochun and Zhang, Jie and Tao, Dacheng},
  journal={arXiv preprint arXiv:2408.07666},
  year={2024}
}

Thanks!

Framework

Awesome-Model-Merging-Methods-Theories-Applications

Advanced Methods

Pre-Merging Methods

Linearization Fine-tuning

Paper Title	Year	Conference/Journal
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic	2024	Arxiv
Tangent Transformers for Composition, Privacy and Removal	2024	ICLR
Parameter Efficient Multi-task Model Fusion with Partial Linearization	2024	ICLR
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models	2023	NeurIPS

Architecture Transformation

Paper Title	Year	Conference/Journal
Knowledge fusion of large language models	2024	ICLR
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report	2024	Arxiv
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks	2023	ICASSP
GAN Cocktail: mixing GANs without dataset access	2022	ECCV

Weight Alignment

Paper Title	Year	Conference/Journal
Equivariant Deep Weight Space Alignment	2024	ICML
Harmony in diversity: Merging neural networks with canonical correlation analysis	2024	ICML
Transformer fusion with optimal transport	2024	ICLR
Layerwise linear mode connectivity	2024	ICLR
Proving linear mode connectivity of neural networks via optimal transport	2024	AISTATS
Training-Free Pretrained Model Merging	2024	CVPR
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering	2024	Arxiv
C2M3: Cycle-Consistent Multi Model Merging	2024	NeurIPS
PLeaS--Merging Models with Permutations and Least Squares	2024	Arxiv
Rethink Model Re-Basin and the Linear Mode Connectivity	2024	Arxiv
Git Re-Basin: Merging Models modulo Permutation Symmetries	2023	ICLR
Re-basin via implicit Sinkhorn differentiation	2023	CVPR
Plateau in Monotonic Linear Interpolation--A "Biased" View of Loss Landscape for Deep Networks	2023	ICLR
Linear Mode Connectivity of Deep Neural Networks via Permutation Invariance and Renormalization	2023	ICLR
REPAIR: REnormalizing Permuted Activations for Interpolation Repair	2023	ICLR
Going beyond linear mode connectivity: The layerwise linear feature connectivity	2023	NeurIPS
The role of permutation invariance in linear mode connectivity of neural networks	2022	ICLR
What can linear interpolation of neural network loss landscapes tell us?	2022	ICML
Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling	2021	ICML
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes	2021	ICML
Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances	2021	ICML
Linear Mode Connectivity and the Lottery Ticket Hypothesis	2020	ICML
Optimizing mode connectivity via neuron alignment	2020	NeurIPS
Model fusion via optimal transport	2020	NeurIPS
Uniform convergence may be unable to explain generalization in deep learning	2019	NeurIPS
Explaining landscape connectivity of low-cost solutions for multilayer nets	2019	NeurIPS
Essentially no barriers in neural network energy landscape	2018	ICML
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs	2018	NeurIPS

Others

Paper Title	Year	Conference/Journal
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging	2024	Arxiv

During Merging Methods

Basic Merging Methods

Paper Title	Year	Conference/Journal
Composing parameter-efficient modules with arithmetic operation	2023	NeurIPS
Editing models with task arithmetic	2023	ICLR
Model fusion via optimal transport	2020	NeurIPS
Weight averaging for neural networks and local resampling schemes	1996	AAAI Workshop
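The two simplest ideas named in the table above, weight averaging and task arithmetic, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from any listed paper: models are represented as plain dictionaries of float lists (a hypothetical stand-in for real checkpoints), and the helper names `average_weights`, `task_vector`, and `apply_task_vectors` are made up for this example.

```python
from typing import Dict, List

# Hypothetical stand-in for a real checkpoint: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def average_weights(models: List[StateDict]) -> StateDict:
    """Uniform weight averaging: element-wise mean of each parameter."""
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def task_vector(finetuned: StateDict, pretrained: StateDict) -> StateDict:
    """Task vector: fine-tuned weights minus the shared pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def apply_task_vectors(
    pretrained: StateDict, vectors: List[StateDict], scale: float = 1.0
) -> StateDict:
    """Task arithmetic: add the scaled sum of task vectors to the pre-trained weights."""
    merged = {}
    for name, base in pretrained.items():
        total = [sum(vals) for vals in zip(*(v[name] for v in vectors))]
        merged[name] = [b + scale * t for b, t in zip(base, total)]
    return merged


# Toy example: one parameter tensor "w" with two entries.
pretrained = {"w": [1.0, 2.0]}
model_a = {"w": [2.0, 2.0]}  # fine-tuned on a hypothetical task A
model_b = {"w": [1.0, 4.0]}  # fine-tuned on a hypothetical task B

averaged = average_weights([model_a, model_b])
merged = apply_task_vectors(
    pretrained,
    [task_vector(model_a, pretrained), task_vector(model_b, pretrained)],
    scale=0.5,
)
print(averaged, merged)
```

With two models fine-tuned from the same pre-trained weights and a scale of 0.5, task arithmetic coincides with uniform averaging in this toy case. Other scale values weight the task vectors differently, which is the knob many of the methods listed below try to set more carefully.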

Weighted-based Merging Methods

Paper Title	Year	Conference/Journal
Knowledge Composition using Task Vectors with Learned Anisotropic Scaling	2024	Arxiv
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic	2024	Arxiv
Checkpoint Merging via Bayesian Optimization in LLM Pretraining	2024	Arxiv
Arcee’s MergeKit: A Toolkit for Merging Large Language Models	2024	Arxiv
Evolutionary optimization of model merging recipes	2024	Arxiv
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts	2024	ACL
AdaMerging: Adaptive Model Merging for Multi-Task Learning	2024	ICLR
Model Merging by Uncertainty-Based Gradient Matching	2024	ICLR
Merging by Matching Models in Task Subspaces	2024	TMLR
Fisher Mask Nodes for Language Model Merging	2024	LREC-COLING
Erasure Coded Neural Network Inference via Fisher Averaging	2024	ISIT
Dataless Knowledge Fusion by Merging Weights of Language Models	2023	ICLR
Merging models with Fisher-weighted averaging	2022	NeurIPS

Subspace-based Merging Methods

Paper Title	Year	Conference/Journal
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch	2024	ICML
Localizing Task Information for Improved Model Merging and Compression	2024	ICML
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging	2024	ICLR
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic	2024	Arxiv
Activated Parameter Locating via Causal Intervention for Model Merging	2024	Arxiv
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning	2024	Arxiv
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling	2024	Arxiv
EMR-Merging: Tuning-Free High-Performance Model Merging	2024	Arxiv
DPPA: Pruning Method for Large Language Model to Model Merging	2024	Arxiv
Model breadcrumbs: Scaling multi-task model merging with sparse masks	2023	Arxiv
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion	2023	Arxiv
Resolving Interference When Merging Models	2023	NeurIPS
Task-Specific Skill Localization in Fine-tuned Language Model	2023	ICML

Routing-based Merging Methods

Paper Title	Year	Conference/Journal
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts	2024	ICML
Learning to Route Among Specialized Experts for Zero-Shot Generalization	2024	ICML
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy	2024	ICLR
Soft merging of experts with adaptive routing	2024	TMLR
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models	2024	Arxiv
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging	2024	Arxiv
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts	2024	Arxiv
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion	2024	Arxiv

Post-calibration based Methods

Paper Title	Year	Conference/Journal
Representation Surgery for Multi-Task Model Merging	2024	ICML

Theories and Analysis of Model Merging

Paper Title	Year	Conference/Journal
Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities	2024	Arxiv
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average	2024	Arxiv
On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm	2024	ICML
Diverse weight averaging for out-of-distribution generalization	2022	NeurIPS
Ensemble of averages: Improving model selection and boosting performance in domain generalization	2022	NeurIPS
The role of permutation invariance in linear mode connectivity of neural networks	2022	ICLR
SWAD: Domain generalization by seeking flat minima	2021	NeurIPS
Linear Mode Connectivity and the Lottery Ticket Hypothesis	2020	ICML
Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well	2020	ICLR
Optimizing mode connectivity via neuron alignment	2020	NeurIPS
Uniform convergence may be unable to explain generalization in deep learning	2019	NeurIPS
Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification	2018	JMLR
Iterate averaging as regularization for stochastic gradient descent	2018	Arxiv
Essentially no barriers in neural network energy landscape	2018	ICML
Averaging weights leads to wider optima and better generalization	2018	UAI
Train faster, generalize better: Stability of stochastic gradient descent	2016	ICML

Application of Model Merging in Foundation Models

Model Merging in Large Language Model

Human Preference Alignment for LLMs

Paper Title	Year	Conference/Journal
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning	2024	Arxiv
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning	2024	Arxiv
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch	2024	Arxiv
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations	2024	Arxiv
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction	2024	Arxiv
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment	2024	Arxiv
A safety realignment framework via subspace-oriented model fusion for large language models	2024	Arxiv
Weak-to-strong extrapolation expedites alignment	2024	Arxiv
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic	2024	Arxiv
Rewarded soups: towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards	2023	NeurIPS
Personalized soups: Personalized large language model alignment via post-hoc parameter merging	2023	Arxiv

Detoxification of LLMs

Paper Title	Year	Conference/Journal
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation	2024	AAAI
Mitigating Social Biases in Language Models through Unlearning	2024	Arxiv
Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models	2024	Arxiv
Composing Parameter-Efficient Modules with Arithmetic Operation	2023	NeurIPS
Editing models with task arithmetic	2023	ICLR

Knowledge Unlearning of LLMs

Paper Title	Year	Conference/Journal
Strong Copyright Protection for Language Models via Adaptive Model Fusion	2024	ICML
Avoiding Copyright Infringement via Machine Unlearning	2024	Arxiv
Towards Safer Large Language Models through Machine Unlearning	2024	ACL
Editing models with task arithmetic	2023	ICLR
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Model	2023	Arxiv
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion	2023	Arxiv

Faster Training of LLMs

Paper Title	Year	Conference/Journal
DEM: Distribution Edited Model for Training with Mixed Data Distributions	2024	Arxiv
Checkpoint Merging via Bayesian Optimization in LLM Pretraining	2024	Arxiv
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning	2023	ACL
Early Weight Averaging meets High Learning Rates for LLM Pre-training	2023	NeurIPS Workshop
Stop wasting my time! saving days of imagenet and bert training with latest weight averaging	2022	NeurIPS Workshop
Fusing finetuned models for better pretraining	2022	Arxiv

Combine the Capabilities of Expert LLMs

Paper Title	Year	Conference/Journal
LLM Merging: Building LLMs Efficiently through Merging	2024	NeurIPS 2024 Competition Track
Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement	2024	Arxiv
It’s Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization	2024	Arxiv
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic	2024	Arxiv
PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models	2024	Arxiv
Knowledge fusion of large language models	2024	ICLR
Language models are super mario: Absorbing abilities from homologous models as a free lunch	2024	ICML
Controlled Text Generation via Language Model Arithmetic	2024	ICML
Evolutionary optimization of model merging recipes	2024	Arxiv
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM	2024	Arxiv
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report	2024	Arxiv

Model Merging in Multimodal Large Language Models

Model Merging for Multimodal Fusion

Paper Title	Year	Conference/Journal
Jointly training large autoregressive multimodal models	2024	ICLR
Model Composition for Multimodal Large Language Models	2024	ACL
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation	2023	ICML
An Empirical Study of Multimodal Model Merging	2023	EMNLP
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks	2023	TMLR

Model Merging for Cross-Modal Knowledge Transfer

Paper Title	Year	Conference/Journal
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification	2024	ICASSP Workshop

Model Merging in Image Generative Models

Style Mixing in Generative Models

Paper Title	Year	Conference/Journal
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models	2024	Arxiv
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models	2024	Arxiv
MoLE: Mixture of LoRA Experts	2024	ICLR
Merging loras	2023	(github)
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs	2023	Arxiv
GAN Cocktail: mixing GANs without dataset access	2022	ECCV

Reducing Training Cost of Generative Models

Paper Title	Year	Conference/Journal
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better	2024	Arxiv
A Unified Module for Accelerating STABLE-DIFFUSION: LCM-LORA	2024	Arxiv

Enhancing the Faithfulness of Diffusion Models

Paper Title	Year	Conference/Journal
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data	2024	Arxiv

Application of Model Merging in Different Machine Learning Subfields

Model Merging in Continual Learning

Model Merging to Mitigate Catastrophic Forgetting

Paper Title	Year	Conference/Journal
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models	2024	ICML
Adaptive Discovering and Merging for Incremental Novel Class Discovery	2024	AAAI
MagMax: Leveraging Model Merging for Seamless Continual Learning	2024	ECCV
LM-Cocktail: Resilient tuning of language models via model merging	2024	ACL Findings
Backward Compatibility During Data Updates by Weight Interpolation	2024	EACL
Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models	2024	Arxiv
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging	2024	Arxiv
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs	2024	Arxiv
WARP: On the Benefits of Weight Averaged Rewarded Policies	2024	Arxiv
A Second-Order perspective on Compositionality and Incremental Learning	2024	Arxiv
DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images	2024	Arxiv
DAM: Dynamic Adapter Merging for Continual Video QA Learning	2024	Arxiv
Task-Specific Skill Localization in Fine-tuned Language Model	2023	ICML
Tangent model composition for ensembling and continual fine-tuning	2023	ICCV
Task Arithmetic with LoRA for Continual Learning	2023	NeurIPS Workshop
Mitigating the Alignment Tax of RLHF	2023	Arxiv
Robust fine-tuning of zero-shot models	2022	CVPR

Model Merging in Multi-Task/Multi-Objective/Multi-Domain/Auxiliary Learning

Model Merging for Knowledge Transfer in Multi-Task Learning

Paper Title	Year	Conference/Journal
Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer	2024	Arxiv
Evolutionary optimization of model merging recipes	2024	Arxiv
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch	2024	ICML
Representation Surgery for Multi-Task Model Merging	2024	ICML
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts	2024	ICML
ZipIt! Merging Models from Different Tasks without Training	2024	ICLR
AdaMerging: Adaptive Model Merging for Multi-Task Learning	2024	ICLR
Resolving Interference When Merging Models	2023	NeurIPS
Editing models with task arithmetic	2023	ICLR

Model Merging for Knowledge Transfer in Multi-Objective Optimization

Paper Title	Year	Conference/Journal
You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging	2024	Arxiv
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion	2024	Arxiv
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation	2024	Arxiv

Model Merging for Knowledge Transfer in Multi-Domain Learning

Paper Title	Year	Conference/Journal
DEM: Distribution Edited Model for Training with Mixed Data Distributions	2024	Arxiv
Merging Vision Transformers from Different Tasks and Domains	2023	Arxiv

Model Merging for Knowledge Transfer in Auxiliary Learning

Paper Title	Year	Conference/Journal
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning	2023	NeurIPS

Model Merging in Out-of-Distribution/Domain Generalization

Model Merging for Better Out-of-Distribution Generalization

Paper Title	Year	Conference/Journal
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging	2024	ICLR
WARM: On the benefits of weight averaged reward models	2024	ICML
Population parameter averaging (PAPA)	2024	TMLR
Adaptive Stochastic Weight Averaging	2024	JMLR
Scalable Learned Model Soup on a Single GPU: An Efficient Subspace Training Strategy	2024	ECCV
WARP: On the Benefits of Weight Averaged Rewarded Policies	2024	Arxiv
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average	2024	Arxiv
Model Stock: All we need is just a few fine-tuned models	2024	Arxiv
Lookaround Optimizer: 𝑘 steps around, 1 step average	2023	NeurIPS
Model ratatouille: Recycling diverse models for out-of-distribution generalization	2023	ICML
Trainable Weight Averaging: Efficient Training by Optimizing Historical Solutions	2023	ICLR
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models	2023	EACL
DART: Diversify-aggregate-repeat training improves generalization of neural networks	2023	CVPR
When do flat minima optimizers work?	2022	NeurIPS
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time	2022	ICML
Diverse weight averaging for out-of-distribution generalization	2022	NeurIPS
Robust fine-tuning of zero-shot models	2022	CVPR
Neural networks with late-phase weights	2021	ICLR
Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well	2020	ICLR
SWALP: Stochastic weight averaging in low precision training	2019	ICML
Averaging weights leads to wider optima and better generalization	2018	UAI
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results	2017	NeurIPS

Model Merging for Better Domain Generalization

Paper Title	Year	Conference/Journal
Training-Free Model Merging for Multi-target Domain Adaptation	2024	Arxiv
Ensemble of averages: Improving model selection and boosting performance in domain generalization	2022	NeurIPS
SWAD: Domain generalization by seeking flat minima	2021	NeurIPS

Model Merging in Federated Learning

Model Merging for Local Knowledge Aggregation

Paper Title	Year	Conference/Journal
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models	2024	CVPR
FedFisher: Leveraging Fisher Information for One-Shot Federated Learning	2024	AISTATS
lo-fi: distributed fine-tuning without communication	2023	TMLR
Revisiting Weighted Aggregation in Federated Learning with Neural Networks	2023	ICML
Deep neural network fusion via graph matching with applications to model ensemble and federated learning	2022	ICML
Federated Learning with Matched Averaging	2020	ICLR
Tackling the objective inconsistency problem in heterogeneous federated optimization	2020	NeurIPS
Model fusion via optimal transport	2020	NeurIPS
Bayesian nonparametric federated learning of neural networks	2019	ICML
Learning private neural language modeling with attentive aggregation	2019	IJCNN
Communication-Efficient Learning of Deep Networks from Decentralized Data	2017	AISTATS

Model Merging in Zero-shot/Few-shot Learning

Model Merging for Cross-task Generalization in Zero-shot Learning

Paper Title	Year	Conference/Journal
Learning to Route Among Specialized Experts for Zero-Shot Generalization	2024	ICML
Towards Modular LLMs by Building and Reusing a Library of LoRAs	2024	ICML
Chat Vector: A Simple Approach to Equip LLMs With New Language Chat Capabilities	2024	ACL
Unlocking the Potential of Model Merging for Low-Resource Languages	2024	Arxiv
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models	2024	Arxiv
No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement	2024	Arxiv
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models	2024	Arxiv
AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging	2024	Arxiv
Model Composition for Multimodal Large Language Models	2024	ACL
Exploring the Benefits of Training Expert Language Models over Instruction Tuning	2023	ICML
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization	2023	Arxiv
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization	2023	Arxiv

Model Merging for Cross-task Generalization in Few-shot Learning

Paper Title	Year	Conference/Journal
LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks	2024	ACL
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition	2024	COLM
LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild	2024	ACL
Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?	2024	Arxiv
MerA: Merging pretrained adapters for few-shot learning	2023	Arxiv

Model Merging in Adversarial Learning

Model Merging as an Attack

Paper Title	Year	Conference/Journal
BadMerging: Backdoor Attacks Against Model Merging	2024	CCS
LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario	2024	ACL

Model Merging as a Defense

Paper Title	Year	Conference/Journal
Here’s a Free Lunch: Sanitizing Backdoored Models with Model Merge	2024	ACL
Merging Improves Self-Critique Against Jailbreak Attacks	2024	Arxiv
Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging	2024	Arxiv
Revisiting adapters with adversarial training	2023	ICLR
Seasoning model soups for robustness to adversarial and natural distribution shifts	2023	CVPR

Other Applications

Paper Title	Year	Conference/Journal
Emotion Arithmetic: Emotional Speech Synthesis via Weight Space Interpolation	2024	Interspeech
Erasure Coded Neural Network Inference via Fisher Averaging	2024	ISIT
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging	2024	Arxiv
Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization	2024	Arxiv
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks	2024	Arxiv
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition	2024	Arxiv
Experts Weights Averaging: A New General Training Scheme for Vision Transformers	2023	Arxiv
One Student Knows All Experts Know: From Sparse to Dense	2022	Arxiv

Contact

We welcome all researchers to contribute to this repository on model merging in foundation models and machine learning.

If you have a related paper that was not added to the library, please contact us.

Email: ennengyang@stumail.neu.edu.cn / ennengyang@gmail.com
