Orthogonal Model Merging

Sihan Yang, Kexuan Shi, Weiyang Liu

The Chinese University of Hong Kong

An intuitive comparison of (a) current model merging, (b) the proposed orthogonal merging, and (c) the proposed orthogonal-residual decoupling merging.

OrthoMerge: A Geometrically Principled Framework for Model Merging

Merging finetuned Large Language Models (LLMs) has become increasingly important for integrating diverse capabilities into a single unified model. However, prevailing model merging methods rely on linear arithmetic in Euclidean space, which often destroys the intrinsic geometric properties of pretrained weights, such as hyperspherical energy. To address this, we propose Orthogonal Model Merging (OrthoMerge), a method that performs merging operations on the Riemannian manifold formed by the orthogonal group to preserve the geometric structure of the model’s weights. By mapping task-specific orthogonal matrices learned by Orthogonal Finetuning (OFT) to the Lie algebra, OrthoMerge enables a principled yet efficient integration that accounts for both the direction and the magnitude of each adaptation. In addition to directly leveraging orthogonal matrices obtained by OFT, we further extend this approach to general models finetuned with non-OFT methods (e.g., LoRA, full finetuning) via an Orthogonal-Residual Decoupling strategy. This technique extracts the orthogonal component of each expert model by solving the orthogonal Procrustes problem; these components are merged on the manifold of the orthogonal group, while the remaining linear residuals are merged through standard additive arithmetic in Euclidean space.
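To make the decoupling step concrete, here is a minimal sketch of how the orthogonal component and the linear residual of an expert model could be extracted. It assumes the orthogonal factor acts on the left of the pretrained weight and uses the standard SVD solution of the orthogonal Procrustes problem; the function name and conventions are illustrative rather than taken from the paper.

```python
import numpy as np

def decouple_orthogonal_residual(w_pre, w_ft):
    """Split a finetuned weight into an orthogonal component and a residual.

    Solves the orthogonal Procrustes problem  min_R ||R @ w_pre - w_ft||_F
    over orthogonal R (closed form via the SVD of w_ft @ w_pre.T), then keeps
    whatever the orthogonal map cannot explain as an additive residual.
    """
    u, _, vt = np.linalg.svd(w_ft @ w_pre.T)
    r = u @ vt                      # orthogonal component of the expert
    residual = w_ft - r @ w_pre     # Euclidean residual
    return r, residual
```

The orthogonal components of all experts can then be merged on the orthogonal group (see the sketch in the next section), while the residuals are combined with any standard additive rule such as averaging or task arithmetic.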

How OrthoMerge Works


Illustration of OrthoMerge. (a) To merge orthogonal transformations, we first map them to the Lie algebra $\mathfrak{so}(d)$, perform the merging there with magnitude correction to preserve the strength of the transformations, and finally map the result back to the orthogonal group. (b) For general models, we decouple weights into orthogonal and residual components, merging them separately on the Riemannian manifold formed by the orthogonal group and in Euclidean space, respectively.
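The sketch below illustrates step (a) under stated assumptions: each task-specific orthogonal matrix is assumed to be a proper rotation (det = +1), so that a real matrix logarithm exists, and the particular magnitude-correction rule (rescaling the averaged generator to the mean norm of the individual generators) is our illustrative choice rather than necessarily the paper's exact formula.

```python
import numpy as np
from scipy.linalg import expm, logm

def merge_orthogonal(rotations, weights=None):
    """Merge orthogonal matrices on SO(d) via the Lie algebra so(d).

    Each R_i is pulled back to a skew-symmetric generator A_i = log(R_i),
    the generators are combined by a weighted average, the result is
    rescaled so its Frobenius norm matches the mean norm of the A_i
    (a simple magnitude correction), and the merged generator is pushed
    back to the orthogonal group with the matrix exponential.
    """
    n = len(rotations)
    weights = np.full(n, 1.0 / n) if weights is None else np.asarray(weights)

    # Log map: orthogonal group -> Lie algebra (skew-symmetric matrices).
    generators = [np.real(logm(r)) for r in rotations]

    merged = sum(w * a for w, a in zip(weights, generators))

    # Magnitude correction: naive averaging shrinks the generator, so
    # restore the mean strength of the individual adaptations.
    target = np.mean([np.linalg.norm(a) for a in generators])
    norm = np.linalg.norm(merged)
    if norm > 1e-12:
        merged = merged * (target / norm)

    # Exp map: Lie algebra -> orthogonal group.
    return expm(merged)
```

For OFT-finetuned layers, the merged orthogonal matrix can then be applied to the frozen pretrained weight in the same way as a single OFT adapter, e.g. W_merged = R_merged @ W_pre under a left-acting parameterization.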

Intriguing Insights of OrthoMerge

Locations of the base model and of models merged with TA and OrthoMerge on the joint loss landscape of the merged tasks.

Better Loss Landscape. We visualize the locations of the base model, together with models merged by the classical Euclidean method Task Arithmetic (TA) and by our OrthoMerge, on the joint loss landscape of all merged tasks. OrthoMerge lands at a more favorable, lower-loss position than TA, in terms of both the direction and the magnitude of its departure from the base model. This validates that merging on the Riemannian manifold formed by the orthogonal group better preserves model knowledge and reduces destructive interference.

Hyperspherical Energy Preservation. OrthoMerge mitigates catastrophic forgetting by preserving hyperspherical energy, a measure of how uniformly a layer’s normalized neurons are distributed on the unit hypersphere. Conventional Euclidean-space merging perturbs this geometric structure and thereby increases forgetting, whereas OrthoMerge leaves the hyperspherical energy unchanged. As a result, it not only delivers the best in-domain performance but also generalizes better to out-of-distribution tasks.
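A common formulation of hyperspherical energy sums the inverse pairwise distances between unit-normalized neuron vectors. The sketch below, which assumes a columns-as-neurons convention, shows why an orthogonal transform leaves this quantity untouched while a generic additive update shifts it.

```python
import numpy as np

def hyperspherical_energy(w, eps=1e-8):
    """Hyperspherical energy of a weight matrix.

    Treats the columns of w as neurons, projects them onto the unit
    hypersphere, and sums the inverse pairwise distances between them
    (the inverse-distance Riesz energy used in the OFT literature).
    """
    units = w / (np.linalg.norm(w, axis=0, keepdims=True) + eps)
    dists = np.linalg.norm(units[:, :, None] - units[:, None, :], axis=0)
    i, j = np.triu_indices(w.shape[1], k=1)
    return np.sum(1.0 / (dists[i, j] + eps))

# Rotating all neurons together preserves pairwise angles and hence the
# energy; an additive perturbation generally does not.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))
q, _ = np.linalg.qr(rng.standard_normal((64, 64)))       # random orthogonal map
print(hyperspherical_energy(w), hyperspherical_energy(q @ w))             # ~equal
print(hyperspherical_energy(w + 0.1 * rng.standard_normal(w.shape)))      # drifts
```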

Spectral Regularization. OrthoMerge also acts as a natural regularizer on the spectral norm of the model weights. Because the merged transformation is always orthogonal, and multiplying a weight matrix by an orthogonal matrix leaves its singular values unchanged, the spectral norm of the weights stays constant no matter how many tasks are merged or how different those tasks are. In contrast, traditional merging in Euclidean space can cause the spectral norm to grow or shrink as more tasks are combined, potentially destabilizing the model. This geometric property of OrthoMerge prevents such spectral drift.
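The following is a small numerical illustration of this effect, using random matrices as stand-ins for task updates and a plain product of random rotations as a stand-in for a merged orthogonal factor; it is not the merging procedure itself, only a demonstration of the norm behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
w_pre = rng.standard_normal((d, d)) / np.sqrt(d)

# Stand-ins for eight experts: additive task vectors vs. orthogonal factors.
task_vectors = [0.1 * rng.standard_normal((d, d)) for _ in range(8)]
rotations = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(8)]

# Euclidean merging: the spectral norm drifts as task vectors are summed in.
w_additive = w_pre + sum(task_vectors)

# Orthogonal merging: any merge that stays on the orthogonal group (here a
# product of the factors, purely for illustration) preserves singular values,
# so the spectral norm of the transformed weight equals that of w_pre.
r_merged = np.linalg.multi_dot(rotations)
w_ortho = r_merged @ w_pre

for name, w in [("base", w_pre), ("additive", w_additive), ("orthogonal", w_ortho)]:
    print(f"{name:>10}: spectral norm = {np.linalg.norm(w, 2):.3f}")
```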

OrthoMerge Maintains Model Performance across Diverse Tasks and Mitigates Catastrophic Forgetting


OrthoMerge consistently delivers strong multi-task performance while preserving base model generalization. Across both OFT-finetuned and standard (LoRA/full) finetuned models, OrthoMerge outperforms Euclidean baselines on in-domain tasks and on out-of-domain benchmarks. In several settings, the merged model surpasses the average performance of individual experts, indicating constructive integration rather than destructive interference. Our Orthogonal-Residual Decoupling further boosts existing merging methods, improving task accuracy and mitigating catastrophic forgetting. These gains extend to vision-language models, where OrthoMerge unifies spatial reasoning, OCR, and medical multimodal capabilities while better retaining general instruction following and multimodal performance compared to standard merging approaches.