Low-Rank Adaptation (LoRA) has become ubiquitous for efficiently fine-tuning foundation models. However, federated fine-tuning using LoRA is challenging due to suboptimal updates arising from traditional federated averaging of individual adapters. Existing solutions either incur prohibitively high communication cost that scales linearly with the number of clients or suffer from performance degradation due to limited expressivity.
We introduce Federated Silver Bullet (Fed-SB), a novel approach for federated fine-tuning of LLMs using LoRA-SB, a recently proposed low-rank adaptation method. LoRA-SB optimally aligns the optimization trajectory with the ideal low-rank full fine-tuning projection by learning a small square matrix \( R \) between adapters \( B \) and \( A \), keeping other components fixed.
Direct averaging of \( R \) guarantees exact aggregation of updates and substantially reduces communication cost, which remains independent of the number of clients, thereby enabling scalability. Fed-SB achieves state-of-the-art performance across commonsense reasoning, arithmetic reasoning, and language inference tasks while reducing communication costs by up to 230×.
In private settings, Fed-SB further improves performance by (1) reducing the number of trainable parameters, which lowers the noise required for differential privacy, and (2) avoiding the noise amplification introduced by other methods.
Overall, Fed-SB offers a state-of-the-art, efficient, and scalable solution for both private and non-private federated fine-tuning.
We propose Fed-SB, an extremely communication-efficient and high-performing federated adaptation of LoRA-SB. Rather than reparameterizing updates as a low-rank decomposition with two learnable adapters, Fed-SB has the server distribute frozen adapters \( \mathbf{B} \) and \( \mathbf{A} \), while clients train only a small square matrix \( \mathbf{R} \). This enables exact aggregation, as the global update is simply the average of the clients' \( \mathbf{R} \) matrices.
Formally, given a pre-trained weight \( \mathbf{W}_0 \) and data distributed across \( c \) clients, each client learns updates of the form:
\[ \Delta \mathbf{W}_i = \mathbf{B} \mathbf{R}_i \mathbf{A}. \]
The server then aggregates the updates by computing the global \( \mathbf{R} \) matrix:
\[ \mathbf{R}^{\text{agg}} = \frac{1}{c} \sum_{i=1}^{c} \mathbf{R}_i, \quad \Delta \mathbf{W}^{\text{agg}} = \mathbf{B} \left(\frac{1}{c} \sum_{i=1}^{c} \mathbf{R}_i \right) \mathbf{A}. \]
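To make the aggregation step concrete, here is a minimal NumPy sketch under assumed shapes; the dimensions `m`, `n`, `r`, the client count `c`, and all variable names are illustrative assumptions rather than the paper's implementation. The final assertion checks the exactness property above: averaging the \( \mathbf{R}_i \) yields the same global update as averaging the full per-client updates \( \mathbf{B}\mathbf{R}_i\mathbf{A} \).

```python
import numpy as np

# Hypothetical layer and adapter shapes, chosen only for illustration.
m, n, r, c = 64, 48, 4, 8  # output dim, input dim, adapter rank, number of clients

rng = np.random.default_rng(0)

# Frozen adapters distributed once by the server; clients never update these.
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))

# Each client i trains only its small r x r matrix R_i (simulated here with random values).
client_R = [rng.standard_normal((r, r)) for _ in range(c)]

# Server-side aggregation: average the R_i matrices (only r*r values per client are sent).
R_agg = sum(client_R) / c

# Reconstruct the global weight update from the aggregated R.
delta_W_fedsb = B @ R_agg @ A

# Exactness check: averaging R_i equals averaging the full per-client updates B @ R_i @ A.
delta_W_exact = sum(B @ R_i @ A for R_i in client_R) / c
assert np.allclose(delta_W_fedsb, delta_W_exact)

# Per-client payload: r*r values for R_i versus r*(m+n) values for a LoRA adapter pair B_i, A_i.
print("Fed-SB payload:", r * r, "values; LoRA adapter payload:", r * (m + n), "values")
```

The per-client payload here is \( r^2 \) values for \( \mathbf{R}_i \) rather than \( r(m+n) \) values for a pair of LoRA adapters, which is the source of the communication savings.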
We show that Fed-SB effectively resolves all challenges in (private) federated fine-tuning while achieving state-of-the-art communication efficiency and performance.
We summarize the advantages of Fed-SB over various state-of-the-art federated fine-tuning methods involving \( c \) clients. Fed-SB achieves exact aggregation and high expressivity while maintaining an extremely low communication cost that remains constant with respect to the number of clients.
In privacy-preserving settings, Fed-SB provides further benefits by minimizing noise: it reduces the number of learnable parameters, and the linearity of its update formulation avoids the noise amplification that affects many other methods, as sketched in the derivation below.
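As a brief sketch of the linearity argument (our illustration, not a verbatim derivation from the paper): suppose each client privatizes its update by adding noise \( \mathbf{N}_i \) (e.g., Gaussian noise for differential privacy) to \( \mathbf{R}_i \). Because the update is linear in \( \mathbf{R} \), the noise passes through aggregation purely additively:

\[ \mathbf{B} \left( \frac{1}{c} \sum_{i=1}^{c} (\mathbf{R}_i + \mathbf{N}_i) \right) \mathbf{A} = \Delta \mathbf{W}^{\text{agg}} + \mathbf{B} \left( \frac{1}{c} \sum_{i=1}^{c} \mathbf{N}_i \right) \mathbf{A}. \]

By contrast, if both adapters were trainable and privatized, a product such as \( (\mathbf{B}_i + \mathbf{N}_i^{B})(\mathbf{A}_i + \mathbf{N}_i^{A}) \) introduces cross terms \( \mathbf{N}_i^{B}\mathbf{A}_i + \mathbf{B}_i\mathbf{N}_i^{A} + \mathbf{N}_i^{B}\mathbf{N}_i^{A} \), so the injected noise gets multiplied by model parameters and by other noise, which is the amplification Fed-SB avoids.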
@misc{singhal2025fedsbsilverbulletextreme,
      title={Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning},
      author={Raghav Singhal and Kaustubh Ponkshe and Rohit Vartak and Lav R. Varshney and Praneeth Vepakomma},
      year={2025},
      eprint={2502.15436},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.15436},
}