Low-Rank Adaptation (LoRA) has become ubiquitous for efficiently fine-tuning foundation models. However, federated fine-tuning using LoRA is challenging due to suboptimal updates arising from traditional federated averaging of individual adapters. Existing solutions either incur prohibitively high communication cost that scales linearly with the number of clients or suffer from performance degradation due to limited expressivity.
We introduce Federated Silver Bullet (Fed-SB), a novel approach for federated fine-tuning of LLMs using LoRA-SB, a recently proposed low-rank adaptation method. LoRA-SB optimally aligns the optimization trajectory with the ideal low-rank full fine-tuning projection by learning a small square matrix \( R \) between adapters \( B \) and \( A \), keeping other components fixed.
Direct averaging of \( R \) guarantees exact aggregation of updates and substantially reduces communication cost, which remains independent of the number of clients, thereby enabling scalability. Fed-SB achieves state-of-the-art performance across commonsense reasoning, arithmetic reasoning, and language inference tasks while reducing communication costs by up to 230×.
In private settings, Fed-SB further improves performance by (1) reducing the number of trainable parameters, which lowers the noise required for differential privacy, and (2) avoiding the noise amplification introduced by other methods.
Overall, Fed-SB offers a state-of-the-art, efficient, and scalable solution for both private and non-private federated fine-tuning.
We propose Fed-SB, an extremely communication-efficient and high-performing federated adaptation of LoRA-SB. Rather than reparameterizing updates as a low-rank decomposition with two learnable adapters, Fed-SB has the server distribute frozen adapters \( \mathbf{B} \) and \( \mathbf{A} \), while clients train only a small square matrix \( \mathbf{R} \). This enables exact aggregation, as the global update is simply the average of the clients' \( \mathbf{R} \) matrices.
Formally, given a pre-trained weight \( \mathbf{W}_0 \) and data distributed across \( c \) clients, each client learns updates of the form:
\[ \Delta \mathbf{W}_i = \mathbf{B} \mathbf{R}_i \mathbf{A}. \]
The server then aggregates the updates by computing the global \( \mathbf{R} \) matrix:
\[ \mathbf{R}^{\text{agg}} = \frac{1}{c} \sum_{i=1}^{c} \mathbf{R}_i, \quad \Delta \mathbf{W}^{\text{agg}} = \mathbf{B} \left(\frac{1}{c} \sum_{i=1}^{c} \mathbf{R}_i \right) \mathbf{A}. \]
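To make the aggregation step concrete, here is a minimal NumPy sketch under assumed shapes; the dimensions `m`, `n`, `r`, the client count `c`, and all variable names are illustrative assumptions rather than the paper's implementation. The final assertion checks the exactness property above: averaging the \( \mathbf{R}_i \) yields the same global update as averaging the full per-client updates \( \mathbf{B}\mathbf{R}_i\mathbf{A} \).

```python
import numpy as np

# Hypothetical layer and adapter shapes, chosen only for illustration.
m, n, r, c = 64, 48, 4, 8  # output dim, input dim, adapter rank, number of clients

rng = np.random.default_rng(0)

# Frozen adapters distributed once by the server; clients never update these.
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))

# Each client i trains only its small r x r matrix R_i (simulated here with random values).
client_R = [rng.standard_normal((r, r)) for _ in range(c)]

# Server-side aggregation: average the R_i matrices (only r*r values per client are sent).
R_agg = sum(client_R) / c

# Reconstruct the global weight update from the aggregated R.
delta_W_fedsb = B @ R_agg @ A

# Exactness check: averaging R_i equals averaging the full per-client updates B @ R_i @ A.
delta_W_exact = sum(B @ R_i @ A for R_i in client_R) / c
assert np.allclose(delta_W_fedsb, delta_W_exact)

# Per-client payload: r*r values for R_i versus r*(m+n) values for a LoRA adapter pair B_i, A_i.
print("Fed-SB payload:", r * r, "values; LoRA adapter payload:", r * (m + n), "values")
```

The per-client payload here is \( r^2 \) values for \( \mathbf{R}_i \) rather than \( r(m+n) \) values for a pair of LoRA adapters, which is the source of the communication savings.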
We show that Fed-SB effectively resolves all challenges in (private) federated fine-tuning while achieving state-of-the-art communication efficiency and performance.
We summarize the advantages of Fed-SB over various state-of-the-art federated fine-tuning methods involving \( c \) clients. Fed-SB achieves exact aggregation and high expressivity while maintaining an extremely low communication cost that remains constant with respect to the number of clients.
In privacy-preserving settings, Fed-SB provides further benefits by minimizing noise: it reduces the number of learnable parameters, and the linearity of its update formulation avoids the noise amplification that affects many other methods, as sketched in the derivation below.
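As a brief sketch of the linearity argument (our illustration, not a verbatim derivation from the paper): suppose each client privatizes its update by adding noise \( \mathbf{N}_i \) (e.g., Gaussian noise for differential privacy) to \( \mathbf{R}_i \). Because the update is linear in \( \mathbf{R} \), the noise passes through aggregation purely additively:

\[ \mathbf{B} \left( \frac{1}{c} \sum_{i=1}^{c} (\mathbf{R}_i + \mathbf{N}_i) \right) \mathbf{A} = \Delta \mathbf{W}^{\text{agg}} + \mathbf{B} \left( \frac{1}{c} \sum_{i=1}^{c} \mathbf{N}_i \right) \mathbf{A}. \]

By contrast, if both adapters were trainable and privatized, a product such as \( (\mathbf{B}_i + \mathbf{N}_i^{B})(\mathbf{A}_i + \mathbf{N}_i^{A}) \) introduces cross terms \( \mathbf{N}_i^{B}\mathbf{A}_i + \mathbf{B}_i\mathbf{N}_i^{A} + \mathbf{N}_i^{B}\mathbf{N}_i^{A} \), so the injected noise gets multiplied by model parameters and by other noise, which is the amplification Fed-SB avoids.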
@misc{singhal2025fedsbsilverbulletextreme,
      title={Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning},
      author={Raghav Singhal and Kaustubh Ponkshe and Rohit Vartak and Lav R. Varshney and Praneeth Vepakomma},
      year={2025},
      eprint={2502.15436},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.15436},
}