ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models

Raghav Singhal*1, Kaustubh Ponkshe*1, Rohit Vartak*2, Praneeth Vepakomma1,3
1 Mohamed bin Zayed University of Artificial Intelligence, UAE
2 Duke University, USA
3 Massachusetts Institute of Technology, USA
* Indicates Equal Contribution

Abstract

Large Language Models (LLMs) have demonstrated strong performance across a wide range of tasks, but adapting them efficiently to new domains remains a key challenge. Parameter-Efficient Fine-Tuning (PEFT) methods address this by introducing lightweight, trainable modules while keeping most pre-trained weights fixed. The prevailing approach, LoRA, models updates using a low-rank decomposition, but its expressivity is inherently constrained by the rank. Recent methods like HiRA aim to increase expressivity by incorporating a Hadamard product with the frozen weights, but still rely on the structure of the pre-trained model. We introduce ABBA, a new PEFT architecture that reparameterizes the update as a Hadamard product of two independently learnable low-rank matrices. In contrast to prior work, ABBA fully decouples the update from the pre-trained weights, enabling both components to be optimized freely. This leads to significantly higher expressivity under the same parameter budget. We formally analyze ABBA's expressive capacity and validate its advantages through matrix reconstruction experiments. Empirically, ABBA achieves state-of-the-art results on arithmetic and commonsense reasoning benchmarks, consistently outperforming existing PEFT methods by a significant margin across multiple models.

ABBA Illustration

ABBA introduces a highly expressive yet efficient reparameterization for fine-tuning LLMs by modeling the update as the Hadamard product of two independently learnable low-rank matrices: \[ \Delta W = s(B_1 A_1) \odot (B_2 A_2) \] where \( B_1 \in \mathbb{R}^{m \times r_1} \), \( A_1 \in \mathbb{R}^{r_1 \times n} \), \( B_2 \in \mathbb{R}^{m \times r_2} \), and \( A_2 \in \mathbb{R}^{r_2 \times n} \), with \( r_1, r_2 \ll \min(m, n) \). The scaling factor \( s \) ensures training stability. This parameterization introduces only \( (r_1 + r_2)(m + n) \) trainable parameters, substantially fewer than full fine-tuning, while enabling an effective rank up to \( r_1 r_2 \). To maximize expressivity under a fixed parameter budget, we typically set \( r_1 = r_2 \). For fair comparison with other PEFT methods, we match parameter counts by setting \( r_1 = r_2 = r/2 \), so that ABBA uses the same number of parameters as standard LoRA-based approaches.
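
For concreteness, the snippet below is a minimal PyTorch sketch of this parameterization in its naive (unfused) form; the module name ABBAUpdate, the random initialization, and the toy shapes are illustrative assumptions rather than the paper's exact recipe. It checks that ABBA with \( r_1 = r_2 = r/2 \) matches the parameter count of rank-\( r \) LoRA while producing an update of much higher rank.

    import math
    import torch
    import torch.nn as nn

    class ABBAUpdate(nn.Module):
        """Naive ABBA update: Delta W = s * (B1 @ A1) * (B2 @ A2)."""
        def __init__(self, m, n, r1, r2, s=1.0):
            super().__init__()
            # Two independently learnable low-rank factor pairs
            # (illustrative random initialization; a real adapter would likely
            # initialize so that Delta W starts at zero).
            self.B1 = nn.Parameter(torch.randn(m, r1) / math.sqrt(r1))
            self.A1 = nn.Parameter(torch.randn(r1, n) / math.sqrt(n))
            self.B2 = nn.Parameter(torch.randn(m, r2) / math.sqrt(r2))
            self.A2 = nn.Parameter(torch.randn(r2, n) / math.sqrt(n))
            self.s = s

        def delta_w(self):
            # Hadamard (elementwise) product of the two low-rank products.
            return self.s * (self.B1 @ self.A1) * (self.B2 @ self.A2)

    m, n, r = 512, 256, 16
    abba = ABBAUpdate(m, n, r1=r // 2, r2=r // 2)
    abba_params = sum(p.numel() for p in abba.parameters())
    # Same budget as rank-r LoRA: (r1 + r2)(m + n) = r(m + n).
    assert abba_params == (r // 2 + r // 2) * (m + n) == r * (m + n)
    # Effective rank is typically r1 * r2 = 64, versus at most 16 for rank-r LoRA.
    print(torch.linalg.matrix_rank(abba.delta_w().detach()).item())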

Contributions

  • We propose ABBA, a novel PEFT architecture that models the weight update as the Hadamard product of two independently learnable low-rank matrices. This formulation enables highly expressive, high-rank updates while preserving strict parameter efficiency.
  • We provide theoretical and empirical analyses of ABBA's expressivity, showing that Hadamard-based decomposition consistently outperforms standard low-rank methods in matrix reconstruction.
  • We introduce an exact and efficient reformulation of ABBA using Khatri–Rao factorization, enabling scalable and practical implementation without compromising expressivity.
  • Through extensive experiments on four models across arithmetic and commonsense reasoning tasks, we demonstrate that ABBA achieves state-of-the-art performance, significantly outperforming existing PEFT methods under equal or lower parameter budgets.

Efficiency of ABBA

While ABBA is clearly parameter-efficient, analyzing its memory footprint during training is more subtle. In LoRA, the update \( \Delta W = BA \) is applied as \( \Delta W x = B (A x) \), allowing intermediate computations to remain low-rank. Only the low-rank activation \( A x \in \mathbb{R}^r \) and the adapter weights themselves need to be stored in addition, avoiding materialization of the full \( m \times n \) matrix \( BA \).
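
This associativity argument can be seen directly in a few lines; the shapes below are arbitrary and chosen only for illustration.

    import torch

    m, n, r = 1024, 1024, 16
    B = torch.randn(m, r, dtype=torch.float64)
    A = torch.randn(r, n, dtype=torch.float64)
    x = torch.randn(n, dtype=torch.float64)

    h = A @ x   # only an r-dimensional intermediate is stored
    y = B @ h   # the full m x n matrix B @ A is never materialized
    assert torch.allclose(y, (B @ A) @ x)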

In contrast, ABBA’s update \( \Delta W = (B_1 A_1) \odot (B_2 A_2) \) poses a challenge. A naive implementation would require constructing both \( B_1 A_1 \) and \( B_2 A_2 \), followed by their elementwise product, resulting in the storage of multiple full \( m \times n \) matrices. Moreover, unlike in LoRA, the Hadamard product does not distribute over matrix–vector multiplication, so computing \( B_2 (A_2 x) \) first provides no way to incorporate the other factor \( B_1 A_1 \).

Theorem (Khatri–Rao Factorization [Slyusar, 1997]):
Let \( B_1 A_1, B_2 A_2 \in \mathbb{R}^{m \times n} \). Then, \[ (B_1 A_1) \odot (B_2 A_2) = \underbrace{(B_1 \odot_r B_2)}_{m \times r_1 r_2} \underbrace{(A_1^\top \odot_r A_2^\top)^\top}_{r_1 r_2 \times n} \] where \( \odot_r \) denotes the row-wise Khatri–Rao product.

To address the memory bottleneck, we use the above theorem to rewrite ABBA in a LoRA-like form: define \( B_{\text{kr}} = B_1 \odot_r B_2 \) and \( A_{\text{kr}} = (A_1^\top \odot_r A_2^\top)^\top \). The update becomes \( \Delta W x = B_{\text{kr}} (A_{\text{kr}} x) \), avoiding any full-rank matrix construction.
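
A small numerical check of this factorization and of the resulting LoRA-style forward pass is sketched below; the helper row_khatri_rao and the toy dimensions are illustrative assumptions, not the paper's implementation.

    import torch

    def row_khatri_rao(X, Y):
        # Row-wise Khatri-Rao product: row i of the output is the Kronecker
        # product of row i of X (length r1) and row i of Y (length r2).
        return (X.unsqueeze(2) * Y.unsqueeze(1)).reshape(X.shape[0], -1)

    m, n, r1, r2, s = 64, 48, 4, 4, 0.1
    B1, A1 = torch.randn(m, r1), torch.randn(r1, n)
    B2, A2 = torch.randn(m, r2), torch.randn(r2, n)

    # Naive update: materializes two full m x n matrices.
    delta_naive = s * (B1 @ A1) * (B2 @ A2)

    # Fused form: a single rank-(r1 * r2) factorization, no m x n matrix formed.
    B_kr = row_khatri_rao(B1, B2)            # m x (r1 * r2)
    A_kr = row_khatri_rao(A1.T, A2.T).T      # (r1 * r2) x n
    assert torch.allclose(delta_naive, s * (B_kr @ A_kr), atol=1e-5)

    # LoRA-style application: only an (r1 * r2)-dimensional intermediate.
    x = torch.randn(n)
    delta_y = s * (B_kr @ (A_kr @ x))
    assert torch.allclose(delta_y, delta_naive @ x, atol=1e-4)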

This formulation allows ABBA to match LoRA’s compute and memory efficiency while offering significantly higher expressivity.

Main Results

ABBA achieves state-of-the-art performance on arithmetic and commonsense reasoning benchmarks across four models, consistently outperforming existing PEFT methods under equal or lower parameter budgets. Full result tables are reported in the paper.

BibTeX


    @misc{singhal2025abbahighlyexpressivehadamard,
      title={ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models},
      author={Raghav Singhal and Kaustubh Ponkshe and Rohit Vartak and Praneeth Vepakomma},
      year={2025},
      eprint={2505.14238},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.14238}
    }