A Paradigm Shift: MoRA’s Role in Advancing Parameter-Efficient Fine-Tuning Techniques

Parameter-efficient fine-tuning (PEFT) techniques adapt large language models (LLMs) to specific tasks by modifying a small subset of parameters, unlike full fine-tuning (FFT), which updates all parameters. Low-Rank Adaptation (LoRA), a representative PEFT method, significantly reduces memory requirements by updating less than 1% of parameters while achieving performance similar to FFT. LoRA learns the weight update as a product of low-rank matrices, and because these matrices can be merged into the original model parameters, it adds no computational cost during inference. Numerous methods aim to improve LoRA for LLMs, and most validate their gains on GLUE, either by achieving better performance or by requiring fewer trainable parameters.
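The merging step described above can be illustrated with a minimal sketch in plain Python. It assumes the common LoRA formulation in which the update to a d x k weight W is the product of a d x r matrix B and an r x k matrix A. The helper names (matmul, add, lora_param_count, merge_lora), the toy matrices, and the 4096-wide layer used in the test values are illustrative assumptions, not taken from the paper's code.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_param_count(d, k, r):
    """Trainable parameters for one adapted weight: B is d x r, A is r x k."""
    return r * (d + k)


def merge_lora(W, B, A):
    """Fold the low-rank update into the frozen weight: W' = W + B A."""
    return add(W, matmul(B, A))


# Toy example (hypothetical values): d = k = 3, rank r = 1.
W = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]   # frozen pretrained weight
B = [[1], [2], [0]]                      # d x r
A = [[1, 0, -1]]                         # r x k
x = [[2], [1], [3]]                      # input as a column vector

# During training the adapter runs as a separate branch: W x + B (A x).
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)))

# After merging, a single matrix product gives the same output.
merged = matmul(merge_lora(W, B, A), x)
```

Because the merged weight has the same shape as W, inference runs exactly as it would for the original model. For a hypothetical 4096 x 4096 layer at rank 8, lora_param_count gives 65,536 trainable parameters, roughly 0.4% of the layer's 16.8 million weights.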

Enhancements to LoRA include DoRA’s weight decomposition approach, LoRA+’s differential learning rates for the two low-rank matrices, and ReLoRA’s merging of low-rank matrices into the model during training. Fine-tuning LLMs spans instruction tuning, complex reasoning tasks, and continual pretraining. Most LoRA variants are evaluated on instruction tuning or GLUE tasks, which may not fully reflect their effectiveness. Recent works also test on reasoning tasks, but the available training data is often insufficient, which limits how accurately these evaluations reflect a method’s effectiveness.

Researchers from Beihang University and Microsoft Corporation introduced MoRA, a method that replaces the low-rank matrices of LoRA with a square matrix to achieve high-rank updating with the same number of trainable parameters. MoRA employs four non-parameter operators to adjust input and output dimensions, ensuring the weight can be merged back into LLMs. A comprehensive evaluation across five tasks (instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining) demonstrates MoRA’s effectiveness.

MoRA aims to achieve higher-rank updates with the same number of trainable parameters as LoRA by using a square matrix. Because the square matrix is smaller than the weight it updates, MoRA introduces non-parameter operators that reduce the input dimension before the matrix and increase the output dimension after it, ensuring the weight can merge back into LLMs. These operators can be implemented in several ways, such as truncating dimensions, sharing rows and columns, and reshaping inputs. Incorporating rotation operators enhances the expressiveness of MoRA by distinguishing different input segments, which improves performance.
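A minimal sketch can make the parameter matching concrete. It assumes the simplest of the options named above, truncation: the input is cut to the side length of the square matrix, and the output is zero-padded back to the full dimension. The function names, the zero-padding choice, and the toy values are illustrative assumptions rather than the authors' implementation. The side length follows from the parameter budget alone: a square matrix with at most r(d + k) entries has side floor(sqrt(r(d + k))).

```python
import math


def mora_side(d, k, r):
    """Largest square-matrix side whose parameter count fits LoRA's r * (d + k) budget."""
    return math.isqrt(r * (d + k))


def compress_truncate(x, r_hat):
    """Non-parameter compression (assumed variant): keep the first r_hat inputs."""
    return x[:r_hat]


def decompress_pad(y, d):
    """Non-parameter decompression (assumed variant): zero-pad the output to length d."""
    return y + [0] * (d - len(y))


def mora_update(M, x, d):
    """Apply the square matrix M between the compression and decompression steps."""
    z = compress_truncate(x, len(M))
    y = [sum(m * v for m, v in zip(row, z)) for row in M]
    return decompress_pad(y, d)


def merged_delta(M, d, k):
    """Equivalent d x k weight update: M in the top-left block, zeros elsewhere."""
    r_hat = len(M)
    return [
        [M[i][j] if i < r_hat and j < r_hat else 0 for j in range(k)]
        for i in range(d)
    ]


# Toy example (hypothetical values): d = k = 4 with a 2 x 2 square matrix.
M = [[1, 2], [3, 4]]
x = [1, 1, 1, 1]

branch_out = mora_update(M, x, d=4)
delta = merged_delta(M, d=4, k=4)
merged_out = [sum(w * v for w, v in zip(row, x)) for row in delta]
```

Since the operators have no parameters, the whole update collapses into an ordinary d x k matrix that can be added to the original weight, just as with LoRA. For a hypothetical 4096 x 4096 layer, a rank-8 LoRA budget corresponds to mora_side(4096, 4096, 8) = 256, so the square matrix can reach rank 256 where the LoRA update is limited to rank 8.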

Researchers evaluated MoRA and reported fine-tuning results on MMLU in zero-shot and 5-shot settings for instruction tuning, on GSM8K and MATH for mathematical reasoning, and as average performance on biomedical and financial tasks for continual pretraining. MoRA performs similarly to LoRA in instruction tuning and mathematical reasoning but outperforms LoRA in the biomedical and financial domains due to high-rank updating. LoRA variants generally perform similarly to LoRA, with AsyLoRA excelling in instruction tuning but struggling in mathematical reasoning. ReLoRA’s performance suffers at higher ranks, such as 256, due to merging low-rank matrices during training. The tasks also have different fine-tuning requirements: rank 8 suffices for instruction tuning but falls short for mathematical reasoning, where the rank must be raised to 256 to reach parity with FFT. In continual pretraining, LoRA still lags behind FFT even at rank 256.

In this study, researchers analyze the limitations of low-rank updating in LoRA for memory-intensive tasks and propose MoRA as a solution. MoRA uses non-parameterized operators for high-rank updating and explores different compression and decompression methods. Performance comparisons show MoRA matching LoRA in instruction tuning and mathematical reasoning while outperforming it in continual pretraining and memory tasks. Pretraining experiments further validate the effectiveness of high-rank updating, with MoRA achieving better results than ReLoRA.
