Sakana AI Introduces Evolutionary Model Merge: A New Machine Learning Approach Automating Foundation Model Development

A recent development in the large language model (LLM) community, model merging, represents a paradigm shift. By strategically combining multiple LLMs into a single architecture, this approach has captured the attention of researchers chiefly because it requires no additional training, which significantly cuts the cost of building new models. That accessibility has spurred interest and experimentation with model merging. To most of the community, however, model merging remains something of a black art that relies on the model maker's intuition and instincts.

Prior efforts, such as the model soup approach, significantly improved relatively large image-processing and classification models. Linear weight averaging works well for such models and is also effective for image-generation models, such as the latent diffusion models exemplified by Stable Diffusion. For some time, the most popular Stable Diffusion models were neither the original base models nor their fine-tuned versions but merged models created by enthusiasts. This trend persists until a more advanced base model is released, at which point the community's cycle of fine-tuning and merging begins anew; beyond this cycle, there has been very little systematic effort to improve the merging process itself. Other approaches, such as the DARE method and Neural Architecture Search (NAS), have also been proposed, but the field remains highly under-explored because these methods have limitations (NAS, for instance, requires significant computational power).
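The linear weight averaging mentioned above can be sketched in a few lines. This is a minimal, hypothetical illustration: tiny hand-made parameter dicts stand in for real checkpoints, and the function name and mixing coefficients are our own choices, not part of the model soup paper.

```python
# Minimal sketch of "model soup"-style linear weight averaging.
# Each model is represented as a dict mapping parameter names to
# lists of floats; real checkpoints hold tensors, but the merging
# rule is the same. All names and values here are illustrative.

def average_weights(models, coeffs=None):
    """Average parameter dicts; coeffs are per-model mixing weights."""
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)  # uniform soup
    assert abs(sum(coeffs) - 1.0) < 1e-9, "mixing weights should sum to 1"
    merged = {}
    for name in models[0]:
        merged[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models))
            for i in range(len(models[0][name]))
        ]
    return merged

model_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0, 0.0]}
model_b = {"layer0.weight": [3.0, 4.0], "layer0.bias": [2.0, 2.0]}

soup = average_weights([model_a, model_b])
print(soup["layer0.weight"])  # [2.0, 3.0]
```

Non-uniform coefficients (e.g. `coeffs=[0.7, 0.3]`) give weighted merges, which is exactly the kind of knob that merging enthusiasts tune by hand.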

Researchers from Sakana AI present a methodology that uses evolutionary algorithms to automate the merging of foundation models. Their approach is distinguished by its ability to navigate both the parameter space (weights) and the data flow space (inference path), within a single framework that integrates the two dimensions. By automatically discovering effective combinations of diverse open-source models and harnessing their collective intelligence, without requiring additional training data or heavy computation, the approach enables cross-domain merging, yielding models such as a Japanese LLM with math reasoning capabilities.

The researchers dissect the merging process into two distinct, orthogonal configuration spaces and analyze the impact of each. Building on this analysis, they introduce a cohesive framework that seamlessly integrates both. They define merging configuration parameters for sparsification and weight mixing at each layer, including the input and output embeddings. These configurations are then optimized by an evolutionary algorithm, such as CMA-ES, for selected tasks, guided by critical task-specific metrics (e.g., the ROUGE score for VQA). In the data flow space search, only the inference path through the merged model is optimized; the parameters of the source models are left intact.
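To make the optimization loop concrete, here is a hedged sketch. The paper uses CMA-ES; a much simpler elitist (1+λ) evolution strategy stands in below, searching per-layer mixing weights against a toy fitness function. The layer count, the fitness function, and all names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of evolutionary search over merge configurations.
# The real method optimizes per-layer sparsification and mixing
# parameters with CMA-ES, scored by task metrics; here a simple
# (1+lambda) evolution strategy optimizes a toy fitness instead.
import random

random.seed(0)
N_LAYERS = 4  # illustrative; real models have many more layers

def fitness(weights):
    # Stand-in for a task metric (e.g. accuracy of the merged model);
    # in this toy, the "best" config is w = 0.7 at every layer.
    return -sum((w - 0.7) ** 2 for w in weights)

def evolve(generations=200, lam=8, sigma=0.1):
    best = [0.5] * N_LAYERS          # start from uniform averaging
    best_fit = fitness(best)
    for _ in range(generations):
        for _ in range(lam):         # sample lambda mutated offspring
            cand = [min(1.0, max(0.0, w + random.gauss(0, sigma)))
                    for w in best]
            f = fitness(cand)
            if f > best_fit:         # plus-selection: keep the elite
                best, best_fit = cand, f
    return best

config = evolve()
print([round(w, 2) for w in config])  # each weight drifts toward 0.7
```

In the actual system, evaluating `fitness` means running the candidate merged model on a benchmark, which is why sample-efficient optimizers like CMA-ES are preferred over the naive strategy shown here.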

Their method achieves an impressive score of 52.0 on the MGSM-JA benchmark, highlighting the remarkable potential of combining models with distinct expertise. The DFS-merged model also shows a performance gain, with more than a 6 percent increase in accuracy over the source models. Their hybrid model, which integrates both merging strategies, improves on the task further.

The researchers' work makes the following key contributions to the field of foundation model development:

  • Automated Model Composition: They introduce Evolutionary Model Merge, a general evolutionary method to automatically discover optimal combinations of diverse open-source models for creating new foundation models with user-specified capabilities.

  • Cross-Domain Merging: Their method can discover unique and effective ways to merge models from disparate domains (e.g., non-English language and Math, non-English language and Vision), potentially exceeding the capabilities achievable through conventional human design strategies.
  • State-of-the-Art Performance: They have shown their method’s effectiveness by automatically generating a Japanese LLM with Math reasoning capability and a Japanese VLM. Both models achieve state-of-the-art performance on various benchmarks, even without explicit optimization.
  • High Efficiency and Surprising Generalizability: Their 7B parameter LLM surpasses the performance of some previous 70B parameter Japanese LLMs on benchmark datasets.
  • Culturally-Aware VLM: The generated Japanese VLM achieves top results when tested on a domestically sourced dataset of Japanese image-description pairs, demonstrating its ability to handle Japanese culture-specific content.

In conclusion, the researchers from Sakana AI have proposed a general method that uses evolutionary techniques to efficiently discover the best ways to combine models drawn from the vast ocean of open-source models with diverse capabilities. The process can automatically create new foundation models with capabilities specified by the user. The approach can also discover unique and effective ways to merge models from vastly different domains in non-trivial combinations that human experts might struggle to find themselves. In extensive experiments, their models achieve state-of-the-art results on several LLM and vision benchmarks.

Check out the Paper, Github, and Blog. All credit for this research goes to the researchers of this project.
