Researchers From Microsoft and UCLA Introduce ClimaX: A Flexible And Generalizable Deep Learning Model For Weather And Climate Science

Most current state-of-the-art weather and climate models are based on simulations of massive numerical systems that use the laws of physics to govern different aspects of the atmosphere. As a result, running cutting-edge numerical weather and climate models is extremely computationally expensive, especially when simulating atmospheric phenomena at fine spatial and temporal resolution. So, despite their extraordinary performance, these models have well-known shortcomings and constraints over both short- and long-term time horizons.

Recent technological breakthroughs have also considerably increased the amount of data that can be collected from satellites, radars, and various weather sensors. Data-driven methods exploit this by training deep neural networks to learn a functional mapping for a downstream forecasting or projection task. Whereas current numerical weather and climate models face several restrictions on how they can handle large-scale data, machine learning (ML) models offer an alternative tradeoff that benefits from the scalability of both data and computation. Efforts to scale up deep learning systems for short- and medium-range weather forecasting have already shown outstanding success, frequently matching the most advanced numerical weather models.

However, most ML models lack the generality of numerical models because they are trained for specific spatiotemporal objectives on handpicked climate datasets. To build a more general model for weather and climate science, researchers at Microsoft and UCLA developed ClimaX. The research team consisted of Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K. Gupta, and Aditya Grover. ClimaX is a generalizable transformer-based weather and climate model that can be trained on heterogeneous datasets spanning different variables, spatiotemporal coverage, and physical groundings. The resulting foundation model can be fine-tuned for a wide range of climate and weather tasks, which allows it to remain computationally efficient without sacrificing generality. The model will shortly be made available for academic and research use.

ClimaX uses the pretraining-finetuning paradigm, which has recently grown popular for training unsupervised foundation models. Rather than limiting pretraining to conventional homogeneous weather datasets, the researchers used climate simulation datasets grounded in the underlying laws of physics. The benefit of doing so was the abundance of data made available by diverse climate simulations from numerous groups; specifically, the researchers used climate datasets derived from CMIP6. The pretrained ClimaX can then be finetuned for various climate and weather tasks, including those involving atmospheric variables and spatiotemporal scales that were not seen during pretraining.
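The pretrain-then-finetune workflow described above can be illustrated with a deliberately toy sketch: a linear "model" is first trained on abundant simulated data and then adapted to a related, data-scarce downstream task. All names and the linear setup here are illustrative assumptions, not part of ClimaX itself.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyForecaster:
    """Toy linear model standing in for ClimaX (illustrative only)."""
    def __init__(self, dim, lr=0.1):
        self.W = np.zeros((dim, dim))
        self.lr = lr

    def step(self, x, y):
        # One gradient step on squared error for the mapping x -> y.
        pred = x @ self.W
        grad = x.T @ (pred - y) / len(x)
        self.W -= self.lr * grad

    def mse(self, x, y):
        return float(np.mean((x @ self.W - y) ** 2))

def train(model, make_batch, steps):
    for _ in range(steps):
        x, y = make_batch()
        model.step(x, y)
    return model

dim = 4
A_sim = rng.normal(size=(dim, dim))            # "physics" of the simulation
def sim_batch():                               # abundant pretraining data
    x = rng.normal(size=(32, dim))
    return x, x @ A_sim

A_obs = A_sim + 0.1 * rng.normal(size=(dim, dim))  # related downstream task
def obs_batch():                               # scarcer finetuning data
    x = rng.normal(size=(8, dim))
    return x, x @ A_obs

model = train(TinyForecaster(dim), sim_batch, steps=200)  # pretrain
model.lr = 0.05
model = train(model, obs_batch, steps=50)                 # finetune
```

Because the pretrained weights already sit close to the downstream dynamics, finetuning needs only a few small batches, which is the core appeal of the paradigm.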


ClimaX is a multi-dimensional image-to-image architecture based on Vision Transformers (ViT). However, ClimaX differs from standard ViT architectures in two important respects: variable tokenization and variable aggregation. Whereas standard ViT tokenization divides an image into equal patches and flattens them, ClimaX tokenizes each variable separately. Since climate data can be quite irregular, variable tokenization treats variables as separate modalities, enabling flexible training even on inconsistent datasets. Variable tokenization has two shortcomings, however. First, the sequence length grows linearly with the number of input variables, which is computationally inefficient. Second, the input will likely comprise tokens of many variables with widely disparate physical groundings. The researchers therefore proposed variable aggregation, a cross-attention operation that produces an embedding vector of fixed size for each spatial location.
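A minimal NumPy sketch of the two ideas might look as follows. The shapes and the single shared query vector are simplifying assumptions for illustration; the actual ClimaX implementation uses learned projections and multi-head attention.

```python
import numpy as np

def patchify(var_grid, p):
    """Split one variable's H x W grid into flattened p x p patches."""
    H, W = var_grid.shape
    patches = var_grid.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p * p)          # (num_patches, p*p)

def variable_tokenize(data, p, W_embed):
    """Tokenize each variable independently: (V, H, W) -> (V, N, d).
    Sequence length V*N grows linearly with the number of variables V."""
    return np.stack([patchify(v, p) @ W_embed for v in data])

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def variable_aggregate(tokens, query):
    """Cross-attend over the V variable tokens at each spatial location,
    collapsing (V, N, d) to a single embedding per location: (N, d)."""
    V, N, d = tokens.shape
    out = np.empty((N, d))
    for n in range(N):
        keys = tokens[:, n, :]                     # (V, d) tokens here
        attn = softmax(keys @ query / np.sqrt(d))  # (V,) attention weights
        out[n] = attn @ keys                       # weighted combination
    return out
```

For example, three variables on a 32 x 32 grid with 8 x 8 patches yield 3 x 16 tokens, which aggregation collapses back to 16 location embeddings, so the sequence entering the transformer no longer scales with the variable count.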

Weather forecasting, climate projection, and climate downscaling were among the downstream tasks on which the researchers assessed ClimaX's performance. Even when pretrained at lower resolutions and with smaller compute budgets, ClimaX outperforms other baseline deep learning models.

The research team developed ClimaX with the aim of advancing data-driven weather and climate modeling by providing universal access to cutting-edge machine-learning techniques that handle a variety of challenges involving weather and climate variables. The team explained that they see ClimaX as a first step towards completing many such tasks. More findings from their research can be found below.

Check out the Paper and Microsoft Blog. All Credit For This Research Goes To the Researchers on This Project.

Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing, and Web Development. She enjoys learning more about the technical field by participating in several challenges.