Researchers from UCLA and CMU Introduce Stormer: A Scalable Transformer Neural Network for Skillful and Reliable Medium-Range Weather Forecasting

Weather forecasting is one of the central problems facing science and society today. Accurate forecasts help people prepare for and recover from natural disasters and extreme weather events, and they help researchers better understand the environment amid growing concern about climate change. Numerical weather prediction (NWP) models have historically been the mainstay of atmospheric scientists’ work. These models describe thermodynamics and fluid flow with systems of differential equations, which are integrated forward in time to produce forecasts. Despite their widespread use, NWP models have several drawbacks, such as errors in the parameterization of important small-scale physical processes, including radiation and cloud physics.

Because integrating a large system of differential equations is difficult, numerical approaches also carry substantial computational costs, particularly at fine spatial and temporal resolutions. Moreover, because the models rely on the expertise of climate scientists to refine equations, parameterizations, and algorithms, NWP forecast accuracy does not improve with additional data. To overcome these problems, there is growing interest in data-driven, deep learning-based weather forecasting methods. The main premise is to train deep neural networks on historical data, such as the ERA5 reanalysis dataset, to forecast future weather conditions. Once trained, these networks can produce forecasts in seconds, whereas traditional NWP models take hours.

Because meteorological data and natural images have similar spatial structures, early efforts in this field applied conventional vision architectures such as ResNet and UNet to weather forecasting. Their performance, however, was inferior to that of numerical models. Thanks to improved model designs, better training recipes, and more data and compute, notable progress has been made recently. The first model to surpass the operational Integrated Forecasting System (IFS) was Pangu-Weather, a 3D Earth-Specific Transformer trained on 0.25° data (a 721×1440 grid). Soon after, GraphCast scaled Keisler’s graph neural network design up to 0.25° data and demonstrated gains over Pangu-Weather.
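The 721×1440 grid size quoted above follows directly from the 0.25° resolution. The short sketch below shows the arithmetic, assuming a regular latitude-longitude grid that includes both poles; the function name `grid_shape` is purely illustrative.

```python
def grid_shape(resolution_deg):
    """Return (n_lat, n_lon) for a regular lat-lon grid that includes both poles."""
    n_lat = round(180 / resolution_deg) + 1  # latitudes from -90 to 90 inclusive
    n_lon = round(360 / resolution_deg)      # longitudes from 0 up to, but excluding, 360
    return n_lat, n_lon


print(grid_shape(0.25))  # (721, 1440)
```

Latitude gets one extra point because both endpoints are counted, while longitude wraps around and does not.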

Although their forecast accuracy is impressive, current approaches often rely on intricate, highly customized neural network architectures with few or no ablation studies, which makes it hard to pinpoint the components responsible for their success. For instance, it is unclear how much the multi-mesh message passing in GraphCast contributes to its performance, or what advantages the 3D Earth-Specific Transformer has over a standard Transformer. Progress in this field will require a better understanding of these methods and, preferably, a simplification of them. A unified framework would also make it easier to build foundation models for weather and climate that go beyond weather forecasting. This study shows that a simple architecture, combined with the right training recipe, can outperform state-of-the-art methods.

Researchers from UCLA, CMU, Argonne National Laboratory, and Penn State University present Stormer, a simple transformer model that achieves state-of-the-art weather forecasting performance with minimal changes to the standard transformer backbone. Starting from a standard vision transformer (ViT) architecture, the research team conducted in-depth ablation studies and identified three key components that drive the model’s performance: (1) a weather-specific embedding layer that models the interactions between atmospheric variables while converting the input data into a sequence of tokens; (2) a randomized dynamics forecasting objective that trains the model to predict weather dynamics at random intervals; and (3) a pressure-weighted loss that weights variables at different pressure levels in the loss function to approximate the density at each pressure level. At inference time, the randomized dynamics forecasting objective lets a single model produce multiple forecasts for a given lead time by combining the intervals it was trained on in different ways.
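To make components (2) and (3) more concrete, here is a minimal Python sketch of how a training target and loss along these lines could be built. It is not the authors' implementation. The interval set `INTERVALS_H`, the choice of weights proportional to pressure, and every function name are assumptions made for illustration, and the real model operates on gridded tensors rather than small lists.

```python
import random

# Hypothetical interval set in hours; the article itself only mentions 6- and 12-hour steps.
INTERVALS_H = (6, 12, 24)


def pressure_weights(levels_hpa):
    """Assumed weighting: proportional to pressure, normalized so the weights average to 1."""
    mean_p = sum(levels_hpa) / len(levels_hpa)
    return {p: p / mean_p for p in levels_hpa}


def dynamics_target(series, t, dt):
    """Target is the change X[t + dt] - X[t] at every level, not the future state itself."""
    now, future = series[t], series[t + dt]
    return {p: [f - n for f, n in zip(future[p], now[p])] for p in now}


def pressure_weighted_mse(pred, target):
    """Mean squared error where each level's error is scaled by its pressure weight."""
    weights = pressure_weights(list(target))
    total, count = 0.0, 0
    for p, w in weights.items():
        for a, b in zip(pred[p], target[p]):
            total += w * (a - b) ** 2
            count += 1
    return total / count


# Toy data: time in hours -> {pressure level in hPa -> flattened grid values}.
series = {h: {500: [float(h), 2.0 * h], 1000: [0.5 * h, 1.0]} for h in range(0, 49, 6)}

rng = random.Random(0)
dt = rng.choice(INTERVALS_H)                 # a random interval is drawn per training sample
target = dynamics_target(series, t=0, dt=dt)
zero_pred = {p: [0.0] * len(v) for p, v in target.items()}
loss = pressure_weighted_mse(zero_pred, target)
```

In this sketch, errors at 1000 hPa count twice as much as errors at 500 hPa, which reflects the idea that near-surface levels, where the air is denser, should weigh more heavily in the loss.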

For instance, one can obtain a 3-day forecast by rolling out the 6-hour forecast 12 times or the 12-hour forecast 6 times. Combining these forecasts yields significant performance gains, particularly at long lead times. The research team evaluates the proposed approach, Stormer (Scalable transformers for weather forecasting), on WeatherBench 2, a widely used benchmark for data-driven weather forecasting. Test results show that Stormer achieves competitive accuracy on key atmospheric variables for lead times of 1–7 days and surpasses the state-of-the-art forecasting system beyond 7 days. Notably, Stormer achieves this while training on data of almost 5× lower resolution and using orders of magnitude fewer GPU hours than the baselines. Finally, the team’s scaling analysis shows that Stormer’s performance improves consistently with increased model capacity and data size, which indicates room for further gains.
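The idea of reaching one lead time through several rollouts (6-hour steps 12 times, or 12-hour steps 6 times) and then combining the results can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_step` is a stand-in for a trained model, simple averaging is assumed as the combination rule, and all names are hypothetical.

```python
def homogeneous_rollouts(lead_time_h, intervals_h):
    """Return (interval, steps) pairs whose repeated application reaches lead_time_h exactly."""
    return [(dt, lead_time_h // dt) for dt in intervals_h if lead_time_h % dt == 0]


def rollout(state, step_fn, dt, steps):
    """Apply the one-step forecaster `steps` times with a fixed interval dt."""
    for _ in range(steps):
        state = step_fn(state, dt)
    return state


def combined_forecast(state, step_fn, lead_time_h, intervals_h):
    """Average the forecasts obtained from every homogeneous rollout (assumed combination rule)."""
    plans = homogeneous_rollouts(lead_time_h, intervals_h)
    forecasts = [rollout(state, step_fn, dt, n) for dt, n in plans]
    return [sum(vals) / len(forecasts) for vals in zip(*forecasts)]


def toy_step(state, dt):
    """Stand-in for a trained model: advances by dt and adds a fixed error of 1.0 per step."""
    return [x + dt + 1.0 for x in state]


print(homogeneous_rollouts(72, (6, 12)))               # [(6, 12), (12, 6)]
print(combined_forecast([0.0], toy_step, 72, (6, 12)))  # [81.0]
```

In the toy example, the 6-hour rollout accumulates 12 units of error and the 12-hour rollout accumulates 6, so the average lands between the two. With a real model the individual rollouts make different errors, and combining them is what the article credits for the gains at long lead times.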



Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
