Meet ClimSim: A Groundbreaking Multi-Scale Climate Simulation Dataset for Merging Machine Learning and Physics in Climate Research

Projections from numerical physical simulations are the main source of information used to guide climate change policy. Even though they push the limits of the most powerful supercomputers, existing climate simulators still struggle to simulate the physics of clouds and heavy precipitation. The complexity of the Earth system severely limits the spatial resolution that can be used in these simulations. “Parameterizations” are empirical mathematical representations of physics occurring on scales smaller than the temporal and spatial resolution of climate simulations. Unfortunately, the assumptions built into these parameterizations frequently introduce errors that can degrade projections of the future climate.

Machine learning (ML) is a compelling method for emulating complicated nonlinear sub-resolution physics, that is, processes taking place on scales smaller than the resolution of the climate simulator, at a lower computational cost. The appeal of this application is that it could lead to climate simulations that are both more accurate and less expensive than today's. The smallest resolvable scale of current climate simulations is typically 80–200 km, or roughly the size of an average U.S. county. However, a resolution of 100 m or finer is needed to describe cloud formation effectively, which would require an increase in computing power of several orders of magnitude.
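To get a feel for the size of that gap, here is a back-of-the-envelope sketch. It is not taken from the ClimSim paper: the function name `refinement_cost_factor` and the scaling rule are assumptions made for illustration. The rule assumes that cost grows with the number of horizontal grid columns (the refinement ratio squared) and, optionally, with the number of time steps (one more factor of the ratio, a common CFL-style rule of thumb). Vertical levels, I/O, and everything else are ignored.

```python
def refinement_cost_factor(coarse_m: float, fine_m: float,
                           refine_time: bool = True) -> float:
    """Rough multiplier on compute when horizontal grid spacing shrinks.

    Assumption (illustrative only): cost scales with the number of
    horizontal grid columns (ratio squared) and, if refine_time is True,
    with the number of time steps (one extra factor of the ratio).
    """
    ratio = coarse_m / fine_m          # how many times finer the new grid is
    factor = ratio ** 2                # more columns in both horizontal directions
    if refine_time:
        factor *= ratio                # shorter time step to stay stable
    return factor


# The 80-200 km range quoted above, refined to 100 m.
for coarse_km in (80, 200):
    factor = refinement_cost_factor(coarse_km * 1000, 100)
    print(f"{coarse_km} km -> 100 m: roughly {factor:.1e}x more work")
```

Even with `refine_time=False`, the multiplier is in the hundreds of thousands to millions under this rule, which is why running a global model at cloud-resolving resolution is usually out of reach for long simulations.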

Using ML to work around these computational constraints remains a viable option. The resulting hybrid-ML climate simulators blend ML emulators of the macro-scale effects of small-scale physics with conventional numerical methods for solving the equations that govern the large-scale fluid motions of Earth’s atmosphere. Rather than depending on heuristic assumptions about these small-scale processes, the emulators learn directly from data produced by short-duration, high-resolution simulations. In essence, this is a regression problem: given large-scale resolved inputs, an ML parameterization emulator inside the climate simulation returns the large-scale outputs (such as changes in wind, moisture, or temperature) that arise from unresolved small-scale (sub-resolution) physics.
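The regression framing can be sketched in a few lines of NumPy. Everything here is an assumption for illustration: the data are synthetic random numbers rather than ClimSim variables, the array sizes are arbitrary, and the "emulator" is a plain least-squares linear fit, which is only a simple baseline and not the method used by the ClimSim authors. The helper names `fit_linear_emulator`, `predict`, and `r2_per_output` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT ClimSim): one row per atmospheric column.
n_samples, n_inputs, n_outputs = 2000, 8, 3
X = rng.normal(size=(n_samples, n_inputs))        # large-scale resolved inputs
true_W = rng.normal(size=(n_inputs, n_outputs))   # hypothetical hidden mapping
# Nonlinear "sub-resolution" response plus a little noise.
Y = np.tanh(X @ true_W) + 0.05 * rng.normal(size=(n_samples, n_outputs))

# Hold out the last 400 samples for evaluation.
X_train, X_test = X[:1600], X[1600:]
Y_train, Y_test = Y[:1600], Y[1600:]


def fit_linear_emulator(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ coef (linear baseline with bias)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return coef


def predict(coef, X):
    """Apply the fitted linear emulator to new inputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ coef


def r2_per_output(y_true, y_pred):
    """Coefficient of determination, computed separately for each output."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


coef = fit_linear_emulator(X_train, Y_train)
Y_pred = predict(coef, X_test)
print("held-out R^2 per output:", r2_per_output(Y_test, Y_pred))
```

A linear fit cannot capture the nonlinearity in the synthetic targets, so its held-out score leaves room for improvement. That gap is where more expressive models, such as neural networks, would come in for the real problem.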

Although several proofs of concept have been developed recently, hybrid-ML climate simulations have yet to be operationally deployed. One of the main obstacles keeping the ML community from engaging with the problem is obtaining enough training data. To work with downstream hybrid ML-climate simulations, this data must include all macro-scale factors that control the behavior of sub-resolution physics. Obtaining it from uniformly high-resolution simulations has proven very costly and can cause problems when the resulting emulator is coupled to a host climate simulation. Using multi-scale climate simulation techniques to produce training data is a promising alternative. Most importantly, these techniques offer a clear interface between the planetary-scale dynamics of the host climate simulator and the emulated high-resolution physics. In theory, this makes downstream hybrid coupled simulation tractable and accessible. In practice, the application of multi-scale approaches has been hindered by the scarcity of available datasets, the complexity of operational simulation code, and the domain expertise required to select variables.

For use in hybrid-ML climate simulations, a research team from more than 20 eminent research institutions presents ClimSim, the largest and most physically complete dataset for training machine learning emulators of atmospheric storms, clouds, turbulence, rainfall, and radiation. ClimSim is a comprehensive set of inputs and outputs from multi-scale physical climate simulations. It was created by climate simulator developers and atmospheric scientists to lower the barriers to entry for ML specialists working on this important problem. The benchmark dataset provides a solid basis for building robust frameworks that model cloud and severe rainfall physics parameterizations and how they interact with other sub-resolution phenomena. By facilitating online coupling inside the host coarse-resolution climate simulator, these frameworks can improve the accuracy and overall performance of climate simulators used for long-term projections.


Check out the Paper and Project. All credit for this research goes to the researchers of this project.


Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
