Salesforce AI Developed FSNet (Fast and Slow Learning Network) for Deep Time-Series Forecasting, Which Can Learn Deep Forecasting Models on the Fly in a Nonstationary Environment

Time series forecasting, the task of predicting future values from historical data, is crucial to many real-world problems, ranging from weather forecasting and anomaly detection to energy usage, system tracking, and monitoring. Thanks to recent technological advances that have increased access to data and computational power, deep learning (DL) is gradually replacing traditional methods for time series forecasting. In contrast to conventional forecasting techniques, DL models can learn hierarchical representations and more intricate dependencies, reducing the need for manual feature engineering and model design.

However, despite their recent success, DL approaches to time series forecasting remain hard to scale to the many real-world applications in which live time series arrive sequentially. This is because the forecasting model must update itself promptly to cope with concept drift. DL models that follow the classic batch learning paradigm are notoriously difficult to train on the fly, because they adapt poorly to non-stationary settings and struggle to retain previously learned information. Incorporating new training examples necessitates retraining on the entire dataset. According to the researchers, a successful solution must handle both new and recurring patterns well.

Online learning, in contrast to standard offline learning paradigms, is designed to learn models incrementally from data that arrives sequentially. When new training data is received, models can be updated quickly and efficiently, overcoming the limitations of traditional batch learning. However, simply making minor adjustments to the optimizers of deep forecasting models is not enough to accommodate online updates. Two main issues arise: slow convergence and ineffective pattern learning.
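To make the online setting concrete, here is a minimal sketch of a forecaster that sees one observation at a time, predicts it, and then takes a single gradient step. It is not FSNet: it uses a plain linear model, and the function name `online_sgd_forecast` and the parameters `lookback` and `lr` are assumptions chosen for illustration.

```python
import math

def online_sgd_forecast(series, lookback=3, lr=0.01):
    """Predict each value from the previous `lookback` values, then update.

    Illustrative linear forecaster (an assumption, not the FSNet model).
    Returns the final weights, the bias, and the squared error at each step.
    """
    weights = [0.0] * lookback
    bias = 0.0
    sq_errors = []
    for t in range(lookback, len(series)):
        window = series[t - lookback:t]
        # Forecast first, using only data that has already arrived.
        pred = bias + sum(w * x for w, x in zip(weights, window))
        err = pred - series[t]
        sq_errors.append(err * err)
        # Then take one SGD step on the squared error of this single sample.
        weights = [w - lr * 2.0 * err * x for w, x in zip(weights, window)]
        bias -= lr * 2.0 * err
    return weights, bias, sq_errors

# Toy stream: a sine wave observed one point at a time.
stream = [math.sin(0.3 * t) for t in range(600)]
weights, bias, sq_errors = online_sgd_forecast(stream, lookback=3, lr=0.05)
print("mean squared error, first 50 steps:", sum(sq_errors[:50]) / 50)
print("mean squared error, last 50 steps:", sum(sq_errors[-50:]) / 50)
```

Each sample is used for exactly one update and there are no mini-batches or repeated epochs, which is why a deep model trained this way tends to converge slowly.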

Because concept drift occurs, and a complex model needs many more training samples to pick up new concepts, deep neural networks need a mechanism that enables successful learning on data streams. Without offline training benefits such as mini-batches or training for several epochs, deep neural networks converge slowly on data streams. Time series data also frequently shows recurring patterns, where a pattern can disappear and reappear later. However, deep networks struggle to preserve earlier knowledge because of the catastrophic forgetting phenomenon. This leads to ineffective pattern learning, which further reduces performance.

Fast and Slow Learning Network (FSNet) is a framework that Salesforce Research created to address the problems that arise in online forecasting. The framework, which is based on the Complementary Learning Systems (CLS) theory, enhances a slowly-learned backbone by dynamically balancing quick adaptation to recent changes with retrieval of related old knowledge. As a result, FSNet exhibits promising performance across numerous datasets. It can handle streaming data, forecast on live time series, and adjust to both recurring and changing patterns.

One of the main problems with deep neural networks, catastrophic forgetting, is addressed by FSNet. It does so through the interaction of two complementary components: an adapter that tracks how much each layer contributes to the loss, and an associative memory that supports memorizing, updating, and recalling recurring events.

Thanks to a per-layer adapter that models the temporal information between consecutive samples, each intermediate layer can adjust itself more effectively with fewer data samples, particularly when concept drift occurs. To support quick learning when recurring events are encountered, the adapter interacts with its memory to retrieve and update previous actions. The associative memory stores the important, recurring patterns seen during training, allowing FSNet to draw on previously learned information. It is also noteworthy that FSNet focuses on improving learning on current data rather than on explicitly detecting concept drifts.

When FSNet encounters new data points, fast learning refers to the ability of the whole module (adapter plus memory) to immediately produce the update rule for the base model's parameters. Slow learning, in contrast, refers to updating a standard neural network one sample at a time, which causes it to converge slowly. Recent research has demonstrated the shallow-to-deep principle: shallower networks can learn effectively from less data and adapt more quickly to changes in the data. In such settings it is therefore better to start with a shallow network and progressively increase its depth.

The team monitors and adapts each layer individually so that it can better learn from the current loss. Because time series data is noisy and nonstationary, the gradient of a single sample can fluctuate dramatically during online training, which introduces noise into the adaptation coefficients. The team therefore uses an Exponential Moving Average (EMA) of the backbone's gradient to smooth out the noise of online training and capture the temporal information in the time series.
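The smoothing effect of an EMA can be seen in a few lines. The sketch below is only an illustration of the general EMA recurrence applied to a gradient vector; the function name `ema_update`, the coefficient `gamma`, and the toy gradient stream are assumptions, not values taken from the FSNet code.

```python
def ema_update(smoothed, grad, gamma=0.9):
    """Blend the previous smoothed gradient with the newest per-sample gradient.

    `gamma` close to 1 means heavier smoothing (an illustrative default).
    """
    return [gamma * s + (1.0 - gamma) * g for s, g in zip(smoothed, grad)]

# Toy stream of one-dimensional gradients that jump between 2.0 and 0.0,
# standing in for noisy single-sample gradients whose average is 1.0.
raw_grads = [[2.0], [0.0]] * 100

smoothed = [0.0]
for grad in raw_grads:
    smoothed = ema_update(smoothed, grad)

print("last raw gradient:", raw_grads[-1][0])
print("smoothed gradient:", smoothed[0])
```

While the raw gradient keeps jumping between its two extremes, the smoothed value settles near the average, which gives a steadier signal from which to derive adaptation coefficients.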

Because time series data frequently shows old patterns recurring, it is also important to reuse past actions to improve learning. This is where the associative memory comes in. Interacting with the memory at every step would be costly and susceptible to noise, so the interaction is triggered only when a significant representation change occurs. FSNet then retrieves the relevant meta-information, which captures how the model previously adapted to a pattern that may reappear in the future.
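One way to picture a sparsely triggered memory is sketched below. It is a hypothetical illustration, not the FSNet implementation: the trigger compares a short-term and a long-term summary of recent gradients using cosine similarity, and the read step blends the best-matching stored rows with softmax weights. The names `should_query_memory` and `read_memory`, the threshold `tau`, and the top-`k` read are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def should_query_memory(short_ema, long_ema, tau=0.7):
    """Hypothetical trigger: fire only when the short-term and long-term
    summaries point in clearly different directions."""
    return cosine(short_ema, long_ema) < -tau

def read_memory(memory, query, k=2):
    """Return a softmax-weighted blend of the k stored rows that best match `query`."""
    scores = [sum(m * q for m, q in zip(row, query)) for row in memory]
    top = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]
    recalled = [
        sum(w * memory[i][d] for w, i in zip(weights, top))
        for d in range(len(query))
    ]
    return recalled, top, weights

# Toy memory with three stored patterns and a query that resembles the first.
memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
if should_query_memory([1.0, 0.0], [-1.0, 0.1]):
    recalled, top, weights = read_memory(memory, [1.0, 0.0], k=2)
    print("rows used:", top)
    print("recalled pattern:", recalled)
```

Gating the read behind a trigger keeps the memory out of the loop for most steps, which matches the article's point that querying it at every step would be costly and noisy.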

On both synthetic and real-world datasets, FSNet significantly outperforms conventional baselines. It can handle different types of concept drift and converges faster while maintaining higher quality. The researchers found that the loss curves of most datasets show sharp peaks, which suggests that concept drifts occur in most of them. Most of these drifts occur in the first 40% of the data, during the early stages of learning, so evaluating the model only on the final data segment (as is done in standard batch training) is overly optimistic. Experiments on the ECL and Traffic datasets were more challenging because these datasets contain missing values and their values can vary significantly within and across dimensions. This finding highlights the difficulties of online time-series forecasting, and addressing them could improve the method further.
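The gap between the two evaluation styles is easy to reproduce with made-up numbers. In the sketch below, the per-step squared errors are invented purely for illustration (large errors in the first 40% of the stream, small ones afterwards), and the helper names `cumulative_mse` and `final_segment_mse` are assumptions.

```python
def cumulative_mse(sq_errors):
    """Average squared error over the whole stream (online-style evaluation)."""
    return sum(sq_errors) / len(sq_errors)

def final_segment_mse(sq_errors, fraction=0.25):
    """Average squared error over only the last `fraction` of the stream."""
    start = int(len(sq_errors) * (1 - fraction))
    tail = sq_errors[start:]
    return sum(tail) / len(tail)

# Invented errors: drift-heavy early phase, calmer late phase.
sq_errors = [4.0] * 40 + [1.0] * 60

print("whole stream:", cumulative_mse(sq_errors))
print("final segment only:", final_segment_mse(sq_errors))
```

Scoring only the tail hides the cost of the early drifts, whereas the cumulative figure charges the model for every step at which it had to adapt.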

Integrating online learning with deep learning is a promising yet challenging problem for time series forecasting. FSNet augments a neural network backbone with two key components: an adapter and an associative memory. This lets FSNet overcome limitations of deep neural networks such as slow convergence on data streams and catastrophic forgetting. Time series forecasting is expected to become increasingly important, so the FSNet research could have a significant impact on both machine learning and the study of human learning.

Future ML systems may be built along the lines of FSNet, which combines a deep neural network with an associative memory and an adapter. The design of FSNet was inspired by the Complementary Learning Systems (CLS) theory, a neuroscience framework for continual learning in humans. The FSNet research might also provide inspiration and insights in the opposite direction, informing theories of how people learn. The code behind the framework is available in the project's GitHub repository.

This article is written as a research summary by Marktechpost Staff based on the research paper 'Learning Fast and Slow for Online Time Series Forecasting'. All credit for this research goes to the researchers on this project. Check out the paper, GitHub and reference article.

Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about Machine Learning, Natural Language Processing and Web Development. She enjoys learning more about the technical field by participating in several challenges.