Compute, data, and algorithmic advances are the three essential components that drive progress in modern Machine Learning (ML). The article looks at trends in the most easily quantifiable of the three: compute. Before 2010, training compute grew in lockstep with Moore’s law, doubling roughly every two years. Since the early 2010s, when Deep Learning took off, growth in training compute has accelerated, doubling roughly every six months. In late 2015, a new trend of large-scale models emerged. Based on these observations, the history of compute in ML is divided into three eras: the Pre-Deep Learning Era, the Deep Learning Era, and the Large-Scale Era. The article summarises the fast-growing compute requirements for training advanced ML systems.
The analysis draws on a dataset of 123 milestone ML systems, each annotated with the compute it took to train. Before Deep Learning took off, training compute grew slowly. Growth accelerated around 2010 and has not slowed since. Separately, in 2015 and 2016 a new trend of large-scale models emerged, growing at a comparable rate but sitting two orders of magnitude above the preceding trend.
Transition to Deep Learning
Two different trend regimes are observed before and after the advent of Deep Learning. Before it, the amount of compute needed to train ML systems doubled every 17 to 29 months. After it, the overall trend accelerates, doubling every 4 to 9 months. The trend in the Pre-Deep Learning Era roughly matches Moore’s law, according to which transistor density doubles every two years (Moore, 1965), frequently simplified to computing performance doubling every two years. It is unclear exactly when the Deep Learning Era begins: the transition from the Pre-Deep Learning Era to the Deep Learning Era shows no discernible discontinuity. Furthermore, the results barely change whether the Deep Learning Era is taken to start in 2010 or in 2012.
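To make the doubling times above easier to compare, they can be converted into a yearly growth factor. The following is a minimal sketch, assuming growth is a clean exponential with a constant doubling time; the function names are illustrative and do not come from the article.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which compute multiplies in 12 months, given a doubling time in months."""
    return 2.0 ** (12.0 / doubling_months)


def orders_of_magnitude(doubling_months: float, years: float) -> float:
    """Base-10 orders of magnitude gained over `years` at a constant doubling time."""
    doublings = years * 12.0 / doubling_months
    return doublings * math.log10(2.0)


if __name__ == "__main__":
    # Doubling-time ranges quoted in the text (months); labels are for display only.
    regimes = [
        ("Pre-Deep Learning, slow end", 29),
        ("Pre-Deep Learning, fast end", 17),
        ("Deep Learning, slow end", 9),
        ("Deep Learning, fast end", 4),
    ]
    for label, months in regimes:
        factor = annual_growth_factor(months)
        print(f"{label}: doubling every {months} months = x{factor:.2f} per year")
```

Under this assumption, a 6-month doubling time corresponds to a fourfold increase per year, while a 24-month doubling time corresponds to roughly a 1.41-fold increase per year.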
Trends in the Large-Scale Era
According to the data, a new trend of large-scale models began in 2015-2016 (see Figure 3). This trend started in late 2015 with AlphaGo and continues to the present. These large-scale models were trained by large firms, whose bigger training budgets presumably allowed them to break the prior trend.
Separately, the trend in regular-scale models remained unaffected. It is continuous and has the same slope before and after 2016, doubling every 5 to 6 months, as seen in Table 4.4. The growth of compute in large-scale models appears to be slower, doubling every 9 to 10 months. Since there is limited data on these models, the apparent slowdown could be noise. The findings contrast with Amodei & Hernandez (2018), who found a 3.4-month doubling period between 2012 and 2018, and Lyzhov (2021), who found a doubling period of more than 2 years between 2018 and 2020. Because the large-scale trend had only recently emerged, those earlier analyses could not distinguish between the two separate trends.
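A doubling time of this kind can be estimated by fitting a straight line to the logarithm of training compute against time. The sketch below shows the idea with an ordinary least-squares fit in plain Python. It is an illustration only, not necessarily the procedure the authors used, and the data points are synthetic values generated for the example, not entries from the 123-system dataset.

```python
import math


def fit_doubling_time(years, flops):
    """Least-squares fit of log2(flops) against year; returns the doubling time in months."""
    n = len(years)
    log_flops = [math.log2(f) for f in flops]
    mean_x = sum(years) / n
    mean_y = sum(log_flops) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_flops))
    slope = sxy / sxx  # doublings per year
    return 12.0 / slope


# Synthetic example (assumption): compute that doubles exactly every 6 months,
# i.e. 2 doublings per year, starting from an arbitrary 1e15 FLOP in 2010.
synthetic_years = [2010 + 0.5 * i for i in range(9)]
synthetic_flops = [1e15 * 2.0 ** (2 * (y - 2010)) for y in synthetic_years]

if __name__ == "__main__":
    months = fit_doubling_time(synthetic_years, synthetic_flops)
    print(f"Estimated doubling time: {months:.2f} months")
```

With only a handful of points, as with the large-scale models, the fitted slope is sensitive to individual systems, which is one reason an apparent slowdown may be noise.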
The findings are consistent with earlier research, though they show a more moderate scaling of training compute. There is an 18-month doubling time between 1952 and 2010, a 6-month doubling time between 2010 and 2022, and a new trend of large-scale models between late 2015 and 2022, which started 2 to 3 orders of magnitude above the previous trend and has a 10-month doubling time. To summarise, compute grew slowly before the Deep Learning Era. With the shift into the Deep Learning Era around 2010, growth accelerated. In late 2015, firms began producing large-scale models that exceeded the trend, such as AlphaGo, signalling the start of the Large-Scale Era. However, the distinction between large-scale and regular-scale models in this framing of the trends is less certain. The growing trend in training compute underscores the strategic importance of hardware infrastructure and engineers. Access to enormous compute budgets or computing clusters, and the expertise to use them, has become synonymous with cutting-edge ML research.