This article sets out the journey of Artificial Intelligence (AI) and its interrelationship with the arrival of the “era of Big Data” alongside 3G and 4G telecoms networks. It explores how we arrived at where we are now, and where we are heading next: an era of even bigger, albeit increasingly decentralised, data as AI meets the IoT (AIoT) and standalone 5G networks arrive over the next few years.
Transformational change is set to occur at a faster pace than ever before in human history as we advance through the 2020s.
The previous decade has been one in which AI technology’s most profound impact has been within the realms of digital media, social media and e-commerce.
During the rest of this decade AI will extend its reach into the rest of the economy (the “real-world” sectors), scaling across healthcare, financial services, transportation, education, energy, telecoms and agriculture, and continuing its advance into cybersecurity.
It will also change where data is generated, with the rise of the Edge of the network and the IoT.
Furthermore, there will need to be a change in the manner in which AI technology is applied, driven by the needs of the use case. A crucial difference from the digital media and e-commerce world is that edge cases really matter in the “real-world” sectors. If something goes wrong with an AI-based recommendation algorithm on a social media or e-commerce application, such as an incorrect fashion item being recommended, the customer may endure a bad user experience, but there is no resulting liability claim for personal injury or death.
However, in the “real-world” sectors there is the potential for material damage. For example, with autonomous vehicles (a car turns left instead of right and crashes) or healthcare applications (an incorrect diagnosis of cancer) there may be physical and financial damages resulting from an error, and with financial services there may be substantial financial losses too (incorrect portfolio risk management trades executed in large volume). Hence explainability (model interpretability) becomes key, as we need to understand why an AI model made a particular decision in order to prevent mistakes from recurring. So does causal inference, as we need our autonomous machines to have common sense and to infer the causation resulting from particular actions.
As we seek to extend AI into the “real-world” and achieve digital and physical convergence, we also face challenges in relation to data and also the manner of application of AI technology towards those sectors relating to model interpretability, causal reasoning, ethics (bias and diversity), quality and availability of data, plus making Machine Learning models, in particular Deep Learning models, more compact (neural compression) and enabling them to learn from smaller sets of data including the ability to dynamically respond to their environments.
This will not happen overnight and will take time, albeit the author is optimistic that we are entering an era of research and development that will yield exciting results over the course of the next three years enabling AI to scale across the economy and overlapping with the development of standalone 5G networks.
Even in the era of hype about AI that prevailed around 2018, the chart from KPMG below reminds us that the broader Intelligent Automation market remains in its relatively early stages and is set for rapid growth from 2022 to 2025, a period that coincidentally overlaps with the period in which we should also experience growth in standalone 5G networks (standalone being a keyword here, explained further below).
A key point in scaling AI startups and internal corporate projects is access to quality data and extracting value from the data.
What is AI and why does data strategy matter for a Machine Learning Project?
BCG noted that Data Scientists need to feed machines plenty of data in order to properly weight the many connections and correlations that ultimately create an algorithm “whose intelligence is limited to that specific realm of data”, which explains why AI is often described as data-hungry.
The key role of labelled data becomes clear when we consider the types of Machine Learning and the fact that the majority of Machine Learning projects in the production environment today entail Supervised Learning, as explained below.
The Types of Machine Learning:
Supervised Learning: the data is labelled and the algorithm learns to predict the label, for example distinguishing a cat from a dog based upon the annotated ground-truth samples in the training dataset.
Unsupervised Learning: the algorithm discovers patterns hidden in data that is not labelled (annotated). An example is segmenting customers into different clusters.
Semi-Supervised Learning: the algorithm learns from data in which only a small fraction of the examples are labelled.
Reinforcement Learning (RL): an area that deals with modelling agents in an environment that continuously rewards the agent for making the right decision. An example is an agent playing chess against a human: the agent is rewarded when it makes a good move and penalised when it makes a bad one. Once trained, the agent can compete with a human in a real match.
The most prevalent approach in Machine Learning production is to use Supervised Learning and for this we require large amounts of labelled data.
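To make the distinction concrete, here is a minimal sketch (not from the article; it assumes scikit-learn is available and uses the classic Iris dataset purely for illustration) contrasting supervised learning, where labels are provided, with unsupervised clustering, where they are not:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)  # y holds the "ground truth" labels

# Supervised learning: the algorithm learns to predict the label y.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Unsupervised learning: labels are withheld; the algorithm groups
# similar samples into clusters on its own.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

The supervised model can be scored against held-out labels, whereas the clusters have no inherent meaning and must be interpreted by a human afterwards, which is exactly why labelled data is so valuable.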
This points us towards three questions: why has there been an explosion in data, in which sectors is that data located, and why have the corresponding advances in AI occurred over the same period?
Why has there been this explosion in data and how did AI advance?
The drivers for this resurgence of AI have been the following:
- The dramatic growth in data – fuelled by the rise of the internet and in particular over the past decade the rise of smart mobile apps enabled by 4G;
- Greater availability and reduced cost of storage (e.g. the rise of the cloud) for example Statista note that “In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.”
- The potential of Graphical Processing Units (GPUs) to enable parallel processing and speed up Deep Neural Networks
- Advances in the architecture of, and approaches to, Deep Neural Networks, such as dropout and other network regularisation techniques that mitigated the problem of overfitting (whereby a Deep Neural Network performs strongly on the training data but fails to generalise, i.e. to perform adequately on unseen test data, known as out-of-sample performance)
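As an illustration of one such regularisation technique, here is a minimal NumPy sketch (a toy example, not production code) of “inverted” dropout, which randomly zeroes a fraction of activations during training and leaves them untouched at inference:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: randomly zero a fraction p of activations
    during training, scaling the survivors by 1/(1-p) so the
    expected activation is unchanged at inference time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

a = np.ones((4, 8))
train_out = dropout(a, p=0.5, training=True)   # some units zeroed, rest scaled to 2.0
eval_out = dropout(a, p=0.5, training=False)   # returned unchanged
```

Because survivors are scaled by 1/(1-p), the expected activation is the same in training and inference, so no rescaling is needed at test time.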
The dramatic growth in data is illustrated by Statista in the chart below:
We’ve gone from approximately 2 zettabytes in 2012 to 64.2 zettabytes in 2020! This has mostly been driven by the rise of mobile apps empowered by 4G.
Over the next five years, up to 2025, global data creation is projected to grow to more than 180 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic as more people worked and learned from home and used home entertainment options more often.
As Oberlo observe, the majority of data is being created from smartphones:
Mobile Data In The Era of 3G
This was then followed by 4G, during which data continued to rise rapidly, albeit at a slower growth rate from a higher base, fuelled in part by the continued arrival of digital media and social media applications, for example the YouTube mobile app launched in 2012.
Mobile Data In The Era of 4G
VisualCapitalist in collaboration with Racounteur illustrated the sheer volume of data generated in 2020 as shown below:
In addition to the digital media and e-commerce sectors, energy, financial trading, astrophysics research and pharmaceutical drug discovery have tended to be other sectors where large data sets exist. However, it is in the social media and e-commerce sectors where the most rapid data growth has occurred. Hence it should be no surprise that the Tech majors are also home to the strongest AI capabilities in the world.
The era of AI alongside 4G led to the rise of social media and e-commerce giants such as Facebook, Twitter, Amazon and Microsoft (LinkedIn), and enabled Google (YouTube) and ByteDance (TikTok) to amass formidable growth and market positions.
Lessons for non-Tech sector (legacy firms)
However, data is not so valuable in raw form: just as crude oil needs to be refined to be truly useful, data must be pre-processed (refined, via data engineering) before it is useful from a business perspective.
Firms need to understand that building data ingestion and pre-processing pipelines is a key investment in enabling Machine Learning capabilities. Without a robust and effective data pipeline, the Machine Learning part simply won’t work effectively.
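As a minimal sketch of what such a pipeline might look like (the column names and data here are hypothetical, and it assumes pandas and scikit-learn are available), raw records with missing values are imputed and scaled before any model ever sees them:

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with missing values and mixed scales.
raw = pd.DataFrame({
    "age": [34, None, 45, 29],
    "income": [52_000, 61_000, None, 48_000],
})

# A minimal pre-processing pipeline: impute missing values, then scale.
prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

clean = prep.fit_transform(raw)  # no missing values, standardised columns
```

The point is that the pipeline, not the model, encodes the refinement step: the same `prep` object is fitted once and reused, so training and production data are transformed identically.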
The Machine Learning Process
Moreover, the past decade has been one built around increasing importance of Cloud Computing with the AI capabilities residing on the cloud.
The Next Generation of Tech Superstars
The next generation will feature 5G alongside AI and will lead to a new generation of Tech superstars in addition to some of the existing ones.
In future, the variety, volume and velocity of data is likely to increase substantially as we move into the era of 5G and devices at the Edge of the network. The author argues that our experience of AI development alongside the arrival of 3G and then 4G networks will be dramatically overshadowed by the arrival of AI meets 5G and the IoT, leading to the rise of the AIoT, where the Edge of the network will become key for product and service innovation and business growth.
Furthermore, AI technology will be required by the network operators of 5G networks in order to optimise and manage the networks in near-real time operations.
The Cloud has been fundamental in enabling the growth in Data Science by enabling greater availability of storage and also reduced cost of storage too.
Fog Computing is a term coined by Cisco. It entails the concept of extending the cloud closer to where the data is created, using devices referred to as fog nodes that may be deployed anywhere with a network connection: any device that has internet connectivity, storage and computing capability.
As noted by SCC “Simply put, it involves moving your computers closer to the sensors they are talking to.”
SCC provide the example of a modern train that may be equipped with the latest digital equipment, including sensors and other internet-connected devices that allow for constant monitoring of the train’s performance.
However, as the train is moving rapidly, possibly through regions where internet connectivity fluctuates, maintaining a connection with the cloud may prove challenging. Hence, with the train itself acting as the centralised hub, a Fog Node, for the sensor data generated and collected locally, the train becomes the server and removes the need for a constant internet connection with the cloud.
Edge computing may be considered as the processing of data generated by sensors and other internet-connected devices at the edge of the network, closer to where the data is generated. This has the effect of placing computational capabilities towards the Edge (onto the devices and sensors rather than purely on the cloud) and has the beneficial impact of reducing constant traffic flow back and forth across the internet.
Examples of Edge Computing
- An autonomous vehicle is unable to effectively rely upon client-server communications with a remote cloud server that may be a long distance away from it. If the vehicle approaches a traffic light that changes to red then it is highly risky for the vehicle to send a communication to the Cloud server (in effect seeking a response of how it should react) and then wait for a remote cloud server signal to return with the effective response. The same applies with decisions that the vehicle must make to turn left or right including for avoiding pedestrians and other vehicles. Hence the autonomous vehicle requires AI on the Edge (on the vehicle itself);
- Autonomous space robots and spacecraft. Imagine the robot having to wait for a response signal to travel to and then return from planet Earth! Instead, the autonomous robot needs to respond in near-real time to the situations it encounters;
- Healthcare with remote medicine – see examples in section below;
- Our mobile phones will increasingly have AI chips (GPUs) embedded within them over the next few years, resulting in massive increases in computing power and immediate responses.
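A back-of-the-envelope sketch (with illustrative, assumed latency figures rather than measured ones) shows why round trips to a distant cloud server are untenable for a moving vehicle:

```python
def distance_travelled(speed_kmh: float, latency_ms: float) -> float:
    """Metres a vehicle covers while waiting for a decision."""
    speed_m_per_s = speed_kmh * 1000 / 3600  # km/h -> m/s
    return speed_m_per_s * latency_ms / 1000

# Assumed latencies: a ~150 ms cloud round trip over 4G versus
# ~1 ms for inference on the vehicle itself.
cloud_round_trip = distance_travelled(100, 150)  # ≈ 4.2 m travelled at 100 km/h
on_device = distance_travelled(100, 1)           # ≈ 0.03 m travelled at 100 km/h
```

Several metres of blind travel per decision is the difference between braking in time and not, which is why the inference has to live on the Edge.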
Federated Learning with Differential Privacy is a technique that the author believes will be crucial to enabling Machine Learning to scale across areas such as healthcare and financial services. It will be explored in greater detail in the next article in this series. Essentially, it is a Machine Learning technique developed by Google in 2017 that enables collaborative learning from decentralised data whilst preserving data security and privacy. Initiatives also exist with PySyft and NVIDIA.
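At its core the approach rests on federated averaging (FedAvg): each client trains on its own private data and shares only weight updates, which a server combines. A minimal NumPy sketch of the averaging step (toy numbers; differential privacy would additionally add calibrated noise to each update before it is shared):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Federated averaging (FedAvg): combine model weights trained
    locally on each client's private data, weighted by dataset size,
    so the raw data never leaves the device."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients each hold private data and send only weight vectors.
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 100, 200]

global_weights = federated_average(weights, sizes)  # -> [3.5, 4.5]
```

The server learns a global model, yet never sees a single raw training example, which is precisely what makes the technique attractive for healthcare and finance.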
5G Standalone and non-standalone and its impact on tech development
What does the next generation of AI alongside 5G actually mean in terms of product development for the end customer and how will AI develop to respond the needs of the customer?
It should be noted that there is a difference between standalone (SA) and non-standalone (NSA) 5G networks. A major difference is that in NSA the control signalling of the 5G radio network is anchored to the 4G core network.
As implied by the name, 5G NSA operates on a non-standalone basis across existing 4G networks and has been more common (due to reduced cost) in a number of regions, whereas SA provides 5G services with no reliance on 4G network operations.
Investment into 5G
A vast amount of money is required to develop and deploy 5G networks in particular the standalone networks and it will take time for the standalone networks to emerge at scale.
Grandview Associates state that the global 5G infrastructure market was valued at $2.64 billion in 2020 and will hit $4.7 billion in 2021. Furthermore, they forecast that by 2028 it will hit $80.5 billion with a Compound Annual Growth Rate (CAGR) of 49.8% between 2021 and 2028.
Standalone 5G Networks is where the magic will happen
However, the substantial benefits of SA 5G networks, set out in the infographic below, may justify the additional cost of investing in them, albeit some telecom operators may remain somewhat anxious about the costs.
- Faster speeds: 5G is 10x to 100x faster than what you can get with 4G.
- Lower latency: 1 ms to 4 ms for 5G vs 100 ms to 200 ms for 4G.
- Data rate: a 100x improvement for 5G relative to 4G.
- Device density: millimetre-wave 5G will support 1 million connected devices per square km vs 4,000 per square km for 4G.
- IoT device battery life: a 10x improvement with 5G relative to 4G.
The reduced latency and increased device capacity will be of significant importance to the next generation of AI development. One reason is the dramatically lower latency resulting from SA 5G networks that will enable near real-time responses enabling remote medicine and autonomous systems. An example is a surgeon in China who was reported to have successfully conducted remote brain surgery on a Parkinson’s patient 3,000 km (1,864 miles) away using a 5G network!
There will be a dramatic transformation for all sectors of the economy enabled by the pairing of AI technology alongside 5G as we move increasingly onto the Edge of the network (The Edge).
The disruption to healthcare via AI and 5G and cleaner economic growth
The emergence of 5G networks is occurring at a time when our economies and healthcare systems have been deeply scarred by the Covid crisis.
However, 5G and AI at the Edge of the network possess huge potential to assist in both economic growth with GDP and jobs creation as well as the fight against climate change by enabling the fourth industrial revolution with cleaner, more efficient processes.
A report authored by PwC and commissioned by Microsoft argued that AI across four sectors (agriculture, energy, transportation and water) may drive a 4% reduction in greenhouse gas emissions by 2030, whilst also creating in excess of 38 million net new jobs along with $5.2 trillion of GDP growth, also by 2030!
Accenture forecast that 5G may drive the creation of 3 million new jobs in the US along with $500Bn in GDP growth.
Furthermore, AI combined with standalone 5G networks may enable a transformational revolution in healthcare.
Our healthcare systems were already facing substantial challenges around the world: in the OECD countries, ageing populations and increases in complex and costly diseases at a time when national economies and governments are saddled with record levels of debt; and in the non-OECD countries, fast-growing populations often with less availability of healthcare resources (medics, nurses, carers, state-of-the-art hospital facilities).
AI will have a key role to help augment healthcare workers and alleviate costs of healthcare whilst improving patient outcomes as illustrated in the infographic below:
Further examples of AI and the overlap with 5G in healthcare empowering devices and wearables are set out in the infographic below.
Moreover, AR and VR technology, which struggles with the latency of 4G networks, will finally work as intended. This will lead to the rise of holographic calls, hitherto the realm of science fiction movies.
The dramatic increase in device connectivity will in turn enable massive machine to machine communication for IoT devices.
The power of machine-to-machine communication is illustrated by autonomous vehicles that broadcast and communicate with each other even when out of line of sight. For example, the silver car at the front has broken down, but the red car has communicated to the others that there is a stationary vehicle ahead so that they slow down. Machine-to-machine communication will be crucial for successful autonomous driving: a vehicle can inform the others that it is going to slow down and turn left or right, or a vehicle behind can inform those in front that it is about to accelerate and overtake them.
The massive increase in device connectivity is explained by Statista.
For example, Statista forecast that by 2025 we may reach 75 billion internet-connected devices: a staggering nine internet-connected devices per person on the planet!
This in turn will translate into a massive flood of data on a scale far greater than ever before.
Indeed, IDC and Seagate forecast that by 2025 we’ll have 175 zettabytes of data, with 30% of this data consumed in real time!
What does this actually mean? To put it into context, 30% of 175 zettabytes equates to 52.5 zettabytes, and Statista believe that we generated 59 zettabytes of data in 2020. In other words, by 2025 we will be creating almost as much (89%) real-time data alone as the entirety of data we created in 2020!
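The arithmetic behind that comparison can be checked in a couple of lines:

```python
total_2025_zb = 175       # IDC/Seagate forecast for 2025, in zettabytes
realtime_share = 0.30     # fraction consumed in real time
data_2020_zb = 59         # Statista estimate of data created in 2020

realtime_2025_zb = total_2025_zb * realtime_share   # 52.5 ZB
ratio = realtime_2025_zb / data_2020_zb             # ~0.89, i.e. 89%
```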
Earlier in this article we observed that Statista forecast the total volume of data will be higher still, amounting to 180 zettabytes, in part due to the impact of Covid, with more people turning to digital than in prior years.
The following chart from McKinsey illustrates the growth in data traffic for major cities around the world and underscores why we will need SA 5G networks.
We will need Machine Learning and Deep Learning to manage the networks, to make sense of the data and personalise the journey. This is a key reason why I believe that there will not be another AI winter (leaving aside an ongoing AGI winter) this decade.
The wave of new product innovations will be limited merely by our imagination!
However, we also need to consider the following in terms of Machine Learning and Data Science practice:
- A shift away from the giant cloud servers towards hybrid Cloud / Edge computing and Edge computing combined with the Fog;
- The above will lead to a greater prevalence of AI on the Edge (on the device), initially for inference, and hopefully over time for continual learning, enabling models to respond to dynamic environments, including situations where the AI has to learn on the fly when it encounters something that is not in the historic dataset used for training;
- An ability to work with decentralised data whilst respecting data privacy – requiring further developments in Federated Learning with Differential Privacy;
- Diversity within the teams;
- Ethics including the datasets we work with and the manner in which models are implemented;
- Objective standards;
- Aligning Data Science with Business Strategy and Organizational cultures adapting to the digital era!
There is vast potential to use Machine Learning and Data Science to generate economic and shareholder value. The Tech majors and startup sector have been engines of job creation too. Furthermore, there is real potential to use the power of Machine Learning and Data Science in the fight against climate change in particular as the era of 5G arrives in turn enabling smart cities with smart grids, smart homes, Industry 4.0 with greater efficiencies to reduce carbon footprint.
The next article in this series will explore the exciting evolution of the state of the art in Machine Learning and the exciting research in AI that may emerge over the next few years and empower AI at the Edge of the network and the fourth industrial revolution in the era of the AIoT.
For the purposes of this article AI is defined as: the area of developing computing systems which are capable of performing tasks that humans are very good at, for example recognising objects, recognising and making sense of speech, and decision making in a constrained environment.
Machine Learning is defined as: the field of AI that applies statistical methods to enable computer systems to learn from the data towards an end goal.
Neural Networks: are biologically inspired networks that extract abstract features from the data in a hierarchical fashion. They are often referred to as Artificial Neural Networks (ANNs) in Computer and Data Science.
Deep Neural Networks (Deep Learning): refers to the field of neural networks with several hidden layers. Such a neural network is often referred to as a deep neural network.
Note: This is a guest post written by a guest author who is a specialist in the given field.