AI Development Cycle for Edge AI
In the last decade, AI research has produced remarkable results in many fields, and AI is now a central technology in many aspects of our lives. New ideas are proposed every day, and this research feeds a wide range of applications, from algorithms that assist surgeons in complex operations to the one that lets us unlock a phone with just our face. In this evolution from idea to implementation, it is often overlooked how hard the passage from theoretical research to a working application is.
We can refer to this process as the AI Development Cycle for Edge AI. It can be divided into three phases related to 1) data, 2) model, and 3) evaluation.
Many aspects must be considered. First, each AI application requires a specific dataset. The aim of the first phase is therefore to prepare the data, which is well known to be one of the crucial topics of AI: a good algorithm relies on a good dataset. This phase can be divided into data collection, curation, labeling, and preparation.
Once the dataset is ready, the correct model has to be chosen and trained for the specific task. These first two steps are the basis for every AI application, and most research papers stop here. But for Edge AI, i.e., AI on embedded devices, the hard part comes now. Indeed, the trained model could be utterly useless if it cannot run on the target hardware (HW). The first idea to avoid this step could be to use servers and the cloud. Unfortunately, this comes with other problems, such as high cost, network connectivity issues, and the ever-present problem of data privacy.
Thus, the model must be rethought so that it can be embedded in a particular device. The next step should therefore aim to reduce the latency of the model as much as possible (latency being the time taken to process one unit of data, provided only one unit is processed at a time) while keeping accuracy as high as possible. For this purpose, an essential and costly aspect is the search for the appropriate neural network architecture among the thousands that exist. In addition, compression techniques (such as pruning and filter decomposition) must be considered to reduce the computational resources needed by the model. Finally, all these modifications must be tested in a real environment, dealing with edge case data, i.e., scenarios in which the system does not perform as required or as expected.
This whole process is iterative and can take an enormous amount of time.
But time is not the only issue: we should also consider the precision needed in data organization, the difficulty of meeting performance requirements for production, the trouble of finding and setting up optimization methods, and the difficulty of optimizing AI models, to name a few.
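Latency, as defined earlier (the time to process one unit of data when only one unit is processed at a time), can be measured with a few lines of Python. The following is a minimal sketch, not a NetsPresso tool: `measure_latency` and `dummy_model` are hypothetical names, and the dummy function simply stands in for a real inference call on the target device.

```python
import statistics
import time


def measure_latency(run_once, sample, warmup=5, repeats=30):
    """Return per-sample latencies in milliseconds, one sample per call."""
    # Warm-up runs are discarded so that one-off start-up costs do not skew the result.
    for _ in range(warmup):
        run_once(sample)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once(sample)  # exactly one unit of data per measurement
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def dummy_model(x):
    # Placeholder workload (assumption): replace with the real model call.
    return sum(v * v for v in x)


timings = measure_latency(dummy_model, [0.5] * 10_000)
print(f"median: {statistics.median(timings):.3f} ms, max: {max(timings):.3f} ms")
```

Reporting the median alongside the maximum is a common choice because single slow runs are frequent on shared or thermally throttled edge hardware.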
Introducing Nota AI’s NetsPresso
This is where NetsPresso from Nota AI comes in. NetsPresso is an end-to-end AI optimization platform that builds and optimizes AI models for a target HW, reducing development time from months to weeks. NetsPresso requires no expertise: it automatically searches, trains, and optimizes models based on the input dataset and tests their performance on actual HW. It relies on modularized features to minimize overlap with the default pipeline, and it utilizes state-of-the-art technologies for production-level AI models. This leads to a substantial decrease in cost and increased model performance.
Overview of Model Search, Compression Toolkit, Model Packaging
As already described, NetsPresso takes as input the raw data and returns an optimized AI model. To do so, three main sub-applications take care of the different phases: Model Search, Compression Toolkit, and Model Packaging. The first one automatically searches for optimized models for the target device, the second one makes the compression process easy and fast, and the third one converts and benchmarks the model to deploy on devices immediately.
Focus on Model Search and Compression Toolkit
The Beta versions of the first two steps are already available, while for the third we will have to wait until July 2022.
The Model Search toolkit uses a neural architecture search (NAS) technique to select the correct model given a dataset (which can be uploaded or chosen from those offered by NetsPresso) and a target device. In this step, it is possible to choose between various tasks based on your needs (such as detection, classification, and segmentation) and to select a model according to different priorities, such as accuracy, latency, or a trade-off between the two. In addition, it supports three output model formats (TensorRT, TensorFlow Lite, and OpenVINO), which can be selected with awareness of the actual HW on which they will run. More precisely, it supports target hardware for both edge (NVIDIA Jetson and Raspberry Pi) and server (Intel Xeon) deployment. Currently, the search can be performed in the “quick search” mode, which returns a single well-performing model and takes 1 to 3 days. An “advanced search” feature is also in development; it returns multiple models with excellent performance but may take 1 to 2 weeks.
In the next release, planned for July 2022, the list of available target devices will be extended with ARM virtual hardware and NVIDIA T4 for edge and server deployment, respectively.
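To make the three priorities concrete, here is a small sketch of how one could pick a model from a list of already-evaluated candidates. It is not NAS and not NetsPresso's algorithm: the `Candidate` class, the `select_model` function, the numbers, and in particular the trade-off score (normalized accuracy minus normalized latency) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float    # validation accuracy in [0, 1]
    latency_ms: float  # latency measured on the target device


def select_model(candidates, priority):
    """Pick one candidate according to 'accuracy', 'latency', or 'trade-off'."""
    if priority == "accuracy":
        return max(candidates, key=lambda c: c.accuracy)
    if priority == "latency":
        return min(candidates, key=lambda c: c.latency_ms)
    if priority == "trade-off":
        acc = [c.accuracy for c in candidates]
        lat = [c.latency_ms for c in candidates]
        acc_span = (max(acc) - min(acc)) or 1.0  # avoid division by zero
        lat_span = (max(lat) - min(lat)) or 1.0

        def score(c):
            # Hypothetical score: reward accuracy, penalize latency, both scaled to [0, 1].
            return (c.accuracy - min(acc)) / acc_span - (c.latency_ms - min(lat)) / lat_span

        return max(candidates, key=score)
    raise ValueError(f"unknown priority: {priority}")


# Made-up candidates, for illustration only.
candidates = [
    Candidate("small", accuracy=0.70, latency_ms=8.0),
    Candidate("medium", accuracy=0.78, latency_ms=15.0),
    Candidate("large", accuracy=0.80, latency_ms=40.0),
]
for priority in ("accuracy", "latency", "trade-off"):
    print(priority, "->", select_model(candidates, priority).name)
```

With these made-up numbers, the accuracy priority picks the largest model, the latency priority picks the smallest, and the trade-off picks the one in between.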
In addition, two fundamental techniques in Deep Learning, transfer learning and fine-tuning, will be made available. Transfer learning involves taking a model trained on a large dataset and applying it to a new and usually similar dataset. It is mainly used for tasks where the dataset has too little data to train a complete model from scratch and, for this reason, it is becoming the standard approach in many areas where enough data is not available.
Fine-tuning consists of unfreezing part of the previously obtained model and re-training it on the new data with a small learning rate. This can potentially achieve significant improvements by incrementally adapting the pre-trained features to the new data.
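The two steps described above (reusing a pre-trained model on new data, then unfreezing it and re-training with a small learning rate) can be illustrated with a toy model in plain Python. This is only a sketch under strong assumptions: the "model" is a two-weight linear backbone followed by a scalar head, the data is made up, and none of the names come from NetsPresso.

```python
def predict(p, x):
    feat = sum(b * xi for b, xi in zip(p["backbone"], x))
    return p["head"] * feat


def mse(p, data):
    return sum((predict(p, x) - y) ** 2 for x, y in data) / len(data)


def train(p, data, lr, steps, train_backbone):
    """Gradient descent on the mean squared error; the backbone is frozen unless train_backbone is True."""
    backbone, head = list(p["backbone"]), p["head"]
    n = len(data)
    for _ in range(steps):
        g_head, g_backbone = 0.0, [0.0] * len(backbone)
        for x, y in data:
            feat = sum(b * xi for b, xi in zip(backbone, x))
            err = head * feat - y
            g_head += 2 * err * feat / n
            for i, xi in enumerate(x):
                g_backbone[i] += 2 * err * head * xi / n
        head -= lr * g_head
        if train_backbone:
            backbone = [b - lr * g for b, g in zip(backbone, g_backbone)]
    return {"backbone": backbone, "head": head}


# New task (made up): y = 2*x1 + x2. The backbone is assumed to come from an earlier training run.
data = [((1, 0), 2), ((0, 1), 1), ((1, 1), 3), ((2, 1), 5), ((1, 2), 4)]
pretrained = {"backbone": [1.0, 1.0], "head": 0.0}

# Step 1: keep the pre-trained backbone frozen and train only the new head.
stage1 = train(pretrained, data, lr=0.05, steps=100, train_backbone=False)
# Step 2: unfreeze the backbone and continue with a small learning rate.
stage2 = train(stage1, data, lr=0.01, steps=200, train_backbone=True)

print(f"initial: {mse(pretrained, data):.4f}")
print(f"frozen backbone: {mse(stage1, data):.4f}")
print(f"unfrozen, small learning rate: {mse(stage2, data):.4f}")
```

The frozen backbone limits how well the head alone can fit the new task; unfreezing it with a small learning rate lets the pre-trained weights adapt gradually instead of being overwritten.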
The second step, the Compression Toolkit, handles the compression of the model, which can be uploaded or selected from the NetsPresso model zoo. It supports all convolutional neural network (CNN) architectures of the two most popular deep learning frameworks, PyTorch and TensorFlow, thus eliminating the months usually spent implementing the technology.
The toolkit can be used to apply several methods that make the model lighter with minimal information loss. The two families of methods supported are 1) structured pruning and 2) filter decomposition. Briefly, structured pruning removes whole groups of network elements, such as neurons, layers, or channels, while filter decomposition approximates the filters as a combination of smaller matrices and tensors. The available techniques in the first group are index pruning, L2 norm pruning, geometric median pruning, and nuclear norm pruning, while those in the second group are Tucker and CP decomposition.
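As a rough illustration of the two families, the sketch below ranks the filters of one layer by their L2 norm and keeps the strongest ones, then counts the weights of a convolution before and after a Tucker-2 style factorization (a 1x1 convolution, a smaller k x k core, and another 1x1 convolution). It is a simplified sketch, not the toolkit's implementation: the function names, the toy weights, and the chosen ranks are assumptions, and biases are ignored.

```python
import math


def l2_norm(filter_weights):
    return math.sqrt(sum(w * w for w in filter_weights))


def prune_filters_by_l2(filters, keep_ratio):
    """Return the sorted indices of the filters with the largest L2 norm."""
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: l2_norm(filters[i]), reverse=True)
    return sorted(ranked[:n_keep])


def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Weights after a Tucker-2 factorization with ranks r_in and r_out (no bias)."""
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


# Toy layer with four flattened filters (made-up weights).
filters = [
    [0.9, -0.8, 0.7, 0.6],
    [0.01, 0.02, -0.01, 0.0],
    [0.5, 0.4, -0.6, 0.3],
    [0.05, -0.03, 0.02, 0.04],
]
kept = prune_filters_by_l2(filters, keep_ratio=0.5)
print("filters kept:", kept)

before = conv_params(64, 128, 3)
after = tucker2_params(64, 128, 3, r_in=16, r_out=32)
print(f"conv weights: {before} -> {after}")
```

In practice a pruned or decomposed model is usually re-trained for a few epochs to recover the accuracy lost by the approximation.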
Choosing how to perform the correct compression is often not trivial for non-expert users. For this reason, NetsPresso offers two forms of assistance: the first is the Model Visualization feature, which displays the model layers as a graph so that the user can check the model configuration; the second is the Model Profiling feature, which profiles the model layer by layer and provides information that aids decision making regarding the compression configuration.
In addition, if the user still has doubts about the configuration after using these two features, NetsPresso also offers a powerful recommendation system to complete the configuration automatically, along with the option to show the different compressed model versions.
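To give an idea of the kind of information a layer-by-layer profile can provide, the sketch below computes the number of weights and multiply-accumulate operations (MACs) for a few convolution layers and the share of the total compute each one takes. The layer list and the `profile` function are hypothetical and are not the output format of the Model Profiling feature.

```python
# (name, c_in, c_out, kernel size, output height, output width); made-up layers.
layers = [
    ("conv1", 3, 32, 3, 112, 112),
    ("conv2", 32, 64, 3, 56, 56),
    ("conv3", 64, 128, 3, 28, 28),
]


def profile(layers):
    """Return one row per layer with weight count, MACs, and share of total MACs."""
    rows = []
    for name, c_in, c_out, k, h_out, w_out in layers:
        params = c_in * c_out * k * k  # weights, bias ignored
        macs = params * h_out * w_out  # one MAC per weight per output position
        rows.append({"name": name, "params": params, "macs": macs})
    total_macs = sum(r["macs"] for r in rows)
    for r in rows:
        r["mac_share"] = r["macs"] / total_macs
    return rows


rows = profile(layers)
for r in rows:
    print(f'{r["name"]}: {r["params"]} weights, {r["macs"]} MACs ({r["mac_share"]:.1%})')
```

Layers with a large share of the compute are natural first candidates for compression, while layers with few weights and few MACs offer little to gain.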
Different ways to utilize NetsPresso
One of the most valuable features of NetsPresso is its modularity. Using the three toolkits separately or in succession makes it possible to build several pipelines tailored to your needs. A graphical representation of 5 potential use cases is shown in the image below. If you have a dataset, you can use Model Search to search for a model based on it. After this step, you can simply download the model or continue the pipeline. The model can be compressed using the Compression Toolkit (or not, if you don’t need to) and passed to Model Packaging for deployment. But the starting point of this pipeline does not have to be the dataset; you can also start from your own pre-trained model, using NetsPresso for compression or even directly for packaging.
Introducing Nota AI’s Edge Solutions
Through NetsPresso, it is possible to build a customized pipeline to deploy your model on specific HW. But Nota AI also offers ready-to-use Edge solutions which can be directly integrated into your application. The two leading-edge AI-driven solutions they provide are DMS (driver monitoring system) and ITS (intelligent transportation system).
The former can be used to monitor the driver and detect drowsiness and distraction. In addition, it allows customization to process automatic adjustments. It has high real-time performance on low-cost edge devices, such as Cortex-A53 and Ambarella CV25, and is available for various camera types, such as IR, RGB, and hybrid IR-RGB. Last but not least, it supports multiple installation positions in the car (windshield, cluster, and rear-view mirror).
Taking advantage of several indicators, such as eye status, head position, and the number of yawns, it can raise an automatic alarm to warn the driver. It can also be personalized to detect specific driver actions, for example, smoking or making a phone call.
The goal of ITS is to improve the transportation system and make it safer. It can be integrated directly into devices, and its applications are countless. For example, smart cameras can monitor traffic in real time while being very resilient to environmental conditions (weather and sunlight). Traffic lights are another type of device on which AI cameras can be deployed: through reinforcement learning, ITS can optimize vehicle interactions, especially in isolated intersections and road networks. Finally, ITS can also be used for smart crossing, real-time incident prediction, people detection, parking lot monitoring, and many other applications.
About Nota AI
NetsPresso and the two edge solutions we just introduced (ITS and DMS) are developed by Nota AI, a company founded in 2015 and based in South Korea, the United States, and Germany. Its goal is to bring the benefits of AI to everyone and everywhere, democratizing its use across industries and applications. The company maintains partnerships with more than 35 global market leaders (NVIDIA, Microsoft, Samsung, and AWS, to name a few) from a wide range of industries and sectors, such as construction, manufacturing, logistics, and mobility.
The company has raised $14.7 million in a Series B funding round and additional funding from Kakao Investment. The company plans to use this recent investment funding to improve its platform and expand into new markets in Europe and North America. Nota AI will be a gold sponsor at the Embedded Vision Summit 2022 and at the AutoML Conference 2022, where it will present the aforementioned NetsPresso platform and its sub-modules.
Check out NetsPresso Model Search and Compression Toolkit
Thanks to Nota AI for the thought leadership/educational article. Nota AI has supported this content.
Leonardo Tanzi is currently a Ph.D. Student at the Polytechnic University of Turin, Italy. His current research focuses on human-machine methodologies for smart support during complex interventions in the medical domain, using Deep Learning and Augmented Reality for 3D assistance.