This article is based on the research articles 'HPE MACHINE LEARNING DEVELOPMENT SYSTEM' and 'HPE uses blockchain for distributed machine learning models'. All credit for this research goes to the researchers of these papers.
HPE has unveiled two new AI products: one that builds and trains machine learning (ML) models at scale, and a decentralized ML system that lets remote or edge installations exchange model updates.
HPE Machine Learning Development System
The HPE Machine Learning Development System is a hardware and software platform. The HPE Machine Learning Development Environment (MLDS) is integrated with HPE compute infrastructure to deliver a system that can reduce the typical time-to-value for building and training machine learning models from weeks to days. The speedup comes from an integrated solution with pre-configured infrastructure suited to ML model creation, allowing customers to focus on training models rather than configuring infrastructure. The system bundles GPUs with software tools that help ML engineers automatically scale out their workflows.
The core architecture is built on HPE Apollo 6500 Gen10 server nodes with eight Nvidia A100 80GB GPUs and Nvidia Quantum InfiniBand networking. Up to 4TB of RAM and 30TB of NVMe local scratch storage are available on Apollo nodes, with HPE Parallel File System Storage as an option. To control the system, there are additional ProLiant DL325 servers that operate as service nodes and are connected to the enterprise network through an Aruba CX 6300M switch.
The system ships as a four-node configuration but can be expanded further. The software stack runs on Red Hat Enterprise Linux and includes the Machine Learning Development Environment and HPE Performance Cluster Manager for provisioning, managing, and monitoring the server nodes.
Internal tests using customer workloads found that HPE MLDS with 32 GPUs is up to 5.7 times faster at natural language processing than a comparable platform with the same GPUs that didn’t have the HPE-provided optimized interconnect.
It is now available for purchase worldwide.
HPE Swarm Learning
HPE's other AI announcement, HPE Swarm Learning, is a decentralized machine learning framework for edge or distributed sites developed by Hewlett Packard Labs.
Swarm Learning does not feed data back to a centralized location such as a data center, where a master ML model would be updated and the changes redistributed. Instead, a distributed group of nodes shares the updated parameters that each system's ML model has learned while operating.
The centralized strategy can be inefficient and costly if vast amounts of data must be transferred back to the mothership. It may also violate data privacy and ownership laws that limit data sharing. In some instances, moving data from the edge to the core has compliance and GDPR consequences, so relocating everything to one central place is not straightforward.
HPE Swarm Learning, on the other hand, allows models to be trained locally, with the learning from those models being shared among nodes rather than the data. This entails establishing a peer-to-peer network between the various nodes and guaranteeing that model parameters can be safely transferred. The latter is accomplished through blockchain technology, commonly used in cryptocurrency systems to ensure that transactions cannot be tampered with, or that any tampering is instantly apparent.
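HPE has not published Swarm Learning's internals in this article, but the general pattern it describes — peers average locally learned parameters, and a hash chain makes tampering with recorded updates evident — can be sketched in a few lines of Python. This is a hypothetical illustration, not HPE's implementation; the names `merge_parameters` and `ParameterLedger` are invented for this sketch.

```python
import hashlib
import json

def merge_parameters(param_sets):
    """Combine parameter vectors from peer nodes by element-wise averaging."""
    n = len(param_sets)
    return [sum(p[i] for p in param_sets) / n for i in range(len(param_sets[0]))]

class ParameterLedger:
    """Append-only hash chain: each entry commits to the previous entry's
    hash, so altering any recorded update invalidates every later hash."""

    def __init__(self):
        self.entries = []  # list of (params, prev_hash, entry_hash)

    def _hash(self, params, prev_hash):
        payload = json.dumps({"params": params, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def append(self, params):
        prev = self.entries[-1][2] if self.entries else "genesis"
        entry_hash = self._hash(params, prev)
        self.entries.append((params, prev, entry_hash))
        return entry_hash

    def verify(self):
        prev = "genesis"
        for params, recorded_prev, entry_hash in self.entries:
            if recorded_prev != prev or self._hash(params, prev) != entry_hash:
                return False
            prev = entry_hash
        return True

# Three nodes finish a local training round with slightly different weights;
# the merged model is the element-wise mean of their parameters (~[1.0, 0.2]).
local_params = [[0.9, 0.1], [1.1, 0.3], [1.0, 0.2]]
merged = merge_parameters(local_params)

ledger = ParameterLedger()
ledger.append(merged)
assert ledger.verify()

# Rewriting a recorded update without recomputing hashes breaks verification.
ledger.entries[0] = ([9.9, 9.9], ledger.entries[0][1], ledger.entries[0][2])
assert not ledger.verify()
```

In a real deployment the parameter exchange would run over an authenticated peer-to-peer network and the ledger would be replicated across nodes, but the core idea is the same: only model parameters travel between sites, and the chain of hashes makes any after-the-fact modification of shared updates detectable.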
There are a variety of business scenarios in which an ML model may be deployed across several sites, and a simple way to keep all of the models in sync would be beneficial. One such use is fraud detection in financial services, where HPE Swarm Learning, in conjunction with HPE's data analytics platform, spots unusual behavior in credit card transactions. Together, the two technologies can improve accuracy when training ML models on large amounts of financial data drawn from bank branches across a wide region.
Manufacturing is a more typical edge use case, where predictive maintenance using machine learning can prevent unexpected machinery downtime. Swarm learning could increase the system's accuracy by pooling insights from sensor data gathered at different production locations.
Swarm Learning ships as a containerized Swarm Learning Library that can run on Docker within virtual machines and is hardware agnostic. The platform is now available in most countries.