While machine learning development is accelerating, a survey from Algorithmia found that most enterprises spend between 8 and 90 days deploying an ML model. Respondents most often blame the failure to scale, followed by model reproducibility challenges, a lack of executive buy-in, and inadequate tooling.
LinkedIn has recently open-sourced Dagli, a machine learning library for Java and other JVM languages. The library is meant to make it easy to write bug-resistant, understandable, modifiable, maintainable, and deployable model pipelines without incurring technical debt.
A directed acyclic graph (DAG) consists of vertices and edges, with each edge directed from one vertex to another so that no path leads back to the vertex it started from. With Dagli, the model pipeline is represented as a DAG that is used for both training and inference. The Dagli environment guards against the majority of potential logic errors through static typing, near-ubiquitous immutability, and other features of its pipeline definitions.
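To make the idea concrete, here is a minimal sketch of a statically typed, immutable DAG pipeline in plain Java. It does not use Dagli's API; the names `TypedDagSketch`, `Node`, `placeholder`, `map`, `join`, and `buildPipeline` are hypothetical and exist only for illustration. Each node receives its parents when it is constructed and cannot be changed afterwards, so a cycle cannot be formed, and the compiler rejects an edge whose types do not match.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of a typed, immutable DAG pipeline. This is not Dagli's API. */
public final class TypedDagSketch {

    /** A vertex that yields a value of type R for a pipeline input of type I. */
    public interface Node<I, R> {
        R evaluate(I input);
    }

    /** Root vertex: passes the pipeline input through unchanged. */
    public static <I> Node<I, I> placeholder() {
        return input -> input;
    }

    /** Vertex with one parent; the edge parent -> child is fixed at construction. */
    public static <I, A, R> Node<I, R> map(Node<I, A> parent, Function<A, R> fn) {
        return input -> fn.apply(parent.evaluate(input));
    }

    /** Vertex with two parents; joins two branches of the graph. */
    public static <I, A, B, R> Node<I, R> join(
            Node<I, A> left, Node<I, B> right, BiFunction<A, B, R> fn) {
        return input -> fn.apply(left.evaluate(input), right.evaluate(input));
    }

    /** Example graph: text -> (token count, ends with '?') -> label. */
    public static Node<String, String> buildPipeline() {
        Node<String, String> text = placeholder();
        Node<String, Integer> tokenCount = map(text, s -> s.trim().split("\\s+").length);
        Node<String, Boolean> isQuestion = map(text, s -> s.trim().endsWith("?"));
        // map(tokenCount, (String s) -> s.length()) would not compile,
        // because tokenCount yields Integer rather than String.
        return join(tokenCount, isQuestion,
                (count, question) -> (question ? "question" : "statement") + ":" + count);
    }

    public static void main(String[] args) {
        // Prints "question:5"
        System.out.println(buildPipeline().evaluate("Is Dagli a JVM library?"));
    }
}
```

For simplicity, this sketch re-evaluates a shared parent once per child. A real pipeline library would presumably compute each vertex only once per example.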
Jeff Pasternack, a natural language processing research scientist at LinkedIn, says that ML models are generally part of an integrated pipeline, which makes constructing, training, and deploying these pipelines to production more challenging. To accommodate both training and inference, duplicated or extraneous work is often required, producing brittle glue code that complicates the model’s future evolution and maintenance.
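The duplication described above tends to appear when the training code and the serving code each reimplement the same preprocessing. One way to avoid it, sketched below in plain Java, is to define each step once as an untrained object whose training produces the immutable function used at inference time. This is an illustration of the general idea only, not Dagli's API: `TrainOnceSketch`, `Preparable`, `meanCentering`, `then`, and `buildPipeline` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: one definition serves both training and inference. Not Dagli's API. */
public final class TrainOnceSketch {

    /** An untrained step: learns from examples and returns a trained, immutable function. */
    public interface Preparable<I, R> {
        Function<I, R> prepare(List<I> trainingExamples);
    }

    /** Example step: learns the training mean, then centers new values around it. */
    public static Preparable<Double, Double> meanCentering() {
        return examples -> {
            double mean = examples.stream()
                    .mapToDouble(Double::doubleValue)
                    .average()
                    .orElse(0.0);
            return value -> value - mean;
        };
    }

    /** Chains an untrained step with a stateless transformation into one definition. */
    public static <I, A, R> Preparable<I, R> then(Preparable<I, A> first, Function<A, R> second) {
        return examples -> first.prepare(examples).andThen(second);
    }

    /** The whole pipeline, written once. */
    public static Preparable<Double, String> buildPipeline() {
        return then(meanCentering(), d -> d >= 0 ? "above average" : "below average");
    }

    public static void main(String[] args) {
        // Training: the mean of the examples (2.0) is learned here.
        Function<Double, String> model = buildPipeline().prepare(Arrays.asList(1.0, 2.0, 3.0));
        // Inference: the same definition, now trained, is applied to new values.
        System.out.println(model.apply(5.0)); // prints "above average"
        System.out.println(model.apply(0.5)); // prints "below average"
    }
}
```

Because the trained function is derived from the same definition that was trained, there is no separate inference-side copy of the centering logic to keep in sync.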
Dagli works on servers, in Hadoop, from command-line interfaces, in IDEs, and in other typical JVM contexts. Many pipeline components are available out of the box, including neural networks, logistic regression, FastText, gradient boosted decision trees, cross-validation, cross-training, feature selection, data readers, evaluation, and feature transformations.
For experienced data scientists, Dagli offers a path to performant, production-ready AI models that are maintainable and extensible in the long term and can leverage an existing JVM technology stack. For software engineers with less ML experience, Dagli presents an API that can be used with a JVM language and tools they already know, and that is designed to avoid common logic bugs.
The principal aim is to make efficient, production-ready models easy to write, revise, and deploy while avoiding the technical debt and long-term maintenance challenges that usually accompany them. Dagli takes advantage of modern, highly multicore processors and powerful graphics cards to train real-world models efficiently on a single machine.
Dagli's release follows that of the LinkedIn Fairness Toolkit (LiFT), an open-source software library designed to measure fairness in AI and machine learning workflows. Earlier, LinkedIn had also released DeText, an open-source framework for natural language processing-related ranking, classification, and language generation tasks. DeText applies deep neural networks to semantic matching in order to understand member intent in search and recommender systems.