Version control is used to keep track of modifications made to software code. Similarly, when building machine learning (ML) systems, it is essential to track things such as the datasets used to train the model, the hyperparameters and pipeline used, the version of TensorFlow used to create the model, and much more.
The history and lineage of ML artifacts are far more complicated than a simple, linear log. Git can track the code to some extent, but something more is needed to track models, datasets, and other artifacts. The complexity of ML code and its artifacts calls for an approach similar to version control.
Therefore, researchers have introduced Machine Learning Metadata (MLMD), a standalone library that tracks the full lineage of an ML workflow, from data ingestion and preprocessing through validation, training, evaluation, and deployment. MLMD also comes integrated with TensorFlow Extended (TFX).
Beyond versioning a model, ML Metadata captures the full lineage of the training process, including the dataset, hyperparameters, and software dependencies. An ML engineer can use MLMD to trace a faulty model back to its dataset, or to trace from a faulty dataset to every model trained on it. Those working in ML infrastructure can also use MLMD to record the current state of a pipeline and enable event-based orchestration. It also enables optimizations such as memoizing pipeline steps, that is, skipping a step when its inputs and code are unchanged. MLMD can be integrated into the training system so that logs are created automatically and can be queried later. This auto-logging of the full training lineage is the best way to use MLMD, as it holds the complete history without extra effort.
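To make the two ideas above concrete (tracing lineage in both directions, and skipping a step whose inputs and code are unchanged), here is a minimal sketch in plain Python. It is not the MLMD API: the `LineageStore` class, its method names, and the artifact names are all hypothetical and exist only for illustration. A real deployment would use a persistent metadata store instead of in-memory dictionaries.

```python
import hashlib
import json
from collections import defaultdict


class LineageStore:
    """Toy in-memory lineage log (hypothetical; NOT the MLMD API)."""

    def __init__(self):
        self.inputs_of = {}                    # execution id -> input artifact names
        self.outputs_of = {}                   # execution id -> output artifact names
        self.producer_of = {}                  # artifact -> execution id that produced it
        self.consumers_of = defaultdict(list)  # artifact -> execution ids that read it
        self.cache = {}                        # step fingerprint -> execution id

    @staticmethod
    def fingerprint(step, inputs, params, code_version):
        """Hash everything that determines a step's result."""
        payload = json.dumps([step, sorted(inputs), params, code_version], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def record(self, step, inputs, outputs, params, code_version):
        """Log one step run and return (execution_id, was_cached)."""
        key = self.fingerprint(step, inputs, params, code_version)
        if key in self.cache:
            # Same step, inputs, parameters, and code: the step can be skipped.
            return self.cache[key], True
        exec_id = len(self.inputs_of) + 1
        self.inputs_of[exec_id] = list(inputs)
        self.outputs_of[exec_id] = list(outputs)
        for name in inputs:
            self.consumers_of[name].append(exec_id)
        for name in outputs:
            self.producer_of[name] = exec_id
        self.cache[key] = exec_id
        return exec_id, False

    def upstream(self, artifact):
        """Every artifact that `artifact` was derived from."""
        found, stack = set(), [artifact]
        while stack:
            exec_id = self.producer_of.get(stack.pop())
            if exec_id is None:
                continue  # a source artifact with no recorded producer
            for parent in self.inputs_of[exec_id]:
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

    def downstream(self, artifact):
        """Every artifact derived, directly or indirectly, from `artifact`."""
        found, stack = set(), [artifact]
        while stack:
            for exec_id in self.consumers_of.get(stack.pop(), []):
                for child in self.outputs_of[exec_id]:
                    if child not in found:
                        found.add(child)
                        stack.append(child)
        return found


store = LineageStore()
store.record("preprocess", ["raw_data_v1"], ["clean_data_v1"], {"normalize": True}, "abc123")
store.record("train", ["clean_data_v1"], ["model_a"], {"lr": 0.01}, "abc123")
store.record("train", ["clean_data_v1"], ["model_b"], {"lr": 0.1}, "abc123")
# Re-running an identical training step hits the cache instead of being logged again.
_, cached = store.record("train", ["clean_data_v1"], ["model_a"], {"lr": 0.01}, "abc123")

print(sorted(store.upstream("model_a")))        # ['clean_data_v1', 'raw_data_v1']
print(sorted(store.downstream("raw_data_v1")))  # ['clean_data_v1', 'model_a', 'model_b']
print(cached)                                   # True
```

The upstream query answers "which data produced this faulty model?", while the downstream query answers "which models were trained on this faulty dataset?". The fingerprint check is one simple way a pipeline might decide that a step can be skipped.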
MLMD is a crucial foundation for multiple internal MLOps solutions at Google. Furthermore, Google Cloud integrates tools like MLMD into its core MLOps platform:
The foundation of all these services is the ML Metadata Management service in the AI Platform, which allows AI teams to track all the important artifacts and experiments they run, providing a curated ledger of actions and detailed model lineage. This helps users determine the provenance of any AI model they have trained, for debugging, audit, or collaboration. AI Platform Pipelines will track artifacts and lineage automatically, and AI teams can use the ML Metadata service directly for custom workloads and for artifact and metadata tracking.