MLOps Components Review and Machine Learning Platform Selection

March 17, 2023

When an enterprise develops a machine learning strategy, it needs to consider the machine learning platform, MLOps, the machine learning toolset, cloud data platforms, and features for model training, retraining, monitoring, compute, maintenance and deployment. In modern development, MLOps covers most of what enterprise machine learning teams need in order to follow continuous integration (CI), continuous delivery (CD) and continuous training (CT) processes.

In this series, “Tech leaders’ journey to AI using Azure ML platform”, we will cover the machine learning platform selection process from start to finish, along with MLOps best practices and the steps for machine learning model training, deployment and pipeline design in detail. In this first article, I will mostly talk about the MLOps process, the role of the MLOps engineer, best practices, platform selection and Azure Machine Learning components.

An image depicting the 5-part life cycle of MLOps including model creation, model registration, model environment preparation, model deployment and model monitoring.
Figure 1: MLOps life cycle

What Is MLOps?

MLOps (machine learning operations) is a set of practices for machine learning model training and retraining, model registration, monitoring, deployment and scalability, with the goal of bringing production reliability and efficiency through automation pipelines.

MLOps ensures that a model is reproducible, keeps track of model training logs, and documents the model's deployment environment, runtime, deployment-related configuration and monitoring process. Machine learning platforms now produce large numbers of models every day, and it is not practical to run the training, retraining and deployment process manually. To make this process faster, easier and more cost-effective, the industry has adopted MLOps concepts.
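To make the idea of reproducibility and training-log tracking more concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the API of any MLOps platform: the `TrainingRun` class, its fields, `fingerprint` and `hash_dataset` are all hypothetical names chosen for this example. The idea is simply that two runs with the same data, features, hyperparameters and seed should be recognizable as the same experiment.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical record of the inputs needed to reproduce one training run."""
    model_name: str
    hyperparameters: dict
    feature_names: tuple
    dataset_sha256: str  # hash of the training data snapshot
    random_seed: int
    metrics: dict = field(default_factory=dict)  # results, not inputs

    def fingerprint(self) -> str:
        """Stable short ID derived from the run inputs only (metrics excluded)."""
        inputs = asdict(self)
        inputs.pop("metrics")
        payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def hash_dataset(rows) -> str:
    """Hash an iterable of JSON-serializable rows, in order."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


# Example data and runs (made up for illustration).
rows = [{"x": 1.0, "y": 0}, {"x": 2.5, "y": 1}]
data_hash = hash_dataset(rows)

run_a = TrainingRun("churn", {"lr": 0.1, "depth": 3}, ("x",), data_hash, 42, {"auc": 0.81})
# Same inputs, different key order and different metrics: same fingerprint.
run_b = TrainingRun("churn", {"depth": 3, "lr": 0.1}, ("x",), data_hash, 42, {"auc": 0.79})
# Different seed: different fingerprint.
run_c = TrainingRun("churn", {"lr": 0.1, "depth": 3}, ("x",), data_hash, 7)

print(run_a.fingerprint() == run_b.fingerprint())
print(run_a.fingerprint() == run_c.fingerprint())
```

In a real platform this bookkeeping is usually handled by the experiment-tracking and model registry features, so a hand-rolled record like this mainly serves to show which inputs need to be captured.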

What Are the Roles and Responsibilities of the MLOps Engineer?

In our industry we are familiar with the role of the DevOps engineer, who maintains a development platform, implements CI/CD pipelines, ensures collaboration between teams, brings speed and stability to development environments and works at the application platform and infrastructure level. MLOps engineers play the equivalent role for machine learning: they ensure the deployment, monitoring, reproducibility and scalability of machine learning models and maintain the CI/CD/CT pipelines. They deploy the model in a production environment, test the configuration along with the runtime, and ensure that the model is always live and ready to make predictions.

What Are the MLOps Components?

An MLOps platform should have five key components:

  • Model training / retraining: The MLOps platform should have automation pipelines that trigger ML model training and retraining jobs based on custom metrics. The data scientist defines those metrics and the thresholds at which the model should be retrained.
  • Model registration and tracking: The MLOps platform should be able to keep track of model versions, manage model metadata and store model artifacts. When data scientists train a model, they create different versions of it with separate sets of hyperparameters and separate sets of features, based on feature engineering with different sampling rates of the dataset. MLOps needs automated tracking options to ensure reproducibility of the model.
  • Model audit trail: During the development phase, we need to create the model training environment, review model results and ensure that the model is explainable, is trained on balanced classes and excludes any features that are not ethical to use under the ethical best practices of AI model development. We can automate these checks within the MLOps platform using a model audit trail component.
  • Model deployment: Once the model is trained and we are confident in its performance, we need to deploy it within the MLOps ecosystem to complete the loop of MLOps-based automation. We can use an AI model for real-time or batch prediction through a web API, in IoT-based use cases or within a back-end analytics pipeline. MLOps ensures automatic deployment of the model along with model scalability.
  • Model monitoring: Once the model is deployed, we need to observe its health to ensure it is producing relevant predictions. There are multiple ways to monitor a model; the two key ones are model health monitoring and model performance monitoring. For health monitoring, MLOps ensures that model health is always green, the model is always available for prediction, scalability is automatically maintained and configurations are consistent across all pods. For performance monitoring, we need to store all the predicted results along with the input values so that we can analyze the data later, make sure that the model is not producing garbage results and know when to retrain the model.
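The link between performance monitoring and retraining described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PredictionRecord` class, the `rolling_accuracy` and `should_retrain` functions, the window size and the accuracy threshold are all hypothetical, and a real platform would typically read the logged predictions from a data store rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PredictionRecord:
    """One logged prediction: inputs, model output, and the true label once known."""
    features: dict
    predicted: int
    actual: Optional[int] = None  # filled in later, when ground truth arrives


def rolling_accuracy(log: List[PredictionRecord], window: int) -> Optional[float]:
    """Accuracy over the most recent `window` labelled predictions.

    Returns None when fewer than `window` labelled predictions are available.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    labelled = [r for r in log if r.actual is not None][-window:]
    if len(labelled) < window:
        return None
    correct = sum(1 for r in labelled if r.predicted == r.actual)
    return correct / len(labelled)


def should_retrain(log: List[PredictionRecord], window: int, accuracy_threshold: float) -> bool:
    """Signal retraining when rolling accuracy falls below the chosen threshold."""
    accuracy = rolling_accuracy(log, window)
    return accuracy is not None and accuracy < accuracy_threshold


# Example logs (made up for illustration).
healthy = [PredictionRecord({"x": i}, predicted=1, actual=1) for i in range(4)]
degraded = (
    healthy
    + [PredictionRecord({"x": i}, predicted=1, actual=0) for i in range(4, 7)]
    + [PredictionRecord({"x": 7}, predicted=1, actual=1)]
)
unlabelled = [PredictionRecord({"x": 0}, predicted=1)]

print(should_retrain(healthy, window=4, accuracy_threshold=0.75))
print(should_retrain(degraded, window=4, accuracy_threshold=0.75))
print(should_retrain(unlabelled, window=4, accuracy_threshold=0.75))
```

Here the threshold plays the role of the retraining metric that the data scientist defines. Accuracy is used only because it is simple; the same structure would work with any metric computed from stored inputs and predictions.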

Which MLOps Platform Should You Choose and What Are the Key Considerations You Should Follow?

All the major cloud providers offer an MLOps platform within their ecosystem. If you are curious to learn more, check out the MLOps platforms of:

You can also check out other popular MLOps platforms that can be deployed in different cloud environments, like:

In this series we will only cover Azure Machine Learning within the Microsoft Azure cloud. If you are a technical leader setting up your organization's machine learning strategy, the ideal choice is a platform that provides all the MLOps components working side by side, rather than a different tool for each component. A key best practice is to choose an MLOps platform that is close to your cloud and data ecosystem and uses cloud-native tools for the MLOps components described in the previous section. This saves time, reduces complexity and minimizes security issues and cost for your organization. In the Microsoft Azure cloud platform, you get cloud, data platform features, security and MLOps components in a single place, and you pay only for the resources you consume. Choosing different systems for different components, or license-based systems instead of consumption-based ones, can make it costly to run a complete MLOps loop.

MLOps in Azure Machine Learning Platform

The Azure Machine Learning platform has a set of tools available for MLOps activities, including machine learning compute, security components, data source connections, pipelines for automation, a model registry and model deployment options. Here are the key components we will cover in this series:

  • Model training / retraining: Azure AutoML, Azure Machine Learning designer, Azure Machine Learning notebook, Azure Machine Learning compute, Azure Blob Storage.
  • Model registration & tracking: Azure Container Registry (ACR).
  • Model audit trail: Azure Machine Learning pipelines, Azure Machine Learning data guardrails, and other security components within Azure.
  • Model deployment: Azure Container Instances (ACI), Azure Kubernetes Service (AKS).
  • Model monitoring: Application Insights, Log Analytics, Data Drift within Azure ML, Cosmos DB.

Series articles (will have links as they are published)

  1. MLOps Components and Machine Learning Platform Selection (this article)
  2. Azure Machine Learning Workspace Review
  3. Azure Machine Learning Compute Review
  4. Machine Learning Workflow Review
  5. Azure ML Notebook Selection and Development Process
  6. Connecting Data Sources with Azure ML Workspace
  7. Azure ML Security Review
  8. Azure ML Model Training
  9. Azure ML Model Registration and ML Job Automation
  10. Azure ML Model Deployment in ACI
  11. Azure ML Model Deployment in AKS
  12. Azure ML Model Health Monitoring
  13. Azure ML Model Drift and Data Drift Review
  14. Azure ML Model Retraining Pipeline
  15. Azure ML Model Result Analysis Dashboard with Power BI

Rahat Yasir

Rahat Yasir works at ISAAC Instruments as Director of Data Science & AI, leading their Data & AI initiatives for a data-driven and AI-powered transportation industry. He was selected as one of Canada's top 30 software developers under 30 in 2018. He is an eight-time Microsoft Most Valuable Professional award holder in the Artificial Intelligence category. He has years of experience in imaging and data analysis application development, cross-platform technologies and enterprise system design.