MLOps (machine learning operations) applies DevOps principles and practices to increase workflow efficiency and quality across the machine learning project lifecycle. In this post, we start by highlighting the general concepts of the Microsoft MLOps Maturity Model. We then introduce MLOps architectural patterns using Azure Machine Learning CLI v2 and GitHub, elaborating on the challenges and action items. A full step-by-step implementation with complete sample code is out of scope for this post and will be published in our next one.
To clarify the principles and practices of MLOps, Microsoft defines an MLOps maturity model. Different AI systems have different requirements, and different organizations are at different maturity levels. Hence, an MLOps maturity model is very useful to:
The table below is an abstract view of each level of the MLOps maturity model. Note that this is a generic guideline, and we recommend that you customize it for your own MLOps project. We introduce more details in the following sections of this blog.
Maturity Level | Training Process | Release Process | Integration into app |
---|---|---|---|
Level 0 - No MLOps | Untracked, file is provided for handoff | Manual, hand-off | Manual, heavily DS driven |
Level 1 - DevOps no MLOps | Untracked, file is provided for handoff | Manual, hand-off to SWE | Manual, heavily DS driven, basic integration tests added |
Level 2 - Automated Training | Tracked, run results and model artifacts are captured in a repeatable way | Manual release, clean handoff process, managed by SWE team | Manual, heavily DS driven, basic integration tests added |
Level 3 - Automated Model Deployment | Tracked, run results and model artifacts are captured in a repeatable way | Automated, CI/CD pipeline set up, everything is version controlled | Semi-automated, unit and integration tests added, still needs human signoff |
Level 4 - Full MLOps Automated Retraining | Tracked, run results and model artifacts are captured in a repeatable way, retraining set up based on metrics from app | Automated, CI/CD pipeline set up, everything is version controlled, A/B testing has been added | Semi-automated, unit and integration tests added, may need human signoff |
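The table above can also be read as a checklist. The following is a minimal sketch of that reading, not part of the official maturity model: the `Practices` fields and the `assess_level` helper are hypothetical names, and treating the levels as strictly cumulative (each level requires everything below it) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Practices:
    """Hypothetical yes/no answers describing how a team works today."""

    integration_tests: bool        # basic integration tests exist (Level 1)
    tracked_training: bool         # runs and model artifacts are captured repeatably (Level 2)
    automated_release: bool        # a CI/CD pipeline releases the model (Level 3)
    metric_based_retraining: bool  # retraining is driven by metrics from the app (Level 4)


LEVEL_NAMES = {
    0: "No MLOps",
    1: "DevOps no MLOps",
    2: "Automated Training",
    3: "Automated Model Deployment",
    4: "Full MLOps Automated Retraining",
}


def assess_level(practices: Practices) -> int:
    """Return the highest level whose prerequisites are all met.

    Assumption: levels are cumulative, so the first missing practice
    caps the result even if later practices are in place.
    """
    checks = [
        practices.integration_tests,
        practices.tracked_training,
        practices.automated_release,
        practices.metric_based_retraining,
    ]
    level = 0
    for reached in checks:
        if not reached:
            break
        level += 1
    return level


if __name__ == "__main__":
    team = Practices(
        integration_tests=True,
        tracked_training=True,
        automated_release=False,
        metric_based_retraining=False,
    )
    level = assess_level(team)
    print(f"Level {level} - {LEVEL_NAMES[level]}")
```

A real assessment would also weigh the people, process, and culture aspects, which a boolean checklist cannot capture.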
As you move to higher levels of maturity, you may notice that it is not only about technology: the organization, the people, and the culture become more important. Because this blog focuses on the technology, please see AI Business School to learn about AI business strategy for your organization and building an AI-ready culture.
Azure Machine Learning is an open-source-friendly machine learning platform that can be used to implement the full machine learning lifecycle and MLOps through integration with GitHub (or Azure DevOps), together with Responsible AI technologies that help you develop, use, and govern AI responsibly.
In the following sections, we use an Azure Machine Learning Workspace and its assets to implement MLOps. A basic knowledge of Azure Machine Learning, GitHub, and GitHub Actions concepts is recommended to follow the steps below.
Azure Machine Learning Workspace is the top-level resource of Azure Machine Learning. Azure Machine Learning Workspace leverages Azure Storage Account, Azure Container Registry, Azure Key Vault, Azure Application Insights, and related Azure services depending on your requirements. Here is a list of Azure Machine Learning assets we want you to understand before reading this blog.
For more details, please check out How Azure Machine Learning works: resources and assets (v2).
The tables that follow identify the detailed characteristics for this level of process maturity.
People | Model Creation | Model Release | Application Integration |
---|---|---|---|
If you use Azure Machine Learning, Compute Instance is a good place to start your machine learning journey. It includes essential Python/R libraries and development tools such as Jupyter Notebook, JupyterLab, RStudio, and the VS Code remote development feature. After training a model, you can deploy it yourself or use Managed Endpoints to deploy it quickly.
Next, we will introduce Level 1: DevOps no MLOps.
At this level, data scientists work on a standardized machine learning platform. The data pipeline is maintained by data engineers, so it is easy to get data for model training.
However, reproducibility in model training is still an issue because machine learning assets such as data and Python packages are not shared. One-time job dependencies within a training workflow are complicated, yet they are not shared among data scientists.
People | Model Creation | Model Release | Application Integration |
---|---|---|---|
Azure Machine Learning is a shared and collaborative machine learning platform. By using the Data feature to connect to Azure storage services such as Azure Data Lake Storage Gen2, you can easily access a data source from Azure Machine Learning and mount or download it onto a Compute Instance. Once you have trained a model, you can register it in the Model registry and deploy it using Managed Endpoints, which streamline model deployment for both real-time and batch inference. The general concept is explained in this document.
Azure Machine Learning can be integrated with GitHub, allowing you to share your training and scoring code in a GitHub repository and to test that code automatically using GitHub Actions.
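To make the idea of automatically testing scoring code more concrete, here is a minimal sketch of a scoring script and a unit test that a CI workflow could run. It follows the `init()`/`run()` entry-point convention used by Azure Machine Learning scoring scripts. Everything else is an assumption for illustration: `MeanThresholdModel` is a stand-in for a real trained model, and the `{"data": [...]}` request shape is a hypothetical payload format.

```python
import json


class MeanThresholdModel:
    """Hypothetical stand-in for a trained model loaded from a model file."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, rows: list) -> list:
        # Predict 1 when the mean of a row's features exceeds the threshold.
        return [int(sum(row) / len(row) > self.threshold) for row in rows]


model = None


def init() -> None:
    """Called once when the deployment starts; loads the model into memory."""
    global model
    # A real script would deserialize a registered model file here.
    model = MeanThresholdModel(threshold=0.5)


def run(raw_data: str) -> dict:
    """Called for each request; parses the JSON payload and returns predictions."""
    rows = json.loads(raw_data)["data"]
    return {"predictions": model.predict(rows)}


def test_run_returns_one_prediction_per_row() -> None:
    """A unit test that a test runner such as pytest could collect in CI."""
    init()
    payload = json.dumps({"data": [[0.9, 0.8], [0.1, 0.2]]})
    assert run(payload) == {"predictions": [1, 0]}


if __name__ == "__main__":
    test_run_returns_one_prediction_per_row()
    print("scoring test passed")
```

Keeping the scoring logic importable and free of cloud dependencies is what makes it cheap to test on every push.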
Next, we will introduce Level 2: Automated Training.
People | Model Creation | Model Release | Application Integration |
---|---|---|---|
Typically, the model training workflow is defined by a Pipeline, which captures complex job dependencies (if your training workflow is simple, you don't have to use one).
A Job manages your training run and logs its parameters and metrics. Azure Machine Learning CLI v2 is the latest version of the Azure Machine Learning command-line interface. You can define your job in YAML files and run it using Azure Machine Learning CLI v2. The Python SDK (v1 and v2) is also available if you prefer to define jobs in Python code.
Finally, an Environment defines the Python packages, environment variables, and Docker settings used for training and scoring a model.
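As an illustration of how a Job, an Environment, and the CLI fit together, the sketch below holds a small command job definition in the CLI v2 YAML format and assembles the `az ml job create` command that would submit it. The names `my-training-env`, `cpu-cluster`, `mlops-sample`, `train.py`, and the `learning_rate` input are placeholders that we assume for this example. The script only prints the command; it does not call Azure.

```python
import shlex
from pathlib import Path

# A minimal command job definition in the CLI v2 YAML format.
# All asset names below are hypothetical placeholders.
JOB_YAML = """\
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: ./src
command: python train.py --learning-rate ${{inputs.learning_rate}}
inputs:
  learning_rate: 0.01
environment: azureml:my-training-env@latest
compute: azureml:cpu-cluster
experiment_name: mlops-sample
"""


def build_submit_command(job_file: str, resource_group: str, workspace: str) -> list[str]:
    """Build the argument list for submitting a job with Azure ML CLI v2."""
    return [
        "az", "ml", "job", "create",
        "--file", job_file,
        "--resource-group", resource_group,
        "--workspace-name", workspace,
    ]


if __name__ == "__main__":
    job_path = Path("job.yml")
    job_path.write_text(JOB_YAML, encoding="utf-8")
    command = build_submit_command(str(job_path), "my-resource-group", "my-workspace")
    # Print instead of executing, so the sketch runs without Azure credentials.
    print(shlex.join(command))
```

In a GitHub Actions workflow, the same command would typically run as a shell step after signing in to Azure.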
You can use GitHub Actions to easily trigger a model training job via Azure ML CLI v2. Typical cases for triggering a model training job are as follows:
Next, we will introduce Level 3: Automated Model Deployment.
At this level, trained models are deployed automatically to eliminate human error and to increase the quality, security, and compliance of the machine learning application.
The tables that follow identify the detailed characteristics for this level of process maturity.
People | Model Creation | Model Release | Application Integration |
---|---|---|---|
You can deploy models to a Managed Endpoint using Azure ML CLI v2. GitHub Actions works well with the Azure CLI Action for automation, but you don't have to automate the full lifecycle. If you want to put a release gate between staging and production, you can use GitHub environments.
Next, we will introduce Level 4: Full MLOps Automated Retraining.
At this level, monitoring and retraining are introduced for continuous model improvement.
People | Model Creation | Model Release | Application Integration |
---|---|---|---|
Metrics and logs are stored and analyzed in Azure Monitor and Azure Application Insights. You can check whether the deployed model performs well and whether system performance needs to be improved. A retraining job is triggered either automatically by the defined metrics or by human judgment.
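The retraining decision described above can be sketched as a small policy check. This is only an illustration under stated assumptions: the metric names `accuracy` and `drift_score`, the `RetrainingPolicy` thresholds, and the `human_override` flag are hypothetical, and a real setup would read the metrics from your monitoring store rather than from a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrainingPolicy:
    """Hypothetical thresholds that define when a model needs retraining."""

    min_accuracy: float      # retrain when accuracy falls below this value
    max_drift_score: float   # retrain when data drift rises above this value


def should_retrain(
    metrics: dict[str, float],
    policy: RetrainingPolicy,
    human_override: bool = False,
) -> bool:
    """Decide whether to trigger a retraining job.

    Retraining is triggered either by human judgment (the override flag)
    or automatically when a monitored metric crosses its threshold.
    """
    if human_override:
        return True
    accuracy_too_low = metrics["accuracy"] < policy.min_accuracy
    drift_too_high = metrics["drift_score"] > policy.max_drift_score
    return accuracy_too_low or drift_too_high


if __name__ == "__main__":
    policy = RetrainingPolicy(min_accuracy=0.85, max_drift_score=0.3)
    latest = {"accuracy": 0.81, "drift_score": 0.12}
    if should_retrain(latest, policy):
        print("Trigger the training pipeline")
    else:
        print("Model is healthy, no retraining needed")
```

Keeping the thresholds in one policy object makes them easy to version alongside the rest of the pipeline configuration.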
When deploying a new model to an existing inference endpoint, such as a Managed Online Endpoint, you can use blue-green deployment (and mirrored traffic) to control how much traffic reaches the new deployment, without affecting the existing production environment, before rolling the new model out completely.
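A blue-green rollout can be sketched as a sequence of traffic splits, each applied with `az ml online-endpoint update --traffic`. The example below is a sketch under assumptions: the deployment names `blue` (current model) and `green` (new model), the endpoint name, and the 10/50/100 stages are illustrative, and we assume the resource group and workspace are already configured as Azure CLI defaults. The script only prints the commands.

```python
import shlex


def traffic_steps(green_percentages: list[int]) -> list[dict[str, int]]:
    """Return the blue/green traffic split for each rollout stage."""
    steps = []
    for green in green_percentages:
        if not 0 <= green <= 100:
            raise ValueError(f"traffic percentage out of range: {green}")
        # Blue keeps whatever share of traffic green has not taken over yet.
        steps.append({"blue": 100 - green, "green": green})
    return steps


def build_traffic_command(endpoint_name: str, split: dict[str, int]) -> list[str]:
    """Build the Azure ML CLI v2 command that applies one traffic split."""
    allocation = " ".join(f"{name}={percent}" for name, percent in split.items())
    return [
        "az", "ml", "online-endpoint", "update",
        "--name", endpoint_name,
        "--traffic", allocation,
    ]


if __name__ == "__main__":
    # Shift traffic gradually: 10% to green, then 50%, then all of it.
    for split in traffic_steps([10, 50, 100]):
        print(shlex.join(build_traffic_command("my-endpoint", split)))
```

Between stages, you would check the monitoring metrics for the green deployment and return traffic to blue if they degrade.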
We introduced the concept of an MLOps maturity model and explained the architecture and product features of Azure Machine Learning and GitHub used in each maturity level. We also discussed the challenges and what's next for each maturity level. In the next blog, we will describe how to implement MLOps using Azure Machine Learning and GitHub.
We would like to thank the following people for their help with and contributions to this post: