August 22, 2022

Photo by Luke Peters on Unsplash


At the time of this writing, organizations are still putting notebooks into production! Fortunately, the machine learning space is slowly beginning to adopt software engineering best practices. Among these is MLOps, the machine learning equivalent of DevOps.

Continuous Integration

In DevOps, we continuously integrate code changes into the build. Similarly, in MLOps, we continuously integrate code and data changes into the model.

Let’s examine the steps in closer detail:

  1. A source code change is committed to the dev branch (e.g. a pull request is merged) or, in MLOps specifically, a new version of the data is made available.
  2. The first stage of a CI job is automatically kicked off in a CI tool (e.g. Jenkins or Azure DevOps). This stage trains a new model using the latest version of the code and data.
  3. Assuming the first stage was successful, another one is started. This one evaluates the performance of the model.
  4. Assuming the second stage was successful, a third stage is started. This one uploads the model to a model repository.
  5. An optional fourth stage could build an image with the model and upload it to a container image repository.
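
The gating between stages can be sketched in a few lines of Python. This is only an illustration of the control flow, not how Jenkins or Azure DevOps are configured: the `train`, `evaluate` and `upload` functions, the `min_accuracy` threshold and the in-memory `registry` list are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StageResult:
    name: str
    succeeded: bool


def run_pipeline(
    stages: List[Tuple[str, Callable[[Dict], bool]]], context: Dict
) -> List[StageResult]:
    """Run stages in order and stop at the first one that fails."""
    results = []
    for name, stage in stages:
        succeeded = stage(context)
        results.append(StageResult(name, succeeded))
        if not succeeded:
            break  # later stages only run if the earlier ones passed
    return results


# Hypothetical stages: a real job would call the training code, an
# evaluation script and a model repository client here.
def train(context: Dict) -> bool:
    context["model"] = {"weights": [0.1, 0.2]}
    return True


def evaluate(context: Dict) -> bool:
    context["accuracy"] = 0.91  # placeholder metric, not a real result
    return context["accuracy"] >= context["min_accuracy"]


def upload(context: Dict) -> bool:
    context["registry"].append(context["model"])
    return True


STAGES = [("train", train), ("evaluate", evaluate), ("upload", upload)]

context = {"min_accuracy": 0.9, "registry": []}
results = run_pipeline(STAGES, context)
```

If the evaluation stage returns `False`, the upload stage never runs, so a model that misses the threshold does not reach the repository.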

Continuous Delivery

In DevOps, we continuously deploy new versions of the build to the staging environment (or to production, depending on your level of comfort). Similarly, in MLOps, we continuously deploy new versions of the model to the staging environment.

Let’s examine the steps in closer detail:

  1. A new image is pushed to the container image repository.
  2. The new image is automatically deployed to a managed container orchestration service or ML inference service in the cloud.

The setup should include the following:

  • Automatic resource scaling based on demand
  • Load balancers
  • A/B testing support
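
To make the A/B testing item concrete, here is a minimal sketch of one common way to split traffic between two model versions: hash a stable identifier so the same caller always reaches the same variant. The names `model_a` and `model_b`, the `user_id` key and the 10% default share are assumptions for illustration. Managed services usually expose this as configuration rather than code.

```python
import hashlib


def choose_variant(user_id: str, treatment_share: float = 0.1) -> str:
    """Deterministically route a user to one of two model variants."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "model_b" if bucket < treatment_share else "model_a"


# Roughly 10% of a large population should land on the new model.
assignments = [choose_variant(f"user-{i}") for i in range(10_000)]
share_b = assignments.count("model_b") / len(assignments)
```

Because the assignment depends only on the identifier, a user does not bounce between variants from one request to the next, which keeps the comparison between the two models clean.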

Continuous Monitoring

Once the model has been deployed to production, it needs to be monitored continuously. There are several reasons for this: the allocated resources may be insufficient to meet demand, the model could be biased, and we might encounter data drift.

To get a better idea of what to monitor, Amazon SageMaker publishes metrics such as the following:

  • The number of requests sent to the model endpoint (e.g. 100 HTTPS requests per second).
  • The time the model(s) took to respond. This includes the time it took to send the request, to fetch the response from the model container, and to complete the inference in the container (e.g. 100 microseconds).
  • The time it takes to launch new compute resources for a serverless endpoint (e.g. 1 second).
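
Whatever service collects these numbers, the checks we run on them tend to look alike. The sketch below summarizes a window of request latencies and flags any metric that exceeds a limit. It does not call any AWS API. The sample latencies, the metric names and the limits are made up for illustration.

```python
import math
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_us: Sequence[float], window_seconds: float) -> Dict[str, float]:
    """Summarize the requests observed during one monitoring window."""
    return {
        "requests_per_second": len(latencies_us) / window_seconds,
        "p50_latency_us": percentile(latencies_us, 50),
        "p95_latency_us": percentile(latencies_us, 95),
    }


def breaches(summary: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the names of the metrics that exceed their limit."""
    return sorted(name for name, limit in limits.items() if summary[name] > limit)


# Illustrative data: ten requests in a two second window, one of them slow.
latencies_us = [90, 110, 95, 100, 105, 400, 98, 102, 97, 103]
summary = summarize(latencies_us, window_seconds=2.0)
alerts = breaches(summary, {"p95_latency_us": 250, "requests_per_second": 50})
```

Looking at a high percentile rather than the average is a deliberate choice here: a single slow request barely moves the mean but shows up clearly in the tail.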

Continuous Training

It’s not enough to re-train the model whenever there is a change to the code. We also need to account for changes in the nature of the production data over time. The model can become less accurate when the data used to train it is no longer representative of the new data in production.

There are multiple ways we can trigger the training process:

  • We can periodically train the model (e.g. set up a cron job).
  • We can train the model using new data whenever its performance (e.g. accuracy) falls below a certain threshold.
  • We can train the model whenever we detect a significant change in the production data.
  • We can train the model on demand (i.e. manually).
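
The four triggers above can be combined into a single check, sketched below. The thresholds in `RetrainPolicy` are arbitrary placeholders, and `drift_score` is assumed to come from whatever drift detector is in use. The sketch does not compute it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RetrainPolicy:
    max_age: timedelta = timedelta(days=7)  # periodic trigger
    min_accuracy: float = 0.9               # performance trigger
    max_drift: float = 0.2                  # data drift trigger


def retrain_reasons(
    policy: RetrainPolicy,
    last_trained: datetime,
    now: datetime,
    accuracy: float,
    drift_score: float,
    manual_request: bool = False,
) -> List[str]:
    """Return every trigger that currently calls for re-training."""
    reasons = []
    if now - last_trained >= policy.max_age:
        reasons.append("schedule")
    if accuracy < policy.min_accuracy:
        reasons.append("performance")
    if drift_score > policy.max_drift:
        reasons.append("drift")
    if manual_request:
        reasons.append("manual")
    return reasons


policy = RetrainPolicy()
now = datetime(2022, 8, 22)
reasons = retrain_reasons(
    policy, now - timedelta(days=3), now, accuracy=0.85, drift_score=0.3
)
```

Returning the list of reasons, rather than a single boolean, makes it easy to record why a training run was started.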

When re-training the model, we also need to consider how much data is needed. Options include:

  • Fixed window
  • Dynamic window
  • Representative subsample
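
As a rough illustration, the sketch below selects training data with a fixed window and with a random subsample. The record layout (a timestamp paired with a row) and the 30 day window are assumptions. A plain random sample is used as the simplest stand-in for a representative subsample. A dynamic window would treat the window length as something to tune rather than a constant, which is left out here.

```python
import random
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Record = Tuple[datetime, Dict]


def fixed_window(records: List[Record], now: datetime, days: int) -> List[Record]:
    """Keep only the records from the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [record for record in records if record[0] >= cutoff]


def random_subsample(records: List[Record], size: int, seed: int = 0) -> List[Record]:
    """Draw a reproducible random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


# Illustrative data: one record per day for the last 100 days.
now = datetime(2022, 8, 22)
records = [(now - timedelta(days=d), {"row": d}) for d in range(100)]

recent = fixed_window(records, now, days=30)
sample = random_subsample(records, size=10)
```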

We also need to think about what should be re-trained:

  • Transfer learning vs training from scratch
  • Offline vs online learning

Explainable AI

Data-driven decision making is increasingly regulated. To guard against discrimination, data practitioners have to be able to explain, on short notice, how a machine learning model came to a certain conclusion. There are multiple techniques (e.g. SHAP) that can be used to explain a model’s predictions even if the model isn’t very interpretable by default.
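
SHAP attributes a prediction to individual features using Shapley values. To show the idea without depending on the library, the sketch below computes exact Shapley values by brute force over every feature ordering. This is not the `shap` package and is only feasible for a handful of features. The linear `predict` function, the instance and the baseline are hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Average each feature's marginal contribution over all orderings."""
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)  # start from the reference input
        previous = predict(current)
        for i in order:
            current[i] = instance[i]  # switch feature i to its real value
            value = predict(current)
            contributions[i] += value - previous
            previous = value
    return [c / len(orderings) for c in contributions]


def predict(features: Sequence[float]) -> float:
    """Hypothetical model: a simple weighted sum of three features."""
    a, b, c = features
    return 0.5 * a - 0.8 * b + 0.1 * c


instance = [4.0, 1.0, 3.0]
baseline = [2.0, 2.0, 3.0]
values = shapley_values(predict, instance, baseline)
```

For a linear model, each value reduces to the feature's weight times its distance from the baseline, and the values always sum to the difference between the prediction for the instance and the prediction for the baseline.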

Workflow Orchestration

Workflow orchestration coordinates the tasks of an ML workflow pipeline according to Directed Acyclic Graphs, or DAGs for short. A DAG defines the execution order by capturing the dependencies between tasks: a task runs only after every task it depends on has finished. At the time of this writing, one of the most popular open-source orchestrators is Apache Airflow.
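
To see how a DAG determines execution order, here is a minimal sketch using `graphlib` from the Python standard library (Python 3.9 or later). It is not Airflow code, and the task names are hypothetical. An orchestrator adds scheduling, retries and logging on top of this basic ordering.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps a task to the set of tasks it depends on.
dag = {
    "validate_data": {"extract_data"},
    "train": {"extract_data", "validate_data"},
    "evaluate": {"train"},
    "register_model": {"evaluate"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())


def has_cycle(graph: dict) -> bool:
    """A graph with a cycle is not a valid DAG and cannot be ordered."""
    try:
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return True
    return False
```

The "acyclic" part matters: if `train` depended on `evaluate` while `evaluate` depended on `train`, there would be no valid order in which to run them.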

ML Metadata Tracking & Logging

We hinted at this previously when we mentioned data versioning. We should be keeping track of the following information:

  • The training date & time
  • The training duration
  • The hyperparameters used when training the model
  • The values of the evaluation metrics
  • The version of the data used to train the model
  • The version of the code used to train the model
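
One lightweight way to capture this information is a structured record written once per training run. The sketch below is only an illustration: the field names, the example values and the JSON log format are assumptions, and a dedicated experiment tracking tool would normally take over this job.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class TrainingRun:
    trained_at: str                     # training date and time (ISO 8601)
    duration_seconds: float             # training duration
    hyperparameters: Dict[str, float]   # values used for this run
    metrics: Dict[str, float]           # evaluation results
    data_version: str                   # version of the training data
    code_version: str                   # e.g. a commit hash


def to_log_line(run: TrainingRun) -> str:
    """Serialize a run as one JSON line that can be appended to a log."""
    return json.dumps(asdict(run), sort_keys=True)


# Illustrative values only.
run = TrainingRun(
    trained_at="2022-08-22T10:00:00+00:00",
    duration_seconds=734.2,
    hyperparameters={"learning_rate": 0.01, "max_depth": 6},
    metrics={"accuracy": 0.91},
    data_version="v12",
    code_version="abc1234",
)
line = to_log_line(run)
```

Storing the data and code versions next to the metrics is what makes a result reproducible later: given a logged line, we know exactly which inputs produced which numbers.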

Written by Cory Maklin