MLOps in practice: how to design the inference part of AI applications

Four examples of how to structure an AI service for model inference

Niels van den Berg
8 min read · May 16, 2023

Introduction

AI models are only helpful when they're put into action, like in applications. MLOps is all about having the right setup and procedures in place to train and use these models in a consistent and organized way. That means making sure everything is reproducible, scalable, easy to maintain, and auditable.

Together with platform engineers, MLOps engineers are the ones in charge of building and looking after the systems and tools that data scientists and ML engineers use to make and release their models. They make it possible for the team to work smoothly and efficiently, with everything they need right at their fingertips.

Based on my experience, designing a system that balances code standards and flexibility for data scientists can be quite challenging. On the one hand, it’s crucial to ensure that the code adheres to best practices and is maintainable over time. On the other hand, data scientists need the freedom to experiment with different model types and approaches to achieve the best results. After all, you don’t want to frustrate data scientists.

As described in my other blog post series, it is essential to start with a good design. This blog post describes a template for the model inference part of AI apps. I start simple and extend the design in four steps.

But first, let’s define the components of an application that uses AI.

What defines an AI app?

By breaking down the app into multiple components rather than designing it as a monolith, we are able to achieve a clear separation of concerns. This approach allows us to update each component independently, which is highly beneficial in terms of flexibility and scalability. An AI app consists of the following elements:

  • App orchestration: a component that manages all tasks in the app;
  • Other business logic: everything that is not AI, for example, an API;
  • AI service: everything AI, like model inference.
Figure 1: components of an AI app. Image by the author.

New data flows into the app, where it is processed by business logic and used in the AI service. AI model predictions are used in other parts of the app, for example, exposed via an API.

Let’s zoom in on the AI service. What is happening there? The AI model is loaded during initialization. During runtime, all model steps are run in order and managed by orchestration.

Figure 2: the steps of the AI service are managed by some form of orchestration. Image by the author.

The app components can be built in various ways. Some options are:

  • An object-oriented approach, where an instance of the service class is instantiated and its methods are called as needed. Orchestration via a function or method.
  • A microservice, where the orchestration is handled within the AI service or via an orchestration service (service-to-service communication).
  • A machine learning pipeline, where the model steps are executed in jobs. Orchestration via the pipeline.

While the tooling may differ, the high-level setup remains the same across all options.
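
To make the second option a bit more concrete: a minimal sketch of a microservice wrapper is shown below, using FastAPI around the AIService class introduced later in this post. The endpoint name and payload shape are assumptions for illustration, not part of the template.

from fastapi import FastAPI
import pandas as pd

app = FastAPI()
# AIService as defined in the examples later in this post
my_model_service = AIService("my_model.joblib")


@app.post("/predict")
def predict(records: list[dict]) -> list[dict]:
    """Expose the AI service via an HTTP endpoint."""
    input_data = pd.DataFrame(records)
    result = my_model_service.orchestrate(input_data)
    return result.to_dict(orient="records")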

Examples of an object-oriented approach

Let’s use the first option as an example to dive deeper into the various components of the AI service. In four steps we add extra functionality while ensuring reproducibility and quality:

  1. The basics
  2. Adding service contracts for data validation
  3. Store the model artifact outside the AI service
  4. Use a config file to refer to models and logic

The complete code can be found on GitHub.

Version 1: the basics

The code below shows a class that covers all steps of the AI service as in Figure 2, although ‘data loading’ and ‘returning results’ are only included implicitly.

import joblib
import pandas as pd


class AIService:
    """A class for performing inference with a pre-trained machine
    learning model."""

    def __init__(self, model_path: str) -> None:
        """Initialize the AIService object with a pre-trained machine
        learning model."""
        self.model = joblib.load(model_path)

    def preprocess(self, input_data: pd.DataFrame) -> pd.DataFrame:
        """Preprocess the input data."""
        return input_data.drop(["id"], axis=1)

    def inference(self, preprocessed_data: pd.DataFrame) -> pd.DataFrame:
        """Perform inference with the preprocessed data."""
        return self.model.predict(preprocessed_data)

    def postprocess(self, input_data: pd.DataFrame, predictions: pd.DataFrame) -> pd.DataFrame:
        """Postprocess the model's predictions."""
        return pd.DataFrame(
            {
                "id": input_data["id"],
                "prediction": predictions,
            }
        )

    def orchestrate(self, input_data: pd.DataFrame) -> pd.DataFrame:
        """Run the entire inference pipeline on the input data."""
        preprocessed_data = self.preprocess(input_data)
        predictions = self.inference(preprocessed_data)
        return self.postprocess(input_data, predictions)


my_model_service = AIService("my_model.joblib")
result = my_model_service.orchestrate(input_data)
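
For illustration, input_data in the example above is assumed to be a pandas DataFrame with an id column plus the feature columns the model was trained on; the column names below are placeholders:

input_data = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "feature1": [0.5, 1.2, 3.4],
        "feature2": [7.0, 2.1, 0.3],
    }
)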

Note that our class does not cover other tasks such as data storage interactions. Based on my experience, during the early stages of development, the data storage infrastructure undergoes frequent changes. Thus, by separating data storage from other services, we achieve loose coupling.

Preprocessing and inference are closely related: inference heavily relies on preprocessing, and different model types or versions may require different features. Therefore, the two are tightly coupled in our approach.

If preprocessing takes too much time during the inference process, such as in cases where the app is exposed as an API or when it is used in a streaming design, we can consider an online feature store. By using an online feature store, we can pre-calculate the features and retrieve them during inference, reducing the inference time.
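
As a rough sketch of that idea: the preprocess step turns into a lookup of pre-computed features. The OnlineFeatureStore client below is hypothetical; the real API depends on the feature store you pick (for example Feast or Hopsworks).

import pandas as pd


class OnlineFeatureStore:
    """Hypothetical client for an online feature store."""

    def get_features(self, ids: list) -> pd.DataFrame:
        """Return pre-computed features for the given entity ids."""
        raise NotImplementedError  # backed by your feature store of choice


class AIServiceWithFeatureStore(AIService):
    def __init__(self, model_path: str, feature_store: OnlineFeatureStore) -> None:
        super().__init__(model_path)
        self.feature_store = feature_store

    def preprocess(self, input_data: pd.DataFrame) -> pd.DataFrame:
        """Look up pre-computed features instead of computing them at request time."""
        return self.feature_store.get_features(input_data["id"].tolist())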

Version 2: adding service contracts for data validation

In order to maintain a consistent data structure between the orchestration service and the AI service, it is important to validate the incoming and outgoing data. By doing so, we can ensure that our AI services can be updated independently from other components without breaking the app.

Figure 3: adding contracts by validating incoming and outgoing data. Image by the author.

The way you build this depends on your app implementation. For our example, we use pydantic to validate incoming and outgoing data types. The structure of our AI service becomes as follows:

Figure 4: adding steps for data validation to the AI service. Image by the author.

The data validation in the class-based approach is implemented as follows, with these additions:

  • pydantic BaseModel classes to validate input data and output data;
  • A method validate_dataframe;
  • An adjusted orchestrate method that includes two new calls to the validate_dataframe method.
import joblib
import pandas as pd
from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel, validator


class InputData(BaseModel):
    id: int
    feature1: float
    feature2: float

    @validator("id")
    def id_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError("id must be a positive integer")
        return v


class OutputData(BaseModel):
    id: int
    prediction: int

    @validator("id")
    def id_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError("id must be a positive integer")
        return v


class AIService:
    """A class for performing inference with a pre-trained machine
    learning model."""

    def __init__(self, model_path: str) -> None:
        """Initialize the AIService object with a pre-trained machine
        learning model."""
        self.model = joblib.load(model_path)

    def validate_dataframe(
        self, schema: BaseModel, dataframe: pd.DataFrame
    ) -> pd.DataFrame:
        """Validate each row of the dataframe against the given pydantic schema."""
        list_validated_data = [schema(**row) for _, row in dataframe.iterrows()]
        return pd.DataFrame(jsonable_encoder(list_validated_data))

    def preprocess(self, input_data: pd.DataFrame) -> pd.DataFrame:
        """Preprocess the input data."""
        return input_data.drop(["id"], axis=1)

    def inference(self, preprocessed_data: pd.DataFrame) -> pd.DataFrame:
        """Perform inference with the preprocessed data."""
        return self.model.predict(preprocessed_data)

    def postprocess(self, input_data: pd.DataFrame, predictions: pd.DataFrame) -> pd.DataFrame:
        """Postprocess the model's predictions."""
        return pd.DataFrame({"id": input_data["id"], "prediction": predictions})

    def orchestrate(self, input_data: pd.DataFrame) -> pd.DataFrame:
        """Run the entire inference pipeline on the input data."""
        validated_input = self.validate_dataframe(InputData, input_data)
        preprocessed_data = self.preprocess(validated_input)
        predictions = self.inference(preprocessed_data)
        postprocessed_data = self.postprocess(input_data, predictions)
        return self.validate_dataframe(OutputData, postprocessed_data)


my_model_service = AIService("my_model.joblib")
result = my_model_service.orchestrate(input_data)
print(result)

Version 3: store the model artifact outside the AI service

Previously, we assumed that the model artifact would be stored within the model service. However, an alternative approach is to separate the storage of code and model artifacts by introducing a model registry. This allows for independent updates of the model to the latest version without affecting the service directly.

Figure 5: introducing a model registry to store the model outside the app. Image by the author.

To implement the model registry in our example, I utilized MLflow. The model is loaded from the registry during the initialization process. The updated code is presented below. For the sake of clarity, I excluded the data contracts from the example and only included the modified __init__ method.

import mlflow
import pandas as pd


class AIService:
    """A class for performing inference with a pre-trained machine
    learning model."""

    def __init__(self, model_name, model_version="latest"):
        """Initialize the AIService object with a model from the model registry."""
        # MLflow resolves "latest" to the newest registered model version.
        model_uri = f"models:/{model_name}/{model_version}"
        self.model = mlflow.sklearn.load_model(model_uri)


my_model_service = AIService("sklearn-linear-regression", "latest")
result = my_model_service.orchestrate(input_data)
print(result)
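
For completeness, this assumes a model has already been registered under that name. On the training side, the registration might look roughly like the sketch below, using MLflow's standard logging API; X_train and y_train are assumed training data, not part of this post's code.

import mlflow
from sklearn.linear_model import LinearRegression

# Train a model and register it in the MLflow model registry under the
# name used by the AI service ("sklearn-linear-regression").
model = LinearRegression().fit(X_train, y_train)

with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        registered_model_name="sklearn-linear-regression",
    )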

Version 4: use a config file to refer to models and logic

The final step that I will explain is the use of a config file. Instead of directly specifying the model name and version during service initialization, we can pass a config file that contains all the necessary information. This approach provides a more streamlined and organized way to manage the configuration of the service.

Another use case for the config file is to reference preprocessing logic. As previously mentioned, preprocessing and the model are closely linked. For instance, a newer version of a model may require a new feature. Therefore, we need the flexibility to update the preprocessing logic independently of other parts of the service. With the help of a config file, we can easily adjust the preprocessing logic without affecting the rest of the service.

Figure 6: splitting code in the AI service across multiple files. Image by the author.

In the code, it looks as follows. A config file config/config.yaml:

model:
  name: sklearn-linear-regression
  version: latest
preprocessing:
  function_name: preprocess
  module_path: src.sklearn_linear_regression.preprocessing

Preprocessing logic stored in src/sklearn_linear_regression/preprocessing.py:

def preprocess(input_data):
    """Preprocess the input data."""
    return input_data.drop(["id"], axis=1)

Note that this function replaces the preprocess method in the class. Below is the updated __init__ method of the class:

import importlib

import mlflow
import yaml


class AIService:
    """A class for performing inference with a pre-trained machine
    learning model."""

    def __init__(self, config_path):
        """Initialize the AIService object with the configuration file path."""
        with open(config_path) as f:
            config = yaml.safe_load(f)

        # Load the model from the registry; "latest" resolves to the newest version
        model_version = config["model"]["version"]
        model_uri = f"models:/{config['model']['name']}/{model_version}"
        self.model = mlflow.sklearn.load_model(model_uri)

        # Load the preprocessing logic from the module defined in the config
        preprocess_module = importlib.import_module(
            config["preprocessing"]["module_path"]
        )
        self.preprocess = getattr(
            preprocess_module, config["preprocessing"]["function_name"]
        )


config_path = "config/config.yaml"
my_model_service = AIService(config_path)
result = my_model_service.orchestrate(input_data)

How to use it

The AI service we have defined allows for running different models in the app with ease. To be more specific, we can convert the AI service into a scikit-learn service and create multiple instances of it, such as one for logistic regression and another for a random forest model. Additionally, we can create a TensorFlow service to run one or more TensorFlow models. The app orchestration handles the communication with these services.
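
As a sketch of what this could look like, the app orchestration holds one AIService instance per model, each driven by its own config file; the file names below are illustrative assumptions.

# One AIService instance per model, each with its own config file
services = {
    "logistic_regression": AIService("config/logistic_regression.yaml"),
    "random_forest": AIService("config/random_forest.yaml"),
}

# The app orchestration routes a request to the appropriate model service
predictions = services["random_forest"].orchestrate(input_data)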

Figure 7: introducing multiple model services. Image by the author.

Conclusion

This blog explains how you as an MLOps engineer can provide a framework for data scientists to run their models in an app. In a future blog post, I will dive into model training and deployment of the app.
