In-Depth Guide: Deploy Models from Databricks to Azure ML (2023)
In this blog post, we explore the different ways you can easily and seamlessly deploy models trained on Databricks to Azure ML endpoints using Databricks notebooks.
Databricks provides a great platform for data practitioners to develop, test and experiment with various Machine Learning and AI workloads.
A recent feature, Model Serving v2, also allows users to easily serve those models on Databricks for batch or online inference with a few clicks.
One caveat is that the model serving feature is not supported in certain Azure Databricks regions. While waiting for the rollout to reach their region, many users legitimately turn to Azure ML's powerful serving features.
All code used in this article is on Github. Let’s now tackle the main topic!
How do you deploy a model trained on Databricks to an Azure ML Endpoint or AKS?
Architecture
In order to move our models from Databricks to Azure ML endpoints, we rely on MLflow Model Registry as our connection point between the two platforms.
First, we develop our model as usual on Databricks. We could leverage Dual Tracking to log MLflow experiments on both the Databricks and Azure ML MLflow tracking servers, but this technique has some limitations.
When it comes to the Model Registry, we recommend using a single Model Registry, for maintainability and to keep one source of truth.
Here, we use the Experiment Tracking and Model Registry on Azure ML so that file paths and deployment settings remain consistent for Azure resources.
There are two main ways to deploy a model for online serving using Azure Machine Learning:
- Azure ML Managed Endpoints
- Azure Kubernetes Services (AKS) Endpoints
For more information about the endpoints differences and which one is right for you, check out the documentation. Now, let’s start building our solution from the start.
Prerequisite
- Have an Azure Databricks Workspace
- Have an Azure Machine Learning Workspace
- Have permission to manage those resources
How to connect Databricks to Azure ML Workspace
To let Databricks communicate with Azure ML, you will need two things:
- Set up access management between your resources.
- Install the proper libraries to use AML on Databricks.
1. Set up access management
First, we need to assign a role to the Azure Databricks-generated managed identity dbmanagedidentity
over the scope of the Azure Machine Learning workspace. This grants Databricks read and write access to Azure ML.
There are two ways to do it: one with the UI, the other with the Azure CLI.
Steps with the UI:
1. Go to your Azure Machine Learning resource and open the Access Control (IAM) tab.
2. Click on “Add role assignment”.
3. Select the right role. Good practice is to select the minimum permission needed; in this case, Contributor is enough.
4. Select “Managed Identity” and find the managed identity associated with your Databricks workspace.
5. Click “Review and assign”, and that is it! Your Databricks workspace now has permission to access your AML resource.
Steps with Azure CLI:
Use the az role assignment create command in the Azure CLI. You can see all the arguments you can pass in the official documentation here.
The full command used looks like this:
az role assignment create --assignee-object-id 22222222-dddd-cccc-dddd-aaaaaaaa --role Contributor --scope /subscriptions/your-subscription-id/resourceGroups/your-resource-group/providers/Microsoft.MachineLearningServices/workspaces/your-aml-name
The assignee-object-id parameter is the Object ID of dbmanagedidentity. As for the scope parameter, it is the one of your Azure ML resource: you can either build the scope path from your resource group and Azure ML name, or find it already formatted in the Azure ML IAM page.
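If you prefer to build the scope string yourself, it is plain string formatting over your subscription, resource group, and workspace name. A minimal sketch (the three values below are placeholders, not real identifiers):

```python
# Placeholders: substitute your own Azure identifiers.
subscription_id = "your-subscription-id"
resource_group = "your-resource-group"
workspace_name = "your-aml-name"

# The scope is the full ARM resource ID of the Azure ML workspace.
scope = (
    f"/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}"
)
print(scope)
```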
Now you are done with access management! We can use Databricks to connect to our Azure ML workspace.
2. Install libraries and connect to AML workspace
Back in your Azure Databricks workspace, you will need to install the Python packages that contain the Azure ML integration code for MLflow. You can either use pip install or install them directly on your cluster:
- azureml-mlflow
- azure-ai-ml
After that, configure your resource information to retrieve your Azure ML workspace, as in the code snippet below.
import mlflow
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
# TODO: Enter details of your Azure Machine Learning workspace
subscription_id = "<Subscription ID of your resource group>"
resource_group = "<Resource group having your resources>"
workspace_name = "<Your azure workspace name>"
# Retrieves your Azure ML resources with already set up Managed Identity
aml_client = MLClient(credential=DefaultAzureCredential(),
                      subscription_id=subscription_id,
                      workspace_name=workspace_name,
                      resource_group_name=resource_group)
# Retrieves MLflow tracking URI of Azure ML workspace
aml_mlflow_tracking_uri = aml_client.workspaces.get(aml_client.workspace_name).mlflow_tracking_uri
# Changes MLflow tracking URI to Azure ML server
mlflow.set_tracking_uri(aml_mlflow_tracking_uri)
You are now set to use MLflow as usual. Commands such as mlflow.&lt;flavor&gt;.log_model() will write artifacts and models to your Azure ML MLflow server.
Training Models using MLflow with Azure ML
Now, we want to train a simple model and register it to Azure ML Model Registry. We are using this wine quality dataset to create a wine quality scoring model.
To track MLflow experiments on Azure ML, you need to create an MLflow experiment and set it as active; otherwise MLflow will raise the exception BadRequest: Experiment ID must be a GUID.
After that, we train a model and log the model experiment to MLflow. You can find the full code with example dataset in my GitHub repo.
from sklearn.linear_model import ElasticNet

# Create a new MLflow experiment
artifact_path = "model"
experiment_name = "wine_quality_experiment"
registered_model_name = "wine_quality"
# Example hyperparameters
alpha = 0.5
l1_ratio = 0.5
# Creates and sets the experiment when using MLflow with Azure ML
mlflow.set_experiment(experiment_name=experiment_name)
with mlflow.start_run() as run:
    # Keep the metadata of the run
    run_id = run.info.run_id
    # Train your model
    lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
    lr.fit(train_x, train_y)
    predicted_qualities = lr.predict(test_x)
    mlflow.log_params({"alpha": alpha, "l1_ratio": l1_ratio})
    # Infer the model signature from sample inputs and outputs
    signature = mlflow.models.infer_signature(model_input=test_x[:10], model_output=predicted_qualities[:10])
    # Log the model to the experiment
    mlflow.sklearn.log_model(lr, artifact_path, signature=signature)
Notice how we associate a signature with our model. A model signature in MLflow defines the schema of a model’s inputs and outputs. It is not mandatory, but it is good practice to log models with their signature: when using an online inference endpoint, Azure Machine Learning enforces compliance with it, both in the number of inputs and their types.
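To see what this enforcement means in practice, here is a stdlib-only sketch (the schema dict and helper below are illustrative, not the MLflow or Azure ML API): a request payload whose columns or types do not match the logged signature is rejected, which is roughly what the endpoint does for you.

```python
# Illustrative only: mimics how a signature constrains inference inputs.
schema = {"alcohol": float, "volatile_acidity": float}  # hypothetical wine features

def matches_signature(payload: dict, schema: dict) -> bool:
    """Return True if the payload has exactly the expected columns and types."""
    if set(payload) != set(schema):
        return False
    return all(isinstance(payload[col], schema[col]) for col in schema)

print(matches_signature({"alcohol": 9.4, "volatile_acidity": 0.7}, schema))  # matching payload
print(matches_signature({"alcohol": 9.4}, schema))                           # missing column
```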
Once you are satisfied with your model experimentation, you can register your best model version by using register_model.
registered_model = mlflow.register_model(f"runs:/{run_id}/{artifact_path}", registered_model_name)
Your model is now in Azure ML Model Registry, ready to be deployed!
Deploy your model on Azure ML or AKS Endpoints
Deploying your model for online serving using Azure ML is integrated with two types of endpoints:
- Azure ML Managed Endpoints
- Azure Kubernetes Service (AKS) Endpoints
Azure ML Managed Endpoints are an off-the-shelf deployment solution where you don’t have access to the underlying infrastructure. They require minimal configuration and maintenance, but are less customizable.
Kubernetes Endpoints (AKS) are more scalable and allow infrastructure customization, but require setup and management overhead. For more on the trade-offs, check out the documentation.
Let’s move on to how to deploy both types of endpoints using MLflow.
1. Deploy with Azure ML Managed Endpoint
In order to create an Azure ML Managed Endpoint, we need as input a configuration file in JSON format with a few parameters:
- auth_mode: Determines the authentication mode for the endpoint. Can be “key”, “anonymous”, or “aad”.
- identity/type: Specifies the type of identity assigned to the endpoint. Can be “none”, “system_assigned”, or “user_assigned”.
import json
# Write the endpoint configuration file
endpoint_config_path = "endpoint_config.json"
endpoint_config = {
"auth_mode": "key",
"identity": {
"type": "system_assigned"
}
}
with open(endpoint_config_path, "w") as outfile:
    outfile.write(json.dumps(endpoint_config))
Since we want to deploy on Azure ML, we retrieve the deployment_client associated with our workspace using the Azure ML tracking URI. Then, we create the endpoint using the configuration file created above and give it a name.
from mlflow.deployments import get_deploy_client
endpoint_name = "wine-endpoint-test"
# Create the deployment client linked to Azure ML workspace
deployment_client = get_deploy_client(aml_mlflow_tracking_uri)
# Create an AML managed endpoint
endpoint = deployment_client.create_endpoint(
    name=endpoint_name,
    config={"endpoint-config-file": endpoint_config_path}
)
When an endpoint is created, it is initially empty, waiting for deployments to be made on it. An endpoint can host multiple deployments.
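Because one endpoint can front several deployments, traffic between them is expressed as a percentage split. A minimal sketch of what a blue/green configuration could look like (the deployment names “blue” and “green” are hypothetical):

```python
import json

# Hypothetical blue/green split; percentages across deployments should total 100.
traffic_config = {"traffic": {"blue": 90, "green": 10}}
assert sum(traffic_config["traffic"].values()) == 100

print(json.dumps(traffic_config))
```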
Let’s create a first deployment. First, we specify the compute resources we want to allocate to that deployment in a configuration file.
# Write the deployment configuration file
deployment_name = "default"
deploy_config = {
"instance_type": "Standard_DS2_v2",
"instance_count": 1,
}
deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))
Then, we can use the deployment_client to create a deployment on the endpoint we created. The inputs are the name and version of the model we logged in the Model Registry, and the configuration file we just created.
# Retrieve the model name and version to deploy
model_name = registered_model_name
version = registered_model.version
# Deploy the model to the endpoint
deployment = deployment_client.create_deployment(
    name=deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)
The deployment will take a few minutes to roll out. Before we can call our model through endpoint requests, we need to update the percentage of traffic the endpoint routes to our deployment.
# Configure the traffic percentage split between deployments on same endpoint
traffic_config = {"traffic": {deployment_name: 100}}
traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))
# Updates the traffic of the endpoint according to traffic configuration file
deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
# Fetch the endpoint scoring uri
scoring_uri = deployment_client.get_endpoint(endpoint=endpoint_name)["properties"]["scoringUri"]
print(scoring_uri)
After that, you are done and can test out your served model! You can either use the Azure ML UI or make requests from Postman / Databricks. There is a request example in the GitHub repository.
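As a sketch of what such a request could look like (the payload shape below assumes the dataframe-split style input MLflow models commonly accept on Azure ML; the URI, key, and column names are placeholders, not values from this deployment):

```python
import json
import urllib.request

scoring_uri = "https://my-endpoint.westus2.inference.ml.azure.com/score"  # placeholder
api_key = "my-endpoint-key"  # placeholder

# Dataframe-split style payload; columns must match the model signature.
payload = {
    "input_data": {
        "columns": ["alcohol", "volatile_acidity"],
        "data": [[9.4, 0.7]],
    }
}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    scoring_uri,
    data=body,
    headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
)
# To actually score, send it: response = urllib.request.urlopen(req)
print(req.get_method())
```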
2. Deploy with Azure Kubernetes Compute Endpoint
When you create an AKS Compute in Azure ML, it is provisioned as a managed service designed for running machine learning workloads. The endpoint is attached to an Azure Kubernetes Service’s node pool.
At the moment, we can only deploy models to AKSCompute (v1) as the KubernetesCompute (v2) does not support direct deployment for MLflow models from Databricks.
First, you will need to install the library azureml-core. Then, we can start by connecting to our Azure ML workspace.
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.core import Workspace
from azureml.core.authentication import MsiAuthentication
# TODO: Enter details of your Azure Machine Learning workspace
subscription_id = "<Subscription ID of your resource group>"
resource_group = "<Resource group having your resources>"
workspace_name = "<Your azure workspace name>"
# Retrieves your Azure ML resources with already set up Managed Identity
ws = Workspace(subscription_id=subscription_id,
               resource_group=resource_group,
               workspace_name=workspace_name,
               auth=MsiAuthentication())
print("Found workspace {} at location {}".format(ws.name, ws.location))
Then, we create our AKSCompute endpoint, which takes about 5 minutes to roll out. The AKSCompute endpoint name has some restrictions: it must start with a letter, end with a letter or digit, and be between 2 and 16 characters long. It can include letters, digits, and dashes.
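These naming rules are easy to check up front. Here is a quick stdlib sketch (the regex is my own rendering of the constraints stated above, not an official Azure validator):

```python
import re

# Encodes the stated rules: 2-16 chars, starts with a letter,
# ends with a letter or digit, letters/digits/dashes in between.
AKS_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]{0,14}[A-Za-z0-9]$")

def is_valid_aks_endpoint_name(name: str) -> bool:
    return bool(AKS_NAME_RE.fullmatch(name))

print(is_valid_aks_endpoint_name("wine-endpoint-1"))  # valid
print(is_valid_aks_endpoint_name("1wine"))            # starts with a digit
print(is_valid_aks_endpoint_name("a"))                # too short
```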
# Use the default configuration
prov_config = AksCompute.provisioning_configuration(vm_size="Standard_PB6s",
                                                    location="westus2",
                                                    agent_count=2)
# Create the cluster
aks_endpoint_name = "wine-endpoint-1"
aks_target = ComputeTarget.create(workspace=ws,
                                  name=aks_endpoint_name,
                                  provisioning_configuration=prov_config)
aks_target.wait_for_completion(show_output=True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)
You can also attach an existing AKS resource instead of creating a new one. You might need to grant Databricks the proper permissions to attach the AKS resource to Azure ML; you can follow the same steps as above to grant them.
# For using an existing AKS resource
# TODO: Fill your AKS resource information
existing_aks_name = "<existing aks resource name>"
subscription_id = "<azure subscription id>"
resource_group = "<resource group of aks>"
aks_resource_id = f"/subscriptions/{subscription_id}/resourcegroups/{resource_group}/providers/Microsoft.ContainerService/managedClusters/{existing_aks_name}"
# Create configuration and attach the AKS pool to an AKSCompute endpoint
aks_endpoint_name = "wine-endpoint-1"
existing_attach_config = AksCompute.attach_configuration(resource_id=aks_resource_id)
aks_target = ComputeTarget.attach(ws, aks_endpoint_name, existing_attach_config)
aks_target.wait_for_completion(show_output=True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)
Finally, we can create our model deployment on the endpoint in the same way as we did for Managed Endpoints. First, we create the JSON configuration file for the deployment. Notice that this time we already selected the compute resources at the endpoint level, not the deployment level.
import json
deployment_name = "default"
deployment_config = {"computeType": "aks", "computeTargetName": aks_endpoint_name}
deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deployment_config))
We can then deploy our model from the Model Registry using the deployment_client.
from mlflow.deployments import get_deploy_client
# Create the deployment client linked to Azure ML workspace
deployment_client = get_deploy_client(aml_mlflow_tracking_uri)
# Retrieve the model name and version to deploy
model_name = registered_model_name
version = registered_model.version
# The model URI points to the registered model; the deployment name becomes the service name
deployment_client.create_deployment(model_uri=f"models:/{model_name}/{version}",
                                    config={'deploy-config-file': deployment_config_path},
                                    name=deployment_name)
# Fetch the scoring uri of the deployed service (named after the deployment)
scoring_uri = deployment_client.get_endpoint(endpoint=deployment_name)["properties"]["scoringUri"]
print(scoring_uri)
The deployment will take about 5 minutes to roll out. After that, you are done and can test out your served model on AKS! This will also create an AKS cluster in your resource group, which you can configure and govern as needed.
That’s it for deployment with Databricks and Azure Machine Learning! I hope this was helpful. If you have any remarks or questions, I’ll be happy to help :-)
Additional Links & References:
Deployment notebooks can be found in my GitHub repository
Official documentation on Managed Endpoint vs Kubernetes Endpoint
Official documentation on AKSCompute (v1) vs KubernetesCompute (v2)