ML Deployment

Machine Learning Deployment

What is ML Deployment?

The 4 Deployment Paradigms

Deployment Requirements

Deployment Architectures

Other Issues

What is ML Deployment?

Data Science != Data Engineering

Data science is scientific

Business problems -> data problems
Model mathematically
Optimize performance

Data engineers are concerned with

Reliability
Scalability (load parameters)
Maintainability
SLA’s
...

Closed Loop Systems

DevOps vs ModelOps

DevOps = software development + IT operations

Manages deployments
CI/CD of features, patches, updates, rollbacks
Agile vs waterfall

ModelOps = data modeling + deployment operations

Java environments
Use of containers
Also C/C++ and legacy environments
Model performance monitoring

The 4 Deployment Paradigms

Batch

80-90% of deployments
Leverages databases and object storage
Fast retrieval of stored predictions

Continuous/Streaming

10-15% of deployments
Moderately fast scoring on new data

Real time

5-10% of deployments
Usually using REST (Azure ML, SageMaker, containers)

Mobile

Latency Requirements

Deployment Requirements

All the (DevOps) things!

And then more things!

Core Requirements

Model architecture

ML pipeline w/ featurization logic

Monitoring + Alerting

CI/CD pipeline for automation

Testing framework (unit + integration)

Version control

Core+ Requirements

Model registry

Data and model drift

Interpretability

Reproducibility: data, code, environment, debugging

Security

Environment management

Specialized Requirements

Data dictionary

Cost management

A/B testing

Performance optimization

Deployment Architectures

Standards for each deployment paradigm

Managed by an admin

Clear responsibilities on maintenance in production

Who gets paged at 2 in the morning?

Architecture I

Architecture II

Architecture III

Delta + MLflow

Other Issues

DL Optimization

Quantization: reduce precision of mathematical operations

Train normally (e.g. on 64 bit numbers)
Reduce to 32 or 16 bit for deployment
Generally see 3x improvement

Weight pruning: reduce size of architecture

Model topography: retrain using different architectures

e.g. compare MobileNet to VGG16

Featurization Logic

Apply the same logic to training and scoring data

Look into MLflow’s pyfunc

Confirm production data is available in training