DevOps (a blend of "development" and "operations") has changed the world of traditional software development by making it possible for companies to ship software to production in minutes and keep this software running reliably.

There is a new kid on the block in the world of software though, which is threatening to change everything again, and that is MLOps. But what is it, and how can your software team help to achieve its goals? 

What is MLOps?

NVIDIA has defined MLOps as “a set of best practices for businesses to run AI successfully.”

We would expand on that to say that MLOps (or Machine Learning Operations) is a set of best practices that combines Data Engineering, DevOps and Machine Learning in order to reliably and efficiently deploy and maintain Machine Learning systems in production.

What are MLOps Best Practices?

Best Practice 1: Hybrid Teams

One of the biggest challenges Machine Learning teams face is getting their Machine Learning systems into production. In theory, a single MLOps Engineer could have all the skills necessary to productionize an ML model. In practice, at the moment, a hybrid team made up of a Data Scientist, Data Engineer, ML Engineer and DevOps Engineer is more likely to succeed.

The exact composition of the hybrid team can vary. Even so, it is important for business owners to remember that achieving their MLOps goals will take much more than just one Data Scientist. It takes a team of people working closely together to get a Machine Learning model running in production.

Best Practice 2: Machine Learning Pipelines

As we have discussed before, one of the core concepts of Data Engineering is the data pipeline, i.e. the series of transformations applied to data between its source and its final destination. These pipelines are sometimes called ETL (Extract, Transform and Load) pipelines.

All Machine Learning models require some data transformation, which is usually handled through scripts or cells in a notebook. However, this tends to make the transformations hard to manage and hard to run reliably. Switching over to more robust Machine Learning pipelines can help with management, scalability, code reuse and run-time visibility.
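As a rough illustration of the difference between notebook cells and a pipeline, the sketch below chains named transformation steps and records how many rows survive each one. The step names, the column names and the assumption that ages fall between 0 and 100 are all hypothetical; real pipelines are normally built with a dedicated orchestration tool rather than hand-written like this.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def drop_missing(rows: List[Row]) -> List[Row]:
    """Remove rows that contain any missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def scale_age(rows: List[Row]) -> List[Row]:
    """Add an age_scaled column, assuming ages fall between 0 and 100."""
    return [{**r, "age_scaled": r["age"] / 100} for r in rows]


def run_pipeline(
    rows: List[Row], steps: List[Tuple[str, Step]]
) -> Tuple[List[Row], List[Tuple[str, int]]]:
    """Apply each named step in order and log the row count after each one."""
    log = []
    for name, step in steps:
        rows = step(rows)
        log.append((name, len(rows)))  # simple run-time visibility
    return rows, log


# The pipeline is plain data, so it can be reused for training and prediction.
PIPELINE = [("drop_missing", drop_missing), ("scale_age", scale_age)]

raw = [
    {"age": 30, "income": 50000},
    {"age": None, "income": 42000},
    {"age": 45, "income": 61000},
]
clean, log = run_pipeline(raw, PIPELINE)
```

Because each step is a small named function, it can be tested on its own and reused across projects, which is much harder to do with code scattered across notebook cells.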

Best Practice 3: Model and Data Versioning

In Machine Learning, it is essential to track model versions, along with the data used to train each model and other meta-information such as training hyperparameters.

Git, the standard version control system, is often used to track models and metadata, but some companies have found that it is less practical with large amounts of data.

The ideal way to track model and data versions for Machine Learning is with a purpose-built tool that ties each model to the exact version of the code, data and hyperparameters used to produce it. Tools such as DVC and MLflow address parts of this problem, but a single tool that covers all of it does not yet seem to have become a standard.
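To make the idea of tying a model to its code, data and hyperparameters more concrete, here is a minimal sketch that builds a version record from those three inputs. The function names, the record layout and the example commit hash are hypothetical and are not taken from any particular tool.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def build_model_record(code_version: str, training_data: bytes, hyperparameters: dict) -> dict:
    """Tie a model to the code version, data and hyperparameters behind it."""
    record = {
        "code_version": code_version,
        "data_sha256": fingerprint(training_data),
        "hyperparameters": hyperparameters,
    }
    # Hash the whole record (with sorted keys) to get a stable model identifier.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["model_id"] = fingerprint(canonical)[:12]
    return record


record = build_model_record(
    code_version="abc1234",  # hypothetical Git commit hash
    training_data=b"age,income\n30,50000\n",
    hyperparameters={"learning_rate": 0.01, "epochs": 10},
)
print(json.dumps(record, indent=2))
```

If the code version, the data or any hyperparameter changes, the model identifier changes too, so two models with the same identifier were trained from the same inputs.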

Best Practice 4: Model Validation

Test automation is a standard DevOps practice, usually done through unit and integration tests that must pass before the code is deployed. Machine Learning models, however, are harder to test because their predictions will never be 100% correct.

Model Validation tests, therefore, need to be statistical rather than relying on a simple pass/fail check of individual outputs. It is important to decide which metrics you will track and what range of values is acceptable for each, with thresholds set empirically against previous benchmarks or models.
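A statistical validation gate of this kind might look like the sketch below. It uses accuracy as the tracked metric, and the minimum accuracy of 0.8 and the allowed regression of 0.02 against the previous model are made-up values that would need to be chosen empirically for a real system.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def passes_validation(candidate_acc, baseline_acc, min_acc=0.8, max_regression=0.02):
    """Accept a model only if it clears an absolute floor and does not
    fall more than max_regression below the previous model."""
    return candidate_acc >= min_acc and candidate_acc >= baseline_acc - max_regression


# Hypothetical held-out labels and candidate model predictions.
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
predictions = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

candidate_acc = accuracy(labels, predictions)
deployable = passes_validation(candidate_acc, baseline_acc=0.88)
```

A check like this can run in the same automated workflow as ordinary unit tests, blocking deployment when the candidate model falls outside the agreed thresholds.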

Best Practice 5: Data Validation

Both training and prediction data pipelines rely on good validation of input data, covering things such as column types, empty values, file format and size, and invalid values. Without it, you may end up with a misbehaving model and no idea why.

As well as the basic data validations outlined above, Machine Learning data pipelines should also validate higher-level statistical properties of the input, such as whether the average value of a column stays within its expected range.
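The sketch below shows what both kinds of check could look like on a small batch of rows. The schema, the column names and the accepted age range are hypothetical, and a production pipeline would more likely use a dedicated validation library than hand-written checks.

```python
from statistics import mean

# Hypothetical schema: expected Python type for each input column.
SCHEMA = {"age": int, "country": str}


def validate_rows(rows, schema=SCHEMA):
    """Return a list of human-readable problems found in the input rows."""
    errors = []
    for i, row in enumerate(rows):
        for col, expected in schema.items():
            value = row.get(col)
            if value is None or value == "":
                errors.append(f"row {i}: {col} is empty")
            elif not isinstance(value, expected):
                errors.append(f"row {i}: {col} should be {expected.__name__}")
        # Invalid-value check: assumes a plausible age lies between 0 and 120.
        age = row.get("age")
        if isinstance(age, int) and not 0 <= age <= 120:
            errors.append(f"row {i}: age {age} out of range")
    return errors


def mean_in_range(values, low, high):
    """Statistical check: is the batch mean within the expected range?"""
    return low <= mean(values) <= high


rows = [
    {"age": 34, "country": "UK"},
    {"age": None, "country": "FR"},
    {"age": 250, "country": "DE"},
    {"age": "41", "country": "US"},
]
errors = validate_rows(rows)
ages_look_normal = mean_in_range([34, 29, 41], low=18, high=65)
```

Collecting every problem into a list, rather than stopping at the first one, makes it easier to see why a batch was rejected.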

Best Practice 6: Monitoring Model Performance

As with any production system, monitoring Machine Learning systems is essential, because their performance depends not only on infrastructure and software (which you have some control over) but also on data (which you have less control over).

This brings challenges, as you probably won’t have a verified label to compare your model’s predictions to. There are ways around this, however, such as tracking how the distribution of the model’s predictions changes over time. It is also important to monitor metrics across slices of your data, so that you can detect problems that only affect specific segments.
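One label-free check that can be run per slice is to compare the share of positive predictions in live traffic against a baseline. The sketch below assumes binary predictions (0 or 1), a hypothetical "country" slice key and an arbitrary tolerance of 0.1. It illustrates the idea and is not a complete monitoring solution.

```python
from collections import defaultdict


def positive_rate_by_slice(records, slice_key):
    """Share of positive (1) predictions within each slice of the data."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for rec in records:
        totals[rec[slice_key]] += 1
        positives[rec[slice_key]] += rec["prediction"]
    return {s: positives[s] / totals[s] for s in totals}


def drifted_slices(baseline, current, tolerance=0.1):
    """Slices whose positive rate moved more than `tolerance` from baseline."""
    return sorted(
        s for s in current
        if s in baseline and abs(current[s] - baseline[s]) > tolerance
    )


# Hypothetical baseline rates, e.g. measured on the validation set.
baseline = {"UK": 0.5, "FR": 0.5}

# Hypothetical live predictions; no verified labels are needed.
live = [
    {"country": "UK", "prediction": 1},
    {"country": "UK", "prediction": 0},
    {"country": "UK", "prediction": 1},
    {"country": "UK", "prediction": 0},
    {"country": "FR", "prediction": 1},
    {"country": "FR", "prediction": 1},
    {"country": "FR", "prediction": 1},
    {"country": "FR", "prediction": 1},
]
current = positive_rate_by_slice(live, "country")
alerts = drifted_slices(baseline, current)
```

Here the overall positive rate has shifted, but only the FR slice is flagged, which is the kind of segment-specific problem that an aggregate metric can hide.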

MLOps is a new and exciting discipline that seems to be evolving quickly. To take advantage of the opportunities relating to MLOps, please get in touch with the friendly and experienced team at Agile Recruit.
