Tutorial: Learning MLOps with Vector - Part 1

MLOps is what enables enterprises to deliver Machine Learning in production

Figure 1: The complete lifecycle of ML Operations (MLOps)

In a previous tutorial, we learnt how to train the Anki Vector robot to recognize another Vector robot. Specifically, we learnt how to train a YOLOv5 model to detect the Vector robot in a picture taken with Vector’s camera. We leveraged the publicly available Vector dataset to train the model.

While that was an interesting model training exercise, the process was not mature enough for a production deployment. If Digital Dream Labs (DDL), the current owner of the Vector robot, were to use this model to deliver a production-quality feature that enables all Vector robots to recognize other Vector robots, they would have to do a lot more than merely train an ML model. They would likely have to build an entire automated end-to-end pipeline consisting of: (i) collecting new data, (ii) labeling it, (iii) training new models, (iv) deciding whether the newly trained models should be deployed, (v) rolling back models if required, and (vi) continuously monitoring the performance of models deployed in production. This fascinating and emerging area, covering how to deliver and manage ML in production, is known as MLOps.
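To make steps (iv) and (v) concrete, the promote-or-keep decision at the heart of such a pipeline could be sketched as below. This is a minimal illustration, not DDL's actual pipeline; `evaluate` is a hypothetical callback that scores a model on a held-out validation set.

```python
def maybe_promote(current_model, candidate_model, evaluate, margin=0.01):
    """Promote the candidate only if it beats the current model by `margin`
    on a held-out evaluation set (step iv); otherwise keep the old model,
    which is effectively an automatic rollback decision (step v).

    `evaluate` is a hypothetical scoring function, e.g. mAP on a fixed
    validation split of the Vector dataset.
    """
    if evaluate(candidate_model) > evaluate(current_model) + margin:
        return candidate_model, True   # roll the new model out to production
    return current_model, False        # keep the previously deployed model
```

In practice, the `margin` guards against promoting a model whose apparent improvement is within evaluation noise.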

First, let us examine why a one-time trained model may not subsequently work in production in a bit more detail. Here are a few reasons.

(1) Data diversity: Usually, the dataset on which the first ML model is trained is not rich and diverse enough. For example, maybe the pictures in the dataset were taken in summer when the room was brightly lit. A model trained on such a dataset may not do well in winter when the lighting is different.

(2) Data drift: The operating environment may change over time, and the current model may not have the information to deal with a changing operating environment. For example, if one changed the wall decor in the room, the previously trained model might get confused by the new decor.

(3) Model enhancements: Particularly in the computer vision space, we often see rapid advancements, with newer model architectures sometimes arriving every quarter. One would like an automated pipeline with which models can be quickly retrained as new architectures emerge.

(4) Feedback from users: Once your system is running in production, it is very likely that your users will report cases where the model did not perform well. Over time, one might develop an intuitive idea of the model's deficiencies. In such cases, one might want to either improve the model (for example, via regularization) or improve the dataset (by capturing new images of the cases where the model failed).
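A cheap way to catch drift like the examples above is to track a summary statistic of incoming images (say, mean brightness per image) and compare its recent distribution against that of the training set. The sketch below computes the two-sample Kolmogorov-Smirnov statistic using only the standard library; the brightness feature is just an illustrative choice, not part of the original project.

```python
import bisect

def ks_statistic(training_sample, production_sample):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples. Values near 0 mean the
    distributions look similar; values near 1 suggest drift worth
    investigating (e.g. retriggering data collection and retraining)."""
    a, b = sorted(training_sample), sorted(production_sample)
    max_gap = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, v) / len(a)  # fraction of a <= v
        cdf_b = bisect.bisect_right(b, v) / len(b)  # fraction of b <= v
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap
```

One would alert when the statistic for a recent window of production data exceeds a threshold tuned on historical data.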

MLOps covers the full cycle of what it takes to get ML running successfully in production. An MLOps pipeline usually consists of the following parts:

(1) Data pipeline: Data needs to be pulled in from the target environment and stored in a centralized dataset. Having a data pipeline is important because it provides an authoritative source of data.

(2) Annotation: New data needs to be annotated to generate a labeled dataset.

(3) Training and re-training models: Models need to be retrained at a regular cadence so that they can absorb newly labeled datasets. After training completes, we need an unbiased way of comparing the performance of the newly trained model with previous models.

(4) Deploying to production: If a new model is found to be of better quality than the previous one, we would like to roll it out to production automatically. There is always a chance that the new model does not perform to our expectations in production; in such cases, we want to roll it back and reinstate the previous well-performing model.
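Rollback becomes straightforward if every deployed model version is recorded. A toy registry sketch is shown below; real pipelines would typically use a dedicated tool such as MLflow or a cloud model registry, so this class is purely illustrative.

```python
class ModelRegistry:
    """Toy in-memory registry tracking deployed model versions, newest last.
    Illustrative only; not a real production registry."""

    def __init__(self):
        self._versions = []

    def deploy(self, model):
        """Record a new model as the current production model."""
        self._versions.append(model)

    @property
    def current(self):
        """The model currently serving production traffic, if any."""
        return self._versions[-1] if self._versions else None

    def rollback(self):
        """Reinstate the previous model, e.g. after a regression is found."""
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current
```

Keeping the version history in durable storage (rather than in memory) is what makes rollbacks safe across restarts.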

As you can imagine, MLOps is a very rich area with a lot of ongoing research and emerging tools. Hence, we are devoting an entire series to MLOps. In the following articles, we will discuss each of the above steps in much greater detail, with coding examples from our favorite project of training a Vector to recognize another Vector. Please subscribe to our newsletter to stay updated on our future posts.