Tackling Data Challenges for a Real-Time MLOps Platform

Nov 25, 2021 | by A. Spina, Helicon


Event stream processing systems are built for high throughput and low latency; this is why we put Kubernetes, Seldon, and Apache Kafka together to develop a brand-new MLOps platform, Helicon, which is able to serve thousands of ML models side by side, regardless of the underlying technology.

I’m quite sure some of you have struggled to understand whether your ML models in production were becoming obsolete. Radicalbit Helicon tracks down when it is the right time to retrain your model and identifies the proper subset of fresh data to use. All of this is powered by event streaming.


Finally, I’ll tell you about Radicalbit’s ultimate goal: introducing the first production-ready serving layer for streaming machine learning models, special algorithms that are able to learn while they produce inference. By adapting over time to unpredictable changes in input data behaviour (because domains change continuously), these models can sustain high prediction performance at a remarkably fast pace.

How can containers ship any model?

Radicalbit’s solutions are Kubernetes-based products; for this reason, we need to ship our clients’ ML models as containers. In IT, the easiest way to put intelligent, scalable models in production is to leverage containers. All we need is the list of the minimum technology requirements the model needs in order to run; we use this information to wrap the trained models into consistent containers. And that’s it: the model is able to (1) run in production, (2) scale horizontally, and (3) produce inference. The good news is that the awesome MLflow project already provides these capabilities (and many others), so we just need to use them the right way.
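As a minimal sketch of that packaging step (not Helicon’s actual code; the toy model and dependency list are placeholders), a trained scikit-learn model can be logged with MLflow together with its runtime requirements, and a container can later be built from it with the MLflow CLI:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a toy model (a stand-in for a client's real model).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Log the model together with its minimum runtime requirements.
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        pip_requirements=["scikit-learn", "numpy"],  # the model's declared dependencies
    )

# From the CLI, the logged model can then be wrapped into a container, e.g.:
#   mlflow models build-docker -m "runs:/<run_id>/model" -n iris-classifier:latest
print("model logged under run", run.info.run_id)
```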

However, in a containerized world, this is not enough anymore. Most of the time, a production system needs sophisticated rollout strategies for ML models (like canary deployments), e.g. deploying different instances based on users’ subscription tiers. Luckily, over the last months we have seen the rise of a new generation of DevOps-driven MLOps tools, which enable an unprecedented set of features for AI-driven production systems.


We wanted the best technology available, so we picked Seldon Core. This great open-source project can serve any kind of ML asset (even custom ones); moreover, it offers out-of-the-box support for modern cloud-native technologies like service meshes and even accepts Kafka as a communication bus. We integrated Seldon into our Kubernetes orchestration; this backs most of our MLOps features, like dynamic serving, continuous training, and streaming machine learning. But how do we make streaming workloads and ML models interact? Radicalbit Helicon offers a codeless pipeline editor, where engineers and scientists can work together to seamlessly manage and process data in motion.
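To give a feel for what serving through Seldon Core looks like (a sketch only, not Helicon’s internal wiring; the gateway host, namespace, and deployment name are hypothetical), a model exposed via Seldon’s REST protocol can be queried like this:

```python
import requests

# Hypothetical Seldon Core endpoint for a deployed model
# (namespace "models", deployment "iris-classifier").
URL = "http://seldon-gateway.example.com/seldon/models/iris-classifier/api/v1.0/predictions"

# Seldon's default JSON payload: a batch of feature rows as an ndarray.
payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}

response = requests.post(URL, json=payload, timeout=5)
response.raise_for_status()

# The prediction comes back in the same "data" envelope.
print(response.json()["data"])
```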

Once the pipeline is built and tested (we provide a novel codeless debugging feature), users can deploy the topology on top of their favourite streaming engine. We provide out-of-the-box support for Apache Flink, Spark Streaming, and the Kafka Streams library. Don’t worry: in the end, you will get a new horizontally scalable deployment on Kubernetes. We then end up with two main deployment families:

1) Your streaming workload topology

2) A set of Seldon-orchestrated, trained ML models

While the deployment runs, the topology asks the models for predictions, and the models respond with high throughput and low latency. The communication channel is, of course, Kafka. With this solution, we exchange inference asynchronously through Kafka messages conveyed by Kafka topics, so we inherit Kafka’s fault tolerance and scalability; the deployment can scale out as far as the Kafka infrastructure allows.
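As a rough sketch of this asynchronous pattern (the broker address, topic names, message format, and the dummy predict function are all assumptions, not Helicon’s actual contract), a model-side worker could consume feature messages and publish predictions to another topic:

```python
import json
from confluent_kafka import Consumer, Producer

# Hypothetical broker address and topic names.
consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "model-workers",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["features"])
producer = Producer({"bootstrap.servers": "kafka:9092"})

def predict(features):
    # Placeholder for the real model call (e.g. a Seldon-served model).
    return sum(features)

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    record = json.loads(msg.value())
    prediction = predict(record["features"])
    # Publish the inference result, keyed by the original request id.
    producer.produce(
        "predictions",
        key=record["request_id"],
        value=json.dumps({"request_id": record["request_id"], "prediction": prediction}),
    )
    producer.poll(0)  # serve delivery callbacks
```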

Let’s talk about Continuous Training

For Radicalbit, “Continuous Training” means providing customers with a set of DevOps tools that help them establish when it is the right time to retrain a model in production. We follow one key metric only: drift. For us, drift is not only an upstream change in the data structure, but also a change in data behaviour and the evolution of its domain.


We identify two kinds of drift in data. The first one is so-called “data drift”; at this stage, we monitor the behaviour of the input data. As you can see in the figure below, we track the behaviour of the monitored feature x‘s values and do not consider the inference values at all. However, when the input data distribution changes in unprecedented ways, your model might not be good anymore, and you should perform a new training session. ADWIN (Adaptive Windowing) is a well-known method to spot these drifts by monitoring data streams; in the figure below, the vertical red lines represent ADWIN activations.
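A minimal sketch of ADWIN-style data drift detection, here using the open-source river library rather than Helicon’s implementation (the exact API may differ slightly between river versions):

```python
import random
from river import drift

adwin = drift.ADWIN()

# Simulate a feature whose distribution shifts halfway through the stream.
stream = [random.gauss(0.0, 1.0) for _ in range(1000)] + \
         [random.gauss(3.0, 1.0) for _ in range(1000)]

for i, x in enumerate(stream):
    adwin.update(x)
    if adwin.drift_detected:
        print(f"Data drift detected at index {i}")
```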

However, this approach might not work properly in some cases, for instance with stock market data, where the trend is as meaningful as the values the feature assumes. When “how the values change over time” carries information as well, data drift monitoring is no longer a perfect fit, and we need a monitoring strategy in which the inference values also matter. Thus, we built a detection model for the second kind of drift we identified, the so-called “concept drift”.

By monitoring the prediction trend, we can understand how well we are predicting. To do so, however, we need feedback, i.e. a piece of information stating the true class of an already emitted prediction. With this hint, the platform can compute the error rate, and methods like the Drift Detection Method (DDM) track down sudden changes in its value (shown as vertical red lines in the figure below).
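Again as a hedged sketch based on the river library rather than Helicon’s code (the class location and names vary between river versions), concept drift can be tracked by feeding the detector a stream of prediction errors:

```python
import random
from river import drift

ddm = drift.binary.DDM()

# Simulate feedback: True means the model's prediction was wrong, False means correct.
# The error rate jumps from ~10% to ~40% halfway through, mimicking concept drift.
errors = [random.random() < 0.1 for _ in range(1000)] + \
         [random.random() < 0.4 for _ in range(1000)]

for i, err in enumerate(errors):
    ddm.update(err)
    if ddm.warning_detected:
        print(f"Warning zone at index {i}")
    if ddm.drift_detected:
        print(f"Concept drift detected at index {i}")
```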

If a drift is detected, the platform alerts clients that it is time to retrain their ML model with fresh data. Radicalbit Helicon offers a tiered set of notifications and triggers that can activate actions (e.g. executing remote continuous integration pipelines aimed at retraining a model).
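For illustration only (the webhook URL and payload are hypothetical, not Helicon’s API), such a trigger could be as simple as calling a CI system’s webhook whenever a drift alert fires:

```python
import requests

def on_drift_alert(model_name: str, drift_kind: str) -> None:
    """Kick off a hypothetical CI retraining pipeline when drift is detected."""
    requests.post(
        "https://ci.example.com/api/pipelines/retrain/trigger",  # hypothetical endpoint
        json={"model": model_name, "reason": drift_kind},
        timeout=10,
    ).raise_for_status()

on_drift_alert("iris-classifier", "concept_drift")
```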

Let’s talk about Online Learning
How do we train machine learning models with streaming data?

As I anticipated in the preface of this article, Radicalbit aims to build the first production-ready platform for streaming machine learning models. This is a completely different approach for many reasons, but the most obvious one is that the input data our models crunch changes while they learn (so goodbye, traditional datasets!). We do not have historical, finished data, but an unfinished, continuous flow of information that changes over time. Therefore, the chances of making valuable a-priori assumptions on the input data are very limited, and the objective is to make our models dynamic in behaviour so they can adapt to input data changes over time.

On the other hand, as far as supervised learning is concerned, we are driven by the same requirement as drift detection: feedback. Here the paradigm changes as well; in online machine learning the prediction is computed first, then delivered downstream, and finally, in reaction to incoming feedback, the model updates its learning function based on the measured loss.
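A minimal sketch of this predict-then-learn loop, again using the open-source river library for illustration (not Helicon’s serving layer):

```python
from river import datasets, linear_model, metrics, preprocessing

# An online model: incremental feature scaling followed by logistic regression.
model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.Accuracy()

# Each event arrives as (features, label); the label plays the role of delayed feedback.
for x, y in datasets.Phishing():
    y_pred = model.predict_one(x)   # 1) predict first and ship the result downstream
    metric.update(y, y_pred)        # 2) once feedback (y) arrives, measure the error...
    model.learn_one(x, y)           # 3) ...and update the model incrementally

print(metric)  # prequential (test-then-train) accuracy
```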

The main difference from traditional batch learning tasks is that online models predict and train at the same time. In our latest R&D project, we built an online neural network algorithm based on a hedge algorithm. The main feature of hedging lies in its expert-advice approach: each network layer is able to provide an output prediction; shallower layers are more valuable in the earlier stages of the algorithm’s lifetime, and the more events the model crunches, the more credibility is assigned to the output predictions of deeper layers.
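To make the expert-advice idea concrete, here is a simplified, framework-free sketch of the hedge weighting scheme (loosely in the spirit of hedge-based online deep learning; the per-layer predictors are stubbed out and the discount factor beta is an assumed hyper-parameter, not a value from our implementation):

```python
import numpy as np

BETA = 0.9          # discount applied to an expert (layer) that incurs loss
N_LAYERS = 4        # number of layers acting as experts

# Start with uniform trust in every layer's output.
weights = np.full(N_LAYERS, 1.0 / N_LAYERS)

def hedge_predict(layer_outputs: np.ndarray) -> float:
    """Combine per-layer predictions as a weighted vote."""
    return float(np.dot(weights, layer_outputs))

def hedge_update(layer_losses: np.ndarray) -> None:
    """Shrink the weight of layers with high loss, then renormalise."""
    global weights
    weights = weights * np.power(BETA, layer_losses)
    weights = weights / weights.sum()

# Example step: early on, shallow layers tend to incur lower loss than deep ones,
# so their weights grow; as more events arrive, the balance shifts.
layer_outputs = np.array([0.8, 0.6, 0.4, 0.3])
layer_losses = np.array([0.1, 0.2, 0.6, 0.7])
print("ensemble prediction:", hedge_predict(layer_outputs))
hedge_update(layer_losses)
print("updated layer weights:", weights)
```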

Radicalbit also proposes a first distributed implementation of this approach, as shown in the picture below.

The main advantage of using streaming machine learning algorithms in our platform is that you can dramatically reduce the effort of periodically retraining models, because they adjust themselves over time. Additionally, by building the right serving layer for streaming models, we aim to show that in many contexts streaming algorithms can outperform traditional batch learning applications.

Stay tuned and follow us to learn more!