In general, it is important to clearly understand your business requirements and the problem you are trying to solve when determining the best approach to automate the retraining of an active machine learning model. It is also important to continuously monitor the performance of the model and make adjustments to the retraining cadence and metrics as needed.
This whole process can be deployed in two environments: the cloud and the edge.
Automating the retraining of a machine learning model can be a complex task, but there are some best practices that can help guide the design.
The metrics used to trigger retraining will depend on the model and its usage. Each metric needs a threshold, so that retraining is triggered when model performance falls below it.
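As a minimal sketch of threshold-based triggering, the check could look like the following. The metric names, threshold values, and the `MetricThreshold` and `should_retrain` helpers are hypothetical and chosen only for illustration; a real setup would read the observed values from its own monitoring system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricThreshold:
    """A monitored metric and the minimum value it is allowed to reach."""
    name: str
    minimum: float  # retraining is triggered when the metric falls below this


def metrics_below_threshold(
    observed: dict[str, float], thresholds: list[MetricThreshold]
) -> list[str]:
    """Return the names of the metrics whose observed value is below threshold."""
    breached = []
    for threshold in thresholds:
        value = observed.get(threshold.name)
        # Metrics that were not reported are skipped rather than treated as failures.
        if value is not None and value < threshold.minimum:
            breached.append(threshold.name)
    return breached


def should_retrain(
    observed: dict[str, float], thresholds: list[MetricThreshold]
) -> bool:
    """Retrain as soon as at least one metric has fallen below its threshold."""
    return bool(metrics_below_threshold(observed, thresholds))


# Hypothetical thresholds and a hypothetical monitoring snapshot.
thresholds = [MetricThreshold("accuracy", 0.90), MetricThreshold("recall", 0.80)]
snapshot = {"accuracy": 0.87, "recall": 0.85}
print(should_retrain(snapshot, thresholds))  # True: accuracy is below 0.90
```

Returning the list of breached metrics, rather than only a boolean, makes it easier to log why a retraining job was started.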
Some ideal metrics to trigger model retraining are:
The new model will have to be tested or validated before being put into production to replace the old one. Several approaches are recommended for this purpose:
The promotion strategy for the new model will depend on its impact on the company. In some cases, it may be appropriate to automatically replace the old model with the new one. But in other cases, the new model may require A/B testing before replacing the old model.
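The promotion decision described above could be sketched as a small gate. This is an illustration under stated assumptions, not a prescribed policy: the `Promotion` enum, the `decide_promotion` function, the `high_impact` flag, and the `min_gain` parameter are all hypothetical, and the scores are assumed to come from evaluating both models on the same validation set with a metric where higher is better.

```python
from enum import Enum


class Promotion(Enum):
    REJECT = "reject"              # keep the old model
    AB_TEST = "ab_test"            # run the new model against the old one on live traffic
    AUTO_REPLACE = "auto_replace"  # swap the new model in directly


def decide_promotion(
    old_score: float,
    new_score: float,
    high_impact: bool,
    min_gain: float = 0.0,
) -> Promotion:
    """Choose a promotion path for a retrained model.

    old_score / new_score: validation scores of the production and candidate
    models (assumption: same data set, higher is better).
    high_impact: whether a bad model would materially affect the business.
    min_gain: minimum improvement required before the candidate is considered.
    """
    if new_score < old_score + min_gain:
        return Promotion.REJECT
    if high_impact:
        return Promotion.AB_TEST
    return Promotion.AUTO_REPLACE


print(decide_promotion(0.90, 0.93, high_impact=False))  # Promotion.AUTO_REPLACE
print(decide_promotion(0.90, 0.93, high_impact=True))   # Promotion.AB_TEST
```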
Some strategies to consider for live model testing are:
Once we identify that the model needs to be retrained, the next step is to choose the right data set to retrain with. Here are some recommendations to ensure that new training data will improve model performance.
Measuring cost impact varies by deployment environment (cloud vs. edge).
While it is difficult to calculate the direct ROI of some AI tasks, the value of optimized model retraining is simple, tangible, and possible to calculate directly. The computation and storage costs of model training jobs are often already recorded as part of cloud computing costs. Often, the business impact of a model can also be calculated.
When optimizing retraining, we consider both retraining costs and the impact of model performance on the business ("AI ROI"). We can weigh these costs against each other to justify the cost of retraining models.
Retraining Cost = (compute cost for retraining + cost of storing new model) × frequency
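The formula can be expressed directly in code. In this sketch the function name `retraining_cost` and all of the figures are hypothetical; the costs are per retraining run and the frequency is the number of runs in the period being measured.

```python
def retraining_cost(compute_cost: float, storage_cost: float, frequency: int) -> float:
    """Retraining cost = (compute cost + cost of storing new model) x frequency.

    compute_cost and storage_cost are per retraining run; frequency is the
    number of retraining runs in the period (for example, one year).
    """
    if frequency < 0:
        raise ValueError("frequency must be non-negative")
    return (compute_cost + storage_cost) * frequency


# Hypothetical figures: 120.0 compute and 5.0 storage per run.
weekly = retraining_cost(120.0, 5.0, frequency=52)    # fixed weekly schedule
triggered = retraining_cost(120.0, 5.0, frequency=8)  # only when metrics degrade
print(weekly, triggered)  # 6500.0 1000.0
```

Comparing the two results for the same period shows how much a performance-triggered schedule saves relative to a fixed interval, which can then be weighed against the business impact of model performance.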
Edge retraining can have advantages, such as data privacy and reduced latency, as data does not have to be transmitted over a network and can remain on the device. In addition, Edge retraining may be necessary to adapt the model to changes in the environment.
The cost of retraining machine learning models on the Edge depends on several factors, such as the size and complexity of the model, the quantity and quality of the available data, the processing capacity of the Edge Processing Unit (EPU), and the cost of power.
In general, the process of retraining machine learning models on the Edge can be more expensive than doing so in the cloud due to the resource limitations of the EPU and the need to transmit data over a network, which can be slow and costly. In addition, machine learning models often require large amounts of data to train, which can require a large amount of storage on the Edge.
However, there are also techniques and tools to reduce the cost of retraining on the Edge, such as using federated learning so that devices share only model updates rather than raw data, applying transfer learning to take advantage of pre-trained models, optimizing models for low-power devices, and carefully selecting training data to reduce the size of the required data set.
Transitioning from fixed-interval model retraining to automated model retraining triggered by model performance offers numerous benefits to organizations, from lower IT costs at a time when cloud costs are rising to improved ROI from artificial intelligence through improved model performance.
Barbara Industrial Edge Platform is a tool that can help organizations simplify and accelerate their Edge ML deployments by easily building, orchestrating, and maintaining container-based or native applications across thousands of distributed edge nodes.
Industry's most important data originates at the edge, across thousands of IoT devices, industrial plants, and machines. Discover how to turn that data into real-time insight and action with an efficient, zero-touch, and economical platform.