Production Accuracy Evaluation
The MLOps lifecycle produces multiple jobs that follow one another, linked together in a job sequence. This stems from the desire to build on capabilities and insights that already exist (configuration options, certain models in the model zoo, and so on), but it also creates an interesting opportunity for evaluation. During experimentation, a user signs off on a configuration and a model zoo expecting a certain performance, typically validated by the out-of-sample forecasts. As the sequence unfolds, circumstances, context and data may change, and that may influence performance. It is therefore important to track performance during the production stage, so that the user can steer and adjust where needed, for example by updating configuration settings, rebuilding or retraining.
Production accuracy evaluation empowers users to do just that: the production forecasts of the jobs in the sequence are evaluated after the fact, by comparing their forecasted values with the actuals that have since been added to the dataset.
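The comparison described above can be illustrated with a minimal sketch. It is not the product's implementation: the function name `production_mae`, the job identifiers, the timestamps and the choice of mean absolute error as the accuracy metric are all assumptions made for illustration.

```python
from statistics import mean


def production_mae(forecasts, actuals):
    """Return the mean absolute error per job (hypothetical helper).

    forecasts: {job_id: {timestamp: forecasted value}}
    actuals:   {timestamp: actual value added to the dataset since}
    Only timestamps for which an actual is available are evaluated;
    a job with no evaluable timestamps gets None.
    """
    result = {}
    for job_id, predicted in forecasts.items():
        errors = [
            abs(value - actuals[timestamp])
            for timestamp, value in predicted.items()
            if timestamp in actuals
        ]
        result[job_id] = mean(errors) if errors else None
    return result


# Illustrative data only: two jobs in a sequence and the actuals known so far.
forecasts = {
    "job-1": {"2024-01-01": 10.0, "2024-01-02": 12.0},
    "job-2": {"2024-01-03": 11.0, "2024-01-04": 15.0},
}
actuals = {"2024-01-01": 11.0, "2024-01-02": 12.0, "2024-01-03": 14.0}

errors_per_job = production_mae(forecasts, actuals)
```

In this sketch the forecast for `2024-01-04` is skipped because no actual has arrived yet; it would be picked up in a later evaluation once the dataset has been updated.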
When accuracy starts to deteriorate, the model zoo can be altered, by rebuilding or retraining with the same or with updated configuration settings, to get performance back on track. At the same time, root cause analysis can be used to drill down to the cause of the deteriorating performance, giving the user additional insights in the process.
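One way to decide when accuracy has deteriorated enough to act is to compare recent production errors against the error that was accepted during experimentation. The sketch below assumes a simple rule: flag the sequence when the average error of the last few jobs exceeds the out-of-sample error by a chosen factor. The function name `needs_attention`, the `tolerance` and `window` parameters and their default values are hypothetical and would need tuning for a real use case.

```python
def needs_attention(production_errors, baseline_error, tolerance=1.5, window=3):
    """Flag deteriorating accuracy (hypothetical rule, for illustration).

    production_errors: per-job errors in sequence order (None = not yet evaluable)
    baseline_error:    out-of-sample error accepted during experimentation
    tolerance:         allowed ratio of production error to baseline error
    window:            number of most recent evaluated jobs to average over
    """
    recent = [error for error in production_errors if error is not None][-window:]
    if len(recent) < window:
        # Too few evaluated jobs to judge; do not raise a flag yet.
        return False
    return sum(recent) / len(recent) > tolerance * baseline_error


# Illustrative values only.
stable = needs_attention([1.0, 1.1, 0.9, 1.0], baseline_error=1.0)
drifting = needs_attention([1.0, 1.8, 2.0, 2.2], baseline_error=1.0)
```

Averaging over a window rather than reacting to a single job keeps one unusual forecast from triggering a rebuild or retrain; a flagged sequence is then a natural starting point for root cause analysis.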