Contact centers rely on a pool of resources ready to help customers when they reach out via call, email, chat, or another channel. For contact centers, predicting the volume of incoming requests at specific times is critical for resource scheduling (very short- and short-term horizons) and resource management (mid- to long-term horizons). It takes time before an action taken within the workforce management framework becomes effective (and is eventually reflected in financial reports); moving people around, hiring, upskilling, or downsizing the pool of resources takes weeks if not longer. Because of this, a forecast for longer horizons is needed, starting from one month ahead and beyond.
To build a high-quality forecast, it is necessary to gather relevant and valid data with predictive power. With such data available, it is possible to employ ML technology like TIM RTInstantML, which can build models for time-series data in a fraction of the time.
In our sample use case, we will showcase how TIM can predict the volume of requests for each week of the next quarter.
import logging
import pandas as pd
import plotly as plt
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json
import tim_client
import os
with open('credentials.json') as f:
    credentials_json = json.load(f)  # loading the credentials from credentials.json
TIM_URL = 'https://timws.tangent.works/v4/api' # URL to which the requests are sent
SAVE_JSON = False # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/' # folder where the requests and responses are stored
LOGGING_LEVEL = 'INFO'
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)
api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
The dataset contains weekly aggregated information about request volumes, temperature, public holidays, the number of regular customers, marketing campaigns, the number of customers whose contracts will expire within the next 30 or 60 days, the number of invoices sent, invoicing hours, and hours open.
The data is sampled weekly.
Structure of CSV file:
Column name | Description | Type | Availability |
---|---|---|---|
Date | Timestamp | Timestamp column | |
Sum of Volumes | Sum of all requests in given week | Target | t+0 |
Avg temperature | Mean temperature | Predictor | t+13 |
Hours of public holidays | Public holiday days in given week x 24 | Predictor | t+13 |
Hours open | Total hours center was/will be open to requests | Predictor | t+13 |
Hours of mkting campaign | How many hours the campaign ran/will run | Predictor | t+13
Avg contracts to expire in 30 days | Average no. of regular contracts that will expire within 30 days | Predictor | t+13 |
Avg contracts to expire in 60 days | Average no. of regular contracts that will expire within 60 days | Predictor | t+13 |
Avg no. of regular customers | Average no. of active contracts for regular customers | Predictor | t+13 |
No. of invoicing hours | Total hours during which invoices were/will be sent | Predictor | t+13
No. of invoices | No. of invoices sent | Predictor | t+13 |
We want to predict the total volume of requests for each week of the next quarter (13 weeks ahead). We assume that forecasted values of the predictors are available; this situation is reflected in the values present in the CSV file. To simulate the out-of-sample period thoroughly (i.e. to always use the latest model for each forecast), each forecasting situation has its own CSV file reflecting the data available at the respective forecasting time.
The CSV files used in the experiments can be downloaded here as a ZIP package.
This is a synthetic dataset generated by simulating the outcome of events relevant to the operations of a contact center.
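Before running the experiment, a quick sanity check of the data layout can help. The snippet below is a minimal sketch using plain pandas instead of the tim_client loader; it assumes, as described above, that the last 13 rows of a forecasting-situation file have an empty target while the forecasted predictor values are filled in, and it uses the same file that is loaded in the next cell.
# Sanity check of one forecasting-situation file with plain pandas
df_check = pd.read_csv('dataL/data2LB1.csv', sep=',')
out_of_sample = df_check.tail(13)   # the 13 weeks of the forecasting horizon
print( out_of_sample['Sum of Volumes'].isna().all() )     # expected: True - target not yet known
print( out_of_sample['Avg temperature'].notna().all() )   # expected: True - forecasted predictor values present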
# Sample from the first CSV file
data = tim_client.load_dataset_from_csv_file('dataL/data2LB1.csv', sep=',')
data
target_column = 'Sum of Volumes' # sum of requests per given week
timestamp_column = 'Date'
fig = go.Figure()
fig.add_trace( go.Scatter( x=data.iloc[:]['Date'], y=data.iloc[:][ target_column ] ) )
fig.update_layout( width=1300, height=700, title='Sum of Volumes' )
fig.show()
Parameters that need to be set are the prediction horizon and the back-test length. We also ask the engine for additional data to see the details of sub-models, so we define the extendedOutputConfiguration parameter as well.
back_test_length = 0
prediction_horizon = 13
configuration_backtest = {
    'usage': {
        'predictionTo': {
            'baseUnit': 'Sample',
            'offset': prediction_horizon
        },
        'backtestLength': back_test_length
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True
    }
}
Experiment for the first CSV file; in the next section we will simulate 40 production forecasts.
backtest = api_client.prediction_build_model_predict( data, configuration_backtest )
backtest.status
backtest.result_explanations
Simple and extended importances are available so you can see to what extent each predictor contributes to explaining the variance of the target variable.
simple_importances = backtest.predictors_importances['simpleImportances']
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True)
simple_importances = pd.DataFrame.from_dict( simple_importances )
# simple_importances
fig = go.Figure()
fig.add_trace( go.Bar( x = simple_importances['predictorName'],
                       y = simple_importances['importance'] ) )
fig.update_layout(
    title='Simple importances',
    width = 1200,
    height = 700
)
fig.show()
extended_importances = backtest.predictors_importances['extendedImportances']
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True)
extended_importances = pd.DataFrame.from_dict( extended_importances )
fig = go.Figure()
fig.add_trace( go.Bar( x = extended_importances[ extended_importances['time'] == '[11]' ]['termName'],
                       y = extended_importances[ extended_importances['time'] == '[11]' ]['importance'] ) )
fig.update_layout(
    title='Features generated from predictors used by the model for the 11th week of the prediction horizon',
    width = 1200,
    height = 700
)
fig.show()
# Helper function, merges actual and predicted values together
def create_eval_df( predictions, prediction_only = False ):
    if prediction_only:
        # load the full dataset (with actuals for the out-of-sample period) to evaluate the forecast
        data2 = tim_client.load_dataset_from_csv_file('data2L.csv', sep=',')
    else:
        data2 = data.copy()

    data2[ timestamp_column ] = pd.to_datetime( data2[ timestamp_column ] ).dt.tz_localize('UTC')
    data2.rename( columns={ timestamp_column: 'Timestamp' }, inplace=True )
    data2.set_index( 'Timestamp', inplace=True )

    eval_data = data2[ [ target_column ] ].join( predictions, how='inner' )

    return eval_data
edf = create_eval_df( backtest.aggregated_predictions[0]['values'] )
backtest.aggregated_predictions[0]['accuracyMetrics']['MAPE']
fig = go.Figure()
fig.add_trace( go.Scatter( x = edf.index, y=edf['Prediction'], name='In-Sample') )
fig.add_trace( go.Scatter( x = edf.index, y=edf[ target_column ], name='Actual') )
fig.update_layout( width=1200, height=700, title='Actual vs. predicted (in-sample)' )
fig.show()
edf = create_eval_df( backtest.prediction, True )
fig = go.Figure()
fig.add_trace( go.Scatter( x = edf.index, y=edf['Prediction'], name='Prediction') )
fig.add_trace( go.Scatter( x = edf.index, y=edf[ target_column ], name='Actual') )
fig.update_layout( width=1200, height=700, title='Actual vs. predicted' )
fig.show()
results = list()
mapes = list()
configuration_backtest
datadir = 'dataL'
for fname in os.listdir(datadir):
    fpath_ = os.path.join( datadir, fname )
    # print( fpath_ )
    data_ = tim_client.load_dataset_from_csv_file( fpath_, sep=',' )
    backtest_ = api_client.prediction_build_model_predict( data_, configuration_backtest )
    # print( backtest_.status )
    edf_ = create_eval_df( backtest_.prediction, True )
    edf_['err_pct'] = abs( edf_[ target_column ] - edf_[ 'Prediction' ] ) / edf_[ target_column ]
    results.append( edf_ )
    mapes.append( edf_['err_pct'].mean() )
MAPE statistics (including the mean) across all simulated forecasts:
pd.DataFrame(mapes).describe()
fig = go.Figure()
fig.add_trace( go.Bar( x = list(range(len(mapes))), y= mapes, name='MAPE') )
fig.update_layout( width=1200, height=700, title='MAPE per forecast' )
fig.show()
We demonstrated how TIM can be used to predict volumes for mid-term forecasting with weekly data.
Having relevant data with predictive power available at the time of forecasting is a prerequisite for any ML/AI solution; however, not every ML solution can build a new model in a fraction of the time, adapting to the most recent reality reflected in the data.
Contact centers that support multiple channels through which customers can submit queries may benefit from forecasts from various perspectives. With TIM RTInstantML it is possible to build a new model and make predictions for each perspective, e.g. volume per channel (incoming calls, messages from social media, emails, etc.), volume per region, consolidated volume, and others, as sketched below. Equally, the need for various prediction horizons does not place any additional burden on TIM; depending on the sampling of your data, you can predict from minutes to years ahead.
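The snippet below is a hypothetical sketch of the per-channel scenario: it reuses the same tim_client calls and configuration_backtest from this notebook, but the channel names and file paths are assumptions for illustration only and are not part of the sample dataset.
# Hypothetical sketch: build a separate model and forecast per channel with the same configuration
channel_files = {
    'calls': 'dataL_calls.csv',      # assumed file path, not part of the sample dataset
    'emails': 'dataL_emails.csv',    # assumed file path
    'social': 'dataL_social.csv'     # assumed file path
}
channel_predictions = {}
for channel, fpath in channel_files.items():
    data_ch = tim_client.load_dataset_from_csv_file( fpath, sep=',' )
    response_ch = api_client.prediction_build_model_predict( data_ch, configuration_backtest )
    channel_predictions[ channel ] = response_ch.prediction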