Owners of PV plants, electricity traders and system regulators need accurate forecasts of PV plant production for different time horizons and granularities to optimize their maintenance, trading and regulation strategies.
In essence, forecasting solar production is a regression problem: the target is a time series of production volumes for individual periods (usually hours or quarter-hours), and the predictors are forecasts of various meteorological parameters. This is the kind of problem that TIM is designed to solve.
Apart from maintenance, solar power production is not driven by sociological factors such as the day of the week or holidays. The main factors affecting the production of PV plants are the amount of irradiance reaching the panel, the angle between the panel surface and the solar flux, and temperature (TEMP), which affects the panels' efficiency in converting irradiance to electricity.
The minimal requirement for a good forecast of PV plant production is a good point forecast of Global Horizontal Irradiance (GHI) for the location of the PV plant. However, TIM achieves much better accuracy when it is also provided with the components of GHI, namely Direct Normal Irradiance (DNI) and Diffuse Irradiance (DIF), as they arrive at the panel at different angles.
Finally, to help TIM capture the geometry of the PV plant (fixed, tracking, …), it is also useful to upload data about the angular position of the sun in the sky relative to the GPS coordinates of the PV plant: Sun Elevation (SE) and Sun Azimuth (SA). Given these variables, TIM is able to create accurate models for forecasting solar power without needing to know the specifics of the modeled PV plant's configuration.
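To illustrate why the irradiance components and the sun's position carry complementary information, here is a minimal sketch that is not part of TIM or its client. It assumes Cooper's textbook approximation for solar declination, takes local solar time (not clock time) as input, and uses the standard closure relation GHI = DNI · sin(elevation) + DIF. All function names are hypothetical, and a dedicated solar-position library would be more accurate in practice.

```python
import math


def solar_declination_deg(day_of_year: int) -> float:
    """Cooper's approximation of the solar declination, in degrees."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def sun_elevation_deg(latitude_deg: float, day_of_year: int, solar_hour: float) -> float:
    """Approximate sun elevation for a latitude, day of year and local solar time in hours."""
    lat = math.radians(latitude_deg)
    decl = math.radians(solar_declination_deg(day_of_year))
    # The hour angle grows by 15 degrees per hour away from solar noon.
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    # Clamp against floating-point overshoot before taking the arcsine.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


def ghi_from_components(dni: float, dif: float, elevation_deg: float) -> float:
    """Closure relation: the direct beam is projected onto the horizontal plane, the diffuse part is added."""
    direct_horizontal = dni * max(0.0, math.sin(math.radians(elevation_deg)))
    return direct_horizontal + dif


# Illustrative values only (not taken from the dataset used in this example).
elevation = sun_elevation_deg(latitude_deg=48.0, day_of_year=172, solar_hour=12.0)
print(round(elevation, 1), round(ghi_from_components(800.0, 100.0, elevation), 1))
```

The same GHI can result from very different DNI and DIF mixes, which is one reason the components, together with SE and SA, can be more informative than GHI alone.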
TIM requires no setup of its mathematical internals and works well in business user mode. All that is required from the user is to tell TIM the desired prediction horizon. TIM can learn automatically that there is no weekly pattern; in some cases, however (e.g. short datasets), this can be difficult to learn, so we recommend switching off the weekday transformations.
import logging
import pandas as pd
import plotly as plt
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json
import tim_client
with open('credentials.json') as f:
    credentials_json = json.load(f) # loading the credentials from credentials.json
TIM_URL = 'https://timws.tangent.works/v4/api' # URL to which the requests are sent
SAVE_JSON = False # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/' # folder where the requests and responses are stored
LOGGING_LEVEL = 'INFO'
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)
api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
In this example we simulate a day-ahead scenario. Each day at 10:15 we want a forecast for every hour up to the end of the next day, so we set "predictionTo" to 38 samples. The model is built using the range between 2015-01-01 00:00:00 and 2015-12-31 23:00:00. Out-of-sample forecasts are made in the range between 2016-01-01 00:00:00 and 2016-10-17 23:00:00 (the last 6946 samples). To gain better insight into the model, we also request extended importances and prediction intervals.
configuration_backtest = {
    'usage': {
        'predictionTo': {
            'baseUnit': 'Sample', # units that are used for specifying the prediction horizon length (one of 'Day', 'Hour', 'QuarterHour', 'Sample')
            'offset': 38 # number of units we want to predict into the future (38 hourly samples in this case)
        },
        'backtestLength': 6946 # number of samples that are used for backtesting (note that these samples are excluded from model building period)
    },
    "predictionIntervals": {
        "confidenceLevel": 90 # confidence level of the prediction intervals (in %)
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True # flag that specifies if the importances of features are returned in the response
    }
}
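The 38-sample horizon can be checked with a few lines of standard-library Python. This is a minimal sketch, assuming the last known target value is from 09:00 and the forecast must reach the last sample of the following day. The helper name `day_ahead_horizon` and the example date are illustrative and not part of the TIM client.

```python
from datetime import datetime, timedelta


def day_ahead_horizon(last_target: datetime, step: timedelta = timedelta(hours=1)) -> int:
    """Count the samples from the first missing timestamp to the last sample of the next day."""
    first_forecast = last_target + step
    # Midnight that closes the day after the forecast is issued (exclusive upper bound).
    next_day = (last_target + timedelta(days=1)).date()
    end_exclusive = datetime.combine(next_day, datetime.min.time()) + timedelta(days=1)
    return int((end_exclusive - first_forecast) / step)


# Hourly data with the last target value from 09:00 (the date itself is arbitrary).
print(day_ahead_horizon(datetime(2016, 1, 1, 9, 0)))
```

The count covers 14 remaining hours of the current day (10:00 to 23:00) plus 24 hours of the next day. With quarter-hourly data the same helper can be called with `step=timedelta(minutes=15)`.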
The dataset used in this example has an hourly sampling rate and contains data from 2015-01-01 to 2017-01-12.
The data come from an individual PV plant in Central Europe. The production of this PV plant is our target. It is the second column in the CSV file, right after the timestamp column, and its name is ‘PV_obs’.
The meteorological predictors are GHI, DNI, DIF, TEMP, SA and SE, as discussed in the section ‘Data Recommendation Template’. In this demo we use historical actuals for both model building and out-of-sample forecasting.
Timestamps are in the UTC+01:00 timezone, and each timestamp marks the beginning of the period it corresponds to, i.e. ‘PV_obs’ in the row with timestamp 2015-01-01 00:00:00 corresponds to the production of the PV plant during the period between 2015-01-01 00:00:00 and 2015-01-01 01:00:00.
We simulate a day-ahead scenario: each day at 10:00 we want to forecast the target one whole day into the future. We assume that the values of all predictors are available until the end of the next day (the end of the prediction horizon), which means that the predictor columns combine actual values and forecast values. The last value of the target is from 09:00. To let TIM know that this is how the model will be used in production, we can simply supply the dataset in a form that represents the real situation (as can be seen in the view below; notice the NaN values representing the missing data for the following day we wish to forecast). In this demo dataset, out-of-sample validation is performed using historical actuals of meteorological data. A more representative validation may be obtained by using historical forecasts of meteorological data instead.
data = tim_client.load_dataset_from_csv_file('data.csv', sep=',') # loading data from data.csv
data # quick look at the data
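The day-ahead layout described above (target known up to 09:00, predictors filled to the end of the next day) can be reproduced on synthetic data. This is a minimal sketch: the values are made up, the helper `mask_unknown_target` is hypothetical, and only the column names ‘timestamp’, ‘PV_obs’ and ‘GHI’ are taken from this example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real CSV: three days of hourly rows with made-up values.
index = pd.date_range("2016-01-01 00:00", periods=72, freq=pd.Timedelta(hours=1))
frame = pd.DataFrame({
    "timestamp": index,
    "PV_obs": np.linspace(0.0, 1.0, len(index)),   # made-up target values
    "GHI": np.linspace(0.0, 500.0, len(index)),    # made-up predictor values
})


def mask_unknown_target(frame: pd.DataFrame, last_known: pd.Timestamp,
                        target: str = "PV_obs") -> pd.DataFrame:
    """Return a copy in which the target is NaN after the last timestamp known at forecast time."""
    shaped = frame.copy()
    shaped.loc[shaped["timestamp"] > last_known, target] = np.nan
    return shaped


# Forecast issued on 2016-01-02 with the last target value from 09:00;
# the predictor column stays filled until the end of 2016-01-03.
shaped = mask_unknown_target(frame, pd.Timestamp("2016-01-02 09:00"))
missing = int(shaped["PV_obs"].isna().sum())
print(missing)
print(shaped.tail())
```

The number of trailing NaN values in the target equals the number of samples to forecast, which is how the dataset itself communicates the data availability to the model.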
backtest = api_client.prediction_build_model_predict(data, configuration_backtest) # running the RTInstantML forecasting using data and defined configuration
backtest.status # status of the job
fig = plt.subplots.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.02) # plot initialization
fig.add_trace(go.Scatter(x = data.loc[:, "timestamp"], y=data.loc[:, "PV_obs"],
                         name = "target", line=dict(color='black')), row=1, col=1) # plotting the target variable
fig.add_trace(go.Scatter(x = backtest.prediction.index,
                         y = backtest.prediction.loc[:, 'Prediction'],
                         name = "production forecast",
                         line = dict(color='purple')), row=1, col=1) # plotting production prediction
fig.add_trace(go.Scatter(x = backtest.prediction_intervals_upper_values.index,
                         y = backtest.prediction_intervals_upper_values.loc[:, 'UpperValues'],
                         marker = dict(color="#444"),
                         line = dict(width=0),
                         showlegend = False), row=1, col=1)
fig.add_trace(go.Scatter(x = backtest.prediction_intervals_lower_values.index,
                         y = backtest.prediction_intervals_lower_values.loc[:, 'LowerValues'],
                         fill = 'tonexty',
                         line = dict(width=0),
                         showlegend = False), row=1, col=1) # plotting confidence intervals
fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[1]['values'].index,
                         y = backtest.aggregated_predictions[1]['values'].loc[:, 'Prediction'],
                         name = "in-sample MAE: " + str(round(backtest.aggregated_predictions[1]['accuracyMetrics']['MAE'], 2)),
                         line=dict(color='goldenrod')), row=1, col=1) # plotting in-sample prediction
fig.add_trace(go.Scatter(x = backtest.aggregated_predictions[3]['values'].index,
                         y = backtest.aggregated_predictions[3]['values'].loc[:, 'Prediction'],
                         name = "out-of-sample MAE: " + str(round(backtest.aggregated_predictions[3]['accuracyMetrics']['MAE'], 2)),
                         line = dict(color='red')), row=1, col=1) # plotting out-of-sample prediction
fig.add_trace(go.Scatter(x = data.loc[:, "timestamp"], y=data.loc[:, "GHI"],
                         name = "GHI", line=dict(color='forestgreen')), row=2, col=1) # plotting the predictor GHI
fig.update_layout(height=600, width=1000,
                  title_text="Backtesting, modelling difficulty: "
                  + str(round(backtest.data_difficulty, 2)) + "%" ) # update size and title of the plot
fig.show()
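The MAE values shown in the legend come from the TIM response. As a cross-check, the same kind of metric, plus the empirical coverage of the prediction intervals, can be computed by hand. This is a minimal sketch on made-up numbers: the helper names are hypothetical, and TIM's own aggregation of accuracy metrics may differ in detail.

```python
import pandas as pd


def mean_absolute_error(actual: pd.Series, predicted: pd.Series) -> float:
    """MAE over the timestamps present in both series, ignoring missing values."""
    joined = pd.concat([actual.rename("actual"), predicted.rename("predicted")], axis=1).dropna()
    return float((joined["actual"] - joined["predicted"]).abs().mean())


def interval_coverage(actual: pd.Series, lower: pd.Series, upper: pd.Series) -> float:
    """Share of actual values that fall inside [lower, upper]."""
    joined = pd.concat([actual.rename("actual"), lower.rename("lower"),
                        upper.rename("upper")], axis=1).dropna()
    inside = (joined["actual"] >= joined["lower"]) & (joined["actual"] <= joined["upper"])
    return float(inside.mean())


# Made-up hourly values, used only to exercise the helpers.
idx = pd.date_range("2016-01-01 10:00", periods=4, freq=pd.Timedelta(hours=1))
actual = pd.Series([0.0, 10.0, 20.0, 10.0], index=idx)
predicted = pd.Series([1.0, 8.0, 20.0, 13.0], index=idx)
lower, upper = predicted - 2.5, predicted + 2.5

mae = mean_absolute_error(actual, predicted)
coverage = interval_coverage(actual, lower, upper)
print(mae, coverage)
```

With a 90% confidence level, a coverage far below 0.9 on the out-of-sample period would suggest that the intervals are too narrow for the data at hand.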
simple_importances = backtest.predictors_importances['simpleImportances'] # get predictor importances
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True) # sort by importance
extended_importances = backtest.predictors_importances['extendedImportances'] # get feature importances
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True) # sort by importance
si_df = pd.DataFrame(index=np.arange(len(simple_importances)), columns = ['predictor name', 'predictor importance (%)']) # initialize predictor importances dataframe
ei_df = pd.DataFrame(index=np.arange(len(extended_importances)), columns = ['feature name', 'feature importance (%)', 'time', 'type']) # initialize feature importances dataframe
for (i, si) in enumerate(simple_importances):
    si_df.loc[i, 'predictor name'] = si['predictorName'] # get predictor name
    si_df.loc[i, 'predictor importance (%)'] = si['importance'] # get importance of the predictor
for (i, ei) in enumerate(extended_importances):
    ei_df.loc[i, 'feature name'] = ei['termName'] # get feature name
    ei_df.loc[i, 'feature importance (%)'] = ei['importance'] # get importance of the feature
    ei_df.loc[i, 'time'] = ei['time'] # get time of the day to which the feature corresponds
    ei_df.loc[i, 'type'] = ei['type'] # get type of the feature
si_df.head() # predictor importances data frame
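As an alternative to filling the data frame row by row, a list of dictionaries can be passed to pandas directly. This is a minimal sketch with hypothetical records that reuse the keys ‘predictorName’ and ‘importance’ from this example; the importance values are made up.

```python
import pandas as pd

# Hypothetical records with the same keys as 'simpleImportances'; the numbers are made up.
example_importances = [
    {"predictorName": "DIF", "importance": 12.5},
    {"predictorName": "GHI", "importance": 61.0},
    {"predictorName": "SE", "importance": 26.5},
]

si_df_sketch = (
    pd.DataFrame(example_importances)
    .rename(columns={"predictorName": "predictor name",
                     "importance": "predictor importance (%)"})
    .sort_values("predictor importance (%)", ascending=False)  # most important first
    .reset_index(drop=True)
)
print(si_df_sketch)
```

This keeps the importance column numeric, whereas a data frame initialized empty and filled through `.loc` typically ends up with object-typed columns.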
fig = go.Figure(go.Bar(x=si_df['predictor name'], y=si_df['predictor importance (%)'])) # plot the bar chart
fig.update_layout(height=400, # update size, title and axis titles of the chart
                  width=600,
                  title_text="Importances of predictors",
                  xaxis_title="Predictor name",
                  yaxis_title="Predictor importance (%)")
fig.show()
ei_df.head() # first few of the feature importances
time = '12:00:00' # time for which the feature importances are visualized
fig = go.Figure(go.Bar(x=ei_df[ei_df['time'] == time]['feature name'], # plot the bar chart
                       y=ei_df[ei_df['time'] == time]['feature importance (%)']))
fig.update_layout(height=700, # update size, title and axis titles of the chart
                  width=1000,
                  title_text="Importances of features (for {})".format(time),
                  xaxis_title="Feature name",
                  yaxis_title="Feature importance (%)")
fig.show()