Trading on financial markets is a very risky endeavor, and when proper risk management is not in place, losses are just around the corner. Constructing a profitable trading strategy requires many elements, and risk management and position sizing should certainly be in the mix. One of the essential building blocks is a mechanism for estimating price movement.
Each market has a different level of volatility and risk. One of the most liquid markets in the world is Forex, where currency pairs such as EURUSD or NOKEUR are traded. Forex trades 24 hours a day, 5 days a week, and allows traders to open and close positions at rapid speed.
This use case demonstrates how TIM can support forecasting the direction of price movement on an hourly basis. TIM will forecast the direction of the price movement for the next hour. With this information, traders can be more confident when opening a new position. To evaluate business results we take a very simple approach: each hour, the position is liquidated completely.
| Business objective | Make profit from trading on the Forex intra-day market |
|---|---|
| Value | Direct input into the decision-making process when opening positions: understanding the movement of the price in the upcoming hour |
| KPI | Profit / Loss |
Disclaimer: Please note that the very purpose of this notebook is demonstration only, and it must not be taken as advice to be followed (e.g. to trade real money). We provide a full disclaimer at the very bottom of this page; read it carefully.
import logging
import pandas as pd
import plotly as plt
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json
import pickle
import datetime
import tim_client
Credentials and logging
(Do not forget to fill in your credentials in the credentials.json file)
with open('credentials.json') as f:
    credentials_json = json.load(f)  # loading the credentials from credentials.json
TIM_URL = 'https://timws.tangent.works/v4/api' # URL to which the requests are sent
SAVE_JSON = False # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/' # folder where the requests and responses are stored
LOGGING_LEVEL = 'INFO'
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)
api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
[INFO] 2021-01-22 13:56:00,271 - tim_client.api_client:save_json:66 - Saving JSONs functionality has been disabled
[INFO] 2021-01-22 13:56:00,274 - tim_client.api_client:json_saving_folder_path:75 - JSON destination folder changed to logs
We will collect results for evaluation in "results".
results = { 'long': {}, 'short': {} }
The source dataset contains values for the USDCHF currency pair (open, high, low, close, volume).
Two additional columns were added: direction (the target) and change.
Data are sampled on an hourly basis and contain gaps because Forex is not traded during weekends.
| Column name | Description | Type | Availability |
|---|---|---|---|
| timestamp | Timestamp on an hourly basis (GMT) | Timestamp column | |
| direction | 1 if change is > 0, otherwise 0 | Target | t-1 |
| change | Change against the previous timestamp for c_usdchf | Predictor | t-1 |
| c_usdchf | Price at which trading closed in the given hour (USDCHF) | Predictor | t-1 |
| o_usdchf | Price at which the given hour opened (USDCHF) | Predictor | t-1 |
| h_usdchf | The highest price in the given interval (USDCHF) | Predictor | t-1 |
| l_usdchf | The lowest price in the given interval (USDCHF) | Predictor | t-1 |
| v_usdchf | Volume traded (USDCHF) | Predictor | t-1 |
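For reference, a minimal sketch of how the two derived columns could be reproduced from the closing price (hypothetical values, column definitions as in the table above):

# Sketch: derive 'change' and 'direction' from closing prices.
raw = pd.DataFrame({'c_usdchf': [0.9934, 0.9938, 0.9938, 0.9937]})
raw['change'] = raw['c_usdchf'].diff()                 # change vs. previous timestamp
raw['direction'] = (raw['change'] > 0).astype(float)   # 1 if change > 0, otherwise 0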
TIM detects the forecasting situation from the current "shape" of the data; e.g. if the last target value is available at the 15:00 timestamp, it will start forecasting as of 16:00. It also takes that last 15:00 timestamp as the reference point against which the availability of each column in the dataset is determined; this rule is then followed during back-testing when calculating results for the out-of-sample interval.
In our case, the values of all columns are aligned up to t-1.
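To illustrate with a small hypothetical frame (not part of the dataset): if the frame below were passed to TIM with a predictionTo offset of 1 Sample, the last target value sits at 15:00, so TIM would forecast 16:00 and use 15:00 as the availability reference point.

# Hypothetical illustration of a forecasting situation.
demo = pd.DataFrame({
    'timestamp': pd.date_range('2019-12-12 13:00', periods=3, freq='H'),  # ends at 15:00
    'direction': [1.0, 0.0, 1.0],
    'change':    [0.0004, -0.0005, 0.0002],
})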
We wanted to back-test all forecasting situations during the day (i.e. all 24 hours), so 24 versions of the dataset are available.
The package of CSV files used in the experiments can be downloaded here.
The raw data used in this demonstration were downloaded from the INVESTING.COM website.
SITUATIONS = { h: 'data_'+str(h)+'_4B.csv' for h in range(0,24) }
SITUATIONS
{0: 'data_0_4B.csv', 1: 'data_1_4B.csv', 2: 'data_2_4B.csv', 3: 'data_3_4B.csv', 4: 'data_4_4B.csv', 5: 'data_5_4B.csv', 6: 'data_6_4B.csv', 7: 'data_7_4B.csv', 8: 'data_8_4B.csv', 9: 'data_9_4B.csv', 10: 'data_10_4B.csv', 11: 'data_11_4B.csv', 12: 'data_12_4B.csv', 13: 'data_13_4B.csv', 14: 'data_14_4B.csv', 15: 'data_15_4B.csv', 16: 'data_16_4B.csv', 17: 'data_17_4B.csv', 18: 'data_18_4B.csv', 19: 'data_19_4B.csv', 20: 'data_20_4B.csv', 21: 'data_21_4B.csv', 22: 'data_22_4B.csv', 23: 'data_23_4B.csv'}
We will run all cells below (up to the Evaluation section) 24 times to simulate results for each forecasting situation.
The cell below sets the current situation, i.e. the hour at which forecasting is simulated.
situation = 23
Read the dataset for the given situation.
data = tim_client.load_dataset_from_csv_file('data/'+SITUATIONS[ situation ], sep=',')
data.head()
| | timestamp | direction | c_usdchf | o_usdchf | h_usdchf | l_usdchf | v_usdchf | change |
|---|---|---|---|---|---|---|---|---|
| 0 | 2016-02-08 03:00:00 | 1.0 | 0.9938 | 0.9932 | 0.9938 | 0.9929 | 0 | 0.0004 |
| 1 | 2016-02-08 04:00:00 | 0.0 | 0.9938 | 0.9938 | 0.9941 | 0.9932 | 0 | 0.0000 |
| 2 | 2016-02-08 05:00:00 | 0.0 | 0.9937 | 0.9938 | 0.9942 | 0.9934 | 0 | -0.0001 |
| 3 | 2016-02-08 06:00:00 | 0.0 | 0.9936 | 0.9938 | 0.9942 | 0.9935 | 0 | -0.0001 |
| 4 | 2016-02-08 07:00:00 | 1.0 | 0.9938 | 0.9936 | 0.9942 | 0.9936 | 0 | 0.0002 |
data.tail()
| | timestamp | direction | c_usdchf | o_usdchf | h_usdchf | l_usdchf | v_usdchf | change |
|---|---|---|---|---|---|---|---|---|
| 24226 | 2019-12-12 18:00:00 | 1.0 | 0.98660 | 0.98610 | 0.98690 | 0.98555 | 1829 | 0.00040 |
| 24227 | 2019-12-12 19:00:00 | 0.0 | 0.98610 | 0.98645 | 0.98705 | 0.98595 | 2013 | -0.00050 |
| 24228 | 2019-12-12 20:00:00 | 0.0 | 0.98580 | 0.98615 | 0.98730 | 0.98560 | 2234 | -0.00030 |
| 24229 | 2019-12-12 21:00:00 | 0.0 | 0.98475 | 0.98600 | 0.98640 | 0.98460 | 2384 | -0.00105 |
| 24230 | 2019-12-12 22:00:00 | 1.0 | 0.98510 | 0.98480 | 0.98535 | 0.98475 | 1012 | 0.00035 |
data.shape
(24231, 8)
Visualisation of the closing price.
fig = plt.subplots.make_subplots(rows=1, cols=1, shared_xaxes=True, vertical_spacing=0.02)
fig.add_trace( go.Scatter( x = data.loc[:, "timestamp"], y=data.loc[:, "c_usdchf"], name = "USDCHF", line=dict(color='blue')), row=1, col=1)
fig.update_layout(height=500, width=1000, title = 'Closing price: USDCHF')
fig.show()
The only parameters that need to be set are the forecasting horizon (predictionTo) and the length of the back-testing interval (backtestLength).
We also ask the engine for additional output to see details of sub-models, so we define extendedOutputConfiguration as well. Because the dataset contains weekend gaps, interpolation is configured too.
30% of the data will be used for the out-of-sample interval.
backtest_length = int( data.shape[0] * .3 )
backtest_length
7269
configuration_backtest = {
    'usage': {
        'predictionTo': {
            'baseUnit': 'Sample',
            'offset': 1                        # number of units we want to predict into the future
        },
        'backtestLength': backtest_length      # number of samples used for back-testing (excluded from the model-building period)
    },
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True      # return the importances of features in the response
    },
    'interpolation': {
        'maxLength': 48*2,                     # fill gaps of up to 96 samples (covers weekend breaks)
        'type': 'Linear'
    }
}
For each situation we run the experiment, iteration by iteration, get insights into the models, and collect the results.
backtest = api_client.prediction_build_model_predict(data, configuration_backtest) # running the RTInstantML forecasting using data and defined configuration
backtest.status # status of the job
'FinishedWithWarning'
backtest.result_explanations
[{'index': 1, 'message': 'Predictor direction has a value missing for timestamp 2016-02-12 23:00:00.'}]
def encode_predictions(x):
    # threshold the raw model output at 0.5 to obtain a binary direction
    return 1 if x > 0.5 else 0
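For example, a raw output above 0.5 maps to 1, anything else to 0:

encode_predictions(0.73)   # -> 1 (price expected to move up)
encode_predictions(0.41)   # -> 0 (price expected to move down or stay flat)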
out_of_sample_timestamps = backtest.aggregated_predictions[1]['values'].index.tolist()   # timestamps of the out-of-sample interval
out_of_sample_predictions = pd.DataFrame.from_dict( backtest.aggregated_predictions[1]['values'] )
out_of_sample_predictions['direction_pred'] = out_of_sample_predictions['Prediction'].apply( encode_predictions )
out_of_sample_predictions = out_of_sample_predictions[ ['direction_pred'] ]

evaluation_data = data.copy()
evaluation_data['timestamp'] = pd.to_datetime(data['timestamp']).dt.tz_localize('UTC')   # align timezone with the returned timestamps
evaluation_data = evaluation_data[ evaluation_data['timestamp'].isin( out_of_sample_timestamps ) ]
evaluation_data.set_index('timestamp', inplace=True)
evaluation_data = evaluation_data[ ['direction','change'] ]
evaluation_data = evaluation_data.join( out_of_sample_predictions )              # attach predictions to actuals
evaluation_data = evaluation_data[ evaluation_data.index.hour == situation ]     # keep only the simulated forecasting hour
evaluation_data.head()
| | direction | change | direction_pred |
|---|---|---|---|
| 2019-02-13 23:00:00+00:00 | 1.0 | 0.00030 | 0 |
| 2019-02-14 23:00:00+00:00 | 1.0 | 0.00005 | 0 |
| 2019-02-15 23:00:00+00:00 | 0.0 | -0.00015 | 0 |
| 2019-02-17 23:00:00+00:00 | 0.0 | -0.00015 | 0 |
| 2019-02-18 23:00:00+00:00 | 1.0 | 0.00060 | 0 |
No. of evaluated data points
evaluation_data.shape[0]
219
Evaluation period from
evaluation_data.index[0]
Timestamp('2019-02-13 23:00:00+0000', tz='UTC')
Evaluation period to
evaluation_data.index[-1]
Timestamp('2019-12-11 23:00:00+0000', tz='UTC')
Evaluation period length (days)
trading_p = evaluation_data.index[-1] - evaluation_data.index[0]
trading_p.days
301
TIM lets you see which predictors are considered important. Simple and extended importances show to what extent each predictor contributes to explaining the variance of the target variable.
simple_importances = backtest.predictors_importances['simpleImportances']
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True)
extended_importances = backtest.predictors_importances['extendedImportances']
simple_importances
[{'importance': 100.0, 'predictorName': 'direction'}]
extended_importances
[{'time': '[1]', 'type': 'TargetAndTargetTransformation', 'termName': 'EMA_direction(t-1, w = 2)', 'importance': 55.06}, {'time': '[1]', 'type': 'Interaction', 'termName': 'EMA_direction(t-1, w = 2) & cos(2πt / 24.0 hours)', 'importance': 44.94}, {'time': '[1]', 'type': 'TargetAndTargetTransformation', 'termName': 'Intercept', 'importance': 0.0}]
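If a tabular view is easier to read, the list above can be loaded into a DataFrame (a small optional sketch using the variables defined above):

# Optional: show the extended importances as a table.
pd.DataFrame(extended_importances)[['termName', 'type', 'importance']]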
We are predicting the direction of the price movement in the upcoming hour: up means 1, down or no change means 0. This is a binary classification problem, so to evaluate the results we calculate the percentage of correctly predicted directions, for both 1 and 0, in each forecasting situation.
To understand the theoretical business impact of taking actions based on the predictions, we calculate gains, losses, and net P/L for both types of positions (long and short). Results are available for each forecasting situation, i.e. for each predicted hour.
Hypothetical value of capital traded on each position.
POSITION_SIZE = 10**6
'{:,}'.format(POSITION_SIZE)
'1,000,000'
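As a quick sanity check of the arithmetic used below (hypothetical numbers): a correct call on an hour where USDCHF moves by 0.0004 earns 0.0004 × 1,000,000 = 400 units of the quote currency; a wrong call on the same hour loses the same amount.

# Hypothetical example: P/L of a single one-hour position.
change_example = 0.0004                  # hourly USDCHF change (hypothetical)
print(change_example * POSITION_SIZE)    # 400.0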
filter_gains = evaluation_data['direction_pred'] == evaluation_data['direction']    # hours with a correct call
filter_losses = evaluation_data['direction_pred'] != evaluation_data['direction']   # hours with a wrong call
direction_check = 1                                       # evaluate hours in which the price actually moved up
filter_direction = evaluation_data['direction'] == direction_check
gains = sum( evaluation_data[ filter_direction & filter_gains ]['change'] * POSITION_SIZE )    # correct calls earn the move
losses = sum( evaluation_data[ filter_direction & filter_losses ]['change'] * POSITION_SIZE )  # wrong calls pay the move
pl = gains - losses
print('Gains', '%.4f' % gains )
print('Losses', '%.4f' % losses )
print('Profit/Loss', '%.4f' % pl )
Gains 2149.9991
Losses 17800.3907
Profit/Loss -15650.3916
result = evaluation_data[ filter_direction ][ ['direction_pred'] ].value_counts()
result_len = evaluation_data[ filter_direction ].shape[0]
accuracy = 0
if direction_check in result:
    accuracy = result[ direction_check ].values[0] / result_len   # share of hours with the direction predicted correctly
print('Accuracy', '%.4f' % accuracy )
Accuracy 0.0909
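An equivalent, more direct way to compute the same hit rate (a sketch producing the same number as above):

# Equivalent accuracy computation: fraction of correct calls among "up" hours.
subset = evaluation_data[filter_direction]
print('%.4f' % (subset['direction_pred'] == subset['direction']).mean())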
results['long'][ str(situation) ] = {'gains': gains, 'losses': losses, 'pl': pl, 'accuracy': accuracy }
direction_check = 0                                       # evaluate hours in which the price actually moved down
filter_direction = evaluation_data['direction'] == direction_check
gains = -sum( evaluation_data[ filter_direction & filter_gains ]['change'] * POSITION_SIZE )    # sign flipped: a short profits from a negative change
losses = -sum( evaluation_data[ filter_direction & filter_losses ]['change'] * POSITION_SIZE )
pl = gains - losses
print('Gains', '%.4f' % gains )
print('Losses', '%.4f' % losses )
print('Profit/Loss', '%.4f' % pl )
Gains 42099.6547
Losses 900.0301
Profit/Loss 41199.6245
result = evaluation_data[ filter_direction ][ ['direction_pred'] ].value_counts()
result_len = evaluation_data[ filter_direction ].shape[0]
accuracy = 0
if direction_check in result:
    accuracy = result[ direction_check ].values[0] / result_len
print('Accuracy', '%.4f' % accuracy )
Accuracy 0.9648
results['short'][ str(situation) ] = {'gains': gains, 'losses': losses, 'pl': pl, 'accuracy': accuracy }
# save results to file on hard drive
# backup_file = open('results5.3.pkl', 'wb')
# pickle.dump(results, backup_file)
# backup_file.close()
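Restoring previously saved results would be the mirror image (a sketch, assuming the pickle above was written):

# restore results from hard drive (sketch)
# with open('results5.3.pkl', 'rb') as backup_file:
#     results = pickle.load(backup_file)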
The following tables and charts depict the evaluation of the out-of-sample interval for each forecasting situation.
A few explanations before we dive in: rows are indexed by forecasting situation (hour of day, 0-23); gains, losses, and pl are expressed in quote-currency units for the hypothetical position size of 1,000,000; accuracy is the share of correctly predicted directions.
results_df_long = pd.DataFrame.from_dict(results['long'], orient='index')
results_df_long
| | gains | losses | pl | accuracy |
|---|---|---|---|---|
| 0 | 4799.902439 | 19800.305366 | -15000.402927 | 0.094118 |
| 1 | 2349.972725 | 32899.439335 | -30549.466610 | 0.064516 |
| 2 | 3249.943256 | 47600.626945 | -44350.683689 | 0.068966 |
| 3 | 5499.899387 | 30600.070953 | -25100.171566 | 0.101852 |
| 4 | 3649.950027 | 31350.255012 | -27700.304985 | 0.065421 |
| 5 | 4499.912262 | 33800.125122 | -29300.212860 | 0.140625 |
| 6 | 3750.085831 | 24149.835110 | -20399.749279 | 0.133333 |
| 7 | 9749.948979 | 26050.090790 | -16300.141811 | 0.229358 |
| 8 | 13799.846173 | 47600.030899 | -33800.184726 | 0.207207 |
| 9 | 18349.826336 | 48149.943352 | -29800.117016 | 0.279570 |
| 10 | 40700.376034 | 35050.034523 | 5650.341511 | 0.495575 |
| 11 | 32749.831676 | 29699.325562 | 3050.506114 | 0.470085 |
| 12 | 28950.095176 | 25100.171566 | 3849.923610 | 0.490741 |
| 13 | 42899.668217 | 31349.599361 | 11550.068855 | 0.529412 |
| 14 | 28950.452804 | 47049.880028 | -18099.427223 | 0.367347 |
| 15 | 35799.443722 | 49350.202084 | -13550.758362 | 0.414414 |
| 16 | 20900.189876 | 57150.125504 | -36249.935627 | 0.257426 |
| 17 | 8449.912071 | 67400.336266 | -58950.424195 | 0.145455 |
| 18 | 12550.175190 | 48350.214959 | -35800.039769 | 0.213592 |
| 19 | 7900.416851 | 35149.991512 | -27249.574661 | 0.191304 |
| 20 | 3500.103951 | 62250.077724 | -58749.973774 | 0.085271 |
| 21 | 13300.001621 | 44900.000095 | -31599.998474 | 0.204724 |
| 22 | 6050.050259 | 28749.942780 | -22699.892521 | 0.115702 |
| 23 | 2149.999142 | 17800.390721 | -15650.391579 | 0.090909 |
results_df_long['pl'].sum()
-566801.0115637623
results_df_short= pd.DataFrame.from_dict(results['short'], orient='index')
results_df_short
| | gains | losses | pl | accuracy |
|---|---|---|---|---|
| 0 | 39600.133896 | 1299.917698 | 38300.216198 | 0.954545 |
| 1 | 22600.233555 | 1349.806786 | 21250.426769 | 0.926316 |
| 2 | 38050.472736 | 2099.931240 | 35950.541496 | 0.920792 |
| 3 | 30900.239944 | 4250.049591 | 26650.190353 | 0.888889 |
| 4 | 28699.815273 | 5549.967289 | 23149.847984 | 0.851852 |
| 5 | 16950.249672 | 3150.045872 | 13800.203800 | 0.850575 |
| 6 | 21900.057793 | 2749.800682 | 19150.257111 | 0.858407 |
| 7 | 28500.258923 | 3750.026226 | 24750.232696 | 0.872727 |
| 8 | 43249.368668 | 13199.806213 | 30049.562454 | 0.761905 |
| 9 | 69150.209427 | 27599.990368 | 41550.219059 | 0.672000 |
| 10 | 34399.807453 | 30499.875546 | 3899.931907 | 0.538462 |
| 11 | 20650.446415 | 33749.818802 | -13099.372387 | 0.494949 |
| 12 | 20849.823952 | 38049.757480 | -17199.933529 | 0.453704 |
| 13 | 33449.947834 | 20050.168038 | 13399.779797 | 0.571429 |
| 14 | 67349.255085 | 31049.907207 | 36299.347877 | 0.618644 |
| 15 | 47200.381756 | 28250.217438 | 18950.164318 | 0.647619 |
| 16 | 76599.955558 | 27749.896050 | 48850.059509 | 0.773913 |
| 17 | 62149.822712 | 20200.431347 | 41949.391365 | 0.754717 |
| 18 | 48400.402069 | 13249.993324 | 35150.408745 | 0.831858 |
| 19 | 57549.893856 | 4450.023174 | 53099.870682 | 0.933333 |
| 20 | 44550.478459 | 3150.165081 | 41400.313378 | 0.891089 |
| 21 | 41900.098324 | 7600.247860 | 34299.850464 | 0.875969 |
| 22 | 37150.204182 | 1450.002193 | 35700.201988 | 0.947368 |
| 23 | 42099.654675 | 900.030136 | 41199.624538 | 0.964789 |
results_df_short['pl'].sum()
648501.3365736774
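For one overall figure, the two nets can also be combined; this is simply the sum of the two totals above.

results_df_long['pl'].sum() + results_df_short['pl'].sum()   # net P/L across both position types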
The charts below show the results for each forecasting situation.
results_df_short.index = results_df_short.index.astype(int)
results_df_short.sort_index(inplace=True)
results_df_long.index = results_df_long.index.astype(int)
results_df_long.sort_index(inplace=True)
fig = go.Figure()
fig.add_trace( go.Bar( x = results_df_long.index, y=results_df_long['accuracy'].values, name = "Long", marker_color='blue') )
fig.add_trace( go.Bar( x = results_df_short.index, y=results_df_short['accuracy'].values, name = "Short", marker_color='green') )
fig.update_layout(height=500, width=1000, title='Accuracy')
fig.show()
fig = go.Figure()
fig.add_trace( go.Bar( x = results_df_long.index, y=results_df_long['pl'].values, name = "Long", marker_color='blue') )
fig.add_trace( go.Bar( x = results_df_short.index, y=results_df_short['pl'].values, name = "Short", marker_color='green') )
fig.update_layout( height=500, width=1000, title='P/L' )
fig.show()
We can conclude that TIM, with the given data and approach (i.e. one global model used for all predictions in the out-of-sample interval), predicted downward price moves considerably better in most situations.
In a production setup, though, this approach would not be followed, because TIM can quickly build a new model with every new prediction rather than relying on one pre-built model. This is extremely important for time series with very dynamic behaviour, as financial markets certainly are.
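A minimal sketch of that rolling approach, reusing the tim_client calls from this notebook (not executed here; the exact accessor for the one-step-ahead forecast, e.g. the index into aggregated_predictions, may differ depending on configuration):

# Sketch only: production-style rolling re-training - a fresh model for every prediction.
rolling_config = {
    'usage': {
        'predictionTo': { 'baseUnit': 'Sample', 'offset': 1 }   # one step ahead, no back-test
    },
    'interpolation': { 'maxLength': 48*2, 'type': 'Linear' }
}
rolling_preds = []
for t in range(data.shape[0] - 10, data.shape[0]):   # last 10 hours only, as an illustration
    window = data.iloc[:t]                           # everything known before hour t
    job = api_client.prediction_build_model_predict(window, rolling_config)
    rolling_preds.append(job.aggregated_predictions[0]['values'])   # assumed accessor, see note above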
With basic data and almost zero effort, we demonstrated that TIM, with default settings, has the potential to achieve quite remarkable results. Nevertheless, there is plenty of room for improvement for anyone who wants to continue further, for instance by experimenting with other intervals, adding explanatory data, etc.
Any content on this page should not be relied upon as advice or construed as providing recommendations of any kind. It is your responsibility to confirm and decide which trades to make. Past results are no indication of future performance. In no event should the content of this page be understood as an express or implied promise or guarantee. We are not responsible for any losses incurred as a result of using any of the information on this page. Information provided on this page is intended solely for informational purposes and is obtained from sources believed to be reliable. Information is in no way guaranteed. No guarantee of any kind is implied or possible where projections of future conditions are attempted.