Predictive maintenance for Aircraft Engines (Classification)¶

Title: Predictive maintenance for Aircraft Engines
Author: Michal Bezak, Tangent Works
Industry: Manufacturing, Transportation
Area: Predictive maintenance
Type: Classification

Description¶

Predictive maintenance covers a variety of areas, including but not limited to:

  • prediction of failure,
  • root cause analysis,
  • failure detection,
  • failure type classification,
  • recommendation of mitigation or maintenance actions after failure.

To avoid downtimes caused by malfunctions of technical equipment, companies follow a maintenance schedule to discover and fix potential issues. Nevertheless, with more data available (from installed sensors), there are new possibilities to make the maintenance process better. With AI/ML it is possible to further optimize the maintenance schedule.

This has tangible financial implications; imagine two extreme situations:

  • Maintenance is planned later than needed - the technical equipment reaches the point of failure and disrupts operations; every minute of downtime can bring big losses and delay other parts of the process, not to mention the implications for people's health or lives.
  • Maintenance is done too often or too soon - the time and material spent are higher than the situation required, and, should a component be replaced completely, the capital invested into the machine/components is not utilized to its full potential.

There is zero tolerance for failure of aircraft engines, so they are checked rather sooner than later. This makes them a very good example for our sample Use case.

In this Use case we will solve the following problem: Is the engine going to fail within N cycles? From an ML perspective this can be framed as a binary classification problem.

| Business objective | Business value | KPI |
| --- | --- | --- |
| Financial: Maximize value of capital, minimize operational costs | Lower cost, capital utilization | - |
| Operations: Increase aircraft’s safety | Enhanced personnel/passengers safety | - |
In [3]:
import logging
import pandas as pd
import plotly as plt
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import json
import datetime

from sklearn.metrics import confusion_matrix, recall_score, precision_score

import tim_client

Credentials and logging

(Do not forget to fill in your credentials in the credentials.json file)

In [4]:
with open('credentials.json') as f:
    credentials_json = json.load(f)                     # loading the credentials from credentials.json

TIM_URL = 'https://timws.tangent.works/v4/api'          # URL to which the requests are sent

SAVE_JSON = False                                       # if True - JSON requests and responses are saved to JSON_SAVING_FOLDER
JSON_SAVING_FOLDER = 'logs/'                            # folder where the requests and responses are stored

LOGGING_LEVEL = 'INFO'
In [5]:
level = logging.getLevelName(LOGGING_LEVEL)
logging.basicConfig(level=level, format='[%(levelname)s] %(asctime)s - %(name)s:%(funcName)s:%(lineno)s - %(message)s')
logger = logging.getLogger(__name__)
In [6]:
credentials = tim_client.Credentials(credentials_json['license_key'], credentials_json['email'], credentials_json['password'], tim_url=TIM_URL)
api_client = tim_client.ApiClient(credentials)

api_client.save_json = SAVE_JSON
api_client.json_saving_folder_path = JSON_SAVING_FOLDER
[INFO] 2021-02-09 17:00:40,966 - tim_client.api_client:save_json:66 - Saving JSONs functionality has been disabled
[INFO] 2021-02-09 17:00:40,968 - tim_client.api_client:json_saving_folder_path:75 - JSON destination folder changed to logs
In [7]:
# results of each experiment iteration will be collected
results_last = dict()
results_all = dict()

Dataset¶

The data contain operational information about 100 aircraft engines, consolidated into a single table with columns for:

  • operational settings (3),
  • measurements from sensors (21),
  • cycle no.

The train part of the dataset contained information about when (at which cycle) each engine failed. The binary target variable is called window; it expresses whether a given timestamp is in the zone/window of failure (1) or not (0). The zone starts N steps ahead of the expected failure and ends at the point of failure.

The decision how to set the minimum value of N would in general be determined by factors such as:

  • time required to do maintenance,
  • average lead time until the maintenance can start (right resources are available, logistics etc.)

The bottom line is that there must be sufficient time left for the operations team to take action, i.e. to conduct maintenance of the respective engine within the given window.
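As an illustration, here is a minimal sketch of how such a window label could be derived from run-to-failure records. It assumes a raw frame with engine_id and cycle columns (as in the original per-engine files, before engine_id was dropped) and that each engine's last recorded cycle is its point of failure; the code is illustrative, not the script used to prepare the files above.

import pandas as pd

def add_failure_window(df, n_steps):
    # Label records that fall within the last n_steps cycles before failure.
    # Assumes each engine's maximum cycle is its point of failure (run-to-failure data).
    failure_cycle = df.groupby('engine_id')['cycle'].transform('max')
    df = df.copy()
    df['window'] = ((failure_cycle - df['cycle']) < n_steps).astype(int)
    return df

# Toy example: engine 1 fails at cycle 5, engine 2 at cycle 3
toy = pd.DataFrame({'engine_id': [1, 1, 1, 1, 1, 2, 2, 2],
                    'cycle':     [1, 2, 3, 4, 5, 1, 2, 3]})
add_failure_window(toy, n_steps=2)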

Sampling¶

Data are sampled on a daily basis.

Data¶

The original raw data files were consolidated into a single time series file; the train file is used for the in-sample interval only, the test file for the out-of-sample interval.

Structure of CSV file:

| Column name | Description | Type | Availability |
| --- | --- | --- | --- |
| Timestamp | Date | Timestamp column | |
| window | Binary label, denotes whether engine will fail in given (time) window | Target | t-1 |
| cycle | Operations cycle no. | Predictor | t+1 |
| setting1 ... setting3 | Operational settings | Predictor | t+1 |
| s1 ... s21 | Sensor measurements | Predictor | t+1 |

Due to the consolidation into a single file, timestamps were re-generated to avoid duplicates and to be continuous (they are not real timestamps from operations records).
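For context, below is a minimal sketch of how continuous daily timestamps could be generated when concatenating the per-engine records into one file; the start date and function name are assumptions for illustration, not the exact preprocessing that was used.

import pandas as pd

def assign_continuous_timestamps(df, start='2000-01-01'):
    # Replace any original time information with a synthetic, continuous daily index
    # spanning all concatenated records (one row per day, no duplicates).
    df = df.copy()
    df.insert(0, 'timestamp', pd.date_range(start=start, periods=len(df), freq='D'))
    return df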

In each experiment iteration, window was set to a different value to see which one would deliver the best results. We tried 7, 15 and 30 steps (days).

Please note that the engine_id column is not present in the file because it is an identifier, not a real measurement/setting.

Data situation¶

If we want TIM to do classification, the very last record of the target must be kept empty (NaN/None). TIM will use all available predictors to classify the given record. Furthermore, this situation is replicated to calculate results for all out-of-sample records during back-testing.
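A minimal sketch of putting data into this shape is shown below; it assumes a pandas DataFrame with a window target column (as in the dataset loaded later) and simply blanks the target of the most recent record before the request is sent.

import numpy as np

def blank_last_target(df, target_column='window'):
    # Keep the last record's target empty so TIM classifies it from the available predictors
    df = df.copy()
    df.loc[df.index[-1], target_column] = np.nan
    return df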

CSV files used in experiments can be downloaded here.

Source¶

Raw data files were acquired from Microsoft's server:

http://azuremlsamples.azureml.net/templatedata/PM_train.txt

http://azuremlsamples.azureml.net/templatedata/PM_test.txt

http://azuremlsamples.azureml.net/templatedata/PM_truth.txt

In [199]:
iteration_code = 30 # 7, 15, 30
In [200]:
fpath = 'data_w_'+str(iteration_code)+'.csv'
data = tim_client.load_dataset_from_csv_file( fpath , sep=',')

data.tail()
Out[200]:
timestamp window cycle setting1 setting2 setting3 s1 s2 s3 s4 ... s12 s13 s14 s15 s16 s17 s18 s19 s20 s21
33722 2102-08-29 1.0 194 0.0049 0.0000 100.0 518.67 643.24 1599.45 1415.79 ... 520.69 2388.00 8213.28 8.4715 0.03 394 2388 100.0 38.65 23.1974
33723 2102-08-30 1.0 195 -0.0011 -0.0001 100.0 518.67 643.22 1595.69 1422.05 ... 521.05 2388.09 8210.85 8.4512 0.03 395 2388 100.0 38.57 23.2771
33724 2102-08-31 1.0 196 -0.0006 -0.0003 100.0 518.67 643.44 1593.15 1406.82 ... 521.18 2388.04 8217.24 8.4569 0.03 395 2388 100.0 38.62 23.2051
33725 2102-09-01 1.0 197 -0.0038 0.0001 100.0 518.67 643.26 1594.99 1419.36 ... 521.33 2388.08 8220.48 8.4711 0.03 395 2388 100.0 38.66 23.2699
33726 2102-09-02 NaN 198 0.0013 0.0003 100.0 518.67 642.95 1601.62 1424.99 ... 521.07 2388.05 8214.64 8.4903 0.03 396 2388 100.0 38.70 23.1855

5 rows × 27 columns

In [201]:
data.shape
Out[201]:
(33727, 27)
In [202]:
timestamp_column = 'timestamp'
target_column = 'window'

Engine settings¶

Parameters that need to be set are:

  • Prediction horizon (Sample + 1) - because we want to classify the next value only.
  • Back-test length (which defines the out-of-sample interval) - can be set arbitrarily; here it is set to the exact size of the test dataset.
  • allowOffsets is set to False, so that features completely disregard temporal (lagged) dictionaries, because we want to treat each record independently.

We also ask the engine for additional output to see details of sub-models, so we define the extendedOutputConfiguration parameter as well.

In [203]:
backtest_length = 13096   # information taken from original raw data
In [204]:
configuration_backtest = {
    'usage': {                                 
        'predictionTo': { 
            'baseUnit': 'Sample',             
            'offset': 1
        }, 
        'backtestLength': backtest_length  
    },
    'allowOffsets': False,
    'extendedOutputConfiguration': {
        'returnExtendedImportances': True,
    }
}

Experiment iteration(s)¶

We will run 3 different iterations (each with a different version of the dataset; the length of the window encoded in the data differs for each).

Also, we will evaluate results in two distinct ways:

  1. evaluate all points in out-of-sample interval,
  2. evaluate only the last data point per engine.

The cells below must be run for each iteration; results are collected in the results_all and results_last variables and rendered at the end of this notebook.

In [205]:
print('Current dataset', fpath)
Current dataset data_w_30.csv
In [206]:
#
# Beginning of experiment iteration cells
#

Proportion of positive values (1) for in-sample and out-of-sample intervals.

In [207]:
data.iloc[:-backtest_length][target_column].value_counts()   # in-sample
Out[207]:
0.0    17531
1.0     3100
Name: window, dtype: int64
In [208]:
data.iloc[-backtest_length:][target_column].value_counts()   # out-of-sample
Out[208]:
0.0    12764
1.0      331
Name: window, dtype: int64
In [209]:
data.iloc[-backtest_length:].tail()
Out[209]:
timestamp window cycle setting1 setting2 setting3 s1 s2 s3 s4 ... s12 s13 s14 s15 s16 s17 s18 s19 s20 s21
33722 2102-08-29 1.0 194 0.0049 0.0000 100.0 518.67 643.24 1599.45 1415.79 ... 520.69 2388.00 8213.28 8.4715 0.03 394 2388 100.0 38.65 23.1974
33723 2102-08-30 1.0 195 -0.0011 -0.0001 100.0 518.67 643.22 1595.69 1422.05 ... 521.05 2388.09 8210.85 8.4512 0.03 395 2388 100.0 38.57 23.2771
33724 2102-08-31 1.0 196 -0.0006 -0.0003 100.0 518.67 643.44 1593.15 1406.82 ... 521.18 2388.04 8217.24 8.4569 0.03 395 2388 100.0 38.62 23.2051
33725 2102-09-01 1.0 197 -0.0038 0.0001 100.0 518.67 643.26 1594.99 1419.36 ... 521.33 2388.08 8220.48 8.4711 0.03 395 2388 100.0 38.66 23.2699
33726 2102-09-02 NaN 198 0.0013 0.0003 100.0 518.67 642.95 1601.62 1424.99 ... 521.07 2388.05 8214.64 8.4903 0.03 396 2388 100.0 38.70 23.1855

5 rows × 27 columns

In [210]:
backtest = api_client.prediction_build_model_predict(data, configuration_backtest)     # running the RTInstantML forecasting using data and defined configuration
backtest.status                                                                        # status of the job
Out[210]:
'Finished'
In [211]:
backtest.result_explanations
Out[211]:
[]
In [212]:
out_of_sample_predictions = backtest.aggregated_predictions[1]['values']
In [213]:
out_of_sample_predictions.rename( columns = {'Prediction':target_column+'_pred'}, inplace=True)
In [214]:
out_of_sample_timestamps = out_of_sample_predictions.index.tolist()
In [215]:
evaluation_data = data.copy()

evaluation_data[ timestamp_column ] = pd.to_datetime(data[ timestamp_column ]).dt.tz_localize('UTC')
evaluation_data = evaluation_data[ evaluation_data[ timestamp_column ].isin( out_of_sample_timestamps ) ]

evaluation_data.set_index( timestamp_column,inplace=True)

evaluation_data = evaluation_data[ [ target_column,'cycle' ] ]
In [216]:
def encode_class( x ):
    # convert the predicted probability into a binary class using a 0.5 threshold
    if x < .5: return 0
    return 1
In [217]:
evaluation_data = evaluation_data.join( out_of_sample_predictions )

evaluation_data[ target_column+'_pred_p' ] =  evaluation_data[ target_column+'_pred' ]
evaluation_data[ target_column+'_pred' ] =  evaluation_data[ target_column+'_pred' ].apply( encode_class )

evaluation_data.head()
Out[217]:
window cycle window_pred window_pred_p
timestamp
2066-10-24 00:00:00+00:00 1.0 200 1 0.993112
2066-10-25 00:00:00+00:00 0.0 1 0 0.000000
2066-10-26 00:00:00+00:00 0.0 2 0 0.006201
2066-10-27 00:00:00+00:00 0.0 3 0 0.000000
2066-10-28 00:00:00+00:00 0.0 4 0 0.000000
In [218]:
# Run this cell for the last record evaluation only

# The raw data contains records for 100 engines, and the records for each engine ended at a certain cycle; thus, in our consolidated file, we can detect the last record of each engine by a sudden drop of the cycle value.

# Filter out evaluation data to keep only the last record for each engine.
# evaluation_data['cycle_diff'] = evaluation_data['cycle'].diff(-1)
# evaluation_data = evaluation_data[ evaluation_data['cycle_diff']>0 ]
# evaluation_data

No. of positive values in out-of-sample interval

In [219]:
evaluation_data[ evaluation_data[target_column]==1 ].shape[0]
Out[219]:
332

Insights - inspecting ML models¶

Simple and extended importances are available for you to see to what extent each predictor contributes to explaining the variance of the target variable.

In [220]:
simple_importances = backtest.predictors_importances['simpleImportances']
simple_importances = sorted(simple_importances, key = lambda i: i['importance'], reverse=True) 

simple_importances = pd.DataFrame.from_dict( simple_importances )
In [221]:
fig = go.Figure()

fig.add_trace(go.Bar( x = simple_importances['predictorName'],
                      y = simple_importances['importance'] )
             )

fig.update_layout(
        title='Simple importances'
)

fig.show()
In [222]:
extended_importances = backtest.predictors_importances['extendedImportances']
extended_importances = sorted(extended_importances, key = lambda i: i['importance'], reverse=True) 

extended_importances = pd.DataFrame.from_dict( extended_importances )

#extended_importances
In [223]:
fig = go.Figure()

fig.add_trace(go.Bar( x = extended_importances[ extended_importances['time'] == '[1]' ]['termName'],
                      y = extended_importances[ extended_importances['time'] == '[1]' ]['importance'] )
             )

fig.update_layout(
        title='Features generated from predictors used by model',
        height = 700
)

fig.show()

Evaluation of results¶

Results for out-of-sample interval.

Confusion matrix

In [224]:
TP = evaluation_data[ ( evaluation_data[ target_column+'_pred' ] == 1 ) & ( evaluation_data[target_column] == 1 ) ].shape[0]
FP = evaluation_data[ ( evaluation_data[ target_column+'_pred' ] == 1 ) & ( evaluation_data[target_column] == 0 ) ].shape[0]
TN = evaluation_data[ ( evaluation_data[ target_column+'_pred' ] == 0 ) & ( evaluation_data[target_column] == 0 ) ].shape[0]
FN = evaluation_data[ ( evaluation_data[ target_column+'_pred' ] == 0 ) & ( evaluation_data[target_column] == 1 ) ].shape[0]

print('True positive:', TP)
print('False positive:', FP)
print('True negative:', TN)
print('False negative:', FN)
True positive: 199
False positive: 50
True negative: 12714
False negative: 133
In [225]:
metrics_accuracy = evaluation_data[ evaluation_data[ target_column+'_pred' ] == evaluation_data[target_column]  ].shape[0] / evaluation_data.shape[0]
metrics_accuracy
Out[225]:
0.9860262675626146
In [226]:
cm = confusion_matrix( evaluation_data[ target_column ], evaluation_data[ target_column+'_pred' ]  )
In [227]:
cm = np.pad(cm, (1, 0), 'constant')   # pad the 2x2 confusion matrix to 3x3 with a leading row and column of zeros
cm[0][2] = 1                          # set the two empty corner cells to 1
cm[2][0] = 1

cm
Out[227]:
array([[    0,     0,     1],
       [    0, 12714,    50],
       [    1,   133,   199]])

Precision, Recall, Specificity, F1 score¶

Precision - proportion of positive data points that are correctly considered as positive, with respect to data points predicted as positive (both correctly and incorrectly).

Recall (true positive rate or Sensitivity) - proportion of positive data points that are correctly considered as positive, with respect to all positive data points.

Specificity (true negative rate) - proportion of negative data points that are correctly considered as negative, with respect to all negative data points.
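For reference, here is a minimal sketch computing these metrics directly from the confusion-matrix counts defined above (TP, FP, TN, FN); it should agree with the sklearn-based values in the cells that follow.

precision_direct   = TP / (TP + FP)                  # positive predictive value
recall_direct      = TP / (TP + FN)                  # sensitivity / true positive rate
specificity_direct = TN / (TN + FP)                  # true negative rate
f1_direct          = 2 * precision_direct * recall_direct / (precision_direct + recall_direct)

print(precision_direct, recall_direct, specificity_direct, f1_direct)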

In [228]:
metrics_precision = precision_score( evaluation_data[ target_column ] , evaluation_data[ target_column+'_pred' ]  )
metrics_precision
Out[228]:
0.7991967871485943
In [229]:
metrics_recall = recall_score( evaluation_data[ target_column ] ,  evaluation_data[ target_column+'_pred' ]  )
metrics_recall
Out[229]:
0.5993975903614458
In [230]:
metrics_specificity = TN / ( FP + TN )
metrics_specificity
Out[230]:
0.9960827326856785
In [231]:
metrics_f1 = 2 * (metrics_precision * metrics_recall) / (metrics_precision + metrics_recall)
metrics_f1
Out[231]:
0.6850258175559379
In [232]:
#
# End of experiment iteration cells
#

Summary¶

In this Use case we demonstrated that TIM can also be used for classification tasks. With default settings we quickly compared multiple scenarios. In the cells below, you can find a comparison between the various window lengths.

All records¶

In [233]:
#results_all[ iteration_code ] = { 'Window':iteration_code, 'Accuracy': metrics_accuracy, 'Precision': metrics_precision,'Recall':metrics_recall, 'F1-score':metrics_f1 }
In [234]:
results_for_all_test_records = pd.DataFrame.from_dict( results_all, orient='index' )
results_for_all_test_records
Out[234]:
Window Accuracy Precision Recall F1-score
7 7 0.999847 0.500000 1.000000 0.666667
15 15 0.997175 0.815789 0.508197 0.626263
30 30 0.986026 0.799197 0.599398 0.685026

Last record only¶

In [235]:
#results_last[ iteration_code ] = { 'Window':iteration_code, 'Accuracy':metrics_accuracy,'Precision':metrics_precision,'Recall':metrics_recall,'F1-score':metrics_f1 }
In [236]:
results_for_last_test_record = pd.DataFrame.from_dict( results_last, orient='index' )
results_for_last_test_record
Out[236]:
Window Accuracy Precision Recall F1-score
7 7 1.00 1.0 1.000000 1.000000
15 15 0.97 1.0 0.727273 0.842105
30 30 0.91 0.9 0.720000 0.800000