Anomaly | In data mining, anomaly detection is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data. Typically the anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical problems or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations and exceptions. |
Automated Machine Learning | Automated machine learning (AutoML) automates the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply the appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods to make the dataset amenable to machine learning. |
Backtesting | The act of building a model on In-Sample historical data and evaluating it on Out-of-Sample historical data to estimate how the model would perform in production. |
Business case | Business cases are created to help decision-makers ensure that the proposed initiative will deliver value compared to alternative initiatives, based on the objectives and expected benefits laid out in the business case. Performance indicators are defined so that the desired outcome can be evaluated. |
Dataset | A tabular structure containing Target and Predictor time-series in columns. Individual rows contain the data for various time stamps. |
Data availability | In real life, not all values in a data set are available for all timestamps. Typically, actual values for the target column arrive with a delay, some predictors contain predicted values up to the end of the prediction horizon (e.g. a weather forecast, or information about public holidays), and some predictors lag behind. |
Dictionary | Set of transformations of the same type used for feature generation. In the process of expansion, TIM creates many new features from original variables (predictors and/or target) to enhance the final model's performance. |
Equidistant time series | Time series sampled at a constant rate. |
Experiment | A trial in which a data scientist changes individual settings of a model building definition in order to improve the performance of the models built under it. |
Feature | In the process of expansion, TIM creates many new features from original variables to enhance the final model's performance. This is done through sets of common transformations (dictionaries) resulting in new features. After model building, the different features used in the model can be observed in the model's tree map. |
Feature engineering | The process of extracting features from data to improve the performance of a machine learning model. |
Forecast | Values calculated for future timestamps. The calculation is based on the data set and the model. |
Forecasting | Forecasting is the process of making predictions of the future based on past and present data and most commonly by analysis of trends. |
Forecasting scenario | See Forecasting situation. |
Forecasting situation | Data availability is closely related to the forecasting situation, which is described by the following parameters: the timestamp at which you make the forecast, i.e. the timestamp of the last available target value; the availability of data for each predictor with respect to the last target timestamp; the prediction horizon, i.e. how many steps ahead of the last target timestamp you are predicting. |
In-Sample | Interval of data set used for model building. |
Key Performance Indicator (KPI) | In anomaly detection, the variable that represents the main indicator of the system. |
Math settings | Parameters to TIM Engine that influence which transformations are used and how they are used. |
Model | A model is a representation of what a machine learning system has learned from data. It is a structure that is able to solve tasks such as prediction, classification, anomaly detection, etc. |
Model ZOO | A set of different models, each working for a different availability of data. |
Out-Of-Sample | Interval of data set that was not used for model building. Performance of a model is typically measured on out-of-sample data. |
Predictor | A variable in the data set that helps to explain the variance of the Target variable. Model features are derived from predictors. |
Predictor candidate | A variable in the data set that may or may not help to explain the variance of the Target variable. TIM Engine decides during model building whether it is used for model features. |
Prediction horizon | A parameter that specifies how many samples ahead to forecast, ranging from seconds to months or years ahead, depending on your data and use case. |
Prediction start | The first point of a forecast (prediction). |
Predictive analytics | Predictive analytics is the branch of advanced analytics used to make predictions about unknown future events. It typically uses many techniques from data mining, statistics, modeling, machine learning and AI to analyze current data and create forecasts of the future. |
Predictor availability | The difference between prediction start and the most recent value of a predictor. |
Prescriptive analytics | Prescriptive analytics is the area of business analytics dedicated to finding the best course of action for a given situation. Prescriptive analytics is related to both descriptive and predictive analytics. While descriptive analytics aims to provide insight into what has happened and predictive analytics helps model and forecast what might happen, prescriptive analytics seeks to determine the best solution or outcome among various choices, given the known parameters. |
REST API | REST or RESTful API design (Representational State Transfer) is designed to take advantage of existing protocols. While REST can be used over nearly any protocol, it usually takes advantage of HTTP(S) when used for Web APIs. |
Sample | One record (row) in the data set. |
Sampling rate | Number of samples of an equidistant time series per unit of time. |
Sampling period | Time difference between two consecutive samples of an equidistant time series. |
Target | In forecasting, the variable to be predicted. In classification, the variable to be classified. The equivalent in anomaly detection tasks is the KPI. |
Target availability | The difference between prediction start and the most recent value of a target variable. |
Timestamp | A sequence of characters or encoded information identifying when a certain event occurred, usually giving date and time of day, sometimes accurate to a small fraction of a second. |
TIM Engine API | REST API that provides interface to the TIM model building engine. |
TIM Studio | TIM Studio is a web application that offers an intuitive interface to TIM Engine. It allows users to organize and explore datasets, experiment iteratively, inspect models, and organize work with Use Cases. |
Time series | A time-series is a sequence of observations, usually ordered in time. Examples of time-series in a few domains - Meteorology: weather variables like temperature, pressure, wind; Economy and finance: GDP, stock price values, exchange rate spread; Industry: electric load, power consumption, voltage, sensors; Biomedicine: physiological signals (EEG), heart-rate, patient temperature, etc. |
Training region | Interval of data set used for model building. Synonymous with In-Sample interval. |
Training results | Typically values of evaluation metrics for Out-of-Sample and In-Sample intervals. |
Transformation | Mathematical operation applied to initial variables in order to get new features and extract more information from the data. |
Variable | Data in a data set are organized in tabular form with one Timestamp column, one Target column and/or Predictor columns. In mathematical terminology, "vector" can be used instead of "column". A variable is either the target vector or a predictor vector without any transformations applied to it. |
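
Several of the terms above (sampling period, sampling rate, target and predictor availability, In-Sample and Out-of-Sample) can be illustrated on a small made-up data set. The following is a minimal sketch using only the Python standard library; it does not call TIM Engine, and the data values, the variable names, the choice of prediction start as one sampling period after the last available target, and the sign convention for availability are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical hourly data set: (timestamp, target, predictor).
# None marks a value that is not yet available.
start = datetime(2024, 1, 1, 0, 0)
values = [
    (10.0, 1.5), (11.0, 1.7), (12.0, 1.6), (13.0, 1.8),
    (None, 1.9), (None, 2.0),  # target lags; predictor is known ahead
]
rows = [(start + timedelta(hours=i), t, p) for i, (t, p) in enumerate(values)]
timestamps = [ts for ts, _, _ in rows]

# Sampling period: time difference between consecutive samples.
periods = {b - a for a, b in zip(timestamps, timestamps[1:])}
is_equidistant = len(periods) == 1
sampling_period = periods.pop()

# Sampling rate: number of samples per unit of time (here, per day).
sampling_rate_per_day = timedelta(days=1) / sampling_period

last_target_ts = max(ts for ts, t, _ in rows if t is not None)
last_predictor_ts = max(ts for ts, _, p in rows if p is not None)

# Assumption: prediction start is one sampling period after the last target.
prediction_start = last_target_ts + sampling_period

# Availability in samples relative to prediction start
# (assumed sign: negative means the variable lags behind prediction start).
target_availability = (last_target_ts - prediction_start) // sampling_period
predictor_availability = (last_predictor_ts - prediction_start) // sampling_period

# Backtesting split: build on In-Sample rows, evaluate on Out-of-Sample rows.
known = [r for r in rows if r[1] is not None]
in_sample, out_of_sample = known[:3], known[3:]

print(sampling_period, sampling_rate_per_day)
print(target_availability, predictor_availability)
print(len(in_sample), len(out_of_sample))
```

In this sketch the target is one sample behind the prediction start while the predictor is known one sample beyond it, which is the kind of situation described under Data availability.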