Configuration

The following subsections describe all available settings of an outlier detection job. Note that some parameters may overlap with the configuration of other jobs, e.g. TIM Detect's KPI-driven anomaly detection.

The settings are divided into two main parts:

configuration - the mathematical configuration used to build the model
data - data preprocessing and configuration

Mathematical configuration

Configuration parameter	build-model	detect	default
Model complexity	☑	☐	30
Sensitivity	☑	☐	1 (%)

☑ available in a given method
☐ not available in a given method

"configuration": {
    "maxModelComplexity": 30,
    "sensitivity": 1
}

Model complexity

This setting defines the maximal complexity to search for when building the Gaussian mixture model. It can be set to any value in the range 1–30; the default is 30.

"maxModelComplexity": 30
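A minimal sketch of how a client could assemble the "configuration" object while enforcing the documented 1–30 range and defaults before sending a request. The helper name build_configuration is hypothetical, and two checks are assumptions rather than documented rules: that the complexity must be an integer and that the sensitivity must be a positive percentage.

```python
import json


def build_configuration(max_model_complexity: int = 30, sensitivity: float = 1) -> dict:
    """Return the "configuration" object of an outlier detection request (hypothetical helper)."""
    # Assumption: complexity is an integer; bool is rejected because it subclasses int.
    if isinstance(max_model_complexity, bool) or not isinstance(max_model_complexity, int):
        raise TypeError("maxModelComplexity must be an integer")
    # Documented range: 1-30, default 30.
    if not 1 <= max_model_complexity <= 30:
        raise ValueError("maxModelComplexity must be in the range 1-30")
    # Assumption: sensitivity is a percentage and must be positive.
    if sensitivity <= 0:
        raise ValueError("sensitivity must be a positive percentage")
    return {"maxModelComplexity": max_model_complexity, "sensitivity": sensitivity}


if __name__ == "__main__":
    # Serialize with the defaults; this prints the same object as the JSON snippet above.
    print(json.dumps({"configuration": build_configuration()}, indent=4))
```

Validating on the client side only catches obvious mistakes early; the server remains the authority on which values it accepts.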

Sensitivity

The sensitivity setting is a percentage that defines how sensitive the underlying model is to outliers. In general, the higher the sensitivity, the more outliers are detected. If the parameter is not specified, TIM uses a sensitivity of 1 %. Read more about sensitivity and how it is connected with the anomaly indicator in the section about the anomaly indicator.

"sensitivity": 1
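To build intuition for why a higher sensitivity flags more outliers, the sketch below treats sensitivity as a simple quantile cut: roughly the s % of samples with the highest anomaly scores are flagged. This is a deliberately simplified stand-in, not TIM's documented mechanism (see the anomaly indicator section for that); the function name flag_outliers and the input scores are hypothetical.

```python
from __future__ import annotations

import math


def flag_outliers(scores: list[float], sensitivity: float = 1.0) -> list[bool]:
    """Flag about `sensitivity` percent of samples with the highest scores.

    Simplified illustration only: the threshold is the k-th highest score,
    where k = ceil(n * sensitivity / 100). Ties at the threshold are all flagged.
    """
    n = len(scores)
    k = math.ceil(n * sensitivity / 100)
    if k == 0:
        return [False] * n
    threshold = sorted(scores, reverse=True)[k - 1]
    return [s >= threshold for s in scores]


if __name__ == "__main__":
    scores = [float(i) for i in range(100)]
    # Raising sensitivity from 1 % to 5 % increases the number of flagged samples.
    print(sum(flag_outliers(scores, 1)), sum(flag_outliers(scores, 5)))
```

On 100 distinct scores, a sensitivity of 1 flags a single sample and a sensitivity of 5 flags five, which mirrors the documented trend.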

Data configuration

Configuration parameter	build-model	detect	default
Version	☑	☑	Last valid version of the dataset
Rows	☑	☑	All rows
Columns	☑	☐	All columns / columns from model
Time scale	☑	☐	Original sampling period (estimated from the dataset)

☑ available in a given method
☐ not available in a given method
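The two availability tables can be transcribed into a lookup so that a request is checked before it is sent, for example to catch "columns" passed to detect. This is a minimal sketch: the setting keys come from the JSON snippets in this section, while the request layout (top-level "configuration" and "data" objects) and the name unsupported_settings are assumptions.

```python
# Availability per method, transcribed from the two tables in this section.
AVAILABILITY = {
    "configuration": {
        "maxModelComplexity": {"build-model"},
        "sensitivity": {"build-model"},
    },
    "data": {
        "version": {"build-model", "detect"},
        "rows": {"build-model", "detect"},
        "columns": {"build-model"},
        "timeScale": {"build-model"},
    },
}


def unsupported_settings(method: str, request: dict) -> list:
    """Return "part.key" for every setting in `request` that `method` does not accept."""
    problems = []
    for part, settings in request.items():
        allowed = AVAILABILITY.get(part)
        if allowed is None:
            problems.append(part)  # unknown top-level part
            continue
        for key in settings:
            if method not in allowed.get(key, set()):
                problems.append(f"{part}.{key}")
    return problems
```

For a detect request that carries a sensitivity and a column list, the helper reports both keys, since the tables mark them as available only in build-model.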

Version

This setting specifies the id of the dataset version that should be used for outlier detection. If not specified, the last valid (successfully updated) version of the dataset is used.

"version": {
    "id": "afdbb647-22cf-4576-8b82-4b71d4a10e5f"
}
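The default described above (use the last successfully updated version when no id is given) can be sketched as a small resolver. The DatasetVersion record, its field names, and the "Finished" status value are hypothetical illustrations, not TIM's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class DatasetVersion:
    # Hypothetical record; field names and status values are illustrative only.
    id: str
    created: datetime
    status: str  # "Finished" is assumed to mean a successful update


def resolve_version(versions: list[DatasetVersion], requested_id: str | None = None) -> DatasetVersion:
    """Return the requested version, or the newest successfully updated one."""
    if requested_id is not None:
        for v in versions:
            if v.id == requested_id:
                return v
        raise LookupError(f"version {requested_id} not found")
    valid = [v for v in versions if v.status == "Finished"]
    if not valid:
        raise LookupError("dataset has no successfully updated version")
    return max(valid, key=lambda v: v.created)
```

A newer version whose update failed is skipped, so the resolver falls back to the most recent valid one.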

Rows

The rows setting defines which samples are used for model building or detection, depending on the method. The samples can be specified as an array of timestamp ranges.

"rows": [
    {
        "from": "2009-06-01 00:00:00",
        "to": "2009-06-10 23:00:00"
    },
    {
        "from": "2009-05-01 00:00:00",
        "to": "2009-05-10 23:00:00"  
    }
]
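A minimal sketch of how an array of timestamp ranges selects samples: a timestamp is used if it falls inside any of the ranges, and the ranges do not need to be in chronological order. Treating both bounds as inclusive is an assumption, and in_rows is a hypothetical name.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format used in the rows example


def in_rows(timestamp: str, rows: list) -> bool:
    """True if `timestamp` lies in any from/to range (bounds assumed inclusive)."""
    t = datetime.strptime(timestamp, FMT)
    return any(
        datetime.strptime(r["from"], FMT) <= t <= datetime.strptime(r["to"], FMT)
        for r in rows
    )


if __name__ == "__main__":
    rows = [
        {"from": "2009-06-01 00:00:00", "to": "2009-06-10 23:00:00"},
        {"from": "2009-05-01 00:00:00", "to": "2009-05-10 23:00:00"},
    ]
    print(in_rows("2009-06-05 12:00:00", rows), in_rows("2009-05-20 00:00:00", rows))
```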

Alternatively, a relative notation can be used: an integer value n together with a base unit (one of Day, Hour, Minute, Second or Sample), which defines the length of the time range. The type of the relative range defines where the range starts and in which direction it is calculated. Last starts from the last non-missing observation in the dataset (the newest one) and goes backwards; First starts from the first non-missing observation in the dataset (the oldest one) and goes forward. If no type is specified, the default is Last.

"rows": {
    "type": "Last",
    "baseUnit": "Day",
    "value": 2
}
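The following sketch resolves a relative rows specification into concrete timestamps, following the rules above: anchor at the newest (Last) or oldest (First) non-missing observation and extend n units backwards or forwards. Several details are assumptions, not documented behavior: the window excludes its far end (so "Last 2 Day" on hourly data yields 48 samples), Sample counts every observation from the anchor, and resolve_relative_rows is a hypothetical name.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Time-based base units; "Sample" is handled as a count of observations.
UNITS = {
    "Day": timedelta(days=1),
    "Hour": timedelta(hours=1),
    "Minute": timedelta(minutes=1),
    "Second": timedelta(seconds=1),
}


def resolve_relative_rows(observations: list[tuple[datetime, float | None]], spec: dict) -> list[datetime]:
    """Return the timestamps selected by a relative rows spec (sketch)."""
    kind = spec.get("type", "Last")  # default type is Last
    n = spec["value"]
    ordered = sorted(observations, key=lambda obs: obs[0])
    present = [t for t, v in ordered if v is not None]
    if not present:
        return []
    if kind == "Last":
        anchor = present[-1]  # newest non-missing observation
        candidates = [t for t, _ in ordered if t <= anchor][::-1]  # going backwards
    elif kind == "First":
        anchor = present[0]  # oldest non-missing observation
        candidates = [t for t, _ in ordered if t >= anchor]  # going forward
    else:
        raise ValueError("type must be Last or First")
    if spec["baseUnit"] == "Sample":
        selected = candidates[:n]
    else:
        span = n * UNITS[spec["baseUnit"]]
        selected = [t for t in candidates if abs(t - anchor) < span]
    return sorted(selected)
```

Trailing missing observations are skipped when choosing the anchor, which matches the "last non-missing observation" rule.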

Columns

This setting lists all columns (given either by their names or by their numbers) that should be used for model building. If not provided, TIM uses all available columns.

"columns": [5, "y"]
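Because the list may mix column numbers and names, as in [5, "y"], a client may want to resolve it against the dataset header to see which columns will actually be used. This sketch assumes column numbers are 1-based positions in the header, which is not stated in this section; resolve_columns is a hypothetical name.

```python
from __future__ import annotations


def resolve_columns(header: list[str], columns: list[int | str] | None = None) -> list[str]:
    """Map a mixed list of column numbers and names to column names.

    Assumes numbers are 1-based positions in `header`; with no list given,
    the full header is returned.
    """
    if columns is None:
        return list(header)
    resolved = []
    for c in columns:
        if isinstance(c, int):
            if not 1 <= c <= len(header):
                raise IndexError(f"column number {c} out of range")
            resolved.append(header[c - 1])
        elif c in header:
            resolved.append(c)
        else:
            raise KeyError(f"unknown column name {c!r}")
    return resolved
```

With a header of timestamp, y, x1, x2, x3, the list [5, "y"] resolves to x3 and y under the 1-based assumption.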

Time scale

This setting rescales the original dataset to another sampling period. The baseUnit of the rescaling is limited to one of Day, Hour, Minute or Second. If not set, the originally estimated sampling period is used. Time scaling only works from shorter sampling periods to longer ones (e.g. from hourly to daily data), and it does not work for data sampled monthly.

"timeScale": {
    "baseUnit": "Day",
    "value": 2
}
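The constraints above can be checked before submitting a job: the base unit must be one of the four allowed units, the new period must not be shorter than the original one, and monthly data cannot be rescaled. In this minimal sketch, target_period and the explicit original_is_monthly flag are hypothetical, and allowing a target equal to the original period (a no-op) is an assumption.

```python
from datetime import timedelta

UNITS = {
    "Day": timedelta(days=1),
    "Hour": timedelta(hours=1),
    "Minute": timedelta(minutes=1),
    "Second": timedelta(seconds=1),
}


def target_period(time_scale: dict, original: timedelta, original_is_monthly: bool = False) -> timedelta:
    """Return the rescaled sampling period, enforcing the documented limits."""
    if original_is_monthly:
        raise ValueError("time scaling does not work for monthly data")
    unit = time_scale["baseUnit"]
    if unit not in UNITS:
        raise ValueError("baseUnit must be one of Day, Hour, Minute or Second")
    new = time_scale["value"] * UNITS[unit]
    # Only shorter -> longer is allowed; an equal period is assumed to be a no-op.
    if new < original:
        raise ValueError("time scaling only goes from shorter to longer sampling periods")
    return new
```

For hourly data, the example above ("baseUnit": "Day", "value": 2) yields a two-day sampling period, while a 30-minute target would be rejected.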
