Configuration
The following subsections describe all available settings of an outlier detection job. Note that some parameters may overlap with the configuration of other jobs, e.g. TIM Detect's KPI-driven anomaly detection.
The settings are divided into two main parts:
- configuration - mathematical configuration used to build the model
- data - data preprocessing and configuration
Mathematical configuration
Configuration parameter | build-model | detect | default |
---|---|---|---|
Model complexity | ☑ | ☐ | 30 |
Sensitivity | ☑ | ☐ | 1 |
☑ available in a given method
☐ not available in a given method
"configuration": {
  "maxModelComplexity": 30,
  "sensitivity": 1
}
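As a rough illustration, the fragment above could be assembled and range-checked in client code before a build-model request is sent. The sketch below is plain Python. `build_configuration` is a hypothetical helper, not part of any TIM client, and the 0 to 100 bound on sensitivity is an assumption based on it being a percentage. The 1 to 30 bound and the defaults are taken from this page.

```python
import json

# Limits and defaults as documented on this page.
MAX_MODEL_COMPLEXITY_RANGE = (1, 30)
DEFAULT_MAX_MODEL_COMPLEXITY = 30
DEFAULT_SENSITIVITY = 1  # percent


def build_configuration(max_model_complexity=DEFAULT_MAX_MODEL_COMPLEXITY,
                        sensitivity=DEFAULT_SENSITIVITY):
    """Return the "configuration" object (hypothetical helper, illustration only)."""
    low, high = MAX_MODEL_COMPLEXITY_RANGE
    if not low <= max_model_complexity <= high:
        raise ValueError(f"maxModelComplexity must be between {low} and {high}")
    # Assumption: sensitivity is a percentage, so it is kept within 0-100 here.
    if not 0 <= sensitivity <= 100:
        raise ValueError("sensitivity is expected to be a percentage (0-100)")
    return {
        "maxModelComplexity": max_model_complexity,
        "sensitivity": sensitivity,
    }


# Serialize the object under the "configuration" key used in the fragment above.
payload = json.dumps({"configuration": build_configuration()}, indent=2)
```

Validating on the client side only catches obvious mistakes early; the service remains the authority on which values are accepted.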
Model complexity
This setting defines the maximal complexity to search for when building the Gaussian mixture model. It can be set to a value from 1 to 30; the default is 30.
"maxModelComplexity": 30
Sensitivity
The sensitivity setting is a percentage that defines how sensitive the underlying model is to outliers. In general, the higher the sensitivity, the more outliers are detected. If the parameter is not specified, TIM uses a sensitivity of 1%. Read more about sensitivity and how it relates to the anomaly indicator in the section about the anomaly indicator.
"sensitivity": 1
Data configuration
Configuration parameter | build-model | detect | default |
---|---|---|---|
Version | ☑ | ☑ | Last version of the dataset |
Rows | ☑ | ☑ | All rows |
Columns | ☑ | ☐ | All columns / columns from model |
Time scale | ☑ | ☐ | Originally estimated from dataset |
☑ available in a given method
☐ not available in a given method
Version
This setting specifies the ID of the dataset version to use for outlier detection. If it is not specified, the last valid (successfully updated) version of the dataset is used.
"version": {
  "id": "afdbb647-22cf-4576-8b82-4b71d4a10e5f"
}
Rows
The rows setting defines which samples are used for model building or detection, depending on the method used. The timestamps can be specified as an array of timestamp ranges.
"rows": [
  {
    "from": "2009-06-01 00:00:00",
    "to": "2009-06-10 23:00:00"
  },
  {
    "from": "2009-05-01 00:00:00",
    "to": "2009-05-10 23:00:00"
  }
]
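An array of timestamp ranges like the one above could be generated from datetime pairs. This is a minimal sketch in Python; `build_rows` is a hypothetical helper, and the timestamp format is simply copied from the example rather than taken from a formal specification.

```python
from datetime import datetime

# Format inferred from the example above (assumption, not a formal spec).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_rows(ranges):
    """Turn (start, end) datetime pairs into the "rows" array."""
    rows = []
    for start, end in ranges:
        if start > end:
            raise ValueError(f"range starts after it ends: {start} > {end}")
        rows.append({
            "from": start.strftime(TIMESTAMP_FORMAT),
            "to": end.strftime(TIMESTAMP_FORMAT),
        })
    return rows


# The same two ranges as in the fragment above.
rows = build_rows([
    (datetime(2009, 6, 1, 0), datetime(2009, 6, 10, 23)),
    (datetime(2009, 5, 1, 0), datetime(2009, 5, 10, 23)),
])
```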
Alternatively, a relative notation can be used, expressed as an integer n with its base unit (one of Day, Hour, Minute, Second and Sample), which defines the length of the time range. The type of the relative range defines where the range starts and the direction in which it is calculated: Last starts from the last non-missing observation (the newest observation) in the dataset and goes backwards, while First starts from the first non-missing observation (the oldest observation) in the dataset and goes forward. If no type is specified, the default is Last.
"rows": {
  "type": "Last",
  "baseUnit": "Day",
  "value": 2
}
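To make the Last and First semantics concrete, the sketch below resolves a relative rows setting against a list of observation timestamps. It is an illustration in Python, not TIM's implementation: `resolve_relative_rows` is a hypothetical function, and the boundary handling (anchor observation included, far boundary excluded) is an assumption that the service may not share.

```python
from datetime import datetime, timedelta

UNIT_TO_TIMEDELTA = {
    "Day": timedelta(days=1),
    "Hour": timedelta(hours=1),
    "Minute": timedelta(minutes=1),
    "Second": timedelta(seconds=1),
}


def resolve_relative_rows(timestamps, rows):
    """Return the timestamps selected by a relative "rows" setting.

    `timestamps` holds the timestamps of the non-missing observations.
    Assumption: the anchor observation is included and the far boundary
    is exclusive; the real service may treat boundaries differently.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return []
    range_type = rows.get("type", "Last")  # "Last" is the documented default
    unit, value = rows["baseUnit"], rows["value"]
    if value < 1:
        raise ValueError("value must be a positive integer")

    if unit == "Sample":
        # Count observations instead of measuring elapsed time.
        return ordered[-value:] if range_type == "Last" else ordered[:value]

    span = UNIT_TO_TIMEDELTA[unit] * value
    if range_type == "Last":
        newest = ordered[-1]
        return [t for t in ordered if newest - span < t <= newest]
    oldest = ordered[0]
    return [t for t in ordered if oldest <= t < oldest + span]


# Five days of hourly observations, then the last two days of them.
hourly = [datetime(2009, 6, 1) + timedelta(hours=i) for i in range(120)]
last_two_days = resolve_relative_rows(
    hourly, {"type": "Last", "baseUnit": "Day", "value": 2}
)
```

With hourly data, the selection under these assumptions covers 48 observations ending at the newest one.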
Columns
This setting lists all columns (given either by their names or numbers) that should be used for model building. If not provided, TIM will use all available columns.
"columns": [5, "y"]
Time scale
This setting rescales the original dataset to another sampling period. The baseUnit of the rescaling is limited to one of Day, Hour, Minute or Second. If not set, the originally estimated sampling period is used. Time scaling only works from shorter sampling periods to longer ones (for example, from hourly to daily data) and does not work for data sampled monthly.
"timeScale": {
  "baseUnit": "Day",
  "value": 2
}
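The direction constraint can be checked before a request is sent. The sketch below, in Python, converts a timeScale object to a duration and compares it with the dataset's sampling period. Both functions are hypothetical helpers; treating an equal period as allowed is an assumption, and monthly data is not modelled because a month has no fixed length.

```python
from datetime import timedelta

UNIT_TO_TIMEDELTA = {
    "Day": timedelta(days=1),
    "Hour": timedelta(hours=1),
    "Minute": timedelta(minutes=1),
    "Second": timedelta(seconds=1),
}


def time_scale_to_timedelta(time_scale):
    """Convert a "timeScale" object to the sampling period it requests."""
    unit = time_scale["baseUnit"]
    if unit not in UNIT_TO_TIMEDELTA:
        raise ValueError(f"unsupported baseUnit: {unit!r}")
    return UNIT_TO_TIMEDELTA[unit] * time_scale["value"]


def can_rescale(original_period, time_scale):
    """True when the requested period is not shorter than the original one.

    Assumption: rescaling to the same period is treated as allowed.
    """
    return time_scale_to_timedelta(time_scale) >= original_period


# Hourly data can be rescaled to a two-day period, but daily data
# cannot be rescaled to an hourly period.
hourly_to_two_days = can_rescale(timedelta(hours=1), {"baseUnit": "Day", "value": 2})
daily_to_hourly = can_rescale(timedelta(days=1), {"baseUnit": "Hour", "value": 1})
```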