Dataset Management
Once the Python client is installed, imported and initialized, the user is ready to start using its functionality.
The first step to using TIM is to start working with datasets. This section explains how to use the TIM Python client to upload, update, delete and list datasets in the TIM repository.
upload_dataset - upload a dataset
upload_dataset(self, dataset: pandas.core.frame.DataFrame, configuration: tim.data_sources.dataset.types.UploadDatasetConfiguration = {}, wait_to_finish: bool = True, handle_status_poll: Optional[Callable[[tim.data_sources.dataset.types.DatasetStatusResponse], NoneType]] = None) -> Union[tim.data_sources.dataset.types.UploadDatasetResultsResponse, tim.data_sources.dataset.types.UploadDatasetResponse]
The upload_dataset method serves to upload a dataset to the TIM repository. This method is called on the authenticated instance of Tim created as described in Authentication ("client"), like in the following statement:
client.upload_dataset(dataset = <dataset>, configuration = <configuration>, wait_to_finish = <wait to finish>, handle_status_poll = <callback function>)
using keyword arguments, or in the following statement:
client.upload_dataset(<dataset>, <configuration>, <wait to finish>, <callback function>)
using positional arguments, where <dataset> and <configuration> are replaced by the DataFrame and Dictionary representing them, respectively, <wait to finish> is replaced by an optional boolean indicating whether to wait for the upload to finish before returning, and <callback function> is replaced by an optional callback function for status polling.
The arguments are:
- dataset: a DataFrame containing the dataset, which consists of time-series data,
- configuration: a Dictionary containing metadata of the dataset. This is an optional argument; the available keys are:
  - timestampFormat: a string describing the format of the timestamps,
  - timestampColumn: a string containing the name of the timestamp column, or an integer containing the index of the timestamp column,
  - decimalSeparator: the decimal separator used,
  - name: the desired name for the dataset in the TIM repository,
  - description: the desired description for the dataset in the TIM repository,
  - samplingPeriod: the sampling period of the data,
- wait_to_finish: a boolean indicating whether to wait for the upload to finish before returning; this is an optional argument with a default value of True,
- handle_status_poll: an optional callback function that is passed the status and progress of the dataset upload each time it is polled.
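The arguments above can be put together as in the following minimal sketch. The configuration values, the helper name upload and the list status_updates are illustrative assumptions, not part of the TIM client; only the upload_dataset call and its keyword arguments come from the signature documented here. The client argument is expected to be the authenticated Tim instance and dataset a pandas DataFrame.

```python
from typing import Any, Dict, List

# Keys are taken from the configuration list above; the values are
# illustrative assumptions, not defaults of the TIM client.
configuration: Dict[str, Any] = {
    "timestampColumn": "timestamp",
    "decimalSeparator": ".",
    "name": "Example dataset",
    "description": "Measurements uploaded from Python",
}

# Hypothetical container that collects every status passed to the callback.
status_updates: List[Any] = []


def handle_status_poll(status: Any) -> None:
    """Record each status response received while the upload is polled."""
    status_updates.append(status)


def upload(client: Any, dataset: Any) -> Any:
    """Upload `dataset` with the configuration above and wait for the result."""
    return client.upload_dataset(
        dataset=dataset,
        configuration=configuration,
        wait_to_finish=True,
        handle_status_poll=handle_status_poll,
    )
```

Keyword arguments are used here so that the call stays readable when optional arguments are omitted or reordered.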
If wait_to_finish is set to True, this method returns the following data:
- metadata: a Dictionary if the upload was successful, containing the following keys:
  - id: the ID of the uploaded dataset,
  - name: the name of the uploaded dataset,
  - description: the description of the uploaded dataset,
  - isFavorite: a flag indicating whether this dataset is a favorite,
  - estimatedSamplingPeriod: the estimated sampling period of this dataset,
  - createdAt: the time of creation of this dataset,
  - createdBy: the ID of the user who created/uploaded this dataset,
  - updatedAt: the time of the last update of this dataset (if applicable),
  - updatedBy: the ID of the user who last updated this dataset (if applicable),
  - latestVersion: a Dictionary containing the following keys:
    - id: the ID of the latest version of the dataset,
    - status: the status of the latest version of the dataset, possible values are "Failed", "Finished" and "FinishedWithWarning",
    - numberOfObservations: the number of observations in the latest version of the dataset,
    - numberOfVariables: the number of variables in the latest version of the dataset,
    - firstTimestamp: the timestamp of the first observation in the dataset,
    - lastTimestamp: the timestamp of the last observation in the dataset,
  - workspace: a Dictionary containing the following keys:
    - id: the ID of the workspace in which the dataset resides,
    - name: the name of the workspace in which the dataset resides;
- logs: a list of Dictionaries, each of which contains the following keys:
  - message: the log message,
  - messageType: the type of the message, possible values are "Info", "Debug" and "Warning",
  - createdAt: the time of creation of the log,
  - origin: the origin of the log, in this case this will be "Upload".
Upon successful upload of the dataset, metadata will be populated; logs will be returned in any case, including failed uploads.
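Because logs are returned even when metadata is not populated, it can help to inspect both after an upload. The sketch below is one possible way to do so; summarize_upload is a hypothetical helper, and it takes metadata and logs as separate arguments because how they are read from the returned object may depend on the client version. It assumes that metadata is None when the upload failed.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_upload(
    metadata: Optional[Dict[str, Any]],
    logs: List[Dict[str, Any]],
) -> Tuple[Optional[str], List[str]]:
    """Return the dataset ID (or None on failure) and all warning messages."""
    # Collect the text of every log entry flagged as a warning.
    warnings = [log["message"] for log in logs if log.get("messageType") == "Warning"]

    # Assumption: metadata is None when the upload did not succeed.
    if metadata is None:
        return None, warnings

    # A populated metadata Dictionary can still report a failed version.
    if metadata["latestVersion"]["status"] == "Failed":
        return None, warnings

    return metadata["id"], warnings
```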
If wait_to_finish is set to False, this method returns a Dictionary with the following keys:
- id: the ID of the uploaded dataset,
- version: a Dictionary containing the key id, referring to the ID of the dataset version that was created by this upload.
If an error is encountered, a Dictionary will be returned with the keys message and code containing additional information about the error.
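When wait_to_finish is False, the caller has to tell the two response shapes apart. The following sketch shows one way to do that; parse_async_upload_response is a hypothetical helper, and raising a RuntimeError on an error response is a design choice of this example rather than behaviour of the TIM client.

```python
from typing import Any, Dict, Tuple


def parse_async_upload_response(response: Dict[str, Any]) -> Tuple[str, str]:
    """Return (dataset ID, version ID), or raise if the response is an error."""
    # An error response carries the keys "message" and "code".
    if "message" in response and "code" in response:
        raise RuntimeError(f"Upload failed ({response['code']}): {response['message']}")

    # A successful response carries the dataset ID and the new version's ID.
    return response["id"], response["version"]["id"]
```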
update_dataset - update a dataset by uploading a new version
update_dataset(self, dataset_id: str, dataset_version: pandas.core.frame.DataFrame, configuration: tim.data_sources.dataset.types.UpdateDatasetConfiguration = {}, wait_to_finish: bool = True, handle_status_poll: Optional[Callable[[tim.data_sources.dataset.types.DatasetStatusResponse], NoneType]] = None) -> Union[tim.data_sources.dataset.types.UploadDatasetResultsResponse, tim.data_sources.dataset.types.UpdateDatasetResponse]
The update_dataset method serves to update a dataset in the TIM repository by uploading a new version. This method is called on the authenticated instance of Tim created as described in Authentication ("client"), like in the following statement:
client.update_dataset(dataset_id = <dataset ID>, dataset_version = <dataset version>, configuration = <configuration>, wait_to_finish = <wait to finish>, handle_status_poll = <callback function>)
using keyword arguments, or in the following statement:
client.update_dataset(<dataset ID>, <dataset version>, <configuration>, <wait to finish>, <callback function>)
using positional arguments, where <dataset ID> is replaced by the ID of the dataset to update, <dataset version> and <configuration> are replaced by the DataFrame and Dictionary representing them, respectively, <wait to finish> is replaced by an optional boolean indicating whether to wait for the update to finish before returning, and <callback function> is replaced by an optional callback function for status polling.
The arguments are:
- dataset_id: the ID of the dataset to update,
- dataset_version: a DataFrame containing the dataset version, which consists of time-series data,
- configuration: a Dictionary containing metadata of the dataset. This is an optional argument; the available keys are:
  - timestampFormat: a string describing the format of the timestamps,
  - timestampColumn: a string containing the name of the timestamp column, or an integer containing the index of the timestamp column,
  - decimalSeparator: the decimal separator used,
- wait_to_finish: a boolean indicating whether to wait for the update to finish before returning; this is an optional argument with a default value of True,
- handle_status_poll: an optional callback function that is passed the status and progress of the dataset update each time it is polled.
If wait_to_finish is set to True, this method returns the following data:
- metadata: a Dictionary if the update was successful, containing the following keys:
  - id: the ID of the updated dataset,
  - name: the name of the updated dataset,
  - description: the description of the updated dataset,
  - isFavorite: a flag indicating whether this dataset is a favorite,
  - estimatedSamplingPeriod: the estimated sampling period of this dataset,
  - createdAt: the time of creation of this dataset,
  - createdBy: the ID of the user who created/uploaded this dataset,
  - updatedAt: the time of the last update of this dataset,
  - updatedBy: the ID of the user who last updated this dataset,
  - latestVersion: a Dictionary describing the dataset version just uploaded, containing the following keys:
    - id: the ID of the latest version of the dataset,
    - status: the status of the latest version of the dataset, possible values are "Failed", "Finished" and "FinishedWithWarning",
    - numberOfObservations: the number of observations in the latest version of the dataset,
    - numberOfVariables: the number of variables in the latest version of the dataset,
    - firstTimestamp: the timestamp of the first observation in the dataset,
    - lastTimestamp: the timestamp of the last observation in the dataset,
  - workspace: a Dictionary containing the following keys:
    - id: the ID of the workspace in which the dataset resides,
    - name: the name of the workspace in which the dataset resides;
- logs: a list of Dictionaries, each of which contains the following keys:
  - message: the log message,
  - messageType: the type of the message, possible values are "Info", "Debug" and "Warning",
  - createdAt: the time of creation of the log,
  - origin: the origin of the log, in this case this will be "Update".
Upon successful update of the dataset, metadata will be populated; logs will be returned in any case, including failed updates.
If wait_to_finish is set to False, this method returns a Dictionary with the following keys:
- version: a Dictionary containing the key id, referring to the ID of the dataset version that was created by this update.
If an error is encountered, a Dictionary will be returned with the keys message and code containing additional information about the error.
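As an illustration of the non-blocking case, the sketch below starts an update without waiting and returns the ID of the version that was created. The helper name start_update and the choice to raise a RuntimeError on an error response are assumptions of this example; client is expected to be the authenticated Tim instance and new_version a pandas DataFrame.

```python
from typing import Any


def start_update(client: Any, dataset_id: str, new_version: Any) -> str:
    """Start an update without waiting and return the new version's ID."""
    response = client.update_dataset(
        dataset_id=dataset_id,
        dataset_version=new_version,
        wait_to_finish=False,
    )

    # An error response carries the keys "message" and "code".
    if "message" in response and "code" in response:
        raise RuntimeError(f"Update failed ({response['code']}): {response['message']}")

    return response["version"]["id"]
```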
delete_dataset - delete a dataset
delete_dataset(self, dataset_id: str) -> tim.types.ExecuteResponse
The delete_dataset method deletes a dataset from the TIM repository. This method is called on the authenticated instance of Tim created as described in Authentication ("client"), like in the following statement:
client.delete_dataset(dataset_id = <dataset ID>)
using a keyword argument, or in the following statement:
client.delete_dataset(<dataset ID>)
using a positional argument, where <dataset ID> is replaced by the ID of the dataset to delete.
The argument is:
- dataset_id: the ID of the dataset to delete.
This method returns a Dictionary with the following keys:
- message: a message indicating what has happened (the dataset has successfully been deleted),
- code: a code providing more information on this message; if the deletion was successful, this code will be "DM09038".
If an error is encountered, a similar Dictionary will be returned with the keys message and code containing additional information about the error.
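Since success and error responses share the same keys, the code value is what distinguishes them. A minimal sketch of such a check follows; delete_and_confirm and the constant DELETE_SUCCESS_CODE are hypothetical names introduced for this example, while the code "DM09038" is the one documented above.

```python
from typing import Any

# Success code for a deletion, as documented above.
DELETE_SUCCESS_CODE = "DM09038"


def delete_and_confirm(client: Any, dataset_id: str) -> bool:
    """Delete a dataset and report whether the response signals success."""
    response = client.delete_dataset(dataset_id=dataset_id)
    return response["code"] == DELETE_SUCCESS_CODE
```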
get_datasets - retrieve a list of available datasets
get_datasets(self, offset: Optional[int] = None, limit: Optional[int] = None, workspace_id: Optional[str] = None, sort: Optional[tim.types.SortDirection] = None) -> List[tim.data_sources.dataset.types.Dataset]
The get_datasets method retrieves a list of available datasets from the TIM repository. This method is called on the authenticated instance of Tim created as described in Authentication ("client"), like in the following statement:
client.get_datasets(offset = <offset>, limit = <limit>, workspace_id = <workspace ID>, sort = <sorting order>)
using keyword arguments, or in the following statement:
client.get_datasets(<offset>, <limit>, <workspace ID>, <sorting order>)
using positional arguments, where <offset>, <limit>, <workspace ID> and <sorting order> are replaced by the relevant values.
The arguments are:
- offset: the number of datasets to be skipped from the beginning of the list (related to pagination), this is an optional argument with a default value of 0,
- limit: the maximum number of datasets to be returned, this is an optional argument with a default value of 10000,
- workspace_id: a filter on the ID of the workspace a dataset resides in, this is an optional argument with a default value of None,
- sort: a sorting order to sort results by, possible values are "+createdAt" and "-createdAt", where "+" and "-" indicate ascending and descending order, respectively. This is an optional argument with a default value of "-createdAt" (most recently created datasets are returned first).
This method returns a list of Dictionaries, each of which includes the following data:
- id: the ID of the dataset,
- name: the name of the dataset,
- description: the description of the dataset,
- isFavorite: a flag indicating whether this dataset is a favorite,
- estimatedSamplingPeriod: the estimated sampling period of this dataset,
- createdAt: the time of creation of this dataset,
- createdBy: the id of the user who created/uploaded this dataset,
- updatedAt: the time of the last update of this dataset (if applicable),
- updatedBy: the id of the user who last updated this dataset (if applicable),
- latestVersion: a Dictionary containing the following keys:
  - id: the ID of the latest version of the dataset,
  - status: the status of the latest version of the dataset, possible values are "Failed", "Finished" and "FinishedWithWarning",
  - numberOfObservations: the number of observations in the latest version of the dataset,
  - numberOfVariables: the number of variables in the latest version of the dataset,
  - firstTimestamp: the timestamp of the first observation in the dataset,
  - lastTimestamp: the timestamp of the last observation in the dataset,
- workspace: a Dictionary containing the following keys:
  - id: the ID of the workspace in which the dataset resides,
  - name: the name of the workspace in which the dataset resides.
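The offset and limit arguments can be combined to walk through a long list page by page. The sketch below shows one way to do this; iter_datasets and its page_size parameter are hypothetical names, and stopping when a page comes back shorter than requested is an assumption about how the end of the list shows up.

```python
from typing import Any, Dict, Iterator, Optional


def iter_datasets(
    client: Any,
    page_size: int = 100,
    workspace_id: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every available dataset, fetching page_size entries per call."""
    offset = 0
    while True:
        page = client.get_datasets(
            offset=offset,
            limit=page_size,
            workspace_id=workspace_id,
            sort="+createdAt",  # oldest first, so new uploads land on later pages
        )
        yield from page
        # Assumption: a short (or empty) page means the list is exhausted.
        if len(page) < page_size:
            break
        offset += page_size
```

A caller could then, for instance, keep only the datasets whose latestVersion status is "Finished" by filtering the yielded Dictionaries.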
get_dataset_versions - retrieve the list of versions of a dataset
get_dataset_versions(self, id: str, offset: Optional[int] = None, limit: Optional[int] = None) -> List[tim.data_sources.dataset.types.DatasetListVersion]
The get_dataset_versions method retrieves the list of available dataset versions related to a specific dataset from the TIM repository. This method is called on the authenticated instance of Tim created as described in Authentication ("client"), like in the following statement:
client.get_dataset_versions(id = <dataset ID>, offset = <offset>, limit = <limit>)
using keyword arguments, or in the following statement:
client.get_dataset_versions(<dataset ID>, <offset>, <limit>)
using positional arguments, where <dataset ID>, <offset> and <limit> are replaced by the relevant values.
The arguments are:
- id: the ID of the dataset from which to retrieve the versions,
- offset: the number of dataset versions to be skipped from the beginning of the list (related to pagination), this is an optional argument with a default value of 0,
- limit: the maximum number of dataset versions to be returned, this is an optional argument with a default value of 10000.
This method returns a list of Dictionaries, each of which includes the following data:
- id: the ID of the dataset version,
- status: the status of the dataset version, possible values are "Failed", "Finished", "FinishedWithWarning" and "Registered",
- createdAt: the time of creation of this dataset version.
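The status field makes it possible to select only those versions that finished processing. The following sketch illustrates this; completed_version_ids and COMPLETED_STATUSES are hypothetical names, and treating both "Finished" and "FinishedWithWarning" as completed is an assumption of this example.

```python
from typing import Any, List

# Assumption: both of these statuses describe a version that finished processing.
COMPLETED_STATUSES = {"Finished", "FinishedWithWarning"}


def completed_version_ids(client: Any, dataset_id: str) -> List[str]:
    """Return the IDs of all completed versions, in the order returned."""
    versions = client.get_dataset_versions(id=dataset_id)
    return [v["id"] for v in versions if v["status"] in COMPLETED_STATUSES]
```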