Metadata-Version: 2.1
Name: timeseer
Version: 0.4.10
Summary: Python SDK for Timeseer.AI
Author-email: "Timeseer.AI" <pypi@timeseer.ai>
License: Copyright Timeseer.AI 2023
Project-URL: Homepage, https://timeseer.ai/
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: kukur>=0.0.21
Requires-Dist: pyarrow

## Timeseer.AI Client

The Timeseer.AI Client is a Python SDK to access the functionality of [Timeseer.AI](https://www.timeseer.ai/).
Built on [Apache Arrow](https://arrow.apache.org/),
the SDK integrates natively with the [Pandas](https://pandas.pydata.org/) and [Polars](https://www.pola.rs/) ecosystems.

### Installing

The Timeseer.AI Client is available on [PyPI](https://pypi.org/project/timeseer/).

```
(venv) $ pip install timeseer
```

### Connecting

The Timeseer.AI Client relies on [Apache Arrow Flight](https://arrow.apache.org/docs/format/Flight.html) for highly efficient data transfers.

Communications are protected by an API key.
An API key can be generated within Timeseer under `Configure > API keys`.
Each API key has a name and a secret value that is shown only once.

The API key is used to create a connection to a Timeseer instance running at a specific host and port:

```
>>> from timeseer_client import *
>>> api_key=('<api-key-name>', '<api-key>')
>>> client = Client(api_key, host='localhost', port=8081)
```
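
Because the secret value is shown only once, it is usually better kept outside the source code.
The following is a minimal sketch that reads the API key from environment variables.
The variable names `TIMESEER_API_KEY_NAME` and `TIMESEER_API_KEY` and the helper `load_api_key` are assumptions made for this example, not part of the SDK.

```python
import os


def load_api_key(environ=os.environ):
    """Return the (name, secret) tuple in the shape passed to Client above.

    The environment variable names are hypothetical; pick any names you like.
    """
    try:
        return (environ["TIMESEER_API_KEY_NAME"], environ["TIMESEER_API_KEY"])
    except KeyError as err:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(f"missing environment variable: {err.args[0]}") from err


# Usage, reusing the Client call from the example above:
# client = Client(load_api_key(), host='localhost', port=8081)
```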

### Functionality Overview

In Timeseer,
time series data is available through two concepts:

- Sources contain a varying number of time series that are constantly updated with new data.
- Data Sets contain a fixed number of time series in a specific time range.

Sources are typically used for continuous monitoring of data,
while Data Sets are the starting point for a data science project.

Time series data from Sources and Data Sets is processed by Flows.
Flows analyze data or create derived Data Sets.

Insights and data generated by Flows are made available through Data Services.

The Timeseer.AI Client represents each of these concepts as a separate class that exposes the functionality that is specific to that concept.
Each concept class is created by passing the `Client` to the constructor.

Full documentation is available in the code by running:

```
>>> import timeseer_client
>>> help(timeseer_client)
```

### Usage

This usage sample generates a sine wave using Pandas and NumPy.
Values of the sine wave below 0 are assumed to be the result of a faulty sensor reading.
The sample shows how Timeseer can be used to analyze this data and how it automatically creates a derived data set.

First install Pandas:

```
(venv) $ pip install pandas
```

Generate the sine wave data:

```
>>> import numpy as np
>>> import pandas as pd
>>> ts = pd.date_range("2022-01-01T00:00:00Z", "2022-02-01T00:00:00Z", freq="H")
>>> values = np.round(10 * np.sin(2 * np.pi * ((ts.astype(np.int64) // 10**9) - ts[0].timestamp()) / (24*60*60)), decimals=2)
>>> df = pd.DataFrame(dict(ts=ts, value=values))
>>> df.head(20)
                          ts  value
0  2022-01-01 00:00:00+00:00   0.00
1  2022-01-01 01:00:00+00:00   2.59
2  2022-01-01 02:00:00+00:00   5.00
3  2022-01-01 03:00:00+00:00   7.07
4  2022-01-01 04:00:00+00:00   8.66
5  2022-01-01 05:00:00+00:00   9.66
6  2022-01-01 06:00:00+00:00  10.00
7  2022-01-01 07:00:00+00:00   9.66
8  2022-01-01 08:00:00+00:00   8.66
9  2022-01-01 09:00:00+00:00   7.07
10 2022-01-01 10:00:00+00:00   5.00
11 2022-01-01 11:00:00+00:00   2.59
12 2022-01-01 12:00:00+00:00  -0.00
13 2022-01-01 13:00:00+00:00  -2.59
14 2022-01-01 14:00:00+00:00  -5.00
15 2022-01-01 15:00:00+00:00  -7.07
16 2022-01-01 16:00:00+00:00  -8.66
17 2022-01-01 17:00:00+00:00  -9.66
18 2022-01-01 18:00:00+00:00 -10.00
19 2022-01-01 19:00:00+00:00  -9.66
```
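
The expression above computes a sine wave with an amplitude of 10 and a period of one day, rounded to two decimals.
As a plain-Python sketch of the same arithmetic (the helper name `sine_value` is made up for this illustration), the first rows of the table can be reproduced without Pandas:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def sine_value(elapsed_seconds, amplitude=10.0):
    """Value of a one-day-period sine wave, rounded to two decimals."""
    phase = 2 * math.pi * elapsed_seconds / SECONDS_PER_DAY
    return round(amplitude * math.sin(phase), 2)


# One sample per hour for the first eight hours after the start timestamp.
first_hours = [sine_value(hour * 3600) for hour in range(8)]
print(first_hours)
```

The printed values correspond to rows 0 through 7 of the `df.head(20)` output.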

Define a Timeseer API key in `Configure > API keys` and use it to create a `Client`:

```
>>> from timeseer_client import *
>>> client = Client(("<api key name>", "<api key>"), host='timeseer.example.org', port=8081)
```

Timeseer uses metadata to automatically profile a time series.
In this case, only the `physical lower limit` of the sensor that measured the time series is known,
which is `0`.

```
>>> from timeseer_client.metadata import fields
>>> series = SeriesSelector("Sines", {"function": "sine", "amplitude": "10"})
>>> metadata = Metadata(series, {fields.LimitLowPhysical: 0})
```

Each time series in Timeseer is identified by a `SeriesSelector`.
Each `SeriesSelector` has a source (`"Sines"`),
which will become the data set name,
and [tags and a field](https://docs.influxdata.com/influxdb/v2.6/reference/key-concepts/data-elements/).
This time series has the `"function"` and `"amplitude"` tags and the (default) `"value"` field.

For time series where additional structure is not available,
a `SeriesSelector` can also be created using a single `"series name"` tag:

```
>>> SeriesSelector("Sines", "sine-10") == SeriesSelector("Sines", {"series name": "sine-10"})
```

Profiling this time series can be done using the `profile` convenience function:

```
>>> profile(client, "Sines", [(metadata, df)])
[{'type': 'flow', 'name': 'Sines'}, {'type': 'data service', 'name': 'Sines'}, {'type': 'data set', 'name': 'Sines'}]
```

The `profile` function creates a Data Set, a Data Service and a Flow with the given name,
in this case `"Sines"`.
It also evaluates the flow.

Data should be provided as a `pyarrow.Table` or a Pandas `DataFrame`.

A Data Service summarizes the profiling results as Statistics and Event Frames.

Event Frames define a time range where something interesting has been detected.

```
>>> data_services = DataServices(client)
>>> data_service = DataServiceSelector('Sines', 'Sines')
>>> event_frames = data_services.get_event_frames(data_service)
>>> event_frames.to_pandas()['type'].value_counts()
compression - linear undercompression    61
Out of bounds (lower, physical)          31
Values below zero                        31
Upper limit is present                    1
Interpolation type is present             1
Compression - flat archival rate          1
Description is present                    1
Unit is present                           1
Name: type, dtype: int64
```

Not all profiling results are issues.
In this case we can safely ignore the 'linear undercompression' events.
The 'Out of bounds (lower, physical)' event frames cannot be ignored though,
because the physical lower limit of the sensor is `0`.

Statistics can be used to gain high-level insight into the data and explain the Event Frames:

```
>>> data_services.get_statistics(data_service, series)
[... Statistic(name='Value statistics', data_type='table', result=[['Min', -10.0], ['Max', 10.0], ['Mean', 4.775152794086695e-18], ['Median', 0], ['Std', 7.073308943835715]]) ...]
```

It is clear (and expected based on the data generation) that the `Out of bounds (lower, physical)` Event Frames occur because the minimum value is `-10.0`.
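
The count of 31 is consistent with one contiguous out-of-bounds period per day in January.
The sketch below regenerates the hourly values in plain Python and groups consecutive samples below the limit into runs.
It is only an illustration of that reading, assuming a simple `value < limit` check; it is not how Timeseer detects Event Frames, and `runs_below` is a hypothetical helper.

```python
import math

LOWER_LIMIT = 0.0
# Hourly samples from 2022-01-01T00:00Z through 2022-02-01T00:00Z, inclusive.
HOURS = 31 * 24 + 1

values = [round(10 * math.sin(2 * math.pi * h / 24), 2) for h in range(HOURS)]


def runs_below(samples, limit):
    """Return (start, end) index pairs of consecutive samples below limit."""
    runs, start = [], None
    for i, value in enumerate(samples):
        if value < limit and start is None:
            start = i  # a new run begins
        elif value >= limit and start is not None:
            runs.append((start, i - 1))  # the run ended at the previous sample
            start = None
    if start is not None:
        runs.append((start, len(samples) - 1))
    return runs


runs = runs_below(values, LOWER_LIMIT)
print(min(values), len(runs), runs[0])
```

Each run covers hours 13 through 23 of a day, which matches the negative values in the generated data.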

Timeseer can automatically correct the data to be within bounds using various strategies.
To create derived data in periods where an Event Frame is detected,
insert a "filter" Block for that event frame type into a Flow.

The derived data can be stored in a few ways.
It is possible to create another Data Set, for example.
Storing it in a Data Service instead will allow verification that the problem has been resolved,
as data is stored there alongside quality indicators.

There is no shorthand for data cleaning,
as each case will require different actions.
The most readable way to define the Flow that will create the derived data is in YAML.

Create `sine-derive.yml`:

```yaml
---

- type: data service
  name: Derived sine results
  kpiSet: Data quality fundamentals
  range:
    start: "2022-01-01T00:00:00Z"
    end: "2022-02-01T00:00:00Z"

- type: flow
  name: Create derived sine
  dataSet: Sines
  blocks:

  - name: Analyze time series
    type: analysis

  - name: Hold last value when out of bounds
    type: filter
    augmentationStrategy: hold last value
    filters:
    - type: univariate
      filter: "Out of bounds (lower, physical)"
      series: ALL

  - name: Analyze derived time series
    type: analysis

  - name: Keep results for derived series in Derived sine results data service
    type: data_service_contribute
    dataServiceName: Derived sine results
    contributionBlockNames: [Analyze derived time series]
```

The `Resources` and `Flows` classes allow creating resources and evaluating flows respectively.

```
>>> resources = Resources(client)
>>> resources.create(path="sine-derive.yml")
>>> flows = Flows(client)
>>> flows.evaluate("Create derived sine")
```

The derived data has been profiled by the Flow.
Profiling results are available in the `"Derived sine results"` Data Service:

```
>>> derived_data_service = DataServiceSelector('Derived sine results', 'Sines')
>>> event_frames = data_services.get_event_frames(derived_data_service)
>>> event_frames.to_pandas()['type'].value_counts()
compression - linear undercompression    31
Compression - flat archival rate          1
Interpolation type is present             1
Unit is present                           1
Description is present                    1
Upper limit is present                    1
Name: type, dtype: int64
```

The derived data no longer contains values below 0:

```
>>> derived_data = data_services.get_data(derived_data_service, series)
>>> derived_data.to_pandas().head(26)
                           value
ts
2022-01-01 00:00:00+00:00   0.00
2022-01-01 01:00:00+00:00   2.59
2022-01-01 02:00:00+00:00   5.00
2022-01-01 03:00:00+00:00   7.07
2022-01-01 04:00:00+00:00   8.66
2022-01-01 05:00:00+00:00   9.66
2022-01-01 06:00:00+00:00  10.00
2022-01-01 07:00:00+00:00   9.66
2022-01-01 08:00:00+00:00   8.66
2022-01-01 09:00:00+00:00   7.07
2022-01-01 10:00:00+00:00   5.00
2022-01-01 11:00:00+00:00   2.59
2022-01-01 12:00:00+00:00  -0.00
2022-01-01 13:00:00+00:00  -0.00
2022-01-01 14:00:00+00:00  -0.00
2022-01-01 15:00:00+00:00  -0.00
2022-01-01 16:00:00+00:00  -0.00
2022-01-01 17:00:00+00:00  -0.00
2022-01-01 18:00:00+00:00  -0.00
2022-01-01 19:00:00+00:00  -0.00
2022-01-01 20:00:00+00:00  -0.00
2022-01-01 21:00:00+00:00  -0.00
2022-01-01 22:00:00+00:00  -0.00
2022-01-01 23:00:00+00:00  -0.00
2022-01-02 00:00:00+00:00   0.00
2022-01-02 01:00:00+00:00   2.59
```
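
To make the effect of the `hold last value` strategy concrete, here is a plain-Python sketch of the idea.
It assumes that every sample below the lower limit is replaced by the most recent sample that was within bounds; Timeseer's actual implementation may differ, and `hold_last_value` is a hypothetical helper.

```python
import math


def hold_last_value(samples, lower_limit):
    """Replace samples below lower_limit with the last in-bounds sample."""
    held, last_valid = [], None
    for value in samples:
        if value >= lower_limit:
            last_valid = value
            held.append(value)
        elif last_valid is not None:
            held.append(last_valid)
        else:
            # No earlier in-bounds sample exists; left unchanged in this sketch.
            held.append(value)
    return held


# The first 26 hourly samples of the sine wave used in this sample.
original = [round(10 * math.sin(2 * math.pi * h / 24), 2) for h in range(26)]
derived = hold_last_value(original, lower_limit=0.0)
print(derived[11:26])
```

Hours 13 through 23 hold the zero value from hour 12, and the wave resumes on the next day, as in the output above.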

This only scratches the surface of the functionality in Timeseer.
Learn more in the `Help` menu in the user interface.
All resources, blocks in Flows and event frame types are thoroughly documented.
