Time series data - the basics#

This tutorial covers basic time series data management and analysis using plans.

Notebook setup#

For users running this tutorial as a Jupyter Notebook, this cell must be executed first:

import sys
from pathlib import Path
import pprint
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Install `plans` in `google.colab`.
# Use `pip install plans` for other environments.

if "google.colab" in sys.modules:
    import os
    os.system(f"{sys.executable} -m pip install -q plans")

# This avoids warnings related to uninstalled fonts
import logging
# Set the matplotlib font manager logger to only show errors (hides warnings)
logging.getLogger('matplotlib.font_manager').setLevel(logging.ERROR)

# define output folder
OUTPUT_DIR = Path("outputs/time-series")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
print(f"Outputs will be saved to: ./{OUTPUT_DIR}")
Outputs will be saved to: ./outputs/time-series

The TimeSeries object#

The TimeSeries object is a very basic class in the plans.datasets module. It is a child of the Univar class, which lives in the plans.analyst module.

TimeSeries holds all core methods for working with time series, including standardization.

Import TimeSeries object:

from plans.datasets import TimeSeries

Create an instance of the TimeSeries:

ts = TimeSeries(name="Testing", alias="tst")

Check out the ts variable type:

print(type(ts))
<class 'plans.datasets.core.TimeSeries'>

Attributes work the same way as in the Univar object:

ts.units = "cm"
ts.description = "Just a tutorial"
print(ts)
[Testing (tst)]
TimeSeries (DS):	<class 'plans.datasets.core.TimeSeries'>
                   field           value
                    name         Testing
                   alias             tst
                    size            None
                   color          orange
                  source            None
             description Just a tutorial
                   units              cm
                    code            None
                       x            None
                       y            None
          variable_field               v
           variable_name        Variable
          variable_alias             Var
      variable_range_min            None
      variable_range_max            None
            variable_min            None
            variable_max            None
          datetime_field        datetime
           datetime_freq            None
            datetime_res            None
                   start            None
                     end            None
             is_standard           False
                gap_size               6
                epochs_n            None
            small_gaps_n            None
file_data_datetime_field        datetime
file_data_variable_field               v
               file_data            None
Data:
None

Working with perfect data#

A perfect time series has no time gaps, so it does not need standardization.
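One way to check whether a series has time gaps is to compare the spacing between consecutive timestamps against the expected time step. The sketch below uses only pandas. The helper name `find_time_gaps` is hypothetical and is not part of plans, and the 3-hour step is an assumption chosen to match the data used later in this tutorial.

```python
import pandas as pd

def find_time_gaps(datetimes, step):
    """Return (previous, next) timestamp pairs whose spacing exceeds `step`."""
    stamps = pd.Series(pd.to_datetime(datetimes)).sort_values().reset_index(drop=True)
    deltas = stamps.diff()  # first element is NaT and never counts as a gap
    gap_positions = deltas[deltas > step].index
    return [(stamps[i - 1], stamps[i]) for i in gap_positions]

step = pd.Timedelta(hours=3)
regular = pd.date_range("2020-01-01", periods=8, freq=step)
gappy = regular.delete([3, 4])  # drop 09:00 and 12:00 to create one gap

print(find_time_gaps(regular, step))  # empty list: no gaps
print(find_time_gaps(gappy, step))    # one gap, between 06:00 and 15:00
```

A series that returns an empty list here is "perfect" in the sense used above.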

Create synthetic time series data#

Let's first make a perfect time series using the .make_synthetic_tsn() method and save it to a CSV file. This method generates a time series that follows the Trend-Seasonality-Noise (TSN) archetype:

# make synthetic TSN (Trend-Seasonality-Noise) time-series
df = TimeSeries.make_synthetic_tsn(
    start="2020-01-01",
    end="2026-01-01",
    base=100,
    freq="3h",
    trend=0.002,
    noise_sd=3.0,
    amplitude=50,
    seasonal_period="YS",
    minor_amplitude=20,
    minor_seasonal_period="D"
)

# Export CSV file
file_csv = OUTPUT_DIR / "time_series.csv"
df.to_csv(file_csv, sep=";", index=False)
print(f"Saved to: {file_csv}")
Saved to: outputs/time-series/time_series.csv
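To make the Trend-Seasonality-Noise idea concrete, the sketch below builds a comparable series with plain numpy and pandas. It is not the implementation of .make_synthetic_tsn(). The helper name `make_tsn_sketch`, the per-step linear trend, and the sine-shaped yearly and daily cycles are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def make_tsn_sketch(start, end, step_hours=3, base=100.0, trend=0.002,
                    amplitude=50.0, minor_amplitude=20.0, noise_sd=3.0, seed=0):
    """Hypothetical helper: base + linear trend + yearly and daily sines + noise."""
    stamps = pd.date_range(start, end, freq=pd.Timedelta(hours=step_hours))
    n = len(stamps)
    steps = np.arange(n)          # record counter: 0, 1, 2, ...
    hours = steps * step_hours    # elapsed hours since the first record
    rng = np.random.default_rng(seed)
    values = (
        base
        + trend * steps                                          # trend
        + amplitude * np.sin(2 * np.pi * hours / (365.25 * 24))  # yearly cycle
        + minor_amplitude * np.sin(2 * np.pi * hours / 24)       # daily cycle
        + rng.normal(0.0, noise_sd, size=n)                      # Gaussian noise
    )
    return pd.DataFrame({"datetime": stamps, "level": values})

sketch = make_tsn_sketch("2020-01-01", "2020-03-01")
print(sketch.head())
```

Setting `noise_sd=0.0` in this sketch leaves only the deterministic trend and seasonal components, which can help when inspecting each term separately.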

The whole time series looks like this:

# get simple visualization
plt.plot(df['datetime'], df['level'])
plt.ylim([0, 250])
plt.show()
../_images/d35dd1ed1fece2d5e8cdd83d48bd0aeee3e37e11456bea713cd5f4d410307852.png

Zooming in to a single month:

# get simple visualization
plt.plot(df['datetime'], df['level'])
plt.xlim(pd.to_datetime(["2020-01-01 00:00:00", "2020-02-01 00:00:00"]))
plt.show()
../_images/b34844e6af56b01b629328c75063435e0c1073b0d4c5cc9b7b1233c8e5598b83.png

Loading data from the CSV file#

Call the .load_data() method to load data from a CSV file:

# reset the ts variable
ts = TimeSeries(name="Testing", alias="tst")

ts.load_data(
    file_data=file_csv,  # file path
    input_dtfield="datetime", # name of datetime field
    input_varfield="level",  # name of variable
    in_sep=";",  # input separator
    filter_dates=["2020-01-01", "2020-03-01"]  # filter dates
)

Data is stored in the .data attribute:

ts.data
                datetime           v
0    2020-01-01 00:00:00  100.304930
1    2020-01-01 03:00:00  112.234876
2    2020-01-01 06:00:00  116.749649
3    2020-01-01 09:00:00  117.576521
4    2020-01-01 12:00:00  101.507758
..                   ...         ...
475  2020-02-29 09:00:00  158.115923
476  2020-02-29 12:00:00  138.765769
477  2020-02-29 15:00:00  133.442230
478  2020-02-29 18:00:00  126.017321
479  2020-02-29 21:00:00  131.168134

480 rows × 2 columns
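To cross-check the loaded table, a rough pandas equivalent of the read-and-filter step might look like the sketch below. It mirrors the output shown above, where the last row is 2020-02-29 21:00, so the end date appears to be exclusive. It does not describe what .load_data() does internally. The in-memory CSV and its `level` values are placeholders standing in for the exported file.

```python
import io
import numpy as np
import pandas as pd

# Stand-in for the exported file: a ";"-separated CSV with a 3-hour time step
stamps = pd.date_range("2020-01-01", "2020-06-01", freq=pd.Timedelta(hours=3))
raw = pd.DataFrame({"datetime": stamps, "level": np.linspace(100.0, 110.0, len(stamps))})
buffer = io.StringIO()
raw.to_csv(buffer, sep=";", index=False)
buffer.seek(0)

# Read the file, parse the datetime column and rename the variable column to "v"
df_in = pd.read_csv(buffer, sep=";", parse_dates=["datetime"])
df_in = df_in.rename(columns={"level": "v"})

# Keep rows in the half-open interval [start, end)
start, end = pd.Timestamp("2020-01-01"), pd.Timestamp("2020-03-01")
mask = (df_in["datetime"] >= start) & (df_in["datetime"] < end)
df_filtered = df_in.loc[mask].reset_index(drop=True)

print(len(df_filtered))  # 60 days x 8 records per day = 480
```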

Visualizations#

Most plans objects come with built-in methods for producing visualizations, both inline and as figure files.

See also

Learn more about visualizations in the Visualizations - the basics tutorial.

Standard visualization#

Get the standard visual using the .view() method:

ts.view()
../_images/2b494a845c985ed0357904fedab61352770e245511b04f03a796b6101f0ea549.png

Fine-tuning visual items#

Fine-tune plot specs by editing the .view_specs attribute, which is a dictionary:

# reset view_specs
ts._set_view_specs()
# edit specs
# color of the main line
ts.view_specs["color"] = "blue"
ts.view_specs["color_hist"] = "green"

# Style of line
ts.view_specs["drawstyle"] = "steps-mid"

# Labels
ts.view_specs["ylabel"] = f"Level ({ts.units})"
ts.view_specs["xlabel"] = "Date"

# Titles
ts.view_specs["title"] = "Hello! This is a Tutorial!"
ts.view_specs["subtitle_data"] = r"$\bf{a}$  Time Series"
ts.view_specs["subtitle_hist"] = r"$\bf{b}$  Histogram"
ts.view_specs["subtitle_cdf"] = r"$\bf{c}$  CDF"

# Axis
ts.view_specs["range"] = [0, 200.0]

# Number of dates in the X axis
ts.view_specs["n_dates"] = 7

# Call view() again
ts.view()
../_images/5d75bfa8a7c90e395bf9511f5112d5cd811623d7bb00bada81a017c6a6c51872.png
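The subtitles set above suggest that the figure has three panels: the time series, a histogram, and a CDF. As a rough illustration of the quantities behind the last two panels, the sketch below computes a histogram and an empirical CDF with numpy. The random sample is a placeholder for the values stored in ts.data, the 0 to 200 range is borrowed from the example above as an assumption, and plans may compute these panels differently.

```python
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=120.0, scale=15.0, size=480)  # placeholder sample

# Histogram: counts per bin over an assumed value range of 0 to 200
counts, bin_edges = np.histogram(values, bins=20, range=(0.0, 200.0))

# Empirical CDF: sorted values paired with their cumulative fraction
sorted_values = np.sort(values)
cum_fraction = np.arange(1, len(sorted_values) + 1) / len(sorted_values)

print(counts)
print(sorted_values[:3], cum_fraction[:3])
```

Plotting `sorted_values` against `cum_fraction` gives a curve that rises from near 0 to 1.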

Layouts available#

List available layouts:

print(ts.layouts.keys())
dict_keys(['full', 'mini', 'simple', 'simple-shallow', 'default'])

Select a layout by setting the "layout" key of .view_specs, then call .view() again:

ts.view_specs["layout"] = "full"
ts.view()
../_images/5d75bfa8a7c90e395bf9511f5112d5cd811623d7bb00bada81a017c6a6c51872.png

ts.view_specs["layout"] = "mini"
ts.view()
../_images/428e1a3271826d8188e371cd09c355d9fcd10006cc1993148fb7881c1732233b.png

ts.view_specs["layout"] = "simple"
ts.view()
../_images/c15366e1fd8bcdbc7e150edff4a2f644e465d829e808ca5ccdc58b9d637ea322.png

ts.view_specs["layout"] = "simple-shallow"
ts.view()
../_images/309616fdc7e9792a5b3a08d11270ac9493d6c3d40921fa3c6b88ca229f61ef38.png