plans.analyst#

Classes designed to handle statistical analysis.

Overview#

The plans.analyst module provides objects for statistical analysis: Univar and GeoUnivar for univariate samples, Bivar for bivariate relations and model fitting, and Bayes for Bayesian updating of hypotheses, together with the model functions linear, power and power_zero used for curve fitting.

Example#

The example below draws a random sample, wraps it in a Univar object and renders the default view.

import numpy as np
from plans import analyst

# get data to a vector
data_vector = np.random.rand(1000)

# instantiate the Univar object
uni = analyst.Univar(sample=data_vector, name="my_data")

# view sample
uni.view()

The call to view() builds the default panel for the sample, using the settings stored in the view_specs attribute.

Functions

linear(x, c0, c1)

Linear function f(x) = c0 + c1 * x

power(x, c0, c1, c2)

Power function f(x) = c2 * ((x + c0)^c1)

power_zero(x, c0, c1)

Power function with root in zero f(x) = c1 * ((x)^c0)

Classes

Bayes(df_hypotheses[, name, nomenclature, ...])

The Bayes Theorem Analyst Object

Bivar(df_data[, x_name, y_name, name])

The Bivariate analyst object

GeoUnivar([name, alias])

Univar([name, alias])

The Univariate object

plans.analyst.linear(x, c0, c1)[source]#

Linear function f(x) = c0 + c1 * x

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • c0 (float) – translational parameter

  • c1 (float) – scaling parameter

Returns:

function output

Return type:

float | numpy.ndarray

plans.analyst.power(x, c0, c1, c2)[source]#

Power function f(x) = c2 * ((x + c0)^c1)

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • c0 (float) – translational parameter

  • c1 (float) – exponent parameter

  • c2 (float) – scaling parameter

Returns:

function output

Return type:

float | numpy.ndarray

plans.analyst.power_zero(x, c0, c1)[source]#

Power function with root in zero f(x) = c1 * ((x)^c0)

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • c0 (float) – exponent parameter

  • c1 (float) – scaling parameter

Returns:

function output

Return type:

float | numpy.ndarray
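
These model functions can be passed straight to a least-squares fitter. A minimal sketch, assuming scipy is available; Bivar.fit() presumably wraps a similar procedure, but that is not confirmed here.

import numpy as np
from scipy.optimize import curve_fit

from plans import analyst

# synthetic sample roughly following f(x) = 2 + 0.5 * x (illustrative values)
x = np.linspace(0, 10, 50)
y = analyst.linear(x, 2.0, 0.5) + np.random.normal(scale=0.2, size=x.size)

# least-squares fit of the module's linear model to the noisy sample
params, cov = curve_fit(analyst.linear, x, y)
print("fitted c0, c1:", params)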

class plans.analyst.Univar(name='MyUnivar', alias='Uv0')[source]#

Bases: DataSet

The Univariate object

__init__(name='MyUnivar', alias='Uv0')[source]#

Initialize the DataSet object.

Parameters:
  • name (str) – unique object name

  • alias (str) – unique object alias. If None, it takes the first and last characters from name

_set_view_specs()[source]#

Set view specifications. Expected to override the parent class method.

Returns:

None

Return type:

None

load_data(file_data)[source]#

Load data from file. Expected to override the parent class method.

Parameters:

file_data (str) – file path to data.

Returns:

None

Return type:

None

set_array(array)[source]#

Set array to data

Parameters:

array (numpy.ndarray) – Numpy array

Returns:

None

Return type:

None

update()[source]#

Refresh all mutable attributes based on data (including paths).

assess_normality(clevel=0.95)[source]#

Assess normality of the sample using standard statistical tests.

Parameters:

clevel (float) – confidence level

Returns:

dataframe of assessment results

Return type:

pandas.DataFrame

assess_frequency()[source]#

Assess the data frequencies.

Returns:

result dataframe

Return type:

pandas.DataFrame

assess_basic_stats()[source]#

Assesses basic statistics of the variable field.

Returns:

DataFrame with statistics (Count, Sum, Mean, SD, Min, percentiles, Max) and their values.

Return type:

pandas.DataFrame

assess_weibull_cdf()[source]#

Get the Weibull CDF model for the data.

Returns:

model dataframe

Return type:

pandas.DataFrame

assess_gumbel_cdf()[source]#

Assess the Gumbel CDF for the data (it assumes the data is a maxima dataset)

Returns:

multiple results from the assessment

Return type:

dict

plot_hist(bins=100, colored=False, annotated=False, rule=None, show=False, folder='C:/sample', filename='histogram', specs=None, dpi=300)[source]#

Plot histogram of sample

Parameters:
  • bins (int) – number of bins

  • colored (bool) – Boolean to color the histogram by quantiles

  • annotated (bool) – Boolean to annotate the histogram with statistics

  • rule (str) – name of rule to compute bins

  • show (bool) – Boolean to show instead of saving

  • folder (str) – output folder

  • filename (str) – image file name

  • specs (dict) – specification dictionary

  • dpi (int) – image resolution (default = 300)

plot_qqplot(show=True, folder='C:/sample', filename='qqplot', specs=None, dpi=300)[source]#

Plot Q-Q Plot on Normal distribution

Parameters:
  • show (bool) – Boolean to show instead of saving

  • folder (str) – output folder

  • filename (str) – image file name

  • specs (dict) – specification dictionary

  • dpi (int) – image resolution (default = 300)

Returns:

None

Return type:

None

_plot(fig, gs, specs)[source]#
_build_axes(fig, gs, specs)[source]#
_get_fig_specs()[source]#
view(show=True, return_fig=False)[source]#

Get the basic visualization.

Parameters:

show (bool) – option for showing instead of saving.

Note

Uses values in the view_specs attribute for plotting.

static plot_mean(ax, y_mu, xmin, xmax)[source]#
static plot_scatter(data, ax, ylim, specs, x_factor=4, formatter=None)[source]#
static plot_histh(data, ax, ylim, specs, formatter=None)[source]#
static plot_cdf(cdf_df, ax, ylim, specs, formatter=None)[source]#
static plot_stats(fig, stats_df, x=0.5, y=0.4)[source]#
static plot_cbar(data, ax, scheme, cmap='viridis', n_classes=5, side='right', width_factor=16)[source]#
static test_distribution(test_name, stat, p, clevel=0.95, distr='normal')[source]#

Util function for statistical testing

Parameters:
  • test_name (str) – name of test

  • stat (float) – statistic

  • p (float) – p-value

  • clevel (float) – confidence level

  • distr (str) – name of distribution

Returns:

summary of test

Return type:

dict

static test_normality_ks(data, clevel=0.95)[source]#

Test for normality using the Kolmogorov-Smirnov test

Kolmogorov-Smirnov test: this test compares the observed distribution with the expected normal distribution using a test statistic and a p-value. A p-value below the significance level (1 - clevel, i.e. 0.05 at the default confidence level) indicates that the null hypothesis of normality should be rejected.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

test result dictionary. Keys: Statistic, p-value and Is normal

Return type:

dict

static test_normality_sw(data, clevel=0.95)[source]#

Test for normality using the Shapiro-Wilk test.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

test result dictionary. Keys: Statistic, p-value and Is normal

Return type:

dict

static test_normality_dp(data, clevel=0.95)[source]#

Test for normality using the D’Agostino-Pearson test.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

test result dictionary. Keys: Statistic, p-value and Is normal

Return type:

dict
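
For reference, a result dictionary with the documented keys (Statistic, p-value and Is normal) can be assembled from scipy directly; this is only a sketch of the idea, not the internal implementation of these methods.

import numpy as np
from scipy import stats

data = np.random.normal(loc=10, scale=2, size=500)

# standardize and test against the standard normal with Kolmogorov-Smirnov
z = (data - data.mean()) / data.std()
stat, p = stats.kstest(z, "norm")

clevel = 0.95
result = {"Statistic": stat, "p-value": p, "Is normal": p > (1 - clevel)}
print(result)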

static get_tx(fx)[source]#

Simple function for computing the Return Period from the CDF

Parameters:

fx (float | numpy.ndarray) – CDF (FX)

Returns:

Return Period

Return type:

float | numpy.ndarray
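
The return period is the reciprocal of the exceedance probability, so for a CDF value F(X) the usual relation is T(X) = 1 / (1 - F(X)); the quick numpy check below assumes this is the formula implemented here.

import numpy as np

fx = np.array([0.5, 0.9, 0.99])   # CDF values F(X)
tx = 1.0 / (1.0 - fx)             # return periods: 2, 10 and 100
print(tx)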

static gumbel_fx(x, a, b)[source]#

Gumbel probability distribution F(X)

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • a (float) – distribution parameter a

  • b (float) – distribution parameter b

Returns:

function output

Return type:

float | numpy.ndarray

static gumbel_tx(x, a, b)[source]#

Gumbel return period distribution T(X)

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • a (float) – distribution parameter a

  • b (float) – distribution parameter b

Returns:

function output

Return type:

float | numpy.ndarray

static gumbel_freqfactor(tx=2)[source]#

Gumbel Frequency Factor K(T)

Parameters:

tx (float | numpy.ndarray) – return period T

Returns:

function output

Return type:

float | numpy.ndarray
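
For reference, the classic Gumbel frequency factor (Chow's formula) is K(T) = -(sqrt(6)/pi) * (0.5772 + ln(ln(T / (T - 1)))); the sketch below assumes this is the formula behind gumbel_freqfactor.

import numpy as np

def gumbel_k(tx):
    # Chow's frequency factor for the Gumbel distribution
    return -(np.sqrt(6.0) / np.pi) * (0.5772 + np.log(np.log(tx / (tx - 1.0))))

print(gumbel_k(np.array([2.0, 10.0, 100.0])))   # approx. -0.16, 1.30, 3.14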

static gumbel_se(std_sample, n_sample, tx)[source]#

Gumbel standard error for the method-of-moments (MM) fitted Gumbel function

Parameters:
  • std_sample (float) – sample standard deviation

  • n_sample (int) – sample size

  • tx (float | numpy.ndarray) – return period T

Returns:

function output

Return type:

float | numpy.ndarray

static empirical_px(ranks)[source]#

Get the empirical exceedance probability P(X)

Parameters:

ranks (numpy.ndarray) – vector of ranks

Returns:

empirical exceedance probability P(X)

Return type:

numpy.ndarray

static weibull_px(ranks)[source]#

Get the Weibull exceedance probability P(X)

Parameters:

ranks (numpy.ndarray) – vector of ranks

Returns:

Weibull exceedance probability P(X)

Return type:

numpy.ndarray

static gringorten_px(ranks)[source]#

Get the Gringorten exceedance probability P(X)

Parameters:

ranks (numpy.ndarray) – vector of ranks

Returns:

Gringorten exceedance probability P(X)

Return type:

numpy.ndarray
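
The classic plotting-position formulas behind these probabilities are P = m / (n + 1) for Weibull and P = (m - 0.44) / (n + 0.12) for Gringorten, with m the rank and n the sample size; the sketch below assumes ranks are assigned so that rank 1 is the largest value (exceedance).

import numpy as np

ranks = np.arange(1, 11)   # ranks m = 1..n of a sorted sample
n = ranks.size

weibull_p = ranks / (n + 1)
gringorten_p = (ranks - 0.44) / (n + 0.12)
print(weibull_p[0], gringorten_p[0])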

static sample_gamma(size, shape=3, scale=1, shape_mode=None, n_min=None, n_max=None)[source]#

Sample Gamma distribution

Parameters:
  • size (int) – Sample size

  • shape (float) – Shape parameter

  • scale (float) – Scale parameter

  • shape_mode (str or None) – Shape mode (overwrites shape value). See dict for options

  • n_min (float or None) – Normalization lower value

  • n_max (float or None) – Normalization upper value

Returns:

Sampled array

Return type:

numpy.ndarray
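
A minimal sketch of gamma sampling with optional min-max rescaling, roughly matching the documented parameters (the shape_mode presets are not reproduced here):

import numpy as np

rng = np.random.default_rng(42)
sample = rng.gamma(shape=3.0, scale=1.0, size=1000)

# optional min-max normalization to a target range (here 0..100)
n_min, n_max = 0.0, 100.0
scaled = n_min + (sample - sample.min()) * (n_max - n_min) / (sample.max() - sample.min())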

static nbins_fd(data)[source]#

This function computes the number of bins for histograms using the Freedman-Diaconis rule, which takes into account the interquartile range (IQR) of the sample, in addition to its range.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

number of bins for histogram using the Freedman-Diaconis rule

Return type:

int

static nbins_sturges(data)[source]#

This function computes the number of bins using the Sturges rule, which assumes that the data follows a normal distribution and computes the number of bins from the sample size.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

number of bins using the Sturges rule

Return type:

int

static nbins_scott(data)[source]#

This function computes the number of bins using the Scott rule, which is similar to the Freedman-Diaconis rule but uses the standard deviation of the data to compute the bin width.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

number of bins using the Scott rule

Return type:

int
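
The three rules are also available through numpy, which can serve as a cross-check; numpy returns bin edges, so the count is len(edges) - 1.

import numpy as np

data = np.random.normal(size=1000)
for rule in ("fd", "sturges", "scott"):
    edges = np.histogram_bin_edges(data, bins=rule)
    print(rule, len(edges) - 1)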

static nbins_by_rule(data, rule=None)[source]#

Util function for rule-based nbins computation

Parameters:
  • data (numpy.ndarray) – vector of data without nan values

  • rule (str) – rule code (sturges, fd, scott)

Returns:

number of bins for histogram

Return type:

int

get_histogram(bins=None, rule=None)[source]#
static histogram(data, bins=100, rule=None)[source]#

Compute the histogram of the sample

Parameters:
  • data (numpy.ndarray) – vector of data without nan values

  • bins (int) – number of bins

  • rule (str) – rule to define the number of bins. If None, the bins parameter is used.

Returns:

dataframe of histogram

Return type:

pandas.DataFrame

static qqplot(data)[source]#

Calculate the QQ-plot of data against normal distribution

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

dataframe of QQ plot

Return type:

pandas.DataFrame
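
A Q-Q plot against the normal distribution pairs the sorted sample with theoretical normal quantiles; the sketch below shows how such a dataframe could be built (the exact plotting positions used by qqplot are an assumption).

import numpy as np
import pandas as pd
from scipy import stats

data = np.random.normal(loc=5, scale=2, size=200)
sorted_data = np.sort(data)
n = sorted_data.size

# theoretical quantiles at the Weibull plotting positions m / (n + 1)
probs = np.arange(1, n + 1) / (n + 1)
theoretical = stats.norm.ppf(probs, loc=data.mean(), scale=data.std())

df_qq = pd.DataFrame({"Theoretical": theoretical, "Observed": sorted_data})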

static trace_variance(data)[source]#

Trace the mean variance from the sample

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

dataframe of accumulated variance

Return type:

pandas.DataFrame

static get_bins(data, n_bins=5, scheme='equal')[source]#
static bins_equal(data, n_bins)[source]#

Calculates equally spaced (linear) bins for a 1D NumPy array.

Parameters:
  • data (numpy.ndarray) – The 1D NumPy array to analyze.

  • n_bins (int) – The desired number of bins/intervals.

Returns:

A NumPy array containing the linear bin boundary values.

Return type:

numpy.ndarray

static bins_quantiles(data, n_bins=5)[source]#

Calculates the numerical quantile boundaries (bins) for a 1D NumPy array.

Parameters:
  • data (numpy.ndarray) – The 1D NumPy array to analyze.

  • n_bins (int) – The number of quantile classes.

Returns:

A NumPy array containing the quantile boundary values.

Return type:

numpy.ndarray

static classify(data, bins=None, n_classes=5, scheme='equal')[source]#

Classifies a 1D NumPy array into 0-indexed bins based on provided bin boundaries.

Values x are assigned to class i if bins[i] < x <= bins[i+1]. Values less than or equal to bins[0] are assigned to class 0. Values greater than bins[-1] are assigned to the last class (len(bins) - 2).

Parameters:
  • data (numpy.ndarray) – The 1D NumPy array to classify.

  • bins (numpy.ndarray) – A 1D NumPy array specifying the sorted bin edges. E.g., [min_val, boundary1, boundary2, …, max_val]. Must contain at least two elements.

Returns:

A 1D NumPy array of integers representing the 0-indexed class for each element in data.

Return type:

numpy.ndarray
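
The documented rule matches what numpy.digitize with right=True produces after shifting and clipping; a sketch under that reading:

import numpy as np

data = np.array([0.2, 1.5, 3.7, 9.9, 12.0])
bins = np.array([0.0, 2.5, 5.0, 7.5, 10.0])   # sorted bin edges

# bins[i] < x <= bins[i+1] -> class i; out-of-range values are clipped
classes = np.clip(np.digitize(data, bins, right=True) - 1, 0, len(bins) - 2)
print(classes)   # [0 0 1 3 3]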

static quantiles_classify(data, n_classes=5)[source]#

Classifies a 1D NumPy array into quantile classes.

Parameters:
  • data (numpy.ndarray) – The 1D NumPy array to classify.

  • n_classes (int) – The number of quantile classes.

Returns:

A 1D NumPy array containing the quantile class (0 to n_classes-1) for each element.

Return type:

numpy.ndarray

class plans.analyst.GeoUnivar(name='MyGeoUnivar', alias='GV0')[source]#

Bases: Univar

__init__(name='MyGeoUnivar', alias='GV0')[source]#

Initialize the DataSet object.

Parameters:
  • name (str) – unique object name

  • alias (str) – unique object alias. If None, it takes the first and last characters from name

_set_fields()[source]#

Set field names. Expected to extend the parent class method.

_set_view_specs()[source]#

Set view specifications. Expected to override the parent class method.

Returns:

None

Return type:

None

load_data(file_data, layer_name)[source]#

Load data from file. Expected to override the parent class method.

Parameters:

file_data (str) – file path to data.

Returns:

None

Return type:

None

_build_axes(fig, gs, specs)[source]#
_get_fig_specs()[source]#
_plot(fig, gs, specs)[source]#
static plot_map(data, ax, column, specs, legend=False)[source]#
class plans.analyst.Bivar(df_data, x_name='x', y_name='y', name='myvars')[source]#

Bases: object

The Bivariate analyst object

__init__(df_data, x_name='x', y_name='y', name='myvars')[source]#
fit(model_type='Linear')[source]#

Fit model to bivariate object

Parameters:

model_type (str) – model type. options: Linear, Power, Power_zero

Returns:

None

Return type:

None
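
A minimal usage sketch for Bivar based on the documented signatures, assuming a pandas dataframe with the x and y columns named at construction:

import numpy as np
import pandas as pd
from plans import analyst

# hypothetical paired data
df = pd.DataFrame({"x": np.linspace(1, 10, 50)})
df["y"] = 2.0 + 0.8 * df["x"] + np.random.normal(scale=0.3, size=len(df))

biv = analyst.Bivar(df_data=df, x_name="x", y_name="y", name="demo")
biv.fit(model_type="Linear")
biv.view_model(model_type="Linear", show=True)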

update_model(params_mean, params_sd=None, model_type='Linear')[source]#

Update model based on parameters

Parameters:
  • params_mean (list) – list of parameter means

  • params_sd (list) – list of parameter standard deviations

  • model_type (str) – model type. options: Linear, Power, Power_zero

Returns:

None

Return type:

None

updata_model_data(model_type='Linear')[source]#

Update only model data output

Parameters:

model_type (str) – model type. options: Linear, Power, Power_zero

Returns:

None

Return type:

None

view(show=True, folder='C:/sample', filename='view', specs=None, fig_format='jpg', dpi=300)[source]#

Plot the basic view of the Bivar object

Parameters:
  • show (bool) – Boolean to show instead of saving

  • folder (str) – output folder

  • filename (str) – image file name

  • specs (dict) – specification dictionary

  • dpi (int) – image resolution (default = 300)

  • fig_format (str) – image format (e.g. png or jpg). Default: jpg

Returns:

None

Return type:

None

view_model(model_type='Power', show=True, folder='C:/sample', filename=None, specs=None, dpi=300, fig_format='jpg')[source]#

Plot panel for model analysis

Parameters:
  • show (bool) – Boolean to show instead of saving

  • folder (str) – output folder

  • filename (str) – image file name

  • specs (dict) – specification dictionary

  • dpi (int) – image resolution (default = 300)

  • fig_format (str) – image format (e.g. png or jpg). Default: jpg

Returns:

None

Return type:

None

correlation()[source]#

Compute the R correlation coefficient of the object

Returns:

R correlation coefficient

Return type:

float

prediction_bands(lst_bounds=None, n_sim=100, n_grid=100, n_seed=None, p0=None)[source]#

Run a Monte Carlo simulation to get prediction bands

Parameters:
  • lst_bounds (list) – list of prediction bounds [min, max]. If None, 3x the data range is used

  • n_sim (int) – number of simulation runs

  • n_grid (int) – number of prediction intervals

  • n_seed (int) – random seed for reproducibility. Default = None

  • p0 (list) – list of initial values to search. Default: None

Returns:

dictionary of result dataframes

Return type:

dict
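
Continuing the Bivar sketch above, a prediction-band run based only on the documented signature (the keys of the returned dictionary are not assumed here):

# Monte Carlo prediction bands for the fitted model
bands = biv.prediction_bands(n_sim=200, n_grid=100, n_seed=42)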

assess_error_normality(model_type='Linear', clevel=0.95)[source]#
static bias(pred, obs)[source]#

Compute the bias between predicted and observed samples

Parameters:
  • pred (numpy.ndarray) – vector of predicted values

  • obs (numpy.ndarray) – vector of observed values

Returns:

Bias value

Return type:

float

static rmse(pred, obs)[source]#

Compute the RMSE metric between predicted and observed samples

Parameters:
  • pred (numpy.ndarray) – vector of predicted values

  • obs (numpy.ndarray) – vector of observed values

Returns:

RMSE value

Return type:

float

static mae(pred, obs)[source]#

Compute the Mean Absolute Error (MAE) between predicted and observed samples

Parameters:
  • pred (numpy.ndarray) – vector of predicted values

  • obs (numpy.ndarray) – vector of observed values

Returns:

MAE value

Return type:

float

static rsq(pred, obs)[source]#

Compute the R-square between predicted and observed samples

Parameters:
  • pred (numpy.ndarray) – vector of predicted values

  • obs (numpy.ndarray) – vector of observed values

Returns:

R-square value

Return type:

float
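
These metrics follow the usual definitions; a compact numpy sketch (the sign convention for bias, pred - obs versus obs - pred, is an assumption):

import numpy as np

pred = np.array([2.1, 3.9, 6.2, 8.0])
obs = np.array([2.0, 4.0, 6.0, 8.5])

bias = np.mean(pred - obs)                  # mean error
rmse = np.sqrt(np.mean((pred - obs) ** 2))  # root mean square error
mae = np.mean(np.abs(pred - obs))           # mean absolute error
rsq = 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
print(bias, rmse, mae, rsq)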

class plans.analyst.Bayes(df_hypotheses, name='myBayes', nomenclature=None, gridsize=100)[source]#

Bases: object

The Bayes Theorem Analyst Object

__init__(df_hypotheses, name='myBayes', nomenclature=None, gridsize=100)[source]#

Deploy the Bayes Analyst.

Parameters:
  • df_hypotheses (pandas.DataFrame) – dataframe listing all model hypotheses. Must contain the fields Name (of parameter), Min and Max.

  • name (str) – name of analyst

  • nomenclature (dict) – dictionary for renaming nomenclature

  • gridsize (int) – grid resolution in histograms (bins)

_reset_nomenclature(dct_names)[source]#

Reset nomenclature.

Parameters:

dct_names (dict) – dictionary to rename nomenclatures

Returns:

None

Return type:

None

_insert_new_step()[source]#

Convenience void function for inserting new step objects.

Returns:

None

Return type:

None

_accumulate(n_step)[source]#

Convenience void function for accumulating probability.

Parameters:

n_step (int) – step number to accumulate

Returns:

None

Return type:

None

conditionalize(dct_evidence, s_varfield='E', s_weightfield='W')[source]#

Conditionalization procedure of the Bayes Theorem.

Parameters:
  • dct_evidence (dict) – dictionary of evidence dataframes

  • s_varfield (str) – name of the variable field in the evidence dataframes

  • s_weightfield (str) – name of the weights field in the evidence dataframes

Returns:

None

Return type:

None
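
Conditionalization is the discrete Bayes update P(H|E) proportional to P(E|H) * P(H) applied over the hypothesis grid; the toy numpy sketch below illustrates that update, not the internal data layout of the Bayes class.

import numpy as np

# prior over a 1D grid of hypotheses (uniform here)
grid = np.linspace(0.0, 1.0, 100)
prior = np.full(grid.size, 1.0 / grid.size)

# likelihood of the observed evidence under each hypothesis (illustrative shape)
likelihood = np.exp(-0.5 * ((grid - 0.3) / 0.1) ** 2)

# posterior: multiply and renormalize
posterior = likelihood * prior
posterior /= posterior.sum()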

plot_step(n_step, folder='C:/sample', filename='bayes', specs=None, dpi=300, show=False)[source]#

Void function to plot the panel of a conditionalization step.

Parameters:
  • n_step (int) – step number

  • folder (str) – export folder

  • filename (str) – file name

  • specs (dict) – plot specs dictionary

  • dpi (int) – plot resolution

  • show (bool) – control to show plot instead of saving

Returns:

None

Return type:

None