plans.analyst#

Classes designed to handle statistical analysis.

Overview#

The plans.analyst module provides objects for statistical analysis: Univar and GeoUnivar for univariate samples, Bivar for bivariate relations and model fitting, and Bayes for Bayesian updating of hypotheses, together with the model functions linear, power and power_zero used for curve fitting.

Example#

The example below draws a random sample, wraps it in a Univar object and renders the default view.

import numpy as np
from plans import analyst

# get data to a vector
data_vector = np.random.rand(1000)

# instantiate the Univar object
uni = analyst.Univar(sample=data_vector, name="my_data")

# view sample
uni.view()

The call to view() builds the default panel for the sample, using the settings stored in the view_specs attribute.

Functions

linear(x, c0, c1)

Linear function f(x) = c0 + c1 * x

power(x, c0, c1, c2)

Power function f(x) = c2 * ((x + c0)^c1)

power_zero(x, c0, c1)

Power function with root in zero f(x) = c1 * ((x)^c0)

Classes

Bayes(df_hypotheses[, name, nomenclature, ...])

The Bayes Theorem Analyst Object

Bivar(df_data[, x_name, y_name, name])

The Bivariate analyst object

GeoUnivar([name, alias])

Univar([name, alias])

The Univariate object

plans.analyst.linear(x, c0, c1)[source]#

Linear function f(x) = c0 + c1 * x

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • c0 (float) – translational parameter

  • c1 (float) – scaling parameter

Returns:

function output

Return type:

float | numpy.ndarray

plans.analyst.power(x, c0, c1, c2)[source]#

Power function f(x) = c2 * ((x + c0)^c1)

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • c0 (float) – translational parameter

  • c1 (float) – exponent parameter

  • c2 (float) – scaling parameter

Returns:

function output

Return type:

float | numpy.ndarray

plans.analyst.power_zero(x, c0, c1)[source]#

Power function with root in zero f(x) = c1 * ((x)^c0)

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • c0 (float) – exponent parameter

  • c1 (float) – scaling parameter

Returns:

function output

Return type:

float | numpy.ndarray
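
These model functions can be passed straight to a least-squares fitter. A minimal sketch, assuming scipy is available; Bivar.fit() presumably wraps a similar procedure, but that is not confirmed here.

import numpy as np
from scipy.optimize import curve_fit

from plans import analyst

# synthetic sample roughly following f(x) = 2 + 0.5 * x (illustrative values)
x = np.linspace(0, 10, 50)
y = analyst.linear(x, 2.0, 0.5) + np.random.normal(scale=0.2, size=x.size)

# least-squares fit of the module's linear model to the noisy sample
params, cov = curve_fit(analyst.linear, x, y)
print("fitted c0, c1:", params)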

class plans.analyst.Univar(name='MyUnivar', alias='Uv0')[source]#

Bases: DataSet

The Univariate object

__init__(name='MyUnivar', alias='Uv0')[source]#

Initialize the DataSet object.

Parameters:
  • name (str) – unique object name

  • alias (str) – unique object alias. If None, it takes the first and last characters from name

_set_view_specs()[source]#

Set view specifications. Expected to override the parent class method.

Returns:

None

Return type:

None

load_data(file_data)[source]#

Load data from file. Expected to override the parent class method.

Parameters:

file_data (str) – file path to data.

Returns:

None

Return type:

None

set_array(array)[source]#

Set array to data

Parameters:

array (numpy.ndarray) – Numpy array

Returns:

None

Return type:

None

update()[source]#

Refresh all mutable attributes based on data (including paths).

assess_normality(clevel=0.95)[source]#

Assess normality of the sample using standard statistical tests.

Parameters:

clevel (float) – confidence level

Returns:

dataframe of assessment results

Return type:

pandas.DataFrame

assess_frequency()[source]#

Assess the data frequencies.

Returns:

result dataframe

Return type:

pandas.DataFrame

assess_basic_stats()[source]#

Assesses basic statistics of the variable field.

Returns:

DataFrame with statistics (Count, Sum, Mean, SD, Min, percentiles, Max) and their values.

Return type:

pandas.DataFrame

assess_weibull_cdf()[source]#

Get the Weibull CDF model for the data.

Returns:

model dataframe

Return type:

pandas.DataFrame

assess_gumbel_cdf()[source]#

Assess the Gumbel CDF for the data (it assumes the data is a maxima dataset)

Returns:

multiple results from the assessment

Return type:

dict

plot_hist(bins=100, colored=False, annotated=False, rule=None, show=False, folder='C:/sample', filename='histogram', specs=None, dpi=300)[source]#

Plot histogram of sample

Parameters:
  • bins (int) – number of bins

  • colored (bool) – Boolean to color the histogram by quantiles

  • annotated (bool) – Boolean to annotate the histogram with statistics

  • rule (str) – name of rule to compute bins

  • show (bool) – Boolean to show instead of saving

  • folder (str) – output folder

  • filename (str) – image file name

  • specs (dict) – specification dictionary

  • dpi (int) – image resolution (default = 300)

plot_qqplot(show=True, folder='C:/sample', filename='qqplot', specs=None, dpi=300)[source]#

Plot Q-Q Plot on Normal distribution

Parameters:
  • show (bool) – Boolean to show instead of saving

  • folder (str) – output folder

  • filename (str) – image file name

  • specs (dict) – specification dictionary

  • dpi (int) – image resolution (default = 300)

Returns:

None

Return type:

None

_plot(fig, gs, specs)[source]#
_build_axes(fig, gs, specs)[source]#
_get_fig_specs()[source]#
view(show=True, return_fig=False)[source]#

Get the basic visualization.

Parameters:

show (bool) – option for showing instead of saving.

Note

Uses values in the view_specs attribute for plotting.

static plot_mean(ax, y_mu, xmin, xmax)[source]#
static plot_scatter(data, ax, ylim, specs, x_factor=4, formatter=None)[source]#
static plot_histh(data, ax, ylim, specs, formatter=None)[source]#
static plot_cdf(cdf_df, ax, ylim, specs, formatter=None)[source]#
static plot_stats(fig, stats_df, x=0.5, y=0.4)[source]#
static plot_cbar(data, ax, scheme, cmap='viridis', n_classes=5, side='right', width_factor=16)[source]#
static test_distribution(test_name, stat, p, clevel=0.95, distr='normal')[source]#

Util function for statistical testing

Parameters:
  • test_name (str) – name of test

  • stat (float) – statistic

  • p (float) – p-value

  • clevel (float) – confidence level

  • distr (str) – name of distribution

Returns:

summary of test

Return type:

dict

static test_normality_ks(data, clevel=0.95)[source]#

Test for normality using the Kolmogorov-Smirnov test

Kolmogorov-Smirnov test: this test compares the observed distribution with the expected normal distribution using a test statistic and a p-value. A p-value below the significance level (1 - clevel, i.e. 0.05 at the default confidence level) indicates that the null hypothesis of normality should be rejected.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

test result dictionary. Keys: Statistic, p-value and Is normal

Return type:

dict

static test_normality_sw(data, clevel=0.95)[source]#

Test for normality using the Shapiro-Wilk test.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

test result dictionary. Keys: Statistic, p-value and Is normal

Return type:

dict

static test_normality_dp(data, clevel=0.95)[source]#

Test for normality using the D’Agostino-Pearson test.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

test result dictionary. Keys: Statistic, p-value and Is normal

Return type:

dict
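
For reference, a result dictionary with the documented keys (Statistic, p-value and Is normal) can be assembled from scipy directly; this is only a sketch of the idea, not the internal implementation of these methods.

import numpy as np
from scipy import stats

data = np.random.normal(loc=10, scale=2, size=500)

# standardize and test against the standard normal with Kolmogorov-Smirnov
z = (data - data.mean()) / data.std()
stat, p = stats.kstest(z, "norm")

clevel = 0.95
result = {"Statistic": stat, "p-value": p, "Is normal": p > (1 - clevel)}
print(result)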

static get_tx(fx)[source]#

Simple function for computing the Return Period from the CDF

Parameters:

fx (float | numpy.ndarray) – CDF (FX)

Returns:

Return Period

Return type:

float | numpy.ndarray
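
The return period is the reciprocal of the exceedance probability, so for a CDF value F(X) the usual relation is T(X) = 1 / (1 - F(X)); the quick numpy check below assumes this is the formula implemented here.

import numpy as np

fx = np.array([0.5, 0.9, 0.99])   # CDF values F(X)
tx = 1.0 / (1.0 - fx)             # return periods: 2, 10 and 100
print(tx)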

static gumbel_fx(x, a, b)[source]#

Gumbel probability distribution F(X)

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • a (float) – distribution parameter a

  • b (float) – distribution parameter b

Returns:

function output

Return type:

float | numpy.ndarray

static gumbel_tx(x, a, b)[source]#

Gumbel return period distribution T(X)

Parameters:
  • x (float | numpy.ndarray) – function inputs

  • a (float) – distribution parameter a

  • b (float) – distribution parameter b

Returns:

function output

Return type:

float | numpy.ndarray

static gumbel_freqfactor(tx=2)[source]#

Gumbel Frequency Factor K(T)

Parameters:

tx (float | numpy.ndarray) – return period T

Returns:

function output

Return type:

float | numpy.ndarray
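
For reference, the classic Gumbel frequency factor (Chow's formula) is K(T) = -(sqrt(6)/pi) * (0.5772 + ln(ln(T / (T - 1)))); the sketch below assumes this is the formula behind gumbel_freqfactor.

import numpy as np

def gumbel_k(tx):
    # Chow's frequency factor for the Gumbel distribution
    return -(np.sqrt(6.0) / np.pi) * (0.5772 + np.log(np.log(tx / (tx - 1.0))))

print(gumbel_k(np.array([2.0, 10.0, 100.0])))   # approx. -0.16, 1.30, 3.14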

static gumbel_se(std_sample, n_sample, tx)[source]#

Gumbel standard error for the method-of-moments (MM) fitted Gumbel function

Parameters:
  • std_sample (float) – sample standard deviation

  • n_sample (int) – sample size

  • tx (float | numpy.ndarray) – return period T

Returns:

function output

Return type:

float | numpy.ndarray

static empirical_px(ranks)[source]#

Get the empirical exceedance probability P(X)

Parameters:

ranks (numpy.ndarray) – vector of ranks

Returns:

empirical exceedance probability P(X)

Return type:

numpy.ndarray

static weibull_px(ranks)[source]#

Get the Weibull exceedance probability P(X)

Parameters:

ranks (numpy.ndarray) – vector of ranks

Returns:

Weibull exceedance probability P(X)

Return type:

numpy.ndarray

static gringorten_px(ranks)[source]#

Get the Gringorten exceedance probability P(X)

Parameters:

ranks (numpy.ndarray) – vector of ranks

Returns:

Gringorten exceedance probability P(X)

Return type:

numpy.ndarray
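
The classic plotting-position formulas behind these probabilities are P = m / (n + 1) for Weibull and P = (m - 0.44) / (n + 0.12) for Gringorten, with m the rank and n the sample size; the sketch below assumes ranks are assigned so that rank 1 is the largest value (exceedance).

import numpy as np

ranks = np.arange(1, 11)   # ranks m = 1..n of a sorted sample
n = ranks.size

weibull_p = ranks / (n + 1)
gringorten_p = (ranks - 0.44) / (n + 0.12)
print(weibull_p[0], gringorten_p[0])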

static sample_gamma(size, shape=3, scale=1, shape_mode=None, n_min=None, n_max=None)[source]#

Sample Gamma distribution

Parameters:
  • size (int) – Sample size

  • shape (float) – Shape parameter

  • scale (float) – Scale parameter

  • shape_mode (str or None) – Shape mode (overwrites shape value). See dict for options

  • n_min (float or None) – Normalization lower value

  • n_max (float or None) – Normalization upper value

Returns:

Sampled array

Return type:

numpy.ndarray
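
A minimal sketch of gamma sampling with optional min-max rescaling, roughly matching the documented parameters (the shape_mode presets are not reproduced here):

import numpy as np

rng = np.random.default_rng(42)
sample = rng.gamma(shape=3.0, scale=1.0, size=1000)

# optional min-max normalization to a target range (here 0..100)
n_min, n_max = 0.0, 100.0
scaled = n_min + (sample - sample.min()) * (n_max - n_min) / (sample.max() - sample.min())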

static nbins_fd(data)[source]#

This function computes the number of bins for histograms using the Freedman-Diaconis rule, which takes into account the interquartile range (IQR) of the sample, in addition to its range.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

number of bins for histogram using the Freedman-Diaconis rule

Return type:

int

static nbins_sturges(data)[source]#

This function computes the number of bins using the Sturges rule, which assumes that the data follows a normal distribution and computes the number of bins from the sample size.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

number of bins using the Sturges rule

Return type:

int

static nbins_scott(data)[source]#

This function computes the number of bins using the Scott rule, which is similar to the Freedman-Diaconis rule but uses the standard deviation of the data to compute the bin width.

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

number of bins using the Scott rule

Return type:

int
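
The three rules are also available through numpy, which can serve as a cross-check; numpy returns bin edges, so the count is len(edges) - 1.

import numpy as np

data = np.random.normal(size=1000)
for rule in ("fd", "sturges", "scott"):
    edges = np.histogram_bin_edges(data, bins=rule)
    print(rule, len(edges) - 1)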

static nbins_by_rule(data, rule=None)[source]#

Util function for rule-based nbins computation

Parameters:
  • data (numpy.ndarray) – vector of data without nan values

  • rule (str) – rule code (sturges, fd, scott)

Returns:

number of bins for histogram

Return type:

int

get_histogram(bins=None, rule=None)[source]#
static histogram(data, bins=100, rule=None)[source]#

Compute the histogram of the sample

Parameters:
  • data (numpy.ndarray) – vector of data without nan values

  • bins (int) – number of bins

  • rule (str) – rule to define the number of bins. If None, the bins parameter is used.

Returns:

dataframe of histogram

Return type:

pandas.DataFrame

static qqplot(data)[source]#

Calculate the QQ-plot of data against normal distribution

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

dataframe of QQ plot

Return type:

pandas.DataFrame
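
A Q-Q plot against the normal distribution pairs the sorted sample with theoretical normal quantiles; the sketch below shows how such a dataframe could be built (the exact plotting positions used by qqplot are an assumption).

import numpy as np
import pandas as pd
from scipy import stats

data = np.random.normal(loc=5, scale=2, size=200)
sorted_data = np.sort(data)
n = sorted_data.size

# theoretical quantiles at the Weibull plotting positions m / (n + 1)
probs = np.arange(1, n + 1) / (n + 1)
theoretical = stats.norm.ppf(probs, loc=data.mean(), scale=data.std())

df_qq = pd.DataFrame({"Theoretical": theoretical, "Observed": sorted_data})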

static trace_variance(data)[source]#

Trace the mean variance from the sample

Parameters:

data (numpy.ndarray) – vector of data without nan values

Returns:

dataframe of accumulated variance

Return type:

pandas.DataFrame

static get_bins(data, n_bins=5, scheme='equal')[source]#
static bins_equal(data, n_bins)[source]#

Calculates equally spaced (linear) bins for a 1D NumPy array.

Parameters:
  • data (numpy.ndarray) – The 1D NumPy array to analyze.

  • n_bins (int) – The desired number of bins/intervals.

Returns:

A NumPy array containing the linear bin boundary values.

Return type:

numpy.ndarray

static bins_quantiles(data, n_bins=5)[source]#

Calculates the numerical quantile boundaries (bins) for a 1D NumPy array.

Parameters:
  • data (numpy.ndarray) – The 1D NumPy array to analyze.

  • n_bins (int) – The number of quantile classes.

Returns:

A NumPy array containing the quantile boundary values.

Return type:

numpy.ndarray

static classify(data, bins=None, n_classes=5, scheme='equal')[source]#

Classifies a 1D NumPy array into 0-indexed bins based on provided bin boundaries.

Values x are assigned to class i if bins[i] < x <= bins[i+1]. Values less than or equal to bins[0] are assigned to class 0. Values greater than bins[-1] are assigned to the last class (len(bins) - 2).

Parameters:
  • data (numpy.ndarray) – The 1D NumPy array to classify.

  • bins (numpy.ndarray) – A 1D NumPy array specifying the sorted bin edges. E.g., [min_val, boundary1, boundary2, …, max_val]. Must contain at least two elements.

Returns:

A 1D NumPy array of integers representing the 0-indexed class for each element in data.

Return type:

numpy.ndarray
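
The documented rule matches what numpy.digitize with right=True produces after shifting and clipping; a sketch under that reading:

import numpy as np

data = np.array([0.2, 1.5, 3.7, 9.9, 12.0])
bins = np.array([0.0, 2.5, 5.0, 7.5, 10.0])   # sorted bin edges

# bins[i] < x <= bins[i+1] -> class i; out-of-range values are clipped
classes = np.clip(np.digitize(data, bins, right=True) - 1, 0, len(bins) - 2)
print(classes)   # [0 0 1 3 3]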

static quantiles_classify(data, n_classes=5)[source]#

Classifies a 1D NumPy array into quantile classes.

Parameters:
  • data (numpy.ndarray) – The 1D NumPy array to classify.

  • n_classes (int) – The number of quantile classes.

Returns:

A 1D NumPy array containing the quantile class (0 to n_classes-1) for each element.

Return type:

numpy.ndarray

class plans.analyst.GeoUnivar(name='MyGeoUnivar', alias='GV0')[source]#

Bases: Univar

__init__(name='MyGeoUnivar', alias='GV0')[source]#

Initialize the DataSet object.

Parameters:
  • name (str) – unique object name

  • alias (str) – unique object alias. If None, it takes the first and last characters from name

_set_fields()[source]#

Set field names. Expected to extend the parent class method.

_set_view_specs()[source]#

Set view specifications. Expected to override the parent class method.

Returns:

None

Return type:

None

load_data(file_data, layer_name)[source]#

Load data from file. Expected to override the parent class method.

Parameters:

file_data (str) – file path to data.

Returns:

None

Return type:

None

_build_axes(fig, gs, specs)[source]#
_get_fig_specs()[source]#
_plot(fig, gs, specs)[source]#
static plot_map(data, ax, column, specs, legend=False)[source]#
class plans.analyst.Bivar(df_data, x_name='x', y_name='y', name='myvars')[source]#

Bases: object

The Bivariate analyst object

__init__(df_data, x_name='x', y_name='y', name='myvars')[source]#
fit(model_type='Linear')[source]#

Fit model to bivariate object

Parameters:

model_type (str) – model type. options: Linear, Power, Power_zero

Returns:

None

Return type:

None
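
A minimal usage sketch for Bivar based on the documented signatures, assuming a pandas dataframe with the x and y columns named at construction:

import numpy as np
import pandas as pd
from plans import analyst

# hypothetical paired data
df = pd.DataFrame({"x": np.linspace(1, 10, 50)})
df["y"] = 2.0 + 0.8 * df["x"] + np.random.normal(scale=0.3, size=len(df))

biv = analyst.Bivar(df_data=df, x_name="x", y_name="y", name="demo")
biv.fit(model_type="Linear")
biv.view_model(model_type="Linear", show=True)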

update_model(params_mean, params_sd=None, model_type='Linear')[source]#

Update model based on parameters

Parameters:
  • params_mean (list) – list of parameter means

  • params_sd (list) – list of parameter standard deviations

  • model_type (str) – model type. options: Linear, Power, Power_zero

Returns:

None

Return type:

None

updata_model_data(model_type='Linear')[source]#

Update only model data output

Parameters:

model_type (str) – model type. options: Linear, Power, Power_zero

Returns:

None

Return type:

None

view(show=True, folder='C:/sample', filename='view', specs=None, fig_format='jpg', dpi=300)[source]#

Plot the basic view of the Bivar object

Parameters:
  • show (bool) – Boolean to show instead of saving

  • folder (str) – output folder

  • filename (str) – image file name

  • specs (dict) – specification dictionary

  • dpi (int) – image resolution (default = 300)

  • fig_format (str) – image format (e.g. png or jpg). Default: jpg

Returns:

None

Return type:

None

view_model(model_type='Power', show=True, folder='C:/sample', filename=None, specs=None, dpi=300, fig_format='jpg')[source]#

Plot panel for model analysis

Parameters:
  • show (bool) – Boolean to show instead of saving

  • folder (str) – output folder

  • filename (str) – image file name

  • specs (dict) – specification dictionary

  • dpi (int) – image resolution (default = 300)

  • fig_format (str) – image format (e.g. png or jpg). Default: jpg

Returns:

None

Return type:

None

correlation()[source]#

Compute the R correlation coefficient of the object

Returns:

R correlation coefficient

Return type:

float

prediction_bands(lst_bounds=None, n_sim=100, n_grid=100, n_seed=None, p0=None)[source]#

Run a Monte Carlo simulation to get prediction bands

Parameters:
  • lst_bounds (list) – list of prediction bounds [min, max]. If None, 3x the data range is used

  • n_sim (int) – number of simulation runs

  • n_grid (int) – number of prediction intervals

  • n_seed (int) – random seed for reproducibility. Default = None

  • p0 (list) – list of initial values to search. Default: None

Returns:

dictionary of result dataframes

Return type:

dict
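
Continuing the Bivar sketch above, a prediction-band run based only on the documented signature (the keys of the returned dictionary are not assumed here):

# Monte Carlo prediction bands for the fitted model
bands = biv.prediction_bands(n_sim=200, n_grid=100, n_seed=42)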

assess_error_normality(model_type='Linear', clevel=0.95)[source]#
static bias(pred, obs)[source]#

Compute the bias between predicted and observed samples

Parameters:
  • pred (numpy.ndarray) – vector of predicted values

  • obs (numpy.ndarray) – vector of observed values

Returns:

Bias value

Return type:

float

static rmse(pred, obs)[source]#

Compute the RMSE metric between predicted and observed samples

Parameters:
  • pred (numpy.ndarray) – vector of predicted values

  • obs (numpy.ndarray) – vector of observed values

Returns:

RMSE value

Return type:

float

static mae(pred, obs)[source]#

Compute the Mean Absolute Error (MAE) between predicted and observed samples

Parameters:
  • pred (numpy.ndarray) – vector of predicted values

  • obs (numpy.ndarray) – vector of observed values

Returns:

MAE value

Return type:

float

static rsq(pred, obs)[source]#

Compute the R-square between predicted and observed samples

Parameters:
  • pred (numpy.ndarray) – vector of predicted values

  • obs (numpy.ndarray) – vector of observed values

Returns:

R-square value

Return type:

float
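
These metrics follow the usual definitions; a compact numpy sketch (the sign convention for bias, pred - obs versus obs - pred, is an assumption):

import numpy as np

pred = np.array([2.1, 3.9, 6.2, 8.0])
obs = np.array([2.0, 4.0, 6.0, 8.5])

bias = np.mean(pred - obs)                  # mean error
rmse = np.sqrt(np.mean((pred - obs) ** 2))  # root mean square error
mae = np.mean(np.abs(pred - obs))           # mean absolute error
rsq = 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
print(bias, rmse, mae, rsq)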

class plans.analyst.Bayes(df_hypotheses, name='myBayes', nomenclature=None, gridsize=100)[source]#

Bases: object

The Bayes Theorem Analyst Object

__init__(df_hypotheses, name='myBayes', nomenclature=None, gridsize=100)[source]#

Deploy the Bayes Analyst.

Parameters:
  • df_hypotheses (pandas.DataFrame) – dataframe listing all model hypotheses. Must contain the fields Name (of parameter), Min and Max.

  • name (str) – name of analyst

  • nomenclature (dict) – dictionary for renaming nomenclature

  • gridsize (int) – grid resolution in histograms (bins)

_reset_nomenclature(dct_names)[source]#

Reset nomenclature.

Parameters:

dct_names (dict) – dictionary to rename nomenclatures

Returns:

None

Return type:

None

_insert_new_step()[source]#

Convenience void function for inserting new step objects.

Returns:

None

Return type:

None

_accumulate(n_step)[source]#

Convenience void function for accumulating probability.

Parameters:

n_step (int) – step number to accumulate

Returns:

None

Return type:

None

conditionalize(dct_evidence, s_varfield='E', s_weightfield='W')[source]#

Conditionalization procedure of the Bayes Theorem.

Parameters:
  • dct_evidence (dict) – dictionary of evidence dataframes

  • s_varfield (str) – name of the variable field in the evidence dataframes

  • s_weightfield (str) – name of the weights field in the evidence dataframes

Returns:

None

Return type:

None
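
Conditionalization is the discrete Bayes update P(H|E) proportional to P(E|H) * P(H) applied over the hypothesis grid; the toy numpy sketch below illustrates that update, not the internal data layout of the Bayes class.

import numpy as np

# prior over a 1D grid of hypotheses (uniform here)
grid = np.linspace(0.0, 1.0, 100)
prior = np.full(grid.size, 1.0 / grid.size)

# likelihood of the observed evidence under each hypothesis (illustrative shape)
likelihood = np.exp(-0.5 * ((grid - 0.3) / 0.1) ** 2)

# posterior: multiply and renormalize
posterior = likelihood * prior
posterior /= posterior.sum()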

plot_step(n_step, folder='C:/sample', filename='bayes', specs=None, dpi=300, show=False)[source]#

Void function to plot the panel of a conditionalization step.

Parameters:
  • n_step (int) – step number

  • folder (str) – export folder

  • filename (str) – file name

  • specs (dict) – plot specs dictionary

  • dpi (int) – plot resolution

  • show (bool) – control to show plot instead of saving

Returns:

None

Return type:

None