plans.analyst#
Classes designed to handle statistical analysis.
Overview#
# todo [major docstring improvement] – overview
Example#
# todo [major docstring improvement] – examples
import numpy as np
from plans import analyst
# get data to a vector
data_vector = np.random.rand(1000)
# instantiate the Univar object
uni = analyst.Univar(name="my_data")
# set the sample data
uni.set_array(data_vector)
# view sample
uni.view()
Functions
| linear | Linear function f(x) = c0 + c1 * x |
| power | Power function f(x) = c2 * ((x + c0)^c1) |
| power_zero | Power function with root in zero f(x) = c1 * ((x)^c0) |
Classes
| Bayes | The Bayes Theorem Analyst Object |
| Bivar | The Bivariate analyst object |
| GeoUnivar | |
| Univar | The Univariate object |
- plans.analyst.linear(x, c0, c1)[source]#
Linear function f(x) = c0 + c1 * x
- Parameters:
x (float | numpy.ndarray) – function inputs
c0 (float) – translational parameter
c1 (float) – scaling parameter
- Returns:
function output
- Return type:
float | numpy.ndarray
- plans.analyst.power(x, c0, c1, c2)[source]#
Power function f(x) = c2 * ((x + c0)^c1)
- Parameters:
x (float | numpy.ndarray) – function inputs
c0 (float) – translational parameter
c1 (float) – exponent parameter
c2 (float) – scaling parameter
- Returns:
function output
- Return type:
float | numpy.ndarray
- plans.analyst.power_zero(x, c0, c1)[source]#
Power function with root in zero f(x) = c1 * ((x)^c0)
- Parameters:
x (float | numpy.ndarray) – function inputs
c0 (float) – exponent parameter
c1 (float) – scaling parameter
- Returns:
function output
- Return type:
float | numpy.ndarray
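The three model functions above follow directly from their documented formulas. A minimal sketch, written from the signatures on this page rather than taken from the plans source, might look like this (plain operators are used so the same code accepts floats and NumPy arrays):

```python
def linear(x, c0, c1):
    # f(x) = c0 + c1 * x
    return c0 + c1 * x


def power(x, c0, c1, c2):
    # f(x) = c2 * ((x + c0)^c1)
    return c2 * ((x + c0) ** c1)


def power_zero(x, c0, c1):
    # f(x) = c1 * (x^c0)
    return c1 * (x ** c0)


y_linear = linear(2.0, 1.0, 3.0)       # 1 + 3 * 2
y_power = power(2.0, 1.0, 2.0, 0.5)    # 0.5 * (2 + 1)^2
y_zero = power_zero(2.0, 3.0, 0.5)     # 0.5 * 2^3
```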
- class plans.analyst.Univar(name='MyUnivar', alias='Uv0')[source]#
Bases: DataSet
The Univariate object
- __init__(name='MyUnivar', alias='Uv0')[source]#
Initialize the DataSet object.
- Parameters:
name (str) – unique object name
alias (str) – unique object alias. If None, it takes the first and last characters from name
- _set_view_specs()[source]#
Set view specifications. Expected to override the superclass method.
- Returns:
None
- Return type:
None
- load_data(file_data)[source]#
Load data from file. Expected to override the superclass method.
- Parameters:
file_data (str) – file path to data.
- Returns:
None
- Return type:
None
- set_array(array)[source]#
Set array to data
- Parameters:
array (numpy.ndarray) – Numpy array
- Returns:
None
- Return type:
None
- assess_normality(clevel=0.95)[source]#
Assessment on normality using standard tests
- Returns:
dataframe of assessment results
- Return type:
pandas.DataFrame
- assess_frequency()[source]#
Assessment on data frequencies
- Returns:
result dataframe
- Return type:
pandas.DataFrame
- assess_basic_stats()[source]#
Assesses basic statistics of the variable field.
- Returns:
DataFrame with statistics (Count, Sum, Mean, SD, Min, percentiles, Max) and their values.
- Return type:
pandas.DataFrame
- assess_weibull_cdf()[source]#
Get the Weibull model
- Parameters:
x (numpy.ndarray) – function inputs
- Returns:
model dataframe
- Return type:
pandas.DataFrame
- assess_gumbel_cdf()[source]#
Assess the Gumbel CDF for the data (assumes the data is a maxima dataset)
- Returns:
multiple results from the assessment
- Return type:
dict
- plot_hist(bins=100, colored=False, annotated=False, rule=None, show=False, folder='C:/sample', filename='histogram', specs=None, dpi=300)[source]#
Plot histogram of sample
- Parameters:
bins (int) – number of bins
colored (bool) – Boolean to color the histogram by quantile
annotated (bool) – Boolean to plot statistics text over the histogram
rule (str) – name of rule to compute bins
show (bool) – Boolean to show instead of saving
folder (str) – output folder
filename (str) – image file name
specs (dict) – specification dictionary
dpi (int) – image resolution (default = 300)
- plot_qqplot(show=True, folder='C:/sample', filename='qqplot', specs=None, dpi=300)[source]#
Plot Q-Q Plot on Normal distribution
- Parameters:
show (bool) – Boolean to show instead of saving
folder (str) – output folder
filename (str) – image file name
specs (dict) – specification dictionary
dpi (int) – image resolution (default = 300)
- Returns:
None
- Return type:
None
- view(show=True, return_fig=False)[source]#
Get the basic visualization.
- Parameters:
show (bool) – option for showing instead of saving.
Note
Uses values in the view_specs() attribute for plotting.
- static plot_cbar(data, ax, scheme, cmap='viridis', n_classes=5, side='right', width_factor=16)[source]#
- static test_distribution(test_name, stat, p, clevel=0.95, distr='normal')[source]#
Util function for statistical testing
- Parameters:
test_name (str) – name of test
stat (float) – statistic
p (float) – p-value
clevel (float) – confidence level
distr (str) – name of distribution
- Returns:
summary of test
- Return type:
dict
- static test_normality_ks(data, clevel=0.95)[source]#
Test for normality using the Kolmogorov-Smirnov test
Kolmogorov-Smirnov test: compares the observed distribution with the expected normal distribution using a test statistic and a p-value. A p-value less than 0.05 indicates that the null hypothesis of normality should be rejected.
- Parameters:
data (numpy.ndarray) – vector of data without nan values
- Returns:
test result dictionary. Keys: Statistic, p-value and Is normal
- Return type:
dict
- static test_normality_sw(data, clevel=0.95)[source]#
Test for normality using the Shapiro-Wilk test.
- Parameters:
data (numpy.ndarray) – vector of data without nan values
- Returns:
test result dictionary. Keys: Statistic, p-value and Is normal
- Return type:
dict
- static test_normality_dp(data, clevel=0.95)[source]#
Test for normality using the D’Agostino-Pearson test.
- Parameters:
data (numpy.ndarray) – vector of data without nan values
- Returns:
test result dictionary. Keys: Statistic, p-value and Is normal
- Return type:
dict
- static get_tx(fx)[source]#
Simple function for computing the Return Period from the CDF
- Parameters:
fx (float | numpy.ndarray) – CDF (FX)
- Returns:
Return Period
- Return type:
float | numpy.ndarray
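The return period is conventionally the reciprocal of the exceedance probability, T = 1 / (1 - F(x)). The sketch below assumes get_tx follows that standard definition; the function body is illustrative and not copied from the library.

```python
import numpy as np


def get_tx(fx):
    # T = 1 / (1 - F), where F is the non-exceedance probability (CDF)
    return 1.0 / (1.0 - np.asarray(fx, dtype=float))


# CDF values of 0.5, 0.9 and 0.99 map to return periods of about 2, 10 and 100
tx = get_tx(np.array([0.5, 0.9, 0.99]))
```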
- static gumbel_fx(x, a, b)[source]#
Gumbel probability distribution F(X)
- Parameters:
x (float | numpy.ndarray) – function inputs
a (float) – distribution parameter a
b (float) – distribution parameter b
- Returns:
function output
- Return type:
float | numpy.ndarray
- static gumbel_tx(x, a, b)[source]#
Gumbel return period distribution T(X)
- Parameters:
x (float | numpy.ndarray) – function inputs
a (float) – distribution parameter a
b (float) – distribution parameter b
- Returns:
function output
- Return type:
float | numpy.ndarray
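For reference, the Gumbel (Extreme Value Type I) distribution for maxima has the CDF F(x) = exp(-exp(-(x - a) / b)). The sketch below assumes that a is the location parameter and b the scale parameter, which the docstrings above do not state, and derives T(x) from F(x) as 1 / (1 - F(x)).

```python
import numpy as np


def gumbel_fx(x, a, b):
    # assumption: a = location, b = scale (b > 0)
    z = (np.asarray(x, dtype=float) - a) / b
    return np.exp(-np.exp(-z))


def gumbel_tx(x, a, b):
    # return period derived from the CDF
    return 1.0 / (1.0 - gumbel_fx(x, a, b))


# at x = a the CDF equals exp(-1), regardless of the scale
f_at_location = gumbel_fx(10.0, a=10.0, b=2.0)
```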
- static gumbel_freqfactor(tx=2)[source]#
Gumbel Frequency Factor K(T)
- Parameters:
tx (float | numpy.ndarray) – return period T
- Returns:
function output
- Return type:
float | numpy.ndarray
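A common textbook form of the Gumbel frequency factor is K(T) = -(sqrt(6) / pi) * (0.5772 + ln(ln(T / (T - 1)))). The sketch below assumes gumbel_freqfactor uses this expression; it is not taken from the library source.

```python
import numpy as np

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def gumbel_freqfactor(tx=2):
    # K(T) for a Gumbel distribution fitted by the method of moments
    tx = np.asarray(tx, dtype=float)
    return -(np.sqrt(6.0) / np.pi) * (EULER_GAMMA + np.log(np.log(tx / (tx - 1.0))))


# the design value for a return period T is then x_T = mean + K(T) * std
k_100 = gumbel_freqfactor(100)
```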
- static gumbel_se(std_sample, n_sample, tx)[source]#
Gumbel Standard Error for the MM fitted Gumbel function
- Parameters:
std_sample (float) – sample standard deviation
n_sample (int) – sample size
tx (float | numpy.ndarray) – return period T
- Returns:
function output
- Return type:
float | numpy.ndarray
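One commonly cited approximation for the standard error of a method-of-moments Gumbel quantile is se = (s / sqrt(n)) * sqrt(1 + 1.1396 K + 1.1 K^2), with K the frequency factor for the return period. The sketch below assumes that expression; whether gumbel_se uses exactly these coefficients is not confirmed by this page.

```python
import numpy as np


def gumbel_se(std_sample, n_sample, tx):
    tx = np.asarray(tx, dtype=float)
    # frequency factor K(T), textbook form (assumed)
    k = -(np.sqrt(6.0) / np.pi) * (0.5772156649 + np.log(np.log(tx / (tx - 1.0))))
    # inflation term for the Extreme Value Type I distribution (assumed coefficients)
    delta = np.sqrt(1.0 + 1.1396 * k + 1.1 * k ** 2)
    return delta * std_sample / np.sqrt(n_sample)


se_10 = gumbel_se(std_sample=10.0, n_sample=50, tx=10)
se_100 = gumbel_se(std_sample=10.0, n_sample=50, tx=100)
```

Under this expression the error grows with the return period and shrinks with the square root of the sample size.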
- static empirical_px(ranks)[source]#
Get the empirical exceedance probability P(X)
- Parameters:
ranks (numpy.ndarray) – vector of ranks
- Returns:
empirical exceedance probability P(X)
- Return type:
numpy.ndarray
- static weibull_px(ranks)[source]#
Get the Weibull exceedance probability P(X)
- Parameters:
ranks (numpy.ndarray) – vector of ranks
- Returns:
Weibull exceedance probability P(X)
- Return type:
numpy.ndarray
- static gringorten_px(ranks)[source]#
Get the Gringorten exceedance probability P(X)
- Parameters:
ranks (numpy.ndarray) – vector of ranks
- Returns:
Gringorten exceedance probability P(X)
- Return type:
numpy.ndarray
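The three plotting positions differ only in how the rank r is scaled by the sample size n. The sketch below uses the usual formulas (r / n, r / (n + 1) and (r - 0.44) / (n + 0.12)) and assumes that ranks holds one rank per observation, so n is the length of the vector; both points are assumptions about the library's behavior.

```python
import numpy as np


def empirical_px(ranks):
    ranks = np.asarray(ranks, dtype=float)
    return ranks / len(ranks)


def weibull_px(ranks):
    ranks = np.asarray(ranks, dtype=float)
    return ranks / (len(ranks) + 1)


def gringorten_px(ranks):
    ranks = np.asarray(ranks, dtype=float)
    return (ranks - 0.44) / (len(ranks) + 0.12)


ranks = np.arange(1, 10)  # nine observations, rank 1 = largest value
p_weibull = weibull_px(ranks)
```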
- static sample_gamma(size, shape=3, scale=1, shape_mode=None, n_min=None, n_max=None)[source]#
Sample Gamma distribution
- Parameters:
size (int) – Sample size
shape (float) – Shape parameter
scale (float) – Scale parameter
shape_mode (str or None) – Shape mode (overwrites shape value). See dict for options
n_min (float or None) – Normalization lower value
n_max (float or None) – Normalization upper value
- Returns:
Sampled array
- Return type:
numpy.ndarray
- static nbins_fd(data)[source]#
This function computes the number of bins for histograms using the Freedman-Diaconis rule, which takes into account the interquartile range (IQR) of the sample, in addition to its range.
- Parameters:
data (numpy.ndarray) – vector of data without nan values
- Returns:
number of bins for histogram using the Freedman-Diaconis rule
- Return type:
int
- static nbins_sturges(data)[source]#
This function computes the number of bins using the Sturges rule, which assumes that the data follows a normal distribution and computes the number of bins from the sample size.
- Parameters:
data (numpy.ndarray) – vector of data without nan values
- Returns:
number of bins using the Sturges rule
- Return type:
int
- static nbins_scott(data)[source]#
This function computes the number of bins using the Scott rule, which is similar to the Freedman-Diaconis rule but uses the standard deviation of the data to compute the bin width.
- Parameters:
data (numpy.ndarray) – vector of data without nan values
- Returns:
number of bins using the Scott rule
- Return type:
int
- static nbins_by_rule(data, rule=None)[source]#
Util function for rule-based nbins computation
- Parameters:
data (numpy.ndarray) – vector of data without nan values
rule (str) – rule code (sturges, fd, scott)
- Returns:
number of bins for histogram
- Return type:
int
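The three binning rules can be sketched from their usual definitions: Sturges uses ceil(log2(n)) + 1, while Freedman-Diaconis and Scott derive a bin width (2 * IQR / n^(1/3) and about 3.49 * std / n^(1/3)) and divide the data range by it. The rounding and the fallback for an unknown rule below are assumptions, not the library's documented behavior.

```python
import numpy as np


def nbins_sturges(data):
    return int(np.ceil(np.log2(len(data))) + 1)


def _nbins_from_width(data, width):
    # hypothetical helper: number of bins needed to cover the data range
    return int(np.ceil((np.max(data) - np.min(data)) / width))


def nbins_fd(data):
    q75, q25 = np.percentile(data, [75, 25])
    return _nbins_from_width(data, 2.0 * (q75 - q25) / len(data) ** (1 / 3))


def nbins_scott(data):
    return _nbins_from_width(data, 3.49 * np.std(data) / len(data) ** (1 / 3))


def nbins_by_rule(data, rule=None):
    rules = {"sturges": nbins_sturges, "fd": nbins_fd, "scott": nbins_scott}
    # assumption: a missing or unknown rule falls back to Sturges
    return rules.get(str(rule).lower(), nbins_sturges)(data)
```

Constant data gives a zero bin width under the Freedman-Diaconis and Scott rules, so a production version would need to guard against that case.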
- static histogram(data, bins=100, rule=None)[source]#
Compute the histogram of the sample
- Parameters:
data (numpy.ndarray) – vector of data without nan values
bins (int) – number of bins
rule (str) – rule to define the number of bins. If None, uses the bins parameter.
- Returns:
dataframe of histogram
- Return type:
pandas.DataFrame
- static qqplot(data)[source]#
Calculate the QQ-plot of data against normal distribution
- Parameters:
data (numpy.ndarray) – vector of data without nan values
- Returns:
dataframe of QQ plot
- Return type:
pandas.DataFrame
- static trace_variance(data)[source]#
Trace the mean variance from sample
- Parameters:
data (numpy.ndarray) – vector of data without nan values
- Returns:
dataframe of accumulated variance
- Return type:
pandas.DataFrame
- static bins_equal(data, n_bins)[source]#
Calculates equally spaced (linear) bins for a 1D NumPy array.
- Parameters:
data (numpy.ndarray) – The 1D NumPy array to analyze.
n_bins (int) – The desired number of bins/intervals.
- Returns:
A NumPy array containing the linear bin boundary values.
- Return type:
numpy.ndarray
- static bins_quantiles(data, n_bins=5)[source]#
Calculates the numerical quantile boundaries (bins) for a 1D NumPy array.
- Parameters:
data (numpy.ndarray) – The 1D NumPy array to analyze.
n_bins (int) – The number of quantile classes.
- Returns:
A NumPy array containing the quantile boundary values.
- Return type:
numpy.ndarray
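Both bin helpers can be sketched with standard NumPy calls. The version below assumes each function returns n_bins + 1 edges that include the minimum and maximum of the data, which the docstrings above suggest but do not state.

```python
import numpy as np


def bins_equal(data, n_bins):
    # equally spaced edges between the data minimum and maximum
    return np.linspace(np.min(data), np.max(data), n_bins + 1)


def bins_quantiles(data, n_bins=5):
    # edges at equally spaced probabilities between 0 and 1
    return np.quantile(data, np.linspace(0.0, 1.0, n_bins + 1))


data = np.arange(0, 101)
edges_equal = bins_equal(data, 5)
edges_quantile = bins_quantiles(data, 4)
```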
- static classify(data, bins=None, n_classes=5, scheme='equal')[source]#
Classifies a 1D NumPy array into 0-indexed bins based on provided bin boundaries.
Values x are assigned to class i if bins[i] < x <= bins[i+1]. Values less than or equal to bins[0] are assigned to class 0. Values greater than bins[-1] are assigned to the last class (len(bins) - 2).
- Parameters:
data (numpy.ndarray) – The 1D NumPy array to classify.
bins (numpy.ndarray) – A 1D NumPy array specifying the sorted bin edges. E.g., [min_val, boundary1, boundary2, …, max_val]. Must contain at least two elements.
- Returns:
A 1D NumPy array of integers representing the 0-indexed class for each element in data.
- Return type:
numpy.ndarray
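The interval rule described above (class i when bins[i] < x <= bins[i+1], with out-of-range values clamped to the first or last class) can be reproduced with numpy.searchsorted. The function name classify_by_bins is hypothetical and covers only the case where bins is given explicitly.

```python
import numpy as np


def classify_by_bins(data, bins):
    data = np.asarray(data, dtype=float)
    bins = np.asarray(bins, dtype=float)
    # side="left" returns j with bins[j-1] < x <= bins[j], so the class is j - 1
    idx = np.searchsorted(bins, data, side="left") - 1
    # clamp values at or below bins[0] and above bins[-1]
    return np.clip(idx, 0, len(bins) - 2)


classes = classify_by_bins([-1, 0, 0.5, 1, 1.5, 3, 4], bins=[0, 1, 2, 3])
```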
- static quantiles_classify(data, n_classes=5)[source]#
Classifies a 1D NumPy array into quantile classes.
- Parameters:
data (numpy.ndarray) – The 1D NumPy array to classify.
n_classes (int) – The number of quantile classes.
- Returns:
A 1D NumPy array containing the quantile class (0 to n_classes-1) for each element.
- Return type:
numpy.ndarray
- class plans.analyst.GeoUnivar(name='MyGeoUnivar', alias='GV0')[source]#
Bases: Univar
- __init__(name='MyGeoUnivar', alias='GV0')[source]#
Initialize the DataSet object.
- Parameters:
name (str) – unique object name
alias (str) – unique object alias. If None, it takes the first and last characters from name
- _set_view_specs()[source]#
Set view specifications. Expected to override the superclass method.
- Returns:
None
- Return type:
None
- class plans.analyst.Bivar(df_data, x_name='x', y_name='y', name='myvars')[source]#
Bases: object
The Bivariate analyst object
# [major docstring]
- fit(model_type='Linear')[source]#
Fit model to bivariate object
- Parameters:
model_type (str) – model type. options: Linear, Power, Power_zero
- Returns:
None
- Return type:
None
- update_model(params_mean, params_sd=None, model_type='Linear')[source]#
Update model based on parameters
- Parameters:
params_mean (list) – list of mean of parameters
params_sd (list) – list of standard deviation of parameters
model_type (str) – model type. options: Linear, Power, Power_zero
- Returns:
None
- Return type:
None
- updata_model_data(model_type='Linear')[source]#
Update only model data output
- Parameters:
model_type (str) – model type. options: Linear, Power, Power_zero
- Returns:
None
- Return type:
None
- view(show=True, folder='C:/sample', filename='view', specs=None, fig_format='jpg', dpi=300)[source]#
Plot basic view of the Bivar object
- Parameters:
show (bool) – Boolean to show instead of saving
folder (str) – output folder
filename (str) – image file name
specs (dict) – specification dictionary
dpi (int) – image resolution (default = 300)
fig_format (str) – image fig_format (ex: png or jpg). Default jpg
- Returns:
None
- Return type:
None
- view_model(model_type='Power', show=True, folder='C:/sample', filename=None, specs=None, dpi=300, fig_format='jpg')[source]#
Plot panel for model analysis
- Parameters:
show (bool) – Boolean to show instead of saving
folder (str) – output folder
filename (str) – image file name
specs (dict) – specification dictionary
dpi (int) – image resolution (default = 300)
fig_format (str) – image fig_format (ex: png or jpg). Default jpg
- Returns:
None
- Return type:
None
- correlation()[source]#
Compute the R correlation coefficient of the data
- Returns:
R correlation coefficient
- Return type:
float
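Assuming R here means the Pearson correlation coefficient between the x and y fields (the docstring does not name the estimator), the computation can be sketched with numpy.corrcoef:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0  # perfectly linear relation

# off-diagonal entry of the 2x2 correlation matrix
r = float(np.corrcoef(x, y)[0, 1])
```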
- prediction_bands(lst_bounds=None, n_sim=100, n_grid=100, n_seed=None, p0=None)[source]#
Run Monte Carlo Simulation to get prediction bands
- Parameters:
lst_bounds (list) – list of prediction bounds [min, max]. If None, 3x the data range is used
n_sim (int) – number of simulation runs
n_grid (int) – number of prediction intervals
n_seed (int) – number of random seed for reproducibility. Default = None
p0 (list) – list of initial values to search. Default: None
- Returns:
dictionary with result dataframes
- Return type:
dict
- static bias(pred, obs)[source]#
Compute the Bias between predicted and observed samples
- Parameters:
pred (numpy.ndarray) – vector of predicted values
obs (numpy.ndarray) – vector of observed values
- Returns:
Bias value
- Return type:
float
- static rmse(pred, obs)[source]#
Compute the RMSE metric between predicted and observed samples
- Parameters:
pred (numpy.ndarray) – vector of predicted values
obs (numpy.ndarray) – vector of observed values
- Returns:
RMSE value
- Return type:
float
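Both metrics reduce to simple statistics of the residuals. The sketch below assumes bias is the mean of pred - obs (so a positive value means over-prediction) and RMSE is the square root of the mean squared residual; the sign convention used by the library is not stated on this page.

```python
import numpy as np


def bias(pred, obs):
    # assumption: residual defined as predicted minus observed
    return float(np.mean(np.asarray(pred) - np.asarray(obs)))


def rmse(pred, obs):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))


pred = np.array([2.0, 4.0, 6.0])
obs = np.array([1.0, 4.0, 8.0])
b = bias(pred, obs)   # residuals are 1, 0 and -2
e = rmse(pred, obs)   # squared residuals are 1, 0 and 4
```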
- class plans.analyst.Bayes(df_hypotheses, name='myBayes', nomenclature=None, gridsize=100)[source]#
Bases: object
The Bayes Theorem Analyst Object
- __init__(df_hypotheses, name='myBayes', nomenclature=None, gridsize=100)[source]#
Deploy the Bayes Analyst
- Parameters:
df_hypotheses (pandas.DataFrame) – dataframe listing all model hypotheses. Must contain the fields Name (of parameter), Min and Max.
name (str) – name of analyst
nomenclature (dict) – dictionary for renaming nomenclature
gridsize (int) – grid resolution in histograms (bins)
- _reset_nomenclature(dct_names)[source]#
Reset nomenclature
- Parameters:
dct_names (dict) – dictionary to rename nomenclatures
- Returns:
None
- Return type:
None
- _insert_new_step()[source]#
Convenience void function for inserting new step objects
- Returns:
None
- Return type:
None
- _accumulate(n_step)[source]#
Convenience void function for accumulating probability
- Parameters:
n_step (int) – step number to accumulate
- Returns:
None
- Return type:
None
- conditionalize(dct_evidence, s_varfield='E', s_weightfield='W')[source]#
Conditionalization procedure of the Bayes Theorem
- Parameters:
dct_evidence (dict) – dictionary of evidence dataframes
s_varfield (str) – name of variable field in evidence dataframes
s_weightfield (str) – name of weights field in evidence dataframes
- Returns:
None
- Return type:
None
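At its core, conditionalization applies Bayes' theorem on a discrete grid of hypotheses: the posterior is the prior multiplied by the likelihood of the evidence, renormalized to sum to one. The sketch below shows only that update step; the function name bayes_update and the array-based inputs are hypothetical and do not reflect how the Bayes class stores its dataframes.

```python
import numpy as np


def bayes_update(prior, likelihood):
    # posterior is proportional to prior * likelihood
    unnormalized = np.asarray(prior, dtype=float) * np.asarray(likelihood, dtype=float)
    return unnormalized / unnormalized.sum()


prior = np.full(4, 0.25)                     # uniform prior over four grid cells
likelihood = np.array([0.1, 0.2, 0.3, 0.4])  # P(evidence | hypothesis) per cell
posterior = bayes_update(prior, likelihood)
```

With a uniform prior the posterior is simply the normalized likelihood; repeating the update with the posterior as the next prior accumulates evidence step by step.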
- plot_step(n_step, folder='C:/sample', filename='bayes', specs=None, dpi=300, show=False)[source]#
Void function for plotting the panel of a conditionalization step
- Parameters:
n_step (int) – step number
folder (str) – export folder
filename (str) – file name
specs (dict) – plot specs dictionary
dpi (int) – plot resolution
show (bool) – control to show plot instead of saving
- Returns:
None
- Return type:
None