plans.datasets.core

Primitive classes for handling datasets.

Overview

# todo [docstring] – overview

Example

# todo [docstring] – examples

import numpy as np
print("Hello World!")

Functions

dataframe_prepro(dataframe)

Utility function for dataframe pre-processing.

get_colors([size, cmap, randomize])

Utility function to get a list of random colors.

Classes

QualiHard([name])

A Quali-Hard is a hard-coded qualitative map (that is, the table is pre-set).

QualiRaster([name, dtype])

Basic qualitative raster map dataset.

QualiRasterCollection(name)

The raster collection base dataset.

QualiRasterSeries(name, varname, varalias[, ...])

A RasterSeries where date matters and all maps in the collection are expected to be QualiRaster objects with the same variable, same projection and same grid.

Raster([name, alias, dtype])

The basic Raster map dataset.

RasterCollection([name])

The raster collection base dataset.

RasterSeries(name, varname, varalias, units)

A RasterCollection where datetime matters and all maps in the collection are expected to have the same variable, same projection and same grid.

SciRaster([name, alias])

TimeSeries([name, alias])

TimeSeriesCluster([name, base_object])

The TimeSeriesCluster instance is designed for holding a collection of same variable time series.

TimeSeriesCollection([name, base_object])

A collection of time series objects with associated metadata.

TimeSeriesSamples([name, base_object])

The TimeSeriesSamples instance is designed for holding a collection of same variable time series arising from the same underlying process.

TimeSeriesSpatialSamples([name, base_object])

The TimeSeriesSpatialSamples instance is designed for holding a collection of same variable time series arising from the same underlying process in space.

Zones([name])

The Zones map dataset is a QualiRaster designed to handle a large volume of positive integers (zone ids).

plans.datasets.core.dataframe_prepro(dataframe)[source]#

Utility function for dataframe pre-processing.

Parameters:

dataframe (pandas.DataFrame) – incoming dataframe

Returns:

prepared dataframe

Return type:

pandas.DataFrame

plans.datasets.core.get_colors(size=10, cmap='tab20', randomize=True)[source]#

Utility function to get a list of random colors.

Parameters:
  • size (int) – Size of list of colors

  • cmap (str) – Name of matplotlib color map (cmap)

Returns:

list of random colors

Return type:

list

class plans.datasets.core.TimeSeries(name='MyTimeSeries', alias='TS0')[source]#

Bases: Univar

__init__(name='MyTimeSeries', alias='TS0')[source]#

Initialize the DataSet object.

Parameters:
  • name (str) – unique object name

  • alias (str) – unique object alias. If None, it takes the first and last characters from name

_set_fields()[source]#

Set catalog fields names. Expected to increment superior methods.

_set_frequency()[source]#

Guess the datetime resolution of a time series based on the consistency of timestamp components (e.g., seconds, minutes).

Caution

This method infers the datetime frequency of the time series data based on the consistency of timestamp components.
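One possible reading of "consistency of timestamp components" is to look for the finest component that actually varies across records. The sketch below illustrates that idea only. `guess_resolution` is a hypothetical helper, not part of plans, and the real method may apply different rules or return different labels.

```python
from datetime import datetime

def guess_resolution(timestamps):
    """Return the finest datetime component that varies across the records.

    Hypothetical helper for illustration only; not the plans implementation.
    """
    # Check components from finest to coarsest.
    for component in ("second", "minute", "hour", "day", "month"):
        values = {getattr(ts, component) for ts in timestamps}
        if len(values) > 1:
            return component
    return "year"

daily = [datetime(2020, 1, d, 12, 0, 0) for d in range(1, 6)]
hourly = [datetime(2020, 1, 1, h, 0, 0) for h in range(0, 5)]
print(guess_resolution(daily))   # day
print(guess_resolution(hourly))  # hour
```

A heuristic like this can be fooled by sparse or irregular records, which is presumably why the method is flagged with a caution.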

get_metadata()[source]#

Get a dictionary with object metadata. Expected to increment superior methods.

Note

Metadata does not necessarily include all object attributes.

Returns:

dictionary with all metadata

Return type:

dict

update()[source]#

Update internal attributes based on the current data.

Notes

  • Calls the _set_frequency method to update the datetime frequency attribute.

  • Updates the start attribute with the minimum datetime value in the data.

  • Updates the end attribute with the maximum datetime value in the data.

  • Updates the var_min attribute with the minimum value of the variable field in the data.

  • Updates the var_max attribute with the maximum value of the variable field in the data.

  • Updates the data_size attribute with the length of the data.

setter(dict_setter, load_data=True)[source]#

Set selected attributes based on an incoming dictionary. Expected to increment superior methods.

Parameters:
  • dict_setter (dict) – incoming dictionary with attribute values

  • load_data (bool) – option for loading data from incoming file. Default is True.

set_data(input_df, input_dtfield, input_varfield, filter_dates=None, dropnan=True)[source]#

Set time series data from an input DataFrame.

Parameters:
  • input_df (pandas.DataFrame) – Input DataFrame containing time series data.

  • input_dtfield (str) – Name of the datetime field in the input DataFrame.

  • input_varfield (str) – Name of the variable field in the input DataFrame.

  • filter_dates (list) – List of [Start, End] used for filtering the date range. Default is None.

  • dropnan (bool) – If True, drop NaN values from the DataFrame. Default is True.

Notes

  • Assumes the input DataFrame has a datetime column in the format “YYYY-mm-DD HH:MM:SS”.

  • Renames columns to standard format (datetime: self.dtfield, variable: self.varfield).

  • Converts the datetime column to standard format.
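The steps in these notes can be sketched with plain pandas. This is a minimal illustration, not the plans implementation: the incoming column names (`Date`, `Flow`) and the standard names (`datetime`, `value`) are assumptions standing in for `self.dtfield` and `self.varfield`.

```python
import pandas as pd

input_df = pd.DataFrame({
    "Date": ["2020-01-01 00:00:00", "2020-01-02 00:00:00", "2020-01-03 00:00:00"],
    "Flow": [1.5, None, 3.0],
})

# Rename incoming columns to standard names (assumed names, see above).
df = input_df.rename(columns={"Date": "datetime", "Flow": "value"})

# Convert the datetime column from the expected text format.
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%d %H:%M:%S")

# Optional steps: drop NaN records and keep only a [start, end] date range.
df = df.dropna(subset=["value"])
start, end = pd.to_datetime(["2020-01-01 00:00:00", "2020-01-02 00:00:00"])
df = df[(df["datetime"] >= start) & (df["datetime"] <= end)]
print(len(df))  # 1
```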

load_data(file_data, input_dtfield=None, input_varfield=None, in_sep=';', filter_dates=None)[source]#

Load data from file. Expected to overwrite superior methods.

Parameters:
  • file_data (str) – Absolute path to the CSV input file.

  • input_varfield (str) – Name of the incoming varfield.

  • input_dtfield (str) – Name of the incoming datetime field. Default is “datetime”.

  • in_sep (str) – String separator. Default is ;.

  • filter_dates (list) – List of start and end dates to filter. Default is None.

Notes

  • Assumes the input file is in CSV format.

  • Expects a datetime column in the format YYYY-mm-DD HH:MM:SS.

cut_edges(inplace=False)[source]#

Cut off initial and final NaN records in a given time series.

Parameters:

inplace (bool) – If True, the operation will be performed in-place, and the original data will be modified. If False, a new DataFrame with cut edges will be returned, and the original data will remain unchanged. Default is False.

Returns:

If inplace is False, a new DataFrame with cut edges. If inplace is True, returns None, and the original data is modified in-place.

Return type:

pandas.DataFrame or None

Notes

  • This function removes leading and trailing rows with NaN values in the specified variable field.

  • When inplace is False, the operation is performed on a copy of the data, and the original data remains unchanged.
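Trimming NaN edges can be sketched with pandas as follows. `cut_nan_edges` is a hypothetical stand-alone function and the column names are illustrative. It mirrors the inplace=False behaviour by returning a copy.

```python
import numpy as np
import pandas as pd

def cut_nan_edges(df, varfield):
    """Drop leading and trailing rows where `varfield` is NaN (returns a copy)."""
    first = df[varfield].first_valid_index()
    last = df[varfield].last_valid_index()
    if first is None:
        # The column holds no valid values at all.
        return df.iloc[0:0].copy()
    return df.loc[first:last].copy()

df = pd.DataFrame({
    "datetime": pd.date_range("2020-01-01", periods=6, freq="D"),
    "value": [np.nan, 1.0, np.nan, 3.0, 4.0, np.nan],
})
trimmed = cut_nan_edges(df, "value")
print(len(trimmed))  # 4: the interior NaN is kept, the edges are dropped
```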

standardize()[source]#

Standardize the data based on regular datetime steps and the time resolution.

Notes

  • Creates a full date range with the expected frequency for the standardization period.

  • Groups the data by epochs (based on the frequency and datetime field), applies the specified aggregation function, and fills in missing values with left merges.

  • Updates internal attributes, including self.isstandard to indicate that the data has been standardized.

Warning

The standardize method modifies the internal data representation. Ensure to review the data after standardization.
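The standardization steps can be sketched with plain pandas. This is an illustration under stated assumptions, not the plans implementation: the daily frequency, the `mean` aggregation, and the column names `datetime`, `value` and `step` are all chosen for the example.

```python
import pandas as pd

# Irregular daily records: 2020-01-03 is missing and 2020-01-02 appears twice.
raw = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2020-01-01 08:00:00",
        "2020-01-02 09:30:00",
        "2020-01-02 18:00:00",
        "2020-01-04 07:15:00",
    ]),
    "value": [1.0, 2.0, 4.0, 8.0],
})

# 1. Snap each record to the start of its day and aggregate duplicates.
raw["step"] = raw["datetime"].dt.floor("D")
grouped = raw.groupby("step", as_index=False)["value"].mean()

# 2. Build the full, regular range of expected time steps.
full = pd.DataFrame({
    "step": pd.date_range(grouped["step"].min(), grouped["step"].max(), freq="D")
})

# 3. Left-merge so that missing steps show up as NaN.
standard = full.merge(grouped, on="step", how="left")
print(standard)
```

The result has one row per day. The duplicated day is averaged and the missing day appears as NaN, which later steps such as gap interpolation can fill.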

clear_outliers(inplace=False)[source]#

Clears outlier values from the specified variable field in the DataFrame.

Parameters:

inplace (bool) – If True, the operation is performed in-place, modifying the DataFrame directly. If False, a new DataFrame with outliers removed is returned. Default is False.

Returns:

None if inplace is True, otherwise the DataFrame with outliers cleared.

Return type:

pandas.DataFrame or None

get_epochs(inplace=False)[source]#

Get Epochs (periods) for continuous time series (0 = gap epoch).

Parameters:

inplace (bool) – Option to set Epochs inplace. Default is False.

Returns:

A DataFrame if inplace is False, otherwise None.

Return type:

pandas.DataFrame or None

Notes

This function labels continuous chunks of data as Epochs, with Epoch 0 representing gaps in the time series.
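The labelling rule can be sketched in pure Python. `label_epochs` is a hypothetical helper that treats NaN as a gap. The plans method works on the standardized DataFrame and may differ in detail.

```python
import math

def label_epochs(values):
    """Label each run of consecutive valid values 1, 2, 3, ...; gaps (NaN) get 0."""
    labels = []
    epoch = 0
    in_gap = True  # treat the position before the series as a gap
    for v in values:
        if math.isnan(v):
            labels.append(0)
            in_gap = True
        else:
            if in_gap:
                epoch += 1  # a new continuous chunk starts here
                in_gap = False
            labels.append(epoch)
    return labels

nan = float("nan")
print(label_epochs([1.0, 2.0, nan, nan, 5.0, nan, 7.0, 8.0]))
# [1, 1, 0, 0, 2, 0, 3, 3]
```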

update_epochs_stats()[source]#

Update all epochs statistics.

Notes

  • This function updates statistics for all epochs in the time series.

  • Ensures that the data is standardized by calling the standardize method if it’s not already standardized.

  • Removes epoch 0 from the statistics, since it represents gaps in the time series rather than valid data.

  • Groups the data by Epoch_Id and calculates statistics such as count, start, and end timestamps for each epoch.

  • Generates random colors for each epoch using the get_colors function with a specified colormap (cmap attribute).

  • Includes the time series name in the statistics for identification.

  • Organizes the statistics DataFrame to include relevant columns: Name, Epoch_Id, Count, Start, End, and Color.

  • Updates the attribute epochs_n with the number of epochs in the statistics.

interpolate_gaps(method='linear', constant=0, inplace=False)[source]#

Fills gaps in a time series using various interpolation methods.

Parameters:
  • method (str) – Specifies the interpolation method. The default value is linear.

  • constant (float) – The constant value used when the constant method is selected. Default value = 0.

  • inplace (bool) – If True, modifies the original DataFrame in-place. Default value = False.

Returns:

A new pandas.DataFrame with interpolated values if inplace is False, otherwise None.

Return type:

pandas.DataFrame or None

Notes

This function handles time series data, standardizing it if necessary before performing interpolation. The process is applied to each unique epoch within the series.

  • linear: linear interpolation

  • nearest: uses the value of the closest data point.

  • zero: fills gaps with zeros.

  • constant: fills gaps with the value provided in the constant parameter

  • slinear: first order spline interpolation

  • quadratic: second order spline interpolation

  • cubic: third order spline interpolation
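Two of the methods listed above can be illustrated with plain pandas. This sketch is not the plans implementation. It ignores epochs and uses `Series.interpolate` and `Series.fillna` directly. The spline-based options (slinear, quadratic, cubic) rely on SciPy in pandas and are left out here.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [1.0, np.nan, np.nan, 4.0, np.nan, 6.0],
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)

linear = s.interpolate(method="linear")  # straight line between neighbours
constant = s.fillna(0.0)                 # gaps replaced by a fixed value

print(linear.tolist())    # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(constant.tolist())  # [1.0, 0.0, 0.0, 4.0, 0.0, 6.0]
```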

aggregate(freq, bad_max, agg_funcs=None)[source]#

Aggregate the time series data based on a specified frequency using various aggregation functions.

Parameters:
  • freq (str) – Pandas-like alias frequency at which to aggregate the time series data.

  • bad_max (int) – The maximum number of Bad records allowed in a time window for aggregation. Time windows with more Bad entries are excluded from the aggregated result. Default is 7.

  • agg_funcs (dict) – A dictionary specifying customized aggregation functions for each variable. Default is None, which uses standard aggregation functions (sum, mean, median, min, max, std, var, percentiles).

Returns:

A new pandas.DataFrame with aggregated values based on the specified frequency.

Return type:

pandas.DataFrame

Notes

Resamples the time series data to the specified frequency using Pandas-like alias strings. Aggregates the values using the specified aggregation functions. Counts the number of Bad records in each time window and excludes time windows with more Bad entries than the specified threshold.

Common options include:
  • h for hourly frequency

  • D for daily frequency

  • W for weekly frequency

  • MS for monthly/start frequency

  • QS for quarterly/start frequency

  • YS for yearly/start frequency

More options and details can be found in the Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases.
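The windowing logic can be sketched with pandas resampling. This is a minimal illustration, not the plans implementation. It assumes that a Bad record means a NaN value and uses only `sum` and `mean` as aggregation functions.

```python
import numpy as np
import pandas as pd

# Eight records at a 12-hour step, i.e. two records per day over four days.
s = pd.Series(
    [1.0, 2.0, np.nan, 4.0, np.nan, np.nan, np.nan, 8.0],
    index=pd.date_range("2020-01-01", periods=8, freq=pd.Timedelta(hours=12)),
)

bad_max = 1
resampler = s.resample("D")
agg = pd.DataFrame({
    "sum": resampler.sum(),
    "mean": resampler.mean(),
    # Count NaN records per daily window (assumed meaning of "Bad").
    "bad": s.isna().resample("D").sum(),
})

# Drop windows holding more bad records than allowed.
agg = agg[agg["bad"] <= bad_max]
print(agg)
```

The third day holds two NaN records, which exceeds `bad_max`, so only three daily windows remain.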

upscale(freq, bad_max, inplace=True)[source]#

Upscale time series for larger time steps. This method uses the agg attribute. See the aggregate method.

Parameters:
  • freq (str) – Pandas-like alias frequency at which to aggregate the time series data. Common options include h (hourly), D (daily), W (weekly), MS (month start), QS (quarter start), and YS (year start). More options and details can be found in the Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases.

  • bad_max (int) – The maximum number of Bad records allowed in a time window for aggregation. Time windows with more Bad entries are excluded from the aggregated result.

  • inplace (bool) – Option to overwrite the data. Default is True.

downscale(freq)[source]#

Downscale time series for smaller time steps using linear interpolation.

Parameters:

freq (str) – new time step frequency

Returns:

Dataframe of downscaled data

Return type:

pandas.DataFrame
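Downscaling by linear interpolation can be sketched with pandas by reindexing onto a finer regular grid and filling the new steps. This is an illustration with made-up daily values and a 12-hour target step, not the plans implementation.

```python
import pandas as pd

daily = pd.Series(
    [0.0, 2.0, 4.0],
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)

# Target grid with a 12-hour step spanning the same period.
new_index = pd.date_range(
    daily.index[0], daily.index[-1], freq=pd.Timedelta(hours=12)
)

# Reindex onto the finer grid (new steps become NaN), then fill linearly.
downscaled = daily.reindex(new_index).interpolate(method="linear")
print(downscaled.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
```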

assess_extreme_values(eva_freq='YS', eva_agg='max')[source]#

Run Extreme Values Analysis (EVA) over the time series and set the eva attribute.

Parameters:
  • eva_freq (str) – standard pandas frequency alias for upscaling data

  • eva_agg (str) – standard pandas aggregation alias for upscaling (expected: max or min)

view_epochs(show=True)[source]#

Get a basic visualization. Expected to overwrite superior methods.

Parameters:

show (bool) – option for showing instead of saving.

Notes

  • Uses values in the view_specs attribute for plotting.

_set_view_specs()[source]#

Set view specifications.

view(show=True, return_fig=False, minimal_dates=False, include_eva=False)[source]#

Get a basic visualization.

Parameters:
  • show (bool) – option for showing instead of saving.

  • return_fig (bool) – option for returning the figure object itself.

  • minimal_dates (bool) – option for setting minimal dates layout in x-axis.

Note

Uses values in the view_specs attribute for plotting.

static add_hour(df, hour=12, dt_field='datetime')[source]#

class plans.datasets.core.TimeSeriesCollection(name='myTSCollection', base_object=None)[source]#

Bases: Collection

A collection of time series objects with associated metadata.

The TimeSeriesCollection class, or simply TSC, extends the Collection class and is designed to handle time series data. It can hold miscellaneous datasets.

Note

See TimeSeriesCluster for managing time series of the same variable.

__init__(name='myTSCollection', base_object=None)[source]#

Deploy the time series collection data structure.

Parameters:
  • name (str) – Name of the time series collection. Default is “myTSCollection”.

  • base_object (TimeSeries or None) – Base object for the time series collection. If None, a default TimeSeries object is created. Default is None.

Notes

  • If base_object is not provided, a default TimeSeries object is created.

update(details=False)[source]#

Update the time series collection.

Parameters:

details (bool) – If True, update additional details. Default is False.

load_data(table_file, filter_dates=None)[source]#

Load data from table file (information table) into the time series collection.

Parameters:

table_file (str) – Path to file. Expected to be a CSV table.

Required columns:

  • Id: int, required. Unique number id.

  • Name: str, required. Simple name.

  • Alias: str, required. Short nickname.

  • X: float, required. Longitude in WGS 84 Datum (EPSG:4326).

  • Y: float, required. Latitude in WGS 84 Datum (EPSG:4326).

  • Code: str, required.

  • Source: str, required.

  • Description: str, required.

  • Color: str, required.

  • Units or <Varname>_Units: str, required. Units of data.

  • VarField or <Varname>_VarField: str, required. Variable column in data file.

  • DtField or <Varname>_DtField: str, required. Date-time column in data file.

  • File or <Varname>_File: str, required. Name or path to data time series CSV file.
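The required columns listed above can be checked before a table is handed to the collection. The sketch below uses only the standard library. The table content is hypothetical, and the `;` separator is an assumption, since the separator expected for the information table is not stated here.

```python
import csv
import io

REQUIRED = [
    "Id", "Name", "Alias", "X", "Y", "Code", "Source",
    "Description", "Color", "Units", "VarField", "DtField", "File",
]

# Hypothetical table content; the ';' separator is an assumption.
table_text = (
    "Id;Name;Alias;X;Y;Code;Source;Description;Color;Units;VarField;DtField;File\n"
    "1;Station A;StA;-51.2;-30.0;A001;Agency;Rain gauge;blue;mm;P;Date;station_a.csv\n"
)

reader = csv.DictReader(io.StringIO(table_text), delimiter=";")
rows = list(reader)

# Report any required column that is absent from the header.
missing = [c for c in REQUIRED if c not in reader.fieldnames]
print(missing)          # []
print(rows[0]["File"])  # station_a.csv
```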

set_data(df_info, src_dir=None, filter_dates=None)[source]#

Set data for the time series collection from an info pandas.DataFrame.

Parameters:
  • df_info (pandas.DataFrame) – DataFrame containing metadata information for the time series collection. This DataFrame is expected to have fields matching the metadata keys.

  • src_dir (str) – Path to the input directory, used when the File column holds only file names.

  • filter_dates (list) – List of start and end dates used to filter the data.

Notes

  • The set_data method populates the time series collection with data based on the provided DataFrame.

  • It creates time series objects, loads data, and performs additional processing steps.

  • Adjust skip_process according to your data processing needs.

Examples

ts_collection.set_data(df, "path/to/data", filter_dates=["2020-01-01 00:00:00", "2020-03-12 00:00:00"])

clear_outliers()[source]#

Clear outliers in the collection based on the datarange_min and datarange_max attributes.

merge_data()[source]#

Merge data from multiple sources into a single DataFrame.

Returns:

A merged DataFrame with datetime and variable fields from different sources.

Return type:

pandas.DataFrame

Notes

  • Updates the catalog details.

  • Merges data from different sources based on the specified datetime field and variable field.

  • The merged DataFrame includes a date range covering the entire period.
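The merge described in these notes can be sketched with pandas. This is an illustration, not the plans implementation: the column names (`P_StA`, `P_StB`) and the daily frequency are hypothetical.

```python
import pandas as pd

a = pd.DataFrame({
    "datetime": pd.date_range("2020-01-01", periods=3, freq="D"),
    "P_StA": [1.0, 2.0, 3.0],
})
b = pd.DataFrame({
    "datetime": pd.date_range("2020-01-03", periods=3, freq="D"),
    "P_StB": [10.0, 20.0, 30.0],
})

# Date range covering the entire period of all sources.
start = min(a["datetime"].min(), b["datetime"].min())
end = max(a["datetime"].max(), b["datetime"].max())
merged = pd.DataFrame({"datetime": pd.date_range(start, end, freq="D")})

# Left-merge each source onto the full range; uncovered steps stay NaN.
for df in (a, b):
    merged = merged.merge(df, on="datetime", how="left")

print(merged.shape)  # (5, 3)
```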

standardize()[source]#

Standardize the time series data.

This method standardizes all time series objects in the collection.

Notes

  • The method iterates through each time series in the collection and standardizes it.

  • After standardizing individual time series, the data is merged.

  • The merged data is then reset for each individual time series in the collection.

  • Epoch statistics are updated for each time series after the reset.

  • Finally, the collection catalog is updated with details.

merge_local_epochs()[source]#

Merge local epochs statistics from individual time series within the collection.

Returns:

Merged pandas.DataFrame containing epochs statistics.

Return type:

pandas.DataFrame

Notes

  • This method creates an empty list to store individual epochs statistics dataframes.

  • It iterates through each time series in the collection.

  • For each time series, it updates the local epochs statistics using the update_epochs_stats method.

  • The local epochs statistics dataframes are then appended to the list.

  • Finally, the list of dataframes is concatenated into a single dataframe.

Examples

ts_collection = TimeSeriesCollection()
merged_epochs = ts_collection.merge_local_epochs()

get_epochs()[source]#

Calculate epochs for the time series data.

Returns:

DataFrame with epochs information.

Return type:

pandas.DataFrame

Notes

  • This method merges the time series data to create a working DataFrame.

  • It creates a copy of the DataFrame for NaN-value calculation.

  • Converts non-NaN values to 1 and NaN values to 0.

  • Calculates the sum of non-NaN values for each row and updates the DataFrame with the result.

  • Extracts relevant columns for epoch calculation.

  • Sets 0 values in the specified overfield column to NaN.

  • Creates a new TimeSeries instance for epoch calculation using the overfield values.

  • Calculates epochs using the get_epochs method of the new TimeSeries instance.

  • Updates the original DataFrame with the calculated epochs.

Examples

epochs_data = ts_instance.get_epochs()

_set_view_specs()[source]#

view(show=True, folder='./output', filename=None, dpi=300, fig_format='jpg', suff='', usealias=False)[source]#

Visualize the time series collection.

Parameters:
  • show (bool) – If True, the plot will be displayed interactively. If False, the plot will be saved to a file. Default is True.

  • folder (str) – The folder where the plot file will be saved. Used only if show is False. Default is “./output”.

  • filename (str or None) – The base name of the plot file. Used only if show is False. If None, a default filename is generated. Default is None.

  • dpi (int) – The dots per inch (resolution) of the plot file. Used only if show is False. Default is 300.

  • fig_format (str) – The format of the plot file. Used only if show is False. Default is “jpg”.

  • usealias (bool) – Option for using the Alias instead of Name in the plot. Default is False.

Notes

This function generates a scatter plot with colored epochs based on the epochs’ start and end times. The plot includes data points within each epoch, and each epoch is labeled with its corresponding ID.

export_views(folder, dpi=300, fig_format='jpg', suff='', skip_main=False, raw=False)[source]#

Export views of time series data and individual time series within the collection.

Parameters:
  • folder (str) – The folder path where the views will be exported.

  • dpi (int) – Dots per inch (resolution) for the exported images. Default is 300.

  • fig_format (str) – Format for the exported figures. Default is “jpg”.

  • suff (str) – Suffix to be appended to the exported file names. Default is an empty string.

  • skip_main (bool) – Option for skipping the main plot (panel). Default is False.

  • raw (bool) – Option for considering a raw data series, with no epochs analysis. Default is False.

Notes

  • Updates the collection details and epoch statistics.

  • Calls the view method for the entire collection and individual time series with specified parameters.

  • Sets view specifications for individual time series, such as y-axis limits and time range.

Examples

tscoll.export_views(folder="/path/to/export", dpi=300, fig_format="jpg", suff="_views")

export_data(folder, filename=None, merged=True)[source]#

class plans.datasets.core.TimeSeriesCluster(name='myTimeSeriesCluster', base_object=None)[source]#

Bases: TimeSeriesCollection

The TimeSeriesCluster instance is designed for holding a collection of same variable time series. That is, no miscellaneous data is allowed.

__init__(name='myTimeSeriesCluster', base_object=None)[source]#

Deploy the time series collection data structure.

Parameters:
  • name (str) – Name of the time series collection. Default is “myTimeSeriesCluster”.

  • base_object (TimeSeries or None) – Base object for the time series collection. If None, a default TimeSeries object is created. Default is None.

Notes

  • If base_object is not provided, a default TimeSeries object is created.

class plans.datasets.core.TimeSeriesSamples(name='myTimeSeriesSamples', base_object=None)[source]#

Bases: TimeSeriesCluster

The TimeSeriesSamples instance is designed for holding a collection of same variable time series arising from the same underlying process. This means that all elements in the collection are statistical data.

This instance allows for the reducer() method.

__init__(name='myTimeSeriesSamples', base_object=None)[source]#

Deploy the time series collection data structure.

Parameters:
  • name (str) – Name of the time series collection. Default is “myTimeSeriesSamples”.

  • base_object (TimeSeries or None) – Base object for the time series collection. If None, a default TimeSeries object is created. Default is None.

Notes

  • If base_object is not provided, a default TimeSeries object is created.

reducer(reducer_funcs=None, stepwise=False)[source]#

mean()[source]#

rng()[source]#

std()[source]#

min()[source]#

max()[source]#

percentile(p=90)[source]#

percentiles(values=None)[source]#

stats(basic=False)[source]#

class plans.datasets.core.TimeSeriesSpatialSamples(name='myTimeSeriesSpatialSample', base_object=None)[source]#

Bases: TimeSeriesSamples

The TimeSeriesSpatialSamples instance is designed for holding a collection of same variable time series arising from the same underlying process in space. This means that all elements in the collection are statistical data in space.

This instance allows for the regionalize() method.

__init__(name='myTimeSeriesSpatialSample', base_object=None)[source]#

Deploy the time series collection data structure.

Parameters:
  • name (str) – Name of the time series collection. Default is “myTimeSeriesSpatialSample”.

  • base_object (TimeSeries or None) – Base object for the time series collection. If None, a default TimeSeries object is created. Default is None.

Notes

  • If base_object is not provided, a default TimeSeries object is created.

get_weights_by_name(name, method='average')[source]#

regionalize(method='average')[source]#

Regionalize the time series data using a specified method.

Parameters:

method (str) – Method for regionalization. Default is “average”.

Notes

  • This method handles standardization. If the time series data is not standardized, it applies standardization.

  • Computes epochs for the time series data.

  • Iterates through each time series in the collection and performs regionalization.

  • For each time series, sets up source and destination vectors, computes weights, and calculates regionalized values.

  • Updates the destination column in-place with the regionalized values and updates epochs statistics.

  • Updates the collection catalog with details.

class plans.datasets.core.Raster(name='myRasterMap', alias='Rst', dtype='float32')[source]#

Bases: DataSet

The basic Raster map dataset.

__init__(name='myRasterMap', alias='Rst', dtype='float32')[source]#

Deploy a basic raster map object.

Parameters:
  • name (str) – Map name, defaults to “myRasterMap”

  • dtype (str) – Data type of raster cells. Options: byte, uint8, int16, int32, float32, etc., defaults to “float32”

_set_fields()[source]#

Set fields names. Expected to increment superior methods.

_set_view_specs()[source]#

Set default view specs.

get_metadata()[source]#

Get a dictionary with object metadata. Expected to increment superior methods.

Note

Metadata does not necessarily include all object attributes.

Returns:

dictionary with all metadata

Return type:

dict

update()[source]#

Refresh all mutable attributes based on data (including paths). Expected to be incremented downstream.

set_data(grid)[source]#

Set the data grid for the raster object. This function allows setting the data grid for the raster object. The incoming grid should be a NumPy array.

Parameters:

grid (numpy.ndarray) – The data grid to be set for the raster.

Notes

  • The function overwrites the existing data grid in the raster object with the incoming grid, ensuring that the data type matches the raster’s dtype.

  • Nodata values are masked after setting the grid.

set_raster_metadata(metadata)[source]#

Set metadata for the raster object based on incoming metadata. This function allows setting metadata for the raster object from an incoming metadata dictionary. The metadata should include information such as the number of columns, number of rows, corner coordinates, cell size, and nodata value.

Parameters:

metadata (dict) – A dictionary containing metadata for the raster.

load_data(file_data, file_prj=None, id_band=1)[source]#

Load data and metadata from files to the Raster object.

Parameters:
  • file_data (str) – The path to the raster file.

  • file_prj (str) – The path to the ‘.prj’ projection file. If not provided, an attempt is made to use the same path and name as the .asc file with the ‘.prj’ extension.

  • id_band (int) – Band id to read for GeoTIFF. Default value = 1

load_metadata(file_data)[source]#

Load only metadata from files to the raster object.

Parameters:

file_data (str) – The path to the raster file.

load_image(file_input, xxl=False)[source]#

Load data from a ‘.tif’ image raster file.

Parameters:
  • file_input (str) – The file path of the ‘.tif’ raster file.

  • xxl (bool) – option flag for very large images

Notes

  • The function uses the Pillow (PIL) library to open the ‘.tif’ file and converts it to a NumPy array.

  • Metadata may need to be provided separately, as this function focuses on loading raster data.

  • The loaded data grid is set using the set_data method of the raster object.

load_tif(file_input, id_band=1)[source]#

Load data and metadata from .tif raster file.

Parameters:

file_input (str) – The file path to the .tif raster file.

load_tif_metadata(file_input, id_band=1)[source]#

Load only metadata from .tif raster file.

Parameters:

file_input (str) – The file path to the .tif raster file.

load_asc(file_input)[source]#

Load data and metadata from .asc raster file.

Parameters:

file_input (str) – The file path to the .asc raster file.

load_asc_metadata(file_input)[source]#

Load only metadata from .asc raster files.

Parameters:

file_input (str) – The file path to the .asc raster file.
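Reading only the metadata of an .asc file amounts to parsing its header lines. The sketch below assumes the common ESRI ASCII grid layout with six header lines and the key names `ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize` and `NODATA_value`. `read_asc_header` is a hypothetical helper, and the keys actually stored by plans may differ.

```python
import io

# Hypothetical .asc content following the common ESRI ASCII grid layout.
asc_text = """ncols 3
nrows 2
xllcorner 500000.0
yllcorner 6700000.0
cellsize 30.0
NODATA_value -9999
1 2 -9999
4 -9999 6
"""

def read_asc_header(stream, n_header=6):
    """Read the first `n_header` lines of an .asc stream into a dict."""
    meta = {}
    for _ in range(n_header):
        key, value = stream.readline().split()
        # Counts are integers; everything else is parsed as float.
        meta[key] = int(value) if key in ("ncols", "nrows") else float(value)
    return meta

meta = read_asc_header(io.StringIO(asc_text))
print(meta["ncols"], meta["nrows"], meta["cellsize"])  # 3 2 30.0
```

Because only the header is read, the grid cells are never parsed, which keeps a metadata-only load cheap for large files.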

load_prj(file_input)[source]#

Load ‘.prj’ auxiliary file to the ‘prj’ attribute.

Parameters:

file_input (str) – The file path to the ‘.prj’ auxiliary file.

copy_structure(raster_ref, n_nodatavalue=None)[source]#

Copy structure (metadata and prj) from another raster object.

Parameters:
  • raster_ref (datasets.Raster) – The reference incoming raster object from which to copy.

  • n_nodatavalue (float) – The new nodata value for different raster objects. If None, the nodata value remains unchanged.

export(folder, filename=None, mode='tif')[source]#

Exports the raster to a specified file format and location.

Parameters:
  • folder (str) – The destination folder for the exported file.

  • filename (str) – [optional] The name of the output file. If None, the original raster name is used.

  • mode (str) – The export format, either “tif” or “asc”. Default is “tif”.

export_tif(folder, filename=None)[source]#

Export a .tif raster file.

Parameters:
  • folder (str) – The directory path to export the raster file.

  • filename (str) – The name of the exported file without extension. If None, the name of the raster object is used.

Returns:

The full file name (path and extension) of the exported raster file.

Return type:

str

export_asc(folder, filename=None)[source]#

Export an .asc raster file.

Parameters:
  • folder (str) – The directory path to export the raster file.

  • filename (str) – The name of the exported file without extension. If None, the name of the raster object is used.

Returns:

The full file name (path and extension) of the exported raster file.

Return type:

str

export_prj(folder, filename=None)[source]#

Export a ‘.prj’ file. This function exports the coordinate system information to a ‘.prj’ file in the specified folder.

Parameters:
  • folder (str) – The directory path to export the ‘.prj’ file.

  • filename (str) – The name of the exported file without extension. If None, the name of the raster object is used.

Returns:

The full file name (path and extension) of the exported ‘.prj’ file, or None if no coordinate system information is available.

Return type:

str or None

reset_nodata(new_nodata, ensure=True)[source]#

Resets the no-data value in the raster metadata and updates the data mask accordingly.

This method first ensures the current no-data values are masked, then updates the NODATA_value in the raster metadata, and finally re-applies the mask based on the new no-data value.

Parameters:
  • new_nodata (int or float) – The new no-data value to set.

  • ensure (bool) – If True, ensures the current no-data values are masked before resetting. Default value = True

mask_nodata()[source]#

Mask grid cells as NaN where data is NODATA.

Notes

  • The function masks grid cells as NaN where the data is equal to the specified NODATA value.

  • If NODATA value is not set, no masking is performed.

__insert_nodata()#

Set grid cells as NODATA where data is NaN.

insert_nodata()[source]#

Set grid cells as NODATA where data is NaN using the static method.
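Masking and inserting NODATA are inverse operations on the grid. The sketch below shows both with NumPy. The NODATA value of -9999 is an assumption, and the plans methods operate on the object's own grid rather than on free-standing arrays.

```python
import numpy as np

nodata = -9999.0
grid = np.array([[1.0, np.nan], [np.nan, 4.0]])

# insert: NaN cells receive the NODATA value
filled = np.where(np.isnan(grid), nodata, grid)

# mask: NODATA cells become NaN again
masked = np.where(filled == nodata, np.nan, filled)

print(filled.tolist())  # [[1.0, -9999.0], [-9999.0, 4.0]]
```

Note that NaN only exists for floating-point arrays, so masking an integer grid requires casting it to a float dtype first.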

rebase_grid(base_raster, inplace=False, method='linear_model')[source]#

Rebase the grid of a raster. This function creates a new grid based on a provided reference raster. Both rasters are expected to be in the same coordinate system and have overlapping bounding boxes.

Parameters:
  • base_raster (datasets.Raster) – The reference raster used for rebase. It should be in the same coordinate system and have overlapping bounding boxes.

  • inplace (bool) – If True, the rebase operation will be performed in-place, and the original raster’s grid will be modified. If False, a new rebased grid will be returned, and the original data will remain unchanged. Default is False.

  • method (str) – Interpolation method for rebasing the grid. Options are “linear_model”, “nearest”, and “cubic”. Default is “linear_model”.

Returns:

If inplace is False, a new rebased grid as a NumPy array. If inplace is True, returns None, and the original raster’s grid is modified in-place.

Return type:

numpy.ndarray or None

Notes

  • The rebase operation involves interpolating the values of the original grid to align with the reference raster’s grid.

  • The method parameter specifies the interpolation method and can be “linear_model”, “nearest”, or “cubic”.

  • The rebase assumes that both rasters are in the same coordinate system and have overlapping bounding boxes.
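
As an illustration of what aligning one grid to another involves, here is a minimal nearest-neighbour sketch in NumPy. It is an assumption-laden example, not the library's code: the helper names are hypothetical, the metadata dictionaries use ASCII-raster style keys, rows are assumed to be stored north to south, and only the “nearest” idea is covered (not “linear_model” or “cubic”).

```python
import numpy as np

def cell_centers(n_cells, origin, cellsize):
    """Coordinates of cell centers along one axis, starting at the lower-left origin."""
    return origin + (np.arange(n_cells) + 0.5) * cellsize

def rebase_nearest_sketch(grid, meta, base_meta):
    """Resample `grid` onto the grid described by `base_meta` by nearest cell center."""
    x_src = cell_centers(meta["ncols"], meta["xllcorner"], meta["cellsize"])
    x_dst = cell_centers(base_meta["ncols"], base_meta["xllcorner"], base_meta["cellsize"])
    # Assumption: rows are stored north to south, so the y centers are reversed
    y_src = cell_centers(meta["nrows"], meta["yllcorner"], meta["cellsize"])[::-1]
    y_dst = cell_centers(base_meta["nrows"], base_meta["yllcorner"], base_meta["cellsize"])[::-1]
    # For every target center, pick the index of the closest source center
    cols = np.abs(x_dst[:, None] - x_src[None, :]).argmin(axis=1)
    rows = np.abs(y_dst[:, None] - y_src[None, :]).argmin(axis=1)
    return grid[np.ix_(rows, cols)]

meta = {"ncols": 2, "nrows": 2, "xllcorner": 0.0, "yllcorner": 0.0, "cellsize": 10.0}
base_meta = {"ncols": 4, "nrows": 4, "xllcorner": 0.0, "yllcorner": 0.0, "cellsize": 5.0}
grid = np.array([[1, 2], [3, 4]])
rebased = rebase_nearest_sketch(grid, meta, base_meta)
```

Here each coarse cell is split into four fine cells that inherit its value, which is the behaviour one would want for categorical data.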

load_aoi_mask(file_raster, inplace=False)[source]#

Loads an Area of Interest (AOI) mask from a raster file and applies it to the current object’s data.

Parameters:
  • file_raster (str) – The file path to the AOI raster.

  • inplace (bool) – If True, the mask is applied in-place to the current object’s data. Default value = False

apply_aoi_mask(grid_aoi, inplace=False)[source]#

Apply AOI (area of interest) mask to the raster map. This function applies an AOI (area of interest) mask to the raster map, replacing values outside the AOI with the NODATA value.

Notes

  • The function replaces values outside the AOI (where grid_aoi is 0) with the NODATA value.

  • If NODATA value is not set, no replacement is performed.

  • If inplace is True, the main grid is modified. If False, a backup of the grid is created before modification.

  • This function is useful for focusing analysis or visualization on a specific area within the raster map.

Parameters:
  • grid_aoi (numpy.ndarray) – Map of AOI (masked array or pseudo-boolean). Expected to have the same grid shape as the raster.

  • inplace (bool) – If True, overwrite the main grid with the masked values. If False, create a backup and modify a copy of the grid. Default is False.
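
The replacement rule (cells where grid_aoi is 0 receive the NODATA value) can be sketched with NumPy as follows. The helper name apply_aoi_sketch and the sample arrays are hypothetical, and the backup handling of the real method is not reproduced.

```python
import numpy as np

def apply_aoi_sketch(grid, grid_aoi, nodata):
    """Replace cells outside the AOI (grid_aoi == 0) with `nodata` (hypothetical helper)."""
    if nodata is None:
        # Without a NODATA value no replacement is performed
        return grid.copy()
    return np.where(grid_aoi == 0, nodata, grid)

grid = np.array([[10, 20], [30, 40]])
grid_aoi = np.array([[1, 0], [0, 1]])  # pseudo-boolean AOI with the same shape as the grid
masked = apply_aoi_sketch(grid, grid_aoi, nodata=-1)
```

np.where returns a new array, so the input grid is left untouched, which mirrors the inplace=False behaviour.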

release_aoi_mask()[source]#

Release AOI mask from the main grid. Backup grid is restored.

This function releases the AOI (area of interest) mask from the main grid, restoring the original values from the backup grid.

Notes

  • If an AOI mask has been applied, this function restores the original values to the main grid from the backup grid.

  • If no AOI mask has been applied, the function has no effect.

  • After releasing the AOI mask, the backup grid is set to None, and the raster object is no longer considered to have an AOI mask.

cut_edges(upper, lower, inplace=False)[source]#

Cut off upper and lower values of the raster grid.

Notes

  • Values in the raster grid below the lower value are set to the lower value.

  • Values in the raster grid above the upper value are set to the upper value.

  • If inplace is False, a processed copy of the grid is returned, leaving the original grid unchanged.

  • This function is useful for clipping extreme values in the raster grid.

Parameters:
  • upper (float or int) – The upper value for the cutoff.

  • lower (float or int) – The lower value for the cutoff.

  • inplace (bool) – If True, modify the main grid in-place. If False, create a processed copy of the grid. Default is False.

Returns:

The processed grid if inplace is False. If inplace is True, returns None.

Return type:

Union[None, np.ndarray]
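
The clipping behaviour described here matches what numpy.clip does. The following sketch uses a hypothetical sample grid and does not claim to be the method's actual code.

```python
import numpy as np

grid = np.array([[0.5, 2.0], [7.5, 12.0]])
lower, upper = 1.0, 10.0

# Values below `lower` become `lower`; values above `upper` become `upper`
clipped = np.clip(grid, lower, upper)
```

Note that np.clip takes the lower bound first, whereas cut_edges lists upper before lower.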

get_bbox()[source]#

Get the Bounding Box of the map.

Returns:

Dictionary of xmin, xmax, ymin, and ymax:

  • “xmin” (float): Minimum x-coordinate.

  • “xmax” (float): Maximum x-coordinate.

  • “ymin” (float): Minimum y-coordinate.

  • “ymax” (float): Maximum y-coordinate.

Return type:

dict

get_extent()[source]#

Get the Extent of the map. See get_bbox.

Returns:

list of [xmin, xmax, ymin, ymax]

Return type:

list
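
A bounding box and extent can be derived from ASCII-raster style metadata as sketched below. This assumes the metadata keys listed under plot_metadata ('nrows', 'ncols', 'cellsize', 'xllcorner', 'yllcorner') and square cells; the helper name bbox_from_metadata is hypothetical and the library may compute these values differently.

```python
def bbox_from_metadata(metadata):
    """Compute a bounding box from lower-left corner, cell size and grid shape (hypothetical helper)."""
    xmin = metadata["xllcorner"]
    ymin = metadata["yllcorner"]
    xmax = xmin + metadata["ncols"] * metadata["cellsize"]
    ymax = ymin + metadata["nrows"] * metadata["cellsize"]
    return {"xmin": xmin, "xmax": xmax, "ymin": ymin, "ymax": ymax}

metadata = {"ncols": 4, "nrows": 3, "cellsize": 10.0, "xllcorner": 100.0, "yllcorner": 200.0}
bbox = bbox_from_metadata(metadata)

# Same values in the list ordering documented for get_extent
extent = [bbox["xmin"], bbox["xmax"], bbox["ymin"], bbox["ymax"]]
```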

get_grid_datapoints(drop_nan=False)[source]#

Get flat and cleared grid data points (x, y, and z).

Notes

  • This function extracts coordinates (x, y, and z) from the raster grid.

  • The x and y coordinates are determined based on the grid cell center positions.

  • If drop_nan is True, nan values are ignored in the resulting DataFrame.

  • The resulting DataFrame includes columns for x, y, z, i, and j coordinates.

Parameters:

drop_nan (bool) – Option to ignore nan values.

Returns:

DataFrame of x, y, and z fields.

Return type:

pandas.DataFrame or None. If the grid is None, returns None.
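
The cell-center logic can be sketched with NumPy alone. Several things here are assumptions for illustration: the helper name, the explicit xll/yll/cellsize arguments, the convention that row 0 is the northernmost row, and the use of a plain dictionary of arrays in place of a DataFrame.

```python
import numpy as np

def grid_datapoints_sketch(grid, xll, yll, cellsize, drop_nan=False):
    """Return flat x, y, z, i, j arrays for every cell of `grid` (hypothetical helper)."""
    nrows, ncols = grid.shape
    i, j = np.indices(grid.shape)  # row and column index of every cell
    x = xll + (j + 0.5) * cellsize
    # Assumption: row 0 is the northernmost row of the map
    y = yll + (nrows - i - 0.5) * cellsize
    columns = {
        "x": x.ravel(),
        "y": y.ravel(),
        "z": grid.ravel().astype(float),
        "i": i.ravel(),
        "j": j.ravel(),
    }
    if drop_nan:
        keep = ~np.isnan(columns["z"])
        columns = {key: values[keep] for key, values in columns.items()}
    return columns

grid = np.array([[1.0, np.nan], [3.0, 4.0]])
points = grid_datapoints_sketch(grid, xll=0.0, yll=0.0, cellsize=10.0, drop_nan=True)
```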

get_grid_data()[source]#

Get flat and cleared grid values.

Returns:

1D vector of cleared sample.

Return type:

numpy.ndarray or None. If the grid is None, returns None.

Notes

  • This function extracts and flattens the grid, removing any masked or NaN values.

  • For integer grids, the masked values are ignored.

  • For floating-point grids, both masked and NaN values are ignored.
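
The flatten-and-clear step in the notes above can be sketched as follows. The helper name flat_clear_sketch is hypothetical, and the sketch assumes masked cells are represented with numpy.ma masked arrays.

```python
import numpy as np

def flat_clear_sketch(grid):
    """Flatten a grid and drop masked and NaN cells (hypothetical helper)."""
    if grid is None:
        return None
    if np.ma.isMaskedArray(grid):
        values = grid.compressed()  # 1D array holding only the unmasked cells
    else:
        values = np.asarray(grid).ravel()
    if np.issubdtype(values.dtype, np.floating):
        # NaN only exists for floating-point data
        values = values[~np.isnan(values)]
    return values

float_grid = np.array([[1.0, np.nan], [3.0, 4.0]])
int_grid = np.ma.masked_equal(np.array([[1, 0], [2, 0]]), 0)
flat_float = flat_clear_sketch(float_grid)
flat_int = flat_clear_sketch(int_grid)
```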

get_univar()[source]#

Creates and returns a Univar object initialized with the current object’s grid data.

Returns:

A Univar object containing the grid data for univariate analysis.

Return type:

plans.analyst.Univar

get_grid_stats()[source]#

Get basic statistics from flat and cleared grid.

Returns:

DataFrame of basic statistics. If the grid is None, returns None.

Return type:

pandas.DataFrame or None

get_aoi(by_value_lo, by_value_hi)[source]#

Get the AOI map from an interval of values (values are expected to exist in the raster).

Parameters:
  • by_value_lo (float) – Number for the lower bound (inclusive).

  • by_value_hi (float) – Number for the upper bound (inclusive).

Returns:

AOI map.

Return type:

AOI object

Notes

  • This function creates an AOI (Area of Interest) map based on a specified value range.

  • The AOI map is constructed as a binary grid where values within the specified range are set to 1, and others to 0.
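
The binary grid described in the notes can be built with a boolean comparison. This sketch uses a hypothetical sample grid and returns a bare NumPy array, not an AOI object.

```python
import numpy as np

grid = np.array([[1.0, 5.0], [10.0, 15.0]])
by_value_lo, by_value_hi = 5.0, 10.0

# Both bounds are inclusive: 1 inside the interval, 0 elsewhere
aoi_grid = ((grid >= by_value_lo) & (grid <= by_value_hi)).astype("uint8")
```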

_plot(fig, gs, specs)[source]#

Generates a plot visualizing the Raster data.

Parameters:
  • fig (matplotlib.figure.Figure) – The matplotlib figure object.

  • gs (matplotlib.gridspec.GridSpec) – The matplotlib gridspec object for arranging subplots.

  • specs (dict) – A dictionary containing plotting specifications and options.

Returns:

The modified matplotlib figure object with the plots.

Return type:

matplotlib.figure.Figure

view(show=True, return_fig=False, helper_geometry=None)[source]#

Displays or returns a visualization of the spatial data.

Parameters:
  • show (bool) – If True, the plot is displayed. Default value = True

  • return_fig (bool) – If True, the matplotlib figure object is returned. Default value = False

  • helper_geometry (object) – [optional] An optional geometry object to overlay on the map.

Returns:

The matplotlib figure object if return_fig is True, otherwise None.

Return type:

matplotlib.figure.Figure or None

static plot_metadata(fig, metadata, x=0.0, y=0.1)[source]#

Adds raster metadata as text annotations to a matplotlib figure.

Parameters:
  • fig (matplotlib.figure.Figure) – The matplotlib figure object to which metadata will be added.

  • metadata (dict) – A dictionary containing raster metadata (e.g., ‘nrows’, ‘ncols’, ‘cellsize’, ‘xllcorner’, ‘yllcorner’, ‘NODATA_value’).

  • x (float) – The x-coordinate (figure fraction) for the left-most column of metadata. Default value = 0.0

  • y (float) – The y-coordinate (figure fraction) for the top row of metadata. Default value = 0.1

Returns:

The modified matplotlib figure object.

Return type:

matplotlib.figure.Figure

static read_tif_metadata(file_input, n_band=1)[source]#

Read raster metadata from a file.

Parameters:
  • file_input (str) – Path to the input raster file.

  • n_band (int) – [optional] Band number to read. Default value = 1

Returns:

Dictionary containing raster metadata.

Return type:

dict

static read_tif(file_input, dtype='float', id_band=1, metadata=True)[source]#

Read a raster band from a file.

Parameters:
  • file_input (str) – Path to the input raster file.

  • dtype (str) – Data type for the output grid. Default value = “float”

  • id_band (int) – Band id to read. Default value = 1

  • metadata (bool) – Whether to include metadata in the output dictionary. Default value = True

Returns:

Dictionary containing the raster grid and optionally its metadata.

Return type:

dict

static write_tif(grid_output, dc_metadata, file_output, dtype='float32', n_bands=1, id_band=1)[source]#

Write a raster band to a file.

Parameters:
  • grid_output (numpy.ndarray) – The grid data to write.

  • dc_metadata (dict) – Dictionary containing the raster metadata.

  • file_output (str) – Path to the output raster file.

  • dtype (str) – Data type alias for the output grid (numpy standard). Default value = “float32”

  • n_bands (int) – Number of bands in the output raster. Default value = 1

  • id_band (int) – Band ID to write the data to. Default value = 1

Returns:

Path to the output raster file (an echo of file_output).

Return type:

str

static read_asc_metadata(file_input)[source]#

Reads metadata from an ASCII raster file.

Parameters:

file_input (str) – Path to the input ASCII file.

Returns:

A dictionary containing the metadata.

Return type:

dict

static read_asc(file_input, dtype='float32', metadata=True)[source]#

Reads an ASCII raster file into a dictionary.

Parameters:
  • file_input (str) – Path to the input ASCII file.

  • dtype (str) – Data type for the raster data. Default value = “float32”

  • metadata (bool) – Whether to read and include metadata from the ASCII file. Default value = True

Returns:

A dictionary containing the raster data and optionally its metadata.

Return type:

dict

static write_asc(grid_output, dc_metadata, file_output, dtype='float32')[source]#

Writes a raster grid and its metadata to an ASCII file.

Parameters:
  • grid_output (numpy.ndarray) – The raster data to write.

  • dc_metadata (dict) – Dictionary containing the metadata for the ASCII file.

  • file_output (str) – Path for the output ASCII file.

  • dtype (str) – Data type for the raster data in the output file. Default value = “float32”

Returns:

The path of the generated output ASCII file.

Return type:

str
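
For orientation, an Esri ASCII raster is a plain-text file with six header lines followed by the rows of the grid. The sketch below formats and parses such text in pure Python. The helper names and the dictionary keys "metadata" and "data" are hypothetical; the real read_asc and write_asc may use other keys and handle data types differently.

```python
def asc_text_sketch(rows, metadata):
    """Format a grid (list of rows) as Esri ASCII raster text (hypothetical helper)."""
    keys = ["ncols", "nrows", "xllcorner", "yllcorner", "cellsize", "NODATA_value"]
    header = [f"{key} {metadata[key]}" for key in keys]
    body = [" ".join(str(value) for value in row) for row in rows]
    return "\n".join(header + body) + "\n"

def parse_asc_sketch(text):
    """Parse Esri ASCII raster text back into metadata and rows (hypothetical helper)."""
    lines = text.strip().splitlines()
    metadata = {}
    for line in lines[:6]:  # the six header lines
        key, value = line.split()
        metadata[key] = float(value) if "." in value else int(value)
    rows = [[float(value) for value in line.split()] for line in lines[6:]]
    return {"metadata": metadata, "data": rows}

metadata = {"ncols": 2, "nrows": 2, "xllcorner": 0.0, "yllcorner": 0.0,
            "cellsize": 30.0, "NODATA_value": -9999}
text = asc_text_sketch([[1.0, 2.0], [3.0, -9999]], metadata)
parsed = parse_asc_sketch(text)
```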

static apply_nodata(grid_input, nodatavalue=None)[source]#

Applies a nodata value to the input grid.

Parameters:
  • grid_input (numpy.ndarray) – The input grid.

  • nodatavalue (int or float) – [optional] The nodata value to apply. Default value = None

Returns:

The grid with the nodata value applied.

Return type:

numpy.ndarray

static make_square(grid_input)[source]#

Reshapes a 2D input grid into a square array, padding with NaNs or masked values if necessary.

Parameters:

grid_input (numpy.ndarray) – The input 2D array (grid).

Returns:

A square array containing the original grid, padded with NaNs or masked values.

Return type:

numpy.ndarray
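
Padding to a square can be sketched with numpy.pad. Where the original grid sits inside the square is not stated above, so this example assumes it is anchored at the top-left corner; the helper name is hypothetical and only NaN padding is shown.

```python
import numpy as np

def make_square_sketch(grid):
    """Pad a 2D grid with NaN so that it becomes square (hypothetical helper)."""
    nrows, ncols = grid.shape
    size = max(nrows, ncols)
    # Assumption: the original grid is kept at the top-left corner
    pad_width = ((0, size - nrows), (0, size - ncols))
    return np.pad(grid.astype(float), pad_width, constant_values=np.nan)

grid = np.array([[1, 2, 3], [4, 5, 6]])
square = make_square_sketch(grid)
```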

class plans.datasets.core.SciRaster(name='MySciRaster', alias=None)[source]#

Bases: Raster

__init__(name='MySciRaster', alias=None)[source]#

Initializes a new instance of the SciRaster class.

Parameters:
  • name (str) – The name of the scientific raster. Default value = “MySciRaster”

  • alias (str) – [optional] An alias for the scientific raster.

_set_view_specs()[source]#

Sets the default viewing specifications for the scientific raster, including the default data range.

_overwrite_nodata()[source]#

Overwrite the NODATA value in the metadata (set by default).

set_raster_metadata(metadata)[source]#

Sets the raster metadata for the object and overwrites the no-data value based on the new metadata.

Parameters:

metadata (dict) – A dictionary containing the raster metadata.

class plans.datasets.core.QualiRaster(name='QualiMap', dtype='uint8')[source]#

Bases: Raster

Basic qualitative raster map dataset.

__init__(name='QualiMap', dtype='uint8')[source]#

Initialize dataset.

Parameters:
  • name (str) – name of map

  • dtype (str) – data type of raster cells, defaults to “uint8”

_set_fields()[source]#

Set field names. Expected to extend the superclass methods.

_overwrite_nodata()[source]#

Overwrite the NODATA value in the metadata (set by default).

set_raster_metadata(metadata)[source]#

Sets the raster metadata for the object and overwrites the no-data value based on the new metadata.

Parameters:

metadata (dict) – A dictionary containing the raster metadata.

rebase_grid(base_raster, inplace=False)[source]#

Rebases the grid of the current object to match a base raster’s grid.

This method calls the rebase_grid method of the superclass to perform the grid rebasement using the “nearest” interpolation method.

Parameters:
  • base_raster (Raster) – The raster object to rebase against.

  • inplace (bool) – If True, the operation is performed in-place. Default value = False

Returns:

The rebased object.

Return type:

Raster

reclassify(dict_ids, df_new_table, talk=False)[source]#

Reclassify QualiRaster Ids in grid and table

Parameters:
  • dict_ids (dict) – dictionary to map from “Old_Id” to “New_id”

  • df_new_table (pandas.DataFrame) – new table for QualiRaster

  • talk (bool) – option for printing messages

load_data(file_data, file_table=None, file_prj=None, id_band=1)[source]#

Load data from files to the raster object.

Parameters:
  • file_data (str) – The path to the raster file.

  • file_table (str) – path to table file

  • file_prj (str) – The path to the ‘.prj’ projection file. If not provided, an attempt is made to use the same path and name as the .asc file with the ‘.prj’ extension.

  • id_band (int) – Band id to read for GeoTIFF. Default value = 1

load_table(file_table)[source]#

Load attributes dataframe from table file.

Parameters:

file_table (str) – path to table file

export(folder, filename=None)[source]#

Export raster sample

Parameters:
  • folder (str) – path to folder

  • filename (str) – string of file without extension, defaults to None

export_table(folder, filename=None)[source]#

Export table file.

Parameters:
  • folder (str) – path to folder

  • filename (str) – string of file without extension

Returns:

full file name (path and extension)

Return type:

str

set_table(dataframe)[source]#

Set attributes dataframe from incoming pandas.DataFrame.

Parameters:

dataframe (pandas.DataFrame) – incoming pandas dataframe

clear_table()[source]#

Remove from the table the values not found in the map.

set_random_colors()[source]#

Set random colors to attribute table.

get_areas(inplace=False)[source]#

Get areas in map of each category in table.

Parameters:

inplace (bool) – option to merge data with raster table, defaults to False

Returns:

areas dataframe

Return type:

pandas.DataFrame
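
Per-category areas can be estimated by counting cells and multiplying by the cell area. This sketch assumes square cells in a projected coordinate system and uses hypothetical sample values; the columns of the dataframe returned by the real method are not reproduced.

```python
import numpy as np

grid = np.array([[1, 1, 2], [2, 2, 3]], dtype="uint8")
cellsize = 30.0  # assumed length of a cell side in map units

# Count how many cells carry each category id
ids, counts = np.unique(grid, return_counts=True)
areas = counts * cellsize ** 2     # area per category, in squared map units
fractions = counts / counts.sum()  # share of the map covered by each category
```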

get_zonal_stats(raster_sample, merge=False, skip_count=False)[source]#

Get zonal stats from other raster map to sample.

Parameters:
  • raster_sample (datasets.Raster) – raster map to sample

  • merge (bool) – option to merge data with raster table, defaults to False

  • skip_count (bool) – set True to skip count, defaults to False

Returns:

dataframe of zonal stats

Return type:

pandas.DataFrame
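
Zonal statistics amount to grouping the cells of one raster by the category ids of another. The sketch below shows the idea with NumPy; the chosen statistics, the dictionary layout and the sample grids are assumptions, and both grids are assumed to share the same shape.

```python
import numpy as np

zones = np.array([[1, 1], [2, 2]], dtype="uint8")     # qualitative map (category ids)
sample = np.array([[10.0, 20.0], [30.0, np.nan]])     # raster map to sample

stats = {}
for zone_id in np.unique(zones):
    values = sample[zones == zone_id]   # cells of the sampled raster inside this zone
    values = values[~np.isnan(values)]  # ignore no-data cells
    stats[int(zone_id)] = {
        "count": int(values.size),
        "mean": float(values.mean()),
        "min": float(values.min()),
        "max": float(values.max()),
    }
```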

get_aoi(by_value_id=None)[source]#

Get the AOI map from a specific value id (value is expected to exist in the raster).

Parameters:

by_value_id (int) – category id value

Returns:

AOI map

Return type:

AOI object

apply_values(table_field)[source]#

get_metadata()[source]#

Get all metadata from base_object

Returns:

metadata

Return type:

dict

_set_view_specs()[source]#

Get default view specs

_plot(fig, gs, specs)[source]#

Generates a plot visualizing the spatial data and its area distribution.

This method creates a figure with a map of the spatial data and a horizontal bar chart showing the percentage area of each unique class. It can aggregate smaller classes into an “others” category and displays raster metadata.

Parameters:
  • fig (matplotlib.figure.Figure) – The matplotlib figure object.

  • gs (matplotlib.gridspec.GridSpec) – The matplotlib gridspec object for arranging subplots.

  • specs (dict) – A dictionary containing plotting specifications and options.

Returns:

The modified matplotlib figure object with the plots.

Return type:

matplotlib.figure.Figure

view(show=True, return_fig=False, helper_geometry=None)[source]#

Displays or returns a visualization of the spatial data and its area distribution.

This method orchestrates the plotting process by setting up figure specifications, calling the internal plotting function (_plot), and then either displaying or saving the generated figure.

Parameters:
  • show (bool) – If True, the plot is displayed. Default value = True

  • return_fig (bool) – If True, the matplotlib figure object is returned. Default value = False

  • helper_geometry (object) – [optional] An optional geometry object to overlay on the map.

Returns:

The matplotlib figure object if return_fig is True, otherwise None.

Return type:

matplotlib.figure.Figure or None

class plans.datasets.core.QualiHard(name='qualihard')[source]#

Bases: QualiRaster

A Quali-Hard is a hard-coded qualitative map (that is, the table is pre-set).

__init__(name='qualihard')[source]#

Initialize dataset.

Parameters:
  • name (str) – name of map

  • dtype (str) – data type of raster cells, defaults to

get_table()[source]#

Retrieves a sample DataFrame representing a classification table.

Returns:

A DataFrame with sample classification data.

Return type:

pandas.DataFrame

load_data(file_data, file_prj=None, id_band=1, file_table=None)[source]#

Load data from file to the raster object.

Parameters:
  • file_data (str) – The path to the raster file.

  • file_prj (str) – The path to the ‘.prj’ projection file. If not provided, an attempt is made to use the same path and name as the .asc file with the ‘.prj’ extension.

  • id_band (int) – Band id to read for GeoTIFF. Default value = 1

class plans.datasets.core.Zones(name='ZonesMap')[source]#

Bases: QualiRaster

The Zones map dataset is a QualiRaster designed to handle a large volume of positive integer numbers (ids of zones).

__init__(name='ZonesMap')[source]#

Initialize dataset.

Parameters:
  • name (str) – name of map

  • dtype (str) – data type of raster cells, defaults to

compute_table()[source]#

Computes an internal table summarizing unique values in the spatial data, assigns aliases, names, and sets up viewing specifications.

set_data(grid)[source]#

Sets the spatial data for the object and recomputes the internal table.

Parameters:

grid (numpy.ndarray) – The input grid data.

load_data(asc_file, prj_file)[source]#

Load data from files to raster

Parameters:
  • asc_file (str) – path to raster file

  • prj_file (str) – path to projection file

get_aoi(zone_id)[source]#

Get the AOI map from a zone id

Parameters:

zone_id (int) – number of zone ID

Returns:

AOI map

Return type:

AOI` object

view(show=True, folder='./output', filename=None, specs=None, dpi=150, fig_format='jpg')[source]#

Plot a basic panel of the raster map.

Parameters:
  • show (bool) – boolean to show plot instead of saving, defaults to True

  • folder (str) – path to output folder, defaults to ./output

  • filename (str) – name of file, defaults to None

  • specs (dict) – specifications dictionary, defaults to None

  • dpi (int) – image resolution, defaults to 150

  • fig_format (str) – image fig_format (ex: png or jpg). Default jpg

class plans.datasets.core.RasterCollection(name='myRasterCollection')[source]#

Bases: Collection

The raster collection base dataset. This data structure is designed for holding and comparing Raster objects.

__init__(name='myRasterCollection')[source]#

Deploy the raster collection data structure.

Parameters:

name (str) – name of raster collection

_set_fields()[source]#

Set field names. Expected to extend the superclass methods.

load_data(name, file_data, file_prj=None, varname=None, varalias=None, units=None, datetime=None, dtype='float32', skip_grid=False)[source]#

Load a Raster base_object from a raster file.

Parameters:
  • name (str) – Raster.name name attribute

  • file_data (str) – path to raster file

  • varname (str) – Raster.varname variable name attribute, defaults to None

  • varalias (str) – Raster.varalias variable alias attribute, defaults to None

  • units (str) – Raster.units units attribute, defaults to None

  • datetime (str) – Raster.date date attribute, defaults to None

  • skip_grid (bool) – option for loading only the metadata

load_folder(folder, name_pattern, talk=False, file_format='tif', parallel=False, isseries=False)[source]#

Load all rasters from a folder by following a name pattern. Datetime is expected to be at the end of name before file extension.

Parameters:
  • folder (str) – path to folder

  • name_pattern (str) – name pattern. example map_*

  • talk (bool) – option for printing messages

  • file_format (str) – file extension.

  • parallel (bool) – flag to use parallel processing
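
Selecting files by name pattern and reading the trailing datetime can be sketched with the standard library. The date format is not specified above, so this example assumes names ending in "_YYYY-MM-DD"; the helper name and the file names are hypothetical, and no files are actually read.

```python
from datetime import datetime
from fnmatch import fnmatch
from pathlib import Path

def parse_raster_names(filenames, name_pattern, file_format="tif"):
    """Keep file names matching the pattern and parse the trailing date (hypothetical helper)."""
    entries = []
    for filename in sorted(filenames):
        path = Path(filename)
        if path.suffix != f".{file_format}" or not fnmatch(path.stem, name_pattern):
            continue
        # Assumption: the name ends with "_YYYY-MM-DD" right before the extension
        date_text = path.stem.split("_")[-1]
        entries.append((path.stem, datetime.strptime(date_text, "%Y-%m-%d")))
    return entries

files = ["map_2020-02-01.tif", "map_2020-01-01.tif", "notes.txt", "dem_2020-01-01.tif"]
entries = parse_raster_names(files, "map_*")
```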

is_same_grid()[source]#

Checks if all datasets in the catalog have the same grid dimensions (number of columns and rows).

Returns:

True if all datasets have the same grid dimensions, False otherwise.

Return type:

bool
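
The check reduces to asking whether every dataset reports the same (ncols, nrows) pair. A minimal pure-Python sketch, with a hypothetical helper name and hand-written shapes standing in for the catalog:

```python
def is_same_grid_sketch(shapes):
    """Return True when all (ncols, nrows) pairs are identical (hypothetical helper)."""
    # A set collapses duplicates, so one unique pair means one common grid
    return len(set(shapes)) <= 1

same = is_same_grid_sketch([(100, 80), (100, 80), (100, 80)])
different = is_same_grid_sketch([(100, 80), (120, 80)])
```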

reduce(reducer_func, reduction_name, extra_arg=None, skip_nan=False, talk=False)[source]#

This method reduces the collection by applying a numpy broadcasting function (example: np.mean)

Parameters:
  • reducer_func (numpy function) – reducer numpy function (example: np.mean)

  • reduction_name (str) – name for the output raster

  • extra_arg (any) – extra argument for function (example: np.percentile) - Default: None

  • skip_nan (bool) – Option for skipping NaN values in map

  • talk (bool) – option for printing messages

Returns:

raster object based on the first object found in the collection

Return type:

Raster
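
Conceptually, reducing a collection means stacking the grids and applying a NumPy function along the stacking axis. The sketch below illustrates this under stated assumptions: the helper name is hypothetical, all grids share the same shape, and the NaN-aware variants (np.nanmean, np.nanpercentile) stand in for whatever skip_nan does internally.

```python
import numpy as np

def reduce_sketch(grids, reducer_func, extra_arg=None):
    """Apply a NumPy reducer across a list of same-shaped grids (hypothetical helper)."""
    stack = np.stack(grids)  # shape: (n_maps, nrows, ncols)
    if extra_arg is None:
        return reducer_func(stack, axis=0)
    # The extra argument is passed positionally, e.g. the percentile value
    return reducer_func(stack, extra_arg, axis=0)

grids = [np.array([[1.0, 2.0]]), np.array([[3.0, np.nan]]), np.array([[5.0, 6.0]])]
mean_map = reduce_sketch(grids, np.mean)        # NaN propagates to the output cell
nanmean_map = reduce_sketch(grids, np.nanmean)  # NaN cells are skipped
p50_map = reduce_sketch(grids, np.nanpercentile, extra_arg=50)
```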

to_mean(skip_nan=False, talk=False)[source]#

Reduce Collection to the Mean raster

Parameters:
  • skip_nan (bool) – Option for skipping NaN values in map

  • talk (bool) – option for printing messages

Returns:

raster object based on the first object found in the collection

Return type:

Raster

to_sd(skip_nan=False, talk=False)[source]#

Reduce Collection to the Standard Deviation raster

Parameters:
  • skip_nan (bool) – Option for skipping NaN values in map

  • talk (bool) – option for printing messages

Returns:

raster object based on the first object found in the collection

Return type:

Raster

to_min(skip_nan=False, talk=False)[source]#

Reduce Collection to the Min raster

Parameters:
  • skip_nan (bool) – Option for skipping NaN values in map

  • talk (bool) – option for printing messages

Returns:

raster object based on the first object found in the collection

Return type:

Raster

to_max(skip_nan=False, talk=False)[source]#

Reduce Collection to the Max raster

Parameters:
  • skip_nan (bool) – Option for skipping NaN values in map

  • talk (bool) – option for printing messages

Returns:

raster object based on the first object found in the collection

Return type:

Raster

to_sum(skip_nan=False, talk=False)[source]#

Reduce Collection to the Sum raster

Parameters:
  • skip_nan (bool) – Option for skipping NaN values in map

  • talk (bool) – option for printing messages

Returns:

raster object based on the first object found in the collection

Return type:

Raster

to_percentile(percentile, skip_nan=False, talk=False)[source]#

Reduce Collection to the Nth Percentile raster

Parameters:
  • percentile (float) – Nth percentile (from 0 to 100)

  • skip_nan (bool) – Option for skipping NaN values in map

  • talk (bool) – option for printing messages

Returns:

raster object based on the first object found in the collection

Return type:

Raster

to_median(skip_nan=False, talk=False)[source]#

Reduce Collection to the Median raster

Parameters:
  • skip_nan (bool) – Option for skipping NaN values in map

  • talk (bool) – option for printing messages

Returns:

raster object based on the first object found in the collection

Return type:

Raster

get_collection_stats()[source]#

Get basic statistics from collection.

Returns:

statistics sample

Return type:

pandas.DataFrame

get_views(show=False, folder='./output', dpi=300, fig_format='jpg', talk=False, specs=None, suffix=None)[source]#

Plot the basic panel of every raster map in the collection.

Parameters:
  • show (bool) – boolean to show plot instead of saving, defaults to False

  • folder (str) – path to output folder, defaults to ./output

  • dpi (int) – image resolution, defaults to 300

  • fig_format (str) – image fig_format (ex: png or jpg). Default jpg

  • talk (bool) – option for print messages

view_bboxes(colors=None, datapoints=False, show=True, folder='./output', filename=None, dpi=150, fig_format='jpg')[source]#

View Bounding Boxes of Raster collection

Parameters:
  • colors (list) – list of colors for plotting. Expected to have the same size as the catalog

  • datapoints (bool) – option to plot datapoints as well, defaults to False

  • show (bool) – option to show plot instead of saving, defaults to True

  • folder (str) – path to output folder, defaults to ./output

  • filename (str) – name of file, defaults to None

  • dpi (int) – image resolution, defaults to 150

  • fig_format (str) – image fig_format (ex: png or jpg). Default jpg

Return type:

None

get_catalog(mode='full')[source]#

Retrieves the data catalog in different modes.

Parameters:

mode (str) – The mode of the catalog to retrieve. Can be “full” for the complete catalog, “short” for a truncated version, or any other value to filter by a list ls. Default value = “full”

Returns:

The requested data catalog.

Return type:

pandas.DataFrame

class plans.datasets.core.QualiRasterCollection(name)[source]#

Bases: RasterCollection

The raster collection base dataset.

This data structure is designed for holding and comparing QualiRaster objects.

__init__(name)[source]#

Deploy Qualitative Raster Series

Parameters:
  • name (str) – RasterSeries.name name attribute

  • varname (str) – Raster.varname variable name attribute, defaults to None

  • varalias (str) – Raster.varalias variable alias attribute, defaults to None

load_data(name, file_data, file_table=None, prj_file=None)[source]#

Load a QualiRaster from file.

Parameters:
  • name (str) – Raster.name name attribute

  • file_data (str) – path to raster file.

  • file_table (str) – path to table file

  • prj_file (str) – path to projection file

class plans.datasets.core.RasterSeries(name, varname, varalias, units, dtype='float32')[source]#

Bases: RasterCollection

A RasterCollection where datetime matters and all maps in collections are expected to be the same variable, same projection and same grid.

__init__(name, varname, varalias, units, dtype='float32')[source]#

Deploy RasterSeries

Parameters:
  • name (str) – RasterSeries.name name attribute

  • varname (str) – Raster.varname variable name attribute

  • varalias (str) – Raster.varalias variable alias attribute

  • units (str) – Raster.units units attribute

load_data(name, datetime, file_data, prj_file=None, dtype='float32', skip_grid=False)[source]#

Load a Raster object from raster file.

Parameters:
  • name (str) – Raster.name name attribute

  • datetime (str) – Raster.date date attribute

  • file_data (str) – path to raster file

  • prj_file (str) – path to projection file

  • skip_grid (bool) – option for loading only the metadata

load_folder(folder, name_pattern, talk=False, file_format='tif', parallel=False)[source]#

Load all rasters from a folder by following a name pattern. Datetime is expected to be at the end of name before file extension.

Parameters:
  • folder (str) – path to folder

  • name_pattern (str) – name pattern. example map_*

  • talk (bool) – option for printing messages

  • file_format (str) – file extension.

  • parallel (bool) – flag to use parallel processing

apply_aoi_masks(grid_aoi, inplace=False)[source]#

Batch method to apply AOI mask over all maps in collection

Parameters:
  • grid_aoi (numpy.ndarray) – aoi grid

  • inplace (bool) – overwrite the main grid if True, defaults to False

release_aoi_masks()[source]#

Batch method to release the AOI mask over all maps in collection

rebase_grids(base_raster, talk=False)[source]#

Batch method to rebase all maps in collection

Parameters:
  • base_raster (datasets.Raster) – base raster for rebasing

  • talk (bool) – option for print messages

get_series_stats()[source]#

Get the raster series statistics

Returns:

dataframe of raster series statistics

Return type:

pandas.DataFrame

view_series_stats(statistic='mean', folder='./output', filename=None, specs=None, show=True, dpi=150, fig_format='jpg')[source]#

View raster series statistics

Parameters:
  • statistic (str) – statistic to view. Default mean

  • show (bool) – option to show plot instead of saving, defaults to True

  • folder (str) – path to output folder, defaults to ./output

  • filename (str) – name of file, defaults to None

  • specs (dict) – specifications dictionary, defaults to None

  • dpi (int) – image resolution, defaults to 150

  • fig_format (str) – image fig_format (ex: png or jpg). Default jpg

class plans.datasets.core.QualiRasterSeries(name, varname, varalias, dtype='uint8')[source]#

Bases: RasterSeries

A RasterSeries where date matters and all maps in collections are expected to be QualiRaster with the same variable, same projection and same grid.

__init__(name, varname, varalias, dtype='uint8')[source]#

Deploy Qualitative Raster Series

Parameters:
  • name (str) – RasterSeries.name name attribute

  • varname (str) – Raster.varname variable name attribute

  • varalias (str) – Raster.varalias variable alias attribute

update_table(clear=True)[source]#

Update series table (attributes)

Parameters:

clear (bool) – option to clear the table of values not found in the maps. Default: True

append(raster)[source]#

Append a Raster base_object to the collection. Pre-existing objects with the same Raster.name attribute are replaced.

Parameters:

raster (Raster) – incoming Raster to append

load_data(name, datetime, file_data, prj_file=None, table_file=None)[source]#

Load a QualiRaster base_object from raster file.

Parameters:
  • name (str) – Raster.name name attribute

  • datetime (str) – Raster.date date attribute

  • file_data (str) – path to raster file

  • prj_file (str) – path to projection file

  • table_file (str) – path to .txt table file

w_load_file(file_info)[source]#

Worker function to load a single file.

load_folder(folder, file_table, name_pattern, talk=False, file_format='tif', parallel=False)[source]#

Load all rasters from a folder by following a name pattern. Datetime is expected to be at the end of name before file extension.

Parameters:
  • folder (str) – path to folder

  • file_table (str) – path to file table

  • name_pattern (str) – name pattern. example map_*

  • talk (bool) – option for printing messages

  • file_format (str) – file extension.

  • parallel (bool) – flag to use parallel processing

get_series_areas()[source]#

Get areas prevalence for all series

Returns:

dataframe of series areas

Return type:

pandas.DataFrame

static view_series_areas(df_table, df_areas, specs=None, show=True, export_areas=True, folder='./output', filename=None, dpi=300, fig_format='jpg')[source]#

View series areas

Parameters:
  • specs (dict) – specifications dictionary, defaults to None

  • show (bool) – option to show plot instead of saving, defaults to True

  • folder (str) – path to output folder, defaults to ./output

  • filename (str) – name of file, defaults to None

  • dpi (int) – image resolution, defaults to 300

  • fig_format (str) – image fig_format (ex: png or jpg). Default jpg