pem.risk module

Module for pre- and post-processing routines supporting the Habitat Risk Index workflow.

This module provides a collection of helper functions for organizing, rasterizing, and reprojecting spatial data layers used in the InVEST Habitat Risk model. It is intended to be executed within the QGIS Python environment and relies on its processing framework (processing algorithms, CRS handling, rasterization utilities, etc.).

See also

Refer to the InVEST Habitat Risk model documentation for theoretical and technical details regarding input data preparation for the Habitat Risk Index.

Requirements

The following environment and libraries are required:

  • QGIS 3 (Python environment)

  • numpy

  • pandas

  • geopandas

Overview

This module includes routines for:

  • Preparing and rasterizing stressor and habitat layers.

  • Reprojecting rasters and generating blank templates.

  • Splitting vector datasets into grouped layers.

  • Creating structured, time-stamped output directories for reproducible runs.

Each function performs a self-contained processing step designed to integrate smoothly with other spatial workflows in QGIS.
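The time-stamped run folders mentioned above can be sketched with the standard library. This is a minimal illustration, assuming a hypothetical helper make_run_folder and a prefix_YYYY-MM-DD_HH-MM-SS naming pattern; the folder names the module actually creates may differ.

```python
from datetime import datetime
from pathlib import Path


def make_run_folder(output_folder, prefix="run", now=None):
    """Create and return a time-stamped subfolder of output_folder.

    Hypothetical helper: the prefix_YYYY-MM-DD_HH-MM-SS pattern is an
    assumption, not necessarily the pattern used by pem.risk.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    run_folder = Path(output_folder) / f"{prefix}_{stamp}"
    # exist_ok=False so two runs never silently write into the same folder
    run_folder.mkdir(parents=True, exist_ok=False)
    return run_folder
```

Calling make_run_folder("path/to/dir", prefix="stressors") would, for example, create path/to/dir/stressors_2024-01-02_03-04-05 for a run started at that time.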

Examples

Scripted usage examples are provided in the docstrings of setup_stressors and setup_habitats. No global examples are included at module level.

pem.risk.setup_stressors(output_folder, input_db, groups, reference_raster, is_blank=False, resolution=400)

Sets up stressor layers by rasterizing the vector layers of each stressor group from a database into one raster per group, then reprojecting those rasters to a specified resolution.

Note

This function is a utility for preparing inputs to the InVEST Habitat Risk model.

Parameters:
  • output_folder (str) – The base directory where a new run-specific folder for all outputs will be created.

  • input_db (str) – The path to the source vector database (e.g., GeoPackage) containing the stressor layers.

  • groups (dict) – A dictionary defining the stressor groups. Keys are the desired output stressor names (e.g., Coastal_Pollution), and values are dictionaries containing at least the keys layers (a list of vector layer names to be combined) and buffer (the stressor buffer distance, in meters, required for the subsequent analysis).

  • reference_raster (str) – The path to a raster file whose extent, CRS, and other properties will be used as a template for the output stressor rasters.

  • is_blank (bool) – [optional] If True, the reference_raster is assumed to be a blank (zero-valued) template already, skipping the internal blanking step. Default value = False

  • resolution (float) – The desired resolution (cell size) for the final output stressor rasters. Default value = 400
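Because the groups dictionary nests two levels deep, a quick pre-flight check can catch mistakes before a long rasterization run. The sketch below uses a hypothetical helper, check_stressor_groups, which is not part of pem.risk; it only tests the two documented keys and ignores any others.

```python
def check_stressor_groups(groups):
    """Return a list of problems found in a setup_stressors-style groups dict.

    Hypothetical helper: only the documented keys (layers, buffer) are
    checked; extra keys such as "raster" are ignored.
    """
    problems = []
    for name, spec in groups.items():
        layers = spec.get("layers")
        if not isinstance(layers, list) or not layers:
            problems.append(f"{name}: 'layers' must be a non-empty list of layer names")
        buffer_m = spec.get("buffer")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(buffer_m, bool) or not isinstance(buffer_m, (int, float)) or buffer_m < 0:
            problems.append(f"{name}: 'buffer' must be a non-negative distance in meters")
    return problems
```

An empty list means the dictionary has the documented shape; it does not confirm that the named layers exist in input_db.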

Returns:

The path to the newly created run-specific output folder containing the stressor rasters and metadata.

Return type:

str

Notes

This function combines features from multiple vector layers into a single output stressor raster for each defined group.

  1. Template Raster: If is_blank is False, a blank raster is generated from the reference_raster to serve as the template.

  2. Rasterization Loop: For each group, the template raster is copied, and all vector layers listed under the group’s layers key are sequentially rasterized onto the copy using a burn value of 1 (features are present).

  3. Reprojection: The resulting raster is reprojected to the desired resolution and to the default target CRS, EPSG:5641.

  4. Metadata: An info_stressors.csv file is created, detailing the name, file path, and required STRESSOR BUFFER (meters) for each generated stressor raster.

Intermediate rasters are cleaned up at the end.
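Since every layer of a group is burned with the same value onto one copy of the template, the output behaves as a presence mask: the union of all listed layers. The pure-Python sketch below models each layer as a set of covered (row, col) cells, an assumption standing in for GDAL's rasterization of geometries, and assumes burning overwrites existing cell values rather than adding to them. The helper burn_layers is hypothetical.

```python
def burn_layers(template, layers, burn_value=1):
    """Burn each layer's covered cells into a copy of a zero-valued template.

    Hypothetical stand-in for the rasterization loop: each layer is a set of
    (row, col) cells. Burning overwrites, so a cell covered by several
    layers still holds burn_value.
    """
    grid = [row[:] for row in template]  # copy, like copying the template raster
    for cells in layers:
        for r, c in cells:
            grid[r][c] = burn_value
    return grid


template = [[0, 0, 0], [0, 0, 0]]
processos = {(0, 0), (0, 1)}
areas_potenciais = {(0, 1), (1, 2)}
mask = burn_layers(template, [processos, areas_potenciais])
# mask == [[1, 1, 0], [0, 0, 1]]; the template itself is unchanged
```

Cell (0, 1) is covered by both layers and still holds 1, which is what makes the raster a presence/absence layer for the whole group.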

Script example

Warning

The following script is expected to be executed under the QGIS Python Environment with numpy, pandas and geopandas installed.

# WARNING: run this in QGIS Python Environment

import importlib.util as iu

# define the paths to this module
# ----------------------------------------
the_module = "path/to/risk.py"

spec = iu.spec_from_file_location("module", the_module)
module = iu.module_from_spec(spec)
spec.loader.exec_module(module)

# define the paths to input and output folders
# ----------------------------------------
input_dir = "path/to/dir"
output_dir = "path/to/dir"

# define the path to input database
# ----------------------------------------
input_db = f"{input_dir}/pem.gpkg"

# organize stressor groups
# ----------------------------------------
groups = {
    "MINERACAO": {
        "layers": ["mineracao_processos", "mineracao_areas_potenciais"],
        "buffer": 10000,
        "raster": None
    },
    "TURISMO": {
        "layers": ["turismo_atividades_esportivas_sul"],
        "buffer": 10000,
        "raster": None
    },
    "EOLICAS": {
        "layers": ["eolico_parques"],
        "buffer": 10000,
        "raster": None
    },
}

# call the function
# ----------------------------------------
output_folder = module.setup_stressors(
    input_db=input_db,
    output_folder=output_dir,
    groups=groups,
    reference_raster=f"{input_dir}/raster.tif",
    is_blank=False,
    resolution=1000
)

pem.risk.setup_habitats(output_folder, input_db, input_layer, groups, field_name, reference_raster, is_blank=False, resolution=400)

Sets up habitat layers by splitting a vector layer, rasterizing each resulting habitat group, and reprojecting the rasters to a specified resolution.

Note

This function is a utility for preparing inputs to the InVEST Habitat Risk model.

Parameters:
  • output_folder (str) – The base directory where a new run-specific folder for all outputs will be created.

  • input_db (str) – The path to the source vector database (e.g., GeoPackage) containing the habitat features.

  • input_layer (str) – The name of the layer or table within the input_db to read the features from.

  • groups (dict) – A dictionary defining the habitat groups: keys are the desired output habitat names (layer names), and values are lists of string values from field_name to include in that habitat layer.

  • field_name (str) – The name of the attribute field in the input data used for grouping and querying the habitat features.

  • reference_raster (str) – The path to a raster file whose extent, CRS, and other properties will be used as a template for the output habitat rasters.

  • is_blank (bool) – [optional] If True, the reference_raster is assumed to be a blank (zero-valued) template already, skipping the internal blanking step. Default value = False

  • resolution (float) – The desired resolution (cell size) for the final output habitat rasters. Default value = 400
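Since each habitat group selects features by field value, a value listed under two keys would place the same features in two habitat layers. A small check such as the hypothetical find_shared_codes below (not part of pem.risk) can flag this before running setup_habitats.

```python
from collections import defaultdict


def find_shared_codes(groups):
    """Map each field value listed in more than one habitat group to those groups.

    Hypothetical helper; an empty result means every value belongs to
    exactly one group.
    """
    owners = defaultdict(list)
    for group, codes in groups.items():
        for code in codes:
            owners[code].append(group)
    return {code: names for code, names in owners.items() if len(names) > 1}
```

For the groups in the script example the result is empty; {"A": ["MB3"], "B": ["MB3", "MC3"]} would report {"MB3": ["A", "B"]}.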

Returns:

The path to the newly created run-specific output folder containing the habitat rasters and metadata.

Return type:

str

Notes

This function orchestrates a multi-step process:

  1. Vector Split: Calls util_split_features to create a temporary GeoPackage where each habitat group is saved as a separate layer.

  2. Template Raster: If is_blank is False, a blank raster is generated from the reference_raster to serve as the template for rasterization.

  3. Rasterization Loop: Each habitat layer is individually rasterized onto a copy of the template raster, setting the habitat cells to a burn value of 1.

  4. Reprojection: The resulting raster is reprojected to the desired resolution and to the default target CRS, EPSG:5641.

  5. Metadata: An info_habitats.csv file is created, detailing the name and file path for each generated habitat raster.

Temporary files (split GeoPackage and intermediate rasters) are cleaned up at the end.
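The metadata step can be approximated with the csv module. In this sketch the NAME and PATH column headers are assumptions (the text only says the file lists each raster's name and path); passing buffers adds the STRESSOR BUFFER (meters) column described for info_stressors.csv. The helper write_info_csv is hypothetical.

```python
import csv
from pathlib import Path


def write_info_csv(csv_path, rasters, buffers=None):
    """Write an info table with one row per raster (hypothetical helper).

    rasters maps output names to raster paths; buffers, if given, maps the
    same names to buffer distances in meters (stressors only).
    """
    fieldnames = ["NAME", "PATH"]  # assumed headers
    if buffers is not None:
        fieldnames.append("STRESSOR BUFFER (meters)")
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for name, path in rasters.items():
            row = {"NAME": name, "PATH": str(path)}
            if buffers is not None:
                row["STRESSOR BUFFER (meters)"] = buffers[name]
            writer.writerow(row)
    return Path(csv_path)
```

For habitats the call would look like write_info_csv("info_habitats.csv", {"MD3": "path/to/MD3.tif"}); for stressors the buffer values would come from each group's buffer key.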

Script example

Warning

The following script is expected to be executed under the QGIS Python Environment with numpy, pandas and geopandas installed.

# WARNING: run this in QGIS Python Environment

import importlib.util as iu

# define the paths to this module
# ----------------------------------------
the_module = "path/to/risk.py"

spec = iu.spec_from_file_location("module", the_module)
module = iu.module_from_spec(spec)
spec.loader.exec_module(module)

# define the paths to input and output folders
# ----------------------------------------
input_dir = "path/to/dir"
output_dir = "path/to/dir"

# define the path to input database
# ----------------------------------------
input_db = f"{input_dir}/pem.gpkg"

# organize habitat groups
groups = {
    "MB3_MC3": ["MB3", "MC3"],
    "MB4_MB5_MB6": ["MB4", "MB5", "MB6"],
    "MC4_MC5_MC6": ["MC4", "MC5", "MC6"],
    "MD3": ["MD3"],
    "MD4_MD5_MD6": ["MD4", "MD5", "MD6"],
    "ME1": ["ME1"],
    "ME4_MF4_MF5": ["ME4", "MF4", "MF5"],
    "MG4_MG6": ["MG4", "MG6"],
}

# call the function
# ----------------------------------------
output_folder = module.setup_habitats(
    input_db=input_db,
    output_folder=output_dir,
    input_layer="habitats_bentonicos_sul_v2",
    groups=groups,
    field_name="code",
    reference_raster=f"{input_dir}/raster.tif",
    resolution=1000
)

pem.risk.util_split_features(output_folder, input_db, input_layer, groups, field_name)

Splits features from a source vector layer into separate layers within a single GeoPackage file, based on predefined groups of field values.

Parameters:
  • output_folder (str) – The base directory where a new run-specific folder for the outputs will be created.

  • input_db (str) – The path to the source vector database (e.g., a GeoPackage file).

  • input_layer (str) – The name of the layer or table within the input_db to read the features from.

  • groups (dict) – A dictionary where keys are the desired output layer names (groups) and values are lists of string values from field_name to include in that layer.

  • field_name (str) – The name of the attribute field in the input data used for grouping and querying the features.

Returns:

The path to the output GeoPackage file containing the newly created layers.

Return type:

pathlib.Path

Notes

The function first reads the entire input_layer into a single GeoDataFrame. It then iterates through the groups dictionary, selecting the features whose field_name value matches any of the values in the group's list. The resulting group subsets are saved as separate layers, named after the group keys, in a new GeoPackage file called split.gpkg within a run-specific subfolder of output_folder.
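The per-group selection can be illustrated without geopandas. In this sketch, split_records is a hypothetical helper and each feature is a plain dict of attributes; with a GeoDataFrame the equivalent selection would be gdf[gdf[field_name].isin(values)], and each subset could then be written with to_file(path, layer=group, driver="GPKG").

```python
def split_records(records, groups, field_name):
    """Select, for each group, the records whose field_name value is in its list.

    Hypothetical stand-in for the GeoDataFrame query; records are plain
    dicts rather than GeoDataFrame rows, and input order is preserved.
    """
    result = {}
    for group, values in groups.items():
        wanted = set(values)  # fast membership test, like Series.isin
        result[group] = [rec for rec in records if rec.get(field_name) in wanted]
    return result
```

Features whose value appears in no group are simply not written to any layer.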

pem.risk.util_raster_blank(output_folder, output_raster, input_raster)

Creates a blank (zero-valued) raster based on the extent, resolution, and CRS of an existing input raster.

Parameters:
  • output_folder (str) – The output directory, kept for organization; the file location is actually set by output_raster.

  • output_raster (str) – The full path and filename for the resulting blank raster file.

  • input_raster (str) – The path to the source raster file whose properties (extent, resolution, CRS) will be used.

Returns:

The full path to the newly created blank raster file.

Return type:

str

Notes

This function uses the QGIS processing algorithm native:rastercalc (Raster calculator). It works by multiplying every cell in the input raster by zero, effectively preserving the metadata (extent, resolution, CRS) while setting all data values to zero. The output raster is a new file and does not modify the input raster.
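The effect of the zero-multiplication expression on cell values can be sketched in plain Python. Leaving NoData cells as NoData reflects how raster calculators usually propagate NoData and is an assumption here, as is the hypothetical helper name blank_values.

```python
def blank_values(values, nodata=None):
    """Multiply every valid cell by zero, keeping NoData cells unchanged.

    Hypothetical sketch of the "input * 0" expression on a 2-D grid; the
    NoData propagation is assumed, not confirmed for native:rastercalc.
    """
    return [[v if v == nodata else v * 0 for v in row] for row in values]
```

For example, blank_values([[3.5, -99999], [0, 7]], nodata=-99999) yields zeros everywhere except the NoData cell, so the template keeps the input's footprint as well as its extent, resolution, and CRS.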

pem.risk.util_raster_reproject(output_folder, output_raster, input_raster, dst_resolution, dst_crs='5641', src_crs='4326', dtype=6, resampling=0)

Reprojects and optionally resamples an input raster to a new Coordinate Reference System (CRS) and resolution.

Parameters:
  • output_folder (str) – The output directory for the reprojected raster. It is not used directly by the current implementation; the file location is set by output_raster.

  • output_raster (str) – The full path and filename for the resulting reprojected raster file.

  • input_raster (str) – The path to the source raster file to be reprojected.

  • dst_resolution (float) – The desired resolution (cell size) for the output raster, expressed in the units of the target CRS (meters for EPSG:5641).

  • dst_crs (str) – The EPSG code (as a string) for the target CRS. Default value = 5641

  • src_crs (str) – The EPSG code (as a string) for the source CRS. Default value = 4326

  • dtype (int) – The desired data type for the output raster bands (GDAL data type code). Default value = 6 (Float32)

  • resampling (int) – The resampling method to use (GDAL resampling code). Default value = 0 (Nearest Neighbour)

Returns:

The full path to the newly created output raster file.

Return type:

str

Notes

This function uses the QGIS processing algorithm gdal:warpreproject. The NoData value is set to -99999. Common values for dtype include 1 (Byte), 5 (Int32) and 6 (Float32). Common values for resampling include 0 (Nearest Neighbour), 1 (Bilinear) and 2 (Cubic). The result is written to output_raster.
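For orientation, the sketch below assembles a parameter dictionary of the kind gdal:warpreproject accepts. The helper build_warp_params is hypothetical, and the exact dictionary built by the module is not shown in this documentation; inside QGIS the result would be passed to processing.run("gdal:warpreproject", params).

```python
def build_warp_params(input_raster, output_raster, dst_resolution,
                      dst_crs="5641", src_crs="4326", dtype=6, resampling=0):
    """Return a gdal:warpreproject parameter dict (hypothetical helper).

    dtype and resampling are the algorithm's integer codes, e.g.
    dtype 6 = Float32 and resampling 0 = Nearest Neighbour.
    """
    return {
        "INPUT": input_raster,
        "SOURCE_CRS": f"EPSG:{src_crs}",  # "4326" -> "EPSG:4326"
        "TARGET_CRS": f"EPSG:{dst_crs}",  # "5641" -> "EPSG:5641"
        "RESAMPLING": resampling,
        "NODATA": -99999,  # NoData value stated in the Notes
        "TARGET_RESOLUTION": dst_resolution,
        "DATA_TYPE": dtype,
        "OUTPUT": output_raster,
    }
```

Building the dictionary separately from the processing.run call keeps the defaults visible and testable outside QGIS.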

pem.risk.util_layer_rasterize(input_raster, input_db, input_layer, input_table=None, burn_value=1, extra='')

Rasterizes a vector layer from a database into an existing raster file, assigning a fixed burn value.

Parameters:
  • input_raster (str) – The path to the existing raster file to be modified (must be writable).

  • input_db (str) – The path or connection string to the vector database (e.g., GeoPackage, PostGIS connection).

  • input_layer (str) – The name of the vector layer or table to rasterize.

  • input_table (str or None) – [optional] The schema or parent table name if the input_layer is a sub-table or view (e.g., for PostGIS).

  • burn_value (int or float) – The fixed value to burn into the raster cells covered by the vector features. Default value = 1

  • extra (str) – Additional command-line options passed directly to the underlying GDAL tool. Default value = ''

Returns:

The path to the modified input raster file.

Return type:

str

Notes

This function uses the QGIS processing algorithm gdal:rasterize_over_fixed_value, which modifies the input_raster in place. If input_table is not provided, it assumes a standard layer format (e.g., GeoPackage layer). If input_table is provided, it constructs a PostgreSQL-like table reference.
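A sketch of the GeoPackage case follows. The helper build_rasterize_params is hypothetical; the parameter keys are those listed for the gdal:rasterize_over_fixed_value algorithm in the QGIS documentation, and ADD=False (overwrite rather than add) is an assumption. The PostgreSQL-style reference used when input_table is given is not documented here, so it is left out.

```python
def build_rasterize_params(input_raster, input_db, input_layer, burn_value=1, extra=""):
    """Return a gdal:rasterize_over_fixed_value parameter dict (hypothetical helper).

    Covers only the GeoPackage case; ADD=False is an assumption.
    """
    # QGIS OGR-provider URI selecting one layer inside a GeoPackage
    vector_uri = f"{input_db}|layername={input_layer}"
    return {
        "INPUT": vector_uri,
        "INPUT_RASTER": input_raster,  # modified in place
        "BURN": burn_value,
        "ADD": False,  # assumed: burn overwrites existing cell values
        "EXTRA": extra,
    }
```

Inside QGIS, processing.run("gdal:rasterize_over_fixed_value", params) would then burn the layer into input_raster.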