pem.risk module#
Module for pre- and post-processing routines supporting the Habitat Risk Index workflow.
This module provides a collection of helper functions for organizing, rasterizing, and reprojecting spatial data layers used in the InVEST Habitat Risk model. It is intended to be executed within the QGIS Python environment and relies on its processing framework (processing algorithms, CRS handling, rasterization utilities, etc.).
See also
Refer to the InVEST Habitat Risk model documentation for theoretical and technical details regarding input data preparation for the Habitat Risk Index.
Requirements
The following libraries and environment are required:
QGIS 3 (Python environment)
numpypandasgeopandas
Overview
This module includes routines for:
Preparing and rasterizing stressor and habitat layers.
Reprojecting rasters and generating blank templates.
Splitting vector datasets into grouped layers.
Creating structured, time-stamped output directories for reproducible runs.
Each function performs a self-contained processing step designed to integrate smoothly with other spatial workflows in QGIS.
Examples
Scripted usage examples are provided in the docstrings of each function. No global examples are included at module level.
- pem.risk.compute_risk_index(folder_output, raster_hra_benthic, raster_hra_pelagic)[source]#
Computes a final risk index raster by applying fuzzy logic to benthic and pelagic Human Risk Assessment (HRA) rasters.
- Parameters:
folder_output (str) – The path to the main output folder where a new run-specific subfolder will be created.
raster_hra_benthic (str) – Full path to the input benthic HRA raster.
raster_hra_pelagic (str) – Full path to the input pelagic HRA raster.
- Returns:
The path to the run-specific output folder containing the resulting risk index rasters.
- Return type:
str
Extra notes
The function executes the following steps:
Creates a unique run output subfolder within
folder_output.Fuzzifies both the benthic and pelagic HRA rasters using
util_fuzzify_rasterwith a lower bound of 0 and an upper bound calculated from the 99th percentile of each raster (use_percentiles=True). This producesrisk_b.tifandrisk_p.tif.Sums the original (non-fuzzified) benthic and pelagic HRA rasters using QGIS’s raster calculator (
native:rastercalc) to create an intermediate sum raster (hra_sum.tif).Fuzzifies the resulting sum raster (
hra_sum.tif) using the same 0 to 99th percentile bounds to produce the final Risk Index raster (risk.tif).
The final risk index raster (
risk.tif) represents the combined and normalized risk.
- pem.risk.setup_hra_model(folder_output, file_criteria, input_db, reference_raster, resolution, aoi_layer, habitat_layer, habitat_field, habitat_groups, stressor_groups, dst_crs='5880', suffix='')[source]#
Sets up all necessary input layers and data structures for the HRA (Habitat Risk Assessment) model run.
Note
This script is a utility for running the InVEST Habitat Risk model.
- Parameters:
folder_output (str) – Path to the main output directory where all generated files will be stored.
file_criteria (str) – Path to criteria table for habitat sensitivity
input_db (str) – Path to the GeoPackage file containing the input vector layers.
reference_raster (str) – Path to the raster used as a spatial template for model rasters.
resolution (float) – The target spatial resolution (pixel size) for the model outputs.
aoi_layer (str) – The name of the vector layer in
input_dbdefining the Area of Interest (AOI).habitat_layer (str) – The name of the vector layer in
input_dbcontaining the habitat features.habitat_field (str) – The name of the field in
habitat_layerused to classify habitats into groups.habitat_groups (dict) – A dictionary defining how habitats are grouped and processed.
stressor_groups (dict) – A dictionary defining the stressor layers and their properties.
dst_crs (str) – The EPSG code for the destination Coordinate Reference System (CRS). Default value =
5880suffix (str) – String suffix for model runs
- Returns:
The path to the newly created main output folder for the model run.
- Return type:
str
Extra notes
This function prepares the environment by creating output folders, reprojecting the AOI, creating a blank reference raster, and running
setup_habitatsandsetup_stressorsto prepare those inputs. It also generates a combinedinfo.csvtable.Script sample
Warning
The following script is expected to be executed under the QGIS Python Environment with
numpy,pandasandgeopandasinstalled.# WARNING: run this in QGIS Python Environment import importlib.util as iu # define the paths to this module # ---------------------------------------- the_module = "path/to/risk.py" spec = iu.spec_from_file_location("module", the_module) module = iu.module_from_spec(spec) spec.loader.exec_module(module) # define major parameters # ---------------------------------------- project_dir = "path/to/dir" scenario = "baseline" habitat_type = "benthic" # infer the paths to IO # ---------------------------------------- input_dir = f"{project_dir}/inputs" output_dir = f"{project_dir}/outputs" input_db = f"{input_dir}/layers.gpkg" score_table = f"{input_dir}/risk/{scenario}/scores_{habitat_type}.csv" reference_raster = f"{input_dir}/bathymetry.tif" # organize habitat groups # ---------------------------------------- habitat_groups = { # Habitat group "MB3_MC3": ["MB3", "MC3"], # list of habitats names "MB4_MB5_MB6": ["MB4", "MB5", "MB6"], "MC4_MC5_MC6": ["MC4", "MC5", "MC6"], "MD3": ["MD3"], "MD4_MD5_MD6": ["MD4", "MD5", "MD6"], "ME1": ["ME1"], "ME4_MF4_MF5": ["ME4", "MF4", "MF5"], "MG4_MG6": ["MG4", "MG6"], } # organize stressors groups # ---------------------------------------- stressor_groups = { # stressor name "MINERACAO": { # list of layers in input database "layers": ["mineracao_processos", "mineracao_areas_potenciais"], "buffer": 10000, # in meters }, "TURISMO": { "layers": ["turismo_atividades_esportivas_sul"], "buffer": 10000, }, "EOLICAS": { "layers": ["eolico_parques"], "buffer": 10000, }, } # call the function # ---------------------------------------- folder_output = module.setup_hra_model( folder_output=output_dir, file_criteria=criteria_table, input_db=input_db, reference_raster=reference_raster, resolution=1000, # in meters aoi_layer="sul", habitat_layer="habitats_bentonicos_sul_v2", habitat_field="code", habitat_groups=habitat_groups, stressor_groups=stressor_groups, )
- pem.risk.setup_stressors(folder_output, input_db, groups, reference_raster, is_blank=False, resolution=500, subfolder=True)[source]#
Sets up stressor layers by rasterizing multiple vector layers from a database into single stressor rasters and reprojecting them to a specified resolution.
Note
This script is a utility for running the InVEST Habitat Risk model.
- Parameters:
folder_output (str) – The base directory where a new run-specific folder for all outputs will be created.
input_db (str) – The path to the source vector database (e.g., GeoPackage) containing the stressor layers.
groups (dict) – A dictionary defining the stressor groups. Keys are the desired output stressor names (e.g.,
Coastal_Pollution), and values are dictionaries containing two keys:layers(a list of vector layer names to be combined) andbuffer(the required buffer distance in meters for subsequent analysis).reference_raster (str) – The path to a raster file whose extent, CRS, and other properties will be used as a template for the output stressor rasters.
is_blank (bool) – [optional] If
True, thereference_rasteris assumed to be a blank (zero-valued) template already, skipping the internal blanking step. Default value = Falseresolution (float) – The desired resolution (cell size) for the final output stressor rasters. Default value = 500
- Returns:
The path to the newly created run-specific output folder containing the stressor rasters and metadata.
- Return type:
str
Extra notes
This function combines features from multiple vector layers into a single output stressor raster for each defined group.
Template Raster: If
is_blankisFalse, a blank raster is generated from thereference_rasterto serve as the template.Rasterization Loop: For each group, the template raster is copied, and all vector layers listed under the group’s
layerskey are sequentially rasterized onto the copy using a burn value of 1 (features are present).Reprojection: The resulting raster is reprojected to the desired
resolution(and default CRS of 5880).Metadata: An
info_stressors.csvfile is created, detailing the name, file path, and required STRESSOR BUFFER (meters) for each generated stressor raster.
Intermediate rasters are cleaned up at the end.
Script sample
Warning
The following script is expected to be executed under the QGIS Python Environment with
numpy,pandasandgeopandasinstalled.# WARNING: run this in QGIS Python Environment import importlib.util as iu # define the paths to this module # ---------------------------------------- the_module = "path/to/risk.py" spec = iu.spec_from_file_location("module", the_module) module = iu.module_from_spec(spec) spec.loader.exec_module(module) # define the paths to input and output folders # ---------------------------------------- input_dir = "path/to/dir" output_dir = "path/to/dir" # define the path to input database # ---------------------------------------- input_db = f"{input_dir}/pem.gpkg" # organize stressors groups groups = { "MINERACAO": { "layers": ["mineracao_processos", "mineracao_areas_potenciais"], "buffer": 10000, "raster": None }, "TURISMO": { "layers": ["turismo_atividades_esportivas_sul"], "buffer": 10000, "raster": None }, "EOLICAS": { "layers": ["eolico_parques"], "buffer": 10000, "raster": None }, } # call the function # ---------------------------------------- output_file = module.setup_stressors( input_db=input_db, folder_output=output_dir, groups=groups, reference_raster="{input_dir}/bathymetry.tif", is_blank=False, resolution=1000 )
- pem.risk.setup_habitats(folder_output, input_db, input_layer, groups, field_name, reference_raster, is_blank=False, resolution=500, subfolder=True)[source]#
Sets up habitat layers by splitting a vector layer, rasterizing each resulting habitat group, and reprojecting the rasters to a specified resolution.
Note
This script is a utility for running the InVEST Habitat Risk model.
- Parameters:
folder_output (str) – The base directory where a new run-specific folder for all outputs will be created.
input_db (str) – The path to the source vector database (e.g., GeoPackage) containing the habitat features.
input_layer (str) – The name of the layer or table within the
input_dbto read the features from.groups (dict) – A dictionary defining the habitat groups: keys are the desired output habitat names (layer names), and values are lists of string values from
field_nameto include in that habitat layer.field_name (str) – The name of the attribute field in the input data used for grouping and querying the habitat features.
reference_raster (str) – The path to a raster file whose extent, CRS, and other properties will be used as a template for the output habitat rasters.
is_blank (bool) – [optional] If
True, thereference_rasteris assumed to be a blank (zero-valued) template already, skipping the internal blanking step. Default value = Falseresolution (float) – The desired resolution (cell size) for the final output habitat rasters. Default value = 500
- Returns:
The path to the newly created run-specific output folder containing the habitat rasters and metadata.
- Return type:
str
Extra notes
This function orchestrates a multi-step process:
Vector Split: Calls
split_featuresto create a temporary GeoPackage where each habitat group is saved as a separate layer.Template Raster: If
is_blankisFalse, a blank raster is generated from thereference_rasterto serve as the template for rasterization.Rasterization Loop: Each habitat layer is individually rasterized onto a copy of the template raster, setting the habitat cells to a burn value of 1.
Reprojection: The resulting raster is reprojected to the desired
resolution(and default CRS of 5880).Metadata: An
info_habitats.csvfile is created, detailing the name and file path for each generated habitat raster.
Temporary files (split GeoPackage and intermediate rasters) are cleaned up at the end.
Script sample
Warning
The following script is expected to be executed under the QGIS Python Environment with
numpy,pandasandgeopandasinstalled.# WARNING: run this in QGIS Python Environment import importlib.util as iu # define the paths to this module # ---------------------------------------- the_module = "path/to/risk.py" spec = iu.spec_from_file_location("module", the_module) module = iu.module_from_spec(spec) spec.loader.exec_module(module) # define the paths to input and output folders # ---------------------------------------- input_dir = "path/to/dir" output_dir = "path/to/dir" # define the path to input database # ---------------------------------------- input_db = f"{input_dir}/pem.gpkg" # organize habitat groups groups = { "MB3_MC3": ["MB3", "MC3"], "MB4_MB5_MB6": ["MB4", "MB5", "MB6"], "MC4_MC5_MC6": ["MC4", "MC5", "MC6"], "MD3": ["MD3"], "MD4_MD5_MD6": ["MD4", "MD5", "MD6"], "ME1": ["ME1"], "ME4_MF4_MF5": ["ME4", "MF4", "MF5"], "MG4_MG6": ["MG4", "MG6"], } # call the function # ---------------------------------------- output_file = module.setup_habitats( input_db=input_db, folder_output=output_dir, input_layer="habitats_bentonicos_sul_v2", groups=groups, field_name="code", reference_raster=f"{input_dir}/bathymetry.tif", resolution=1000 )
- pem.risk.util_split_features(folder_output, input_db, input_layer, groups, field_name)[source]#
Splits features from a source GeoDataFrame into separate layers within a single GeoPackage file based on predefined groups of field values.
- Parameters:
folder_output (str) – The base directory where a new run-specific folder for the outputs will be created.
input_db (str) – The path to the source vector database (e.g., a GeoPackage file).
input_layer (str) – The name of the layer or table within the
input_dbto read the features from.groups (dict) – A dictionary where keys are the desired output layer names (groups) and values are lists of string values from
field_nameto include in that layer.field_name (str) – The name of the attribute field in the input data used for grouping and querying the features.
- Returns:
The path to the output GeoPackage file containing the newly created layers.
- Return type:
pathlib.Path
Notes
The function first reads the entire layer into a single GeoDataFrame. It then iterates through the
groupsdictionary, querying the GeoDataFrame for features where the value infield_namematches any of the values in the group’s list. All resulting group GeoDataFrames are concatenated and saved as separate layers (named after the group keys) into a new GeoPackage file calledsplit.gpkgwithin a run-specific subfolder infolder_output.
- pem.risk.util_raster_blank(output_raster, input_raster)[source]#
Creates a blank (zero-valued) raster based on the extent, resolution, and CRS of an existing input raster.
- Parameters:
output_raster (str) – name for the resulting blank raster file (without extension).
input_raster (str) – The path to the source raster file whose properties (extent, resolution, CRS) will be used.
- Returns:
The full path to the newly created blank raster file.
- Return type:
str
Notes
This function uses the QGIS processing algorithm
native:rastercalc(Raster calculator). It works by multiplying every cell in the input raster by zero, effectively preserving the metadata (extent, resolution, CRS) while setting all data values to zero. The output raster is a new file and does not modify the input raster.
- pem.risk.util_raster_reproject(output_raster, input_raster, dst_resolution, dst_crs='5880', src_crs='4326', dtype=6, resampling=0)[source]#
Reprojects and optionally resamples an input raster to a new Coordinate Reference System (CRS) and resolution.
- Parameters:
output_raster (str) – The full path and filename for the resulting reprojected raster file.
input_raster (str) – The path to the source raster file to be reprojected.
dst_resolution (float) – The desired resolution (cell size) for the output raster, usually in the units of the target CRS.
dst_crs (str) – The EPSG code (as a string) for the target CRS. Default value =
5880src_crs (str) – The EPSG code (as a string) for the source CRS. Default value =
4326dtype (int) – The desired data type for the output raster bands (GDAL data type code). Default value = 6 (Float32)
resampling (int) – The resampling method to use (GDAL resampling code). Default value = 0 (Nearest Neighbour)
- Returns:
The full path to the newly created output raster file.
- Return type:
str
Notes
This function uses the QGIS processing algorithm
gdal:warpreproject. The default NoData value is set to-99999. Common values fordtypeinclude 1 (Byte), 4 (Int32), 6 (Float32). Common values forresamplinginclude 0 (Nearest Neighbour), 1 (Bilinear), 2 (Cubic). The output path is constructed using theoutput_rasterparameter.
- pem.risk.util_layer_rasterize(input_raster, input_db, input_layer, input_table=None, burn_value=1, extra='')[source]#
Rasterizes a vector layer from a database into an existing raster file, assigning a fixed burn value.
- Parameters:
input_raster (str) – The path to the existing raster file to be modified (must be writable).
input_db (str) – The path or connection string to the vector database (e.g., GeoPackage, PostGIS connection).
input_layer (str) – The name of the vector layer or table to rasterize.
input_table (str or None) – [optional] The schema or parent table name if the
input_layeris a sub-table or view (e.g., for PostGIS).burn_value (int or float) – The fixed value to burn into the raster cells covered by the vector features. Default value = 1
extra (str) – Additional command-line options passed directly to the underlying GDAL tool. Default value =
''
- Returns:
The path to the modified input raster file.
- Return type:
str
Notes
This function uses the QGIS processing algorithm
gdal:rasterize_over_fixed_value, which modifies theinput_rasterin place. Ifinput_tableis not provided, it assumes a standard layer format (e.g., GeoPackage layer). Ifinput_tableis provided, it constructs a PostgreSQL-like table reference.
- pem.risk.util_fuzzify_raster(input_raster, output_raster, lo=None, hi=None, use_percentiles=False)[source]#
Applies a linear membership fuzzy set function to a raster, reclassifying values between a low and high bound.
- Parameters:
input_raster (str) – Full path to the input raster file.
output_raster (str) – Full path for the output fuzzy raster file.
lo (float or int) – [optional] The lower bound for the linear membership function. If
None, it defaults to the minimum value (or 1st percentile ifuse_percentilesisTrue).hi (float or int) – [optional] The upper bound for the linear membership function. If
None, it defaults to the maximum value (or 99th percentile ifuse_percentilesisTrue).use_percentiles (bool) – Use the 1st and 99th percentiles as default bounds instead of the absolute min/max when
loorhiareNone. Default value =False
- Returns:
The file path of the resulting fuzzy raster.
- Return type:
str
Notes
This function uses the QGIS processing algorithm
native:fuzzifyrasterlinearmembership. Values in the input raster are reclassified: - Values $le$loget a fuzzy membership value of 0. - Values $ge$higet a fuzzy membership value of 1. - Values betweenloandhiare linearly interpolated between 0 and 1. The internal call toutil_get_raster_statsis assumed to return ‘p01’ and ‘p99’ keys whenfull=True.
- pem.risk.util_get_raster_stats(input_raster, band=1, full=False)[source]#
Calculates the statistics for a specified band of a given raster file.
- Parameters:
input_raster (str) – Full path to the input raster file.
band (int) – The band number to calculate statistics for. Default value =
1
- Returns:
A dictionary containing the raster band’s mean, standard deviation, minimum, maximum, sum, and element count.
- Return type:
dict
Notes
The function uses QGIS classes internally to read the raster and compute statistics. The keys in the returned dictionary are
mean,sd,min,max,sum, andcount.
- pem.risk.util_read_raster(file_input, n_band=1, metadata=True)[source]#
Read a raster (GeoTIFF) file
- Parameters:
file_input (str) – path to raster file
n_band (int) – number of the band to read
metadata (bool) – option to return
- Returns:
dictionary with “data” and (optional) “metadata”
- Return type:
dict
- pem.risk.util_get_raster_crs(file_input, code_only=True)[source]#
Extracts the Coordinate Reference System (CRS) from a raster file.
- Parameters:
file_input (str) – The file path to the raster source.
code_only (bool) – Whether to return only the numerical ID (e.g.,
31983) or the full authority ID (e.g.,EPSG:31983). Default value =True
- Returns:
The CRS identifier as a string.
- Return type:
str