pem.risk module
Module for pre- and post-processing routines supporting the Habitat Risk Index workflow.
This module provides a collection of helper functions for organizing, rasterizing, and reprojecting spatial data layers used in the InVEST Habitat Risk model. It is intended to be executed within the QGIS Python environment and relies on its processing framework (processing algorithms, CRS handling, rasterization utilities, etc.).
See also
Refer to the InVEST Habitat Risk model documentation for theoretical and technical details regarding input data preparation for the Habitat Risk Index.
Requirements
The following libraries and environment are required:
QGIS 3 (Python environment)
numpy
pandas
geopandas
Overview
This module includes routines for:
Preparing and rasterizing stressor and habitat layers.
Reprojecting rasters and generating blank templates.
Splitting vector datasets into grouped layers.
Creating structured, time-stamped output directories for reproducible runs.
Each function performs a self-contained processing step designed to integrate smoothly with other spatial workflows in QGIS.
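The Overview mentions time-stamped output directories, but the module's exact folder-naming scheme is not documented here. The following is a minimal standard-library sketch of the idea. The helper make_run_folder and the run_YYYYMMDD_HHMMSS pattern are hypothetical and are not the module's actual convention.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_folder(output_folder: str, prefix: str = "run",
                    now: Optional[datetime] = None) -> Path:
    """Create a time-stamped subfolder (hypothetical helper, not part of pem.risk)."""
    # Fall back to the current time; passing `now` makes the name reproducible
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_folder = Path(output_folder) / f"{prefix}_{stamp}"
    # exist_ok=False makes an accidental reuse of the same folder fail loudly
    run_folder.mkdir(parents=True, exist_ok=False)
    return run_folder
```

One folder per run keeps outputs from different parameter choices (for example resolution=400 versus resolution=1000) side by side instead of overwriting each other.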
Examples
Scripted usage examples are provided in the docstrings of each function. No global examples are included at module level.
- pem.risk.setup_stressors(output_folder, input_db, groups, reference_raster, is_blank=False, resolution=400)
Sets up stressor layers by rasterizing multiple vector layers from a database into single stressor rasters and reprojecting them to a specified resolution.
Note
This function is a utility for preparing inputs to the InVEST Habitat Risk model.
- Parameters:
output_folder (str) – The base directory where a new run-specific folder for all outputs will be created.
input_db (str) – The path to the source vector database (e.g., GeoPackage) containing the stressor layers.
groups (dict) – A dictionary defining the stressor groups. Keys are the desired output stressor names (e.g., Coastal_Pollution), and values are dictionaries containing the keys layers (a list of vector layer names to be combined) and buffer (the required buffer distance in meters for subsequent analysis).
reference_raster (str) – The path to a raster file whose extent, CRS, and other properties will be used as a template for the output stressor rasters.
is_blank (bool) – [optional] If True, the reference_raster is assumed to already be a blank (zero-valued) template, so the internal blanking step is skipped. Default value = False
resolution (float) – The desired resolution (cell size) for the final output stressor rasters. Default value = 400
- Returns:
The path to the newly created run-specific output folder containing the stressor rasters and metadata.
- Return type:
str
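Since setup_stressors relies on every group providing layers and buffer, checking the groups dictionary before the QGIS processing starts can prevent a failed run. The hypothetical validate_stressor_groups below is a sketch under that assumption. Extra keys, such as the raster entry used in the script example, are ignored, and accepting a buffer of 0 is also an assumption.

```python
from numbers import Real


def validate_stressor_groups(groups: dict) -> None:
    """Raise ValueError if a group lacks a usable 'layers' list or 'buffer' (hypothetical)."""
    for name, spec in groups.items():
        if not isinstance(spec, dict):
            raise ValueError(f"{name}: expected a dict, got {type(spec).__name__}")
        layers = spec.get("layers")
        # Each group must list at least one vector layer name to rasterize
        if (not isinstance(layers, list) or not layers
                or not all(isinstance(layer, str) for layer in layers)):
            raise ValueError(f"{name}: 'layers' must be a non-empty list of layer names")
        buffer = spec.get("buffer")
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(buffer, bool) or not isinstance(buffer, Real) or buffer < 0:
            raise ValueError(f"{name}: 'buffer' must be a non-negative number of meters")
```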
Notes
This function combines features from multiple vector layers into a single output stressor raster for each defined group.
Template Raster: If is_blank is False, a blank raster is generated from the reference_raster to serve as the template.
Rasterization Loop: For each group, the template raster is copied, and all vector layers listed under the group's layers key are sequentially rasterized onto the copy using a burn value of 1 (features present).
Reprojection: The resulting raster is reprojected to the desired resolution (in the default CRS, EPSG:5641).
Metadata: An info_stressors.csv file is created, listing the name, file path, and required STRESSOR BUFFER (meters) for each generated stressor raster.
Intermediate rasters are cleaned up at the end.
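The metadata step can be sketched with the standard csv module. The exact headers written by setup_stressors are not listed here, so this sketch assumes the NAME, PATH and STRESSOR BUFFER (meters) headers used by InVEST's Habitat Risk information table. write_stressor_info is a hypothetical helper.

```python
import csv
from pathlib import Path


def write_stressor_info(csv_path, stressors):
    """Write one row per stressor (hypothetical helper).

    `stressors` maps a stressor name to a (raster_path, buffer_m) tuple.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        # Assumed header names, matching the InVEST HRA information table
        writer.writerow(["NAME", "PATH", "STRESSOR BUFFER (meters)"])
        for name, (raster_path, buffer_m) in stressors.items():
            writer.writerow([name, str(raster_path), buffer_m])
    return Path(csv_path)
```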
Script example
Warning
The following script is expected to be executed under the QGIS Python Environment with numpy, pandas and geopandas installed.
```python
# WARNING: run this in QGIS Python Environment
import importlib.util as iu

# define the paths to this module
# ----------------------------------------
the_module = "path/to/risk.py"
spec = iu.spec_from_file_location("module", the_module)
module = iu.module_from_spec(spec)
spec.loader.exec_module(module)

# define the paths to input and output folders
# ----------------------------------------
input_dir = "path/to/dir"
output_dir = "path/to/dir"

# define the path to input database
# ----------------------------------------
input_db = f"{input_dir}/pem.gpkg"

# organize stressors groups
groups = {
    "MINERACAO": {
        "layers": ["mineracao_processos", "mineracao_areas_potenciais"],
        "buffer": 10000,
        "raster": None,
    },
    "TURISMO": {
        "layers": ["turismo_atividades_esportivas_sul"],
        "buffer": 10000,
        "raster": None,
    },
    "EOLICAS": {
        "layers": ["eolico_parques"],
        "buffer": 10000,
        "raster": None,
    },
}

# call the function (returns the path to the run-specific output folder)
# ----------------------------------------
run_folder = module.setup_stressors(
    input_db=input_db,
    output_folder=output_dir,
    groups=groups,
    reference_raster=f"{input_dir}/raster.tif",
    is_blank=False,
    resolution=1000,
)
```
- pem.risk.setup_habitats(output_folder, input_db, input_layer, groups, field_name, reference_raster, is_blank=False, resolution=400)
Sets up habitat layers by splitting a vector layer, rasterizing each resulting habitat group, and reprojecting the rasters to a specified resolution.
Note
This function is a utility for preparing inputs to the InVEST Habitat Risk model.
- Parameters:
output_folder (str) – The base directory where a new run-specific folder for all outputs will be created.
input_db (str) – The path to the source vector database (e.g., GeoPackage) containing the habitat features.
input_layer (str) – The name of the layer or table within the input_db to read the features from.
groups (dict) – A dictionary defining the habitat groups: keys are the desired output habitat names (layer names), and values are lists of string values from field_name to include in that habitat layer.
field_name (str) – The name of the attribute field in the input data used for grouping and querying the habitat features.
reference_raster (str) – The path to a raster file whose extent, CRS, and other properties will be used as a template for the output habitat rasters.
is_blank (bool) – [optional] If True, the reference_raster is assumed to already be a blank (zero-valued) template, so the internal blanking step is skipped. Default value = False
resolution (float) – The desired resolution (cell size) for the final output habitat rasters. Default value = 400
- Returns:
The path to the newly created run-specific output folder containing the habitat rasters and metadata.
- Return type:
str
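The documentation does not say whether a field value may appear in more than one habitat group. If each habitat code is meant to belong to exactly one group, a quick pre-flight check can catch typos in groups before the slower QGIS processing starts. find_shared_codes is a hypothetical helper written under that assumption.

```python
from collections import defaultdict


def find_shared_codes(groups):
    """Return {code: [group, ...]} for codes listed in more than one group (hypothetical)."""
    owners = defaultdict(list)
    for group_name, codes in groups.items():
        for code in codes:
            owners[code].append(group_name)
    # Keep only codes that would be burned into more than one habitat raster
    return {code: names for code, names in owners.items() if len(names) > 1}
```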
Notes
This function orchestrates a multi-step process:
Vector Split: Calls util_split_features to create a temporary GeoPackage where each habitat group is saved as a separate layer.
Template Raster: If is_blank is False, a blank raster is generated from the reference_raster to serve as the template for rasterization.
Rasterization Loop: Each habitat layer is individually rasterized onto a copy of the template raster, setting the habitat cells to a burn value of 1.
Reprojection: The resulting raster is reprojected to the desired resolution (in the default CRS, EPSG:5641).
Metadata: An info_habitats.csv file is created, listing the name and file path for each generated habitat raster.
Temporary files (split GeoPackage and intermediate rasters) are cleaned up at the end.
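Recent InVEST releases read habitats and stressors from a single information table with NAME, PATH, TYPE and STRESSOR BUFFER (meters) columns. If your InVEST version expects that format, the two CSV files can be merged as sketched below. This assumes info_habitats.csv uses NAME and PATH headers and info_stressors.csv additionally has STRESSOR BUFFER (meters). Check the headers your run actually produced. merge_info_tables is a hypothetical helper.

```python
import csv


def merge_info_tables(habitats_csv, stressors_csv, merged_csv):
    """Combine habitat and stressor tables into one table; returns the row count."""
    fields = ["NAME", "PATH", "TYPE", "STRESSOR BUFFER (meters)"]
    rows = []
    with open(habitats_csv, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            # Habitats carry no stressor buffer, so that cell is left empty
            rows.append({"NAME": row["NAME"], "PATH": row["PATH"],
                         "TYPE": "habitat", "STRESSOR BUFFER (meters)": ""})
    with open(stressors_csv, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            rows.append({"NAME": row["NAME"], "PATH": row["PATH"], "TYPE": "stressor",
                         "STRESSOR BUFFER (meters)": row["STRESSOR BUFFER (meters)"]})
    with open(merged_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```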
Script example
Warning
The following script is expected to be executed under the QGIS Python Environment with numpy, pandas and geopandas installed.
```python
# WARNING: run this in QGIS Python Environment
import importlib.util as iu

# define the paths to this module
# ----------------------------------------
the_module = "path/to/risk.py"
spec = iu.spec_from_file_location("module", the_module)
module = iu.module_from_spec(spec)
spec.loader.exec_module(module)

# define the paths to input and output folders
# ----------------------------------------
input_dir = "path/to/dir"
output_dir = "path/to/dir"

# define the path to input database
# ----------------------------------------
input_db = f"{input_dir}/pem.gpkg"

# organize habitat groups
groups = {
    "MB3_MC3": ["MB3", "MC3"],
    "MB4_MB5_MB6": ["MB4", "MB5", "MB6"],
    "MC4_MC5_MC6": ["MC4", "MC5", "MC6"],
    "MD3": ["MD3"],
    "MD4_MD5_MD6": ["MD4", "MD5", "MD6"],
    "ME1": ["ME1"],
    "ME4_MF4_MF5": ["ME4", "MF4", "MF5"],
    "MG4_MG6": ["MG4", "MG6"],
}

# call the function (returns the path to the run-specific output folder)
# ----------------------------------------
run_folder = module.setup_habitats(
    input_db=input_db,
    output_folder=output_dir,
    input_layer="habitats_bentonicos_sul_v2",
    groups=groups,
    field_name="code",
    reference_raster=f"{input_dir}/raster.tif",
    resolution=1000,
)
```
- pem.risk.util_split_features(output_folder, input_db, input_layer, groups, field_name)
Splits features from a source vector layer into separate layers within a single GeoPackage file, based on predefined groups of field values.
- Parameters:
output_folder (str) – The base directory where a new run-specific folder for the outputs will be created.
input_db (str) – The path to the source vector database (e.g., a GeoPackage file).
input_layer (str) – The name of the layer or table within the input_db to read the features from.
groups (dict) – A dictionary where keys are the desired output layer names (groups) and values are lists of string values from field_name to include in that layer.
field_name (str) – The name of the attribute field in the input data used for grouping and querying the features.
- Returns:
The path to the output GeoPackage file containing the newly created layers.
- Return type:
pathlib.Path
Notes
The function first reads the entire layer into a single GeoDataFrame. It then iterates through the groups dictionary, selecting the features whose value in field_name matches any of the values in the group's list. The resulting group GeoDataFrames are concatenated and saved as separate layers (named after the group keys) in a new GeoPackage file called split.gpkg, within a run-specific subfolder of output_folder.
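The selection rule can be illustrated without geopandas. In the sketch below, plain dictionaries stand in for GeoDataFrame rows, and split_by_groups is a hypothetical helper. With geopandas the same filter is typically written as gdf[gdf[field_name].isin(values)]. Whether util_split_features uses isin or DataFrame.query internally is not stated.

```python
def split_by_groups(features, groups, field_name):
    """Return {group_name: [feature, ...]} by field-value membership (hypothetical)."""
    result = {}
    for group_name, values in groups.items():
        wanted = set(values)
        # Keep features whose field value is one of the group's listed values
        result[group_name] = [f for f in features if f.get(field_name) in wanted]
    return result


features = [
    {"code": "MB3", "id": 1},
    {"code": "MC3", "id": 2},
    {"code": "MD3", "id": 3},
    {"code": "XX", "id": 4},  # not listed in any group, so it is dropped
]
layers = split_by_groups(features, {"MB3_MC3": ["MB3", "MC3"], "MD3": ["MD3"]}, "code")
```

Features whose value is not listed in any group do not end up in any output layer.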
- pem.risk.util_raster_blank(output_folder, output_raster, input_raster)
Creates a blank (zero-valued) raster based on the extent, resolution, and CRS of an existing input raster.
- Parameters:
output_folder (str) – The directory where the blank raster will be saved. It serves an organizational purpose only; the full output path is given by output_raster.
output_raster (str) – The full path and filename for the resulting blank raster file.
input_raster (str) – The path to the source raster file whose properties (extent, resolution, CRS) will be used.
- Returns:
The full path to the newly created blank raster file.
- Return type:
str
Notes
This function uses the QGIS processing algorithm native:rastercalc (Raster calculator). It multiplies every cell in the input raster by zero, which preserves the raster's extent, resolution, and CRS while setting all data values to zero. The output is written to a new file; the input raster is not modified.
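To make the multiply-by-zero idea concrete, the sketch below builds the kind of expression the QGIS raster calculator accepts, where a band is referenced as "layer@band". The helper blank_expression is hypothetical. It also assumes the raster is referenced by its file base name, which is the default layer name QGIS gives a file loaded from disk. The expression actually passed by util_raster_blank is not shown in this documentation.

```python
from pathlib import Path


def blank_expression(raster_path, band=1):
    """Build a raster-calculator expression that multiplies one band by zero (hypothetical)."""
    # Assumption: the layer is referenced by its file base name, e.g. "raster@1"
    layer_name = Path(raster_path).stem
    return f'"{layer_name}@{band}" * 0'
```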
- pem.risk.util_raster_reproject(output_folder, output_raster, input_raster, dst_resolution, dst_crs='5641', src_crs='4326', dtype=6, resampling=0)
Reprojects and optionally resamples an input raster to a new Coordinate Reference System (CRS) and resolution.
- Parameters:
output_folder (str) – The directory intended to hold the reprojected raster. The current implementation does not use it directly; the output location is taken from output_raster.
output_raster (str) – The full path and filename for the resulting reprojected raster file.
input_raster (str) – The path to the source raster file to be reprojected.
dst_resolution (float) – The desired resolution (cell size) for the output raster, usually in the units of the target CRS.
dst_crs (str) – The EPSG code (as a string) for the target CRS. Default value = '5641'
src_crs (str) – The EPSG code (as a string) for the source CRS. Default value = '4326'
dtype (int) – The desired data type for the output raster bands (GDAL data type code). Default value = 6 (Float32)
resampling (int) – The resampling method to use (GDAL resampling code). Default value = 0 (Nearest Neighbour)
- Returns:
The full path to the newly created output raster file.
- Return type:
str
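The integer codes for dtype and resampling are easy to mix up. The lookup tables below are a sketch of the options listed by the QGIS gdal:warpreproject algorithm in recent 3.x releases, where 4 is UInt32 and 5 is Int32. Verify them against the QGIS version you run. describe_warp_options is a hypothetical helper, not part of pem.risk.

```python
# Assumed codes for the QGIS gdal:warpreproject enumerations (verify for your version)
DATA_TYPES = {0: "Use input layer data type", 1: "Byte", 2: "Int16", 3: "UInt16",
              4: "UInt32", 5: "Int32", 6: "Float32", 7: "Float64"}
RESAMPLING = {0: "Nearest Neighbour", 1: "Bilinear", 2: "Cubic", 3: "Cubic B-Spline",
              4: "Lanczos", 5: "Average", 6: "Mode"}


def describe_warp_options(dtype=6, resampling=0):
    """Return readable names for dtype/resampling codes, rejecting unknown codes."""
    if dtype not in DATA_TYPES:
        raise ValueError(f"unknown dtype code {dtype}")
    if resampling not in RESAMPLING:
        raise ValueError(f"unknown resampling code {resampling}")
    return DATA_TYPES[dtype], RESAMPLING[resampling]
```

For categorical presence/absence rasters such as the stressor and habitat layers, Nearest Neighbour (0) avoids creating fractional values that interpolating methods would introduce.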
Notes
This function uses the QGIS processing algorithm
gdal:warpreproject
. The default NoData value is set to-99999
. Common values fordtype
include 1 (Byte), 4 (Int32), 6 (Float32). Common values forresampling
include 0 (Nearest Neighbour), 1 (Bilinear), 2 (Cubic). The output path is constructed using theoutput_raster
parameter.
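The meaning of dst_resolution depends on the units of the target CRS. EPSG:5641 (SIRGAS 2000 / Brazil Mercator) is a projected CRS in metres, so with the default dst_crs a value of 1000 gives 1 km cells. The hypothetical output_grid_size helper below estimates how many columns and rows a given extent produces. GDAL's own rounding of the output size may differ by a cell, so treat the result as an approximation.

```python
import math


def output_grid_size(extent, resolution):
    """Estimate (columns, rows) for `extent` = (xmin, ymin, xmax, ymax) at `resolution`."""
    xmin, ymin, xmax, ymax = extent
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    # Round up so the grid fully covers the extent (GDAL's exact rounding may differ)
    cols = math.ceil((xmax - xmin) / resolution)
    rows = math.ceil((ymax - ymin) / resolution)
    return cols, rows
```

For a 100 km by 50 km extent, moving from the default resolution of 400 to 1000 reduces the grid from 250 by 125 cells to 100 by 50 cells.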
- pem.risk.util_layer_rasterize(input_raster, input_db, input_layer, input_table=None, burn_value=1, extra='')
Rasterizes a vector layer from a database into an existing raster file, assigning a fixed burn value.
- Parameters:
input_raster (str) – The path to the existing raster file to be modified (must be writable).
input_db (str) – The path or connection string to the vector database (e.g., GeoPackage, PostGIS connection).
input_layer (str) – The name of the vector layer or table to rasterize.
input_table (str or None) – [optional] The schema or parent table name if the input_layer is a sub-table or view (e.g., for PostGIS).
burn_value (int or float) – The fixed value to burn into the raster cells covered by the vector features. Default value = 1
extra (str) – Additional command-line options passed directly to the underlying GDAL tool. Default value = ''
- Returns:
The path to the modified input raster file.
- Return type:
str
Notes
This function uses the QGIS processing algorithm gdal:rasterize_over_fixed_value, which modifies the input_raster in place. If input_table is not provided, it assumes a standard layer format (e.g., a GeoPackage layer). If input_table is provided, it constructs a PostgreSQL-style table reference.
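Because the rasterization modifies input_raster in place, calling it repeatedly on the same raster accumulates features: any cell covered by at least one layer ends up with the burn value. This is what the setup functions rely on when several layers share one group. The toy sketch below mimics that behaviour with plain lists standing in for a raster band and boolean masks standing in for the cells each vector layer covers. The burn helper is hypothetical and ignores GDAL details such as the all-touched option.

```python
def burn(grid, mask, burn_value=1):
    """Overwrite cells where mask is True with burn_value, in place (toy model)."""
    for r, row in enumerate(mask):
        for c, covered in enumerate(row):
            if covered:
                grid[r][c] = burn_value
    return grid


# A 3x4 zero-valued template, as produced by a blank-raster step
template = [[0] * 4 for _ in range(3)]
layer_a = [[True, True, False, False],
           [False, False, False, False],
           [False, False, False, False]]
layer_b = [[False, True, True, False],
           [False, False, False, False],
           [False, False, False, True]]
burn(template, layer_a)
burn(template, layer_b)
# template now holds 1 wherever either layer has features, 0 elsewhere
```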