Warning

Documentation website under active development. This is not a stable release.

Data Structures

Files used in plans are based on the following two primitive data structures:

  • Table: stores a frame of data in rows and columns in a single file.

  • Raster: stores a map as a matrix (grid) of numbers in a single file.

Input files must be formatted in a standard way; otherwise, the tool will not work. The standards are meant to be simple and user-friendly.

Using open-source applications

Open-source applications like LibreOffice and QGIS are convenient for fitting data to the plans standards.

Table

A Table in plans is a frame of data defined by rows and columns. Usually, the first row stores the names of the fields and the subsequent rows store the data itself.

Structure rules

  1. [required] file extension: .csv

  2. [required] column separator: semi-colon ;

  3. [required] first row stores field names

  4. [required] decimal separator for numbers: .

  5. [required] no-data convention: empty cell
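The structure rules above can be illustrated with Python's standard csv module. This is a minimal sketch, not the plans reader: the helper name read_table is hypothetical, and values are kept as text so that type conversion stays explicit. It splits on the semicolon, strips the padding spaces used for alignment, and maps empty cells to None following the no-data convention:

```python
import csv
import io


def read_table(text):
    """Parse a plans-style Table into a list of dicts (hypothetical helper).

    Assumes the rules above: ';' separator, field names in the first row,
    '.' as decimal separator and empty cells as no-data.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    # Strip the padding spaces used to align columns; skip blank lines
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, body = rows[0], rows[1:]
    records = []
    for row in body:
        # Empty cells follow the no-data convention and become None
        records.append({name: (value if value else None)
                        for name, value in zip(header, row)})
    return records


sample = """id;     name; alias;    color;  ndvi_mean
 1;    Water;     W;     blue;       -0.9
 2;   Forest;     F;    green;       0.87
 3;    Crops;     C;  magenta;
"""

table = read_table(sample)
ndvi = [float(r["ndvi_mean"]) if r["ndvi_mean"] else None for r in table]
print(ndvi)  # [-0.9, 0.87, None]
```

Because the decimal separator is `.`, Python's float() can convert numeric fields directly, without locale handling.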

Example

In the following table, id is an integer (int) field, ndvi_mean is a real number (float) field, and the remaining fields are text (str) fields.

id;     name; alias;    color;  ndvi_mean
 1;    Water;     W;     blue;       -0.9
 2;   Forest;     F;    green;       0.87
 3;    Crops;     C;  magenta;       0.42
 4;  Pasture;     P;   orange;       0.76
 5;    Urban;     U;  #9AA7A3;       0.24

plans is case-sensitive

Upper case and lower case matter: Name is different from name.

Column names standards

Field (column) names may also need to follow standards.

Data types

See the Data Types section for more details on field formatting.

Information Table

An Information Table is a special kind of Table that stores field information in a listed format.

Structure rules

  1. [required] file name signature: {filename}_info

  2. [required] file extension: .csv

  3. [required] column separator: semi-colon ;

  4. [required] first row stores field names

  5. [required] decimal separator for numbers: .

  6. [required] no-data convention: empty cell

Basic required fields

Name  | Description         | Data Type | Units
------+---------------------+-----------+---------
field | Name of field       | str       | unitless
value | Value set for field | misc      | n.a.

Extra required fields

Extra required fields may also be needed, depending on the input file.

Required horizontal fields

A required horizontal field in an Information Table is an expected row with set values in the field and value columns.

Example

field;            value
name;             Mill Creek Model
alias;            MCM-M001
color;            blue
source;           Ipo
description;      Rainfall-Runoff model
file_parameters;
folder_data;      ./data/inputs
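Since an Information Table is a list of field and value pairs, it maps naturally onto a dictionary. The sketch below is illustrative only: read_info_table and the set of required rows checked at the end are hypothetical, because the actual required horizontal fields depend on each input file.

```python
import csv
import io


def read_info_table(text):
    """Turn a plans-style Information Table into a {field: value} dict.

    Hypothetical helper; empty values become None (no-data convention).
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, body = rows[0], rows[1:]
    if header[:2] != ["field", "value"]:
        raise ValueError("expected 'field' and 'value' columns")
    info = {}
    for row in body:
        value = row[1] if len(row) > 1 else ""
        info[row[0]] = value if value else None
    return info


sample = """field;            value
name;             Mill Creek Model
alias;            MCM-M001
file_parameters;
folder_data;      ./data/inputs
"""

info = read_info_table(sample)
# Hypothetical set of required horizontal fields for some input file
missing = {"name", "alias", "folder_data"} - set(info)
print(info["file_parameters"], missing)  # None set()
```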

Attribute Table

An Attribute Table is a special kind of Table that stores extra information about Raster maps. Each column is a field that must be homogeneous, meaning that all values in a field share the same data type.

Structure rules

  1. [required] file name signature: {filename}_attributes

  2. [required] file extension: .csv

  3. [required] column separator: semi-colon ;

  4. [required] first row stores field names

  5. [required] decimal separator for numbers: .

  6. [required] no-data convention: empty cell

  7. [required] homogeneous data type on each column

Basic required fields

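Name  | Description                                    | Data Type | Units
------+------------------------------------------------+-----------+-------
id    | Unique numeric code                            | int       | index
name  | Unique name                                    | str       | n.a.
alias | Unique short nickname or label                 | str       | n.a.
color | Color HEX code or name available in Matplotlib | str       | n.a.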

Extra required fields

Extra required fields may also be needed, depending on the input file.

Example

id;     name; alias;    color;  ndvi_mean
 1;    Water;     W;     blue;       -0.9
 2;   Forest;     F;    green;       0.87
 3;    Crops;     C;  magenta;       0.42
 4;  Pasture;     P;   orange;       0.76
 5;    Urban;     U;  #9AA7A3;       0.24
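The basic required fields and the homogeneity rule can be checked before a run. The following is a minimal sketch under stated assumptions: validate_attributes and its messages are hypothetical, uniqueness is checked on id, name and alias as the field descriptions suggest, and a column mixing integers and reals is treated as real numbers. Color names are not checked against Matplotlib here.

```python
import csv
import io

REQUIRED = ("id", "name", "alias", "color")  # basic required fields


def cell_type(value):
    """Classify a cell as 'int', 'float' or 'str'."""
    for cast in (int, float):
        try:
            cast(value)
            return cast.__name__
        except ValueError:
            pass
    return "str"


def validate_attributes(text):
    """Return a list of problems in a plans-style Attribute Table (hypothetical)."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, body = rows[0], rows[1:]
    problems = [f"missing field: {f}" for f in REQUIRED if f not in header]
    if problems:
        return problems
    columns = {name: [row[i] if i < len(row) else "" for row in body]
               for i, name in enumerate(header)}
    for name, values in columns.items():
        # Homogeneous data type on each column (empty cells are no-data)
        types = {cell_type(v) for v in values if v}
        if "float" in types:
            types.discard("int")  # integers are valid real numbers
        if len(types) > 1:
            problems.append(f"mixed data types in {name}: {sorted(types)}")
    if {cell_type(v) for v in columns["id"]} != {"int"}:
        problems.append("id must be an integer number")
    for name in ("id", "name", "alias"):
        if len(set(columns[name])) != len(columns[name]):
            problems.append(f"duplicated values in {name}")
    return problems


table = """id;     name; alias;    color;  ndvi_mean
 1;    Water;     W;     blue;       -0.9
 2;   Forest;     F;    green;       0.87
 5;    Urban;     U;  #9AA7A3;       0.24
"""
print(validate_attributes(table))  # []
```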

Add non-required fields

Any fields (columns) other than the required ones are ignored, so you can add convenient extra non-required fields. For instance, here a description text field was added to hold more information about each land use class:

id;     name; alias;    color;  ndvi_mean;                          description
 1;    Water;     W;     blue;       -0.9;              Lakes, rivers and ocean
 2;   Forest;     F;    green;       0.87;     Forests (natural and cultivated)
 3;    Crops;     C;  magenta;       0.42;            Conventional annual crops
 4;  Pasture;     P;   orange;       0.76;  Conventional pasture and grasslands
 5;    Urban;     U;  #9AA7A3;       0.24;                      Developed areas

Time Series

A Time Series in plans is a special kind of Table file that must have a datetime text field (preferably in the first column).

Structure rules

  1. [required] file name signature: {filename}_series[_optional-suffix]

  2. [required] file extension: .csv

  3. [required] column separator: semi-colon ;

  4. [required] first row stores field names

  5. [required] decimal separator for numbers: .

  6. [required] no-data convention: empty cell

  7. [required] homogeneous data type on each column

  8. [required] datetime text field (preferably in the first column)

  9. [recommended] datetime formatted in ISO 8601: yyyy-mm-dd HH:MM:SS.S

  10. [recommended] homogeneous datetime frequency

  11. [recommended] no gaps or voids in data

Basic required fields

Name     | Description                                      | Data Type | Units
---------+--------------------------------------------------+-----------+---------
datetime | Date and time in ISO 8601: yyyy-mm-dd HH:MM:SS.S | str       | datetime

Extra required fields

Extra required fields may also be needed, depending on the input file.

Variable fields

Fields other than datetime generally store the state of variables such as precipitation ppt and surface air temperature tas.

Datetime frequency

Time Series should also have a homogeneous datetime frequency. Recommended frequencies:

  • 15 minutes

  • 20 minutes

  • 30 minutes

  • Hourly

  • Daily

Shorter and longer frequencies

Frequencies shorter than 15 minutes are not recommended because of processing performance. Frequencies longer than 1 day are not recommended because they may not represent hydrological processes effectively.
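A homogeneous frequency means that every pair of consecutive datetime values is separated by the same time step. A minimal sketch of that check, assuming the datetime values are already parsed and sorted (check_frequency is a hypothetical helper, not part of plans); note that a gap in the data also shows up as a different time step:

```python
from datetime import datetime, timedelta

# Recommended frequencies listed above
RECOMMENDED = {
    timedelta(minutes=15),
    timedelta(minutes=20),
    timedelta(minutes=30),
    timedelta(hours=1),
    timedelta(days=1),
}


def check_frequency(stamps):
    """Return the time step of a sorted list of datetimes.

    Raises ValueError if the step is not homogeneous and warns
    if it is not one of the recommended frequencies.
    """
    steps = {b - a for a, b in zip(stamps, stamps[1:])}
    if len(steps) != 1:
        raise ValueError(f"non-homogeneous frequency: {sorted(steps)}")
    step = steps.pop()
    if step not in RECOMMENDED:
        print(f"warning: {step} is not a recommended frequency")
    return step


daily = [datetime(2020, 1, day) for day in range(1, 10)]
print(check_frequency(daily))  # 1 day, 0:00:00
```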

Example

Time Series files tend to have a large number of rows. The first 10 rows of a daily Time Series file look like this:

               datetime;  ppt;  tas
2020-01-01 00:00:00.000;  0.0; 20.1
2020-01-02 00:00:00.000;  5.1; 24.3
2020-01-03 00:00:00.000;  0.0; 25.8
2020-01-04 00:00:00.000; 12.9; 21.4
2020-01-05 00:00:00.000;  0.0; 21.5
2020-01-06 00:00:00.000;  0.0; 23.6
2020-01-07 00:00:00.000;  8.6; 20.6
2020-01-08 00:00:00.000;  4.7; 28.3
2020-01-09 00:00:00.000;  0.0; 27.1

Automatic fill of time information

During processing, plans fills in the time information (hours, minutes and seconds) if only the date (year, month and day) is passed. For instance, a daily file could list 2020-01-01 instead of the full 2020-01-01 00:00:00.000 used in the above example.
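Filling in the time information can be sketched with datetime.strptime from the Python standard library. This is an assumption about the behaviour, not the plans code: a bare yyyy-mm-dd date is completed with 00:00:00, which matches the midnight timestamps of the daily example above. The helper parse_stamp is hypothetical.

```python
from datetime import datetime

# Accepted layouts, from the full ISO 8601 form down to a bare date
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_stamp(text):
    """Parse a datetime cell; a bare date is completed with 00:00:00."""
    text = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized datetime: {text!r}")


print(parse_stamp("2020-01-05"))               # 2020-01-05 00:00:00
print(parse_stamp("2020-01-05 00:00:00.000"))  # 2020-01-05 00:00:00
```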

Small gaps and voids in Time Series

plans will try to fill or interpolate small gaps and voids in a given Time Series. However, be aware that this may have unnoticed impacts on model outputs. A best practice is to fill and interpolate voids before processing, so that you understand what is going on.

For instance, consider the following Time Series that has a gap (missing Jan/3 and Jan/4 dates) and a void for ppt in Jan/8:

               datetime;  ppt;  tas
2020-01-01 00:00:00.000;  0.0; 20.1
2020-01-02 00:00:00.000;  5.1; 24.3
2020-01-05 00:00:00.000;  0.0; 21.5
2020-01-06 00:00:00.000;  0.0; 23.6
2020-01-07 00:00:00.000;  8.6; 20.6
2020-01-08 00:00:00.000;     ; 28.3
2020-01-09 00:00:00.000;  0.0; 27.1

In this case, plans would interpolate the temperature tas and fill the precipitation ppt with 0:

               datetime;  ppt;  tas
2020-01-01 00:00:00.000;  0.0; 20.1
2020-01-02 00:00:00.000;  5.1; 24.3
2020-01-03 00:00:00.000;  0.0; 23.3
2020-01-04 00:00:00.000;  0.0; 22.4
2020-01-05 00:00:00.000;  0.0; 21.5
2020-01-06 00:00:00.000;  0.0; 23.6
2020-01-07 00:00:00.000;  8.6; 20.6
2020-01-08 00:00:00.000;  0.0; 28.3
2020-01-09 00:00:00.000;  0.0; 27.1
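The filling rule described above (linear interpolation for tas, zero for ppt) can be sketched in plain Python. This is an illustration under assumptions, not the plans implementation: fill_series is hypothetical, voids are represented as None, and the first and last tas values are assumed to be known. Linear interpolation gives about 23.37 and 22.43 for Jan 3 and Jan 4; how plans rounds these values is not covered here.

```python
from datetime import datetime, timedelta


def fill_series(stamps, ppt, tas, step=timedelta(days=1)):
    """Insert missing dates, fill ppt voids with 0.0, interpolate tas.

    `stamps` is a sorted list of datetimes; voids in ppt and tas are None.
    """
    # Regular timeline from the first to the last datetime
    full = []
    t = stamps[0]
    while t <= stamps[-1]:
        full.append(t)
        t += step

    ppt_at = dict(zip(stamps, ppt))
    tas_at = {s: v for s, v in zip(stamps, tas) if v is not None}
    known = sorted(tas_at)

    new_ppt, new_tas = [], []
    for t in full:
        value = ppt_at.get(t)
        new_ppt.append(value if value is not None else 0.0)
        if t in tas_at:
            new_tas.append(tas_at[t])
            continue
        # Linear interpolation between the nearest known neighbours
        before = max(s for s in known if s < t)
        after = min(s for s in known if s > t)
        weight = (t - before) / (after - before)
        new_tas.append(tas_at[before] + weight * (tas_at[after] - tas_at[before]))
    return full, new_ppt, new_tas


days = [1, 2, 5, 6, 7, 8, 9]
stamps = [datetime(2020, 1, d) for d in days]
ppt = [0.0, 5.1, 0.0, 0.0, 8.6, None, 0.0]
tas = [20.1, 24.3, 21.5, 23.6, 20.6, 28.3, 27.1]

full, ppt_filled, tas_filled = fill_series(stamps, ppt, tas)
for t, p, a in zip(full, ppt_filled, tas_filled):
    print(t.date(), p, round(a, 2))
```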

Raster

A Raster in plans is a map of data defined by a matrix or grid of cells storing numbers (int or float) and encoded in a way that it can be georeferenced in a given Coordinate Reference System (CRS).

Structure rules

Rule set for a single file:

  1. [required] GeoTIFF file with .tif extension

  2. [recommended] projected CRS so all cells are measured in meters

Rule set for multiple files:

  1. [required] files are aligned to the same spatial extent

  2. [required] files are aligned to the same spatial resolution

Raster grid shape must be the same

The rule set for multiple files implies that all Raster files in a given project must share the same grid shape (number of rows and columns).
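Two grids line up when they share the same shape, the same affine transform (origin and cell size, hence the same extent and resolution) and the same CRS. A minimal sketch, assuming profile dictionaries with the width, height, transform and crs keys found in the profile of a dataset opened with Rasterio; plain tuples stand in for the transform so the example runs without Rasterio, and check_alignment is a hypothetical helper:

```python
def check_alignment(profiles):
    """Check that all raster profiles share one grid; return (rows, columns).

    Each profile is a dict with 'width', 'height', 'transform' and 'crs',
    e.g. the `profile` of a dataset opened with rasterio.open().
    """
    first = profiles[0]
    for i, profile in enumerate(profiles[1:], start=1):
        for key in ("width", "height", "transform", "crs"):
            if profile[key] != first[key]:
                raise ValueError(f"raster {i} differs from raster 0 in {key!r}")
    return first["height"], first["width"]


# Stand-in profiles with 30 m cells (values are made up for illustration)
grid = {"width": 400, "height": 300, "crs": "EPSG:31983",
        "transform": (30.0, 0.0, 500000.0, 0.0, -30.0, 7500000.0)}
shifted = dict(grid, transform=(30.0, 0.0, 500030.0, 0.0, -30.0, 7500000.0))

print(check_alignment([grid, dict(grid)]))  # (300, 400)
```

The exact comparison is deliberately strict; transforms read from different sources may need a small floating-point tolerance instead.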

GeoTIFF file

The GeoTIFF file is the standard Raster file in plans. It is a well-known raster format distributed by most dataset providers.

The advantage of GeoTIFF is that it stores data and metadata together in the same file. plans parses GeoTIFF files using the Rasterio library.

GDAL reference

More details about the GeoTIFF format are given in the GDAL documentation.

Time Raster

A Time Raster in plans is a special kind of Raster file in which the data represents a snapshot of the timeline.

Structure rules

Rule set for a single file:

  1. [required] GeoTIFF file with .tif extension

  2. [required] file name signature: {filename}_{date}

  3. [recommended] projected CRS so all cells are measured in meters

Rule set for multiple files:

  1. [required] files are aligned to the same spatial extent

  2. [required] files are aligned to the same spatial resolution

Example

For instance, Land Use Land Cover is spatial data that may require many Time Raster files:

{folder}/
   ├── lulc_2020-01-01.tif       # Raster - Land Use in 2020
   ├── lulc_2021-01-01.tif       # Raster - Land Use in 2021
   └── lulc_2022-01-01.tif       # Raster - Land Use in 2022
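File names following the {filename}_{date} signature can be split back into a name and a date, which is enough to order maps along the timeline. A minimal sketch, assuming the date is written as yyyy-mm-dd as in the example above; parse_time_raster is a hypothetical helper:

```python
import re
from datetime import date
from pathlib import Path

# {filename}_{date}.tif, assuming an ISO 8601 date (yyyy-mm-dd)
SIGNATURE = re.compile(r"^(?P<name>.+)_(?P<date>\d{4}-\d{2}-\d{2})\.tif$")


def parse_time_raster(path):
    """Split a Time Raster file name into (name, date)."""
    match = SIGNATURE.match(Path(path).name)
    if match is None:
        raise ValueError(f"not a Time Raster file name: {path}")
    return match["name"], date.fromisoformat(match["date"])


files = ["lulc_2022-01-01.tif", "lulc_2020-01-01.tif", "lulc_2021-01-01.tif"]
timeline = sorted(files, key=lambda f: parse_time_raster(f)[1])
print(timeline)  # ['lulc_2020-01-01.tif', 'lulc_2021-01-01.tif', 'lulc_2022-01-01.tif']
```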

Quali Raster

A Quali Raster in plans is a special kind of Raster file in which data is qualitative (classes or ids), and an auxiliary Attribute Table must be provided.

Structure rules

Rule set for a single file:

  1. [required] GeoTIFF file with .tif extension

  2. [required] an auxiliary Attribute Table with the same name as the GeoTIFF

  3. [recommended] projected CRS so all cells are measured in meters

Rule set for multiple files:

  1. [required] files are aligned to the same spatial extent

  2. [required] files are aligned to the same spatial resolution

Example

For instance, a Quali Raster for Land Use Land Cover only stores the id code of each land use class. More information and parameters must be stored in the auxiliary Attribute Table.

{folder}/
   ├── lulc_2020-01-01.tif       # Raster - Land Use in 2020
   └── lulc_attributes.csv       # <-- Attribute Table

One Attribute Table can feed multiple maps

The same Attribute Table file can supply the information required by multiple Raster maps. For instance, consider a set of 3 Land Use Land Cover maps for different years. They can all use the same Attribute Table file:

{folder}/
    ├── lulc_2020-01-01.tif   # <-- multiple Rasters
    ├── lulc_2021-01-01.tif
    ├── lulc_2022-01-01.tif
    └── lulc_attributes.csv   # <-- single Attribute Table
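In both trees above, the Attribute Table name drops the date suffix of the Time Raster files. A minimal sketch of locating the table for a given map, assuming the optional _{yyyy-mm-dd} suffix is removed before appending _attributes.csv in the same folder; attribute_table_for is a hypothetical helper:

```python
import re
from pathlib import Path

# Optional Time Raster date suffix, assumed to be _yyyy-mm-dd
DATE_SUFFIX = re.compile(r"_\d{4}-\d{2}-\d{2}$")


def attribute_table_for(raster_path):
    """Return the expected Attribute Table path for a Quali Raster."""
    raster = Path(raster_path)
    name = DATE_SUFFIX.sub("", raster.stem)
    return raster.with_name(f"{name}_attributes.csv")


for tif in ("inputs/lulc_2020-01-01.tif", "inputs/lulc_2021-01-01.tif"):
    print(attribute_table_for(tif).as_posix())  # inputs/lulc_attributes.csv
```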

Time Quali Raster

A Time Quali Raster in plans is a special kind of Raster file that arises when the map is both a Time Raster and a Quali Raster. Land Use maps are the classical example, as shown above. The rules of both apply.

Data Types

Data Type is the encoding of data at the hardware level. For beginners, data types may be understood through this basic classification:

  • str text string: common text characters

  • int integer numbers: 2, 0, 1000

  • float real numbers: 1.2, -3.44

  • misc miscellaneous, undefined data type

Detailed data types

The data types listed above are very primitive. For instance, int can be int8 or int64, which makes a big difference in memory usage (1 byte versus 8 bytes per value). See below for a comprehensive reference.

No-data value convention

A nodata value is a convention for which values in the data mean that there is actually no data (a data void). For tables, this is usually an empty cell or some text like “N.A.” (not applicable). For raster maps, the GeoTIFF format has built-in metadata that stores the nodata value.

Enforcement of nodata

Users are not required to set nodata values, but incoming nodata values may be overwritten with the plans standard convention.
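For tables, enforcing the convention amounts to replacing any other no-data marker with an empty cell. A minimal sketch; the marker set and the helper to_empty_cell are hypothetical examples of what a source file might use, not a list defined by plans:

```python
# Hypothetical markers a source file might use for missing values
NODATA_MARKERS = {"", "N.A.", "NA", "NaN"}


def to_empty_cell(cell):
    """Map an incoming no-data marker to the empty-cell convention."""
    return "" if cell.strip() in NODATA_MARKERS else cell


row = ["2020-01-08 00:00:00.000", "N.A.", "28.3"]
print(";".join(to_empty_cell(c) for c in row))  # 2020-01-08 00:00:00.000;;28.3
```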

Data Types Reference

Data Types Reference Table

Name                    | Label   | GDAL Id | Lower    | Upper   | Decimals | Structure
------------------------+---------+---------+----------+---------+----------+----------------
text string             | str     |         |          |         |          | text characters
8-bit unsigned integer  | uint8   | 1       | 0        | 255     | 0        | integer number
8-bit integer           | int8    | 14      | -128     | 127     | 0        | integer number
16-bit unsigned integer | uint16  | 2       | 0        | 65535   | 0        | integer number
16-bit integer          | int16   | 3       | -32768   | 32767   | 0        | integer number
32-bit unsigned integer | uint32  | 4       | 0        | 4e9     | 0        | integer number
32-bit integer          | int32   | 5       | -2e9     | 2e9     | 0        | integer number
64-bit unsigned integer | uint64  | 12      | 0        | 1.8e19  | 0        | integer number
64-bit integer          | int64   | 13      | -9e18    | 9e18    | 0        | integer number
16-bit float            | float16 | 15      | -65504   | 65504   | 3        | real number
32-bit float            | float32 | 6       | -3.4e38  | 3.4e38  | 6        | real number
64-bit float            | float64 | 7       | -1.8e308 | 1.8e308 | 15       | real number

Note

Values with large magnitudes in the above table are approximations. For example, the exact upper value of uint32 is 4,294,967,295.
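The exact integer limits follow from the number of bits n: unsigned types range from 0 to 2^n - 1 and signed types from -2^(n-1) to 2^(n-1) - 1. The short sketch below computes them with plain Python integers (int_range is an illustrative helper); sys.float_info describes the 64-bit float that Python uses for its float type on common platforms.

```python
import sys


def int_range(bits, signed):
    """Exact lower and upper limits of an integer type with `bits` bits."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    for signed in (False, True):
        label = f"{'' if signed else 'u'}int{bits}"
        low, high = int_range(bits, signed)
        print(f"{label:>7}: {low:,} to {high:,} ({bits // 8} bytes per value)")

# Python's float is a 64-bit float on common platforms
print(sys.float_info.max, sys.float_info.dig)  # 1.7976931348623157e+308 15
```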

NumPy Data Types

Check out the NumPy Data Types documentation page for more details on data types in Python arrays.

GDAL Data Types

Check out the GDAL Data Types documentation page for more details on data types in raster maps.