babilonia.tools.parse#
Parse and standardize bank statement CSV files.
This script scans a target directory for bank statement files
(T0 source format), standardizes their structure using the appropriate
CashFlow parser, and writes the processed outputs as new CSV files
(T1 canonical format) in the same yearly subdirectories.
Processing can be restricted to a single year or applied to all available years. When multiple years are present, files are processed year by year with structured terminal output to facilitate log inspection.
Scripts Examples#
The script is intended for command-line execution.
Minimal PowerShell example (Windows)
Save as run_parse.ps1 and execute from PowerShell.
# ! Warning -- change paths and parameters
# Paths
$REPO = "C:\path\to\repo"
$SCRIPT = "$REPO\babilonia\tools\parse.py"
$DATA = "C:\data\bank_statements"
# Parameters
$TYPE = "bb-cc"
$YEAR = 2024
# Run script
python $SCRIPT `
--folder $DATA `
--type $TYPE `
--year $YEAR
Minimal shell example (Linux)
Save as run_parse.sh and execute from a terminal.
#!/usr/bin/env bash
# ! Warning -- change paths and parameters
# Paths
REPO="/path/to/repo"
SCRIPT="$REPO/babilonia/tools/parse.py"
DATA="/data/bank_statements"
# Parameters
TYPE="bb-cc"
YEAR=2024
# Run script
python "$SCRIPT" --folder "$DATA" --type "$TYPE" --year "$YEAR"
Expected Folder Structure#
The input data is expected to follow a simple hierarchical layout:
bb/ # Bank
└── cc/ # Bank account
├── 2022/
│ ├── EXTRATO_BB_CC_2022-01_T0.csv
│ └── EXTRATO_BB_CC_2022-02_T0.csv
├── 2023/
│ └── EXTRATO_BB_CC_2023-01_T0.csv
└── 2024/
├── EXTRATO_BB_CC_2024-01_T0.csv
└── EXTRATO_BB_CC_2024-02_T0.csv
Each *_T0.csv file represents a raw (Tier 0) bank statement.
During execution, the script generates standardized Tier 1 outputs alongside the original files:
bb/ # Bank
└── cc/ # Bank account
└── 2024/
├── EXTRATO_BB_CC_2024-01_T0.csv # original
└── EXTRATO_BB_CC_2024-01_T1.csv # standardized (canonical)
Data Levels#
Tier 0 (T0): Raw statement files as exported by the bank. Column names, formats, and ordering may vary.
Tier 1 (T1): Canonical, standardized CSV files produced by this script, suitable for downstream analysis and aggregation.
The original Tier 0 files are never modified; Tier 1 files are written only when they do not already exist.
Functions
|