(gep-10)=

# GEP 10 — Units and Dimensionality

```{list-table}
- * Author
  * [Marvin Immesberger](https://github.com/MImmesberger)
- * Status
  * Draft
- * Type
  * Standards Track
- * Created
  * 2026-06-03
- * Resolution
  * (none yet)
```

## Abstract

This GEP gives every quantity in GETTSIM a **unit** — Euros, Euros per square meter, etc. — declared on parameters, policy functions, and (optionally) input data. The framework reads those units to do two things:

- **Dimensional safety.** It checks that the arithmetic combining quantities is sound, so mixing incompatible kinds — say, a monthly amount and a per-square-meter rent — becomes a loud error when the model is defined, not a silent wrong number far downstream.
- **Automatic unit conversion.** It converts compatible quantities to a common unit. For example, parameters denominated in Deutsche Mark can be converted to Euros at build time, so a parameter's history can include values in both currencies and the user can run in either one without hand-converting the numbers. Time conversions of flows work the same way. The existing `_y`/`_q`/`_m`/`_w`/`_d` suffix convention is preserved.

The engine is [pint](https://pint.readthedocs.io), and it runs **only while the model is built**: it checks dimensions and converts units, then steps aside. The numeric runtime is unchanged. As in {ref}`GEP 9 `, the checks fire at definition time, catching a whole class of unit bugs before they can reach a result.

### Terminology

- **dimension** — the basic kind of a quantity: `[currency]`, `[time]`, `[area]`, or dimensionless. Counting quantities (children, adults, household members) are dimensionless, following the SI and pint convention.
- **unit** — a particular way of measuring a dimension, such as Euros for `[currency]` or years for `[time]`. A unit carries a conversion factor to the dimension's base unit, so e.g. `1 month = 1/12 year`. The available units are called **unit tokens**.

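
As a rough illustration of the conversion-factor idea in the terminology above, the following sketch derives a duration conversion and a flow conversion from the single fact that a year has twelve months, and a Deutsche Mark conversion from the fixed rate of 1.95583 DM per Euro. The function names are hypothetical and the factors are hard-coded in plain Python; the GEP itself proposes pint as the source of these factors.

```python
from fractions import Fraction

# Assumption for illustration: hard-coded factors stand in for pint's registry.
MONTHS_PER_YEAR = Fraction(12)
DM_PER_EUR = Fraction(195583, 100000)  # 1 EUR = 1.95583 DM


def years_to_months(duration_in_years):
    """Convert a duration: there are more months than years, so multiply."""
    return duration_in_years * MONTHS_PER_YEAR


def per_year_to_per_month(flow_per_year):
    """Convert a flow: the period sits in the denominator, so divide."""
    return flow_per_year / MONTHS_PER_YEAR


def dm_to_eur(amount_in_dm):
    """Convert a DM-denominated amount to Euros."""
    return amount_in_dm / DM_PER_EUR


print(years_to_months(2))                # 24
print(per_year_to_per_month(1200))       # 100
print(round(float(dm_to_eur(2000)), 2))  # 1022.58
```

The duration and the flow use the same factor in opposite directions, which is the kind of bookkeeping a unit engine takes over from hand-written converters.
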

## Motivation and Scope

Three long-standing problems motivate this GEP.

1. **No dimensional safety.** The DAG carries quantities of many kinds, but a function body may add, subtract, or compare them freely. `betrag_m + miete_pro_qm_m` (a monthly amount plus a monthly rent *per square meter*) is a bug that runs silently today and surfaces, if at all, as an implausible number far downstream.
1. **Hand-converted historical currency.** Every Deutsche-Mark-era parameter is divided by `1.95583` by a maintainer before being written to YAML, with the original value preserved only in a free-text `note`. There is no machine-checkable provenance and no guard against a transcription error. This is both error-prone and at odds with GETTSIM's law-to-code approach.
1. **Hand-written time arithmetic.** `ttsim/unit_converters.py` implements ~50 conversion functions (`y_to_m`, `per_y_to_per_m`, …) and their stock/flow duals by hand. The resulting arithmetic has itself been a source of bugs.

**Scope.** The GEP covers `ttsim` (the framework) and `gettsim` (the German currencies and the policy annotations). GEP 1's `_y`/`_q`/`_m`/`_w`/`_d` suffix automation is preserved; only the *arithmetic* behind the conversions moves onto the unit engine.

## Usage and Impact

Units enter the model through its **data**: every parameter and every input column carries a `unit=` declaration. From there the framework works out the unit of whatever a policy function computes by running the body on its inputs (the *dry-run*); the function still restates that unit in `unit=`, checked against the inferred result so its declaration is a guard rail, not a new source of truth.

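
To make the dry-run idea concrete, here is a minimal sketch in plain Python rather than pint. It assumes a hypothetical `StandIn` class that carries only unit exponents and no magnitude, and a hypothetical `dry_run` helper that compares the inferred unit with the declared one. As a simplification, months and years are modelled as distinct unit atoms, so a monthly flow plus a yearly flow is rejected outright.

```python
from __future__ import annotations

from dataclasses import dataclass


class UnitMismatchError(TypeError):
    """Raised when a body combines or returns non-equivalent units."""


@dataclass(frozen=True)
class StandIn:
    """Hypothetical unit-only stand-in: exponents per unit atom, no magnitude."""

    exponents: tuple[tuple[str, int], ...] = ()

    @classmethod
    def of(cls, **exponents: int) -> StandIn:
        # Drop zero exponents and sort so that equal units compare equal.
        return cls(tuple(sorted((k, v) for k, v in exponents.items() if v != 0)))

    def _combine(self, other: object, sign: int) -> StandIn:
        merged = dict(self.exponents)
        for name, power in as_stand_in(other).exponents:
            merged[name] = merged.get(name, 0) + sign * power
        return StandIn.of(**merged)

    def __mul__(self, other: object) -> StandIn:
        return self._combine(other, +1)

    __rmul__ = __mul__

    def __truediv__(self, other: object) -> StandIn:
        return self._combine(other, -1)

    def __add__(self, other: object) -> StandIn:
        # Addition and subtraction require identical units on both sides.
        if as_stand_in(other) != self:
            raise UnitMismatchError(f"cannot combine {as_stand_in(other)} with {self}")
        return self

    __radd__ = __sub__ = __rsub__ = __add__


def as_stand_in(value: object) -> StandIn:
    """Treat a bare number as dimensionless."""
    return value if isinstance(value, StandIn) else StandIn()


def dry_run(body, declared: StandIn, **inputs: StandIn) -> StandIn:
    """Run the body on stand-ins and compare the inferred unit to the declaration."""
    inferred = as_stand_in(body(**inputs))
    if not inferred.exponents:
        return declared  # a dimensionless result falls back to the declaration
    if inferred != declared:
        raise UnitMismatchError(f"declared {declared}, inferred {inferred}")
    return inferred


EUR_PER_MONTH = StandIn.of(currency=1, month=-1)
EUR_PER_YEAR = StandIn.of(currency=1, year=-1)
DIMENSIONLESS = StandIn.of()


def betrag_m(regelsatz_m, anzahl):
    return regelsatz_m * anzahl  # flow times head count stays a flow


def kaputt_m(betrag_m, freibetrag_y):
    return betrag_m + freibetrag_y  # monthly flow plus yearly flow
```

Calling `dry_run(betrag_m, EUR_PER_MONTH, regelsatz_m=EUR_PER_MONTH, anzahl=DIMENSIONLESS)` returns the monthly flow unit, while the same call on `kaputt_m` with a yearly `freibetrag_y` raises `UnitMismatchError` before any real data is involved.
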

Flow tokens (`CURRENCY_FLOW`, …) take their period from the {ref}`GEP 1 ` name suffix:

```python
@policy_function(unit=Unit.CURRENCY_FLOW)  # name betrag_m -> resolved CURRENCY/month
def betrag_m(regelsatz_m: float, anzahl: int) -> float:
    return regelsatz_m * anzahl


@policy_function(unit=Unit.CURRENCY)  # a stock; a time suffix would be an error
def vermögen(aktien: float, immobilien: float) -> float:
    return aktien + immobilien
```

A policy function names no particular currency, so the same body serves a Euro run and a DM run unchanged; parameters, by contrast, record their legal currency in the token itself (`DM_FLOW`, `EUR_FLOW`). One optional `currency` argument to `main()` picks the currency the model runs in — defaulting to the registered base currency (`"EUR"` for GETTSIM) — and every currency-denominated parameter is converted to it at build time.

Tagging input data with units is **optional**, through a dedicated unit-annotated input tree; results can likewise be returned as a unit-annotated tree in precise run-currency units. And every mistake the framework can see — a mistyped token, mixing incompatible quantities, a unit that does not line up across a DAG edge, a missing declaration — surfaces when the model is *defined* (at decoration, load, or environment build), never as a wrong number at run time. `DIMENSIONLESS` is a real declaration — it states that the quantity carries no dimension — not a blank one.

The rest of this GEP is the reference: the {ref}`token vocabulary `, the {ref}`period sources `, the {ref}`currency model `, and {ref}`exactly what the checks catch `.

## Backward Compatibility

- **User code shape is unchanged.** Bare arrays and the DataFrame/mapper interface keep working; `currency` defaults to `"EUR"` and output stays in Euros.
- **The `unit`/`reference_period` metadata is repurposed.** `unit` becomes one member of the token vocabulary and `reference_period` becomes *functional* (it supplies the period for `…_FLOW` parameters that no name can carry) rather than purely descriptive.
- **No blanket opt-out.** Unlike the {ref}`GEP 9 ` beartype claw, there is no env-var escape hatch that switches the unit check off wholesale; the only opt-out is per-function and body-only (`verify_units=False`, {ref}`see below `).
- **A migration is required.** Every node must declare a unit; suffix-less flow parameters are renamed to carry a time suffix (`arbeitnehmerpauschbetrag` → `arbeitnehmerpauschbetrag_y`), since the suffix is now the period source wherever a name can carry one; and a bare literal of a real dimension is promoted to a parameter or its function body opts out with `verify_units=False`.

## Detailed Description

(gep-10-vocabulary)=

### The unit vocabulary

A declaration is one member of the **token vocabulary**. Its backbone is a closed core enumeration — a `Unit` `StrEnum` shipped by `ttsim`, spelled identically in code (`Unit.CURRENCY_FLOW`) and in YAML (`unit: CURRENCY_FLOW`):

| token                            | resolves to                      | typical use              |
| -------------------------------- | -------------------------------- | ------------------------ |
| `CURRENCY_FLOW`                  | `CURRENCY / period`              | wages, claims, benefits  |
| `CURRENCY`                       | `CURRENCY`                       | wealth, asset thresholds |
| `DIMENSIONLESS`                  | `dimensionless`                  | shares, rates, counts    |
| `DIMENSIONLESS_FLOW`             | `1 / period`                     | Zugangsfaktor per year   |
| `YEARS`                          | `year`                           | ages, durations          |
| `HOURS_FLOW`                     | `hour / period` (dimensionless)  | working hours            |
| `SQUARE_METERS`                  | `meter ** 2`                     | dwelling size            |
| `CURRENCY_PER_SQUARE_METER_FLOW` | `CURRENCY / meter ** 2 / period` | rent caps                |

A token ending in `…_FLOW` needs a period; every other token is complete as written and takes no period.
So the `…_FLOW` suffix is the only flow marker — there is no separate "stock" spelling, a currency stock is the bare `CURRENCY` token. Tokens are not pint syntax: each resolves internally to a pint unit (flow tokens after the period is filled in), but pint expressions never appear in a declaration.

`HOURS_FLOW` is the one flow token that resolves to a *dimensionless* quantity: hours and the period are both `[time]`, so hours per week is a time-over-time ratio. It is kept as a distinct token so the time-suffix and time-conversion bookkeeping still apply to working hours, but dimensionally it cannot be told apart from a bare `DIMENSIONLESS` quantity. Likewise, a *per-period* dimensionless quantity is `DIMENSIONLESS_FLOW`, not `DIMENSIONLESS`: the pension Zugangsfaktor moves by a fixed factor per year of earlier or later retirement (`zugangsfaktor_veränderung_y`, § 77 SGB VI) — a pure number, but *per year* it is `1/year`, and multiplied by the gap in `YEARS` the years cancel to the dimensionless adjustment.

**Counting quantities, booleans, and identifiers are dimensionless** (`DIMENSIONLESS`), following SI and pint convention. A per-person parameter declares the same token as any other amount (`EUR_FLOW` for a monthly Regelsatz); scaling it by a head count is a plain multiplication that preserves the unit. A boolean is a `{0, 1}` value, and an identifier (`p_id`, `*_id`, `p_id_*`) carries no dimension — both spell that out rather than being silently waved through.

```yaml
beitragssatz:
  unit: DIMENSIONLESS # a rate is dimensionless
  reference_period: null
  type: scalar
  2024-01-01:
    value: 0.013
```

**There are no exemptions** — every active node has a unit; only its *source* differs. Most nodes declare it. Derived nodes get one auto-assigned ({ref}`see below `); the framework-injected date nodes get theirs from the framework (`policy_year` is in years, etc.).
So `UNSET_UNIT` has a single meaning — *no declaration was made* — which the mandatory-units check always reports as an error, with no second "legitimately blank" reading to disambiguate.

Beyond the core enumeration, the full vocabulary adds one set of **concrete currency tokens** per registered currency ({ref}`see Currency `); the currency-dimensioned rows of the table above are the *agnostic* tokens. The core enumeration lives in `ttsim`, is shared by all downstream packages, and grows only by an upstream PR; each package's params JSON schema stays statically enumerable, listing the core tokens plus its own currency tokens.

### pint runs at build time only

The foundational constraint is that pint never wraps a live array. A `pint.Quantity` is not a JAX pytree and does not trace under `jit`; wrapping runtime columns would fight both JAX and the GEP-9 `FloatColumn` vocabulary. Instead, pint is used in two build-time roles:

- to compute conversion **factors** (time and currency), which are baked into the compiled workers as plain numeric constants; and
- to run the **dry-run** dimensionality check on representative `Quantity`s.

The numeric runtime path stays pure arrays, single currency, and JAX-safe. Time is a first-class pint dimension here: the conversion factors are sourced from pint (`Quantity(1, "year").to("month")`), while the suffix auto-generation and naming follow the {ref}`GEP 1 ` conventions.

(gep-10-periods)=

### Units, suffixes, and periods

A flow token is completed by exactly one period source; complete tokens admit none.
The period comes from the **name suffix wherever a name or key can carry one**, and from `reference_period` only where it cannot:

| node kind                                 | flow period from              | `reference_period` |
| ----------------------------------------- | ----------------------------- | ------------------ |
| column / policy function                  | name suffix `_y/_q/_m/_w/_d`  | forbidden          |
| scalar parameter / string-keyed dict leaf | name (or key) suffix          | forbidden          |
| integer-keyed dict leaf                   | dict-level `reference_period` | required           |
| mapping parameter axis                    | `reference_period`            | required           |

Where the suffix supplies the period it is also *mandatory and exclusive*: a time suffix requires a `…_FLOW` token and a `…_FLOW` token requires a time suffix, so a complete token on a suffixed name — or a flow token on an unsuffixed one — fails at build. This makes the {ref}`GEP 1 ` convention machine-checked: a node named `…_m` whose body computes a stock cannot be declared. Because `reference_period` is forbidden there, there is nothing to reconcile; only where no name carries a suffix (integer keys, schedule axes) is `reference_period` functional.

### Dict parameters with heterogeneous leaves

A dict parameter whose leaves carry different units declares `unit:` as a **mapping from leaf keys to tokens** (or `DIMENSIONLESS` for a dimensionless leaf).
A flow leaf with a string key takes its period from the key's own time suffix; an integer-keyed flow leaf, which has no suffix to carry, takes it from the dict-level `reference_period`:

```yaml
schedule:
  unit:
    child_amount_y: EUR_FLOW # string key -> period from its own _y
    max_age: YEARS
  type: dict
  2024-01-01:
    child_amount_y: 3000.0
    max_age: 18
```

```yaml
satz_nach_kindanzahl:
  unit: EUR_FLOW # uniform: one token for all leaves
  reference_period: Month # integer keys carry no suffix -> dict-level period
  type: dict
  2024-01-01:
    1: 250.0
    2: 250.0
```

Where a leaf key carries a suffix *and* the dict also sets a `reference_period`, the two must coincide — there is no precedence order. Mixed-period dicts are legal when each flow leaf carries its own suffix (`base_amount_m` next to `annual_bonus_y`).

**Leaves that change name across the parameter's history.** The `unit:` mapping is a **union over all dated entries**: the mandatory-units check looks only at the leaves present in the value active at the policy date and ignores mapping entries for leaves that exist only at other dates. So a leaf renamed across a reform is covered by listing both names (`child_amount_y` before, `base_amount_y` after). A value leaf with no entry in the mapping is a *missing* declaration and is flagged, so a mistyped key cannot pass silently. When the renamed leaves share a token, the simpler **uniform** form — a single scalar `unit: EUR_FLOW` with the period read from each leaf's own suffix — makes the rename irrelevant; the mapping is only for genuinely heterogeneous leaves. A leaf whose *currency* changes across dates is a changeover ({ref}`see Currency `).

In the dry-run, dict parameters become dicts of representative `Quantity`s (uniform for a scalar `unit:`, per-leaf for a mapping), so bodies that subscript them are verifiable.

### Mapping parameters: one token per axis

A schedule or lookup table is not a quantity — it is a *function between quantities*, with a domain and a codomain.
The mapping parameter types (the `piecewise_*` family, the lookup tables, the phase-in/out types) therefore declare `input_unit:` and `output_unit:` instead of `unit:`; a `unit:` on them is an error, and the JSON schema enforces the split per `type:`:

```yaml
tarif:
  input_unit: EUR_FLOW # taxable income per year in ...
  output_unit: EUR_FLOW # ... tax per year out
  reference_period: Year
  type: piecewise_quadratic
  ...
```

Each axis token follows the same kind rules as a scalar declaration; per-axis declarations are single tokens (or `DIMENSIONLESS`), never mappings. The single `reference_period` supplies the period of *every* flow axis; a `reference_period` that no flow axis consumes is dangling and fails; a time suffix on the parameter's *name* must coincide with the **output** axis — the suffix names what the parameter yields.

(gep-10-currency)=

### Currency

Currencies live in the framework as a `[currency]` dimension, with concrete currencies registered by downstream packages via `register_currency(name, *, base=False, definition=None)`. `gettsim` registers `EUR` (base) and `DM = EUR / 1.95583`. Registration does two things: it provides the **conversion factors**, with pint as the single source of truth for the rate; and it derives the currency's **declaration tokens** — one concrete variant per currency-dimensioned core token (`DM`, `DM_FLOW`, `DM_PER_SQUARE_METER_FLOW`, `EUR_*`, …) — spelled by replacing the agnostic `CURRENCY` prefix with the upper-cased currency name.

**Agnostic and concrete tokens.** A **currency-agnostic token** (`CURRENCY`, `CURRENCY_FLOW`, …) is a placeholder for any registered currency: it declares the unit of a function or column for which it does not matter which currency the model runs in.
A **concrete currency token** (`DM_FLOW`, `EUR`) names one specific currency; what it adds over its agnostic counterpart is **denomination** — it names the currency a parameter's stored numbers are written in, which the build-time conversion reads off the declaration. For every *dimensionality* check a concrete token means exactly what its agnostic counterpart means: the dry-run and the edge check compare at the dimension level and never see a concrete currency, so a DM-denominated parameter feeds a currency-agnostic function without further ado, while adding Euros to Euros per square meter is still caught.

**Parameters must be concrete; functions must be agnostic.** A parameter's numbers are written in *some* currency, so once a concrete currency is registered, an agnostic `CURRENCY_*` token on a parameter is a build error — the declaration must name the denomination (`DM_FLOW`, not `CURRENCY_FLOW`). Columns and functions may *only* declare agnostic tokens.

**The run currency.** The `currency` argument to `main()` defaults to the registered base currency; it is the currency the input data is taken to be in and that the outputs come out in. At environment build, every currency-denominated *parameter* is converted from its declared denomination to the run currency: scalar values, dict parameters leaf by leaf (each currency leaf by its own token), schedules axis by axis, and lookup-table values. The factors are baked in at build time; the numeric runtime path stays single-currency.

**A changeover within one parameter's history.** A dated entry may restate the unit field(s), overriding the top-level declaration for that entry's numbers.
This is how the DM→Euro switch is written — entries before the reform denominated in the legacy currency, entries from the reform date in the new one:

```yaml
arbeitnehmerpauschbetrag_y:
  unit: DM_FLOW
  type: scalar
  1990-01-01:
    value: 2000
  2002-01-01:
    unit: EUR_FLOW # the changeover: denominated in Euro from here on
    value: 1044
```

`updates_previous` cannot cross a changeover: an entry that restates the unit declaration must restate the full value, because a merged value would mix numbers denominated in different currencies.

(gep-10-checks)=

### Build-time checks and boundary conversion

The checks run in two layers, both at build time, neither needing a fabricated dataset:

|        | **Layer 1 — DAG validity** | **Layer 2 — boundary** |
| ------ | -------------------------- | ---------------------- |
| when   | `fail_if` on the assembled environment | GEP-9 canonicalisation boundary |
| input  | none — representative `Quantity`s | the user's unit-annotated input tree |
| checks | inferred body unit vs. declaration; `+`/`−`/ordering operands equivalent; producer↔consumer edges agree | tag currency → run currency; period vs. suffix; unknown token rejected; every tag's dimension vs. resolved unit |

**Layer 1** runs each scalar body in NumPy+pint, infers the unit that falls out, and checks it against the declaration; an edge-consistency pass then confirms each producer's unit equals its consumer's declared expectation. The mechanics are below.

**Layer 2** is offered through the unit-annotated input tree (a sibling of the ordinary input tree in which every leaf is a pint `Quantity`). Only the tree interface carries tags — a DataFrame column has nowhere to hang a per-column unit, so the DataFrame modes and the bare tree stay tag-free and are taken to already be in the run currency.
When the mode is used **every** leaf must be tagged, including identifiers and other dimensionless columns (tagged `dimensionless`) — a full-coverage contract that is what lets the dimension check be exhaustive. The dimension check reads the extracted input units against the resolved environment units; it feeds no node, so it adds no back-edge to the boundary and needs no declared unit threaded through `processed_data`. Symmetrically, the **unit-annotated result tree** relabels each output leaf with its precise run-currency unit (`euro/month`, not the agnostic `CURRENCY_FLOW`) — pure naming, since results are already computed in the run currency.

**How the dry-run checks one body.** The check *runs the function body*, but with **units in place of numbers**. Each input becomes a stand-in carrying its resolved unit and a throwaway magnitude of `1`; pint carries the units through the body's arithmetic, and the unit that falls out of the `return` is compared to the declaration. Because the magnitude is never used, no real data is needed:

```python
@policy_function(unit=Unit.CURRENCY_FLOW)  # -> CURRENCY / month
def betrag_m(
    einkommen_m: float, satz: float, mindestbetrag_m: float, befreit: bool
) -> float:
    if befreit:
        return 0.0
    if einkommen_m > mindestbetrag_m:
        return einkommen_m * satz
    return mindestbetrag_m
```

Here `einkommen_m` and `mindestbetrag_m` enter as `EUR/month`, `satz` as a dimensionless `1`, and `befreit` as a boolean stand-in. `einkommen_m * satz` is a flow times a dimensionless number, so it stays `EUR/month` — matching the declaration; the `mindestbetrag_m` arm matches too.

**Every branch is covered, by re-running.** To evaluate `if befreit:` Python needs a yes/no, but a unit stand-in has no value to compare.
So the stand-in intercepts the *truth test* itself (Python's `__bool__`) and hands it to a small driver — the **path explorer** — that decides which way to go, re-running the body and steering the open branches differently each time (a depth-first walk of the decision tree, in the style of *concolic* execution) until every reachable path is driven. The number of runs is the number of *reachable* paths, not `2^n`: when `befreit` is true the body returns before the income test, so that comparison never even becomes a question. Each run's result is checked on its own, so a unit slip on a single arm — say, returning a yearly figure where `_m` was declared — is caught even though the other arms are clean. A `return 0.0` arm yields a dimensionless result and falls back to the declaration, so the ubiquitous `if befreit: return 0.0` guard never raises a false alarm.

**What the dry-run catches:**

- a body whose inferred unit disagrees with its declaration, on any reachable branch — a stock times a per-year rate labelled as a stock, or a `_m` flow returned where `_y` is declared;
- an addition or subtraction of two non-equivalent quantities — a monthly flow plus a yearly one (`betrag_m + freibetrag_y`), or a stock plus a flow. At run time the assembled DAG computes on bare arrays with no pint, so such a combination is unit-blind and silently wrong; the dry-run rejects it rather than letting pint's build-time auto-conversion of same-dimension operands paper over it;
- an ordering comparison (`<`, `<=`, `>`, `>=`) of two non-equivalent quantities;
- a missing unit, and malformed declarations: a flow token without a period, a currency-agnostic token on a parameter, disagreeing period sources, or a boolean node carrying a concrete unit.

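
The branch-covering re-run described above can be sketched in a few dozen lines of plain Python. This is only an illustration under stated assumptions: `PathExplorer`, `Flag`, `Amount`, and `check_all_paths` are hypothetical names, units are plain string labels instead of pint units, and multiplication factors are assumed to be dimensionless.

```python
class PathExplorer:
    """Hypothetical driver: replays forced decisions, then defaults to True."""

    def __init__(self):
        self._forced = []  # decisions to replay on the current run
        self._taken = []   # decisions actually taken on the current run

    def decide(self):
        index = len(self._taken)
        choice = self._forced[index] if index < len(self._forced) else True
        self._taken.append(choice)
        return choice

    def explore(self, body, **inputs):
        """Run the body once per reachable path; return (decisions, result) pairs."""
        runs = []
        self._forced = []
        while True:
            self._taken = []
            result = body(**inputs)
            runs.append((tuple(self._taken), result))
            # Backtrack: drop trailing False decisions (both arms already seen),
            # then flip the deepest decision that is still on its True arm.
            pending = list(self._taken)
            while pending and not pending[-1]:
                pending.pop()
            if not pending:
                return runs
            self._forced = pending[:-1] + [False]


class Flag:
    """Stand-in for a boolean: its truth value is whatever the explorer decides."""

    def __init__(self, explorer):
        self._explorer = explorer

    def __bool__(self):
        return self._explorer.decide()


class Amount:
    """Stand-in for a quantity: a unit label and no magnitude."""

    def __init__(self, unit, explorer):
        self.unit = unit
        self._explorer = explorer

    def __mul__(self, factor):
        return self  # sketch: factors are assumed to be dimensionless

    def __gt__(self, other):
        if not isinstance(other, Amount) or other.unit != self.unit:
            raise TypeError("ordering comparison of non-equivalent quantities")
        return Flag(self._explorer)


def check_all_paths(body, declared_unit, explorer, **inputs):
    """Check the result of every reachable path against the declared unit."""
    paths = []
    for decisions, result in explorer.explore(body, **inputs):
        # A plain number (such as `return 0.0`) falls back to the declaration.
        if isinstance(result, Amount) and result.unit != declared_unit:
            raise TypeError(f"path {decisions}: {result.unit} != {declared_unit}")
        paths.append(decisions)
    return paths


def betrag_m(einkommen_m, satz, mindestbetrag_m, befreit):
    if befreit:
        return 0.0
    if einkommen_m > mindestbetrag_m:
        return einkommen_m * satz
    return mindestbetrag_m


explorer = PathExplorer()
monthly = Amount("EUR/month", explorer)
paths = check_all_paths(
    betrag_m, "EUR/month", explorer,
    einkommen_m=monthly, satz=1.0, mindestbetrag_m=monthly, befreit=Flag(explorer),
)
print(paths)  # [(True,), (False, True), (False, False)]
```

The body has two truth tests but only three reachable paths, because the early `return 0.0` means the income comparison is never asked when `befreit` is true.
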

**What it cannot catch:**

- **wrong magnitudes** — a coefficient, rate, or constant with the correct unit but the wrong value (a 2.5% rate written as `0.25`); units are not values;
- **a result that comes out dimensionless** — a body inferring a dimensionless value (an early `return 0.0`, or arithmetic that cancels) falls back to the declaration;
- **equality comparisons** — `==` and `!=` are not unit-screened, unlike ordering.

**A body the dry-run cannot evaluate must opt out explicitly.** The dry-run executes a *scalar* body symbolically, so a body it cannot trace must opt out: vectorized functions (`vectorization_strategy="not_required"`, no scalar form for pint to walk), piecewise polynomials and lookup tables (evaluated by table machinery), and bodies calling `join` or a raw `xnp` op. Rather than silently trusting such a body, the check **rejects** it unless the author marks it `verify_units=False` on the decorator. The opt-out is of body *inference only*: the declared unit still stands and the body's edges are still checked. Because the flag is explicit, every un-verified body is greppable — a deliberate choice, and a ready-made worklist should the dry-run later learn to evaluate these operations.

(gep-10-auto)=

### Auto-generated nodes

Auto-generated nodes receive auto-assigned units: time-conversion variants inherit the source's base token and read the variant's period off its own suffix; auto-aggregations derive their token from the source and the aggregation type, paralleling how {ref}`GEP 4 ` resolves their types. `SUM`/`MEAN`/`MIN`/`MAX` preserve the source token; `COUNT` is a head count and is `DIMENSIONLESS`; `ANY`/`ALL` yield a boolean (a dimensionless quantity) and so auto-assign `DIMENSIONLESS` (as does a `SUM` over a boolean column — a head count). A `@group_creation_function` group id is auto-assigned `DIMENSIONLESS` (it is an identifier, and the decorator exposes no `unit=`).
Where the source's token pins down a concrete currency (a parameter), the derived node inherits the **agnostic counterpart**.

### Literals

The dry-run executes a body on representative `Quantity`s, so a bare numeric literal combined *additively* with a unit-carrying value raises (pint refuses to add a dimensionless number to a currency). A literal that is only a multiplicative factor (`betrag * 0.5`) is fine — multiplying by a dimensionless number preserves the unit. Most apparent cases dissolve once the quantities are declared correctly: an ordinal such as `geburtsmonat` (the month 1–12) is `DIMENSIONLESS`, so `geburtsmonat - 1` is dimensionless arithmetic and needs no tag.

For a genuine constant of a real dimension, either **promote it to a parameter** (the norm — it then gets the same provenance, currency conversion, and checking as any other parameter, and the body becomes dry-runnable), or **opt the body out** with `@policy_function(verify_units=False)` for genuine code-level constants where a parameter would be artificial (the same body-level opt-out as above).

## Related Work

- {ref}`GEP 9 `: runtime type checking via beartype; this GEP follows its build-boundary philosophy and its "loud at the boundary you wrote" goal.
- {ref}`GEP 1 `: the time/aggregation suffix conventions this GEP preserves.
- [pint](https://pint.readthedocs.io): the unit registry, dimensionality analysis, and NumPy (NEP-18) support relied on here.

## Implementation

Delivered as several PRs, with the framework proven on `mettsim` before any German annotation.
The tracking issues are:

- ttsim [#117](https://github.com/ttsim-dev/ttsim/issues/117) — framework core + tracer bullet
- ttsim [#118](https://github.com/ttsim-dev/ttsim/issues/118) — full dimension model
- ttsim [#119](https://github.com/ttsim-dev/ttsim/issues/119) — mandatory units + edge-consistency
- ttsim [#120](https://github.com/ttsim-dev/ttsim/issues/120) — currency knob + Layer-2 boundary
- ttsim [#121](https://github.com/ttsim-dev/ttsim/issues/121) — annotate mettsim, switch check on, CI test
- gettsim [#1191](https://github.com/ttsim-dev/gettsim/issues/1191) — register EUR/DM
- gettsim [#1192](https://github.com/ttsim-dev/gettsim/issues/1192) — gettsim rollout

Each package's params schema enumerates its own token vocabulary: the core tokens minus the agnostic currency tokens (the schema governs parameters, which must be concrete) plus the concrete variants of that package's registered currencies. It also enforces the `unit:` XOR `input_unit:`/`output_unit:` split per parameter `type:` and admits the per-entry overrides in dated entries. The schema shipped with ttsim (listing mettsim's `CASTAR_*`/`SILVER_PENNY_*` tokens) is the template; the copy at `docs/geps/params-schema.json` (the validation target for all German parameter YAMLs) is migrated together with the YAML files in #1192, adding the `DM_*`/`EUR_*` tokens.

## Alternatives

### Runtime pint Quantities flowing through the DAG

Rejected. `Quantity` is not a JAX pytree, breaks tracing, contradicts the GEP-9 column vocabulary, and adds hot-path cost. Units in a tax-transfer model are static structural properties of nodes, not of data, so runtime wrapping buys nothing the build-time check does not already provide.

### Inference-only (no declared units)

Rejected in favour of mandatory declarations.
Inference alone localises a bug only where dimensions clash downstream; a mandatory declared return unit localises it at the offending function and is self-documenting, at the cost of annotation churn the codebase largely already absorbs for types.

### Keep hand-written time conversions; use pint only for checks

Possible, but the stock/flow duality is exactly what a unit engine encodes for free. Sourcing the factors from pint removes a class of hand-maintained arithmetic without touching the naming.

### A `[count]` dimension for head counts

Considered, prototyped, and rejected. An earlier draft promoted counting quantities to a custom `[count]` dimension, making per-person parameters `CURRENCY / count` and head counts `count`. The intended payoff was catching a forgotten per-capita scaling. It was dropped because:

- the protection is weaker than it looks: a single generic `[count]` cannot distinguish per-child from per-adult from per-household, so scaling by the *wrong* count still type-checks — only the forgot-entirely case is caught;
- the annotation tax lands on every per-capita parameter in the system (Regelsätze, Kindergeld, Freibeträge, …), which would read `CURRENCY / count` where the law and every practitioner say "Euros per month";
- SI and pint treat counting quantities as dimensionless; deviating surprises anyone who knows either.

The accepted cost is that a missing per-capita scaling is no longer a unit error. If that bug class accumulates in practice, the closed token vocabulary makes a future amendment with genuinely distinct dimensions (`[person]`, `[child]`, …) a clean retrofit.

### Make functions time-agnostic

Rejected. Collapsing `betrag_m` and `betrag_y` into one node would erase the law-to-code correspondence GEP 1 is built on.

## Discussion

(Open. To be resolved on Zulip.)
Known points for debate: the strictness of literal tagging; whether per-capita scaling should ever get dedicated dimensions (see the rejected `[count]` alternative — revisit if missing-scale bugs accumulate); and whether the gettsim rollout should be a single large PR or staged behind a temporary gate.

## References and Footnotes

- [gettsim #1174 (the originating DM-values discussion)](https://github.com/ttsim-dev/gettsim/issues/1174)
- [pint](https://pint.readthedocs.io)
- [NEP 18 (NumPy `__array_function__`)](https://numpy.org/neps/nep-0018-array-function-protocol.html)

## Copyright

This document has been placed in the public domain.