GEP 10 — Units and Dimensionality

Author: Marvin Immesberger
Status: Draft
Type: Standards Track
Created: 2026-06-03
Resolution: (none yet)

Abstract

This GEP gives every quantity in GETTSIM a unit — Euros, Euros per square meter, etc. — declared on parameters, policy functions, and (optionally) input data. The framework reads those units to do two things:

  • Dimensional safety. It checks that the arithmetic combining quantities is sound, so mixing incompatible kinds — say, a monthly amount and a per-square-meter rent — becomes a loud error when the model is defined, not a silent wrong number far downstream.

  • Automatic unit conversion. It converts compatible quantities to a common unit. For example, parameters denominated in Deutsche Mark can be converted to Euros at build time, so a parameter’s history can include values in both currencies and the user can run in either one without hand-converting the numbers. Time conversions of flows work the same way. The existing _y/_q/_m/_w/_d suffix convention is preserved.

The engine is pint, and it runs only while the model is built: it checks dimensions and converts units, then steps aside. The numeric runtime is unchanged. As in GEP 9, the checks fire at definition time, catching a whole class of unit bugs before they can reach a result.

Terminology

  • dimension — the basic kind of a quantity: [currency], [time], [area], or dimensionless. Counting quantities (children, adults, household members) are dimensionless, following the SI and pint convention.

  • unit — a particular way of measuring a dimension, such as Euros for [currency] or years for [time]. A unit carries a conversion factor to the dimension’s base unit, so e.g. 1 month = 1/12 year. The available units are called unit tokens.

Motivation and Scope

Three long-standing problems motivate this GEP.

  1. No dimensional safety. The DAG carries quantities of many kinds, but a function body may add, subtract, or compare them freely. betrag_m + miete_pro_qm_m (a monthly amount plus a monthly rent per square meter) is a bug that runs silently today and surfaces, if at all, as an implausible number far downstream.

  2. Hand-converted historical currency. Every Deutsche-Mark-era parameter is divided by 1.95583 by a maintainer before being written to YAML, with the original value preserved only in a free-text note. There is no machine-checkable provenance and no guard against a transcription error. This is both error-prone and at odds with GETTSIM’s law-to-code approach.

  3. Hand-written time arithmetic. ttsim/unit_converters.py implements ~50 conversion functions (y_to_m, per_y_to_per_m, …) and their stock/flow duals by hand. The resulting arithmetic has itself been a source of bugs.

Scope. The GEP covers ttsim (the framework) and gettsim (the German currencies and the policy annotations). GEP 1’s _y/_q/_m/_w/_d suffix automation is preserved; only the arithmetic behind the conversions moves onto the unit engine.

Usage and Impact

Units enter the model through its data: every parameter and every input column carries a unit= declaration. From there the framework works out the unit of whatever a policy function computes by running the body on its inputs (the dry-run); the function still restates that unit in unit=, checked against the inferred result so its declaration is a guard rail, not a new source of truth. Flow tokens (CURRENCY_FLOW, …) take their period from the GEP 1 name suffix:

@policy_function(unit=Unit.CURRENCY_FLOW)  # name betrag_m -> resolved CURRENCY/month
def betrag_m(regelsatz_m: float, anzahl: int) -> float:
    return regelsatz_m * anzahl


@policy_function(unit=Unit.CURRENCY)  # a stock; a time suffix would be an error
def vermögen(aktien: float, immobilien: float) -> float:
    return aktien + immobilien

A policy function names no particular currency, so the same body serves a Euro run and a DM run unchanged; parameters, by contrast, record their legal currency in the token itself (DM_FLOW, EUR_FLOW). One optional currency argument to main() picks the currency the model runs in — defaulting to the registered base currency ("EUR" for GETTSIM) — and every currency-denominated parameter is converted to it at build time.

Tagging input data with units is optional, through a dedicated unit-annotated input tree; results can likewise be returned as a unit-annotated tree in precise run-currency units. And every mistake the framework can see — a mistyped token, mixing incompatible quantities, a unit that does not line up across a DAG edge, a missing declaration — surfaces when the model is defined (at decoration, load, or environment build), never as a wrong number at run time. DIMENSIONLESS is a real declaration — it states that the quantity carries no dimension — not a blank one.

The rest of this GEP is the reference: the token vocabulary, the period sources, the currency model, and exactly what the checks catch.

Backward Compatibility

  • User code shape is unchanged. Bare arrays and the DataFrame/mapper interface keep working; currency defaults to "EUR" and output stays in Euros.

  • The unit/reference_period metadata is repurposed. unit becomes one member of the token vocabulary and reference_period becomes functional (it supplies the period for …_FLOW parameters that no name can carry) rather than purely descriptive.

  • No blanket opt-out. Unlike the GEP 9 beartype claw, there is no env-var escape hatch that switches the unit check off wholesale; the only opt-out is per-function and body-only (verify_units=False, see below).

  • A migration is required. Every node must declare a unit; suffix-less flow parameters are renamed to carry a time suffix (arbeitnehmerpauschbetrag → arbeitnehmerpauschbetrag_y), since the suffix is now the period source wherever a name can carry one; and a bare literal of a real dimension is promoted to a parameter or its function body opts out with verify_units=False.

Detailed Description

The unit vocabulary

A declaration is one member of the token vocabulary. Its backbone is a closed core enumeration — a Unit StrEnum shipped by ttsim, spelled identically in code (Unit.CURRENCY_FLOW) and in YAML (unit: CURRENCY_FLOW):

| token | resolves to | typical use |
| --- | --- | --- |
| CURRENCY_FLOW | CURRENCY / period | wages, claims, benefits |
| CURRENCY | CURRENCY | wealth, asset thresholds |
| DIMENSIONLESS | dimensionless | shares, rates, counts |
| DIMENSIONLESS_FLOW | 1 / period | Zugangsfaktor per year |
| YEARS | year | ages, durations |
| HOURS_FLOW | hour / period (dimensionless) | working hours |
| SQUARE_METERS | meter ** 2 | dwelling size |
| CURRENCY_PER_SQUARE_METER_FLOW | CURRENCY / meter ** 2 / period | rent caps |

A token ending in …_FLOW needs a period; every other token is complete as written and takes no period. So the …_FLOW suffix is the only flow marker — there is no separate “stock” spelling, a currency stock is the bare CURRENCY token. Tokens are not pint syntax: each resolves internally to a pint unit (flow tokens after the period is filled in), but pint expressions never appear in a declaration.
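A minimal sketch of what such a closed vocabulary could look like. The member list follows the table above; the helpers `needs_period` and `parse_token` are hypothetical names for illustration, and the `str`/`Enum` mix-in stands in for `StrEnum` so the sketch also runs on Python versions before 3.11. None of this is claimed to be the ttsim API.

```python
from enum import Enum


class Unit(str, Enum):
    """Core tokens; each value is spelled exactly like its YAML form."""

    CURRENCY_FLOW = "CURRENCY_FLOW"
    CURRENCY = "CURRENCY"
    DIMENSIONLESS = "DIMENSIONLESS"
    DIMENSIONLESS_FLOW = "DIMENSIONLESS_FLOW"
    YEARS = "YEARS"
    HOURS_FLOW = "HOURS_FLOW"
    SQUARE_METERS = "SQUARE_METERS"
    CURRENCY_PER_SQUARE_METER_FLOW = "CURRENCY_PER_SQUARE_METER_FLOW"


def needs_period(token: Unit) -> bool:
    """Hypothetical helper: only tokens ending in _FLOW need a period."""
    return token.value.endswith("_FLOW")


def parse_token(text: str) -> Unit:
    """Turn a YAML string into a token; a mistyped token raises ValueError."""
    return Unit(text)
```

Because the enumeration is closed, a misspelling such as `CURENCY_FLOW` fails at lookup rather than being carried along as an unknown string.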

HOURS_FLOW is the one flow token that resolves to a dimensionless quantity: hours and the period are both [time], so hours per week is a time-over-time ratio. It is kept as a distinct token so the time-suffix and time-conversion bookkeeping still apply to working hours, but dimensionally it cannot be told apart from a bare DIMENSIONLESS quantity. Likewise, a per-period dimensionless quantity is DIMENSIONLESS_FLOW, not DIMENSIONLESS: the pension Zugangsfaktor moves by a fixed factor per year of earlier or later retirement (zugangsfaktor_veränderung_y, § 77 SGB VI) — a pure number, but per year it is 1/year, and multiplied by the gap in YEARS the years cancel to the dimensionless adjustment.

Counting quantities, booleans, and identifiers are dimensionless (DIMENSIONLESS), following SI and pint convention. A per-person parameter declares the same token as any other amount (EUR_FLOW for a monthly Regelsatz); scaling it by a head count is a plain multiplication that preserves the unit. A boolean is a {0, 1} value, and an identifier (p_id, *_id, p_id_*) carries no dimension — both spell that out rather than being silently waved through.

beitragssatz:
  unit: DIMENSIONLESS   # a rate is dimensionless
  reference_period: null
  type: scalar
  2024-01-01:
    value: 0.013

There are no exemptions — every active node has a unit; only its source differs. Most nodes declare it. Derived nodes get one auto-assigned (see below); the framework-injected date nodes get theirs from the framework (policy_year is in years, etc.). So UNSET_UNIT has a single meaning — no declaration was made — which the mandatory-units check always reports as an error, with no second “legitimately blank” reading to disambiguate.

Beyond the core enumeration, the full vocabulary adds one set of concrete currency tokens per registered currency (see Currency); the currency-dimensioned rows of the table above are the agnostic tokens. The core enumeration lives in ttsim, is shared by all downstream packages, and grows only by an upstream PR; each package’s params JSON schema stays statically enumerable, listing the core tokens plus its own currency tokens.

pint runs at build time only

The foundational constraint is that pint never wraps a live array. A pint.Quantity is not a JAX pytree and does not trace under jit; wrapping runtime columns would fight both JAX and the GEP-9 FloatColumn vocabulary. Instead, pint is used in two build-time roles:

  • to compute conversion factors (time and currency), which are baked into the compiled workers as plain numeric constants; and

  • to run the dry-run dimensionality check on representative Quantity objects.

The numeric runtime path stays pure arrays, single currency, and JAX-safe. Time is a first-class pint dimension here: the conversion factors are sourced from pint (Quantity(1, "year").to("month")), while the suffix auto-generation and naming follow the GEP 1 conventions.
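The "bake the factor in at build time" idea can be sketched without any unit library. In the sketch below a hand-written table replaces pint as the source of the factors, purely so the example runs with the standard library; `PERIODS_PER_YEAR`, `flow_factor`, and `make_converter` are hypothetical names. Weeks and days are left out because their factor depends on the calendar convention of the unit engine.

```python
from fractions import Fraction

# Assumed for this sketch: periods per year for three of the five suffixes.
PERIODS_PER_YEAR = {"y": Fraction(1), "q": Fraction(4), "m": Fraction(12)}


def flow_factor(source: str, target: str) -> float:
    """Factor turning an amount per `source` period into one per `target` period."""
    return float(PERIODS_PER_YEAR[source] / PERIODS_PER_YEAR[target])


def make_converter(source: str, target: str):
    """Build a worker with the factor frozen in as a plain float."""
    factor = flow_factor(source, target)  # computed once, at build time

    def convert(values):
        # Run time: plain arithmetic on plain numbers, no unit objects involved.
        return [v * factor for v in values]

    return convert


y_to_m = make_converter("y", "m")  # a yearly flow expressed per month
```

The worker returned by `make_converter` closes over an ordinary float, so nothing unit-aware survives into the numeric path.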

Units, suffixes, and periods

A flow token is completed by exactly one period source; complete tokens admit none. The period comes from the name suffix wherever a name or key can carry one, and from reference_period only where it cannot:

| node kind | flow period from | reference_period |
| --- | --- | --- |
| column / policy function | name suffix _y/_q/_m/_w/_d | forbidden |
| scalar parameter / string-keyed dict leaf | name (or key) suffix | forbidden |
| integer-keyed dict leaf | dict-level reference_period | required |
| mapping parameter axis | reference_period | required |

Where the suffix supplies the period it is also mandatory and exclusive: a time suffix requires a …_FLOW token and a …_FLOW token requires a time suffix, so a complete token on a suffixed name — or a flow token on an unsuffixed one — fails at build. This makes the GEP 1 convention machine-checked: a node named …_m whose body computes a stock cannot be declared. Because reference_period is forbidden there, there is nothing to reconcile; only where no name carries a suffix (integer keys, schedule axes) is reference_period functional.
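The mandatory-and-exclusive rule for columns and policy functions can be sketched as a small build-time check. `resolve_period` is a hypothetical name, and the sketch only looks at a trailing suffix, which is a simplification (it ignores any further name components that may follow the time suffix).

```python
import re
from typing import Optional

SUFFIX_TO_PERIOD = {"y": "year", "q": "quarter", "m": "month", "w": "week", "d": "day"}
_SUFFIX = re.compile(r"_(y|q|m|w|d)$")


def resolve_period(name: str, token: str) -> Optional[str]:
    """Return the period for a flow node, None for a complete token, or raise."""
    match = _SUFFIX.search(name)
    is_flow = token.endswith("_FLOW")
    if match and not is_flow:
        raise ValueError(f"{name!r}: a time suffix requires a ..._FLOW token, got {token}")
    if is_flow and not match:
        raise ValueError(f"{name!r}: {token} requires a time suffix")
    return SUFFIX_TO_PERIOD[match.group(1)] if match else None
```

For instance, `resolve_period("betrag_m", "CURRENCY_FLOW")` yields `"month"`, while declaring `CURRENCY` on `betrag_m` or `CURRENCY_FLOW` on `vermögen` raises.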

Dict parameters with heterogeneous leaves

A dict parameter whose leaves carry different units declares unit: as a mapping from leaf keys to tokens (or DIMENSIONLESS for a dimensionless leaf). A flow leaf with a string key takes its period from the key’s own time suffix; an integer-keyed flow leaf, which has no suffix to carry, takes it from the dict-level reference_period:

schedule:
  unit:
    child_amount_y: EUR_FLOW   # string key -> period from its own _y
    max_age: YEARS
  type: dict
  2024-01-01:
    child_amount_y: 3000.0
    max_age: 18
satz_nach_kindanzahl:
  unit: EUR_FLOW            # uniform: one token for all leaves
  reference_period: Month   # integer keys carry no suffix -> dict-level period
  type: dict
  2024-01-01:
    1: 250.0
    2: 250.0

Where a leaf key carries a suffix and the dict also sets a reference_period, the two must coincide — there is no precedence order. Mixed-period dicts are legal when each flow leaf carries its own suffix (base_amount_m next to annual_bonus_y).

Leaves that change name across the parameter’s history. The unit: mapping is a union over all dated entries: the mandatory-units check looks only at the leaves present in the value active at the policy date and ignores mapping entries for leaves that exist only at other dates. So a leaf renamed across a reform is covered by listing both names (child_amount_y before, base_amount_y after). A value leaf with no entry in the mapping is a missing declaration and is flagged, so a mistyped key cannot pass silently. When the renamed leaves share a token, the simpler uniform form — a single scalar unit: EUR_FLOW with the period read from each leaf’s own suffix — makes the rename irrelevant; the mapping is only for genuinely heterogeneous leaves. A leaf whose currency changes across dates is a changeover (see Currency).

In the dry-run, dict parameters become dicts of representative Quantity objects (uniform for a scalar unit:, per-leaf for a mapping), so bodies that subscript them are verifiable.
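The leaf rules of this section can be sketched as one resolution pass over the value active at the policy date. `leaf_units` is a hypothetical helper; it returns a (token, period) pair per leaf instead of building pint quantities, and it lower-cases `reference_period` only to give both period sources the same spelling.

```python
import re
from typing import Optional, Union

_SUFFIX = re.compile(r"_(y|q|m|w|d)$")
_PERIODS = {"y": "year", "q": "quarter", "m": "month", "w": "week", "d": "day"}


def leaf_units(
    value: dict,
    unit: Union[str, dict],
    reference_period: Optional[str] = None,
) -> dict:
    """Map every leaf of the active value to a (token, period) pair."""
    resolved = {}
    for key in value:  # mapping entries for leaves absent from `value` are ignored
        if isinstance(unit, dict):
            if key not in unit:
                raise ValueError(f"leaf {key!r} has no unit declaration")
            token = unit[key]
        else:
            token = unit  # uniform form: one token for all leaves
        period = None
        if token.endswith("_FLOW"):
            match = _SUFFIX.search(key) if isinstance(key, str) else None
            if match:
                period = _PERIODS[match.group(1)]
            elif reference_period is not None:
                period = reference_period.lower()
            else:
                raise ValueError(f"flow leaf {key!r} has no period source")
        resolved[key] = (token, period)
    return resolved
```

Iterating over the value rather than over the mapping is what makes a renamed leaf harmless and a mistyped key an error.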

Mapping parameters: one token per axis

A schedule or lookup table is not a quantity: it is a function between quantities, with a domain and a codomain. The mapping parameter types (the piecewise_* family, the lookup tables, the phase-in/out types) therefore declare input_unit: and output_unit: instead of unit:; a unit: on them is an error, and the JSON schema enforces the split per type:

tarif:
  input_unit: EUR_FLOW    # taxable income per year in ...
  output_unit: EUR_FLOW   # ... tax per year out
  reference_period: Year
  type: piecewise_quadratic
  ...

Each axis token follows the same kind rules as a scalar declaration; per-axis declarations are single tokens (or DIMENSIONLESS), never mappings. The single reference_period supplies the period of every flow axis; a reference_period that no flow axis consumes is dangling and fails; a time suffix on the parameter’s name must coincide with the output axis — the suffix names what the parameter yields.

Currency

Currencies live in the framework as a [currency] dimension, with concrete currencies registered by downstream packages via register_currency(name, *, base=False, definition=None). gettsim registers EUR (base) and DM = EUR / 1.95583. Registration does two things: it provides the conversion factors, with pint as the single source of truth for the rate; and it derives the currency’s declaration tokens — one concrete variant per currency-dimensioned core token (DM, DM_FLOW, DM_PER_SQUARE_METER_FLOW, EUR_*, …) — spelled by replacing the agnostic CURRENCY prefix with the upper-cased currency name.
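The token derivation described above is a plain prefix substitution. The sketch below assumes the three currency-dimensioned core tokens from the vocabulary table; `concrete_tokens` is a hypothetical helper and not the body of `register_currency`.

```python
# Currency-dimensioned core tokens, taken from the vocabulary table.
CURRENCY_CORE_TOKENS = (
    "CURRENCY",
    "CURRENCY_FLOW",
    "CURRENCY_PER_SQUARE_METER_FLOW",
)


def concrete_tokens(currency_name: str) -> list:
    """Replace the agnostic CURRENCY prefix with the upper-cased currency name."""
    prefix = currency_name.upper()
    return [prefix + token[len("CURRENCY"):] for token in CURRENCY_CORE_TOKENS]
```

Calling `concrete_tokens("dm")` gives `DM`, `DM_FLOW`, and `DM_PER_SQUARE_METER_FLOW`, matching the spellings listed in the text.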

Agnostic and concrete tokens. A currency-agnostic token (CURRENCY, CURRENCY_FLOW, …) is a placeholder for any registered currency: it declares the unit of a function or column for which it does not matter which currency the model runs in. A concrete currency token (DM_FLOW, EUR) names one specific currency; what it adds over its agnostic counterpart is denomination — it names the currency a parameter’s stored numbers are written in, which the build-time conversion reads off the declaration. For every dimensionality check a concrete token means exactly what its agnostic counterpart means: the dry-run and the edge check compare at the dimension level and never see a concrete currency, so a DM-denominated parameter feeds a currency-agnostic function without further ado, while adding Euros to Euros per square meter is still caught.

Parameters must be concrete; functions must be agnostic. A parameter’s numbers are written in some currency, so once a concrete currency is registered, an agnostic CURRENCY_* token on a parameter is a build error — the declaration must name the denomination (DM_FLOW, not CURRENCY_FLOW). Columns and functions may only declare agnostic tokens.

The run currency. The currency argument to main() defaults to the registered base currency; it is the currency the input data is taken to be in and that the outputs come out in. At environment build, every currency-denominated parameter is converted from its declared denomination to the run currency: scalar values, dict parameters leaf by leaf (each currency leaf by its own token), schedules axis by axis, and lookup-table values. The factors are baked in at build time; the numeric runtime path stays single-currency.

A changeover within one parameter’s history. A dated entry may restate the unit field(s), overriding the top-level declaration for that entry’s numbers. This is how the DM→Euro switch is written — entries before the reform denominated in the legacy currency, entries from the reform date in the new one:

arbeitnehmerpauschbetrag_y:
  unit: DM_FLOW
  type: scalar
  1990-01-01:
    value: 2000
  2002-01-01:
    unit: EUR_FLOW   # the changeover: denominated in Euro from here on
    value: 1044

updates_previous cannot cross a changeover: an entry that restates the unit declaration must restate the full value, because a merged value would mix numbers denominated in different currencies.
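A sketch of how a changeover entry and the build-time conversion could fit together, using the parameter above as data. The dict layout, `active_entry`, and `to_run_currency` are assumptions of this example; the only fact taken from the text is the rate, DM = EUR / 1.95583. The sketch handles currency tokens only.

```python
from datetime import date

# Units of each currency per one Euro; 1 EUR = 1.95583 DM as stated above.
PER_EUR = {"EUR": 1.0, "DM": 1.95583}

# Hypothetical in-memory form of the YAML parameter shown above.
arbeitnehmerpauschbetrag_y = {
    "unit": "DM_FLOW",
    date(1990, 1, 1): {"value": 2000},
    date(2002, 1, 1): {"unit": "EUR_FLOW", "value": 1044},
}


def active_entry(param: dict, policy_date: date):
    """Return (value, token) of the latest dated entry not after policy_date."""
    dated = {k: v for k, v in param.items() if isinstance(k, date)}
    start = max(d for d in dated if d <= policy_date)
    entry = dated[start]
    # A dated entry may restate `unit`, overriding the top-level declaration.
    return entry["value"], entry.get("unit", param["unit"])


def to_run_currency(value: float, token: str, run_currency: str = "EUR") -> float:
    """Convert from the declared denomination to the run currency."""
    denomination = token.split("_", 1)[0]  # "DM_FLOW" -> "DM"
    return value / PER_EUR[denomination] * PER_EUR[run_currency]
```

With a 1995 policy date the active entry is 2000 in `DM_FLOW`, which a Euro run converts by dividing by 1.95583; from 2002 on the entry is already denominated in Euros and the factor is 1.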

Build-time checks and boundary conversion

The checks run in two layers, both at build time, neither needing a fabricated dataset:

|  | Layer 1: DAG validity | Layer 2: boundary |
| --- | --- | --- |
| when | fail_if on the assembled environment | GEP-9 canonicalisation boundary |
| input | none (representative Quantity objects) | the user’s unit-annotated input tree |
| checks | inferred body unit vs. declaration; +/-/ordering operands equivalent; producer↔consumer edges agree | tag currency → run currency; period vs. suffix; unknown token rejected; every tag’s dimension vs. resolved unit |

Layer 1 runs each scalar body in NumPy+pint, infers the unit that falls out, and checks it against the declaration; an edge-consistency pass then confirms each producer’s unit equals its consumer’s declared expectation. The mechanics are below.

Layer 2 is offered through the unit-annotated input tree (a sibling of the ordinary input tree in which every leaf is a pint Quantity). Only the tree interface carries tags — a DataFrame column has nowhere to hang a per-column unit, so the DataFrame modes and the bare tree stay tag-free and are taken to already be in the run currency. When the mode is used every leaf must be tagged, including identifiers and other dimensionless columns (tagged dimensionless) — a full-coverage contract that is what lets the dimension check be exhaustive. The dimension check reads the extracted input units against the resolved environment units; it feeds no node, so it adds no back-edge to the boundary and needs no declared unit threaded through processed_data. Symmetrically, the unit-annotated result tree relabels each output leaf with its precise run-currency unit (euro/month, not the agnostic CURRENCY_FLOW) — pure naming, since results are already computed in the run currency.

How the dry-run checks one body. The check runs the function body, but with units in place of numbers. Each input becomes a stand-in carrying its resolved unit and a throwaway magnitude of 1; pint carries the units through the body’s arithmetic, and the unit that falls out of the return is compared to the declaration. Because the magnitude is never used, no real data is needed:

@policy_function(unit=Unit.CURRENCY_FLOW)  # -> CURRENCY / month
def betrag_m(
    einkommen_m: float, satz: float, mindestbetrag_m: float, befreit: bool
) -> float:
    if befreit:
        return 0.0
    if einkommen_m > mindestbetrag_m:
        return einkommen_m * satz
    return mindestbetrag_m

Here einkommen_m and mindestbetrag_m enter as EUR/month, satz as a dimensionless 1, and befreit as a boolean stand-in. einkommen_m * satz is a flow times a dimensionless number, so it stays EUR/month — matching the declaration; the mindestbetrag_m arm matches too.

Every branch is covered, by re-running. To evaluate if befreit: Python needs a yes/no, but a unit stand-in has no value to compare. So the stand-in intercepts the truth test itself (Python’s __bool__) and hands it to a small driver — the path explorer — that decides which way to go, re-running the body and steering the open branches differently each time (a depth-first walk of the decision tree, in the style of concolic execution) until every reachable path is driven. The number of runs is the number of reachable paths, not 2^n: when befreit is true the body returns before the income test, so that comparison never even becomes a question. Each run’s result is checked on its own, so a unit slip on a single arm — say, returning a yearly figure where _m was declared — is caught even though the other arms are clean. A return 0.0 arm yields a dimensionless result and falls back to the declaration, so the ubiquitous if befreit: return 0.0 guard never raises a false alarm.
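The re-running scheme can be sketched in a few lines. This is an illustration of the idea, not the framework's path explorer: `Explorer`, `StandIn`, and `explore_paths` are hypothetical names, the stand-ins carry no units, the income comparison is represented by a stand-in created directly, and each arm returns a label instead of a quantity.

```python
class Explorer:
    """Replays a planned prefix of decisions, then answers True for new ones."""

    def __init__(self, plan):
        self.plan = list(plan)
        self.taken = []

    def decide(self) -> bool:
        index = len(self.taken)
        choice = self.plan[index] if index < len(self.plan) else True
        self.taken.append(choice)
        return choice


class StandIn:
    """Value-free placeholder whose truth test is routed to the explorer."""

    def __init__(self, explorer):
        self._explorer = explorer

    def __bool__(self):
        return self._explorer.decide()


def explore_paths(body):
    """Run body once per reachable path; return (decisions, result) pairs."""
    results, plan = [], []
    while True:
        explorer = Explorer(plan)
        result = body(lambda: StandIn(explorer))
        results.append((tuple(explorer.taken), result))
        # Depth-first backtracking: drop exhausted decisions, flip the last True.
        plan = list(explorer.taken)
        while plan and plan[-1] is False:
            plan.pop()
        if not plan:
            return results
        plan[-1] = False


def betrag_m(new_standin):
    befreit = new_standin()
    above_minimum = new_standin()  # stands for einkommen_m > mindestbetrag_m
    if befreit:
        return "0.0"
    if above_minimum:
        return "einkommen_m * satz"
    return "mindestbetrag_m"


paths = explore_paths(betrag_m)
```

For this body the explorer performs three runs rather than four: once `befreit` is answered with true, the second truth test is never asked, so that branch of the decision tree has no children to visit.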

What the dry-run catches:

  • a body whose inferred unit disagrees with its declaration, on any reachable branch — a stock times a per-year rate labelled as a stock, or a _m flow returned where _y is declared;

  • an addition or subtraction of two non-equivalent quantities — a monthly flow plus a yearly one (betrag_m + freibetrag_y), or a stock plus a flow. At run time the assembled DAG computes on bare arrays with no pint, so such a combination is unit-blind and silently wrong; the dry-run rejects it rather than letting pint’s build-time auto-conversion of same-dimension operands paper over it;

  • an ordering comparison (<, <=, >, >=) of two non-equivalent quantities;

  • a missing unit, and malformed declarations: a flow token without a period, a currency-agnostic token on a parameter, disagreeing period sources, or a boolean node carrying a concrete unit.
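The first two items of this list can be illustrated with a small unit stand-in that tracks exponents and refuses to add non-identical units. This is a standard-library sketch of the behaviour described, not the pint-based check itself; `Q` and `check_body` are hypothetical names, and magnitudes are dropped entirely.

```python
class Q:
    """Stand-in quantity: exponents of base units, no magnitude."""

    def __init__(self, **exponents):
        self.unit = {k: v for k, v in exponents.items() if v != 0}

    def _combine(self, other, sign):
        other_unit = other.unit if isinstance(other, Q) else {}
        keys = set(self.unit) | set(other_unit)
        return Q(**{k: self.unit.get(k, 0) + sign * other_unit.get(k, 0) for k in keys})

    def __mul__(self, other):
        return self._combine(other, +1)

    __rmul__ = __mul__

    def __truediv__(self, other):
        return self._combine(other, -1)

    def __add__(self, other):
        # Strict: a bare number counts as dimensionless; no auto-conversion.
        other_unit = other.unit if isinstance(other, Q) else {}
        if self.unit != other_unit:
            raise TypeError(f"cannot combine {self.unit} and {other_unit}")
        return Q(**self.unit)

    __radd__ = __sub__ = __rsub__ = __add__


def check_body(body, inputs: dict, declared: Q) -> None:
    """Run body on stand-ins and compare the inferred unit to the declaration."""
    result = body(**inputs)
    inferred = result.unit if isinstance(result, Q) else {}
    # A dimensionless result falls back to the declaration.
    if inferred and inferred != declared.unit:
        raise TypeError(f"body yields {inferred}, declared {declared.unit}")


MONTHLY = Q(currency=1, month=-1)
YEARLY = Q(currency=1, year=-1)
```

With these definitions `MONTHLY * 0.5` keeps its unit, whereas `MONTHLY + YEARLY` and `MONTHLY + 100.0` both raise, and a body returning `MONTHLY` under a `YEARLY` declaration is rejected by `check_body`.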

What it cannot catch:

  • wrong magnitudes — a coefficient, rate, or constant with the correct unit but the wrong value (a 2.5% rate written as 0.25); units are not values;

  • a result that comes out dimensionless — a body inferring a dimensionless value (an early return 0.0, or arithmetic that cancels) falls back to the declaration;

  • equality comparisons: == and != are not unit-screened, unlike ordering.

A body the dry-run cannot evaluate must opt out explicitly. The dry-run executes a scalar body symbolically, so it cannot trace vectorized functions (vectorization_strategy="not_required", which have no scalar form for pint to walk), piecewise polynomials and lookup tables (evaluated by table machinery), or bodies calling join or a raw xnp op. Rather than silently trusting such a body, the check rejects it unless the author marks it verify_units=False on the decorator. The opt-out covers body inference only: the declared unit still stands and the body’s edges are still checked. Because the flag is explicit, every unverified body is greppable. That is a deliberate choice, and it leaves a ready-made worklist should the dry-run later learn to evaluate these operations.

Auto-generated nodes

Auto-generated nodes receive auto-assigned units: time-conversion variants inherit the source’s base token and read the variant’s period off its own suffix; auto-aggregations derive their token from the source and the aggregation type, paralleling how GEP 4 resolves their types. SUM/MEAN/MIN/MAX preserve the source token; COUNT is a head count and is DIMENSIONLESS; ANY/ALL yield a boolean (a dimensionless quantity) and so auto-assign DIMENSIONLESS (as does a SUM over a boolean column — a head count). A @group_creation_function group id is auto-assigned DIMENSIONLESS (it is an identifier, and the decorator exposes no unit=). Where the source’s token pins down a concrete currency (a parameter), the derived node inherits the agnostic counterpart.
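The aggregation rules can be condensed into a small lookup. `aggregated_token` is a hypothetical helper, and the `currencies` default simply lists the two currencies gettsim registers; the sketch works on token strings rather than on the framework's node objects.

```python
def aggregated_token(source_token: str, agg: str, currencies=("EUR", "DM")) -> str:
    """Derive the token of an auto-aggregation from its source and type."""
    agg = agg.upper()
    if agg in {"COUNT", "ANY", "ALL"}:
        # Head counts and booleans carry no dimension.
        return "DIMENSIONLESS"
    if agg not in {"SUM", "MEAN", "MIN", "MAX"}:
        raise ValueError(f"unknown aggregation {agg!r}")
    # A concrete currency on the source is replaced by its agnostic counterpart.
    for name in currencies:
        if source_token == name or source_token.startswith(name + "_"):
            return "CURRENCY" + source_token[len(name):]
    # Otherwise the source token is preserved; a SUM over a boolean column
    # therefore stays DIMENSIONLESS, because its source already is.
    return source_token
```

No special case is needed for a SUM over a boolean column: the source is already `DIMENSIONLESS`, so preserving the token gives the head-count result described above.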

Literals

The dry-run executes a body on representative Quantity objects, so a bare numeric literal combined additively with a unit-carrying value raises (pint refuses to add a dimensionless number to a currency). A literal that is only a multiplicative factor (betrag * 0.5) is fine, because multiplying by a dimensionless number preserves the unit.

Most apparent cases dissolve once the quantities are declared correctly: an ordinal such as geburtsmonat (the month 1–12) is DIMENSIONLESS, so geburtsmonat - 1 is dimensionless arithmetic and needs no tag. For a genuine constant of a real dimension, either promote it to a parameter (the norm — it then gets the same provenance, currency conversion, and checking as any other parameter, and the body becomes dry-runnable), or opt the body out with @policy_function(verify_units=False) for genuine code-level constants where a parameter would be artificial (the same body-level opt-out as above).

Implementation

Delivered as several PRs, with the framework proven on mettsim before any German annotation. The tracking issues are:

  • ttsim #117 — framework core + tracer bullet

  • ttsim #118 — full dimension model

  • ttsim #119 — mandatory units + edge-consistency

  • ttsim #120 — currency knob + Layer-2 boundary

  • ttsim #121 — annotate mettsim, switch check on, CI test

  • gettsim #1191 — register EUR/DM

  • gettsim #1192 — gettsim rollout

Each package’s params schema enumerates its own token vocabulary: the core tokens minus the agnostic currency tokens (the schema governs parameters, which must be concrete) plus the concrete variants of that package’s registered currencies. It also enforces the unit: XOR input_unit:/output_unit: split per parameter type and admits the per-entry overrides in dated entries. The schema shipped with ttsim (listing mettsim’s CASTAR_*/SILVER_PENNY_* tokens) is the template; the copy at docs/geps/params-schema.json (the validation target for all German parameter YAMLs) is migrated together with the YAML files in #1192, adding the DM_*/EUR_* tokens.

Alternatives

Runtime pint Quantities flowing through the DAG

Rejected. Quantity is not a JAX pytree, breaks tracing, contradicts the GEP-9 column vocabulary, and adds hot-path cost. Units in a tax-transfer model are static structural properties of nodes, not of data, so runtime wrapping buys nothing the build-time check does not already provide.

Inference-only (no declared units)

Rejected in favour of mandatory declarations. Inference alone localises a bug only where dimensions clash downstream; a mandatory declared return unit localises it at the offending function and is self-documenting, at the cost of annotation churn the codebase largely already absorbs for types.

Keep hand-written time conversions; use pint only for checks

Possible, but the stock/flow duality is exactly what a unit engine encodes for free. Sourcing the factors from pint removes a class of hand-maintained arithmetic without touching the naming.

A [count] dimension for head counts

Considered, prototyped, and rejected. An earlier draft promoted counting quantities to a custom [count] dimension, making per-person parameters CURRENCY / count and head counts count. The intended payoff was catching a forgotten per-capita scaling. It was dropped because:

  • the protection is weaker than it looks: a single generic [count] cannot distinguish per-child from per-adult from per-household, so scaling by the wrong count still type-checks — only the forgot-entirely case is caught;

  • the annotation tax lands on every per-capita parameter in the system (Regelsätze, Kindergeld, Freibeträge, …), which would read CURRENCY / count where the law and every practitioner say “Euros per month”;

  • SI and pint treat counting quantities as dimensionless; deviating surprises anyone who knows either.

The accepted cost is that a missing per-capita scaling is no longer a unit error. If that bug class accumulates in practice, the closed token vocabulary makes a future amendment with genuinely distinct dimensions ([person], [child], …) a clean retrofit.

Make functions time-agnostic

Rejected. Collapsing betrag_m and betrag_y into one node would erase the law-to-code correspondence GEP 1 is built on.

Discussion

(Open. To be resolved on Zulip.) Known points for debate: the strictness of literal tagging; whether per-capita scaling should ever get dedicated dimensions (see the rejected [count] alternative — revisit if missing-scale bugs accumulate); and whether the gettsim rollout should be a single large PR or staged behind a temporary gate.

References and Footnotes