SYS/03 — ARCHITECTURE RESEARCH
ARIA
Recurrent reasoning architecture with adaptive pondering, grafted onto a frozen transformer backbone.
- TOTAL: ~146M params
- TRAINABLE: ~22M
- BACKBONE: frozen GPT-2
- PRECISION: bfloat16
ARIA (Adaptive Recurrent Intelligence Architecture) tests a single thesis: that a recurrent reasoning loop over frozen transformer knowledge can add test-time reasoning depth without retraining the backbone.
ARCHITECTURE
A frozen GPT-2 Small (124M parameters) acts as the knowledge store. A trained Recurrent Reasoning Core (~20M) iterates over it step by step, and a Halting Controller (~2M) decides when to stop thinking. That comes to roughly 146M parameters in total (124M + 20M + 2M), of which only the ~22M in the core and the controller are trainable, about 15% of the model. The design borrows a loose brain-region mapping: neocortex, prefrontal cortex, basal ganglia.
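The loop described above can be illustrated with a toy sketch. It is not ARIA's implementation: the halting rule is an assumption (accumulate a per-step halting probability until it reaches a threshold, in the style of adaptive computation time). The helpers `frozen_backbone`, `reasoning_step`, `halt_probability`, and `ponder` are hypothetical stand-ins for the three components, written in plain Python so the control flow is easy to follow.

```python
import math


def frozen_backbone(token_ids):
    """Stand-in for the frozen knowledge store: a fixed, non-trainable feature map."""
    return [math.sin(t) for t in token_ids]


def reasoning_step(state, features):
    """One pass of the toy recurrent core: blend the state with backbone features."""
    return [math.tanh(0.5 * s + 0.5 * f) for s, f in zip(state, features)]


def halt_probability(state, prev_state, sharpness=10.0):
    """Toy halting controller (assumed rule): confidence rises as the state settles."""
    delta = max(abs(s - p) for s, p in zip(state, prev_state))
    return math.exp(-sharpness * delta)


def ponder(token_ids, max_steps=8, eps=0.01):
    """Iterate the core until accumulated halting probability reaches 1 - eps.

    Assumes a non-empty token_ids and max_steps >= 1.
    Returns (output, steps_used, weights), where output is the
    halting-weighted average of the intermediate states.
    """
    features = frozen_backbone(token_ids)
    state = [0.0] * len(features)
    output = [0.0] * len(features)
    weights = []
    cumulative = 0.0
    for step in range(1, max_steps + 1):
        prev_state, state = state, reasoning_step(state, features)
        p = halt_probability(state, prev_state)
        last = step == max_steps or cumulative + p >= 1.0 - eps
        # The final step takes the leftover probability mass, so weights sum to 1.
        w = 1.0 - cumulative if last else p
        weights.append(w)
        output = [o + w * s for o, s in zip(output, state)]
        cumulative += w
        if last:
            return output, step, weights
```

Calling `ponder([1, 2, 3])` returns the blended output, the number of steps taken, and the per-step weights. Lowering `max_steps` caps the depth no matter what the controller reports. In a real system the backbone would be the frozen GPT-2 and the other two functions would be learned modules.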
TRAINING & FINDINGS
ARIA was trained across multiple Google Colab runs in bfloat16 on ARC, GSM8K, and PIQA. The work surfaced practical lessons on tuning the ponder cost (the penalty that controls how long the loop runs) and on output-head design, both of which materially affect whether adaptive depth helps or hurts.
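To make the ponder cost concrete, here is a minimal sketch of one common formulation: a linear penalty on the expected number of loop steps, added to the task loss. The source does not give ARIA's exact loss, so the form of the penalty and the names `halting_distribution`, `expected_steps`, `total_loss`, and `ponder_weight` are all assumptions made for illustration.

```python
def halting_distribution(cond_halt_probs):
    """Turn per-step conditional halt probabilities into P(halt at step n).

    The final step absorbs any remaining mass, i.e. the loop is forced
    to stop there regardless of its own halt probability.
    """
    if not cond_halt_probs:
        raise ValueError("need at least one step")
    dist = []
    still_running = 1.0
    for lam in cond_halt_probs[:-1]:
        dist.append(still_running * lam)
        still_running *= 1.0 - lam
    dist.append(still_running)
    return dist


def expected_steps(dist):
    """Expected number of loop iterations under the halting distribution."""
    return sum(n * p for n, p in enumerate(dist, start=1))


def total_loss(task_loss, cond_halt_probs, ponder_weight):
    """Task loss plus a linear penalty on expected pondering depth (assumed form)."""
    dist = halting_distribution(cond_halt_probs)
    return task_loss + ponder_weight * expected_steps(dist)
```

With conditional halt probabilities of 0.5 at each of three steps, the halting distribution is 0.5, 0.25, 0.25 and the expected depth is 1.75 steps. Under this formulation, a large `ponder_weight` pushes the controller to halt almost immediately, while a very small one leaves little pressure to stop before the step cap. That trade-off is what makes the setting sensitive to tune.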