Observing trajectories of intelligence.

Trajectory Observatory

Intelligence may be a trajectory problem. Researching the dynamics of cognition, active inference, and open futures.

RN-001 · v0.3
Same Output, Different Trajectory
Matched-output experiment across three architectures (GPT-2, Llama-3.2-3B, Gemma-4-E2B). Truthful and hallucinated generations with identical output statistics show systematically divergent hidden-state trajectories. Decoder-only models exhibit dynamical rigidity; multimodal architectures exhibit trajectory fragmentation. The signal reverses across architectures — hallucination is not one failure mode, it's at least two.
Experiment · Observability Gap · Signal Reversal
RN-002 · v0.1
Recurrence Is Scale-Dependent
STR (Soft Topological Return) is not a single number; it is a function of the measurement window. Different temporal scales reveal different dynamics, even within the same model on the same data. This note defines the scale-selective diagnostic framework that makes trajectory-level observability precise.
Methodology · Scale Selectivity · STR
RN-003 · v0.1
Trajectory Geometry Is a Causal Degree of Freedom
STR regularization induces depth-wide geometric reorganization, with peak plasticity in intermediate layers, not at the output boundary. Bidirectional control confirmed: pushing STR up increases coherence, and pushing it down induces decoherence. Trajectory structure is not a passive consequence of training. It's steerable.
Experiment · Causal Intervention · Trajectory Steering
RN-004 · v0.1
Inference Is a Controlled Dynamical System
LLMs do not lack a "System-2" brain; they lack a gyroscope. By injecting a real-time PD controller (Dynamic Operator Mixing) into the residual stream, we achieve closed-loop inference. It stabilizes long-horizon trajectories with near-zero friction, while proving that dynamical stability and semantic intent are formally decoupled.
Theory · Closed-Loop Inference · System-2
RN-005 · v0.1
Agents Don't Need Prompts, They Need Information-Directed Control
The intelligence ceiling of an agent is dictated by its Epistemic Architecture, not just parameter count. By replacing ReAct prompting with an Information-Directed Engine, we mathematically enforce when an agent explores (Epistemic Foraging) and when it exploits, smashing performance ceilings on SWE-bench.
Theory · Information-Directed Engine · Entropy-Gated
RN-006 · v0.1
The Scale-Dependence of Cognitive Control and Topological Circuit Breakers
Micro-managing LLM trajectories (e.g., token-level PRMs) is doomed by observational uncertainty. By shifting control to the macroscopic scale, we introduce the Topological Circuit Breaker—a Cognitive Lyapunov Function based on rolling-window STR that detects hallucinatory loops in ~10 steps and physically severs divergent generation.
Theory · Topological Circuit Breaker · Domain of Validity
RN-007 · v0.1
When to Trust Your Diagnostic: The Domain of Validity for Trajectory Recurrence
Not all recurrence is meaningful. We identify two classes of diagnostic failure—correctable measurement conditions (Type I) and intrinsic structural absence (Type II)—and introduce Structured Connectivity Coherence (SCC), the property that separates genuine recurrence from artifacts. Stability ≠ Validity.
Epistemology · SCC · Type I / Type II
RN-008 · v0.1
Some Things Are Forever Unknowable: The Information-Theoretic Limits of Trajectory Recurrence Identification
Using Girsanov's Theorem and Le Cam's Inequality, we establish the absolute limits of trajectory recurrence identification, proving a quadratic scaling law for the required observation time. Passive safety has an event horizon.
Information Theory · Recurrence Identification · Le Cam's Inequality
RN-009 · v0.1
Uncertainty Can Only Be Redistributed: The Intervention Uncertainty Law
We prove that active intervention under epistemic uncertainty incurs a fundamental tradeoff. No control policy can simultaneously minimize missed rescues, over-interventions, and energy. All systems collapse to a parameter-free universal curve.
Active Inference · Intervention Law · Universal Collapse
X-007
Not All "Recurrence" Is Trustworthy — Where Does Your Diagnostic Fail?
Your STR diagnostic can produce maximally stable readings while stably measuring something that does not exist. We introduce Structured Connectivity Coherence (SCC) — the legitimacy certificate for the entire STR framework.

From RN-001 through RN-006, we've showcased a series of remarkable capabilities of Soft Topological Return (STR): detecting hallucination trajectory divergence, driving closed-loop inference control, and tripping a topological circuit breaker within ~10 steps.

But there's a serious question we've never directly addressed: how do you know the STR values aren't deceiving you?

The most insidious trap we uncovered: a set of points drawn i.i.d. from a Gaussian mixture and arranged in arbitrary order, so with zero dynamics, produces maximally stable, high STR measurements (coefficient of variation at convergence, CV ≈ 0). If you rely solely on "measurement stability" to judge whether a signal is trustworthy, you will be perfectly deceived.
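
The trap is easy to reproduce in miniature. The sketch below is an illustration under stated assumptions, not the STR estimator from RN-007: `soft_recurrence` is a hypothetical stand-in that averages a Gaussian kernel over all pairs of points, and the mixture centers, spread, `sigma`, and sample sizes are arbitrary demo values. Because this stand-in never looks at time order, shuffling the sequence leaves its reading unchanged, and repeated i.i.d. draws give nearly identical values.

```python
import math
import random
import statistics

def sample_mixture(n, rng):
    """Draw n i.i.d. 2-D points from a two-component Gaussian mixture (no dynamics)."""
    centers = [(-2.0, 0.0), (2.0, 0.0)]  # assumed component means
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centers)
        points.append((rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)))
    return points

def soft_recurrence(points, sigma=0.5):
    """Hypothetical stand-in score: mean Gaussian kernel over all distinct pairs."""
    total, count = 0.0, 0
    for i in range(len(points)):
        xi, yi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += math.exp(-d2 / (2.0 * sigma ** 2))
            count += 1
    return total / count

rng = random.Random(0)

# Stability check: the reading barely moves across independent draws.
scores = [soft_recurrence(sample_mixture(200, rng)) for _ in range(20)]
cv = statistics.stdev(scores) / statistics.mean(scores)

# Validity check: shuffling the time order changes nothing, so the
# stable reading cannot be reflecting any dynamics.
points = sample_mixture(200, rng)
shuffled = points[:]
rng.shuffle(shuffled)
gap = abs(soft_recurrence(points) - soft_recurrence(shuffled))

print(f"mean score={statistics.mean(scores):.3f}  CV={cv:.4f}  shuffle gap={gap:.2e}")
```

A low CV together with a shuffle gap at floating-point noise level is the pattern to watch for: the reading is stable, yet nothing about the ordering of the points contributed to it.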

We rigorously classify diagnostic failures into two types: Type I (measurement-correctable) and Type II (structurally intrinsic). Type I means you misconfigured the instrument — adjust the parameters and it works. Type II means the underlying system has no recurrence structure whatsoever — no amount of tuning will extract signal from a phantom.

Chaotic systems (e.g., the Lorenz attractor) constitute an independent boundary case: structure exists, but is incompatible with fixed-scale measurement.

From this analysis, a structural property naturally emerges: when the system's trajectory maintains a persistent connected component under its own dynamics — we term this Structured Connectivity Coherence (SCC) — recurrence measurements become trustworthy.

Core conclusion: Stability ≠ Validity. Your instrument can produce maximally stable readings while stably measuring something that does not exist.

Dive into the Type I/II classification and SCC validity conditions: see RN-007 on the Trajectory Observatory.

X-006
Stop Micro-Managing LLMs. You Need a "Topological Circuit Breaker"
Mainstream AI alignment methods are trapped in a physical dead end: trying to "micro-manage" intelligence at the token level. By monitoring the topology of the current instead of the water droplets, we can instantly trip a circuit breaker the moment a hallucination begins.

Mainstream AI alignment methods (like Process Reward Models or token-level Chain-of-Thought monitoring) are trapped in a physical dead end: trying to "micro-manage" intelligence at the token level. This is as computationally doomed as trying to predict ocean currents by tracking the Brownian motion of individual water molecules. It faces extreme observational uncertainty.

We proved this with a "failed" experiment. In early tests, we attempted to deploy the "Information-Directed Engine" to highly microscopic tactical search tasks (Formal Theorem Proving). The result: Complete failure. At the micro-scale of single-step deduction, the signal-to-noise ratio is abysmal, and the reasoning trajectory is dominated by Markovian noise. Forcing information-theoretic interventions at this scale actually disrupted the model's highly efficient greedy pattern matching.

Yet, why did the exact same engine achieve SOTA results on brutally complex codebase repairs (SWE-bench, RN-005)? The answer: Cognitive control is strictly bound by a "Scale-Dependence" and a "Domain of Validity".

Microscopic, short-range predictions (writing a line of code, proposing a math tactic) must be delegated to the greedy autoregressive generation of the LLM. However, macroscopic, long-range decisions (when to forage for information, when to break an infinite loop) must be governed by an external dynamical control engine.

If you can't micro-manage, how do you prevent hallucinatory collapse? Today, we introduce a fundamentally new runtime safeguard: The Topological Circuit Breaker.

We don't interfere with the "water droplets"; instead, we monitor the topological geometry of the "current". We run a Cognitive Lyapunov Function in the background, based on a rolling-window Soft Topological Return (STR). The moment an LLM's thought trajectory gets trapped in a "limit cycle (repetition loop)" or "diverges (hallucination)", the derivative of this function exhibits a sharp level shift within just ~10 time steps.

Fascinatingly, our ablation studies show that simply applying a temporal window (Scale-Selectivity) gives this topological monitoring a 10x boost in separating genuine reasoning from noisy hallucinations.

Instead of waiting for the model to finish generating paragraphs of nonsense, the system instantly trips the "circuit breaker", physically severing the generation stream. It then suspends the Agent, forcing it back into "Epistemic Foraging" to gather new evidence and reshape its state space.
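
The monitoring loop described above can be sketched as a level-shift detector on a rolling window. This is a minimal sketch under assumptions, not the Cognitive Lyapunov Function from RN-006: `first_trip`, the `window` and `threshold` values, and the synthetic monitor signal are all hypothetical, and any per-step scalar (such as a rolling-window STR reading) could be fed in instead.

```python
from statistics import mean

def first_trip(scores, window=10, threshold=0.3):
    """Return the first index where the rolling mean differs from the mean of
    the preceding window by more than `threshold`, or None if it never does."""
    for i in range(2 * window - 1, len(scores)):
        recent = scores[i - window + 1 : i + 1]
        previous = scores[i - 2 * window + 1 : i - window + 1]
        if abs(mean(recent) - mean(previous)) > threshold:
            return i  # the caller would stop generation here
    return None

# Synthetic monitor signal (assumed values): a stable regime for 50 steps,
# then an abrupt shift standing in for a loop or a divergence.
shift_at = 50
stream = [0.2] * shift_at + [0.9] * 50

trip_at = first_trip(stream)
print(f"shift at step {shift_at}, breaker trips at step {trip_at}")
```

In this toy setup the breaker trips within one window of the shift, and a signal that never changes level returns `None`. A real controller would call the detector incrementally during decoding and hand control back to the foraging phase when it fires.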

Intelligence is not merely a product of massive parameter counts; it is an emergent property of controlled dynamical systems operating at the correct scale.

Dive deep into the Intervention Uncertainty Law and rolling-window STR regime tracking: see RN-006 on the Trajectory Observatory.

X-005
Agents Don't Need Prompts, They Need Physics
The ReAct paradigm is a greedy trap. Cognitive control emerges naturally when you optimize for task utility combined with information gain, forcing the agent to forage when uncertainty is high.

Everyone building AI Agents is making the same mistake: trying to teach a model how to think using prompts (like "Think step by step"). This is as foolish as trying to teach water to flow downward using verbal instructions. You don't teach it; you give it a gravitational field.

The biggest failure of LLM agents is "Premature Exploitation". When they see a bug, they greedily jump straight into writing code, rather than running tests, reading logs, or foraging for information. This macro-level greed is the exact same physical phenomenon as micro-level trajectory collapse (hallucination).

Today, we are completely obsoleting the ReAct paradigm by introducing an Information-Directed Engine and Entropy-Gated Control into the Agent.

The Agent no longer blindly follows a prompt; it maximizes an information-theoretic objective:
Objective = Task Utility + Information Gain

  • When state entropy (uncertainty) is sky-high, the engine "forces" the Agent to suspend execution and engage in "Epistemic Foraging" (exploration).
  • As information accumulates, system entropy drops. When uncertainty falls below a critical threshold (Entropy-Gating), the Agent automatically switches to "Execution" mode.
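
The gating logic in the two bullets above can be sketched in a few lines. This is a hedged illustration, not the engine from RN-005: the belief distributions, the `gate` threshold, the action names, and the (utility, information gain) estimates are all hypothetical values chosen for the demo.

```python
import math

def entropy_bits(belief):
    """Shannon entropy (in bits) of a discrete belief distribution."""
    return -sum(p * math.log2(p) for p in belief if p > 0)

def choose_mode(belief, gate=1.0):
    """Entropy gate: forage while uncertainty exceeds `gate`, otherwise execute."""
    return "forage" if entropy_bits(belief) > gate else "execute"

def best_action(candidates):
    """Pick the action maximizing Task Utility + Information Gain."""
    return max(candidates, key=lambda name: sum(candidates[name]))

# Assumed beliefs over four hypotheses about where a bug lives.
uncertain = [0.25, 0.25, 0.25, 0.25]   # nothing known yet
confident = [0.97, 0.01, 0.01, 0.01]   # evidence has accumulated

# Hypothetical (task utility, information gain) estimates per action.
candidates = {
    "write_patch": (0.6, 0.0),
    "run_tests": (0.1, 0.9),
    "read_logs": (0.05, 0.7),
}

print(choose_mode(uncertain), choose_mode(confident), best_action(candidates))
```

With a uniform belief the gate keeps the agent foraging, and once the belief concentrates it switches to execution. Under the additive objective, an action with low immediate utility but high information gain (here `run_tests`) can outrank writing the patch directly.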

The timing of "when to look" and "when to act" emerges naturally from information theory. We ran a decisive test on the brutal SWE-bench Lite:

Equipped with the Information-Directed Engine, Claude Sonnet 4.6 achieved a stunning 36.6% Pass@1 (a 22% relative gain over standard ReAct). For the smaller Gemini 3.1 Flash Lite, Pass@1 skyrocketed by 166%! Interestingly, the smaller model's bare "Greedy" baseline beat ReAct by 2x, further exposing the limitations of static prompting paradigms in complex tasks.

The intelligence ceiling of an agent is not dictated by the parameter count of the base model, but by its "Epistemic Architecture".

From micro residual stream control (RN-004) to macro agentic entropy-gated control (RN-005), we are proving one unified truth: Cognition is not a property of models — it is a property of controlled dynamical systems. See RN-005.

X-000
Intelligence as a Controlled Dynamical System
Intelligence is not just a parameter scaling problem. It's a physics problem. We are observing the trajectories of cognition to map out the thermodynamic laws governing reasoning and active inference.

In 2026, something strange happened in the AI world. Anthropic built Claude Mythos, and then held it back. No flashy product launch as a new flagship, no enterprise API tier. Instead, it was deployed as an internal capability profile to dramatically extend Claude's performance over long time horizons. Tasks that used to collapse at 3 hours suddenly held for 10, then a full day.

Everyone was asking: how much smarter is it? But almost no one asked: why would a company that built something this powerful choose not to release it?

Here's a possibility that keeps me up at night: maybe it didn't change the system's single-step ceiling, but its long-range "trajectory". When you enter territory where the control layer hasn't caught up, shipping raw capability without the architecture to contain it isn't innovation — it's detonation.

We've been watching the same pattern across every major agent framework. Give a frontier model a 20-second task: brilliant. Give it a 3-hour task: it drifts, planning dissolves, context contaminates itself, hallucinations compound, and the search space collapses into local loops. Intelligence kept scaling. Stability? It didn't.

Think of it this way: we kept upgrading the engine — from 100 horsepower to 1,000. But we forgot to upgrade the brakes, the steering, the suspension. Then we wonder: why does the car flip at high speed? Memory, planning, tool calling, orchestration logic — the existing scaffolds were all designed for weaker models. Models that needed help, models that could be corralled.

  • Before: model capability < scaffold capacity.
  • Now: model capability > scaffold capacity.

The smarter they get, the harder they are to control. Not because intelligence is failing, but because the "control layer" is failing. We are used to measuring intelligence as a static property: benchmarks, accuracy, pass rates. But what if the kind of intelligence that matters for autonomous agents isn't a property at all, but trajectory stability?

Then the most urgent question isn't: can the model produce a more correct next token? It's: can the model maintain a coherent path through a long, uncertain, branching problem space without degrading?

If this is right, the next scaling law may not be model scaling. It may be control scaling. Not bigger models, but better trajectory orchestration, uncertainty gating, state purification, and phase control.

We've been optimizing the atom. Maybe the real object we need to see is the orbit.

v0.3 · Core Framework
Trajectory Physics of Intelligence
Intelligence reframed as a trajectory problem rather than a parameter problem. A system's capability is bounded by the topological stability of its generative manifold. Recurrence structure carries non-redundant information invisible to output-level metrics.
v0.2 · Control Theory
Information-Directed Control & Phase Transitions
Control is not about predicting the next token, but steering the entire trajectory. Thermodynamic principles applied to trajectory-level autonomy scaling.
v0.1 · Concepts
Future Thickness & Question Ecology
Advanced systems protect open futures rather than minimizing error. The Markov blanket of a civilization is not isolation, but a high-entropy generation zone.
Can trajectory stability explain autonomy scaling?
Does recurrence predict long-horizon collapse?
What is the Markov blanket of a civilization?
Research Queue & Open Responses
Add a perturbation. Challenge this trajectory. What blind spot am I missing?
Please format your responses using one of the following prefixes to enter the Pending Research Queue:
  • Observation: What unexpected dynamics did you see?
  • Challenge: Could this be a memory scaling artifact instead of trajectory control?
  • Related idea: What stranger adjacency does this connect to?
  • Open question: What new door does this observation open?
wutai@haelio.cc