The Greedy Trap of ReAct
The prevailing paradigm for AI Agents is ReAct (Reason + Act). Developers attempt to teach models how to think via prompting (e.g., "Think step by step"). This is like trying to teach water how to flow downward via verbal instructions—it ignores the underlying dynamics.
Current LLM agents suffer from a fatal flaw: they are greedy. When given a goal ("Fix the bug in auth.py"), they default to executing actions that appear to directly solve the task. They write code immediately instead of running tests, reading logs, or gathering information.
At the micro-scale (RN-004), this is known as Trajectory Collapse. At the macro-scale, this is Premature Exploitation. It is the fundamental reason why frontier models hit a performance ceiling on complex engineering tasks like SWE-bench: they lack a mechanism for directed exploration.
Good engineers gather information before acting. Not because they were prompted to "explore first," but because their cognitive architecture follows deeper thermodynamic laws.
Information-Directed Engine
We discarded prompt-based heuristics and introduced an Information-Directed Engine with Entropy-Gated Control. The system no longer "blindly chooses the next action"; it "selects the action that maximizes overall objective utility."
The core control equation combines exploration and execution in a single objective:
Objective = Task Utility + Information Gain
- Task Utility: How much does this action advance my goal? (e.g., writing code, submitting a fix)
- Information Gain: How much does this action reduce the state entropy/uncertainty? (e.g., reading files, running tests, global searching)
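The objective above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's actual implementation: the `Action` class, the hand-picked utilities, and the single predicted posterior per action (standing in for an expectation over possible observations) are all hypothetical. Uncertainty is modeled as Shannon entropy over a belief about which of four files contains the bug.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def entropy_bits(p: Sequence[float]) -> float:
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


@dataclass
class Action:
    # Hypothetical container; the field names are illustrative only.
    name: str
    task_utility: float                   # how far the action advances the goal
    expected_posterior: Sequence[float]   # belief we expect to hold afterwards


def objective(action: Action, belief: Sequence[float]) -> float:
    """Objective = Task Utility + Information Gain (entropy reduction)."""
    info_gain = entropy_bits(belief) - entropy_bits(action.expected_posterior)
    return action.task_utility + info_gain


def select_action(actions: Sequence[Action], belief: Sequence[float]) -> Action:
    """Pick the action with the highest combined objective."""
    return max(actions, key=lambda a: objective(a, belief))


# Unfamiliar codebase: the bug could be in any of four files.
uncertain = [0.25, 0.25, 0.25, 0.25]
candidates = [
    Action("write_patch", 1.0, uncertain),                  # learns nothing
    Action("run_tests", 0.1, [0.97, 0.01, 0.01, 0.01]),     # localizes the bug
]
choice_uncertain = select_action(candidates, uncertain).name

# After exploration: the belief is already sharp, so testing adds little.
confident = [0.97, 0.01, 0.01, 0.01]
candidates_late = [
    Action("write_patch", 1.0, confident),
    Action("run_tests", 0.1, confident),
]
choice_confident = select_action(candidates_late, confident).name
```

With the uniform belief, `run_tests` wins because its entropy reduction outweighs the patch's task utility; once the belief is sharp, the information-gain term vanishes and `write_patch` wins.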
When the system faces an unfamiliar codebase (high uncertainty), the weight of the Information Gain dominates. The engine forces the agent to suspend execution and engage in "Epistemic Foraging."
As information is gathered, the total system entropy drops. When uncertainty drops below a critical threshold (Entropy-Gating), the maximization objective naturally shifts toward task utility, and the Agent automatically transitions into "execution" mode. No fragile if-else rules are needed. The timing of intelligence emerges from information-theoretic variables.
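One way to picture this transition is a smooth gate that weights the two terms by the current entropy. The sketch below is an assumption for illustration only: the sigmoid form, the `threshold` and `sharpness` parameters, and the fixed `(task_utility, info_gain)` pairs are hypothetical and are not taken from the underlying paper.

```python
import math


def gate(entropy_bits: float, threshold: float = 1.0, sharpness: float = 4.0) -> float:
    """Hypothetical smooth gate in (0, 1): near 1 above the threshold, near 0 below."""
    return 1.0 / (1.0 + math.exp(-sharpness * (entropy_bits - threshold)))


def gated_objective(task_utility: float, info_gain: float, entropy_bits: float) -> float:
    """Weight information gain by the gate and task utility by its complement."""
    w = gate(entropy_bits)
    return (1.0 - w) * task_utility + w * info_gain


def preferred_mode(entropy_bits: float,
                   explore=(0.1, 1.0),
                   execute=(1.0, 0.0)) -> str:
    """Compare an illustrative exploring action with an illustrative executing one.

    Each tuple is (task_utility, info_gain); the values are made up.
    """
    s_explore = gated_objective(explore[0], explore[1], entropy_bits)
    s_execute = gated_objective(execute[0], execute[1], entropy_bits)
    return "explore" if s_explore > s_execute else "execute"


# Entropy falling as the agent gathers information.
trace = [preferred_mode(h) for h in (2.0, 1.5, 0.5, 0.2)]
```

In this toy trace the preferred mode flips from exploring to executing as the entropy crosses the threshold, with no mode-switching rule written explicitly; the switch comes from the argmax over the gated objective.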
The Decisive Experiment: SWE-bench Lite
We validated this engine on SWE-bench Lite, a widely used code-repair benchmark. In these runs, Entropy-Gated Control outperformed both the ReAct and Greedy baselines on each base model:
| Base Model | Cognitive Architecture | Resolve Rate (Pass@1) | Relative Gain |
|---|---|---|---|
| Claude Sonnet 4.6 | Entropy-Gated (Ours) | 36.6% | +22% (vs ReAct) |
| Claude Sonnet 4.6 | ReAct | 30.0% | |
| Claude Sonnet 4.6 | Greedy | 30.0% | |
| Gemini 3.1 Flash Lite | Entropy-Gated (Ours) | 26.6% | +166% (vs ReAct) |
| Gemini 3.1 Flash Lite | Greedy | 23.3% | |
| Gemini 3.1 Flash Lite | ReAct | 10.0% | |
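
The Relative Gain column can be reproduced from the resolve rates as (ours - baseline) / baseline. A minimal check, using only the figures in the table (the helper name `relative_gain` is introduced here for illustration):

```python
def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# Resolve rates (Pass@1, in percent) copied from the table.
claude_vs_react = round(relative_gain(36.6, 30.0))   # 22
gemini_vs_react = round(relative_gain(26.6, 10.0))   # 166
```

The much larger relative gain for the smaller model reflects its low ReAct baseline (10.0%); the absolute gains are 6.6 and 16.6 percentage points respectively.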
Conclusion: The intelligence ceiling of an agent is determined not only by the parameter count of its base model but also by its Epistemic Architecture. Information-directed mechanics mathematically enforce the exploration phase, avoiding the "greedy trap" that plagues the ReAct paradigm.
The Ultimate Unification: Cognition = Control
With this, the theoretical framework of the Trajectory Observatory comes full circle.
In RN-004, we balanced expansion (exploration operators) and contraction (stabilization operators) at the neuronal scale (residual stream) using DOM. In RN-005, we balance expansion (epistemic exploration) and contraction (pragmatic exploitation) at the macro-behavioral scale (tool usage) using the Information-Directed Engine.
These are fractal manifestations of the exact same physical phenomenon across different scales. This leads to the final proposition of Unified Cognitive Control Theory (U-CCT):
Cognition is not a property of models — it is a property of controlled dynamical systems.
The key to solving runaway AI is not adding more parameters, but introducing the right physics engine.
Key takeaway: The intelligence ceiling of an AI Agent is dictated by its Epistemic Architecture. By abandoning static prompts in favor of an Information-Directed Engine, we can mathematically enforce when an agent explores and when it acts, shattering the performance ceilings on complex tasks like SWE-bench.
This note draws on empirical results from "Information-Directed Control for LLM Agents: Epistemic Foraging via Entropy-Gating" (Haelio Tang, 2026).