The Question RN-001 Left Open

RN-001 showed that output summaries are structurally blind to trajectory dynamics. The natural follow-up: if outputs can't see what matters, what instrument can?

The answer isn't "a better metric." It's a different class of measurement — one that operates on the trajectory itself, not on its projections.

But this immediately creates a new problem: what does it mean to "measure" a trajectory?

The Measurement Problem

A trajectory is a sequence of states in high-dimensional space — hidden-state vectors at each layer and token position. It's a curve. The question is: what property of that curve tells you whether the system is healthy?

The obvious answer — just measure how often states repeat — turns out to be deeply misleading. Because recurrence depends on scale.

Consider: is a heartbeat recurrent? At the scale of individual milliseconds — no, each beat is slightly different. At the scale of seconds — yes, there's a clear rhythm. At the scale of hours — you see a different pattern (circadian variation). At the scale of days — something else entirely.

The same trajectory looks completely different depending on which temporal and spatial scale you measure at.

This is not a nuisance. This is the physics.

STR: Spatiotemporal Recurrence

We constructed a diagnostic called STR (Spatiotemporal Recurrence) that makes the scale problem explicit rather than hiding it.

The formula is deceptively simple:

STR = Σ_{i<j} K(‖xᵢ - xⱼ‖) · w(|i-j|)

Two components:

The pair (w, σ) defines the measurement window. Change them, and you're literally measuring different things. This isn't a parameter you tune for performance — it's a statement about what scale of dynamics you care about.

The Key Experiment: Remove w, Lose 10×

The most important experiment here is not the most visually impressive one. It's the ablation.

We compared STR with temporal weighting against STR without it (w = 1 everywhere, meaning all time lags contribute equally):

Condition Max ΔSTR (VdP vs OU)
With temporal weighting w 0.395
Without temporal weighting 0.039
Gain from w 10.2×

Without temporal weighting, a structured dynamical system (Van der Pol oscillator) and pure noise (Ornstein-Uhlenbeck process) become almost indistinguishable. The signal-to-noise ratio collapses by an order of magnitude.

This is not a minor technical improvement. It reveals something fundamental:

Recurrence without scale specification is almost meaningless.

When you measure "does this trajectory revisit previous states?" without specifying at what time scale, you're mixing together three completely different phenomena:

  1. Local density (nearby states that happen to be close, no dynamics involved)
  2. Short-range smoothness (consecutive states being similar — trivial continuity)
  3. Genuine dynamical return (the system revisiting a region after a meaningful excursion)

Standard recurrence measures (RQA recurrence rate, etc.) conflate all three. STR's temporal weighting separates them. That separation is worth 10× in discriminability.

Scale Selectivity in Action

When we sweep the temporal window across a Van der Pol oscillator with known period T ≈ 64 steps:

This means STR doesn't just detect recurrence — it detects recurrence at a specific scale. And different scales reveal different dynamics.

When we abruptly switch the oscillator's parameters mid-trajectory (μ = 0.5 → μ = 2.5), rolling STR tracks the regime transition in approximately 10 time units. The system's dynamical "fingerprint" changes, and STR catches it — because the new regime has a different natural period, and the scale-matched window no longer aligns.

Why This Matters for Language Models

In RN-001, we observed that hallucinated trajectories show "dynamical rigidity" — they lock into attractors. But we didn't explain how we detected this.

The answer is STR. And the scale-selectivity matters enormously for language models because:

  1. Short-range recurrence is trivial. Adjacent tokens always have similar hidden states (Lipschitz continuity of transformers). Any recurrence measure that doesn't filter out short-range proximity will be dominated by this trivial signal.
  2. Long-range recurrence is ambiguous. Over very long horizons, hidden states can appear close simply because the representation space is bounded. This is density, not dynamics.
  3. The interesting signal lives in between. The temporal scale at which a model "should" return to previous reasoning structures — that's where you see the difference between stable reasoning and rigid hallucination.

STR's temporal weighting w(Δt) acts as a bandpass filter on this scale axis. It suppresses the trivial (short-range) and the meaningless (long-range accumulation), and isolates the dynamical (mid-range structured return).

Connection to the Larger Picture

RN-001 established: output space is too small. We need to observe trajectories.

RN-002 establishes: trajectory observation requires scale specification. Without it, you're measuring noise.

The next question becomes: is trajectory structure merely a passive diagnostic — something we observe after the fact? Or is it a causal degree of freedom — something we can intervene on to change model behavior?

That's a much harder question. And the answer turns out to be yes. But that's RN-003.

Open Questions

  1. What is the "natural period" of a language model's internal dynamics? Oscillators have well-defined periods. Do transformer hidden states? If so, what determines them — architecture, training, or the input itself?
  2. Can σ and τ be learned? We currently set the spatial bandwidth σ by median heuristic and sweep τ manually. Could a system learn its own measurement scales — essentially, learn how to observe itself?
  3. What happens in the chaotic regime? STR assumes quasi-periodic recurrence (Proposition 2). For chaotic systems — and possibly for certain model behaviors — recurrence is aperiodic with no characteristic scale.
  4. Is there a minimum observation length? Theoretical bounds establish that some structural properties require T = Θ(α/ω²) observation time to identify. What does this mean for real-time trajectory monitoring of agents?

Key takeaway: Measuring a trajectory is not the same as measuring an output. It requires specifying what scale of dynamics you care about. This isn't a design choice — it's a physical constraint. Without scale selection, you cannot distinguish signal from noise, structure from density, or dynamics from artifacts. The instrument matters as much as the object.

This note draws on results from "Topological Recurrence as a Scale-Selective Diagnostic for Dynamical Trajectories" (Haelio Tang, 2026). Full mathematical framework, ablation experiments, and sensitivity analysis available on request.