How We Built a Trading Agent That Knows When Not to Trade

Markets are simply a clean place to study what high-stakes agents require. They combine continuous time, incomplete information, adversarial dynamics, and irreversible action. When a low-stakes assistant is wrong, you usually edit, retry, or ignore it. When a high-stakes agent is wrong, the environment absorbs the action and keeps moving. Capital moves. Exposure changes. The state of the world updates whether or not you are watching.

D0 is our concrete system in that setting: the proactive high-stakes agent for financial markets. Its proactive behavior follows from the domain. Markets are 24/7, fast-moving, and too complex to supervise turn by turn. Proactivity, despite its great importance, is not the main subject of this post. The main subject here is what it takes to build any agent that can operate with real delegated authority in a high-stakes environment.

The hard part is not intelligence. It is what happens after the intelligence makes up its mind.

That sounds backwards at first. Most of the field has spent the last two years pushing in the opposite direction: better reasoning, more tools, more autonomy. All of that matters, and D0 benefits from it. But once an agent crosses from cognition into actuation, the bottleneck changes. The question is no longer whether the model can produce an impressive answer. It is whether the surrounding system can let that answer touch the world without letting the model improvise with irreversible consequences.

In low-stakes settings, a multi-layer safety harness usually suffices in production, because it can assume cooperative users or reversible, version-controlled changes. That is how coding agents such as Claude Code have conquered the world.

Yet, once an agent can move capital or create other irreversible side effects, the primary unit of safety is no longer the command. It is the effect the system is about to commit to the world. The user and the world might be adversarial, and the effect compounds regardless of your mood or mode.

We therefore need a reactive boundary, layered on top of the harness around the agent, to verify, suspend, or hand off real-world actions and feedback. We use the term constrained autonomy for this architecture, which is what makes D0 possible. We do not think constrained autonomy is a weaker form of autonomy. In high-stakes environments, it is the precondition for autonomy. Autonomy only becomes real when the reasoning layer remains flexible but the world-facing layer is mechanically bounded.

This post is a technical deep dive into that boundary of D0 as a concrete system in high-stakes financial markets. More importantly, it shares three engineering lessons we learned while building it — lessons we think are useful for anyone building AI agents in domains where getting it wrong has real consequences.

A Bitter Lesson: Constraints Get Harder as Models Improve

There is a default instinct in applied AI: wait for the next model and don’t fight against computational emergence. We believe in that, and D0 rides the curve. But in a high-stakes setting, that framing still misses the harder half of the problem.

A useful way to frame a high-stakes agent is as a three-way tension between scope, fidelity, and containment.

Scope: how much the agent is able to do.
Fidelity: how closely the agent tracks what the user actually meant.
Containment: how tightly the system bounds damage when the agent is wrong, compromised, or uncertain.

Model progress helps these three axes unevenly.

As models improve, scope tends to expand. The agent can touch more tools, more environments, and more edge cases. As models improve, fidelity also tends to improve. The semantic gap between a user’s vague instruction and the model’s operational interpretation gets smaller.

Containment does not improve in the same way.

Containment is not just a capability problem. It is an adversarial problem and a systems problem. A more capable agent has a larger action surface, a larger attack surface, a higher cost of error, and opponents with access to the same frontier models. The stronger the system becomes, the more important the boundary becomes.

This is why we think the next frontier is not intelligence alone, but autonomy made real by constraints. In high-stakes agents, the value of the next model release depends less on what the model can do in principle and more on what the surrounding system forces it to do in practice.

Autonomy in the real world is rarely achieved by removing constraints. It is achieved by making the right constraints real. That pattern already exists in every serious actuation system. The lesson from high-reliability autonomous driving is not "wait until the model is perfect." It is "make sure the model is not the only thing standing between perception and action."

The D0 Architecture

The public architecture compresses to four layers arranged in a strict dependency chain: Verified State, Typed Boundary, Constraint Layer, and Closed-Loop Evolution. Skip any layer, and everything downstream gets weaker.

Verified State: verified state is injected before reasoning. Skip this layer, and the agent reasons on stale or hallucinated state.
Typed Boundary: the model reasons freely but outputs structure. Skip this layer, and free-form output reaches world-facing systems.
Constraint Layer: policy is enforced outside the reasoning path. Skip this layer, and constraints become suggestions.
Closed-Loop Evolution: verified outcomes become new state for the next cycle. Skip this layer, and the system never learns from committed effects.

Two architecture decisions sit underneath that stack.

The first is path separation. Research is allowed to be probabilistic and degradable. An LLM can explore, summarize, weigh weak evidence, and decide it needs more data. A research-path failure degrades analysis quality, but it does not directly propagate into execution authority. Execution in financial markets is a different kind of system entirely. Anything that can move capital must be typed, validated, and independently verifiable.

The second is role separation. The model plans. Infrastructure normalizes. The constraint layer adjudicates. The executor executes. The recorder captures what actually happened. This is what lets us use frontier models without requiring frontier models to be the only safety system in the stack. For high-stakes agents, that separation is not overhead. It is what keeps intelligence inspectable, replaceable, and safe to improve with the closed-loop ‘flywheel’.

Here is what each layer does.

Verified State. D0 does not treat the context window as a memory dump. It treats it as a local copy of reality. Verified market state, balances, positions, execution outcomes, user delegation level, and system risk state enter the planner as structured facts.

Typed Boundary. The planner is allowed to think in natural language. The system is not allowed to execute in natural language. D0’s planner produces a report with a structured strategy block. A strategy extractor converts that into a typed StrategyAction: asset, side, size, leverage, stop-loss, take-profit, rationale, and execution metadata. After that point, the system leaves free-form reasoning behind.
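
As a rough illustration of that boundary, here is a minimal Python sketch of a typed action and an extractor. The field names follow the list above; the concrete types, the validation rules, and the parse_strategy_block helper are assumptions made for illustration, not D0's actual schema.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from enum import Enum
from typing import Optional


class Side(Enum):
    BUY = "buy"
    SELL = "sell"


@dataclass(frozen=True)
class StrategyAction:
    """Hypothetical typed action; fields mirror the prose, types are assumed."""
    asset: str
    side: Side
    size: Decimal
    leverage: Decimal
    stop_loss: Optional[Decimal]
    take_profit: Optional[Decimal]
    rationale: str
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Structural checks only; policy checks belong to the constraint layer.
        if not self.asset:
            raise ValueError("asset is required")
        if self.size <= 0:
            raise ValueError("size must be positive")
        if self.leverage < 1:
            raise ValueError("leverage must be at least 1")


def parse_strategy_block(block: dict) -> StrategyAction:
    """Convert a structured strategy block into a typed action, or raise."""
    def opt(key):
        raw = block.get(key)
        return Decimal(str(raw)) if raw is not None else None

    return StrategyAction(
        asset=str(block["asset"]),
        side=Side(block["side"]),  # unknown side values raise ValueError
        size=Decimal(str(block["size"])),
        leverage=Decimal(str(block.get("leverage", "1"))),
        stop_loss=opt("stop_loss"),
        take_profit=opt("take_profit"),
        rationale=str(block.get("rationale", "")),
        metadata=dict(block.get("metadata", {})),
    )
```

The point of the sketch is that anything failing to parse never becomes an action at all. There is no fallback path from free text to execution.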

Constraint Layer. Typed intent is still not executable intent. The plan is normalized, checked against authority and risk boundaries, validated deterministically, and tested against current operating state before it can cross into execution. If it exceeds the user’s delegation envelope, conflicts with protective mode, depends on stale or unverified state, or fails venue constraints, it is held, rejected, escalated, or suspended rather than silently promoted into action.

Closed-Loop Evolution. Execution is not the end of the system. It is the start of the flywheel. D0 records proposed vs. admitted actions, blocked vs. allowed actions, submitted vs. committed effects, reasoning traces, and outcome traces. Those verified outcomes feed replay, evaluation, policy tuning, release gates, and personalization updates.

A concrete trace makes the separation easier to see. Suppose D0 observes a live market dislocation and proposes reducing exposure by 20% with a tighter stop, within the user’s delegation envelope. The planner does not place an order. It emits a typed StrategyAction.

That object then crosses the boundary. Infrastructure normalizes quantities and venue-specific parameters. The constraint layer checks delegated authority, protective mode, balance sufficiency, exposure limits, freshness of state, and execution readiness. If the action is admissible, the executor submits it. If it is not, the system holds, rejects, escalates, or suspends for reconciliation or handoff. In every case, the recorder captures the proposal, the verdict, the venue response, the execution result if any, and the post-trade state.
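
The role separation in this trace can be sketched in a few lines of Python. Everything here is a simplified assumption for illustration: the Action and OperatingState shapes, the specific checks, and the run_pipeline function are hypothetical, and the real constraint layer checks far more than this.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_DOWN
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    HOLD = "hold"
    REJECT = "reject"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Action:
    asset: str
    side: str          # "buy" or "sell"
    size: Decimal


@dataclass(frozen=True)
class OperatingState:
    protective_mode: bool
    state_is_fresh: bool
    max_size: Decimal  # delegated size limit (assumed shape)
    balance: Decimal   # units of the asset available to sell


def normalize(action: Action, step: Decimal) -> Action:
    """Infrastructure role: round size down to the venue's size increment."""
    size = (action.size / step).to_integral_value(rounding=ROUND_DOWN) * step
    return Action(action.asset, action.side, size)


def adjudicate(action: Action, state: OperatingState) -> Verdict:
    """Constraint role: deterministic checks, no model involved."""
    if not state.state_is_fresh:
        return Verdict.HOLD
    if state.protective_mode and action.side == "buy":
        return Verdict.REJECT
    if action.size > state.max_size:
        return Verdict.ESCALATE
    if action.size <= 0 or (action.side == "sell" and action.size > state.balance):
        return Verdict.REJECT
    return Verdict.PASS


def run_pipeline(action, state, step, execute, journal: list) -> Verdict:
    normalized = normalize(action, step)
    verdict = adjudicate(normalized, state)
    # Executor role: only runs on PASS.
    result = execute(normalized) if verdict is Verdict.PASS else None
    # Recorder role: every proposal is journaled, whether or not it executed.
    journal.append({"proposed": action, "normalized": normalized,
                    "verdict": verdict, "result": result})
    return verdict
```

Note that the journal entry is written on every path, including the blocked ones. Blocked proposals are part of the record the closed loop learns from.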

One useful property of this design is that it also gives the human the right role. The human does not disappear. The human moves up a layer. In a reactive system, the human is the synchronous gate on every decision. In a high-stakes agent, the human becomes the policy owner, the reviewer of exceptions, and the holder of the kill switch.

That stack gave us three rules we now treat as non-negotiable.

Lesson 1: If the model can see the rule, it can optimize around it

Early versions of the system carried more of the boundary in natural-language instructions. Exposure caps, delegation rules, and preference hints were visible to the planner. That sounds reasonable until the model becomes good enough to argue with them.

The failure mode was subtle. The model did not simply ignore the rule. It reinterpreted it.

A typical pattern looked like this: a prompt-level rule said not to exceed a certain exposure threshold. After several turns of reasoning, the planner would notice that the user had historically sized up in similar conditions, that the current move looked stronger than usual, and that the spirit of the instruction seemed to favor maximizing opportunity rather than preserving the literal threshold. The model would then propose the over-limit action as if it were a higher-fidelity reading of the user’s real intent.

That is the wrong mental model for safety.

The moment a boundary enters the reasoning space as text, it becomes another object the model can optimize around.

In a high-stakes setting, that is not sophistication. It is a boundary failure. We need a constraint layer that sits outside the model’s reasoning space: when the planner emits a typed StrategyAction, a separate constraint engine evaluates it against layered rules and returns a coarse verdict: pass, hold, reject, or escalate. The model never sees the threshold table, the internal state machine, or the precise profile logic that produced that verdict.

That single move changes the game. The planner can improve the quality of its proposals, but it cannot negotiate the boundary. It cannot persuade itself that the risk rule is overly conservative today. It cannot learn the exact contour of the line and repeatedly graze it on purpose.
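
A minimal sketch of the idea, in Python. The ConstraintEngine class, its two thresholds, and the planner_feedback function are hypothetical names and shapes chosen for illustration. The important property is only that the text returned to the planner carries a verdict and nothing else.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    HOLD = "hold"
    REJECT = "reject"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class ProposedAction:
    asset: str
    exposure_after: float  # projected exposure as a fraction of equity (assumed unit)


class ConstraintEngine:
    """Runs outside the planner; thresholds are never rendered into model context."""

    def __init__(self, exposure_cap: float, review_band: float):
        self._exposure_cap = exposure_cap  # hard limit
        self._review_band = review_band    # margin below the cap that triggers review
        self.audit_log = []                # operator-only detail

    def evaluate(self, action: ProposedAction) -> Verdict:
        if action.exposure_after > self._exposure_cap:
            verdict = Verdict.REJECT
        elif action.exposure_after > self._exposure_cap - self._review_band:
            verdict = Verdict.ESCALATE
        else:
            verdict = Verdict.PASS
        # Full detail goes to the audit log, not to the planner.
        self.audit_log.append((action, verdict, self._exposure_cap))
        return verdict


def planner_feedback(verdict: Verdict) -> str:
    """The only text returned to the model: a coarse verdict, no thresholds."""
    return f"verdict: {verdict.value}"
```

An underscore prefix in Python is a naming convention, not an isolation mechanism. In a real deployment the separation would have to be enforced at a process or service boundary; the sketch only shows which information is allowed to flow back.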

This is where high-stakes agent design departs from prompt engineering and from much of the public discussion around coding-agent harnesses. Visible guidance can improve behavior. It cannot create mechanically binding policy. In a high-stakes system, guidance and policy are different objects.

Conflict resolution therefore has to stop being philosophical and become executable. In D0, the ordering is not left to the model: risk control, then permissions, then user intent, then optimization.
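
One way to make that ordering executable is a fixed precedence list evaluated in code. The sketch below assumes each rule family can be reduced to a function that either objects with a verdict or stays silent; the individual checks and the dictionary fields are invented for illustration.

```python
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    PASS = "pass"
    HOLD = "hold"
    REJECT = "reject"
    ESCALATE = "escalate"


# A check returns None when it has no objection, or a blocking verdict.
Check = Callable[[dict, dict], Optional[Verdict]]


def risk_control(action: dict, state: dict) -> Optional[Verdict]:
    if state["protective_mode"] and action["opens_position"]:
        return Verdict.REJECT
    return None


def permissions(action: dict, state: dict) -> Optional[Verdict]:
    if action["size"] > state["delegated_max_size"]:
        return Verdict.ESCALATE
    return None


def user_intent(action: dict, state: dict) -> Optional[Verdict]:
    if action["asset"] in state["user_excluded_assets"]:
        return Verdict.REJECT
    return None


def optimization(action: dict, state: dict) -> Optional[Verdict]:
    if action["expected_edge"] <= 0:
        return Verdict.HOLD
    return None


# Fixed precedence: earlier families win. This tuple lives in code,
# so the model has no way to reorder it.
PRECEDENCE = (
    ("risk_control", risk_control),
    ("permissions", permissions),
    ("user_intent", user_intent),
    ("optimization", optimization),
)


def resolve(action: dict, state: dict):
    """Return (verdict, deciding_family); the first objection ends evaluation."""
    for name, check in PRECEDENCE:
        verdict = check(action, state)
        if verdict is not None:
            return verdict, name
    return Verdict.PASS, None
```

When an action violates both a risk rule and a permission rule, the risk rule decides. There is nothing left for the planner to weigh.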

If the model can see the parameter, you do not have a hard constraint. You have advisory text.

Lesson 2: Verified context needs freshness, provenance, and taint awareness

This lesson is narrower and more specific than "context matters." In a high-stakes environment, the context window is the agent’s local reality. Every critical fact in that reality should arrive with at least three properties: how fresh it is, where it came from, and whether it is authoritative enough to support action.

Freshness is the obvious part. In a general-purpose assistant, stale data makes the output less relevant. In a financial agent, stale data can make the output operationally wrong. We saw this clearly when the system entered a protective mode. The execution layer had already moved into protective mode, but the planner had not been told. From the model’s perspective, opening a new position was still a legal action. It kept proposing plans that were immediately blocked downstream. The failure was not irrationality. It was stale reality.

We saw the same pattern at smaller scales with quantity handling and venue constraints. A number can be syntactically valid and operationally wrong. An order that is slightly mis-sized may fail because it violates notional or precision rules. In financial systems, close enough is not a real numeric type.

Provenance is the deeper part. A number can be current and still be untrustworthy. A news item can be fresh and still be misleading. A tool result can be recent and still be the wrong authority surface for execution. A fact can be up to date and still be contaminated.

That matters because prompt injection is not fundamentally about the model becoming disobedient. It is about the instruction surface becoming polluted. Once that happens, the model can remain perfectly obedient and still do the wrong thing. In high-stakes systems, current context is not enough. The system has to know which context is verified, which is merely discovered, and which is tainted until separately confirmed.

We ended up treating context as a typed object, not a blob of text.

Verified facts: injected by infrastructure with freshness guarantees.
Discovered facts: fetched by the model or by non-authoritative tools.
Untrusted or tainted inputs: useful for research, but not authoritative for actuation until independently verified.

Verified facts override discovered facts. Discovered facts can guide investigation, but they do not silently override verified state. Untrusted inputs do not become authoritative merely because they entered the context window.
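
These override rules are simple enough to sketch. The Python below is a hypothetical illustration, not D0's context store: the Trust levels map to the three categories above, and the ContextStore API is an assumed shape.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Optional


class Trust(IntEnum):
    TAINTED = 0
    DISCOVERED = 1
    VERIFIED = 2


@dataclass(frozen=True)
class Fact:
    key: str
    value: Any
    trust: Trust
    source: str


class ContextStore:
    def __init__(self):
        # key -> {Trust: Fact}; keeps the latest fact at each trust level.
        self._facts = {}

    def put(self, fact: Fact) -> None:
        self._facts.setdefault(fact.key, {})[fact.trust] = fact

    def resolve(self, key: str) -> Optional[Fact]:
        """Highest-trust fact for a key. Suitable for research, not for action."""
        by_trust = self._facts.get(key, {})
        return by_trust[max(by_trust)] if by_trust else None

    def for_actuation(self, key: str) -> Optional[Fact]:
        """Only verified facts may support an action."""
        return self._facts.get(key, {}).get(Trust.VERIFIED)
```

A discovered price can sit next to a verified one without replacing it, and a tainted claim can be investigated without ever being returned by for_actuation.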

Every critical fact carries explicit temporal metadata: timestamp, expected refresh interval, and freshness status. The planner does not just see that the price is 3,200. It sees that the price was 3,200 as of 2.3 seconds ago and that the freshness status is live. If the source fails to respond within the expected interval, the status changes to stale, and the planner is instructed to treat that fact as unreliable. Without this, the model treats facts from two seconds ago and two hours ago as equally authoritative.
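
A small sketch of what that metadata could look like in Python. The TimedFact class, its field names, and the rendering format are assumptions for illustration; the clock is passed in explicitly so the behavior is easy to test.

```python
from dataclasses import dataclass
from enum import Enum


class Freshness(Enum):
    LIVE = "live"
    STALE = "stale"


@dataclass(frozen=True)
class TimedFact:
    name: str
    value: float
    observed_at: float       # seconds, on the same clock as `now`
    refresh_interval: float  # expected seconds between updates

    def age(self, now: float) -> float:
        return now - self.observed_at

    def status(self, now: float) -> Freshness:
        # A fact older than its expected refresh interval is treated as stale.
        if self.age(now) <= self.refresh_interval:
            return Freshness.LIVE
        return Freshness.STALE

    def render(self, now: float) -> str:
        """Text the planner would see: value, age, and freshness together."""
        return (f"{self.name}={self.value} "
                f"(as of {self.age(now):.1f}s ago, status={self.status(now).value})")
```

For example, TimedFact("price", 3200.0, observed_at=100.0, refresh_interval=5.0) rendered at now=102.3 reports an age of 2.3 seconds and a live status; rendered two hours later, the same fact reports stale.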

Raw telemetry often has to be compressed into operational labels. Some fields are technically true but cognitively useless. If you dump raw operational state into the context window, the planner spends tokens parsing noise instead of reasoning about the situation. The right move is often to elevate the field into a typed label with behavior attached to it: protective mode, tradable or untradable, balance sufficient or insufficient, verification pending, reconciliation required. In financial systems, technically present in the prompt is not the same thing as operationally legible.
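
The compression step might look like the following Python sketch. The raw Telemetry fields, the drawdown limit, and the rule that any label blocks a new position are all simplifying assumptions; only the label names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    PROTECTIVE_MODE = "protective_mode"
    UNTRADABLE = "untradable"
    BALANCE_INSUFFICIENT = "balance_insufficient"
    VERIFICATION_PENDING = "verification_pending"
    RECONCILIATION_REQUIRED = "reconciliation_required"


@dataclass(frozen=True)
class Telemetry:
    """Hypothetical raw fields; real telemetry would be far noisier."""
    drawdown_pct: float
    venue_status: str
    free_balance: float
    required_margin: float
    unconfirmed_orders: int
    pending_verification: bool


def to_labels(t: Telemetry, drawdown_limit_pct: float = 10.0) -> frozenset:
    """Collapse raw fields into the operational labels the planner sees."""
    labels = set()
    if t.drawdown_pct >= drawdown_limit_pct:
        labels.add(Label.PROTECTIVE_MODE)
    if t.venue_status != "open":
        labels.add(Label.UNTRADABLE)
    if t.free_balance < t.required_margin:
        labels.add(Label.BALANCE_INSUFFICIENT)
    if t.pending_verification:
        labels.add(Label.VERIFICATION_PENDING)
    if t.unconfirmed_orders > 0:
        labels.add(Label.RECONCILIATION_REQUIRED)
    return frozenset(labels)


def may_open_position(labels: frozenset) -> bool:
    """Behavior attached to labels: in this sketch, any label blocks new positions."""
    return not labels
```

The planner then reasons over a handful of labels instead of six raw fields, and downstream code can branch on the same labels.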

The conceptual shift is simple but important. The context window is not memory. It is not a prompt. It is the agent’s local reality, and it should be maintained like reality.

Lesson 3: Why hesitation before execution makes the system faster

Many agent architectures collapse planning and execution into one step. The model reasons, emits a tool call, and the world changes. That pattern feels fast because it removes friction at the precise moment where friction is most visible.

In a high-stakes system, that is usually the wrong optimization.

The problem is not only safety. It is systems clarity. When planning and execution are fused, the product starts treating fluent reasoning as if it were already operationally valid. But in a financial environment, those are different properties. A model can form the right thesis and still produce an action that is mis-sized, mistimed, outside delegated authority, invalid under venue constraints, or unsafe under the current system state.

This is where constrained autonomy becomes concrete. The model should produce intent. The system should decide whether that intent is executable.

In D0, the model never sits directly on top of execution. It produces a plan: asset, side, size, risk parameters, and rationale. That plan then crosses a typed boundary into a structured action object. From there, non-model systems take over. Normalization resolves venue-specific details the model should not own. Deterministic validation checks balances, tradability, permissions, and live system state. The constraint stack evaluates exposure, authority, and protective mode. Only then does an independent execution layer decide whether the action is admissible.

Crucially, the verdict space is richer than execute or do not execute.

Production systems need states like hold, reject, escalate, suspend, and handoff. A trading action can be submitted and then partially filled. A venue can acknowledge receipt but fail before the final response reaches the agent. A dependency can time out after authority has been checked but before the system knows whether the external effect committed. In those states, the safe behavior is not blind retry. It is to suspend, reconcile the real external state, and, when necessary, hand control to a human or to a recovery workflow.
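
The difference between blind retry and reconciliation is easiest to see in code. This Python sketch assumes two hypothetical callables standing in for a venue client: submit, which may fail ambiguously, and lookup, which reports whether the venue knows the order. Neither is a real API.

```python
from enum import Enum


class Outcome(Enum):
    COMMITTED = "committed"
    NOT_COMMITTED = "not_committed"
    HANDOFF = "handoff"


class AmbiguousCommit(Exception):
    """Raised when the venue may or may not have accepted the order."""


def submit_with_reconciliation(order_id, submit, lookup, max_lookups=3):
    """
    submit(order_id) places the order; it may raise AmbiguousCommit.
    lookup(order_id) returns True (order exists), False (it does not),
    or None (still unknown). Both are hypothetical stand-ins.
    """
    try:
        submit(order_id)
        return Outcome.COMMITTED
    except AmbiguousCommit:
        pass  # do not resubmit: the first attempt may have gone through

    # Suspend and reconcile against the real external state.
    for _ in range(max_lookups):
        seen = lookup(order_id)
        if seen is True:
            return Outcome.COMMITTED
        if seen is False:
            return Outcome.NOT_COMMITTED

    # State is still unknown: hand control to a human or recovery workflow.
    return Outcome.HANDOFF
```

The submit call happens exactly once per invocation. Even a NOT_COMMITTED result does not trigger an automatic resubmission here; it goes back to the caller as a fact to plan around. A production version would also have to handle orders that become visible at the venue only after a delay.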

This is where command safety ends and effect safety begins.

The extra boundary is often described as latency. In practice, it compresses ambiguity. Without the boundary, every failure is an end-to-end mystery: a bad action could have come from poor judgment, stale state, malformed parameters, hidden policy conflicts, ambiguous commit semantics, or execution-path instability. With the boundary, failures collapse to the correct layer. A bad thesis remains a planning error. An invalid quantity becomes a normalization error. A blocked action becomes a policy verdict. A rejected order remains an execution issue. An uncertain commit becomes a reconciliation problem.
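
A sketch of how that attribution can fall out of the structure, in Python. The exception classes and the run_stages helper are hypothetical. A bad thesis is deliberately absent: planning errors cannot be caught mechanically at this boundary, which is why they remain a planning concern.

```python
class PipelineError(Exception):
    """Base class; each subclass names the layer that owns the failure."""
    layer = "unknown"


class NormalizationError(PipelineError):
    layer = "normalization"


class PolicyVerdict(PipelineError):
    layer = "constraint"


class ExecutionError(PipelineError):
    layer = "execution"


class ReconciliationRequired(PipelineError):
    layer = "reconciliation"


def run_stages(action, stages):
    """
    stages: ordered callables, each taking the action and returning it
    (possibly transformed). The first typed failure stops the run and
    names its own layer, so no end-to-end forensics are needed.
    """
    for stage in stages:
        try:
            action = stage(action)
        except PipelineError as err:
            return {"ok": False, "layer": err.layer, "reason": str(err)}
    return {"ok": True, "layer": None, "reason": ""}
```

Each failure report carries the layer that raised it, so an invalid quantity is never mistaken for a policy block or a venue rejection.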

That separation is what makes the system faster where it matters. Not faster in the narrow sense of milliseconds from prompt to action, but faster in diagnosis, faster in iteration, and faster in recovery. It makes the system inspectable. It lets models improve without forcing a rewrite of the execution path. It lets the execution path harden without retraining the model. And it makes it possible to swap models while preserving the same operational boundary.

Owning the execution layer — e.g. the trading platform — matters for the same reason. In high-stakes systems, hard constraints only become real at the actuation boundary. If execution is fully outsourced, your control ends where the external abstraction ends. If you own the execution path, you can make policy mechanically binding and feed verified outcomes back into the next cycle of evaluation, release, and improvement.

Constrained autonomy is not a brake on agentic systems. It is the mechanism that makes them operational.

What Compounds Across Model Generations

One useful consequence of building D0 this way is that it changes where we think value accumulates.

The first layer is scaffolding: prompt tricks, orchestration patterns, retrieval glue, and context packing. This matters, but model progress consumes it quickly.

The second layer is control logic: risk policy, validation rules, approval surfaces, execution boundaries, and operating-state machinery. This lasts longer because it encodes domain requirements, not just model weaknesses.

The third layer is verified execution history: which plans were proposed, which were blocked, which executed, what happened after, where near-misses accumulated, and which boundaries actually held under live stress. The execution history compounds with time, and time is the one input the next model release cannot compress.

That is the deeper reason we think constrained autonomy is the right path. It does not just make a proactive agent possible and safer. It creates the closed loop that turns real execution into ground truth: a verified record that future decisions, and future trust, can build on.

We started with a reactive assistant. We are building toward a proactive high-stakes agent for financial markets. The bridge between the two is not more aggressive tool use. It is an architecture that knows how to maintain verified state, type intent, enforce authority, execute independently, and learn from committed effects.

In high-stakes systems, autonomy is not what remains after you remove the constraints. It is what becomes possible once the constraints are real.

That is the principle behind D0.