Source Reading 009 · source-level design · June 2026

Goal Mode Is a Thread State Machine

A deep read of Codex Goal mode: why it is not just a prompt, how it persists objective state, how the model is allowed to touch it, and how the runtime keeps working without letting the model own pause, budget, or usage-limit semantics.

Version: codex-cli 0.137.0 (rust-v0.137.0)
Source: openai/codex @ f221438b691b
Core files: core/src/goals.rs, state/runtime/goals.rs
Surface: /goal · app server RPC · model tools

00 · Thesis

The useful mental model is: goal mode is a persistent thread-scoped state machine with an autonomous turn scheduler attached.

The public documentation says a goal is a persistent objective and completion criterion. The code makes that concrete by splitting the system into four layers: user controls, model-visible tools, core runtime policy, and SQLite state. This split is the design.

central claim

Codex deliberately gives the model the ability to finish or declare a real block, but keeps pause, resume, budget-limit, usage-limit, and replacement semantics in the host/runtime layer.

Plate I · The four-layer shape
[Diagram] Surfaces: /goal · app server RPC · model tools. User authority: set / edit / pause / resume / clear. Model authority: get · create · complete · blocked. System authority: budget · usage · continuation. Runtime: core/src/goals.rs (GoalRuntimeEvent dispatcher · accounting · continuation). Storage: goals_1.sqlite · thread_goals.

The model is not the owner of the state machine. It can request only narrow transitions; the runtime owns the rest.

01 · Public Contract

The manual describes goal text as both starting prompt and definition of done. The implementation preserves that idea, but turns it into persisted data and lifecycle hooks.

The official manual says Goal mode is for longer tasks, especially when Codex needs a durable definition of success. It also documents `/goal` across the app, IDE, and CLI surfaces and says `features.goals` controls availability. In the source, that feature is stable and default-enabled.

# distilled from official manual + features/src/lib.rs
GoalMode =
  persistent_objective
  + completion_criteria
  + feature_flag("goals", stable, default_enabled=true)

Manual: codex-manual.md, Goal mode section, lines 508-548 in the fetched manual.

reading rule

Do not read goal mode as a model prompt trick. The prompt is only one actuator. The durable object is in state, and the scheduler is in core runtime.

02 · State Model

There are two shapes for the same object: a public protocol shape for UI and tools, and an internal state shape with a hidden `goal_id` for concurrency safety.
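
To make the two shapes concrete, here is a minimal Python sketch, not the Rust types from the repository. The class names `ThreadGoal` and `StoredGoal`, the `to_protocol` helper, and the use of a random hex string for `goal_id` are assumptions made for illustration; the timestamp fields are omitted for brevity.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import uuid


@dataclass(frozen=True)
class ThreadGoal:
    """Public protocol shape: what clients and model tools may observe."""
    thread_id: str
    objective: str
    status: str
    token_budget: Optional[int]
    tokens_used: int
    time_used_seconds: int


@dataclass(frozen=True)
class StoredGoal:
    """Internal state shape: the protocol fields plus a hidden goal_id."""
    goal_id: str  # hypothetical format; the article does not show the real one
    goal: ThreadGoal

    def to_protocol(self) -> ThreadGoal:
        # In this sketch, outbound data is always the projection without goal_id.
        return self.goal


def new_goal(thread_id: str, objective: str,
             token_budget: Optional[int] = None) -> StoredGoal:
    # Each logical goal gets a fresh goal_id, even when thread_id is reused.
    return StoredGoal(
        goal_id=uuid.uuid4().hex,
        goal=ThreadGoal(thread_id, objective, "active", token_budget, 0, 0),
    )


first = new_goal("thread-1", "migrate the test suite")
second = new_goal("thread-1", "fix the flaky CI job")
public_view = asdict(first.to_protocol())
```

Both goals share a `thread_id` but carry different `goal_id` values, and `public_view` contains no `goal_id` key. That is the property the runtime relies on when it checks for stale writes.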

| Layer | Fields | Design point |
| --- | --- | --- |
| protocol | thread_id, objective, status, budget and usage counters | What clients and model tools are allowed to observe. |
| state | protocol fields + hidden goal_id | The runtime uses the hidden id to avoid stale writes. |
| sqlite | thread_id primary key in thread_goals | One current goal per persisted thread. |

# distilled from protocol.rs and goals_migrations/0001_thread_goals.sql
ThreadGoal {
  thread_id, objective, status,
  token_budget?, tokens_used, time_used_seconds,
  created_at, updated_at
}

thread_goals table:
  primary key: thread_id
  hidden safety key: goal_id
  status in active | paused | blocked | usage_limited | budget_limited | complete

The status set is telling. `blocked` and `complete` are semantic judgments. `paused` is user control. `usage_limited` and `budget_limited` are system control. `active` is the only state that participates in normal autonomous continuation.

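Plate II · Status machine
[Diagram] States: active, paused, blocked, complete, usage limit, budget limit. Transition labels: paused (user pause / resume), blocked (model declares block), complete (model proves done), usage limit (system), budget limit (system).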

The status machine is split by authority. That is why `update_goal` cannot pause or budget-limit a goal.

03 · Three Front Doors

Goal mode has three ingress paths. They all mutate the same state, but each path is given a different permission envelope.

Door A · `/goal` in the TUI

The TUI parser treats `/goal clear`, `/goal edit`, `/goal pause`, and `/goal resume` as control commands. Any other non-empty argument becomes a candidate objective. If there is no thread yet, it queues the slash command until a session exists.

# distilled from tui/src/chatwidget/slash_dispatch.rs
if input is "clear"      -> ClearThreadGoal
if input is "edit"       -> OpenThreadGoalEditor
if input is "pause"      -> SetThreadGoalStatus(Paused)
if input is "resume"     -> SetThreadGoalStatus(Active)
otherwise                -> SetThreadGoalObjective(..., ConfirmIfExists)
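
The dispatch above, together with the queue-until-a-session-exists behavior, can be sketched in a few lines of Python. This is an illustrative model only: `parse_goal_command`, `GoalCommandQueue`, and the tuple encoding of commands are hypothetical names, and exact-match, case-sensitive comparison of the control words is an assumption.

```python
from typing import List, Optional, Tuple

Command = Tuple[str, Optional[str]]

# Control words, mirroring the distilled dispatch above.
CONTROL_COMMANDS = {
    "clear": ("ClearThreadGoal", None),
    "edit": ("OpenThreadGoalEditor", None),
    "pause": ("SetThreadGoalStatus", "paused"),
    "resume": ("SetThreadGoalStatus", "active"),
}


def parse_goal_command(argument: str) -> Optional[Command]:
    """Map the text after `/goal` to an (action, payload) pair."""
    text = argument.strip()
    if not text:
        # Bare `/goal` is not covered by this sketch.
        return None
    if text in CONTROL_COMMANDS:
        return CONTROL_COMMANDS[text]
    # Any other non-empty argument is a candidate objective.
    return ("SetThreadGoalObjective", text)


class GoalCommandQueue:
    """Holds /goal commands until a thread exists, then releases them in order."""

    def __init__(self) -> None:
        self.thread_id: Optional[str] = None
        self.pending: List[Command] = []

    def submit(self, argument: str) -> List[Command]:
        command = parse_goal_command(argument)
        if command is None:
            return []
        if self.thread_id is None:
            self.pending.append(command)  # no session yet: queue it
            return []
        return [command]

    def attach_thread(self, thread_id: str) -> List[Command]:
        self.thread_id = thread_id
        ready, self.pending = self.pending, []
        return ready


queue = GoalCommandQueue()
queued = queue.submit("ship the release notes")
released = queue.attach_thread("thread-1")
```

Note that only an argument that is exactly a control word is treated as control: `pause the deploy job` parses as an objective, not as a pause.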

Door B · App Server RPC

The app server exposes `thread/goal/set`, `thread/goal/get`, and `thread/goal/clear`. Before mutating, it resolves a materialized thread, reconciles the rollout into state, validates objective and budget, then emits ordered notifications so UI state and runtime state converge.

Door C · Model tools

The model gets `get_goal`, `create_goal`, and `update_goal`. But `update_goal` only exposes `complete` and `blocked`. This is the most important guardrail in the public tool contract.

| Actor | Can do | Cannot do |
| --- | --- | --- |
| User / UI | Set, replace, edit, pause, resume, clear. | Bypass validation or ephemeral-thread constraints. |
| Model | Read, explicitly create, mark complete, mark blocked. | Pause, resume, budget-limit, usage-limit, clear. |
| Runtime | Account usage, trigger continuation, enforce budget and usage limits. | Decide semantic completion by itself. |
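
The permission envelope in the table can be modeled as a lookup plus a narrow tool handler. The sketch below is a hedged Python illustration: the `Actor` enum, the action strings, and the `authorize` helper are hypothetical, and only the rule that `update_goal` accepts `complete` and `blocked` comes from the article.

```python
from enum import Enum


class Actor(Enum):
    USER = "user"
    MODEL = "model"
    RUNTIME = "runtime"


# Actions each actor may request, copied from the "Can do" column.
ALLOWED = {
    Actor.USER: {"set", "replace", "edit", "pause", "resume", "clear"},
    Actor.MODEL: {"get", "create", "complete", "blocked"},
    Actor.RUNTIME: {"account_usage", "continue", "budget_limit", "usage_limit"},
}


def authorize(actor: Actor, action: str) -> bool:
    """True when the action is inside the actor's permission envelope."""
    return action in ALLOWED[actor]


def update_goal(status: str) -> str:
    """Model-facing tool: only the two semantic judgments are accepted."""
    if status not in ("complete", "blocked"):
        raise ValueError(f"update_goal does not accept status {status!r}")
    return status


model_can_finish = authorize(Actor.MODEL, "complete")
model_can_pause = authorize(Actor.MODEL, "pause")
runtime_can_finish = authorize(Actor.RUNTIME, "complete")
```

The design point is that the restriction lives in the tool contract itself rather than in prompt wording, so a request such as `update_goal("paused")` fails before it reaches state.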

04 · Runtime Dispatcher

The center of the design is `goal_runtime_apply`: a dispatcher that converts session lifecycle events into state mutations, accounting, steering prompts, and new turns.

# distilled from core/src/goals.rs
on TurnStarted        -> bind current active goal to this turn
on ToolCompleted      -> account tokens/time, maybe inject budget steering
on ToolCompletedGoal  -> account, but suppress budget steering
on TurnFinished       -> final accounting for the turn
on MaybeContinueIdle  -> launch another turn if active goal is still pending
on UsageLimitReached  -> mark current active goal usage_limited
on ExternalSet/Clear  -> reconcile runtime state with UI/RPC mutation
on ThreadResumed      -> restore runtime state and maybe continue

This dispatcher is why Goal mode can survive context compaction and multiple turns. The current objective is not trusted to remain in the model's short-term conversational memory; the runtime can rehydrate it into the next turn as internal context.

Plate III · A single turn under goal mode
[Diagram] Turn start (token baseline) -> Model work (messages · reasoning) -> Tool finish (account delta) -> Turn stop (final accounting). GoalRuntimeState holds: turn snapshot · wall clock snapshot · continuation lock · reported budget id. MaybeContinueIfIdle -> next turn.

The runtime records the goal against a specific turn, then updates usage at tool and turn boundaries.

05 · Usage Accounting

Goal mode does not merely remember an objective. It measures how much agent budget has been spent pursuing that objective.

There are two counters: wall-clock seconds and goal token usage. The runtime records a token baseline at turn start, then accounts deltas after tool calls and at turn end. The state layer clamps negative deltas to zero and can flip an active goal to `budget_limited` once the tokens used reach or exceed the configured budget.

# distilled from core/src/goals.rs and state/src/runtime/goals.rs
token_delta =
  current_non_cached_input_plus_output
  - last_accounted_non_cached_input_plus_output

persist:
  time_used_seconds += wall_clock_delta
  tokens_used       += token_delta

if status is active and tokens_used >= token_budget:
  status = budget_limited
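
A runnable Python rendering of the accounting rule above may help. It is a sketch under stated assumptions: `GoalUsage`, `TurnMeter`, and `apply_usage` are hypothetical names, the token totals are made up, and clamping the time delta as well as the token delta is an assumption based on the article's general statement about negative deltas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GoalUsage:
    status: str = "active"
    token_budget: Optional[int] = None  # None means no budget configured
    tokens_used: int = 0
    time_used_seconds: int = 0


class TurnMeter:
    """Remembers the last accounted token total so each delta is charged once."""

    def __init__(self, baseline_tokens: int) -> None:
        self.last_accounted = baseline_tokens  # baseline taken at turn start

    def take_delta(self, current_tokens: int) -> int:
        delta = current_tokens - self.last_accounted
        self.last_accounted = current_tokens
        return delta


def apply_usage(goal: GoalUsage, token_delta: int, seconds_delta: int) -> GoalUsage:
    """State-layer step: clamp negative deltas, add usage, then check the budget."""
    goal.tokens_used += max(0, token_delta)
    goal.time_used_seconds += max(0, seconds_delta)
    over_budget = (
        goal.token_budget is not None and goal.tokens_used >= goal.token_budget
    )
    if goal.status == "active" and over_budget:
        goal.status = "budget_limited"
    return goal


goal = GoalUsage(token_budget=1000)
meter = TurnMeter(baseline_tokens=200)
apply_usage(goal, meter.take_delta(700), seconds_delta=12)   # after a tool call
status_after_tool = goal.status
apply_usage(goal, meter.take_delta(1300), seconds_delta=8)   # at turn end
```

With these numbers the first delta is 500 tokens and the goal stays active; the second delta is 600, which brings usage to 1100 against a budget of 1000 and flips the status to `budget_limited`.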

The hidden `goal_id` matters here. Accounting is only applied when the expected id still matches. Without that, a late tool finish from an older turn could charge a freshly replaced goal.

Plate IV · Why `goal_id` exists
[Diagram] Turn A: active, goal_id = g1. Turn B: new, goal_id = g2. A late tool finish from Turn A reaches the SQLite row (thread_id primary key); expected g1 != current g2, so the stale accounting is dropped.

The DB has one row per thread, so `goal_id` is the concurrency token that distinguishes logical goals over time.
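
The stale-write guard can be reproduced with Python's built-in `sqlite3` module. The table below is a simplified, hypothetical stand-in for the real `thread_goals` migration (most columns are left out), and `replace_goal` and `account_tokens` are illustrative helper names, not functions from the Codex source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE thread_goals (
        thread_id   TEXT PRIMARY KEY,
        goal_id     TEXT NOT NULL,
        objective   TEXT NOT NULL,
        status      TEXT NOT NULL,
        tokens_used INTEGER NOT NULL DEFAULT 0
    )
    """
)


def replace_goal(thread_id: str, goal_id: str, objective: str) -> None:
    # One row per thread: replacing the goal swaps goal_id and resets usage.
    conn.execute(
        "INSERT OR REPLACE INTO thread_goals "
        "(thread_id, goal_id, objective, status, tokens_used) "
        "VALUES (?, ?, ?, 'active', 0)",
        (thread_id, goal_id, objective),
    )


def account_tokens(thread_id: str, expected_goal_id: str, token_delta: int) -> bool:
    # The write only lands if the row still holds the goal the caller saw.
    cursor = conn.execute(
        "UPDATE thread_goals SET tokens_used = tokens_used + ? "
        "WHERE thread_id = ? AND goal_id = ?",
        (max(0, token_delta), thread_id, expected_goal_id),
    )
    return cursor.rowcount == 1


replace_goal("thread-1", "g1", "migrate the test suite")
charged_current = account_tokens("thread-1", "g1", 300)
replace_goal("thread-1", "g2", "fix the flaky CI job")
charged_stale = account_tokens("thread-1", "g1", 500)  # late finish from the old goal
row = conn.execute(
    "SELECT goal_id, tokens_used FROM thread_goals WHERE thread_id = ?",
    ("thread-1",),
).fetchone()
```

The late write for `g1` matches zero rows, so the freshly created `g2` keeps a usage of zero. Putting the expected id in the `WHERE` clause makes the check and the write a single statement, which is the usual way to implement this kind of optimistic concurrency token.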

Runtime accounting: core/src/goals.rs:936-1056.

06 · Continuation Loop

Automatic continuation is conservative. It starts a new turn only when the thread is idle, there is no trigger-turn mailbox input, and the stored goal is still the same active goal.

# distilled from maybe_start_goal_continuation_turn
if not feature_enabled: stop
if mode is Plan: stop
if active_turn exists: stop
if user-triggered input is pending: stop
goal = read_db(thread_id)
if goal.status != active: stop
if goal_id changed before launch: cancel reservation
inject continuation prompt as internal context
start a default turn
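
The gate sequence above translates almost line for line into a guard-clause function. The following Python sketch is illustrative only: `SessionView`, `GoalRow`, and `maybe_start_continuation` are hypothetical names, the `"Default"` mode string is an assumption, and modeling the "goal_id changed before launch" check as a second read of the database is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SessionView:
    feature_enabled: bool = True
    mode: str = "Default"  # "Plan" disables continuation
    has_active_turn: bool = False
    has_pending_user_input: bool = False


@dataclass(frozen=True)
class GoalRow:
    goal_id: str
    status: str


def maybe_start_continuation(
    session: SessionView, read_goal: Callable[[], Optional[GoalRow]]
) -> str:
    """Return a description of the decision; every failed gate returns early."""
    if not session.feature_enabled:
        return "stop: feature disabled"
    if session.mode == "Plan":
        return "stop: plan mode"
    if session.has_active_turn:
        return "stop: turn already running"
    if session.has_pending_user_input:
        return "stop: user input pending"
    goal = read_goal()
    if goal is None or goal.status != "active":
        return "stop: no active goal"
    # Re-read just before launch; a changed goal_id cancels the reservation.
    latest = read_goal()
    if latest is None or latest.goal_id != goal.goal_id:
        return "cancel: goal changed before launch"
    return f"start: continuation turn for {goal.goal_id}"


steady = maybe_start_continuation(SessionView(), lambda: GoalRow("g1", "active"))
planning = maybe_start_continuation(
    SessionView(mode="Plan"), lambda: GoalRow("g1", "active")
)
```

Because every gate returns without side effects, the only path that launches a turn is the one where all checks pass and the goal identity is unchanged at launch time.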

The continuation prompt is also carefully worded: it treats the objective as user data, keeps the full objective intact, asks for evidence before completion, and prevents the model from shrinking success to the easiest completed subset.

Plate V · Continuation gates
[Diagram] Gates in order: Idle? -> No input? -> Goal active? -> Same id? -> Start. Injected context: continuation.md (objective + budget + completion audit). Any failed gate returns without a turn.

The continuation loop is explicit scheduling, not a magical long-running model call.

07 · Boundaries

Most of the interesting engineering is in what Goal mode refuses to do.

Plan mode ignores goal continuation

The runtime explicitly ignores goal continuation while the collaboration mode is `Plan`. That avoids turning a planning conversation into an autonomous execution loop.

Ephemeral threads cannot own goals

App server and core both require a materialized thread with a state database. The TUI turns that into a user-facing message: goals need a saved session.

Budget limited is not complete

When the token budget is hit, the runtime injects budget steering that tells the model to wrap up and not start new substantive work. It also says not to call `update_goal` unless the goal is actually complete.

# boundary summary
Plan mode                 -> no automatic continuation
Ephemeral thread          -> no persisted goal
Budget exhausted          -> budget_limited, not complete
Usage limit reached       -> usage_limited, resumable by user
Model calls update_goal   -> only complete or blocked accepted

semantic boundary

The runtime can measure work and enforce stop states, but it cannot prove the user's real-world objective. Completion is still a model claim, made safer by tool restrictions and completion-audit instructions.

08 · Design Judgment

The implementation is conservative in the right places. It makes the goal durable, but it does not let durability become unlimited agency.

| Choice | Why it matters |
| --- | --- |
| thread_id primary key | A thread has one current goal. Replacing a goal resets usage, which matches user intent better than silently mixing objectives. |
| goal_id hidden from protocol | Clients do not need it, but runtime writes need it to avoid stale accounting and stale continuation. |
| Narrow model tool | The model can conclude, but it cannot silently pause, resume, clear, or self-extend budgets. |
| Idle continuation | Continuation happens only after normal pending work is drained, so user input wins over autonomous progress. |

Plate VI · The authority split
[Diagram] User: set / edit, pause / resume, clear / replace. Model: read, create if asked, complete, blocked. Runtime: account usage, budget limit, usage limit, schedule next turn.

The goal system is not about trusting the model more. It is about giving the host enough structure to keep the model moving safely.

what I would watch

The weakest point is not the state machine; it is semantic completion. The code can protect budgets and stale writes, but the final “done” signal is only as good as the model's evidence audit and the user's goal wording.

09 · Reading Map

If you want to inspect this yourself, read the files in this order. Each file answers one design question.

| Question | File |
| --- | --- |
| What is a goal? | protocol/src/protocol.rs:3622-3662 |
| Where is it stored? | state/goals_migrations/0001_thread_goals.sql:1-18 |
| How can UI mutate it? | app-server/src/request_processors/thread_goal_processor.rs:92-248 |
| How can the model mutate it? | core/src/tools/handlers/goal_spec.rs:12-98 |
| Who owns lifecycle policy? | core/src/goals.rs:305-395 |
| How does continuation start? | core/src/goals.rs:1243-1395 |

one-sentence summary

Codex Goal mode is a host-managed loop around a persisted objective, with the model allowed to drive work but not allowed to own the stop-state machinery.
