Changelog
Chronological log of significant changes to the main codebase. For full details see each PR on GitHub.
2026-04-21
Batch rollback command reset-to (sprint PR G)
- reset-to command (reset_to.py): Roll back a graph to any reachable state in a single command. --after <id> accepts a task ID (keep that task + everything before it) or a workstream ID (keep that workstream + everything merged before it). Computes the minimum set of operations (one branch reset per affected branch, not one per task), displays a structured plan, and executes after confirmation.
- Plan-then-execute pattern: build_plan() computes a frozen ResetToPlan dataclass, format_plan() renders it for display, and execute_plan() performs the operations. Composed entirely from existing reset_ops.py shared utilities.
- Transitive dependency resolution: Uses BFS through TaskGraph.dependent_ids() to find all tasks that transitively depend on removed tasks. Dependent workstreams are torn down; independent in-progress workstreams are untouched.
- Rollback log source field: write_rollback_entry() now records which command triggered each entry ("reset-task", "reset-workstream", or "reset-to --after <id>").
- Worktree checkout fixup: After batch branch resets, surviving worktrees checked out on deleted task branches are switched to their integration branch.
- CLI: agentrelay reset-to <graph> --after <id> [--yes].
- Diagram updates: New reset-to per-module diagram.
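The transitive dependency step is a breadth-first search over "who depends on me" edges. A minimal sketch, assuming dependent_ids(task_id) returns a task's direct dependents; transitive_dependents and the dict-backed example graph are hypothetical, not code from reset_to.py.

```python
from collections import deque
from typing import Callable, Iterable


def transitive_dependents(
    removed: Iterable[str],
    dependent_ids: Callable[[str], Iterable[str]],
) -> set[str]:
    """Collect every task that directly or transitively depends on a removed task.

    `dependent_ids` stands in for TaskGraph.dependent_ids(): any callable that
    maps a task ID to the IDs of its direct dependents.
    """
    seen: set[str] = set()
    queue = deque(removed)
    while queue:
        task_id = queue.popleft()
        for dependent in dependent_ids(task_id):
            if dependent not in seen:  # visit each task once, even with shared dependents
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical graph: b depends on a, c depends on b, d is independent.
graph = {"a": ["b"], "b": ["c"], "c": [], "d": []}
to_tear_down = transitive_dependents(["a"], lambda t: graph.get(t, []))
# to_tear_down == {"b", "c"}; "d" is left alone
```

The removed tasks themselves are not part of the result; a caller would union them in before planning branch resets.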
2026-04-17
Shared reset utilities and primitive undo commands (sprint PR F)
- Shared reset utilities (reset_ops.py): Five composable building blocks for stack-based undo: reset_branch, delete_task_state, delete_workstream_state, find_workstream_tip, and workstream_merge_order. Uses git update-ref for branch resets (works even when the branch is checked out in a worktree).
- reset-task command (reset_task.py): Peel the tip task from a workstream's execution stack. Rolls back the integration branch for merged tasks (using the pre-merge SHA from resolved.json) and deletes signal directories and branches for non-merged tasks. Auto-detects the tip when --workstream is used instead of --task.
- teardown-workstream command (reset_workstream.py): Remove workstream infrastructure (worktree, integration branch, signal dir) after all tasks have been peeled back.
- reset-workstream command (reset_workstream.py): Undo a merged workstream from its target branch. Validates the stack constraint (the workstream must be the most recently merged one), resets the target branch, closes the integration PR, and removes all task and workstream state.
- pr_close_by_url (ops/gh.py): New URL-based PR close function.
- Diagram updates: Three new per-module diagrams (reset-ops, reset-task, reset-workstream) and an updated module overview.
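Two of the ideas above, the stack constraint and ref-only branch resets, can be sketched in a few lines. The function names and the oldest-first merge_order list are hypothetical; only the git update-ref command form (ref, new value, optional expected old value) is standard git.

```python
from typing import Optional, Sequence


def check_is_stack_tip(merge_order: Sequence[str], workstream_id: str) -> None:
    """Raise unless workstream_id is the most recently merged workstream.

    merge_order is oldest-first, standing in for what workstream_merge_order()
    reports in reset_ops.py.
    """
    if workstream_id not in merge_order:
        raise ValueError(f"{workstream_id!r} has not been merged")
    if merge_order[-1] != workstream_id:
        raise ValueError(
            f"cannot reset {workstream_id!r}: {merge_order[-1]!r} was merged after it; "
            "reset that workstream first"
        )


def update_ref_argv(branch: str, new_sha: str, old_sha: Optional[str] = None) -> list[str]:
    """Build a `git update-ref` command that moves refs/heads/<branch> to new_sha.

    update-ref rewrites the ref directly instead of going through `git branch -f`,
    which refuses to move a branch that is checked out in a worktree.
    """
    argv = ["git", "update-ref", f"refs/heads/{branch}", new_sha]
    if old_sha is not None:
        # Git only applies the update if the ref still points at old_sha.
        argv.append(old_sha)
    return argv
```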
2026-04-13
Fix tmux kickoff prompt submission
- Double-Enter for Claude Code TUI (PR B2): Claude Code >= 2.1.105 interprets the first Enter as a newline in the TUI input box rather than as a submit. send_kickoff() now sends a second Enter after a brief delay to reliably submit agent kickoff prompts.
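A minimal sketch of the double-Enter kickoff, assuming two injected callables: one that types literal text into the pane and one that presses a named key (with tmux these could wrap send-keys -l for text and send-keys with the Enter key name). The function shape and the 0.5 s default are assumptions; the entry only says "a brief delay".

```python
import time
from typing import Callable


def send_kickoff(
    send_text: Callable[[str], None],
    send_key: Callable[[str], None],
    prompt: str,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Type the kickoff prompt, then press Enter twice with a short pause between."""
    send_text(prompt)
    send_key("Enter")  # newer Claude Code TUIs treat this one as a newline
    sleep(delay_s)     # give the TUI a moment before the submitting keypress
    send_key("Enter")  # the second Enter submits the prompt
```

Injecting sleep keeps the sequence testable without real delays.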
2026-04-09
CLI cleanup and diagram tooling (sprint 2026-04-09 PRs A–G)
- CLI cleanup (PR #180): Fix --max-concurrency help text, add short options (-a, -d, -T, -A, -C, -S, -W, -I), and shorten flag names (--fail-fast-workstream, --fail-fast-internal).
- ELK layout engine (PR #181): Switch from TALA to ELK for D2 diagram rendering. Drop the monolith diagram-detailed.svg (the source .d2 remains as input to per-module generation). All 19 SVGs re-rendered.
- Graph YAML config fields (PR #182): max_concurrency, max_task_attempts, and teardown_mode as graph-level YAML fields with CLI-wins precedence. --keep-panes CLI flag for the existing YAML field. OperationalConfig dataclass replaces a raw tuple.
- OCI container retry fix (PR D): Attempt-indexed container names (agentrelay-{graph}-{task_id}-{attempt}) and tmux window names prevent docker run name collisions on retry. SandboxContext.attempt_num field carries the attempt number to sandbox operations. Sandbox instance stored on TaskArtifacts; WorktreeTaskTeardown calls sandbox.teardown() behind the existing _should_teardown() gate.
- Default teardown mode → ALWAYS (PR E): Change OrchestratorConfig.task_teardown_mode default from ON_SUCCESS to ALWAYS. Tmux panes, branches, and containers are now cleaned up on both success and failure. Persistent artifacts (agent.log, summary.md, per-attempt archives) provide the same debugging data. ON_SUCCESS and NEVER remain available via -T or graph YAML.
- Record effective run config (PR F): After CLI > YAML > default resolution, write .workflow/<graph>/run_config.json with the effective OrchestratorConfig fields plus model, sandbox, credential, keep_panes, and verbose settings. Provides a complete post-mortem record of the values actually used for a run.
- Uniform per-attempt signal directories (PR G): Agent-written artifacts (.done, .failed, concerns.log, ops_concerns.log, summary.md, agent.log, gate_last_output.txt) now live under signal_dir/attempts/<N>/ for every attempt, including the current one. Eliminates the split between archived and current-attempt layouts. Removes _archive_attempt_artifacts() and _clear_agent_signals() from reset_for_retry(). Orchestrator-managed files (manifest.json, instructions.md, status/, outputs.json) stay at the signal_dir level.
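The CLI > YAML > default precedence and the run_config.json record can be sketched field by field. RunConfig, its three fields, and their defaults are hypothetical stand-ins for OrchestratorConfig; the defaults shown are illustrative, not the project's.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical subset of settings and defaults, for illustration only.
    max_concurrency: int = 1
    max_task_attempts: int = 2
    teardown_mode: str = "always"


def resolve_run_config(
    cli: Mapping[str, Optional[Any]],
    graph_yaml: Mapping[str, Any],
) -> RunConfig:
    """Apply CLI > graph YAML > default precedence field by field."""
    values: dict[str, Any] = {}
    for f in fields(RunConfig):
        if cli.get(f.name) is not None:  # None means "flag not given"
            values[f.name] = cli[f.name]
        elif f.name in graph_yaml:
            values[f.name] = graph_yaml[f.name]
    return RunConfig(**values)  # fields left out keep their dataclass defaults


def run_config_json(cfg: RunConfig) -> str:
    """Serialize the effective values, e.g. for .workflow/<graph>/run_config.json."""
    return json.dumps(asdict(cfg), indent=2, sort_keys=True)
```

Using None as the "not given" marker keeps an explicit False (as from a --no-... flag) from being mistaken for an absent flag.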
2026-04-05
Output composition and execution polish (sprint 2026-04-05 PRs A–C)
- inputs_from graph YAML extension (PR #163): InputsFrom dataclass and optional inputs_from field on Task. Downstream tasks reference upstream outputs by task ID and optional category. The orchestrator resolves inputs_from at prepare time by reading upstream outputs.json, filtering by category, and merging with explicit paths. Validation rejects references to non-dependency tasks. Resolved input files appear in manifest.json and agent instructions ("Input Files" section). Added graphs/smoke/inputs_from_chain.yaml e2e graph. Design philosophy: inputs_from is guidance, not restriction; agents retain full graph awareness and can explore beyond declared inputs.
- Integration PR body refinement (PR #164): TaskSummary.summary_text field populated from summary.md in each task's signal directory. Integration PR body now includes agent-written summaries as collapsible <details> sections. Long task descriptions are truncated. Tasks without descriptions fall back to task ID + role instead of "(no description)".
- fail_fast_on_internal_error CLI flag (PR #165): --fail-fast-on-internal-error / --no-fail-fast-on-internal-error CLI flag and fail_fast_on_internal_error graph YAML field. Precedence: CLI overrides YAML, YAML overrides default (True). Completes the fail-fast configuration surface (workstream error flag shipped in PR #159).
- All 3 PRs developed in parallel across git worktrees, merged sequentially. 1348 tests (75 new).
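A sketch of prepare-time inputs_from resolution. The outputs.json shape used here (an "outputs" list of objects with "path" and "category") is an assumption, as are the function names and the "explicit paths first, then declared outputs" ordering.

```python
from typing import Any, Iterable, Mapping, Optional


def validate_inputs_from(task_id: str, upstream_id: str, dependencies: Iterable[str]) -> None:
    """Reject inputs_from references to tasks that are not declared dependencies."""
    if upstream_id not in set(dependencies):
        raise ValueError(
            f"task {task_id!r}: inputs_from references {upstream_id!r}, "
            "which is not one of its dependencies"
        )


def resolve_input_paths(
    upstream_manifest: Mapping[str, Any],
    category: Optional[str],
    explicit_paths: Iterable[str],
) -> list[str]:
    """Merge explicit paths with upstream outputs, optionally filtered by category."""
    declared = [
        entry["path"]
        for entry in upstream_manifest.get("outputs", [])
        if category is None or entry.get("category") == category
    ]
    merged: list[str] = []
    for path in [*explicit_paths, *declared]:
        if path not in merged:  # keep first occurrence, drop duplicates
            merged.append(path)
    return merged
```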
2026-04-04
Execution quality and output manifests (sprint 2026-04-04 PRs A–E)
- Capture agent.log on task failure (PR #157): Extracted scrollback capture from TmuxAddress.teardown() into a new capture_log() method on AgentAddress/TmuxAddress. Added a TaskLogCapture protocol and WorktreeTaskLogCapture implementation as a new task runner lifecycle step. StandardTaskRunner.run() calls capture_log() unconditionally in the finally block before the teardown decision. TmuxAddress.teardown() skips redundant capture if agent.log already exists (idempotent).
- Retry agent awareness (PR #158): New ## Previous Attempts section in agent instructions when attempt_num > 0. Lists archived attempt directories with absolute paths, artifact filenames (agent.log, gate_last_output.txt, summary.md, concerns.log, ops_concerns.log), and guidance to review prior scrollback and identify the root cause before writing code.
- CLI and graph YAML control for fail-fast (PR #159): --no-fail-fast-on-workstream-error CLI flag and fail_fast_on_workstream_error graph YAML field. Precedence: CLI overrides YAML, YAML overrides default (True). fail_fast_on_internal_error stays always-on (no flag). Updated the blocked_downstream.yaml e2e graph to use the new field.
- Skip empty integration PR (PR #160): GhWorkstreamIntegrator checks git.rev_list_count() before calling gh pr create. If the integration branch has zero commits ahead of the base, the workstream transitions directly to MERGED and no integration PR is created. Added rev_list_count() to ops/git.py and a "skipped_integration" event to ConsoleListener.
- Output manifests and agentrelay-declare (PR #161): OutputManifest/OutputEntry data models in agent_sdk/output_manifest.py. agentrelay-declare CLI (--path, --action created|modified, --category) appends to signal_dir/outputs.json. TaskHelper.declare_output() convenience method. outputs.json added to the Graph Awareness artifact listing in instructions. agentrelay-declare guidance added to the "Submitting Your Work" section. Foundation for output-driven task composition (docs/discussions/OUTPUT_DRIVEN_COMPOSITION.md).
- All 5 PRs developed in parallel across git worktrees, merged sequentially. 1273 tests (39 new).
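What an agentrelay-declare call might do to outputs.json, as a read-modify-write sketch. The JSON layout and these helper names are assumptions; the real OutputManifest/OutputEntry models in agent_sdk/output_manifest.py may differ.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Optional

_ACTIONS = ("created", "modified")


@dataclass(frozen=True)
class OutputEntry:
    path: str
    action: str
    category: Optional[str] = None

    def __post_init__(self) -> None:
        if self.action not in _ACTIONS:
            raise ValueError(f"action must be one of {_ACTIONS}, got {self.action!r}")


def append_entry(manifest: dict[str, Any], entry: OutputEntry) -> dict[str, Any]:
    """Return a new manifest with entry appended to its "outputs" list."""
    outputs = list(manifest.get("outputs", []))
    outputs.append(asdict(entry))
    return {**manifest, "outputs": outputs}


def declare_output(signal_dir: Path, entry: OutputEntry) -> None:
    """Read-modify-write signal_dir/outputs.json, creating it on first use."""
    manifest_path = signal_dir / "outputs.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest_path.write_text(json.dumps(append_entry(manifest, entry), indent=2))
```

This sketch is not safe against concurrent writers; it relies on each task writing only to its own signal directory.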
2026-04-03
Agent graph awareness (sprint 2026-04-03 PR A)
- Graph YAML delivery (run_graph.py): Single copy of the source graph YAML written to .workflow/<graph>/graph.yaml at startup, alongside the existing run_info.json. Byte-for-byte copy preserves comments and formatting. Agents derive the path from their signal directory (navigate up two levels).
- Graph Awareness instructions (templates.py): New conditional ## Graph Awareness section in agent instructions. Includes: graph YAML absolute path, signal directory formula (<signals_base>/<task-id>/), per-task artifact names (summary.md, concerns.log, ops_concerns.log, .done), guidance on reading upstream summaries before starting work, and guidance on tailoring the agent's own summary for downstream consumers visible in the graph YAML.
- OCI read-only mount (oci_sandbox.py): .workflow/<graph>/ mounted read-only in containers so isolated agents can read peer signal directories and the graph YAML. The agent's own signal directory remains read-write via the existing mount (overlays the read-only parent for that subtree).
- Isolation section update (templates.py): "Workflow directory (read-only)" added to "What You Can Access". "Signal directories" removed from "What You Cannot Access".
- Task preparer (task_preparer.py): Passes graph_yaml_path and signals_base_path to resolve_instructions().
- Design docs: Context-sharing design moved to docs/discussions/CONTEXT_SHARING.md. Output-driven composition design added at docs/discussions/OUTPUT_DRIVEN_COMPOSITION.md.
- E2E test graphs: graphs/graph_awareness/spec_test_impl_{sonnet,haiku,opus}.yaml: 3-task generic pipeline (spec→test→impl) with no paths fields and no role templates. Agents discover file locations from upstream summaries.
- 1234 tests (16 new). E2E validated: all tasks succeeded on the first attempt with Sonnet; the gate passed on the first try.
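The path arithmetic agents are told to do is small enough to show directly. A sketch with hypothetical helper names; it follows the layout .workflow/<graph>/signals/<task_id>/ described in the 2026-03-20 entry.

```python
from pathlib import PurePosixPath


def graph_yaml_from_signal_dir(signal_dir: PurePosixPath) -> PurePosixPath:
    """.workflow/<graph>/signals/<task_id> -> up two levels -> .workflow/<graph>/graph.yaml"""
    return signal_dir.parent.parent / "graph.yaml"


def peer_signal_dir(signals_base: PurePosixPath, task_id: str) -> PurePosixPath:
    """Apply the <signals_base>/<task-id>/ formula for another task in the graph."""
    return signals_base / task_id


own = PurePosixPath("/repo/.workflow/demo/signals/impl")
graph_yaml = graph_yaml_from_signal_dir(own)  # /repo/.workflow/demo/graph.yaml
spec_dir = peer_signal_dir(own.parent, "spec")  # /repo/.workflow/demo/signals/spec
```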
2026-04-02
Agent experience improvements (sprint 2026-04-01 PRs A–E)
- agentrelay-complete retry fix (PR #149): create_pr() probes for an existing open PR before calling gh pr create and reuses it on retry. Updates the body via gh api (REST) rather than gh pr edit to avoid the GraphQL deprecation.
- COMPLETED task status (PR #150): New TaskStatus.COMPLETED for PR-less tasks instead of misusing PR_MERGED. A SUCCESS_STATUSES frozenset centralizes the "task succeeded" check across 8 consumer sites.
- agentrelay-summary CLI (PR #151): New agentrelay-summary --message CLI and TaskHelper.write_summary() for PR-less task summaries.
- Worktree CWD guidance (PR #152): ## Working Directory section in agent instructions with the explicit worktree path to prevent Level 0 agents from navigating out of the worktree.
- Retry artifact archive (PR #153): reset_for_retry() copies agent.log, gate_last_output.txt, summary.md, and concerns.log to signal_dir/attempts/<N>/ before clearing signals.
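The point of SUCCESS_STATUSES is a single definition of success shared by all consumers. A sketch assuming the frozenset contains exactly PR_MERGED and COMPLETED; the "completed" enum value and the task_succeeded helper are hypothetical.

```python
from enum import Enum


class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    PR_CREATED = "pr_created"
    PR_MERGED = "pr_merged"
    COMPLETED = "completed"  # value assumed; added for PR-less tasks
    FAILED = "failed"


# One definition of "succeeded" instead of ad hoc checks at each consumer site.
SUCCESS_STATUSES = frozenset({TaskStatus.PR_MERGED, TaskStatus.COMPLETED})


def task_succeeded(status: TaskStatus) -> bool:
    return status in SUCCESS_STATUSES
```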
2026-03-31
OAuth and consolidated Anthropic credentials for containers (sprint 2026-03-30 PR A)
- Container startup script (setup-credentials.py): Generates ~/.claude/settings.json at container startup instead of pre-seeding it at build time. API key mode includes apiKeyHelper; OAuth mode omits it so Claude Code reads .credentials.json natively. Also copies OAuth credentials from the read-only mount to a writable location for token refresh.
- Dockerfile update: Removed the baked-in settings.json. Added claude-setup-credentials to the startup chain (before claude-trust-workdir).
- Consolidated credentials YAML: New anthropic section with named, typed entries replacing the old defaults section. CredentialType enum (api_key, oauth) and AnthropicCredential frozen dataclass. api_key entries support key (inline) or key_file (read from file, tilde-expanded, whitespace-stripped) to keep secrets portable.
- OciSandbox: Accepts AnthropicCredential | None instead of a raw path. Type-aware injection: _ANTHROPIC_API_KEY env var for API key mode, read-only volume mount for OAuth mode. A defensive guard rejects ANTHROPIC_API_KEY in SandboxContext.env_vars.
- CLI: --anthropic-credential <name> selects a named credential. Auto-selects when there is a single entry and errors clearly when there are several. The graph YAML anthropic_credential operational key provides a default (CLI overrides).
- FileCredentialProvider: resolve() returns tier-only env vars (no more defaults merge). New resolve_anthropic() method with name lookup or auto-select. New anthropic_names property.
- Docs: Credentials YAML schema in SCHEMA.md, three example credential files (credentials-api-key.yaml, credentials-oauth.yaml, credentials-both.yaml).
- 1187 tests (29 new). E2E validated both API key and OAuth modes with basic_oci.yaml.
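A sketch of the credential selection rules (named lookup, auto-select a sole entry, clear error otherwise) and of key versus key_file handling. The field name kind and the select_credential helper are hypothetical; the real AnthropicCredential and resolve_anthropic() may be shaped differently.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Mapping, Optional


class CredentialType(Enum):
    API_KEY = "api_key"
    OAUTH = "oauth"


@dataclass(frozen=True)
class AnthropicCredential:
    name: str
    kind: CredentialType
    key: Optional[str] = None       # inline secret
    key_file: Optional[str] = None  # path to a file holding the secret

    def api_key(self) -> str:
        if self.kind is not CredentialType.API_KEY:
            raise ValueError(f"credential {self.name!r} is not an API key")
        if self.key is not None:
            return self.key
        if self.key_file is not None:
            # Tilde-expand the path and strip surrounding whitespace/newlines.
            return Path(self.key_file).expanduser().read_text().strip()
        raise ValueError(f"credential {self.name!r} has neither key nor key_file")


def select_credential(
    creds: Mapping[str, AnthropicCredential], name: Optional[str]
) -> AnthropicCredential:
    """Named lookup if a name is given; otherwise auto-select a sole entry."""
    if name is not None:
        if name not in creds:
            raise ValueError(f"unknown credential {name!r}; available: {sorted(creds)}")
        return creds[name]
    if len(creds) == 1:
        return next(iter(creds.values()))
    if not creds:
        raise ValueError("no anthropic credentials configured")
    raise ValueError(
        f"multiple credentials configured ({', '.join(sorted(creds))}); "
        "choose one with --anthropic-credential <name>"
    )
```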
2026-03-30
Remaining e2e isolation testing (sprint 2026-03-26 PR F2)
- token_tiers e2e: Verified correct credential scoping across token tiers. standard_task (standard tier) pushed, created a PR, and succeeded. readonly_task (read_only tier) pushed but failed at PR creation (403: the token lacks the pull_requests: write scope). The agent recorded an ops concern about the missing permission. Graph outcome: COMPLETED_WITH_FAILURES.
- permission_boundary e2e: Verified that the pre-push hook blocks agent pushes to main. The agent attempted git push origin HEAD:refs/heads/main, the hook rejected it (IS_AI_AGENT=true), and the agent recorded an ops concern with the exact hook error message, then recovered by pushing to its task branch and completing normally. Graph outcome: SUCCEEDED (48s).
- permission_boundary.yaml fix: Changed token_tier from read_only to standard so the test exercises the pre-push hook rather than a PAT permission denial.
Container e2e infrastructure fixes (sprint 2026-03-26 PR F cleanup)
- UID alignment: The Docker base image creates an agent user with UID 1000 (matching the typical host UID). Removes the ubuntu user, eliminates the --group-add workaround, and fixes PermissionError on reset cleanup.
- Git credential helper: Configured in the Docker image so git push over HTTPS uses the injected GH_TOKEN automatically. Agents no longer need to construct token URLs manually.
- Claude Code first-run suppression: Pre-seeded ~/.claude/settings.json in the framework image, plus DISABLE_AUTOUPDATER and CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC env vars in OciSandbox. Prevents onboarding prompts from consuming the orchestrator's kickoff.
- reset_graph improvements: git worktree prune after directory removal, local branch cleanup via branch_list_local() (catches branches missed by remote-only listing), and force_rm() for Docker containers (handles exited containers that stop + rm missed).
- E2e script decoupling: e2e_run.sh, e2e_reset.sh, and e2e_check.sh now use --manifest-path to run from agentrelay's pixi env. The target repo no longer needs agentrelay as a dependency.
- Removed --user and --group-add parameters from docker.build_run_command().
2026-03-29
Agent boundary instructions, container fixes, and e2e isolation graphs (sprint 2026-03-26 PR F1)
- Isolation boundary instructions: resolve_instructions() gains a sandbox_type parameter. When sandbox_type is SandboxType.OCI, a ## Isolation Boundary section is injected describing what the agent can and cannot access, what exists beyond its boundary, and how to report when blocked.
- IS_AI_AGENT=true env var: Injected by OciSandbox.wrap_command() into all containerized agents. Foundation for runtime agent detection.
- Git pre-push hook: Baked into the Docker base image at /etc/agentrelay/hooks/pre-push. Blocks pushes to main/master when IS_AI_AGENT=true. Global core.hooksPath configured for the agent user.
- Container execution fixes: bash -c shell wrapper, --group-add for host GID file permissions, .git dir read-write mount, TERM env var for TUI rendering, chmod for cross-UID claude binary access, and a safe.directory wildcard in the container git config.
- E2E isolation test graphs: basic_oci.yaml, token_tiers.yaml, and permission_boundary.yaml in graphs/isolation/. basic_oci.yaml validated end-to-end.
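The hook's decision logic, sketched in Python. The implementation language of the hook baked into the image is not stated, so treat this as illustrative. It relies on standard git pre-push behavior: git writes one "<local ref> <local sha> <remote ref> <remote sha>" line per ref to the hook's stdin, and a non-zero exit status aborts the push. PROTECTED_REFS and blocked_refs are hypothetical names.

```python
import os
import sys
from typing import Iterable, Mapping

PROTECTED_REFS = {"refs/heads/main", "refs/heads/master"}


def blocked_refs(stdin_lines: Iterable[str], env: Mapping[str, str]) -> list[str]:
    """Return protected remote refs an AI agent is trying to push to.

    Git feeds a pre-push hook one line per ref:
    "<local ref> <local sha> <remote ref> <remote sha>".
    """
    if env.get("IS_AI_AGENT") != "true":
        return []  # humans are not restricted by this hook
    blocked = []
    for line in stdin_lines:
        parts = line.split()
        if len(parts) == 4 and parts[2] in PROTECTED_REFS:
            blocked.append(parts[2])
    return blocked


def main() -> int:
    """Hook entry point; a non-zero exit status makes git abort the push."""
    refs = blocked_refs(sys.stdin, os.environ)
    if refs:
        print(f"pre-push: AI agents may not push to {', '.join(refs)}", file=sys.stderr)
        return 1
    return 0
```

Installed as an executable hook, the script would end with sys.exit(main()).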
Docker images, network lifecycle, and CLI credentials (sprint 2026-03-26 PR E)
- Three-layer Docker image: docker/base/Dockerfile (ubuntu:24.04, git, gh, python3, agent SDK), docker/toolchain/python/Dockerfile (adds pixi), docker/framework/claude-code/Dockerfile (adds Claude Code via native installer). Build all layers with pixi run docker-build.
- Docker network lifecycle: run_graph.py creates a Docker network (agentrelay-<graph>) before orchestration and removes it in a finally block. Only activated when any task uses the OCI sandbox.
- --credentials CLI flag: Path to a credentials YAML file; wires FileCredentialProvider into the task runner builder.
- Docker label-based container tracking: Containers are labeled with agentrelay.graph and agentrelay.task for programmatic lookups. Container names include the graph name: agentrelay-<graph>-<task>.
- Reset Docker cleanup: reset_graph.py stops/removes containers by label and removes the Docker network (best-effort, swallows errors).
- OciSandbox validate-not-create: setup() now validates that the network exists (raises RuntimeError if missing) instead of creating it.
- New ops/docker.py additions: a ps_by_label() function and a labels parameter on build_run_command().
- build_standard_runner() accepts an optional credential_provider parameter.
- 1123 tests (16 new).
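The naming and labeling scheme from this entry, expressed as docker command lines. The flags used (--name, --network, --label, and ps --all --quiet --filter label=...) are standard Docker CLI; the helper names and the example image tag are hypothetical and do not reflect the signature of ops/docker.py.

```python
from typing import Optional, Sequence


def docker_run_argv(
    image: str,
    graph: str,
    task_id: str,
    command: Sequence[str],
    network: Optional[str] = None,
) -> list[str]:
    """Build a `docker run` argv using the container name and label conventions."""
    return [
        "docker", "run",
        "--name", f"agentrelay-{graph}-{task_id}",
        "--network", network or f"agentrelay-{graph}",
        "--label", f"agentrelay.graph={graph}",
        "--label", f"agentrelay.task={task_id}",
        image,
        *command,
    ]


def ps_by_label_argv(graph: str) -> list[str]:
    """List IDs of all containers (running or exited) carrying this graph's label."""
    return ["docker", "ps", "--all", "--quiet", "--filter", f"label=agentrelay.graph={graph}"]
```

Label-based lookup lets reset find containers even when their names have changed, for example after attempt-indexed names were introduced later.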
2026-03-26
Per-module diagrams, diagram cleanup, and sprint plan (PR #137)
- Per-module UML diagrams: New tools/generate_module_diagrams.py extracts 16 focused diagrams from diagram-detailed.d2, one per module, showing full types plus simplified external dependency stubs at reduced opacity. Also generates the diagram-modules.d2 inter-module overview.
- Removed filtered variants: Deleted the diagram-no-private, diagram-no-impl, and diagram-standard views along with tools/generate_diagrams.py and tools/d2_filters.py. Per-module diagrams replace them as the primary reference for understanding individual modules.
- Diagram links in docs: Each API reference page links to its module diagram. docs/DIAGRAM.md includes a table of all 16 module diagrams.
- Sprint plan: Added docs/sprints/2026-03-26-agent-isolation.md, a 6-PR sprint plan for tunable agent isolation (Docker containers, scoped PATs, AgentSandbox protocol, FrameworkConfigAdapter, credential provisioning).
- Backlog cleanup: Marked cross-workstream ordering as resolved (PR #134) and auto-merge as resolved (PR #135); removed the diagram rendering size limit item.
2026-03-24
Wire ADR production into agent instructions (PR E)
- Rename AgentVerbosity → AdrVerbosity: clearer naming, consistent with _AdrPolicy.
- ADR instruction injection: resolve_instructions() gains an adr_verbosity parameter. When it is not NONE, a ## Architecture Decision Record section is injected between "What to Do" and "Submitting Your Work" with verbosity-scaled guidance and the output path docs/adr/<task_id>.md.
- Three verbosity tiers: STANDARD (Title, Status, Context, Decision, Consequences), DETAILED (+Alternatives Considered, Trade-offs, Implementation Notes), EDUCATIONAL (+annotations explaining each section's purpose).
- graphs/adr/ test category: adr_standard.yaml graph with a single task at standard verbosity.
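A sketch of how the three tiers could map to requested ADR sections. It assumes the tiers are cumulative (EDUCATIONAL builds on DETAILED) and that the enum values are lowercase strings; both are assumptions, as are the helper names.

```python
from enum import Enum


class AdrVerbosity(Enum):
    NONE = "none"
    STANDARD = "standard"
    DETAILED = "detailed"
    EDUCATIONAL = "educational"


_STANDARD_SECTIONS = ["Title", "Status", "Context", "Decision", "Consequences"]
_DETAILED_EXTRAS = ["Alternatives Considered", "Trade-offs", "Implementation Notes"]


def adr_sections(verbosity: AdrVerbosity) -> list[str]:
    """Sections an agent is asked to write; NONE means no ADR section is injected."""
    if verbosity is AdrVerbosity.NONE:
        return []
    sections = list(_STANDARD_SECTIONS)
    if verbosity in (AdrVerbosity.DETAILED, AdrVerbosity.EDUCATIONAL):
        sections += _DETAILED_EXTRAS  # assumed: EDUCATIONAL builds on DETAILED
    return sections


def annotate_sections(verbosity: AdrVerbosity) -> bool:
    """Only EDUCATIONAL asks for notes explaining each section's purpose."""
    return verbosity is AdrVerbosity.EDUCATIONAL


def adr_output_path(task_id: str) -> str:
    return f"docs/adr/{task_id}.md"
```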
2026-03-22
Restructure instructions.md as a work-order document
- Work-order layout: instructions.md now reads as a natural work order: Role → Tools → What to Do → Submitting Your Work → Task Details. Each role gets a descriptive sentence ("You are a SPEC_WRITER tasked with...") via the _ROLE_SENTENCES dict.
- Scope preludes: each role template starts with an explicit scope statement (e.g., "Scope: write API stubs only") so agents know their boundaries before reading the steps.
- Concerns during work: concern recording guidance moved into "What to Do" (record as you work) instead of the submission section.
- Task Details at the end: task description appears last as a reference section. For generic tasks, the description IS the work and goes under "What to Do" directly.
- Spec writer simplified: removed markdown spec step — Python stubs with signatures and docstrings are the single source of truth.
Ops concerns — separate channel for operational issues (PR B4)
- Separate ops concerns channel: Agents can now record operational concerns (build errors, missing deps, tooling friction) via the agentrelay-ops-concern CLI, distinct from design concerns (agentrelay-concern). Stored in ops_concerns.log in the signal directory.
- End-to-end pipeline: TaskHelper.record_ops_concern() → ops_concerns.log → SignalCompletionChecker → TaskArtifacts.ops_concerns → TaskSummary.ops_concerns → integration PR body + console summary.
- Separate rendering: Ops concerns appear in their own ## Ops Concerns section in both individual task PRs and integration PRs, and in a separate Ops Concerns: block in console output.
- Workflow footer updated: Step 2 now distinguishes design vs ops concerns with examples of each category.
2026-03-21
Declared tools + TaskHelper CLI wrapper (PR B2)
- Declared tools in graph YAML: New tools field (list of tool names) at the graph level. The orchestrator validates that each tool is available before launch and injects usage guidance into agent instructions. Starts with pixi; extensible via TOOL_REGISTRY in tools.py.
- TaskHelper CLI wrapper: Agents can now use shell commands instead of inline Python: agentrelay-complete, agentrelay-failed, agentrelay-concern. Eliminates shell-escaping issues with inline Python in zsh.
- Workflow footer updated: Shows CLI commands instead of Python snippets. A tools guidance section appears when tools are declared.
Roles test graphs and multi-role pipeline (PR B)
- Roles pipeline graph (graphs/roles/pipeline.yaml): Four-role pipeline (spec_writer -> test_writer -> test_reviewer -> implementer) building a BoundedQueue class. Exercises role-specific templates, TaskPaths, and the full handoff chain for the first time with real agents.
- Organic concern test: The spec_writer description contains a deliberate contradiction (eviction vs OverflowError on full push) to test whether agents discover and report spec inconsistencies without explicit prompting.
- Implementer template fix: Updated templates/implementer.md to connect concern documentation to helper.record_concern() with guidance on what qualifies as a concern (contradictions, ambiguities, impossible requirements).
- Backlog: Added structured instruction architecture item (concern definitions as formal data, partially structured instructions.md, default-with-overrides pattern).
2026-03-20
Signal-file-backed TaskStatus (PR A3)
- TaskStatus is now derived from signal files on disk, matching the WorkstreamStatus pattern from PR #115. Status files live under .workflow/<graph>/signals/<task_id>/status/ with one file per status value (pending, running, pr_created, pr_merged, failed).
- TaskRuntime.status is a computed property reading from signal files. Falls back to PENDING (or FAILED if an error is recorded) when no signal directory has been set.
- TaskState.status field removed: status is no longer stored in memory.
- New mark methods on TaskRuntime: mark_running(), mark_pr_created(), mark_pr_merged(). Existing mark_pending() and mark_failed(error) updated to write signal files.
- StandardTaskRunner._transition() validates lifecycle edges, then delegates to the appropriate mark method. FAILED transitions use mark_failed(error) directly at call sites.
- reset_for_retry() clears all status signal files before writing a fresh pending file.
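A sketch of deriving status from the status/ directory. How TaskRuntime breaks ties when several status files exist is not stated here; this sketch assumes failure wins and otherwise the furthest lifecycle state wins. status_from_files and read_status are hypothetical names.

```python
from enum import Enum
from pathlib import Path
from typing import Iterable, Optional


class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    PR_CREATED = "pr_created"
    PR_MERGED = "pr_merged"
    FAILED = "failed"


# Assumed precedence when several status files exist: failure first, then the
# furthest point reached in the lifecycle.
_PRECEDENCE = [
    TaskStatus.FAILED,
    TaskStatus.PR_MERGED,
    TaskStatus.PR_CREATED,
    TaskStatus.RUNNING,
    TaskStatus.PENDING,
]


def status_from_files(names: Iterable[str]) -> Optional[TaskStatus]:
    present = set(names)
    for status in _PRECEDENCE:
        if status.value in present:
            return status
    return None


def read_status(signal_dir: Optional[Path], has_error: bool) -> TaskStatus:
    if signal_dir is None:
        # No signal directory yet: fall back to in-memory error information.
        return TaskStatus.FAILED if has_error else TaskStatus.PENDING
    status_dir = signal_dir / "status"
    names = [p.name for p in status_dir.iterdir()] if status_dir.is_dir() else []
    return status_from_files(names) or TaskStatus.PENDING
```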
2026-03-19
Docs, demo graphs, and E2E testing (PR P)
- Demo graphs: Replaced the outdated graphs/demo.yaml with two tested demos (quick_parallel.yaml, quick_chained.yaml) versioned alongside the code.
- E2E scripts: Three shell scripts in tools/ for running graphs against external target repos:
  - e2e_run.sh: validates the target repo, runs a graph.
  - e2e_reset.sh: resets a graph run in a target repo.
  - e2e_check.sh: preflight check (pixi, agentrelay dependency, Python, gh auth, agent environment, working tree cleanliness, leftover state).
- Conflict detection: run_graph now errors if .workflow/<graph> or .worktrees/<graph> already exists, preventing corrupted state from overlapping runs. --dry-run reports conflicts as warnings.
- Pixi tasks: e2e, e2e-reset, e2e-check.
- Updated docs: GUIDE.md rewritten with a current-architecture-first CLI reference and an E2E section. WORKFLOW.md expanded with practical run/reset cycle examples.
2026-03-18
Composition and CLI entry point (PR N2)
- build_standard_workstream_runner(): New builder in orchestrator/builders.py that wires GitWorkstreamPreparer, GhWorkstreamMerger, and GitWorkstreamTeardown into a StandardWorkstreamRunner. Mirrors the build_standard_runner() pattern.
- run_graph.py: New top-level module providing:
  - run_graph() async composition function: loads YAML, builds graph + runners + orchestrator, and runs to completion.
  - dry_run(): validates graph YAML and prints the execution plan (task order, dependencies, workstreams).
  - CLI via python -m agentrelay.run_graph <graph.yaml> with flags: --max-concurrency, --max-task-attempts, --teardown-mode, --tmux-session, --model, --dry-run.
- Operational YAML keys: tmux_session, keep_panes, and model are popped from the raw YAML before graph parsing, allowing TaskGraphBuilder to stay unchanged. CLI flags override YAML values.
- 19 new tests: unit tests for YAML preprocessing, builder, and dry-run; integration tests verifying full wiring from graph YAML through the orchestrator with test doubles.
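Popping operational keys before graph parsing can be sketched as follows. The helper names are hypothetical, and the "tasks" key in the example is only a placeholder for whatever the graph schema contains.

```python
from typing import Any, Mapping, Optional

OPERATIONAL_KEYS = ("tmux_session", "keep_panes", "model")


def split_operational(raw: Mapping[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Pop operational keys so the remaining dict can go to the graph builder as-is."""
    graph_part = dict(raw)  # copy: leave the caller's parsed YAML untouched
    operational = {k: graph_part.pop(k) for k in OPERATIONAL_KEYS if k in graph_part}
    return graph_part, operational


def apply_cli_overrides(
    operational: Mapping[str, Any], cli: Mapping[str, Optional[Any]]
) -> dict[str, Any]:
    """CLI flags win over YAML values; None means the flag was not passed."""
    merged = dict(operational)
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged
```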
Reset tool (PR O)
- reset_graph.py: New top-level module for resetting a repository to its pre-graph-run state. Reads .workflow/<graph>/run_info.json (written by run_graph), then closes open PRs, resets main and force-pushes (with an ancestry safety check), deletes remote/local branches, and removes worktree and workflow directories.
- plan_reset() / execute_reset(): separated planning from execution for testability.
- CLI via python -m agentrelay.reset_graph <graph.yaml> with a --yes flag.
- Out-of-order reset detection: skips the main-branch reset if start_head is not an ancestor of the current HEAD, but still performs all other cleanup.
- Idempotent: re-running after a successful reset is safe.
- run_info.json: run_graph now writes start_head + started_at to .workflow/<graph>/run_info.json before the orchestrator runs.
- New ops primitives: git.rev_parse_head, git.merge_base_is_ancestor, git.reset_hard, git.push_force_with_lease, git.ls_remote_branches, gh.pr_list, gh.pr_close.
- 10 new tests: plan/execute against temp git repos with remotes, out-of-order detection, idempotency, PR closing (mocked), run_info.json integration.
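The main-branch decision in the reset plan, sketched with an injected ancestry check. It relies on the documented exit status of git merge-base --is-ancestor (0 if the first commit is an ancestor of the second, 1 if not). MainResetPlan and the helper names are hypothetical, not the reset_graph.py API.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MainResetPlan:
    reset_main_to: Optional[str]  # None means "leave main alone"
    warnings: tuple[str, ...] = ()


def plan_main_reset(
    start_head: str,
    current_head: str,
    is_ancestor: Callable[[str, str], bool],
) -> MainResetPlan:
    if current_head == start_head:
        return MainResetPlan(None)  # already at the pre-run state, so re-runs are no-ops
    if not is_ancestor(start_head, current_head):
        return MainResetPlan(
            None,
            (f"{start_head[:8]} is not an ancestor of HEAD; skipping main reset",),
        )
    return MainResetPlan(start_head)


def git_is_ancestor(ancestor: str, descendant: str, repo: str = ".") -> bool:
    # Exit status 0 means "is an ancestor"; 1 means it is not. Any other status
    # (an error) is also treated as False, which skips the reset: the safe side.
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", ancestor, descendant], cwd=repo
    )
    return result.returncode == 0
```

Injecting is_ancestor keeps the planning step testable without a real repository.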
2026-03-16
Add graphical popups to overview diagram
- Per-package mini SVGs: tools/generate_overview.py --mode package-svgs extracts each top-level package's D2 block from docs/diagram.d2, renders it as a standalone SVG via d2, and embeds all 13 mini SVGs (base64-encoded) in the overview HTML.
- Click-to-open popups: Clicking a package box opens a centered panel showing that package's D2-rendered class diagram. Clicking an arrow shows both endpoint packages side-by-side with a text list of class-level connections. Click outside or press Escape to close.
- Build pipeline: pixi run diagram now runs the package-svgs mode after rendering the overview SVG, producing docs/pkg-detail/*.d2 and *.svg files.
- 21 new tests covering D2 block extraction, per-package file generation, SVG rendering (mocked subprocess), and popup HTML generation.
Add two-tier diagram system with auto-generated overview
- Overview generator: New tools/generate_overview.py parses docs/diagram.d2, extracts top-level packages and cross-package relationships, deduplicates arrows to one per package pair, and writes docs/diagram-overview.d2 with tooltips listing each package's classes.
- Two rendered views: pixi run diagram now generates both diagram-overview.svg (13 package boxes, ~19 directional arrows) and diagram.svg (full class detail).
- Zero-drift: The overview is derived from the detail diagram, a single source of truth.
- 27 new tests covering parsing, deduplication, edge cases, and end-to-end validation against the real diagram.
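Arrow deduplication amounts to grouping class-level edges by their (source package, target package) pair. A sketch that assumes edges are written with fully qualified dotted paths starting at the top-level package; real D2 files may declare edges inside containers with relative names, which this regex does not handle. package_arrows is a hypothetical name.

```python
import re
from typing import Iterable

# Matches the start of a D2 edge such as "pkg.Class -> other.Class: label".
# Simplification: identifiers are limited to word characters and dots.
_EDGE = re.compile(r"^\s*([\w.]+)\s*->\s*([\w.]+)")


def package_arrows(d2_lines: Iterable[str]) -> dict[tuple[str, str], list[str]]:
    """Collapse class-level edges into one directed arrow per package pair.

    Each arrow keeps the underlying class-level connections, e.g. for tooltips.
    """
    arrows: dict[tuple[str, str], list[str]] = {}
    for line in d2_lines:
        match = _EDGE.match(line)
        if match is None:
            continue
        src, dst = match.group(1), match.group(2)
        src_pkg, dst_pkg = src.split(".")[0], dst.split(".")[0]
        if src_pkg == dst_pkg:
            continue  # intra-package edges do not appear in the overview
        arrows.setdefault((src_pkg, dst_pkg), []).append(f"{src} -> {dst}")
    return arrows
```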
Migrate design diagram from PlantUML to D2
- Tool change: Replaced PlantUML with D2 using the ELK layout engine for better handling of nested containers and cross-package relationship arrows at scale.
- Faithful translation: All ~80 classes/interfaces/enums across 12+ packages and ~170 relationships preserved in the new docs/diagram.d2 source.
- Dependency swap: Replaced plantuml with d2 in pixi.toml; pixi run diagram now invokes d2 --layout elk.
- Archived the original PlantUML source as docs/diagram.puml.archived for reference.
Extract _OrchestratorRun from Orchestrator
- Separation of concerns: Split Orchestrator into an immutable config holder and _OrchestratorRun, a module-private execution context that owns all mutable state for one run() invocation.
- Readable main loop: execute() is ~20 lines: initialize, dispatch, handle deadlock, await, process completions, build result.
- Simplified helper signatures: Helpers access self._task_runtimes, self._orchestrator.graph, etc. directly instead of threading 3-6 parameters.
- Deduplicated fail-fast: Extracted _fail_fast_cancel() to replace the 3 repeated cancel-mark-clear patterns.
- Pure refactoring: no behavioral changes; all existing tests pass unchanged.
2026-03-15
Wire WorkstreamRunner into Orchestrator
- WorkstreamRunner as Protocol: Promoted WorkstreamRunner to a Protocol (matching the TaskRunner pattern). Renamed the concrete implementation to StandardWorkstreamRunner. The Orchestrator depends only on the protocol.
- Orchestrator lifecycle wiring: Added workstream_runner as a required field on Orchestrator. The orchestrator now calls prepare() before the first task in a workstream, merge() when all tasks reach PR_MERGED, and teardown() after the main loop exits.
- MERGE_READY status: Added WorkstreamStatus.MERGE_READY as an intermediate state between ACTIVE and MERGED. Enables future human-approval gates before the integration branch is merged.
- Derived active-task check: Removed active_task_id from WorkstreamState and the activate()/deactivate() methods from WorkstreamRuntime. The one-task-per-workstream constraint is now enforced by scanning task runtimes for RUNNING/PR_CREATED status.
- fail_fast_on_workstream_error: New OrchestratorConfig option (default True). When a workstream fails during prepare(), the orchestrator stops preparing new workstreams but does not cancel in-flight work.
- Updated diagram, exports, and all orchestrator tests.
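The derived active-task check replaces stored state with a scan over task statuses. A sketch using plain dicts for the task-to-workstream mapping and the statuses; the helper names and data shapes are hypothetical, not the orchestrator's internals.

```python
from enum import Enum
from typing import Mapping


class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    PR_CREATED = "pr_created"
    PR_MERGED = "pr_merged"
    FAILED = "failed"


ACTIVE_STATUSES = frozenset({TaskStatus.RUNNING, TaskStatus.PR_CREATED})


def workstream_is_busy(
    workstream_id: str,
    task_workstream: Mapping[str, str],
    statuses: Mapping[str, TaskStatus],
) -> bool:
    """Derive "has an active task" by scanning task statuses; nothing is stored."""
    return any(
        statuses[task_id] in ACTIVE_STATUSES
        for task_id, ws in task_workstream.items()
        if ws == workstream_id
    )


def ready_to_merge(
    workstream_id: str,
    task_workstream: Mapping[str, str],
    statuses: Mapping[str, TaskStatus],
) -> bool:
    """True once every task in the workstream has reached PR_MERGED."""
    tasks = [t for t, ws in task_workstream.items() if ws == workstream_id]
    return bool(tasks) and all(statuses[t] is TaskStatus.PR_MERGED for t in tasks)
```

Deriving the answer on demand means there is no active_task_id field that can drift out of sync with the tasks themselves.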
2026-03-14
Flatten WorkstreamRunnerIO into WorkstreamRunner
- Removed the WorkstreamRunnerIO intermediate dataclass. WorkstreamRunner now holds the three per-step protocol fields (_preparer, _merger, _teardown) directly, matching the pattern established by StandardTaskRunner on the task side.
- Deleted test/workstream/implementations/test_workstream_runner_io.py (tested the removed composition class).
- Updated diagram, architecture docs, and exports.
StandardTaskRunner with per-step dispatch
- TaskRunner protocol: Promoted `TaskRunnerLike` to `TaskRunner` (protocol in `task_runner/core/runner.py`). The orchestrator depends only on this protocol.
- StandardTaskRunner: Renamed the concrete `TaskRunner` class to `StandardTaskRunner`. Replaced `io: TaskRunnerIO` with six `StepDispatch[T]` fields that co-locate step sequencing and implementation dispatch.
- StepDispatch[T]: New generic frozen dataclass in `task_runner/core/dispatch.py`. Selects per-step protocol implementations based on an `(AgentFramework, type[AgentEnvironment])` dispatch key. Supports an `entries` dict plus a `default` fallback.
- Workstream context via TaskState: Added `integration_branch` and `workstream_worktree_path` fields to `TaskState` and `TaskStateView`. The orchestrator sets these from `WorkstreamRuntime.state` before dispatch, removing workstream-specific constructor args from `WorktreeTaskPreparer` and `GhTaskMerger`.
- Builder: New `build_standard_runner()` factory in `task_runner/implementations/standard_runner_builder.py` wires the standard worktree + tmux + Claude Code implementations via `StepDispatch` defaults.
- TaskRunnerIO: Retained but marked deprecated; no longer the primary composition mechanism.
- Removed `TaskRunnerLike` from `orchestrator/`; replaced by the `TaskRunner` protocol imported from `task_runner`.
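
The `StepDispatch[T]` idea above can be sketched as a frozen generic dataclass with an `entries` mapping and a `default` fallback. The `AgentFramework` members, the `ContainerEnvironment` class, the `resolve()` method name, and the `LookupError` on a miss are all hypothetical; strings stand in for real step implementations.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Generic, Mapping, Optional, TypeVar

T = TypeVar("T")


class AgentFramework(Enum):
    # Hypothetical members for this sketch.
    CLAUDE_CODE = "claude_code"
    OTHER = "other"


class AgentEnvironment:
    """Hypothetical base class standing in for the real environment types."""


class TmuxEnvironment(AgentEnvironment):
    pass


class ContainerEnvironment(AgentEnvironment):  # hypothetical second backend
    pass


DispatchKey = tuple[AgentFramework, type[AgentEnvironment]]


@dataclass(frozen=True)
class StepDispatch(Generic[T]):
    """Pick one implementation of a step by (framework, environment type)."""

    entries: Mapping[DispatchKey, T] = field(default_factory=dict)
    default: Optional[T] = None

    def resolve(self, framework: AgentFramework, env_type: type[AgentEnvironment]) -> T:
        # An exact key match wins; otherwise fall back to the default.
        impl = self.entries.get((framework, env_type), self.default)
        if impl is None:
            raise LookupError(f"no implementation for {framework.name}/{env_type.__name__}")
        return impl


# Usage: the standard implementation as default, plus one override.
launchers: StepDispatch[str] = StepDispatch(
    entries={(AgentFramework.OTHER, ContainerEnvironment): "container launcher"},
    default="tmux launcher",
)
print(launchers.resolve(AgentFramework.CLAUDE_CODE, TmuxEnvironment))  # tmux launcher
print(launchers.resolve(AgentFramework.OTHER, ContainerEnvironment))  # container launcher
```
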
2026-03-13
Add workstream-level protocol implementations
- Three concrete classes implementing the `WorkstreamRunnerIO` per-step protocols, composing `ops/` primitives:
    - `GitWorkstreamPreparer`: creates the git worktree and integration branch, and pushes to origin with upstream tracking
    - `GhWorkstreamMerger`: creates and merges the workstream integration PR via the GitHub CLI, and updates the local merge-target ref
    - `GitWorkstreamTeardown`: removes the worktree and deletes the local and remote integration branch (best-effort cleanup)
- Added a `git.push_delete_branch()` thin wrapper to `ops/git.py`
Fix task-level workspace model
- `WorktreeTaskPreparer` no longer creates a per-task worktree. Instead it creates a task branch in the shared workstream worktree via `git.branch_create()` + `git.checkout()`. A new `workstream_worktree_path` config field points to the worktree owned by the workstream preparer.
- `WorktreeTaskTeardown` no longer removes the worktree (owned by workstream teardown). It still deletes the task branch and captures agent logs.
- Added a `git.checkout()` thin wrapper to `ops/git.py`.
Add task-level protocol implementations
- Six concrete classes implementing the `TaskRunnerIO` per-step protocols, composing `ops/` primitives and protocol builders from the `agent_comm_protocol/` package:
    - `WorktreeTaskPreparer`: creates the git worktree and writes `manifest.json`, `policies.json`, and `instructions.md` to the signal directory
    - `TmuxTaskLauncher`: delegates to `TmuxAgent.from_config()` to launch Claude Code in a tmux pane
    - `TmuxTaskKickoff`: sends kickoff instructions to the launched agent
    - `SignalCompletionChecker`: async-polls for `.done`/`.failed` signal files and parses them into a `TaskCompletionSignal`
    - `GhTaskMerger`: merges the task PR via `gh`, updates the local integration branch ref, and writes a `.merged` signal
    - `WorktreeTaskTeardown`: captures the agent log, kills the tmux window, and removes the worktree and branch (best-effort cleanup)
- Completed `TmuxAgent` stubs: `from_config()` creates a tmux window and launches Claude Code; `send_kickoff()` waits for the TUI to be ready, then sends the prompt
- Added `signal_dir: Optional[Path]` to `TaskState` and `TaskStateView`
- Implementation modules are named after their protocol (`task_preparer.py` implements `TaskPreparer`, etc.), with docstrings cross-referencing the protocol
- 768 tests pass, 0 pyright errors
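
A rough sketch of the polling approach described for `SignalCompletionChecker`, assuming the agent writes a `.done` or `.failed` file whose body is JSON. `CompletionSignal`, `wait_for_signal()`, and the `poll_interval`/`timeout` parameters are hypothetical stand-ins, not the actual `ops/signals.py` or `TaskCompletionSignal` API.

```python
import asyncio
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class CompletionSignal:
    """Hypothetical, simplified stand-in for TaskCompletionSignal."""

    succeeded: bool
    payload: Dict[str, Any] = field(default_factory=dict)


async def wait_for_signal(
    signal_dir: Path,
    poll_interval: float = 0.5,
    timeout: Optional[float] = None,
) -> CompletionSignal:
    """Poll signal_dir until a .done or .failed file appears."""
    loop = asyncio.get_running_loop()
    deadline = None if timeout is None else loop.time() + timeout
    while True:
        for name, succeeded in ((".done", True), (".failed", False)):
            path = signal_dir / name
            if path.exists():
                text = path.read_text()
                # Assumes a JSON body; an empty file gives an empty payload.
                payload = json.loads(text) if text.strip() else {}
                return CompletionSignal(succeeded=succeeded, payload=payload)
        if deadline is not None and loop.time() >= deadline:
            raise TimeoutError(f"no completion signal in {signal_dir}")
        # Yield to the event loop so other tasks keep running between polls.
        await asyncio.sleep(poll_interval)
```

Sleeping with `asyncio.sleep` between checks keeps the event loop free, so several tasks can be awaited concurrently (for example with `asyncio.gather`).
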
Mirror src/agentrelay/ package structure in test/
- Restructured the flat `test/` directory (29 files at root) into subdirectories matching the `src/agentrelay/` package layout: `agent/`, `agent_comm_protocol/`, `errors/`, `ops/`, `orchestrator/`, `spec/`, `task_graph/`, `task_runner/`, `task_runtime/`, `workstream/`
- Renamed files where the subdirectory makes the prefix redundant (e.g. `test_ops_git.py` → `ops/test_git.py`, `test_protocol_manifest.py` → `agent_comm_protocol/test_manifest.py`)
- Top-level modules (`test_task.py`, `test_environments.py`, `test_workspace.py`) and the cross-cutting `test_docs_examples.py` remain at the `test/` root
- No source edits: all imports are absolute, `conftest.py` stays at root, and pytest discovers subdirectories recursively
- 731 tests pass, 0 pyright errors
Add protocol schemas, builders, and templates
- New `agent_comm_protocol/` package implementing Layers 1-3 of the agent communication protocol defined in `AGENT_COMM_PROTOCOL.md`
- `agent_comm_protocol/manifest.py`: `TaskManifest` frozen dataclass + `build_manifest()` builder + `manifest_to_dict()` serializer (Layer 1: structured task facts). Uses the `AgentRole` enum and `pathlib.Path` for type safety.
- `agent_comm_protocol/policies.py`: `WorkflowPolicies` frozen dataclass + `build_policies()` builder + `policies_to_dict()` serializer (Layer 3: composable workflow config). Introduces `WorkflowAction` and `PrBodySection` enums for type-safe policy actions.
- `agent_comm_protocol/templates.py`: `resolve_instructions()` loads and parameterizes role templates using `string.Template` (Layer 2: work instructions)
- New `spec/` package with a `SpecRepresentation` protocol and `PythonStubSpec` implementation (spec format abstraction)
- Four role templates in `src/agentrelay/templates/`: `spec_writer.md`, `test_writer.md`, `test_reviewer.md`, `implementer.md`
- `TaskPaths` fields changed from `str` to `pathlib.Path` for type-safe path handling
- Builder functions accept explicit parameters (not `TaskGraph`/`TaskRuntime`) to keep the protocol layer decoupled from the graph and runtime layers
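
Layers 1 and 2 can be sketched as follows: a frozen manifest dataclass with a JSON-friendly serializer, and a template resolver built on `string.Template`. The manifest fields, the `<role>.md` file lookup, and the `resolve_instructions()` signature shown here are assumptions; only the use of `string.Template`, the frozen-dataclass-plus-serializer shape, and the role names come from the entry above.

```python
import json
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from string import Template
from typing import Any, Dict


class AgentRole(Enum):
    # Two of the four roles; values assumed to match the template file names.
    TEST_WRITER = "test_writer"
    IMPLEMENTER = "implementer"


@dataclass(frozen=True)
class TaskManifest:
    """Layer 1 sketch: structured task facts (fields are hypothetical)."""

    task_id: str
    role: AgentRole
    signal_dir: Path


def manifest_to_dict(manifest: TaskManifest) -> Dict[str, Any]:
    # Enums and Paths are not JSON-serializable, so flatten them here.
    return {
        "task_id": manifest.task_id,
        "role": manifest.role.value,
        "signal_dir": str(manifest.signal_dir),
    }


def resolve_instructions(template_dir: Path, role: AgentRole, **params: str) -> str:
    """Layer 2 sketch: load <role>.md and fill its $placeholders."""
    text = (template_dir / f"{role.value}.md").read_text()
    # substitute() raises KeyError for any placeholder without a value, so a
    # missing parameter fails loudly instead of leaking "$task_id" to the agent.
    return Template(text).substitute(**params)


manifest = TaskManifest("t1", AgentRole.IMPLEMENTER, Path("/tmp/signals/t1"))
print(json.dumps(manifest_to_dict(manifest)))
```

Choosing `substitute()` over `safe_substitute()` makes missing parameters an error; a literal dollar sign in a template is written as `$$`.
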
Define agent communication protocol
- Added `docs/AGENT_COMM_PROTOCOL.md`, a specification for orchestrator-agent communication that replaces the monolithic instruction-builder approach from PR #90 (closed)
- Five-layer protocol: task manifest (structured facts), work instructions (natural language, template-driven), workflow policies (composable JSON), signaling contract (abstract), and framework adapter (environment-specific)
- Role templates for formulaic tasks (`test_writer`, `implementer`, etc.) avoid duplicating instructions across identical task types
- An abstract workflow step vocabulary (`commit_and_push`, `create_pr`, `run_completion_gate`, etc.) decouples instruction content from framework-specific commands
2026-03-12
Add infrastructure primitives package (ops/)
- New `ops/` package with thin, stateless subprocess and filesystem wrappers for the four infrastructure domains (git, tmux, gh CLI, and signal files):
    - `ops/git.py`: 9 functions covering worktree, branch, fetch/push, and ls-remote
    - `ops/tmux.py`: 5 functions covering window management, keys, capture, and TUI readiness polling
    - `ops/gh.py`: 3 functions covering PR create, merge, and body fetch
    - `ops/signals.py`: 5 functions covering signal dir management, JSON/text I/O, and async polling
- Private implementation detail, not part of the public API; protocol implementations (PRs L/M) will compose these primitives
- Added a shared `test/conftest.py` with git repo fixtures for real-subprocess tests
- 47 new tests (git tests use real temp repos, tmux/gh tests use subprocess mocks, signal tests use the real filesystem)
Restructure packages into core/ + implementations/ layout
- Split `agent/`, `task_runner/`, and `workstream/` into `core/` (ABCs, protocols, state machines) and `implementations/` (concrete environment-specific code) subpackages
- Promoted the `orchestrator.py` module to an `orchestrator/` package
- All external import paths unchanged: package-level `__init__.py` re-exports maintain backward compatibility
- Updated PlantUML diagram to reflect new package structure
- 624 tests pass, 0 pyright errors
Add OrchestratorListener protocol for real-time event observability
- Added an `OrchestratorListener` protocol with a single `on_event(event)` method
- `Orchestrator` accepts an optional `listener` field (default `None`)
- All 6 event emission sites now notify the listener in addition to accumulating events in the result list
- No behavioral change when the listener is omitted; existing tests pass unchanged
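
A minimal sketch of the accumulate-and-notify pattern, assuming a simple `OrchestratorEvent` shape and a hypothetical `EventEmitter` standing in for the orchestrator's six emission sites. Only `OrchestratorListener.on_event(event)` and the optional `listener` defaulting to `None` come from the entry above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class OrchestratorEvent:
    """Hypothetical event shape for this sketch."""

    kind: str
    task_id: Optional[str] = None


class OrchestratorListener(Protocol):
    def on_event(self, event: OrchestratorEvent) -> None: ...


@dataclass
class EventEmitter:
    """Stand-in for the orchestrator's emission sites (hypothetical name)."""

    listener: Optional[OrchestratorListener] = None
    events: List[OrchestratorEvent] = field(default_factory=list)

    def emit(self, event: OrchestratorEvent) -> None:
        # Always accumulate, so the final result is unchanged...
        self.events.append(event)
        # ...and additionally push to the listener when one is attached.
        if self.listener is not None:
            self.listener.on_event(event)


class ConsoleListener:
    """Satisfies OrchestratorListener structurally; no subclassing needed."""

    def on_event(self, event: OrchestratorEvent) -> None:
        print(f"{event.kind}: {event.task_id or '-'}")
```
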
Remove _transition_to_failed escape hatch
- Removed a silent fallback in `TaskRunner._transition_to_failed` that bypassed the transition table when `FAILED` wasn't a legal target from the current status
- The method now delegates to `_transition()` unconditionally (after an idempotency check), making `ALLOWED_TASK_TRANSITIONS` the single authoritative source of legal state edges
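
The table-driven check can be sketched as below. The status set beyond `RUNNING`, `PR_CREATED`, and `FAILED`, the edges in the table, the `TaskStateMachine` class, and the `IllegalTransition` exception are illustrative assumptions, not the real `ALLOWED_TASK_TRANSITIONS`.

```python
from enum import Enum


class TaskStatus(Enum):
    # Hypothetical status set for illustration.
    PENDING = "pending"
    RUNNING = "running"
    PR_CREATED = "pr_created"
    MERGED = "merged"
    FAILED = "failed"


# Illustrative edges only; the real table may differ.
ALLOWED_TASK_TRANSITIONS: dict[TaskStatus, frozenset[TaskStatus]] = {
    TaskStatus.PENDING: frozenset({TaskStatus.RUNNING, TaskStatus.FAILED}),
    TaskStatus.RUNNING: frozenset({TaskStatus.PR_CREATED, TaskStatus.FAILED}),
    TaskStatus.PR_CREATED: frozenset({TaskStatus.MERGED, TaskStatus.FAILED}),
    TaskStatus.MERGED: frozenset(),
    TaskStatus.FAILED: frozenset(),
}


class IllegalTransition(RuntimeError):
    pass


class TaskStateMachine:
    def __init__(self) -> None:
        self.status = TaskStatus.PENDING

    def _transition(self, target: TaskStatus) -> None:
        # The table is the only authority on which edges are legal.
        if target not in ALLOWED_TASK_TRANSITIONS[self.status]:
            raise IllegalTransition(f"{self.status.name} -> {target.name}")
        self.status = target

    def _transition_to_failed(self) -> None:
        if self.status is TaskStatus.FAILED:
            return  # idempotent: already failed
        # No fallback: an illegal edge raises instead of being forced.
        self._transition(TaskStatus.FAILED)
```

With no fallback, asking a merged task to fail raises instead of silently writing `FAILED`, so an illegal edge surfaces as an error rather than as corrupted state.
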
2026-03-11
Remove speculative RemoteWorkspaceRef
- Removed `RemoteWorkspaceRef` (6 optional placeholder fields for a remote execution model that doesn't exist yet)
- Removed the `kind` discriminator from `LocalWorkspaceRef` (unnecessary without a union to dispatch over; `isinstance` suffices)
- The `WorkspaceRef` type alias now points to `LocalWorkspaceRef` alone, with a docstring documenting how to extend it to a union when additional backends are needed
Wire integration errors into TaskRunner and orchestrator
- Renamed `integration_errors/` → `errors/` (shorter; class names are self-descriptive)
- Removed `WorktreeIntegrationError` (unused alias) and the custom `cause` field (use standard `raise ... from` and `__cause__` instead)
- Wired `classify_integration_error()` into `TaskRunner._record_io_failure()`: IO boundary failures are now classified as expected task failures vs. internal errors
- Added `failure_class: Optional[IntegrationFailureClass]` to `TaskRunResult`
- The orchestrator now inspects `failure_class` to distinguish internal adapter errors (fail-fast, no retry) from expected task failures (retry eligible)
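
One way to picture the classification and retry split, assuming an `IntegrationError` base class for IO-boundary failures and hypothetical `IntegrationFailureClass` members. `should_retry()`, its attempt counters, and the classification rule are illustrative, not the real orchestrator logic. The `merge_pr()` helper (also hypothetical) shows standard `raise ... from` chaining, which replaced the custom `cause` field.

```python
from enum import Enum
from typing import Optional


class IntegrationFailureClass(Enum):
    # Member names are assumptions for this sketch.
    EXPECTED_TASK_FAILURE = "expected_task_failure"
    INTERNAL_ERROR = "internal_error"


class IntegrationError(Exception):
    """Hypothetical base class for failures at an IO boundary (git, gh, tmux)."""


def classify_integration_error(exc: BaseException) -> IntegrationFailureClass:
    # Sketch rule: a recognised integration error is an expected task failure;
    # anything else is treated as a bug in the adapter code itself.
    if isinstance(exc, IntegrationError):
        return IntegrationFailureClass.EXPECTED_TASK_FAILURE
    return IntegrationFailureClass.INTERNAL_ERROR


def should_retry(
    failure_class: Optional[IntegrationFailureClass], attempt: int, max_attempts: int
) -> bool:
    # Internal errors fail fast; other failures retry while attempts remain.
    if failure_class is IntegrationFailureClass.INTERNAL_ERROR:
        return False
    return attempt < max_attempts


def merge_pr() -> None:
    try:
        raise OSError("gh pr merge exited with status 1")  # simulated subprocess failure
    except OSError as err:
        # Standard chaining sets __cause__, replacing a custom `cause` field.
        raise IntegrationError("PR merge failed") from err
```
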
Add mutation methods to TaskRuntime and WorkstreamRuntime
- Added `prepare_for_attempt`, `mark_failed`, `reset_for_retry`, and `mark_pending` methods to `TaskRuntime` to encapsulate orchestrator-level state transitions
- Added `activate`, `deactivate`, `mark_failed`, and `mark_merged` methods to `WorkstreamRuntime`
- Replaced all direct field assignments in `orchestrator.py` with method calls (~15 mutation sites), eliminating the dual-writer pattern identified in the architecture review (concern #4, part B)
Add read-only view protocols for task and workstream runtimes
- Added `TaskStateView`, `TaskArtifactsView`, and `TaskRuntimeView` protocols to `task_runtime/runtime.py`: read-only interfaces structurally satisfied by the mutable dataclasses
- Added `WorkstreamStateView`, `WorkstreamArtifactsView`, and `WorkstreamRuntimeView` protocols to `workstream/runtime.py`
- `concerns` fields are exposed as `Sequence[str]` on views (prevents `.append()` through the view while `list[str]` still satisfies it structurally)
- Pure additions with no behavioral changes; the protocols will be used in a follow-up PR to enforce single-writer discipline in the orchestrator
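
A small sketch of why `Sequence[str]` works on the view: a protocol property typed `Sequence[str]` is structurally satisfied by a dataclass attribute of type `list[str]`, yet a type checker rejects mutation through the view. The enforcement is static (pyright); at runtime the view is the same object. Only the `concerns` field is shown, and `summarize()` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol, Sequence


class TaskStateView(Protocol):
    """Read-only view: exposes concerns as a Sequence with no mutators."""

    @property
    def concerns(self) -> Sequence[str]: ...


@dataclass
class TaskState:
    """Mutable record owned by a single writer (only one field shown)."""

    concerns: list[str] = field(default_factory=list)


def summarize(view: TaskStateView) -> str:
    # Reading works through the view; `view.concerns.append(...)` would be
    # rejected by pyright because Sequence[str] has no append().
    return "; ".join(view.concerns) or "no concerns"


state = TaskState()
state.concerns.append("integration branch is behind main")  # writer side
print(summarize(state))  # TaskState satisfies TaskStateView structurally
```
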
Remove Agent from TaskRuntime
- Removed the live `Agent` field from `TaskRuntime`; the agent is now a local variable in `TaskRunner.run()`, not stored on the data record
- Added `agent_address: AgentAddress | None` to `TaskArtifacts` as an immutable audit trail of where the agent ran
- Changed `TaskKickoff.kickoff()` to accept `agent` as an explicit parameter instead of reading it from `runtime.agent`
2026-03-10
Simplify Task.dependencies to store IDs instead of Task objects
- Changed `Task.dependencies` from `tuple[Task, ...]` to `tuple[str, ...]`, eliminating the dual representation where the builder converted IDs to objects and `TaskGraph` extracted IDs back out
- Removed `validate_task_identity_consistency` (no longer needed; string IDs have no object-graph identity to validate)
- Simplified the `TaskGraphBuilder` construction loop (it no longer requires topological order to thread object references)
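
With dependencies stored as IDs, a graph can be indexed and validated without caring about input order. This is a hypothetical sketch: `build_index()` and `dependents_of()` are illustrative helpers, and the real `Task` and `TaskGraph` carry more fields and checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Sketch: only the fields needed here (the real Task has more)."""

    id: str
    dependencies: tuple[str, ...] = ()  # task IDs, not Task objects


def build_index(tasks: list[Task]) -> dict[str, Task]:
    """Index tasks by ID and check every dependency ID exists (hypothetical helper)."""
    index = {task.id: task for task in tasks}
    # Order of `tasks` is irrelevant: there are no object references to thread
    # through, so each ID only has to appear somewhere in the input.
    for task in tasks:
        unknown = [dep for dep in task.dependencies if dep not in index]
        if unknown:
            raise ValueError(f"{task.id} depends on unknown task(s): {unknown}")
    return index


def dependents_of(index: dict[str, Task], task_id: str) -> list[str]:
    # Reverse lookup over plain IDs: which tasks list task_id as a dependency?
    return sorted(t.id for t in index.values() if task_id in t.dependencies)
```
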
Decompose TaskRunnerIO and remove integration_contracts
- Promoted `task_runner.py` to a `task_runner/` package with per-step Protocol interfaces (`TaskPreparer`, `TaskLauncher`, `TaskKickoff`, `TaskCompletionChecker`, `TaskMerger`, `TaskTeardown`) composed into a `TaskRunnerIO` frozen dataclass
- Added `WorkstreamRunner` and `WorkstreamRunnerIO` to the `workstream/` package with per-step Protocols (`WorkstreamPreparer`, `WorkstreamMerger`, `WorkstreamTeardown`) for the workstream lifecycle
- Created a `workspace.py` module for `LocalWorkspaceRef`, `RemoteWorkspaceRef`, and the `WorkspaceRef` type alias
- Added a `concerns: tuple[str, ...]` field to `TaskCompletionSignal`
- Deleted the `integration_contracts/` package; its protocols and data types were absorbed into `task_runner/io.py`, `workstream/io.py`, and `workspace.py`
- Updated PlantUML diagram to reflect new package structure
PlantUML diagram infrastructure and conventions cleanup
- Replaced the Mermaid class diagram with PlantUML source (`docs/diagram.puml`) + rendered SVG (`docs/diagram.svg`), giving full layout control
- Added the `plantuml` conda dependency and a `pixi run diagram` task
- Added a CI freshness check in `docs.yml` to keep the SVG in sync with the `.puml`
- Renamed `task_graph/indexing.py` → `_indexing.py` and `validation.py` → `_validation.py` (underscore prefix for internal-only modules)
- Updated `CLAUDE.md` coding conventions: public API uses classes; private submodules may use free functions with the `<<module>>` diagram stereotype
- Audited diagram connectors: removed redundant `TaskGraphBuilder → Task` and `error_functions → IntegrationError`; added missing `TaskRunnerIO ..> TaskRuntime`
- Simplified `DIAGRAM.md` to a link-only SVG view with a streamlined PR policy
- Fixed mkdocs API docs and nav for renamed modules; added `md_in_html` and `attr_list` markdown extensions
2026-03-08
Fix Pylance/pyright config for test files — PR #63
Updated `[tool.pyright]` in `pyproject.toml`:
- Removed `test/**` from `exclude` so VS Code/Pylance resolves imports in test files (previously excluded files are not analysed interactively)
- Added `extraPaths = ["src"]` for reliable package discovery under the `src/` layout, regardless of editable-install detection
Also updated `.gitignore`.
2026-03-07
Rename archive → prototypes/v01 — PR #58
Renamed src/agentrelay/archive/ → src/agentrelay/prototypes/v01/,
test/archive/ → test/prototypes/v01/, and docs/archive/v1/ →
docs/prototypes/v01/. Updated all Python import paths and documentation
references accordingly. "Prototypes" more accurately describes the role of
this code than "archive".
2026-03-06
Architecture Pivot — PR #51
Promote current architecture to main package, archive prototype, set up mkdocs
The original prototype proved the concept but lacked clean separation between task specifications (immutable) and runtime state (mutable). A complete architectural redesign was created in parallel with cleaner data models, better testability, and design for future extensibility.
This PR completes the transition by:
- Promoting core modules (Task, TaskRuntime, Agent, AgentEnvironment) to root level in src/agentrelay/
- Archiving all prototype modules in src/agentrelay/prototypes/v01/ for reference
- Setting up mkdocs with mkdocstrings for auto-generated API documentation
- Creating comprehensive new documentation structure
Result: Current architecture is now the primary implementation. All new development targets it.
Key files: All core modules at src/agentrelay/. Prototype reference in src/agentrelay/prototypes/v01/.
For historical record of prototype development, see docs/prototypes/v01/HISTORY.md.
Foundation — PRs #48–#50
Build current architecture
Three PRs established the clean data model:
- PR #48 (core types): `Task` (frozen spec), `TaskRuntime` (mutable envelope), `TaskState`, `TaskArtifacts`, addressing types
- PR #49: `Agent` class and `TmuxAgent` concrete implementation
- PR #50: Refine `Agent` as an ABC; introduce the `AgentEnvironment` type alias and `TmuxEnvironment`
Result: 467 comprehensive tests, clean separation of concerns, foundation ready for workflow implementation.
Historical Note
For a detailed history of prototype development (PRs #36–#46), see docs/prototypes/v01/HISTORY.md. The prototype served as a proof-of-concept and informed the design of the current architecture.