Changelog
Chronological log of significant changes to the main codebase. For full details see each PR on GitHub.
2026-03-19
Docs, demo graphs, and E2E testing (PR P)
- Demo graphs: Replaced the outdated graphs/demo.yaml with two tested demos (quick_parallel.yaml, quick_chained.yaml) versioned alongside the code.
- E2E scripts: Three shell scripts in tools/ for running graphs against external target repos:
  - e2e_run.sh: validates the target repo and runs a graph.
  - e2e_reset.sh: resets a graph run in a target repo.
  - e2e_check.sh: preflight check (pixi, agentrelay dependency, Python, gh auth, agent environment, working tree cleanliness, leftover state).
- Conflict detection: run_graph now errors if .workflow/<graph> or .worktrees/<graph> already exists, preventing corrupted state from overlapping runs. --dry-run reports conflicts as warnings.
- Pixi tasks: e2e, e2e-reset, e2e-check.
- Updated docs: GUIDE.md rewritten with a current-architecture-first CLI reference and an E2E section. WORKFLOW.md expanded with practical run/reset cycle examples.
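A minimal sketch of the overlapping-run check, assuming run state lives under .workflow/<graph> and .worktrees/<graph> relative to the repository root. The helper names find_run_conflicts and check_conflicts are hypothetical and do not reflect the actual run_graph internals.

```python
from pathlib import Path


def find_run_conflicts(repo_root: Path, graph_name: str) -> list[Path]:
    """Return leftover state directories that would collide with a new run.

    Hypothetical helper: an existing .workflow/<graph> or .worktrees/<graph>
    indicates a previous (possibly unfinished) run of the same graph.
    """
    candidates = [
        repo_root / ".workflow" / graph_name,
        repo_root / ".worktrees" / graph_name,
    ]
    return [path for path in candidates if path.exists()]


def check_conflicts(repo_root: Path, graph_name: str, dry_run: bool) -> list[str]:
    """Raise on conflicts for a real run; only return warnings for a dry run."""
    conflicts = find_run_conflicts(repo_root, graph_name)
    messages = [f"leftover state from a previous run: {path}" for path in conflicts]
    if messages and not dry_run:
        raise RuntimeError("; ".join(messages))
    return messages
```

Erroring before anything is created means a conflicting run never touches the existing state; the e2e reset script is the intended way to clear it.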
2026-03-18
Composition and CLI entry point (PR N2)
- build_standard_workstream_runner(): New builder in orchestrator/builders.py that wires GitWorkstreamPreparer, GhWorkstreamMerger, and GitWorkstreamTeardown into a StandardWorkstreamRunner. Mirrors the build_standard_runner() pattern.
- run_graph.py: New top-level module providing:
  - run_graph(): async composition function that loads YAML, builds the graph, runners, and orchestrator, and runs to completion.
  - dry_run(): validates graph YAML and prints the execution plan (task order, dependencies, workstreams).
- CLI via python -m agentrelay.run_graph <graph.yaml> with flags: --max-concurrency, --max-task-attempts, --teardown-mode, --tmux-session, --model, --dry-run.
- Operational YAML keys: tmux_session, keep_panes, and model are popped from the raw YAML before graph parsing, allowing TaskGraphBuilder to stay unchanged. CLI flags override YAML values.
- 19 new tests: unit tests for YAML preprocessing, the builder, and dry-run; integration tests verifying full wiring from graph YAML through the orchestrator with test doubles.
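A sketch of the operational-key preprocessing, assuming the YAML has already been parsed into a plain dict (for example with yaml.safe_load) and that an unset CLI flag arrives as None. The function name split_operational_keys is hypothetical.

```python
from typing import Any, Optional

# Keys listed in the changelog as operational, i.e. not part of the graph itself.
OPERATIONAL_KEYS = ("tmux_session", "keep_panes", "model")


def split_operational_keys(
    raw: dict[str, Any],
    cli_overrides: Optional[dict[str, Any]] = None,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate operational settings from the graph definition.

    Returns (graph_data, operational). CLI values that are not None take
    precedence over YAML values. The caller's dict is not mutated.
    """
    graph_data = dict(raw)  # shallow copy, so pop() does not touch `raw`
    operational = {
        key: graph_data.pop(key) for key in OPERATIONAL_KEYS if key in graph_data
    }
    for key, value in (cli_overrides or {}).items():
        if value is not None:  # None means "flag not given on the CLI"
            operational[key] = value
    return graph_data, operational
```

Because the operational keys are removed before parsing, the graph builder only ever sees graph structure, which is what lets it stay unchanged.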
Reset tool (PR O)
- reset_graph.py: New top-level module for resetting a repository to its pre-graph-run state. Reads .workflow/<graph>/run_info.json (written by run_graph), then closes open PRs, resets main and force-pushes (with an ancestry safety check), deletes remote/local branches, and removes the worktree and workflow directories.
- plan_reset() / execute_reset(): separated planning from execution for testability.
- CLI via python -m agentrelay.reset_graph <graph.yaml> with a --yes flag.
- Out-of-order reset detection: skips the main-branch reset if start_head is not an ancestor of the current HEAD, but still performs all other cleanup.
- Idempotent: re-running after a successful reset is safe.
- run_info.json: run_graph now writes start_head + started_at to .workflow/<graph>/run_info.json before the orchestrator runs.
- New ops primitives: git.rev_parse_head, git.merge_base_is_ancestor, git.reset_hard, git.push_force_with_lease, git.ls_remote_branches, gh.pr_list, gh.pr_close.
- 10 new tests: plan/execute against temp git repos with remotes, out-of-order detection, idempotency, PR closing (mocked), run_info.json integration.
2026-03-16
Add graphical popups to overview diagram
- Per-package mini SVGs: tools/generate_overview.py --mode package-svgs extracts each top-level package's D2 block from docs/diagram.d2, renders it as a standalone SVG via d2, and embeds all 13 mini SVGs (base64-encoded) in the overview HTML.
- Click-to-open popups: Clicking a package box opens a centered panel showing that package's D2-rendered class diagram. Clicking an arrow shows both endpoint packages side-by-side with a text list of class-level connections. Click outside or press Escape to close.
- Build pipeline: pixi run diagram now runs the package-svgs mode after rendering the overview SVG, producing docs/pkg-detail/*.d2 and *.svg files.
- 21 new tests covering D2 block extraction, per-package file generation, SVG rendering (mocked subprocess), and popup HTML generation.
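One way to embed a rendered SVG directly in HTML is a base64 data URI, which keeps the overview page self-contained. This is a sketch under that assumption; svg_data_uri and popup_img_tag are hypothetical helpers, and the generator may embed the SVGs differently (for example inside a script block).

```python
import base64
import html


def svg_data_uri(svg_text: str) -> str:
    """Encode SVG markup as a data URI usable in an <img src=...>."""
    encoded = base64.b64encode(svg_text.encode("utf-8")).decode("ascii")
    return f"data:image/svg+xml;base64,{encoded}"


def popup_img_tag(package: str, svg_text: str) -> str:
    """Build the <img> element shown in a package popup."""
    alt = html.escape(f"{package} class diagram")
    return f'<img alt="{alt}" src="{svg_data_uri(svg_text)}">'
```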
Add two-tier diagram system with auto-generated overview
- Overview generator: New tools/generate_overview.py parses docs/diagram.d2, extracts top-level packages and cross-package relationships, deduplicates arrows to one per package pair, and writes docs/diagram-overview.d2 with tooltips listing each package's classes.
- Two rendered views: pixi run diagram now generates both diagram-overview.svg (13 package boxes, ~19 directional arrows) and diagram.svg (full class detail).
- Zero-drift: The overview is derived from the detail diagram, so there is a single source of truth.
- 27 new tests covering parsing, deduplication, edge cases, and end-to-end validation against the real diagram.
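A sketch of the arrow deduplication, assuming class-level edges have already been parsed out of the D2 source as (source, target) pairs of dotted paths such as "orchestrator.Orchestrator" (D2 uses dots to address nested shapes). It assumes direction is preserved, so A→B and B→A stay separate arrows, and that intra-package edges are dropped. package_of and package_edges are hypothetical names.

```python
def package_of(qualified: str) -> str:
    """Top-level package of a dotted D2 path, e.g. 'orchestrator.Orchestrator'."""
    return qualified.split(".", 1)[0]


def package_edges(class_edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse class-level edges to one directional edge per package pair.

    Edges within a single package are dropped; first-seen order is kept.
    """
    seen: set[tuple[str, str]] = set()
    result: list[tuple[str, str]] = []
    for source, target in class_edges:
        pair = (package_of(source), package_of(target))
        if pair[0] == pair[1] or pair in seen:
            continue
        seen.add(pair)
        result.append(pair)
    return result
```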
Migrate design diagram from PlantUML to D2
- Tool change: Replaced PlantUML with D2 using the ELK layout engine for better handling of nested containers and cross-package relationship arrows at scale.
- Faithful translation: All ~80 classes/interfaces/enums across 12+ packages and ~170 relationships preserved in the new docs/diagram.d2 source.
- Dependency swap: Replaced plantuml with d2 in pixi.toml; pixi run diagram now invokes d2 --layout elk.
- Archived the original PlantUML source as docs/diagram.puml.archived for reference.
Extract _OrchestratorRun from Orchestrator
- Separation of concerns: Split Orchestrator into an immutable config holder and _OrchestratorRun, a module-private execution context that owns all mutable state for one run() invocation.
- Readable main loop: execute() is ~20 lines: initialize, dispatch, handle deadlock, await, process completions, build result.
- Simplified helper signatures: Helpers access self._task_runtimes, self._orchestrator.graph, etc. directly instead of threading 3-6 parameters through each call.
- Deduplicated fail-fast: Extracted _fail_fast_cancel() to replace the 3 repeated cancel-mark-clear patterns.
- Pure refactoring: no behavioral changes; all existing tests pass unchanged.
2026-03-15
Wire WorkstreamRunner into Orchestrator
- WorkstreamRunner as Protocol: Promoted WorkstreamRunner to a Protocol (matching the TaskRunner pattern). Renamed the concrete implementation to StandardWorkstreamRunner. The Orchestrator depends only on the protocol.
- Orchestrator lifecycle wiring: Added workstream_runner as a required field on Orchestrator. The orchestrator now calls prepare() before the first task in a workstream, merge() when all tasks reach PR_MERGED, and teardown() after the main loop exits.
- MERGE_READY status: Added WorkstreamStatus.MERGE_READY as an intermediate state between ACTIVE and MERGED. Enables future human-approval gates before the integration branch is merged.
- Derived active-task check: Removed active_task_id from WorkstreamState and the activate()/deactivate() methods from WorkstreamRuntime. The one-task-per-workstream constraint is now enforced by scanning task runtimes for RUNNING/PR_CREATED status.
- fail_fast_on_workstream_error: New OrchestratorConfig option (default True). When a workstream fails during prepare(), it prevents preparing new workstreams but does not cancel in-flight work.
- Updated diagram, exports, and all orchestrator tests.
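A sketch of the derived active-task check: instead of storing active_task_id, "does this workstream have an active task?" is computed from task statuses. TaskStatus, TaskRuntimeStub, and workstream_is_busy are hypothetical stand-ins for the real runtime types.

```python
from dataclasses import dataclass
from enum import Enum


class TaskStatus(Enum):  # hypothetical subset of the real statuses
    PENDING = "pending"
    RUNNING = "running"
    PR_CREATED = "pr_created"
    PR_MERGED = "pr_merged"
    FAILED = "failed"


# Statuses that count as occupying a workstream.
ACTIVE_STATUSES = frozenset({TaskStatus.RUNNING, TaskStatus.PR_CREATED})


@dataclass
class TaskRuntimeStub:  # hypothetical stand-in for TaskRuntime
    task_id: str
    workstream_id: str
    status: TaskStatus


def workstream_is_busy(runtimes: list[TaskRuntimeStub], workstream_id: str) -> bool:
    """Derive 'has an active task' from task statuses instead of storing it."""
    return any(
        rt.workstream_id == workstream_id and rt.status in ACTIVE_STATUSES
        for rt in runtimes
    )
```

Deriving the value removes a second piece of state that could drift out of sync with the task statuses.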
2026-03-14
Flatten WorkstreamRunnerIO into WorkstreamRunner
- Removed the WorkstreamRunnerIO intermediate dataclass. WorkstreamRunner now holds the three per-step protocol fields (_preparer, _merger, _teardown) directly, matching the pattern established by StandardTaskRunner on the task side.
- Deleted test/workstream/implementations/test_workstream_runner_io.py (tested the removed composition class).
- Updated diagram, architecture docs, and exports.
StandardTaskRunner with per-step dispatch
- TaskRunner protocol: Promoted TaskRunnerLike to TaskRunner (protocol in task_runner/core/runner.py). The orchestrator depends only on this protocol.
- StandardTaskRunner: Renamed the concrete TaskRunner class to StandardTaskRunner. Replaced io: TaskRunnerIO with six StepDispatch[T] fields that co-locate step sequencing and implementation dispatch.
- StepDispatch[T]: New generic frozen dataclass in task_runner/core/dispatch.py. Selects per-step protocol implementations based on an (AgentFramework, type[AgentEnvironment]) dispatch key. Supports an entries dict + default fallback.
- Workstream context via TaskState: Added integration_branch and workstream_worktree_path fields to TaskState and TaskStateView. The orchestrator sets these from WorkstreamRuntime.state before dispatch, removing workstream-specific constructor args from WorktreeTaskPreparer and GhTaskMerger.
- Builder: New build_standard_runner() factory in task_runner/implementations/standard_runner_builder.py wires the standard worktree + tmux + Claude Code implementations via StepDispatch defaults.
- TaskRunnerIO: Retained but marked deprecated; no longer the primary composition mechanism.
- Removed TaskRunnerLike from orchestrator/; it is replaced by the imported TaskRunner protocol from task_runner.
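A sketch of the StepDispatch[T] idea: a frozen generic dataclass that maps an (AgentFramework, environment type) key to a step implementation, with a default fallback. The AgentFramework members, the TmuxEnvironment stand-in, and the resolve() method name are assumptions for illustration; the real class may expose lookup differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Generic, Mapping, Optional, TypeVar

T = TypeVar("T")


class AgentFramework(Enum):  # hypothetical members
    CLAUDE_CODE = "claude_code"
    OTHER = "other"


class TmuxEnvironment:  # stand-in for an AgentEnvironment type
    pass


DispatchKey = tuple[AgentFramework, type]


@dataclass(frozen=True)
class StepDispatch(Generic[T]):
    """Pick a step implementation by (framework, environment type), else default."""

    entries: Mapping[DispatchKey, T] = field(default_factory=dict)
    default: Optional[T] = None

    def resolve(self, framework: AgentFramework, environment_type: type) -> T:
        key = (framework, environment_type)
        if key in self.entries:
            return self.entries[key]  # exact match only; subclasses do not match
        if self.default is not None:
            return self.default
        raise LookupError(f"no implementation registered for {key}")
```

With only a default set, every key resolves to the standard implementation, which is how a builder can wire one implementation per step while leaving room for framework-specific overrides.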
2026-03-13
Add workstream-level protocol implementations
- Three concrete classes implementing the WorkstreamRunnerIO per-step protocols, composing ops/ primitives:
  - GitWorkstreamPreparer: creates the git worktree and integration branch, pushes to origin with upstream tracking
  - GhWorkstreamMerger: creates and merges the workstream integration PR via the GitHub CLI, updates the local merge-target ref
  - GitWorkstreamTeardown: removes the worktree, deletes the local and remote integration branch (best-effort cleanup)
- Added git.push_delete_branch() thin wrapper to ops/git.py
Fix task-level workspace model
- WorktreeTaskPreparer no longer creates a per-task worktree. Instead it creates a task branch in the shared workstream worktree via git.branch_create() + git.checkout(). A new workstream_worktree_path config field points to the worktree owned by the workstream preparer.
- WorktreeTaskTeardown no longer removes the worktree (owned by workstream teardown). It still deletes the task branch and captures agent logs.
- Added git.checkout() thin wrapper to ops/git.py.
Add task-level protocol implementations
- Six concrete classes implementing the TaskRunnerIO per-step protocols, composing ops/ primitives and protocol builders from the agent_comm_protocol/ package:
  - WorktreeTaskPreparer: creates a git worktree, writes manifest.json, policies.json, and instructions.md to the signal directory
  - TmuxTaskLauncher: delegates to TmuxAgent.from_config() to launch Claude Code in a tmux pane
  - TmuxTaskKickoff: sends kickoff instructions to the launched agent
  - SignalCompletionChecker: async-polls for .done/.failed signal files and parses them into TaskCompletionSignal
  - GhTaskMerger: merges the task PR via gh, updates the local integration branch ref, writes a .merged signal
  - WorktreeTaskTeardown: captures the agent log, kills the tmux window, removes the worktree and branch (best-effort cleanup)
- Completed TmuxAgent stubs: from_config() creates a tmux window and launches Claude Code; send_kickoff() waits for the TUI to be ready, then sends the prompt
- Added signal_dir: Optional[Path] to TaskState and TaskStateView
- Implementation modules named after their protocol (task_preparer.py implements TaskPreparer, etc.) with docstrings cross-referencing the protocol
- 768 tests pass, 0 pyright errors
Mirror src/agentrelay/ package structure in test/
- Restructured the flat test/ directory (29 files at root) into subdirectories matching the src/agentrelay/ package layout: agent/, agent_comm_protocol/, errors/, ops/, orchestrator/, spec/, task_graph/, task_runner/, task_runtime/, workstream/
- Renamed files where the subdirectory makes the prefix redundant (e.g. test_ops_git.py → ops/test_git.py, test_protocol_manifest.py → agent_comm_protocol/test_manifest.py)
- Top-level modules (test_task.py, test_environments.py, test_workspace.py) and the cross-cutting test_docs_examples.py remain at the test/ root
- No source edits: all imports are absolute; conftest.py stays at root; pytest discovers subdirectories recursively
- 731 tests pass, 0 pyright errors
Add protocol schemas, builders, and templates
- New agent_comm_protocol/ package implementing Layers 1-3 of the agent communication protocol defined in AGENT_COMM_PROTOCOL.md:
  - agent_comm_protocol/manifest.py: TaskManifest frozen dataclass + build_manifest() builder + manifest_to_dict() serializer (Layer 1: structured task facts). Uses the AgentRole enum and pathlib.Path for type safety.
  - agent_comm_protocol/policies.py: WorkflowPolicies frozen dataclass + build_policies() builder + policies_to_dict() serializer (Layer 3: composable workflow config). Introduces WorkflowAction and PrBodySection enums for type-safe policy actions.
  - agent_comm_protocol/templates.py: resolve_instructions() loads and parameterizes role templates using string.Template (Layer 2: work instructions)
- New spec/ package with the SpecRepresentation protocol and PythonStubSpec implementation (spec format abstraction)
- Four role templates in src/agentrelay/templates/: spec_writer.md, test_writer.md, test_reviewer.md, implementer.md
- TaskPaths fields changed from str to pathlib.Path for type-safe path handling
- Builder functions accept explicit parameters (not TaskGraph/TaskRuntime) to keep the protocol layer decoupled from the graph and runtime layers
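A sketch of Layer 2 template parameterization with the standard library's string.Template. The template text and placeholder names below are invented for illustration; the real role templates in src/agentrelay/templates/ may use different placeholders. The sketch uses substitute(), which raises KeyError on a missing parameter, rather than safe_substitute(); which one resolve_instructions uses is not stated here.

```python
from string import Template

# Hypothetical template text; real role templates may differ.
IMPLEMENTER_TEMPLATE = Template(
    "You are the implementer for task $task_id.\n"
    "Read the spec at $spec_path and make the tests in $test_path pass.\n"
)


def render_instructions(template: Template, **params: str) -> str:
    """Fill a role template; substitute() fails loudly on a missing placeholder."""
    return template.substitute(**params)
```

Because every formulaic task of one role shares a template, only the parameters differ between tasks.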
Define agent communication protocol
- Added docs/AGENT_COMM_PROTOCOL.md: specification for orchestrator-agent communication, replacing the monolithic instruction builder approach from PR #90 (closed)
- Five-layer protocol: task manifest (structured facts), work instructions (natural language, template-driven), workflow policies (composable JSON), signaling contract (abstract), and framework adapter (environment-specific)
- Role templates for formulaic tasks (test_writer, implementer, etc.) avoid duplicating instructions across identical task types
- Abstract workflow step vocabulary (commit_and_push, create_pr, run_completion_gate, etc.) decouples instruction content from framework-specific commands
2026-03-12
Add infrastructure primitives package (ops/)
- New ops/ package with thin, stateless subprocess and filesystem wrappers for the four infrastructure domains: git, tmux, gh CLI, and signal files
  - ops/git.py: 9 functions (worktree, branch, fetch/push, ls-remote)
  - ops/tmux.py: 5 functions (window management, keys, capture, TUI readiness poll)
  - ops/gh.py: 3 functions (PR create, merge, body fetch)
  - ops/signals.py: 5 functions (signal dir management, JSON/text I/O, async poll)
- Private implementation detail, not part of the public API; protocol implementations (PRs L/M) will compose these primitives
- Added shared test/conftest.py with git repo fixtures for real-subprocess tests
- 47 new tests (git tests use real temp repos, tmux/gh use subprocess mocks, signals use the real filesystem)
Restructure packages into core/ + implementations/ layout
- Split agent/, task_runner/, and workstream/ into core/ (ABCs, protocols, state machines) and implementations/ (concrete environment-specific code) subpackages
- Promoted the orchestrator.py module to an orchestrator/ package
- All external import paths unchanged: package-level __init__.py re-exports maintain backward compatibility
- Updated PlantUML diagram to reflect the new package structure
- 624 tests pass, 0 pyright errors
Add OrchestratorListener protocol for real-time event observability
- Added the OrchestratorListener protocol with a single on_event(event) method
- Orchestrator accepts an optional listener field (default None)
- All 6 event emission sites now notify the listener in addition to accumulating events in the result list
- No behavioral change when the listener is omitted; existing tests pass unchanged
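A sketch of the listener pattern: every emission appends to the result list and, if a listener is present, also forwards the event. OrchestratorEvent and EventSink are hypothetical; only the on_event(event) method name comes from the changelog. Because OrchestratorListener is a Protocol, any object with a matching on_event method qualifies without inheriting from it.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass(frozen=True)
class OrchestratorEvent:  # hypothetical event shape
    kind: str
    task_id: Optional[str] = None


class OrchestratorListener(Protocol):
    def on_event(self, event: OrchestratorEvent) -> None: ...


@dataclass
class EventSink:
    """Accumulates events for the run result and forwards them to an optional listener."""

    listener: Optional[OrchestratorListener] = None
    events: list[OrchestratorEvent] = field(default_factory=list)

    def emit(self, event: OrchestratorEvent) -> None:
        self.events.append(event)  # unchanged behavior when listener is None
        if self.listener is not None:
            self.listener.on_event(event)  # real-time notification
```

A listener that prints or logs each event gives live progress during a run, while the accumulated list still appears in the final result.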
Remove _transition_to_failed escape hatch
- Removed the silent fallback in TaskRunner._transition_to_failed that bypassed the transition table when FAILED wasn't a legal target from the current status
- The method now delegates to _transition() unconditionally (after an idempotency check), making ALLOWED_TASK_TRANSITIONS the single authoritative source of legal state edges
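A sketch of the resulting behavior: the only exception to the table is the idempotency check, and an illegal FAILED transition now raises instead of silently succeeding. The statuses and table contents are illustrative, not the real ALLOWED_TASK_TRANSITIONS, and TaskStateMachine is a hypothetical stand-in for the runner.

```python
from enum import Enum


class TaskStatus(Enum):  # hypothetical subset of the real statuses
    PENDING = "pending"
    RUNNING = "running"
    PR_CREATED = "pr_created"
    PR_MERGED = "pr_merged"
    FAILED = "failed"


# Single authoritative table of legal edges (contents are illustrative).
ALLOWED_TASK_TRANSITIONS: dict[TaskStatus, frozenset[TaskStatus]] = {
    TaskStatus.PENDING: frozenset({TaskStatus.RUNNING, TaskStatus.FAILED}),
    TaskStatus.RUNNING: frozenset({TaskStatus.PR_CREATED, TaskStatus.FAILED}),
    TaskStatus.PR_CREATED: frozenset({TaskStatus.PR_MERGED, TaskStatus.FAILED}),
    TaskStatus.PR_MERGED: frozenset(),
    TaskStatus.FAILED: frozenset(),
}


class TaskStateMachine:
    def __init__(self) -> None:
        self.status = TaskStatus.PENDING

    def _transition(self, target: TaskStatus) -> None:
        if target not in ALLOWED_TASK_TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status.name} -> {target.name}")
        self.status = target

    def _transition_to_failed(self) -> None:
        if self.status is TaskStatus.FAILED:  # idempotent: already failed
            return
        self._transition(TaskStatus.FAILED)  # no bypass of the table
```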
2026-03-11
Remove speculative RemoteWorkspaceRef
- Removed RemoteWorkspaceRef (6 optional placeholder fields for a remote execution model that doesn't exist yet)
- Removed the kind discriminator from LocalWorkspaceRef (unnecessary without a union to dispatch over; isinstance suffices)
- The WorkspaceRef type alias now points to LocalWorkspaceRef alone, with a docstring documenting how to extend it to a union when additional backends are needed
Wire integration errors into TaskRunner and orchestrator
- Renamed integration_errors/ → errors/ (shorter; class names are self-descriptive)
- Removed WorktreeIntegrationError (unused alias) and the custom cause field (use standard raise ... from and __cause__ instead)
- Wired classify_integration_error() into TaskRunner._record_io_failure(): IO boundary failures are now classified as expected-task-failure vs internal-error
- Added failure_class: Optional[IntegrationFailureClass] to TaskRunResult
- Orchestrator now inspects failure_class to distinguish internal adapter errors (fail-fast, no retry) from expected task failures (retry eligible)
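A sketch of classification plus retry policy. The enum member names, the isinstance-based classification rule, and the choice to treat a missing failure_class as retry-eligible are all assumptions; the real classify_integration_error() may use different rules. The example also shows the standard raise ... from chaining that replaced the custom cause field.

```python
from enum import Enum
from typing import Optional


class IntegrationFailureClass(Enum):  # hypothetical member names
    EXPECTED_TASK_FAILURE = "expected_task_failure"
    INTERNAL_ERROR = "internal_error"


class IntegrationError(Exception):
    """Stand-in base class for expected failures at the IO boundary."""


def classify(exc: BaseException) -> IntegrationFailureClass:
    # Assumed rule: known integration errors are expected; anything else is a bug.
    if isinstance(exc, IntegrationError):
        return IntegrationFailureClass.EXPECTED_TASK_FAILURE
    return IntegrationFailureClass.INTERNAL_ERROR


def should_retry(
    failure_class: Optional[IntegrationFailureClass], attempt: int, max_attempts: int
) -> bool:
    """Internal adapter errors fail fast; other failures retry until attempts run out."""
    if failure_class is IntegrationFailureClass.INTERNAL_ERROR:
        return False
    return attempt < max_attempts
```

Retrying an adapter bug would only repeat the same crash, so failing fast surfaces it immediately.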
Add mutation methods to TaskRuntime and WorkstreamRuntime
- Added prepare_for_attempt, mark_failed, reset_for_retry, and mark_pending methods to TaskRuntime, encapsulating orchestrator-level state transitions
- Added activate, deactivate, mark_failed, and mark_merged methods to WorkstreamRuntime
- Replaced all direct field assignments in orchestrator.py with method calls (~15 mutation sites), eliminating the dual-writer pattern identified in the architecture review (concern #4, part B)
Add read-only view protocols for task and workstream runtimes
- Added TaskStateView, TaskArtifactsView, and TaskRuntimeView protocols to task_runtime/runtime.py: read-only interfaces structurally satisfied by the mutable dataclasses
- Added WorkstreamStateView, WorkstreamArtifactsView, and WorkstreamRuntimeView protocols to workstream/runtime.py
- concerns fields exposed as Sequence[str] on views (prevents .append() through the view while list[str] still satisfies it structurally)
- Pure additions with no behavioral changes; the protocols will be used in a follow-up PR to enforce single-writer discipline in the orchestrator
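A sketch of a read-only view protocol. Declaring concerns as a read-only property typed Sequence[str] is what lets a dataclass with a list[str] field satisfy it. Type checkers such as mypy and pyright treat a plain protocol attribute as settable, which would require an exact type match. The class names mirror the changelog, but the bodies are illustrative and the real views define more members.

```python
from collections.abc import Sequence
from dataclasses import dataclass, field
from typing import Protocol


class TaskArtifactsView(Protocol):
    """Read-only view: consumers see an immutable sequence of concerns."""

    @property
    def concerns(self) -> Sequence[str]: ...


@dataclass
class TaskArtifacts:
    """Mutable record owned by a single writer."""

    concerns: list[str] = field(default_factory=list)


def summarize(artifacts: TaskArtifactsView) -> str:
    # A type checker rejects artifacts.concerns.append(...) here, because
    # Sequence has no append; at runtime the object is still the same list.
    return f"{len(artifacts.concerns)} concern(s)"
```

The protection is static only: the view narrows what readers may do at type-check time without copying the data.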
Remove Agent from TaskRuntime
- Removed the live Agent field from TaskRuntime: the agent is now a local variable in TaskRunner.run(), not stored on the data record
- Added agent_address: AgentAddress | None to TaskArtifacts as an immutable audit trail of where the agent ran
- Changed TaskKickoff.kickoff() to accept agent as an explicit parameter instead of reading it from runtime.agent
2026-03-10
Simplify Task.dependencies to store IDs instead of Task objects
- Changed Task.dependencies from tuple[Task, ...] to tuple[str, ...], eliminating the dual representation where the builder converted IDs to objects and TaskGraph extracted IDs back out
- Removed validate_task_identity_consistency (no longer needed, since string IDs have no object-graph identity to validate)
- Simplified the TaskGraphBuilder construction loop (no longer requires topological order to thread object references)
Decompose TaskRunnerIO and remove integration_contracts
- Promoted task_runner.py to a task_runner/ package with per-step Protocol interfaces (TaskPreparer, TaskLauncher, TaskKickoff, TaskCompletionChecker, TaskMerger, TaskTeardown) composed into a TaskRunnerIO frozen dataclass
- Added WorkstreamRunner and WorkstreamRunnerIO to the workstream/ package with per-step Protocols (WorkstreamPreparer, WorkstreamMerger, WorkstreamTeardown) for the workstream lifecycle
- Created a workspace.py module for LocalWorkspaceRef, RemoteWorkspaceRef, and the WorkspaceRef type alias
- Added a concerns: tuple[str, ...] field to TaskCompletionSignal
- Deleted the integration_contracts/ package; its protocols and data types were absorbed into task_runner/io.py, workstream/io.py, and workspace.py
- Updated PlantUML diagram to reflect the new package structure
PlantUML diagram infrastructure and conventions cleanup
- Replaced the Mermaid class diagram with PlantUML source (docs/diagram.puml) + rendered SVG (docs/diagram.svg), giving full layout control
- Added the plantuml conda dependency and a pixi run diagram task
- Added a CI freshness check in docs.yml to keep the SVG in sync with the .puml source
- Renamed task_graph/indexing.py → _indexing.py and validation.py → _validation.py (underscore prefix for internal-only modules)
- Updated CLAUDE.md coding conventions: public API uses classes; private submodules may use free functions with the <<module>> diagram stereotype
- Audited diagram connectors: removed redundant TaskGraphBuilder → Task and error_functions → IntegrationError; added missing TaskRunnerIO ..> TaskRuntime
- Simplified DIAGRAM.md to a link-only SVG view with a streamlined PR policy
- Fixed mkdocs API docs and nav for renamed modules; added md_in_html and attr_list markdown extensions
2026-03-08
Fix Pylance/pyright config for test files — PR #63
Updated [tool.pyright] in pyproject.toml:
- Removed test/** from exclude so VS Code/Pylance resolves imports in test files (previously excluded files are not analysed interactively)
- Added extraPaths = ["src"] for reliable package discovery under the src/ layout regardless of editable-install detection
Also updated .gitignore.
2026-03-07
Rename archive → prototypes/v01 — PR #58
Renamed src/agentrelay/archive/ → src/agentrelay/prototypes/v01/, test/archive/ → test/prototypes/v01/, and docs/archive/v1/ → docs/prototypes/v01/. Updated all Python import paths and documentation references accordingly. "Prototypes" describes the role of this code more accurately than "archive".
2026-03-06
Architecture Pivot — PR #51
Promote current architecture to main package, archive prototype, set up mkdocs
The original prototype proved the concept but lacked clean separation between task specifications (immutable) and runtime state (mutable). A complete architectural redesign was developed in parallel, with cleaner data models, better testability, and a design aimed at future extensibility.
This PR completes the transition by:
- Promoting core modules (Task, TaskRuntime, Agent, AgentEnvironment) to root level in src/agentrelay/
- Archiving all prototype modules in src/agentrelay/prototypes/v01/ for reference
- Setting up mkdocs with mkdocstrings for auto-generated API documentation
- Creating comprehensive new documentation structure
Result: Current architecture is now the primary implementation. All new development targets it.
Key files: All core modules at src/agentrelay/. Prototype reference in src/agentrelay/prototypes/v01/.
For historical record of prototype development, see docs/prototypes/v01/HISTORY.md.
Foundation — PRs #48–#50
Build current architecture
Three PRs established the clean data model:
- PR #48: Core types, namely Task (frozen spec), TaskRuntime (mutable envelope), TaskState, TaskArtifacts, and addressing types
- PR #49: Agent class and TmuxAgent concrete implementation
- PR #50: Refine Agent as an ABC; introduce the AgentEnvironment type alias and TmuxEnvironment
Result: 467 comprehensive tests, clean separation of concerns, foundation ready for workflow implementation.
Historical Note
For a detailed history of prototype development (PRs #36–#46), see docs/prototypes/v01/HISTORY.md. The prototype served as a proof-of-concept and informed the design of the current architecture.