agent-coordination

Implementation Plan - Role-based Task Delegation for aiagentchat

Final Plan v2.0 Coordination 3 AI Plans + 2 Reviews 2026-02-07
400 New LOC
4/10 Complexity
8/10 Confidence
6-7 Days (Parallel)
21 Tasks
11 Risks

Phase Timeline (4 Phases, 21 Tasks)

Phase 1: Foundation (2 days)
  1.1 Event Contracts + Schemas | Architect, Group A | 3h
  1.2 Configuration Extensions (5 env vars) | Developer, Group A | 2h
  1.3 Matrix Client Extensions (4 methods) | Developer, Group A | 4h
  1.4 CoordinationClient Core Module (~150 LOC) | Developer | 6h
  1.5 Sync Filter + Custom Event Validation (CRITICAL GATE) | Developer | 3h
  1.6 Phase 1 Unit Tests (13+ tests) | QA | 4h
Phase 2: CLI Delegation (2 days)
  2.1 Delegation Methods (delegate, post_status, notify) | Developer, Group B | 6h
  2.2 cchat send Extension | Developer, Group B | 4h
  2.3 Coordination Thread (5th daemon thread) | Developer | 6h
  2.4 Reply Loop Integration (complete/fail status) | Developer | 5h
  2.5 cchat status + Gateway Endpoint | Developer | 3h
  2.6 Offline Detection (configurable heartbeat) | Developer | 2h
Phase 3: Agent-to-Agent (1.5 days)
  3.1 Outbound Delegation Tracker (with restart recovery) | Developer, Group C | 5h
  3.2 Inbound Status Matching | Developer | 6h
  3.3 Cross-Notification | Developer, Group C | 3h
  3.4 Active Delegation Tracking | Developer | 3h
  3.5 Security Review | Security | 3h
Phase 4: Polish (0.5 day)
  4.1 GitLab Reference Parsing (metadata only) | Developer, Group D | 2h
  4.2 Documentation Updates | Developer, Group D | 3h
  4.3 CI/CD Integration | QA | 3h
  4.4 Final Integration Testing | QA | 4h
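
Task 4.1 limits GitLab handling to reference parsing (metadata only). The sketch below shows one way that could look, assuming "metadata only" means references are extracted from message text and no GitLab API is called. It recognizes the standard GitLab shorthand (#123 for issues, !123 for merge requests, optionally prefixed with group/project). The function name and the returned dict shape are hypothetical, not part of the plan.

```python
import re

# Optional "group/project" path, then "#" (issue) or "!" (merge request), then the IID.
# The lookbehind keeps the pattern from matching in the middle of a word or path.
_GITLAB_REF = re.compile(
    r"(?<![\w/])(?P<project>[\w.-]+(?:/[\w.-]+)+)?(?P<sigil>[#!])(?P<iid>\d+)\b"
)

_KINDS = {"#": "issue", "!": "merge_request"}


def parse_gitlab_refs(text):
    """Return reference metadata found in text, in order of appearance."""
    refs = []
    for match in _GITLAB_REF.finditer(text):
        refs.append(
            {
                "kind": _KINDS[match.group("sigil")],
                "project": match.group("project"),  # None for bare #123 / !45
                "iid": int(match.group("iid")),
            }
        )
    return refs
```

Because nothing is fetched, a parser like this can run inside the delegation path without adding latency or credentials. Whether a bare reference should resolve to a default project is left open here.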

Key Architectural Decisions

Custom Events CRITICAL
com.aiagentchat.request/status/role events validated via pre-flight gate (Task 1.5)
Thread Safety HIGH
_outbound_delegations guarded by threading.Lock; coordination thread uses threading.Thread
Restart Recovery HIGH
Tracker reconstructed from room timeline scan on startup
Graceful Shutdown HIGH
coordination_loop uses threading.Event stop signal; joins cleanly
Visit-Rooms Model MEDIUM
Agents visit each other's rooms for delegation; no shared coordination room
Zero New Dependencies MEDIUM
Pure Python stdlib + httpx; no Redis, no new containers
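
The Thread Safety and Graceful Shutdown decisions above can be sketched together with the standard library alone. The names _outbound_delegations and coordination_loop come from the plan. The OutboundTracker class, its methods, the record shape, and the poll_once callback are assumptions made for illustration.

```python
import threading


class OutboundTracker:
    """Lock-guarded map of delegation id -> record (record shape is hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._outbound_delegations = {}

    def add(self, delegation_id, target):
        with self._lock:
            self._outbound_delegations[delegation_id] = {"target": target}

    def resolve(self, delegation_id):
        # Remove and return the record, or None if the id is unknown.
        with self._lock:
            return self._outbound_delegations.pop(delegation_id, None)

    def pending_ids(self):
        with self._lock:
            return sorted(self._outbound_delegations)


def coordination_loop(stop_event, poll_once, interval=0.05):
    # Event.wait doubles as an interruptible sleep, so set() ends the loop promptly.
    while not stop_event.is_set():
        poll_once()
        stop_event.wait(interval)


def run_demo():
    tracker = OutboundTracker()
    tracker.add("d1", "admin")
    seen = []
    polled = threading.Event()
    stop_event = threading.Event()

    def poll_once():
        seen.append(tracker.pending_ids())
        polled.set()

    worker = threading.Thread(
        target=coordination_loop, args=(stop_event, poll_once), daemon=True
    )
    worker.start()
    polled.wait(timeout=2)  # make sure at least one poll has run
    stop_event.set()        # request shutdown
    worker.join(timeout=2)  # join cleanly
    return worker.is_alive(), seen
```

Using stop_event.wait(interval) instead of time.sleep(interval) is what lets the thread join quickly: a sleeping thread would otherwise hold up shutdown for up to one full polling interval.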

Dependency Graph

Phase 1 (Foundation)
  1.1 [Architect] Event Contracts      --+
  1.2 [Dev] Config Extensions          --+-- Group A (parallel)
  1.3 [Dev] Matrix Client Extensions   --+
  1.4 [Dev] CoordinationClient Core    <-- needs Group A
   |-> 1.5 [Dev] Sync + Validation (CRITICAL GATE)
   |-> 1.6 [QA] Phase 1 Unit Tests

Phase 2 (CLI Delegation)
  2.1 [Dev] Delegation Methods         --+-- Group B, needs 1.5
  2.2 [Dev] cchat send                 --+
  2.3 [Dev] Coordination Thread        <-- needs 2.1
   |-> 2.4 [Dev] Reply Loop Integration
   |-> 2.5 [Dev] cchat status
  2.6 [Dev] Offline Detection          <-- needs 2.1, parallel with 2.5

Phase 3 (Agent-to-Agent)
  3.1 [Dev] Outbound Tracker           --+-- Group C, needs 2.4
  3.3 [Dev] Cross-Notification         --+
  3.2 [Dev] Inbound Matching           <-- needs 3.1
   |-> 3.4 [Dev] Active Delegations
   |-> 3.5 [Security] Security Review

Phase 4 (Polish)
  4.1 [Dev] GitLab Ref                 --+-- Group D, needs 3.2
  4.2 [Dev] Documentation              --+
  4.3 [QA] CI/CD                       <-- needs 4.1, 4.2
   |-> 4.4 [QA] Final Integration Tests
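
As a cross-check on the graph, the dependencies can be encoded and grouped into waves of tasks that may run in parallel, using graphlib from the Python standard library (3.9+). The edge list is one reading of the graph, not an authoritative source: each "|->" entry is assumed to depend on the task it hangs from, and each "needs" note is taken literally.

```python
from graphlib import TopologicalSorter

# task -> set of tasks it depends on (assumed reading of the graph above)
DEPENDS_ON = {
    "1.4": {"1.1", "1.2", "1.3"},
    "1.5": {"1.4"},
    "1.6": {"1.4"},
    "2.1": {"1.5"},
    "2.2": {"1.5"},
    "2.3": {"2.1"},
    "2.4": {"2.3"},
    "2.5": {"2.3"},
    "2.6": {"2.1"},
    "3.1": {"2.4"},
    "3.3": {"2.4"},
    "3.2": {"3.1"},
    "3.4": {"3.2"},
    "3.5": {"3.2"},
    "4.1": {"3.2"},
    "4.2": {"3.2"},
    "4.3": {"4.1", "4.2"},
    "4.4": {"4.3"},
}


def parallel_waves(depends_on):
    """Group tasks into waves; every task in a wave has all its dependencies done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


WAVES = parallel_waves(DEPENDS_ON)
```

Under this reading the first wave is Group A (1.1, 1.2, 1.3) and the last is 4.4. The waves show ordering constraints only; they say nothing about staffing or the hour estimates.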

Agent Utilization (~77h total effort)

Role | Phases | Total
PM | - | 0h
Architect | P1 | 3h
Security | P3 | 3h
Developer | P1 (18h), P2 (26h), P3 (17h), P4 (5h) | 60h
QA | P1 (4h), P4 (7h) | 11h

Risk Register (Top 8)

ID | Risk | Prob | Impact | Mitigation
R1 | Custom events don't sync | Low | High | Pre-flight validation gate
R2 | Thread contention on tracker | Medium | High | threading.Lock, single writer
R3 | Race condition in role discovery | Low | Medium | Cache with TTL, retry on miss
R5 | Sync latency (30s worst case) | Medium | Medium | Acceptable for async; ack confirms
R6 | Offline detection false positives | Low | Medium | Configurable HEARTBEAT_TTL
R9 | SDK session exits before delegation completes | Medium | Medium | Outbound tracker bridges sessions
R11 | Tracker state loss on restart | Medium | Medium | Timeline scan recovery on startup
R7 | Agent crash mid-task | Low | Medium | TTL marks stale; timeline scan on restart
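
The mitigation for R11 and R7 (timeline scan recovery on startup) can be sketched as a pure function over already-fetched events. The event types com.aiagentchat.request and com.aiagentchat.status come from the plan. The content fields (task, request_id, state), the terminal state names, and the function name are hypothetical. Events are assumed to be dicts in oldest-first order.

```python
REQUEST_TYPE = "com.aiagentchat.request"
STATUS_TYPE = "com.aiagentchat.status"
TERMINAL_STATES = {"complete", "failed"}  # assumed names for finished delegations


def rebuild_tracker(timeline):
    """Rebuild the set of still-open delegations from a room timeline.

    timeline: iterable of event dicts, oldest first (assumed ordering).
    Returns {request_event_id: {"task": ..., "state": ...}}.
    """
    pending = {}
    for event in timeline:
        content = event.get("content", {})
        if event.get("type") == REQUEST_TYPE:
            # Each request opens a delegation keyed by its own event id.
            pending[event["event_id"]] = {
                "task": content.get("task"),
                "state": "pending",
            }
        elif event.get("type") == STATUS_TYPE:
            request_id = content.get("request_id")
            if request_id not in pending:
                continue  # status for a request outside the scanned window
            if content.get("state") in TERMINAL_STATES:
                del pending[request_id]  # finished; nothing left to track
            else:
                pending[request_id]["state"] = content.get("state")
    return pending
```

Keeping the scan free of network calls makes the recovery path easy to unit test. How far back the scan must reach, and how it paginates, is not specified in the plan and is left out here.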

Peer Review Feedback (Incorporated)

Success Metrics

Metric | Target
Tests | 140 existing + 30 unit + 8 integration
Coverage | ≥70% overall, 85% coordination.py
CLI Delegation | cchat send admin "msg" works
Agent-to-Agent | Delegation + result re-injection
Cross-notification | Notice events posted
Offline Detection | [OFFLINE] returned immediately
Security Review | Passed (Task 3.5)
Zero New Deps | httpx only
CI Pipeline | All tests green
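
The Offline Detection target ([OFFLINE] returned immediately) could be met with a heartbeat check that runs before any send. In this sketch HEARTBEAT_TTL is the configurable value named in the risk register. Reading it from an environment variable, the 90-second default, and all function names are assumptions.

```python
import os
import time


def heartbeat_ttl(default=90.0):
    # Assumption: HEARTBEAT_TTL is supplied as an environment variable, in seconds.
    return float(os.environ.get("HEARTBEAT_TTL", default))


def is_offline(last_heartbeat, now, ttl):
    """An agent is offline if it never sent a heartbeat or the last one is older than ttl."""
    return last_heartbeat is None or (now - last_heartbeat) > ttl


def send_or_offline(target, last_heartbeat, send, now=None, ttl=None):
    """Return an [OFFLINE] marker without calling send() when the target looks offline."""
    now = time.time() if now is None else now
    ttl = heartbeat_ttl() if ttl is None else ttl
    if is_offline(last_heartbeat, now, ttl):
        return f"[OFFLINE] {target}"
    return send(target)
```

A larger TTL reduces false positives (R6) at the cost of slower detection, which is why the plan keeps the value configurable.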