How AI agents and GitLab work together to automate software delivery
This is an automated software development pipeline that takes a GitLab issue (a feature request, bug report, or enhancement) and moves it through an 11-stage process, from triage to deployment, after a stage-0 verify check. AI agents do the heavy lifting at each stage, while humans retain control at critical gates.
Think of it as an assembly line: each station (pipeline stage) has a specialist (AI agent) that performs one job, passes artifacts forward, and updates the tracking board so everyone can see progress.
The system is organized into six layers (numbered 0 through 4, with an integration layer at 1.5), each with a distinct responsibility. Understanding these layers is key to understanding how everything connects.
```text
LAYER 4: VISIBILITY
+---------------------------------------------------------------+
| GitLab Kanban Board                                           |
| Scoped labels: status::, type::, priority::, agent::          |
| 6-column workflow: backlog > ready > in-progress > review >   |
|   blocked > done                                              |
+---------------------------------------------------------------+
        | push-based label updates
        v
LAYER 3: ORCHESTRATION
+---------------------------------------------------------------+
| GitLab CI/CD Pipeline (11 stages)                             |
| Trigger: issue event, manual, or API call                     |
| Resource groups: 1 job per issue at a time                    |
| Group configs: administrators (blocking) / developers (adv.)  |
+---------------------------------------------------------------+
        | structured methodology
        v
LAYER 2: METHODOLOGY
+---------------------------------------------------------------+
| Spec-driven workflow:                                         |
|   clarify > specify > checklist > plan > tasks >              |
|   analyze > implement                                         |
| Templates: spec.md, plan.md, tasks.md, analysis.md, etc.      |
| Artifacts stored: specs/issue-{IID}/                          |
+---------------------------------------------------------------+
        | communicates via
        v
LAYER 1.5: INTEGRATION
+---------------------------------------------------------------+
| glab CLI + GL_TOKEN (Claude <-> GitLab communication)         |
| GitLab API endpoints (label updates, comments, issues)        |
+---------------------------------------------------------------+
        | executed by
        v
LAYER 1: EXECUTION
+---------------------------------------------------------------+
| Claude Code CLI (5 agents)                                    |
| PM | Architect | Developer | Security | QA                    |
+---------------------------------------------------------------+
        | runs on
        v
LAYER 0: INFRASTRUCTURE
+---------------------------------------------------------------+
| GitLab | PostgreSQL | Keycloak | Traefik | Matrix | Grafana   |
+---------------------------------------------------------------+
```
Layer 4 (Visibility): The GitLab Kanban board is the single source of truth for issue status. Scoped labels (status::triage, status::planning, etc.) move cards across columns automatically as the pipeline progresses.
Layer 3 (Orchestration): GitLab CI/CD drives the pipeline. It sequences the 11 stages, enforces manual gates, manages resource groups (so the same issue is never processed by two jobs simultaneously), and loads group-specific security configs.
Layer 2 (Methodology): The spec-driven methodology provides the "process brain." Each issue moves through a fixed sequence of steps (clarify, specify, checklist, plan, tasks, analyze, implement), and quality gates catch problems before any code is written.
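The Spec-Checklist stage (stage 4) is one such quality gate. As a minimal sketch, assuming spec.md uses one markdown heading per checklist item, the six checks (overview, requirements, user stories, acceptance criteria, security, no TBDs) could look like this; the heading names and the check_spec helper are hypothetical, not the project's actual template.

```python
import re

# Hypothetical heading names; the real spec.md template may differ.
REQUIRED_SECTIONS = ["Overview", "Requirements", "User Stories",
                     "Acceptance Criteria", "Security"]
HEADING = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)


def check_spec(markdown: str) -> list[str]:
    """Return checklist failures for a spec; an empty list means it passes."""
    headings = {m.group(1).strip().lower() for m in HEADING.finditer(markdown)}
    # Checks 1-5: each required section has a heading.
    failures = [f"missing section: {name}" for name in REQUIRED_SECTIONS
                if name.lower() not in headings]
    # Check 6: no unresolved TBD placeholders remain.
    tbd_count = len(re.findall(r"\bTBD\b", markdown))
    if tbd_count:
        failures.append(f"{tbd_count} unresolved TBD marker(s)")
    return failures


spec = "## Overview\nAdd CSV export.\n## Requirements\n- R1\n## Security\nTBD\n"
print(check_spec(spec))
# ['missing section: User Stories', 'missing section: Acceptance Criteria', '1 unresolved TBD marker(s)']
```

Returning a list of failures rather than a boolean lets the stage post every problem to the issue in one comment.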
Layer 1 (Execution): Five Claude Code CLI agents do the actual work, each with a specialty: PM triages, Architect writes specs, Developer codes, Security audits, and QA validates. Each agent reads the issue context and produces artifacts.
Each issue travels through these stages in order. Auto stages run immediately; manual stages wait for human approval.
| # | Stage | Agent | Duration | Gate | What Happens |
|---|---|---|---|---|---|
| 0 | Verify | — | ~1 min | Auto | Smoke test: runner alive, tools installed, no AI in runner (architectural constraint) |
| 1 | Triage | PM | ~5 min | Auto | Fetch issue, auto-apply labels (type::feature, status::triage) |
| 2 | Clarification | PM | ~10 min | Auto | Generate clarification questions if issue is vague; post as comment |
| 3 | Specification | Architect | ~15 min | Manual | Generate formal spec (WHAT + WHY). Human must approve before planning starts. |
| 4 | Spec-Checklist | QA | ~5 min | Auto | Validate spec: overview, requirements, user stories, acceptance criteria, security, TBDs |
| 5 | Planning | Architect | ~15 min | Auto | Create implementation plan with tasks, dependencies, files to modify |
| 6 | Task Generation | Developer | ~10 min | Auto | Break plan into 3-5 granular tasks with acceptance criteria |
| 7 | Task Analysis | QA | ~10 min | Auto | Validate task completeness, check dependencies, generate dependency graph |
| 8 | Implementation | Developer | ~30 min | Manual | Create branch, generate code, commit, push. Human triggers this stage. |
| 9 | Security | Security | ~10 min | Auto | Scan for hardcoded secrets, privileged containers, eval(), network exposure |
| 10 | Testing | QA | ~20 min | Auto | Auto-detect project type, run appropriate test suite, generate report |
| 11 | Deployment | Developer | ~15 min | Manual | Record pre-deploy state, execute deploy.sh, verify health, auto-rollback on failure |
The three manual gates (Specification, Implementation, Deployment) ensure humans review AI-generated specs before planning starts, approve implementation before code is written, and confirm deployment before changes go live. This keeps humans in the decision loop for high-impact actions.
Each agent is a specialized Claude Code CLI invocation with a focused prompt. Agents don't share state directly; instead, they communicate through GitLab artifacts and issue comments.
PM: Triages new issues, applies labels, and generates clarification questions. As the first agent to touch every issue, it decides whether the requirements are clear enough to proceed.
Architect: Writes specifications (WHAT + WHY), creates implementation plans (HOW), and defines data models and API contracts, translating requirements into technical blueprints.
Developer: Breaks plans into tasks, writes code, creates feature branches, and opens merge requests. This is the "hands on keyboard" agent.
Security: Scans for hardcoded credentials, privileged containers, secrets-path violations, eval() usage, and network exposure. Findings are blocking for administrators and advisory for developers.
QA: Validates spec completeness (a 6-point checklist), analyzes task dependencies, auto-detects test frameworks, runs test suites, and generates reports.
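Task Analysis (stage 7) checks that every dependency points at a real task and that the dependency graph has no cycles. A minimal sketch using Python's standard graphlib module, assuming tasks are identified by short IDs such as T1; analyze_tasks is a hypothetical helper, not the QA agent's actual code.

```python
from graphlib import CycleError, TopologicalSorter


def analyze_tasks(deps: dict[str, set[str]]) -> list[str]:
    """Validate task dependencies and return one valid execution order.

    deps maps each task ID to the set of task IDs it depends on.
    Raises ValueError for unknown task references or dependency cycles.
    """
    referenced = set().union(*deps.values())
    unknown = referenced - set(deps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    try:
        # static_order() yields every task after all of its dependencies.
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # exc.args[1] holds the tasks that form the cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


tasks = {"T1": set(), "T2": {"T1"}, "T3": {"T1"}, "T4": {"T2", "T3"}}
order = analyze_tasks(tasks)
print(order[0], order[-1])  # T1 T4
```

A topological order doubles as a valid execution order for the Developer agent; T2 and T3 may come out in either order because neither depends on the other.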
Every job uses resource_group: issue-$ISSUE_IID. This GitLab feature ensures that, for a given issue, only one pipeline job runs at a time: issue #5's specification can't start while its clarification is still running, but issues #5 and #12 can run their stages in parallel.
Each issue has its own conveyor belt (resource group). Items on different belts move independently, but items on the same belt go one at a time. This prevents race conditions where two stages try to modify the same issue simultaneously.
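The per-issue locking can be imitated locally with one lock per resource-group key. This threading sketch is only an analogy for GitLab's server-side behavior; ResourceGroups and simulate are hypothetical names. It records the peak number of concurrent jobs per issue, which should never exceed one.

```python
import threading
import time
from collections import defaultdict


class ResourceGroups:
    """Hands out one lock per resource-group key, e.g. "issue-5"."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: dict[str, threading.Lock] = {}

    def lock_for(self, issue_iid: int) -> threading.Lock:
        with self._guard:  # lock creation must itself be thread-safe
            return self._locks.setdefault(f"issue-{issue_iid}", threading.Lock())


def simulate(jobs: list[tuple[int, str]]) -> dict[int, int]:
    """Run (issue_iid, stage) jobs on threads; return peak concurrency per issue."""
    groups = ResourceGroups()
    stats_lock = threading.Lock()
    active: dict[int, int] = defaultdict(int)
    peak: dict[int, int] = defaultdict(int)

    def run_job(issue_iid: int, stage: str) -> None:
        with groups.lock_for(issue_iid):  # same issue: one job at a time
            with stats_lock:
                active[issue_iid] += 1
                peak[issue_iid] = max(peak[issue_iid], active[issue_iid])
            time.sleep(0.01)  # stand-in for the real stage work
            with stats_lock:
                active[issue_iid] -= 1

    threads = [threading.Thread(target=run_job, args=job) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(peak)


jobs = [(5, "clarification"), (5, "specification"), (12, "triage"), (12, "clarification")]
print(sorted(simulate(jobs).items()))  # [(5, 1), (12, 1)]
```

GitLab can also order waiting jobs through the resource group's process mode (for example oldest_first); a plain threading.Lock makes no ordering guarantee, so this sketch only demonstrates mutual exclusion.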
Each stage produces markdown artifacts stored in specs/issue-{IID}/. The specification stage writes spec.md. The planning stage reads spec.md and writes plan.md. The task stage reads plan.md and writes tasks.md. Each stage builds on the previous one's output.
The runner image (Python 3.12-slim with git, curl, and jq) deliberately excludes the Claude Code CLI. Runners execute deterministic jobs only; if AI is needed, it's invoked as a separate service, not embedded in the runner. The verify stage enforces this and fails if Claude is found installed.
The same pipeline behaves differently depending on which group triggers it. Administrators get blocking security scans (the pipeline fails on any finding). Developers get advisory scans (the pipeline continues with warnings). This is controlled by conditional include:rules in the CI config.
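A minimal sketch of the blocking/advisory split, assuming the scanner reduces to line-by-line pattern matching. The regular expressions in RULES are illustrative only, not the project's actual rules, and the group argument stands in for whatever condition the include:rules entries evaluate.

```python
import re

# Illustrative rules only; the project's real scanner rules are not documented here.
RULES = {
    "hardcoded secret": re.compile(
        r"(?i)(password|secret|api_key|token)\s*[:=]\s*['\"][^'\"]+['\"]"),
    "privileged container": re.compile(r"privileged:\s*true"),
    "eval() usage": re.compile(r"\beval\("),
    "network exposure": re.compile(r"0\.0\.0\.0"),
}


def scan(text: str) -> list[str]:
    """Return one finding per (line, rule) match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append(f"line {lineno}: {rule}")
    return findings


def exit_code(findings: list[str], group: str) -> int:
    """Blocking for administrators (non-zero fails the job); advisory otherwise."""
    blocking = group == "administrators"
    for finding in findings:
        print(("ERROR: " if blocking else "WARNING: ") + finding)
    return 1 if findings and blocking else 0


sample = 'db_password = "hunter2"\nservice:\n  privileged: true\n'
findings = scan(sample)
print(findings)  # ['line 1: hardcoded secret', 'line 3: privileged container']
```

The same findings produce exit code 1 for administrators and 0 for developers, so one scanner serves both groups and only the exit policy changes.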
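glab CLI:

```shell
glab ci run --branch main --variables "ISSUE_IID:42"
```

Trigger API (the request also needs a pipeline trigger token):

```shell
curl -X POST ".../trigger/pipeline" -F "token=<trigger-token>" -F "ref=main" -F "variables[ISSUE_IID]=42"
```

GitLab web UI: Run pipeline, then add the variable ISSUE_IID set to the issue number.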
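For scripted triggers from Python, the same API call can be built with the standard library. This sketch builds the request without sending it; the host, project ID, and token values are placeholders, and build_trigger_request is a hypothetical helper. It follows GitLab's documented pipeline-trigger endpoint shape (POST /projects/:id/trigger/pipeline with token, ref, and variables[KEY] fields).

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trigger_request(api_base: str, project_id: int, token: str,
                          ref: str, issue_iid: int) -> Request:
    """Build (without sending) a POST to GitLab's pipeline-trigger endpoint."""
    url = f"{api_base}/projects/{project_id}/trigger/pipeline"
    form = {
        "token": token,                          # pipeline trigger token
        "ref": ref,                              # branch to run on
        "variables[ISSUE_IID]": str(issue_iid),  # same variable as the CLI example
    }
    return Request(url, data=urlencode(form).encode(), method="POST")


# Placeholder host, project ID, and token.
req = build_trigger_request("https://gitlab.example.com/api/v4", 1,
                            "TRIGGER_TOKEN", "main", 42)
print(req.get_method(), req.full_url)
# POST https://gitlab.example.com/api/v4/projects/1/trigger/pipeline
```

Passing req to urllib.request.urlopen would actually start the pipeline; the form body is sent URL-encoded rather than as the multipart form that curl -F produces.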
Every stage except verify requires the ISSUE_IID variable to know which issue to process. Without it, only the verify stage runs; it is designed as a standalone connectivity test.
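Each ISSUE_IID-dependent stage could guard its entry point as in this sketch; require_issue_iid is a hypothetical helper. Raising SystemExit with a message makes the job exit with status 1 and print a readable line to the job log.

```python
import os
from collections.abc import Mapping


def require_issue_iid(env: Mapping[str, str] = os.environ) -> int:
    """Return ISSUE_IID as a positive int, or stop the job with a clear message."""
    raw = env.get("ISSUE_IID", "").strip()
    if not raw:
        raise SystemExit("ISSUE_IID is not set; only the verify stage runs without it")
    try:
        iid = int(raw)
    except ValueError:
        iid = 0  # treat non-numeric input the same as an invalid number
    if iid <= 0:
        raise SystemExit(f"ISSUE_IID must be a positive integer, got {raw!r}")
    return iid


print(require_issue_iid({"ISSUE_IID": "42"}))  # 42
```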
Detailed walkthrough of each stage: what happens, what artifacts are produced, how labels are updated, and where the handoffs occur.
How the Kanban board drives the workflow. Label taxonomy, the "Read comment" checkbox protocol, DECISION/FEATURE/INFO card types, and the AI self-approval pattern.
How multiple issues run concurrently, resource group isolation, group-specific runners, multi-tenant dashboard, and scaling characteristics.
| Resource | URL / Path |
|---|---|
| GitLab Project | gitlab.ai-servicers.com/administrators/cicd |
| Kanban Board | Board #2 |
| Grafana Dashboard | grafana.ai-servicers.com/d/cicd/ |
| ADR Documentation | nginx.ai-servicers.com/cicd/decisions/ |
| Matrix Notifications | #cicd-notifications:ai-servicers.com |
| Runner Image | registry.gitlab.ai-servicers.com/administrators/cicd/cicd-runner:latest |
| CI Config | .gitlab-ci.yml (1543 lines, 11 stages) |