How multiple issues run concurrently, resource group isolation, group-specific runners, and scaling
GitLab's resource groups are the foundation of how this pipeline handles concurrent work. Every job in the pipeline declares:
resource_group: issue-$ISSUE_IID
This one line gives two guarantees: different issues run in parallel, and jobs for the same issue run one at a time.
Issue #5 and Issue #12 have different resource groups (issue-5 vs issue-12), so their stages can run simultaneously, even on the same runner. Ten issues can be in flight at once, provided the runner has the capacity.
Issue #5's specification can't start while its clarification is still running. Jobs for the same issue queue up and execute one at a time. This prevents race conditions on labels and artifacts.
    Time →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→
    Issue #5:  [triage] [clarify] [WAIT: spec] [spec] [checklist] ...
                                  ^ human gate
    Issue #12: [triage] [clarify] [spec-ok] [checklist] [plan] [tasks] ...
               (well-specified, no clarification needed)
    Issue #23: [triage] [needs-clarif] [waiting for human...]
    Issue #41: [triage] [clarify] [spec] [checklist] [plan] [tasks] [analysis] ...

    Legend: green = automatic, blue = manual-triggered, amber = waiting
A well-specified bug fix might fly through all 11 stages in 90 minutes. A vague feature request might sit in clarification for days waiting for human answers. The pipeline handles both cases naturally - fast issues don't wait for slow ones.
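The locking behaviour described above can be illustrated with a small simulation. This is only a sketch of the semantics, not of GitLab's scheduler: the `schedule` function, the job tuples, and the durations are invented for illustration, and runner capacity is assumed to be unlimited.

```python
from collections import defaultdict


def schedule(jobs):
    """Assign start and end times to jobs.

    jobs: list of (resource_group, stage, duration_minutes) in submission order.
    Jobs that share a resource group run one at a time; jobs in different
    groups may overlap. Runner capacity is treated as unlimited (assumption).
    """
    next_free = defaultdict(int)  # resource_group -> time its lock is released
    timeline = []
    for group, stage, duration in jobs:
        start = next_free[group]          # wait for the group's previous job
        end = start + duration
        next_free[group] = end
        timeline.append((group, stage, start, end))
    return timeline


if __name__ == "__main__":
    # Illustrative durations only; real stage times vary per issue.
    jobs = [
        ("issue-5", "triage", 5),
        ("issue-12", "triage", 5),
        ("issue-5", "clarify", 10),
        ("issue-12", "spec", 15),
    ]
    for group, stage, start, end in schedule(jobs):
        print(f"{group:9} {stage:8} {start:3} -> {end:3}")
```

In this model both triage jobs start at minute 0, while the clarify job for issue-5 waits until that issue's triage has finished.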
| Dimension | Current | Bottleneck | Scaling Path |
|---|---|---|---|
| Issues in parallel | Multiple (limited by runner) | Runner CPU/memory | Add more runners |
| Same issue concurrency | 1 (resource group lock) | By design | N/A - intentional constraint |
| Runner instances | 1 | Single runner architecture | Register additional runners with same tags |
| API rate limits | ~60 req/min (GitLab) | GitLab API throttle | Request caching, batch operations |
| AI token limits | ~100k tokens/min | Anthropic API throttle | Stagger requests, use smaller models for triage |
| Artifact storage | 30-day retention | Disk space | S3 backend, shorter retention for non-critical |
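One way to stay under an API throttle such as the ~60 req/min figure in the table is a client-side sliding-window limiter. The sketch below is an assumption about how staggering could be done, not the pipeline's actual mechanism; the class name and its parameters are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of calls inside the current window

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Register that a call was just made."""
        self.calls.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, make the request, then call `record()`.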
The pipeline supports multiple GitLab groups (tenants) with shared tooling but isolated execution environments. Currently: administrators and developers.
    SHARED: Pipeline Library (administrators/cicd)
    +---------------------------------------------------------------+
    | .gitlab-ci.yml          (main pipeline definition)           |
    | .gitlab-ci/templates/   (base job templates)                 |
    | .gitlab-ci/groups/      (group-specific overrides)           |
    | scripts/                (utility scripts)                    |
    | specs/templates/        (spec templates)                     |
    +---------------------------------------------------------------+
            |                                   |
            | include:rules                     | include:rules
            | if: namespace == administrators   | if: namespace == developers
            v                                   v
    ADMIN CONTEXT                       DEV CONTEXT
    +----------------------------+      +----------------------------+
    | Security: BLOCKING         |      | Security: ADVISORY         |
    | Staging: Manual approval   |      | Staging: Auto-deploy       |
    | Runner: protected + locked |      | Runner: not protected      |
    | Token: GITLAB_TOKEN_ADMIN  |      | Token: GITLAB_TOKEN_DEV    |
    | Use case: Infrastructure   |      | Use case: Applications     |
    +----------------------------+      +----------------------------+
            |                                   |
            v                                   v
    +----------------------------+      +----------------------------+
    | admin-runner               |      | dev-runner                 |
    | Tags: [administrators]     |      | Tags: [developers]         |
    | Volumes:                   |      | Volumes:                   |
    |   /opt/shared/skills:ro    |      |   /opt/shared/skills:ro    |
    +----------------------------+      +----------------------------+
When a project includes the shared pipeline, GitLab evaluates include:rules at pipeline creation time. The variable $CI_PROJECT_NAMESPACE determines which group config is loaded.
    # In the shared .gitlab-ci.yml:
    include:
      - local: '.gitlab-ci/groups/administrators.yml'
        rules:
          - if: $CI_GROUP == "administrators"
          - if: $CI_GROUP == null  # Default if not specified
      - local: '.gitlab-ci/groups/developers.yml'
        rules:
          - if: $CI_GROUP == "developers"
You might expect include: 'groups/${CI_GROUP}.yml' to work. That is not a safe assumption: GitLab expands only a restricted set of variables in include paths, and variables defined in the pipeline's own YAML are not among them. The include:rules pattern with fixed paths avoids the problem.
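To make the selection logic concrete, the rules can be emulated in a few lines. This is a sketch of the matching only, not of GitLab's rule evaluator; `INCLUDE_RULES` and `resolve_includes` are hypothetical names, and `None` stands in for an unset variable.

```python
# Each entry: (included file, CI_GROUP values that activate it).
# None models "$CI_GROUP == null", i.e. the variable is not set.
INCLUDE_RULES = [
    (".gitlab-ci/groups/administrators.yml", ["administrators", None]),
    (".gitlab-ci/groups/developers.yml", ["developers"]),
]


def resolve_includes(ci_group):
    """Return the group files whose rules match the given CI_GROUP value."""
    return [path for path, accepted in INCLUDE_RULES if ci_group in accepted]


if __name__ == "__main__":
    for value in ("administrators", "developers", None, "contractors"):
        print(repr(value), "->", resolve_includes(value))
```

An unknown value such as "contractors" matches no rule here, so no group file would be included until a matching entry is added.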
| Aspect | Administrators | Developers |
|---|---|---|
| Security scans | Blocking (pipeline fails) | Advisory (pipeline continues) |
| Staging deploy | Manual approval required | Auto-deploy on success |
| Production deploy | Manual (protected branch) | Manual (protected branch) |
| Runner protection | Protected + Locked | Not protected |
| API token scope | Full access (admin) | Read-only (scoped) |
| Typical projects | Infrastructure, core services | Applications, user-facing |
| Error tolerance | Zero tolerance for security | Warnings acceptable |
Runners are isolated using three mechanisms working together. Tags alone are insufficient - a misconfigured runner could pick up jobs from the wrong group.
Jobs specify tags: [administrators] or tags: [developers]. Runners only pick up jobs with matching tags. Both runners have run_untagged = false.
Admin runner is protected: true - it only picks up jobs from protected branches and tags (here, main). Dev runner is unprotected and can run on any branch.
Admin runner is locked: true - it can't be shared to other projects outside its group. Dev runner is unlocked for flexibility.
    admin-runner                        dev-runner
    +------------------------------+    +------------------------------+
    | name = "admin-runner"        |    | name = "dev-runner"          |
    | executor = "docker"          |    | executor = "docker"          |
    | run_untagged = false         |    | run_untagged = false         |
    | locked = true                |    | locked = false               |
    | protected = true             |    | protected = false            |
    | tags = [administrators]      |    | tags = [developers]          |
    |                              |    |                              |
    | [docker]                     |    | [docker]                     |
    | privileged = false           |    | privileged = false           |
    | volumes = [                  |    | volumes = [                  |
    |   docker.sock,               |    |   docker.sock,               |
    |   /opt/shared/skills:ro      |    |   /opt/shared/skills:ro      |
    | ]                            |    | ]                            |
    +------------------------------+    +------------------------------+
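The three layers can be expressed as a single eligibility check. The model below is a simplification for illustration, not GitLab's job-matching code: the `Runner` and `Job` classes, the `can_run` function, and the project path `developers/app` are all hypothetical, and an unlocked runner is treated as available to any project.

```python
from dataclasses import dataclass, field


@dataclass
class Runner:
    name: str
    tags: set
    run_untagged: bool = False
    protected: bool = False
    locked: bool = False
    projects: set = field(default_factory=set)  # projects the runner is assigned to


@dataclass
class Job:
    project: str
    tags: set
    ref_protected: bool  # True when the job runs on a protected branch or tag


def can_run(runner, job):
    """Return True only if the job passes all three isolation layers."""
    # Layer 1: tags. Every job tag must be present on the runner.
    if not job.tags:
        if not runner.run_untagged:
            return False
    elif not job.tags <= runner.tags:
        return False
    # Layer 2: a protected runner only serves protected refs.
    if runner.protected and not job.ref_protected:
        return False
    # Layer 3: a locked runner only serves projects it is already assigned to.
    if runner.locked and job.project not in runner.projects:
        return False
    return True
```

A job that copies the administrators tag into a developers project still fails the third check, which is the case where tags alone would not be enough.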
Both runners and both users need access to the same Claude Code skills. A symlink that points at a host path does not resolve inside a Docker container, because that path does not exist in the container's filesystem. The solution: a shared directory mounted as a read-only volume.
flowchart TB
subgraph "Host Filesystem"
A["/opt/shared/claude-skills/"]
end
subgraph "Docker Runners"
B["admin-runner container"]
C["dev-runner container"]
end
subgraph "User Home Dirs"
D["/home/administrator/.claude/skills/"]
E["/home/websurfinmurf/.claude/skills/"]
end
A -->|"Volume mount :ro"| B
A -->|"Volume mount :ro"| C
A -.->|"Symlink"| D
A -.->|"Symlink"| E
B --> F["/opt/skills/ inside container"]
C --> G["/opt/skills/ inside container"]
style A fill:#10b981,color:#fff
style B fill:#ef4444,color:#fff
style C fill:#3b82f6,color:#fff
Runners access skills via Docker volume mount (/opt/shared/claude-skills:/opt/skills:ro). Users access the same skills via filesystem symlinks from their ~/.claude/skills/ directory. Updates to /opt/shared/claude-skills/ propagate to everyone.
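The two access paths can be sketched as follows. This is an illustrative setup script under assumptions, not the one used on the host: `link_skills` and `volume_arg` are hypothetical helpers, and the demo runs against a temporary directory with a made-up user name instead of the real home directories.

```python
import os
import tempfile


def link_skills(shared_dir, home_dir):
    """Point <home_dir>/.claude/skills at shared_dir via a symlink."""
    claude_dir = os.path.join(home_dir, ".claude")
    os.makedirs(claude_dir, exist_ok=True)
    link = os.path.join(claude_dir, "skills")
    if os.path.islink(link):
        os.remove(link)  # replace a stale link
    os.symlink(shared_dir, link)
    return link


def volume_arg(shared_dir, target="/opt/skills"):
    """Docker-style read-only volume spec for the runner containers."""
    return f"{shared_dir}:{target}:ro"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        shared = os.path.join(root, "shared", "claude-skills")
        os.makedirs(shared)
        link = link_skills(shared, os.path.join(root, "home", "alice"))
        # A file added to the shared directory is visible through the symlink.
        open(os.path.join(shared, "example.md"), "w").close()
        print(os.listdir(link))
        print(volume_arg("/opt/shared/claude-skills"))
```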
Each group has its own GitLab access token and CI variables. A compromised developer token cannot access administrator projects.
| Secret | Scope | Storage |
|---|---|---|
| GL_TOKEN | Per-group | GitLab CI variable (masked, protected) |
| GITLAB_TOKEN_ADMIN | Administrators only | Dashboard env var |
| GITLAB_TOKEN_DEV | Developers only | Dashboard env var |
| MATRIX_BOT_TOKEN | Shared (notifications) | GitLab CI variable (masked) |
| MATRIX_ROOM_ID | Shared (notifications) | GitLab CI variable |
The admin token has access to administrators/* projects only. The dev token has access to developers/* projects only. Even if a developer's runner is compromised, it cannot access infrastructure project secrets or code.
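On the dashboard side, per-group scoping could look like the lookup below. It is a sketch under assumptions: the environment variable names come from the table above, but `TOKEN_VARS`, `token_for`, and the idea of deriving the group from the first path segment are illustrative choices, not the dashboard's confirmed implementation.

```python
TOKEN_VARS = {
    "administrators": "GITLAB_TOKEN_ADMIN",
    "developers": "GITLAB_TOKEN_DEV",
}


def token_for(project_path, env):
    """Return the token for the project's top-level group, or raise.

    project_path: e.g. "administrators/cicd"
    env: a mapping such as os.environ
    """
    group = project_path.split("/", 1)[0]
    var = TOKEN_VARS.get(group)
    if var is None:
        raise PermissionError(f"no token configured for group {group!r}")
    token = env.get(var)
    if not token:
        raise PermissionError(f"{var} is not set")
    return token
```

Failing closed for unknown groups means a request for a project outside the two namespaces never falls back to another group's token.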
The multi-tenant architecture is designed to scale. Adding a new group (e.g., contractors) requires these steps:
| # | Step | What To Do |
|---|---|---|
| 1 | GitLab | Create contractors group, add members |
| 2 | Keycloak | Create matching group, configure group mapper in JWT |
| 3 | Pipeline | Add .gitlab-ci/groups/contractors.yml with group-specific settings |
| 4 | Runner | Register new runner with contractors tag |
| 5 | Labels | Run scripts/replicate-labels.sh administrators contractors |
| 6 | Dashboard | Add contractors to allowed list + add GITLAB_TOKEN_CONTRACTORS |
| 7 | Secrets | Create group-specific GitLab access token |
The dashboard only needs the allowed array updated in extractGroup(). The pipeline only needs a new group YAML file. Everything else is configuration.
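The source does not show extractGroup() itself, so the following is only a guess at its shape, written in Python for consistency with the other examples. It assumes the JWT has already been verified and decoded into a dict, that Keycloak may emit group paths with a leading slash, and that realm_access.roles is consulted as a fallback when the groups claim is missing.

```python
ALLOWED = ["administrators", "developers"]  # onboarding a group means adding it here


def extract_group(claims, allowed=ALLOWED):
    """Return the first allowed group found in decoded JWT claims, else None."""
    groups = claims.get("groups") or []
    # Fallback source when the groups claim is absent (assumed behaviour).
    roles = (claims.get("realm_access") or {}).get("roles") or []
    for name in list(groups) + list(roles):
        name = name.lstrip("/")  # Keycloak group paths can look like "/developers"
        if name in allowed:
            return name
    return None


if __name__ == "__main__":
    print(extract_group({"groups": ["/developers"]}))
    print(extract_group({"realm_access": {"roles": ["administrators"]}}))
    print(extract_group({"groups": ["/contractors"]}))
    print(extract_group({"groups": ["/contractors"]}, ALLOWED + ["contractors"]))
```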
Pipeline health is monitored through three channels that work together to provide complete visibility.
Grafana: visual metrics covering pipeline duration, success rate, stage times, and failure trends. URL: grafana.ai-servicers.com/d/cicd/
Loki: centralized logging. All container logs are auto-discovered by Promtail, shipped to Loki, and queryable in Grafana. No per-service configuration needed.
Matrix: real-time alerts to #cicd-notifications. Color-coded: green (success), red (failure), orange (manual gate). Bot: @cicd-bot:ai-servicers.com
    Pipeline Jobs   → Container Logs   → Promtail (auto-discovery) → Loki → Grafana
                                                                               |
    Pipeline Events → notify-matrix.sh → Matrix Bot → #cicd-notifications      |
                                                                               v
                                                                       Unified Dashboard
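The color-coded alerts can be sketched as a message body for a Matrix m.room.message event. The real notifier is the shell script notify-matrix.sh, whose contents are not shown, so this Python version is a stand-in: the hex color values and the `build_message` helper are assumptions, and the sketch stops at building the payload without sending it.

```python
import html

# Hex values are illustrative choices for green, red, and orange.
COLORS = {"success": "#10b981", "failure": "#ef4444", "manual": "#f59e0b"}


def build_message(status, text):
    """Build the content of an m.room.message event with a colored status tag."""
    color = COLORS[status]
    label = status.upper()
    return {
        "msgtype": "m.text",
        "body": f"[{label}] {text}",  # plain-text fallback for simple clients
        "format": "org.matrix.custom.html",
        "formatted_body": (
            f'<font color="{color}"><b>{label}</b></font> {html.escape(text)}'
        ),
    }


if __name__ == "__main__":
    print(build_message("manual", "Issue #5 is waiting at the spec gate"))
```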
Major design decisions are tracked as ADRs using log4brains and published to nginx. Current Phase 4 decisions:
| ID | Decision | Status |
|---|---|---|
| T4.1 | Parallel vs Sequential CI Jobs | Proposed |
| T4.2 | SAST Tool Selection | Proposed |
| T4.3 | Test Environment Strategy | Proposed |
| T4.4 | Deployment Rollout Strategy | Proposed |
| T4.5 | Rollback Triggers | Proposed |
| T4.6a | Pilot Scope Selection | Proposed |
| T4.6b | Pilot Success Metrics | Proposed |
ADRs auto-publish to nginx.ai-servicers.com/cicd/decisions/ when changes are pushed to the docs/adr/ directory. They can also be managed as DECISION cards on the GitLab board.
| Risk | Impact | Mitigation |
|---|---|---|
| JWT missing groups claim | Dashboard access fails | Keycloak group mapper + fallback to realm_access.roles |
| Runner picks wrong jobs | Cross-group security breach | Tags + protected refs + locked runners (3 layers) |
| Label drift between groups | Inconsistent boards | Periodic replicate-labels.sh or shared template |
| Shared skills break | Both groups blocked | Git version control; test before merge to /opt/shared |
| Token leak | Cross-group access | Per-group tokens; 90-day rotation; masked CI variables |
| Deployment failure | Service downtime | Pre-deploy snapshot + automatic rollback + health checks |
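Label drift is easy to detect before it shows up on the boards. The check below is a hypothetical companion to replicate-labels.sh rather than part of it: it assumes each group's labels have already been fetched into a mapping of label name to color, and the label names in the demo are made up.

```python
def label_drift(source, target):
    """Compare two {label name: color} mappings.

    Returns labels missing from target, labels only in target,
    and labels present in both with different colors.
    """
    shared = set(source) & set(target)
    return {
        "missing": sorted(set(source) - set(target)),
        "extra": sorted(set(target) - set(source)),
        "changed": sorted(n for n in shared if source[n] != target[n]),
    }


if __name__ == "__main__":
    admins = {"needs-clarification": "#f59e0b", "spec-ok": "#10b981"}
    devs = {"needs-clarification": "#ff0000", "wontfix": "#999999"}
    print(label_drift(admins, devs))
```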