Parallel Workloads & Multi-Tenant

How multiple issues run concurrently, resource group isolation, group-specific runners, and scaling

Resource Groups: The Parallelism Model

GitLab's resource groups are the foundation of how this pipeline handles concurrent work. Every job in the pipeline declares:

resource_group: issue-$ISSUE_IID

This one line creates a powerful isolation model.

What resource groups do

Different issues = parallel

Issue #5 and Issue #12 have different resource groups (issue-5 vs issue-12), so their stages can run simultaneously, even on the same runner. Ten issues can be in flight at once, provided the runner has capacity.

Same issue = sequential

Issue #5's specification can't start while its clarification is still running. Jobs for the same issue queue up and execute one at a time. This prevents race conditions on labels and artifacts.
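
The queuing rule can be illustrated outside GitLab with a small simulation. This is only a sketch of the behaviour described above, not GitLab's scheduler: the `schedule` function and the job durations are made up for illustration, and runner capacity is assumed to be unlimited so that the per-issue lock is the only constraint.

```python
from collections import defaultdict


def schedule(jobs):
    """Return (resource_group, stage, start, end) for each job.

    jobs: list of (issue_iid, stage, duration) in submission order.
    Assumption: unlimited runner capacity, so the only constraint
    modelled is the per-issue resource group lock.
    """
    free_at = defaultdict(int)  # resource group -> time its lock is released
    timeline = []
    for issue_iid, stage, duration in jobs:
        group = f"issue-{issue_iid}"
        start = free_at[group]  # wait for the previous job of the same issue
        end = start + duration
        free_at[group] = end
        timeline.append((group, stage, start, end))
    return timeline


# Hypothetical durations, in arbitrary time units.
jobs = [(5, "triage", 2), (12, "triage", 2), (5, "clarify", 3), (12, "spec", 4)]
for row in schedule(jobs):
    print(row)
```

Both triage jobs start at time 0 because they hold different locks, while issue-5's clarify job starts at time 2, only after issue-5's triage has released the lock.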

Visual: How parallel execution works

Time →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→

Issue #5:  [triage] [clarify] [WAIT: spec] [spec] [checklist] ...
                                      ^
                                    human gate

Issue #12: [triage] [clarify] [spec-ok] [checklist] [plan] [tasks] ...
           (well-specified, no clarification needed)

Issue #23: [triage] [needs-clarif] [waiting for human...]

Issue #41: [triage] [clarify] [spec] [checklist] [plan] [tasks] [analysis] ...

green = automatic    blue = manual-triggered    amber = waiting

Key insight: issues move at different speeds

A well-specified bug fix might fly through all 11 stages in 90 minutes. A vague feature request might sit in clarification for days waiting for human answers. The pipeline handles both cases naturally - fast issues don't wait for slow ones.

Scaling Characteristics

Dimension	Current	Bottleneck	Scaling Path
Issues in parallel	Multiple (limited by runner)	Runner CPU/memory	Add more runners
Same issue concurrency	1 (resource group lock)	By design	N/A - intentional constraint
Runner instances	1	Single runner architecture	Register additional runners with same tags
API rate limits	~60 req/min (GitLab)	GitLab API throttle	Request caching, batch operations
AI token limits	~100k tokens/min	Anthropic API throttle	Stagger requests, use smaller models for triage
Artifact storage	30-day retention	Disk space	S3 backend, shorter retention for non-critical
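
The "stagger requests" scaling path in the table can be sketched as even pacing. The helper below is hypothetical (it is not part of the pipeline's scripts) and takes the table's approximate 60 requests per minute limit as a parameter rather than a verified quota.

```python
def staggered_offsets(n_requests, limit_per_minute):
    """Return a start offset in seconds for each request.

    Requests are spaced evenly so that no more than limit_per_minute
    of them start within any 60-second window.
    """
    interval = 60.0 / limit_per_minute
    return [i * interval for i in range(n_requests)]


# Three API calls under a ~60 req/min limit start one second apart.
print(staggered_offsets(3, 60))
```

A caller would sleep until each offset before issuing the corresponding request. Even pacing is the simplest choice; a token bucket would allow short bursts at the cost of more state.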

Multi-Tenant Architecture

The pipeline supports multiple GitLab groups (tenants) with shared tooling but isolated execution environments. Currently: administrators and developers.

SHARED: Pipeline Library (administrators/cicd)
+---------------------------------------------------------------+
|  .gitlab-ci.yml              (main pipeline definition)       |
|  .gitlab-ci/templates/       (base job templates)             |
|  .gitlab-ci/groups/          (group-specific overrides)       |
|  scripts/                    (utility scripts)                |
|  specs/templates/            (spec templates)                 |
+---------------------------------------------------------------+
        |                                    |
        | include:rules                      | include:rules
        | if: namespace == administrators    | if: namespace == developers
        v                                    v
ADMIN CONTEXT                           DEV CONTEXT
+----------------------------+   +----------------------------+
| Security: BLOCKING         |   | Security: ADVISORY         |
| Staging: Manual approval   |   | Staging: Auto-deploy       |
| Runner: protected + locked |   | Runner: not protected      |
| Token: GITLAB_TOKEN_ADMIN  |   | Token: GITLAB_TOKEN_DEV    |
| Use case: Infrastructure   |   | Use case: Applications     |
+----------------------------+   +----------------------------+
        |                                    |
        v                                    v
+----------------------------+   +----------------------------+
| admin-runner               |   | dev-runner                 |
| Tags: [administrators]     |   | Tags: [developers]         |
| Volumes:                   |   | Volumes:                   |
|   /opt/shared/skills:ro    |   |   /opt/shared/skills:ro    |
+----------------------------+   +----------------------------+

How Group Selection Works

When a project includes the shared pipeline, GitLab evaluates include:rules at pipeline creation time. The group name determines which group config is loaded: the rules below test the $CI_GROUP variable, and an unset value falls back to the administrators config.

# In the shared .gitlab-ci.yml:
include:
  - local: '.gitlab-ci/groups/administrators.yml'
    rules:
      - if: $CI_GROUP == "administrators"
      - if: $CI_GROUP == null    # Default if not specified

  - local: '.gitlab-ci/groups/developers.yml'
    rules:
      - if: $CI_GROUP == "developers"

Why not variable interpolation?

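You might expect include: 'groups/${CI_GROUP}.yml' to work. It is not reliable: GitLab expands only a restricted set of variables in include paths (variables defined in the file's own variables section or in jobs are not available, for example), so the result depends on where CI_GROUP is defined. The include:rules pattern with fixed paths works regardless and keeps the set of loadable files explicit.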
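
The selection logic of the include:rules block can be mirrored in a few lines. This is a sketch of the decision only, written in Python for illustration; `select_group_config` is a hypothetical name, and the file paths are the ones listed in the shared pipeline above.

```python
GROUP_CONFIGS = {
    "administrators": ".gitlab-ci/groups/administrators.yml",
    "developers": ".gitlab-ci/groups/developers.yml",
}
DEFAULT_GROUP = "administrators"  # mirrors the `$CI_GROUP == null` rule


def select_group_config(ci_group=None):
    """Return the group file whose rule matches, or None if no rule matches."""
    group = DEFAULT_GROUP if ci_group is None else ci_group
    # A fixed lookup table plays the role of the fixed include paths:
    # an unknown group loads nothing instead of an arbitrary file.
    return GROUP_CONFIGS.get(group)


print(select_group_config())              # unset variable: administrators config
print(select_group_config("developers"))  # developers config
print(select_group_config("contractors")) # None until a rule is added
```

The last call shows why a new group needs its own rule: with fixed paths, an unrecognised value matches nothing.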

Group Configuration Comparison

Aspect	Administrators	Developers
Security scans	Blocking (pipeline fails)	Advisory (pipeline continues)
Staging deploy	Manual approval required	Auto-deploy on success
Production deploy	Manual (protected branch)	Manual (protected branch)
Runner protection	Protected + Locked	Not protected
API token scope	Full access (admin)	Read-only (scoped)
Typical projects	Infrastructure, core services	Applications, user-facing
Error tolerance	Zero tolerance for security	Warnings acceptable

Runner Isolation

Runners are isolated using three mechanisms working together. Tags alone are insufficient - a misconfigured runner could pick up jobs from the wrong group.

1. Tags

Jobs specify tags: [administrators] or tags: [developers]. A runner only picks up a job when it has every tag the job lists. Both runners have run_untagged = false, so untagged jobs are never picked up.

2. Protected Status

Admin runner is protected: true - it only runs jobs from protected branches or tags (here, main). Dev runner is unprotected and can run jobs from any branch.

3. Locked Status

Admin runner is locked: true - it can't be assigned to other projects. Dev runner is unlocked for flexibility.
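
How the three layers combine can be sketched as a simplified model. This is an illustration under stated assumptions, not GitLab's implementation: the `Runner`, `Job`, `assign`, and `can_pick` names are hypothetical, and the project path developers/app is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Runner:
    tags: set
    protected: bool
    locked: bool
    run_untagged: bool = False
    projects: set = field(default_factory=set)  # projects the runner serves


@dataclass
class Job:
    project: str
    tags: set
    ref_protected: bool  # True when the job runs on a protected branch or tag


def assign(runner, project):
    """Layer 3: a locked runner refuses assignment to further projects."""
    if runner.locked:
        raise PermissionError("runner is locked")
    runner.projects.add(project)


def can_pick(runner, job):
    """Return True only if every isolation layer allows the job."""
    if job.project not in runner.projects:
        return False
    if not job.tags and not runner.run_untagged:
        return False
    if not job.tags <= runner.tags:  # Layer 1: runner must hold every job tag
        return False
    if runner.protected and not job.ref_protected:  # Layer 2
        return False
    return True


admin = Runner(tags={"administrators"}, protected=True, locked=True,
               projects={"administrators/cicd"})

print(can_pick(admin, Job("administrators/cicd", {"administrators"}, True)))   # True
print(can_pick(admin, Job("administrators/cicd", {"administrators"}, False)))  # False
print(can_pick(admin, Job("developers/app", {"developers"}, True)))            # False
```

The second call fails only the protected-ref check and the third fails before tags are even compared, which is the point of layering: a mistake in one setting is still caught by another.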

Runner Configuration Comparison

admin-runner                          dev-runner
+------------------------------+   +------------------------------+
| name = "admin-runner"        |   | name = "dev-runner"          |
| executor = "docker"          |   | executor = "docker"          |
| run_untagged = false         |   | run_untagged = false         |
| locked = true                |   | locked = false               |
| protected = true             |   | protected = false            |
| tags = [administrators]      |   | tags = [developers]          |
|                              |   |                              |
| [docker]                     |   | [docker]                     |
|   privileged = false         |   |   privileged = false         |
|   volumes = [                |   |   volumes = [                |
|     docker.sock,             |   |     docker.sock,             |
|     /opt/shared/skills:ro    |   |     /opt/shared/skills:ro    |
|   ]                          |   |   ]                          |
+------------------------------+   +------------------------------+

Shared Skills Architecture

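Both runners and both users need access to the same Claude Code skills. A symlink that points at a host path does not resolve inside a Docker container unless that path is also mounted, so the runners cannot simply reuse the users' symlinks. The solution: a shared directory mounted as a read-only volume.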

flowchart TB
    subgraph "Host Filesystem"
        A["/opt/shared/claude-skills/"]
    end

    subgraph "Docker Runners"
        B["admin-runner container"]
        C["dev-runner container"]
    end

    subgraph "User Home Dirs"
        D["/home/administrator/.claude/skills/"]
        E["/home/websurfinmurf/.claude/skills/"]
    end

    A -->|"Volume mount :ro"| B
    A -->|"Volume mount :ro"| C
    A -.->|"Symlink"| D
    A -.->|"Symlink"| E

    B --> F["/opt/skills/ inside container"]
    C --> G["/opt/skills/ inside container"]

    style A fill:#10b981,color:#fff
    style B fill:#ef4444,color:#fff
    style C fill:#3b82f6,color:#fff

Two access patterns, one source of truth

Runners access skills via Docker volume mount (/opt/shared/claude-skills:/opt/skills:ro). Users access the same skills via filesystem symlinks from their ~/.claude/skills/ directory. Updates to /opt/shared/claude-skills/ propagate to everyone.
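
The user-side half of this arrangement can be sketched with a temporary directory standing in for the host filesystem. The `link_skills` helper and the file name example-skill.md are hypothetical; the runner-side half is the volume mount shown above and is not reproduced here.

```python
import tempfile
from pathlib import Path


def link_skills(shared_dir, home_dir):
    """Point <home>/.claude/skills at the shared skills directory."""
    link = Path(home_dir) / ".claude" / "skills"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace a stale link; a real directory is left alone
    link.symlink_to(shared_dir, target_is_directory=True)
    return link


with tempfile.TemporaryDirectory() as root:
    shared = Path(root) / "opt" / "shared" / "claude-skills"
    shared.mkdir(parents=True)
    link = link_skills(shared, Path(root) / "home" / "administrator")

    # Updating the shared copy is immediately visible through the symlink.
    (shared / "example-skill.md").write_text("v2")
    print((link / "example-skill.md").read_text())  # prints "v2"
```

Because the link and the mount both resolve to the same directory, there is no copy to keep in sync.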

Secrets Isolation

Each group has its own GitLab access token and CI variables. A compromised developer token cannot access administrator projects.

Secret	Scope	Storage
`GL_TOKEN`	Per-group	GitLab CI variable (masked, protected)
`GITLAB_TOKEN_ADMIN`	Administrators only	Dashboard env var
`GITLAB_TOKEN_DEV`	Developers only	Dashboard env var
`MATRIX_BOT_TOKEN`	Shared (notifications)	GitLab CI variable (masked)
`MATRIX_ROOM_ID`	Shared (notifications)	GitLab CI variable

No cross-group access

The admin token has access to administrators/* projects only. The dev token has access to developers/* projects only. Even if a developer's runner is compromised, it cannot access infrastructure project secrets or code.
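
On the dashboard side, per-group token selection might look like the sketch below. The environment variable names come from the table above; the `token_for` function, the way the group is derived from the project path, and the example paths are assumptions made for illustration.

```python
TOKEN_ENV_VARS = {
    "administrators": "GITLAB_TOKEN_ADMIN",
    "developers": "GITLAB_TOKEN_DEV",
}


def token_for(project_path, env):
    """Return the token for the group that owns project_path.

    env is a mapping such as os.environ. Unknown groups are rejected
    instead of falling back to another group's token.
    """
    group = project_path.split("/", 1)[0]
    var = TOKEN_ENV_VARS.get(group)
    if var is None:
        raise PermissionError(f"no token configured for group {group!r}")
    token = env.get(var)
    if not token:
        raise RuntimeError(f"{var} is not set")
    return token


# Placeholder values; real tokens would come from the environment.
env = {"GITLAB_TOKEN_ADMIN": "admin-placeholder", "GITLAB_TOKEN_DEV": "dev-placeholder"}
print(token_for("developers/app", env))
```

Rejecting unknown groups outright, with no default token, is what keeps a lookup mistake from turning into cross-group access.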

Adding a New Group

The multi-tenant architecture is designed to scale. Adding a new group (e.g., contractors) requires these steps:

#	Step	What To Do
1	GitLab	Create `contractors` group, add members
2	Keycloak	Create matching group, configure group mapper in JWT
3	Pipeline	Add `.gitlab-ci/groups/contractors.yml` with group-specific settings
4	Runner	Register new runner with `contractors` tag
5	Labels	Run `scripts/replicate-labels.sh administrators contractors`
6	Dashboard	Add `contractors` to allowed list + add `GITLAB_TOKEN_CONTRACTORS`
7	Secrets	Create group-specific GitLab access token

Minimal code changes

The dashboard only needs the allowed array in extractGroup() extended. The pipeline only needs a new group YAML file and a matching include:rules entry. Everything else is configuration.
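
The dashboard's extractGroup() is not shown in this document. The following is a Python analogue of what such a function might do, written as a sketch: the claim layout (a groups claim, with realm_access.roles as a fallback) follows the risk table at the end of this page, and stripping a leading slash assumes Keycloak is emitting full group paths.

```python
ALLOWED_GROUPS = ["administrators", "developers"]


def extract_group(claims, allowed=ALLOWED_GROUPS):
    """Return the first allowed group found in decoded JWT claims, else None."""
    # Prefer the groups claim; fall back to realm roles when it is missing.
    candidates = claims.get("groups") or claims.get("realm_access", {}).get("roles", [])
    for name in candidates:
        name = name.lstrip("/")  # assumption: group paths look like "/developers"
        if name in allowed:
            return name
    return None


print(extract_group({"groups": ["/developers"]}))
print(extract_group({"realm_access": {"roles": ["offline_access", "administrators"]}}))

# Adding a group is a one-line change to the allowed list.
print(extract_group({"groups": ["/contractors"]}, allowed=ALLOWED_GROUPS + ["contractors"]))
```

Until contractors is added to the allowed list, the same claims return None and the dashboard would deny access.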

Observability Stack

Pipeline health is monitored through three channels that work together to provide complete visibility.

Grafana Dashboard

Visual metrics: pipeline duration, success rate, stage times, failure trends. URL: grafana.ai-servicers.com/d/cicd/

Promtail + Loki

Centralized logging. All container logs are auto-discovered by Promtail, shipped to Loki, and queryable in Grafana. No per-service configuration needed.

Matrix Notifications

Real-time alerts to #cicd-notifications. Color-coded: green (success), red (failure), orange (manual gate). Bot: @cicd-bot:ai-servicers.com
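
The colour coding can be expressed as a message payload builder. This sketch is not the contents of notify-matrix.sh: the `build_message` function and the status keys are hypothetical, the hex values are simply the standard green, red, and orange, and the payload follows the Matrix m.room.message format with an HTML formatted body.

```python
from html import escape

STATUS_COLORS = {
    "success": "#008000",  # green
    "failure": "#ff0000",  # red
    "manual": "#ffa500",   # orange (manual gate)
}


def build_message(status, text):
    """Build an m.room.message content dict with a coloured HTML body."""
    color = STATUS_COLORS[status]
    return {
        "msgtype": "m.text",
        "body": f"[{status}] {text}",  # plain-text fallback
        "format": "org.matrix.custom.html",
        "formatted_body": f'<font color="{color}">{escape(text)}</font>',
    }


print(build_message("failure", "Issue #5 failed at <spec>"))
```

The text is HTML-escaped before it is embedded, so issue titles containing angle brackets cannot break the formatted body.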

Pipeline Jobs  →  Container Logs  →  Promtail (auto-discovery)  →  Loki  →  Grafana
                                                                                     |
Pipeline Events  →  notify-matrix.sh  →  Matrix Bot  →  #cicd-notifications     |
                                                                                     v
                                                                           Unified Dashboard

Architecture Decision Records

Major design decisions are tracked as ADRs using log4brains and published to nginx. Current Phase 4 decisions:

ID	Decision	Status
T4.1	Parallel vs Sequential CI Jobs	Proposed
T4.2	SAST Tool Selection	Proposed
T4.3	Test Environment Strategy	Proposed
T4.4	Deployment Rollout Strategy	Proposed
T4.5	Rollback Triggers	Proposed
T4.6a	Pilot Scope Selection	Proposed
T4.6b	Pilot Success Metrics	Proposed

ADRs auto-publish to nginx.ai-servicers.com/cicd/decisions/ when changes are pushed to the docs/adr/ directory. They can also be managed as DECISION cards on the GitLab board.

Risk Mitigations

Risk	Impact	Mitigation
JWT missing groups claim	Dashboard access fails	Keycloak group mapper + fallback to `realm_access.roles`
Runner picks wrong jobs	Cross-group security breach	Tags + protected refs + locked runners (3 layers)
Label drift between groups	Inconsistent boards	Periodic `replicate-labels.sh` or shared template
Shared skills break	Both groups blocked	Git version control; test before merge to /opt/shared
Token leak	Cross-group access	Per-group tokens; 90-day rotation; masked CI variables
Deployment failure	Service downtime	Pre-deploy snapshot + automatic rollback + health checks