GitLab CI/CD for Infrastructure
- What CI/CD does for infrastructure
- .gitlab-ci.yml structure
- Stages and jobs
- A real infra pipeline
- Running Ansible in a CI job
- Passing secrets — CI/CD variables
- SSH keys in CI
- Ansible Vault in CI
- Reading a failed pipeline job
- Artifacts
- rules — control when jobs run
- Manual deployment approval
- Runners — what runs your jobs
What CI/CD does for infrastructure
Without CI/CD, deploying infrastructure changes looks like: push to GitLab → someone manually SSH in → run ansible-playbook → hope they used the right inventory and flags.
With CI/CD, pushing a branch triggers automatic jobs that:
- Lint the Ansible code (`ansible-lint`)
- Run a syntax check (`ansible-playbook --syntax-check`)
- Optionally run a dry-run in staging
- Deploy to production when the MR is merged
This means every change is automatically validated before it can merge, and deployment is consistent and repeatable — not dependent on who runs the command or what flags they remember.
.gitlab-ci.yml structure
The pipeline is defined in a file called `.gitlab-ci.yml` at the root of the repo.
```yaml
# .gitlab-ci.yml — basic structure
stages:        # define the order of stages
  - lint
  - check
  - deploy

variables:     # repo-level variable defaults
  ANSIBLE_FORCE_COLOR: "1"

lint-ansible:                    # job name
  stage: lint                    # which stage this belongs to
  image: cytopia/ansible:latest  # Docker image to run in
  script:
    - ansible-lint site.yml
  only:
    - merge_requests
    - main
```
Key concepts:
- stages — jobs in the same stage run in parallel; stages run sequentially
- image — the Docker image the job runs inside
- script — the commands to run (one per line)
- rules — when this job should run (on what branches or events)

`only:` and `except:` are the legacy syntax; prefer `rules:` in all new pipelines — it is more expressive, and GitLab has soft-deprecated `only`/`except`.
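To make the migration concrete, here is the `lint-ansible` job from above rewritten from the legacy `only:` block to the equivalent `rules:` form (a sketch; both versions trigger on merge requests and on main):

```yaml
lint-ansible:
  stage: lint
  image: cytopia/ansible:latest
  script:
    - ansible-lint site.yml
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```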
Stages and jobs
Multiple jobs can exist within the same stage and run in parallel. Stages are sequential — if a job in the lint stage fails, the check and deploy stages do not run.
```yaml
stages:
  - lint
  - check
  - deploy

lint-ansible:
  stage: lint
  script:
    - ansible-lint site.yml

lint-yaml:
  stage: lint     # runs in parallel with lint-ansible
  script:
    - yamllint .

syntax-check:
  stage: check    # only runs if lint stage passed
  script:
    - ansible-playbook site.yml --syntax-check
```
A real infra pipeline
```yaml
---
stages:
  - lint
  - syntax
  - check
  - deploy

variables:
  ANSIBLE_FORCE_COLOR: "1"
  ANSIBLE_STDOUT_CALLBACK: yaml
  # Do NOT set ANSIBLE_HOST_KEY_CHECKING=False — use SSH_KNOWN_HOSTS below instead

# Shared config applied to all jobs
.ansible-base: &ansible-base
  image: willhallonline/ansible:2.14-ubuntu-22.04
  before_script:
    - eval "$(ssh-agent -s)"
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - echo "$SSH_KNOWN_HOSTS" > ~/.ssh/known_hosts

lint:
  <<: *ansible-base
  stage: lint
  script:
    - ansible-lint
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

syntax-check:
  <<: *ansible-base
  stage: syntax
  script:
    - ansible-playbook site.yml --syntax-check -i inventories/production/hosts.ini

dry-run:
  <<: *ansible-base
  stage: check
  script:
    - ansible-playbook site.yml --check --diff -i inventories/production/hosts.ini
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

deploy-production:
  <<: *ansible-base
  stage: deploy
  script:
    - echo "$ANSIBLE_VAULT_PASS" > .vault_pass
    - ansible-playbook site.yml -i inventories/production/hosts.ini --vault-password-file .vault_pass
    - rm -f .vault_pass
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual  # requires a human to click "play" in GitLab
```
Running Ansible in a CI job
The CI runner needs to be able to SSH to your infrastructure hosts. The standard approach:
- Generate a dedicated deploy SSH key pair (no passphrase): `ssh-keygen -t ed25519 -f deploy_key -N ""`
- Add the public key to `~/.ssh/authorized_keys` on every managed host (or via FreeIPA)
- Store the private key as a CI/CD variable named `SSH_PRIVATE_KEY`
- Load it in the job's `before_script`
```yaml
before_script:
  - eval "$(ssh-agent -s)"
  - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
  - mkdir -p ~/.ssh
  - chmod 700 ~/.ssh
```
`tr -d '\r'` removes Windows-style carriage returns that sometimes appear when copying a key through the browser. Without it, `ssh-add` may reject the key.
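A minimal demonstration of the problem (the file names and key content are illustrative, not a real key):

```shell
# Simulate a key pasted with Windows line endings, then strip the CRs
printf 'FAKE-KEY-LINE\r\n' > key_crlf   # 15 bytes: 13 chars + \r + \n
tr -d '\r' < key_crlf > key_clean       # drop every carriage return
wc -c < key_crlf                        # 15
wc -c < key_clean                       # 14 — the \r is gone
```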
Passing secrets — CI/CD variables
In GitLab: Settings → CI/CD → Variables. Add these for an Ansible infra project:
- `SSH_PRIVATE_KEY` — the deploy private key (masked; not protected unless you want branch restrictions)
- `SSH_KNOWN_HOSTS` — output of `ssh-keyscan -H host1 host2`, to avoid host key prompts
- `ANSIBLE_VAULT_PASS` — the vault password (masked)
Marking a variable masked prevents it from appearing in job logs. Marking it protected means only protected branches (like main) can access it.
SSH keys in CI
```shell
# Generate known_hosts to avoid interactive prompts
ssh-keyscan -H web01.example.com mail01.example.com >> known_hosts_file

# Paste the output into a CI variable: SSH_KNOWN_HOSTS
# Then in before_script:
echo "$SSH_KNOWN_HOSTS" > ~/.ssh/known_hosts
chmod 644 ~/.ssh/known_hosts
```
Ansible Vault in CI
```yaml
# In the CI job script
- echo "$ANSIBLE_VAULT_PASS" > /tmp/.vault_pass
- ansible-playbook site.yml -i inventories/production/hosts.ini --vault-password-file /tmp/.vault_pass
- rm /tmp/.vault_pass  # clean up
```

Writing to a temp file and deleting it is preferable to passing the password on the command line (for example via an extra variable), which would appear in the process list and potentially in logs.
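One weakness of the plain `rm` at the end of the script is that it never runs if the playbook fails. A sketch of a more robust variant (this is an assumption, not the source's method): `mktemp` creates a private file and an `EXIT` trap deletes it regardless of outcome. The function body runs in a subshell, so the trap fires as soon as the function returns.

```shell
# run_deploy stands in for the deploy step; the commented line is where
# the real ansible-playbook call would go.
run_deploy() (
  vault_file="$(mktemp)"
  trap 'rm -f "$vault_file"' EXIT          # cleanup on success OR failure
  printf '%s' "example-pass" > "$vault_file"  # $ANSIBLE_VAULT_PASS in CI
  # ansible-playbook site.yml --vault-password-file "$vault_file"
  echo "$vault_file"                       # report the path for checking
)
vault_path="$(run_deploy)"
test ! -e "$vault_path" && echo "vault file cleaned up"
```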
Reading a failed pipeline job
When a pipeline fails, click the failed job (shown in red) to read its log. The log shows every command that ran and their output.
What to look for:
- Which command failed — look for `$ command` lines and the exit code (`ERROR: Job failed: exit code 1`)
- Ansible task failure — look for `fatal:` lines followed by a JSON error object with a `msg` field
- SSH failure — look for `UNREACHABLE` followed by an SSH error message
- Lint failure — ansible-lint prints rule violations in the format `rulename: description [tag]`
```
# Example failed ansible-lint output in CI log
$ ansible-lint
WARNING  Listing 2 violation(s) that are fatal
roles/nginx/tasks/main.yml:12: yaml[truthy] Truthy value should be one of [false, true]
roles/nginx/handlers/main.yml:3: command-instead-of-module Use the service module instead of running service via command/shell
Finished with 2 failure(s), 0 warning(s) on 8 files.
ERROR: Job failed: exit code 2
```
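The failure markers above can also be grepped out of a saved copy of the log. A sketch with a tiny fabricated log (`log.txt` and its contents are made up for illustration):

```shell
cat > log.txt <<'EOF'
TASK [nginx : reload nginx] ************
fatal: [web01]: FAILED! => {"msg": "Missing sudo password"}
ERROR: Job failed: exit code 2
EOF

# Pull out the failure markers described above, with line numbers
grep -nE 'fatal:|UNREACHABLE|ERROR: Job failed' log.txt
# prints the fatal: and ERROR: lines prefixed with their line numbers
```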
Artifacts
Jobs can save files that persist after the job ends. Useful for saving Ansible output, reports, or files that later jobs need.
```yaml
dry-run:
  stage: check
  script:
    - ansible-playbook site.yml --check --diff 2>&1 | tee ansible-output.txt
  artifacts:
    name: "ansible-dry-run-${CI_COMMIT_SHORT_SHA}"
    paths:
      - ansible-output.txt
    expire_in: 7 days
    when: always  # save even on failure
```
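A later job can then consume the saved file. A sketch (the `comment-diff` job name is invented) using `needs:`, which both orders the jobs and downloads the listed job's artifacts:

```yaml
comment-diff:
  stage: deploy
  needs: ["dry-run"]           # waits for dry-run and fetches its artifacts
  script:
    - cat ansible-output.txt   # the file saved by the dry-run job
```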
rules — control when jobs run
```yaml
# Run only on merge requests
rules:
  - if: $CI_PIPELINE_SOURCE == "merge_request_event"

# Run only on main branch
rules:
  - if: $CI_COMMIT_BRANCH == "main"

# Run on MRs AND main
rules:
  - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  - if: $CI_COMMIT_BRANCH == "main"

# Skip if commit message contains [skip-ci]
rules:
  - if: $CI_COMMIT_MESSAGE =~ /\[skip-ci\]/
    when: never
  - when: on_success
```
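The same conditions can also be applied once, pipeline-wide, with a top-level `workflow:` block (a sketch; when no rule matches, GitLab does not create a pipeline at all, so individual jobs need fewer `rules:` of their own):

```yaml
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```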
Manual deployment approval
For production deployments, require a human to click "play" in the GitLab pipeline view:
```yaml
deploy-production:
  stage: deploy
  script:
    - ansible-playbook site.yml -i inventories/production/hosts.ini
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual          # shows a play button in the pipeline view
      allow_failure: false  # pipeline stays "blocked" until triggered
```
Runners — what runs your jobs
A runner is a service that picks up CI jobs and executes them. There are two types:
- Shared runners — provided by GitLab; run in isolated containers; no access to your internal network
- Self-hosted runners — you run these on a machine in your network; they can reach internal hosts; needed for Ansible deployments to private infrastructure
Check runner status: Settings → CI/CD → Runners.
If your job shows "Waiting for runner" or "No runners available", the runner is offline or no runner matches the job's tags.
```yaml
# Specify that a job must run on a runner with a specific tag
deploy-production:
  tags:
    - infra  # only run on runners tagged "infra"
  script:
    - ansible-playbook site.yml
```