GitLab CI/CD for Infrastructure
- What CI/CD does for infrastructure
- .gitlab-ci.yml structure
- Stages and jobs
- A real infra pipeline
- Running Ansible in a CI job
- Passing secrets — CI/CD variables
- SSH keys in CI
- Ansible Vault in CI
- Reading a failed pipeline job
- Artifacts
- rules — control when jobs run
- Manual deployment approval
- Runners — what runs your jobs
What CI/CD does for infrastructure
Without CI/CD, deploying infrastructure changes looks like: push to GitLab → someone manually SSH in → run ansible-playbook → hope they used the right inventory and flags.
With CI/CD, pushing a branch triggers automatic jobs that:
- Lint the Ansible code (`ansible-lint`)
- Run a syntax check (`ansible-playbook --syntax-check`)
- Optionally run a dry-run in staging
- Deploy to production when the MR is merged
This means every change is automatically validated before it can merge, and deployment is consistent and repeatable — not dependent on who runs the command or what flags they remember.
.gitlab-ci.yml structure
The pipeline is defined in a file called `.gitlab-ci.yml` at the root of the repo.
```yaml
# .gitlab-ci.yml — basic structure
stages:        # define the order of stages
  - lint
  - check
  - deploy

variables:     # repo-level variable defaults
  ANSIBLE_FORCE_COLOR: "1"

lint-ansible:                    # job name
  stage: lint                    # which stage this belongs to
  image: cytopia/ansible:latest  # Docker image to run in
  script:
    - ansible-lint site.yml
  only:
    - merge_requests
    - main
```
Key concepts:
- stages — jobs in the same stage run in parallel; stages run sequentially
- image — the Docker image the job runs inside
- script — the commands to run (one per line)
- rules — when this job should run (on what branches or events)

`only:` and `except:` are the legacy syntax; prefer `rules:` in all new pipelines — it is more expressive, and GitLab has soft-deprecated `only`/`except`.
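To make the migration concrete, here is the `lint-ansible` job from above rewritten from the legacy `only:` block to the equivalent `rules:` form (a sketch; both versions trigger on merge requests and on main):

```yaml
lint-ansible:
  stage: lint
  image: cytopia/ansible:latest
  script:
    - ansible-lint site.yml
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```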
Stages and jobs
Multiple jobs can exist within the same stage and run in parallel. Stages are sequential — if a job in the lint stage fails, the check and deploy stages do not run.
```yaml
stages:
  - lint
  - check
  - deploy

lint-ansible:
  stage: lint
  script:
    - ansible-lint site.yml

lint-yaml:
  stage: lint     # runs in parallel with lint-ansible
  script:
    - yamllint .

syntax-check:
  stage: check    # only runs if lint stage passed
  script:
    - ansible-playbook site.yml --syntax-check
```
A real infra pipeline
```yaml
---
stages:
  - lint
  - syntax
  - check
  - deploy

variables:
  ANSIBLE_FORCE_COLOR: "1"
  ANSIBLE_STDOUT_CALLBACK: yaml
  # Do NOT set ANSIBLE_HOST_KEY_CHECKING=False — use SSH_KNOWN_HOSTS below instead

# Shared config applied to all jobs
.ansible-base: &ansible-base
  image: willhallonline/ansible:2.14-ubuntu-22.04
  before_script:
    - eval "$(ssh-agent -s)"
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - echo "$SSH_KNOWN_HOSTS" > ~/.ssh/known_hosts

lint:
  <<: *ansible-base
  stage: lint
  script:
    - ansible-lint
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

syntax-check:
  <<: *ansible-base
  stage: syntax
  script:
    - ansible-playbook site.yml --syntax-check -i inventories/production/hosts.ini

dry-run:
  <<: *ansible-base
  stage: check
  script:
    - ansible-playbook site.yml --check --diff -i inventories/production/hosts.ini
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

deploy-production:
  <<: *ansible-base
  stage: deploy
  script:
    - echo "$ANSIBLE_VAULT_PASS" > .vault_pass
    - ansible-playbook site.yml -i inventories/production/hosts.ini --vault-password-file .vault_pass
    - rm -f .vault_pass
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual  # requires a human to click "play" in GitLab
```
Running Ansible in a CI job
The CI runner needs to be able to SSH to your infrastructure hosts. The standard approach:
- Generate a dedicated deploy SSH key pair (no passphrase): `ssh-keygen -t ed25519 -f deploy_key -N ""`
- Add the public key to `~/.ssh/authorized_keys` on every managed host (or via FreeIPA)
- Store the private key as a CI/CD variable named `SSH_PRIVATE_KEY`
- Load it in the job's `before_script`
```yaml
before_script:
  - eval "$(ssh-agent -s)"
  - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
  - mkdir -p ~/.ssh
  - chmod 700 ~/.ssh
```
`tr -d '\r'` removes Windows-style carriage returns that sometimes appear when copying a key through the browser. Without it, `ssh-add` may reject the key.
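A minimal demonstration of the problem (the file names and key content are illustrative, not a real key):

```shell
# Simulate a key pasted with Windows line endings, then strip the CRs
printf 'FAKE-KEY-LINE\r\n' > key_crlf   # 15 bytes: 13 chars + \r + \n
tr -d '\r' < key_crlf > key_clean       # drop every carriage return
wc -c < key_crlf                        # 15
wc -c < key_clean                       # 14 — the \r is gone
```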
Passing secrets — CI/CD variables
In GitLab: Settings → CI/CD → Variables. Add these for an Ansible infra project:
- `SSH_PRIVATE_KEY` — the deploy private key (masked; not protected unless you want branch restrictions)
- `SSH_KNOWN_HOSTS` — output of `ssh-keyscan -H host1 host2`, to avoid host key prompts
- `ANSIBLE_VAULT_PASS` — the vault password (masked)
Marking a variable masked prevents it from appearing in job logs. Marking it protected means only protected branches (like main) can access it.
SSH keys in CI
```shell
# Generate known_hosts to avoid interactive prompts
ssh-keyscan -H web01.example.com mail01.example.com >> known_hosts_file

# Paste the output into a CI variable: SSH_KNOWN_HOSTS
# Then in before_script:
echo "$SSH_KNOWN_HOSTS" > ~/.ssh/known_hosts
chmod 644 ~/.ssh/known_hosts
```
Ansible Vault in CI
```yaml
# In the CI job script
- echo "$ANSIBLE_VAULT_PASS" > /tmp/.vault_pass
- ansible-playbook site.yml -i inventories/production/hosts.ini --vault-password-file /tmp/.vault_pass
- rm /tmp/.vault_pass  # clean up
```

Writing to a temp file and deleting it is preferable to passing the password on the command line (for example via an extra variable), which would appear in the process list and potentially in logs.
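One weakness of the plain `rm` at the end of the script is that it never runs if the playbook fails. A sketch of a more robust variant (this is an assumption, not the source's method): `mktemp` creates a private file and an `EXIT` trap deletes it regardless of outcome. The function body runs in a subshell, so the trap fires as soon as the function returns.

```shell
# run_deploy stands in for the deploy step; the commented line is where
# the real ansible-playbook call would go.
run_deploy() (
  vault_file="$(mktemp)"
  trap 'rm -f "$vault_file"' EXIT          # cleanup on success OR failure
  printf '%s' "example-pass" > "$vault_file"  # $ANSIBLE_VAULT_PASS in CI
  # ansible-playbook site.yml --vault-password-file "$vault_file"
  echo "$vault_file"                       # report the path for checking
)
vault_path="$(run_deploy)"
test ! -e "$vault_path" && echo "vault file cleaned up"
```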
Reading a failed pipeline job
When a pipeline fails, click the failed job (shown in red) to read its log. The log shows every command that ran and their output.
What to look for:
- Which command failed — look for `$ command` lines and the exit code (`ERROR: Job failed: exit code 1`)
- Ansible task failure — look for `fatal:` lines followed by a JSON error object with a `msg` field
- SSH failure — look for `UNREACHABLE` followed by an SSH error message
- Lint failure — ansible-lint prints rule violations in the format `rulename: description [tag]`
```
# Example failed ansible-lint output in CI log
$ ansible-lint
WARNING  Listing 2 violation(s) that are fatal
roles/nginx/tasks/main.yml:12: yaml[truthy] Truthy value should be one of [false, true]
roles/nginx/handlers/main.yml:3: command-instead-of-module Use the service module instead of running service via command/shell
Finished with 2 failure(s), 0 warning(s) on 8 files.
ERROR: Job failed: exit code 2
```
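The failure markers above can also be grepped out of a saved copy of the log. A sketch with a tiny fabricated log (`log.txt` and its contents are made up for illustration):

```shell
cat > log.txt <<'EOF'
TASK [nginx : reload nginx] ************
fatal: [web01]: FAILED! => {"msg": "Missing sudo password"}
ERROR: Job failed: exit code 2
EOF

# Pull out the failure markers described above, with line numbers
grep -nE 'fatal:|UNREACHABLE|ERROR: Job failed' log.txt
# prints the fatal: and ERROR: lines prefixed with their line numbers
```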
Artifacts
Jobs can save files that persist after the job ends. Useful for saving Ansible output, reports, or files that later jobs need.
```yaml
dry-run:
  stage: check
  script:
    - ansible-playbook site.yml --check --diff 2>&1 | tee ansible-output.txt
  artifacts:
    name: "ansible-dry-run-${CI_COMMIT_SHORT_SHA}"
    paths:
      - ansible-output.txt
    expire_in: 7 days
    when: always  # save even on failure
```
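A later job can then consume the saved file. A sketch (the `comment-diff` job name is invented) using `needs:`, which both orders the jobs and downloads the listed job's artifacts:

```yaml
comment-diff:
  stage: deploy
  needs: ["dry-run"]           # waits for dry-run and fetches its artifacts
  script:
    - cat ansible-output.txt   # the file saved by the dry-run job
```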
rules — control when jobs run
```yaml
# Run only on merge requests
rules:
  - if: $CI_PIPELINE_SOURCE == "merge_request_event"

# Run only on main branch
rules:
  - if: $CI_COMMIT_BRANCH == "main"

# Run on MRs AND main
rules:
  - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  - if: $CI_COMMIT_BRANCH == "main"

# Skip if commit message contains [skip-ci]
rules:
  - if: $CI_COMMIT_MESSAGE =~ /\[skip-ci\]/
    when: never
  - when: on_success
```
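The same conditions can also be applied once, pipeline-wide, with a top-level `workflow:` block (a sketch; when no rule matches, GitLab does not create a pipeline at all, so individual jobs need fewer `rules:` of their own):

```yaml
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```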
Manual deployment approval
For production deployments, require a human to click "play" in the GitLab pipeline view:
```yaml
deploy-production:
  stage: deploy
  script:
    - ansible-playbook site.yml -i inventories/production/hosts.ini
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual          # shows a play button in the pipeline view
      allow_failure: false  # pipeline stays "blocked" until triggered
```
Runners — what runs your jobs
A runner is a service that picks up CI jobs and executes them. There are two types:
- Shared runners — provided by GitLab; run in isolated containers; no access to your internal network
- Self-hosted runners — you run these on a machine in your network; they can reach internal hosts; needed for Ansible deployments to private infrastructure
Check runner status: Settings → CI/CD → Runners.
If your job shows "Waiting for runner" or "No runners available", the runner is offline or no runner matches the job's tags.
```yaml
# Specify that a job must run on a runner with a specific tag
deploy-production:
  tags:
    - infra  # only run on runners tagged "infra"
  script:
    - ansible-playbook site.yml
```