The Infra Change Lifecycle

End-to-end: from ticket to production deployment — what happens at each step and why.

Overview: the full lifecycle

Ticket → Understand → Find file → Branch → Edit → Lint → Dry-run → MR → CI → Review → Merge → Deploy → Verify

Each step catches a different class of problem. Skipping steps is how production outages happen.

1. Understand the change

Before touching any code, make sure you understand:

If the ticket is unclear, ask before starting. A 10-minute conversation is faster than a 2-hour incident.

2. Find what to change

# Search for the variable or config option
grep -r "setting_name" inventories/ roles/

# Find which template generates the config
grep -r "setting_name" roles/*/templates/

# Check the role defaults to understand what is configurable
cat roles/nginx/defaults/main.yml

Common outcomes:

3. Branch and edit

git checkout main
git pull origin main
git checkout -b feature/INF-1234-description-of-change

# Make your changes
# Use $EDITOR or your preferred tool

# Stage only what you intend to change
git add inventories/production/group_vars/webservers.yml

# Review the diff
git diff --staged

# Commit with a meaningful message
git commit -m "Update nginx client_max_body_size for webservers

Ticket: INF-1234
Increasing from 1m to 64m to support large file uploads.
Applies to all hosts in the webservers group."
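
The commit message above carries a Ticket: line. If the team wants that convention checked automatically, a commit-msg hook is one option. This is a minimal sketch, not part of the workflow described here: the function name has_ticket_ref is hypothetical, and the ticket pattern (uppercase project key, hyphen, digits, as in INF-1234) is an assumption.

```shell
# Hypothetical helper: succeeds only if the commit message file
# contains a line such as "Ticket: INF-1234".
has_ticket_ref() {
  grep -Eq '^Ticket: [A-Z]+-[0-9]+' "$1"
}

# Possible use inside .git/hooks/commit-msg
# (git passes the path of the message file as the first argument):
#   has_ticket_ref "$1" || { echo "Missing 'Ticket: ...' line" >&2; exit 1; }
```

A hook like this only runs on the machine where it is installed, so it complements the MR checks rather than replacing them.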

4. Lint and validate locally

# YAML syntax check
yamllint inventories/production/group_vars/webservers.yml

# Ansible lint — checks for best practice violations
ansible-lint

# Syntax check the playbook
ansible-playbook site.yml --syntax-check -i inventories/production/hosts.ini

Fix any errors before proceeding. Lint failures in CI will block your MR anyway — catch them now.
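
The three checks above can be chained so that the first failure stops the run. This is a minimal sketch under the assumption that each check is given as one command string; run_checks is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: run each command string in order and stop at
# the first one that exits non-zero, reporting which one failed.
run_checks() {
  for cmd in "$@"; do
    echo "==> $cmd"
    if ! sh -c "$cmd"; then
      echo "FAILED: $cmd" >&2
      return 1
    fi
  done
  echo "All checks passed"
}

# Possible use with the commands from this step:
#   run_checks \
#     "yamllint inventories/production/group_vars/webservers.yml" \
#     "ansible-lint" \
#     "ansible-playbook site.yml --syntax-check -i inventories/production/hosts.ini"
```

Stopping at the first failure keeps the output short, which mirrors how a CI pipeline typically refuses to run later jobs after lint fails.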

5. Dry-run against production inventory

# Full dry-run with diff — see exactly what would change
ansible-playbook site.yml \
  --check --diff \
  -i inventories/production/hosts.ini \
  --limit webservers         # only run against webservers group

# Narrow further to a single host if possible
ansible-playbook site.yml \
  --check --diff \
  -i inventories/production/hosts.ini \
  --limit web01.example.com

Review the diff carefully:

Paste the --check --diff output into the MR description or attach it to the ticket.
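
When the dry-run covers many hosts, the PLAY RECAP block at the end of the output gives a quick summary of which hosts would change. The sketch below filters that block; changed_hosts is a hypothetical helper, dry-run.txt is an assumed file name, and the parsing assumes the usual recap line layout (host, colon, then ok=, changed=, and so on).

```shell
# Hypothetical helper: read ansible-playbook output on stdin and print
# the hosts whose PLAY RECAP line reports changed > 0.
changed_hosts() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && match($0, /changed=[0-9]+/) {
      # "changed=" is 8 characters long; the digits follow it
      n = substr($0, RSTART + 8, RLENGTH - 8)
      if (n + 0 > 0) print $1
    }
  '
}

# Possible use:
#   ansible-playbook site.yml --check --diff \
#     -i inventories/production/hosts.ini --limit webservers | tee dry-run.txt
#   changed_hosts < dry-run.txt
```

The summary is a convenience for spotting unexpected hosts; it does not replace reading the diff itself.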

6. Open the merge request

git push -u origin feature/INF-1234-description-of-change

Open the MR in GitLab. Use a description template like:

## What
[What is being changed]

## Why
Ticket: INF-1234 — [Brief description from ticket]

## Hosts affected
[list of hosts or groups]

## Dry-run output
[paste --check --diff output or attach file]

## Rollback
[How to revert: revert this MR commit, or manual steps]
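
Filling in the template by hand is easy to get wrong when the dry-run output is long. As a rough sketch, the description could be generated from a saved dry-run file. The function name mr_description and its three arguments (ticket ID, affected hosts, path to the saved output) are assumptions for illustration; the What and Rollback placeholders are left for the author to complete.

```shell
# Hypothetical helper: print an MR description following the template
# above, filling in the ticket, the affected hosts, and the saved
# --check --diff output.
mr_description() {
  ticket=$1
  hosts=$2
  dryrun_file=$3
  cat <<EOF
## What
[What is being changed]

## Why
Ticket: $ticket

## Hosts affected
$hosts

## Dry-run output
$(cat "$dryrun_file")

## Rollback
[How to revert: revert this MR commit, or manual steps]
EOF
}

# Possible use:
#   mr_description INF-1234 webservers dry-run.txt > mr-description.md
```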

7. Pipeline runs automated checks

When you push and open an MR, the CI pipeline runs automatically. Typical jobs:

  1. ansible-lint — must pass before anything else runs
  2. syntax-check — verifies all playbooks parse correctly
  3. dry-run (optional) — runs --check --diff against the production or staging inventory; output saved as an artifact

If the pipeline fails: click the failed job in GitLab, read the log, fix the issue, push again. The pipeline re-runs automatically.

8. Peer review

Assign a reviewer who knows the relevant system. What a good reviewer looks at:

Respond to comments, push updates, and mark conversations resolved when addressed.

9. Merge and deploy

Once the MR is approved and the pipeline is green, merge it.

Depending on your pipeline setup:

# Manual deploy (if no CI automation)
git checkout main
git pull origin main
ansible-playbook site.yml \
  -i inventories/production/hosts.ini \
  --limit webservers \
  --tags nginx

# Run everything EXCEPT a specific tag (e.g. skip a long data migration)
ansible-playbook site.yml \
  -i inventories/production/hosts.ini \
  --skip-tags migrate-db

--check mode caveats: Not all modules support check mode. By default, command and shell tasks are skipped in check mode, so later tasks that depend on their output may fail or report misleading results. template and copy report accurately in check mode; command and shell do not. Treat --check output as a guide, not a guarantee.
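
To see where a dry-run is likely to be incomplete, it can help to list the command and shell tasks in the roles being changed. The sketch below is a plain text search, assuming tasks use the short module names or the ansible.builtin. prefix; list_unchecked_tasks is a hypothetical helper, and it will not notice tasks that opt back in with check_mode: false.

```shell
# Hypothetical helper: list task lines that call command or shell,
# with file name and line number, under the given directory.
list_unchecked_tasks() {
  grep -rnE '^[[:space:]]*(-[[:space:]]+)?(ansible\.builtin\.)?(command|shell):' "$1"
}

# Possible use:
#   list_unchecked_tasks roles/nginx/
```

Any task it lists deserves a closer look in review, since the --check output says nothing reliable about what that task will do.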

10. Verify in production

After deployment, confirm the change took effect and the service is healthy:

# Check the service is running
systemctl status nginx

# Check the config file has the expected content
# nginx.conf often includes conf.d/ — search there too
grep -r client_max_body_size /etc/nginx/

# Validate the live config
nginx -t

# Test the service responds
curl -v http://app.example.com/health

# Check logs for errors since the deployment
journalctl -u nginx --since "5 minutes ago"

Only close the ticket once you have confirmed the change works as expected.
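
The config check above can be turned into a pass/fail test so it is harder to misread. This is a minimal sketch, assuming nginx-style directives of the form "name value;" and a hypothetical helper name, setting_matches; it checks only file contents, not whether nginx has reloaded them.

```shell
# Hypothetical helper: succeed if any file under conf_dir sets the
# directive to the expected value, e.g. "client_max_body_size 64m;".
setting_matches() {
  conf_dir=$1
  name=$2
  expected=$3
  # -r: recurse, -h: omit file names, -E: extended regex
  grep -rhE "^[[:space:]]*${name}[[:space:]]+${expected}[[:space:]]*;" "$conf_dir" >/dev/null
}

# Possible use on the target host:
#   setting_matches /etc/nginx client_max_body_size 64m && echo "value applied"
```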

Rollback

Something went wrong after merge. Act quickly:

Option 1: Revert the MR commit

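# Start from an up-to-date main on a dedicated branch
git checkout main && git pull origin main
git checkout -b revert/fix-bad-change

# Find the merge commit
git log --oneline -5

# Revert it (-m 1 keeps main as the mainline parent; omit -m 1 if the
# MR landed as a single non-merge commit, e.g. squash or fast-forward)
git revert -m 1 COMMIT_HASH

# Push and open an MR for the revert
git push -u origin revert/fix-bad-change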

Option 2: Quick emergency fix without waiting for review

git checkout main && git pull
git checkout -b hotfix/urgent-rollback

# Edit the file back to the previous value manually
git add . && git commit -m "hotfix: revert bad setting — causing nginx errors"
git push -u origin hotfix/urgent-rollback

# Open MR (mark as urgent), deploy, then get it reviewed and merged after the fact

Shortcuts and when to use them

In real-world situations, shortcuts exist. Use them consciously, not by default:

Config drift happens when shortcuts become habits. If you regularly make changes directly on hosts without going through the repo, the repo no longer reflects reality — and the next time Ansible runs, it will undo your manual changes.