Git Workflow for Infrastructure Repos

Branch conventions, how to find what to edit, making clean commits, hotfixes, and staying current with main.

Branch naming conventions

Infra repos usually use a convention like:

feature/INF-1234-add-ntp-server     # ticket-driven change
fix/INF-1235-fix-postfix-relay       # bug fix
hotfix/mail-not-sending              # urgent fix
chore/update-ansible-version         # maintenance, no ticket
docs/update-readme                   # documentation only

The INF-1234 part is the ticket number. GitLab can auto-link commits and branches to issues if the ticket number appears in the branch name. Check your team's conventions — some use PROJ-NUM, some use a shorter format.
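
If the team wants the convention checked rather than remembered, a small helper can run in a hook or a CI job. The sketch below assumes the five prefixes listed above and an uppercase PROJECT-NUMBER ticket key; valid_branch_name is a hypothetical name, and the pattern should be adjusted to your own rules.

```shell
# Hypothetical helper: succeeds if the branch name fits the convention above.
# Assumption: feature/ and fix/ branches must carry a ticket key such as INF-1234;
# hotfix/, chore/ and docs/ branches do not need one.
valid_branch_name() {
  printf '%s\n' "$1" | grep -Eq \
    -e '^(feature|fix)/[A-Z]+-[0-9]+-[a-z0-9-]+$' \
    -e '^(hotfix|chore|docs)/[A-Za-z0-9-]+$'
}
```

In a pre-push hook this could be called as valid_branch_name "$(git rev-parse --abbrev-ref HEAD)" || echo "branch name does not match the convention".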

Starting a change safely

# Always start from an up-to-date main
git checkout main
git pull origin main

# Create your branch
git checkout -b feature/INF-1234-add-ntp-server

# Confirm you are on the right branch before editing anything
git branch        # shows current branch with *
git status        # should show clean working tree

Never work directly on main. Main is protected, so you cannot push to it directly anyway. Branching from a stale main is the other trap: it causes painful rebase conflicts later, so always start with a fresh pull.
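
The "right branch" check can also be scripted so that it runs before any edit or commit. This is a minimal sketch; assert_not_on_main is a hypothetical name, and it assumes the protected branch is called main.

```shell
# Hypothetical guard: fail if the current branch is main.
# git symbolic-ref prints the branch HEAD points to. It fails on a detached HEAD
# or outside a repository; in that case there is no branch to check and the guard passes.
assert_not_on_main() {
  current=$(git symbolic-ref --short -q HEAD) || return 0
  if [ "$current" = "main" ]; then
    echo "You are on main; create a branch first." >&2
    return 1
  fi
}
```

One way to use it is from a local pre-commit hook: assert_not_on_main || exit 1.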

Finding what file to edit

The most common question when making an infra change: "what file do I actually change?"

Adding a new NTP server to all hosts

# Find where chrony_servers is defined
grep -r "chrony_servers" inventories/

# Expected result:
# inventories/production/group_vars/all.yml:chrony_servers:
# Edit that file

Changing an nginx setting on webservers only

grep -r "nginx_client_max_body_size" inventories/ roles/

# If in group_vars/webservers.yml — change it there
# If only in role defaults/ — add it to group_vars/webservers.yml to override

Adding a new ACL to squid for one host

# Check if squid uses group_vars or host_vars
ls inventories/production/group_vars/
ls inventories/production/host_vars/

# Add to host_vars/squid01.yml if it is host-specific
# Add to group_vars/squidservers.yml if all squid hosts need it
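
The searches above all answer the same question: where is this variable assigned? Ansible gives role defaults the lowest precedence, group_vars override them, and host_vars override group_vars, so it helps to see every definition at once. A minimal sketch follows; where_defined is a hypothetical helper, and it only finds top-level keys in .yml files.

```shell
# Hypothetical helper: list every file and line where a variable is assigned
# as a top-level YAML key.
# Usage: where_defined <variable> <dir>...
where_defined() {
  var=$1
  shift
  grep -rn --include='*.yml' -E "^${var}:" "$@"
}
```

For example, where_defined nginx_client_max_body_size roles/ inventories/ prints one path:line:content entry per definition, which shows whether an override already exists.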

Adding a new task to a role

# Find the role
ls roles/

# Look at tasks/main.yml to understand the structure
cat roles/nginx/tasks/main.yml

# Add your task to the appropriate sub-file
# e.g. roles/nginx/tasks/config.yml

Making a minimal, reviewable commit

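A commit should do one thing. In an infra repo, that means staging only the files that belong to that one change and reviewing the staged diff before committing: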

# Stage only the files you intended to change
git add inventories/production/group_vars/all.yml

# Avoid this: it blindly stages everything, including debug leftovers
# git add .

# Review what you are about to commit
git diff --staged

# Commit
git commit -m "Add internal NTP server to chrony config"
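
Debug leftovers can also be caught mechanically before the commit. The filter below is a sketch: flag_leftovers is a hypothetical name, and the suffix list is an assumption about which files never belong in this repo.

```shell
# Hypothetical filter: read file names on stdin and print the ones that look like leftovers.
# Assumption: *.retry (Ansible retry files), *.orig and *.rej (merge and patch debris),
# *.bak and *.swp are never meant to be committed.
flag_leftovers() {
  grep -E '\.(retry|orig|rej|bak|swp)$'
}
```

Run it on the staged file list with git diff --staged --name-only | flag_leftovers. It prints nothing and returns a non-zero status when the list is clean.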

Reviewing your own diff before pushing

# See everything you changed vs main
git diff main

# See what is staged
git diff --staged

# See a summary of changed files
git diff main --stat

# Run the Ansible dry-run locally
ansible-playbook site.yml --check --diff -i inventories/production/hosts.ini

Before you push and open an MR, make sure the diff contains only the changes you intended and the dry run reports no unexpected changes.

Staying current with main

If main has moved since you branched, rebase to incorporate the changes:

git fetch origin
git rebase origin/main

If there are conflicts during rebase:

git status                    # shows conflicted files
# Open each conflicted file and resolve the <<< === >>> markers
git add resolved-file.yml
git rebase --continue         # move to next commit

# If you want to abandon the rebase and start over
git rebase --abort

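After a successful rebase, your local branch no longer matches the copy on the remote, so a plain push is rejected. Force-push with --force-with-lease, which refuses to overwrite the remote branch if someone else has pushed to it since your last fetch: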

git push --force-with-lease
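
Whether a rebase is needed at all can be checked first. A minimal sketch, where commits_behind is a hypothetical helper name:

```shell
# Hypothetical helper: count the commits on <base> that the current branch does not have yet.
# A result of 0 means there is nothing new to rebase onto.
commits_behind() {
  git rev-list --count "HEAD..$1"
}
```

After git fetch origin, calling commits_behind origin/main prints how far main has moved since you branched.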

Hotfix workflow

A production service is broken and you need to fix it right now, without waiting for the normal MR review cycle.

# Branch from an up-to-date main
git checkout main
git pull origin main
git checkout -b hotfix/mail-queue-blocked

# Make the minimal fix
# Test with --check --diff first even in a hotfix
ansible-playbook site.yml --check --diff --limit mail01 --tags postfix

# Commit and push immediately
# add -u stages changes to tracked files only, so stray untracked files stay out
git add -u && git commit -m "hotfix: clear stuck mail queue (relayhost corrected)"
git push -u origin hotfix/mail-queue-blocked

Open an MR. For genuine hotfixes, teams often allow one reviewer instead of two, or allow the senior engineer to self-approve. Check your team's policy.

Apply the fix directly if absolutely necessary — but open the MR anyway so the change is tracked and reviewed after the fact.

Reverting a bad merge

A bad change merged to main and is now causing problems. Revert the merge commit:

# Start a revert branch from an up-to-date main (main is protected)
git checkout main && git pull origin main
git checkout -b revert/bad-change

# Find the merge commit hash
git log --oneline --merges -5

# Revert the merge commit
# -m 1 keeps parent 1 (main before the merge) and undoes what the feature branch (parent 2) brought in
git revert -m 1 abc1234

# This creates a new revert commit; push it and create an MR
git push -u origin revert/bad-change

Revert, do not reset. Main has already been pushed and possibly deployed from. A git reset --hard on main would require a force-push and create chaos for anyone who pulled. Revert creates a clean, auditable undo commit.
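
The -m option only applies to merge commits; on an ordinary commit, git revert rejects it. If the team squashes or fast-forwards some MRs, the bad commit may not be a merge at all. The sketch below picks the right form; parent_count and revert_commit are hypothetical helper names.

```shell
# Hypothetical helper: print how many parents a commit has (2 or more means a merge).
parent_count() {
  # rev-list --parents prints the commit hash followed by its parent hashes
  words=$(git rev-list --parents -n 1 "$1" | wc -w)
  echo $((words - 1))
}

# Hypothetical helper: revert with -m 1 only when the commit is a merge.
revert_commit() {
  if [ "$(parent_count "$1")" -ge 2 ]; then
    git revert --no-edit -m 1 "$1"
  else
    git revert --no-edit "$1"
  fi
}
```

On the revert branch, revert_commit abc1234 then works for either kind of commit.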

Commit message conventions

Good commit messages in an infra repo:

# Good — describes what changed and why
Add internal NTP servers to chrony config for all hosts

Ticket: INF-2431
External NTP access blocked by new firewall policy. Using
internal NTP servers ntp1.example.com and ntp2.example.com instead.

# Good — short is fine for small changes
Fix typo in postfix main.cf template

# Bad — tells you nothing
updates
fix
changes yaml

Use the imperative mood for the subject line: "Add", "Fix", "Update", "Remove" — not "Added", "Fixed", "Updated".
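
These rules are simple enough to check in a commit-msg hook. The following is a sketch under stated assumptions: check_subject is a hypothetical name, only four common past-tense verbs are caught, and a subject shorter than three words is treated as too vague.

```shell
# Hypothetical check for the subject line (the first line of the commit message).
# Assumptions: only a few past-tense verbs are caught, and subjects with fewer
# than three words count as too vague.
check_subject() {
  subject=$1
  case "$subject" in
    Added\ *|Fixed\ *|Updated\ *|Removed\ *)
      echo "Use the imperative mood: Add, Fix, Update, Remove" >&2
      return 1 ;;
  esac
  words=$(printf '%s\n' "$subject" | wc -w)
  if [ "$((words))" -lt 3 ]; then
    echo "Subject is too vague: say what changed" >&2
    return 1
  fi
}
```

Git passes the path of the message file to the commit-msg hook as its first argument, so the hook body can be check_subject "$(head -n 1 "$1")" || exit 1.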

Tagging releases

In infrastructure repos, tags mark a known-good state that has passed review and been deployed. Tags let you quickly roll back to a specific deployed version.

Creating annotated tags

# Annotated tag (preferred) — stores tagger name, date, message
git tag -a v1.2.0 -m "Release v1.2.0 — add chrony NTP role"

# Lightweight tag (avoid for releases — no metadata)
git tag v1.2.0

Pushing tags to GitLab

# Push a specific tag
git push origin v1.2.0

# Push all local tags at once
git push origin --tags

Tagging conventions for infra repos

# Semantic versioning — MAJOR.MINOR.PATCH
v1.0.0    # Initial release
v1.1.0    # New feature (new role, new service)
v1.1.1    # Bug fix (template typo, handler fix)
v2.0.0    # Breaking change (inventory restructure, variable rename)

# Environment-prefixed tags
prod-2024-03-15   # Simple date-based tag for deployments
staging-v1.2.0    # Environment-scoped tag
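
Working out the next semantic version by hand is easy to get wrong under pressure. A minimal sketch of the bump rules above; bump_version is a hypothetical helper and assumes tags of the exact form vMAJOR.MINOR.PATCH.

```shell
# Hypothetical helper: print the next version tag.
# Usage: bump_version vMAJOR.MINOR.PATCH major|minor|patch
bump_version() {
  ver=${1#v}            # strip the leading v
  major=${ver%%.*}
  rest=${ver#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$2" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "usage: bump_version vX.Y.Z major|minor|patch" >&2; return 1 ;;
  esac
  printf 'v%s.%s.%s\n' "$major" "$minor" "$patch"
}
```

For instance, bump_version v1.1.1 major prints v2.0.0, matching the breaking-change example above.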

Listing and checking out tags

# List all tags
git tag -l

# List tags matching a pattern
git tag -l "v1.*"

# Show what a tag points to
git show v1.2.0

# Deploy from a specific tag (e.g. to roll back); this leaves HEAD detached, which is fine for a deploy
git checkout v1.1.1
# Or in a pipeline:
git clone --branch v1.1.1 --depth 1 https://gitlab.internal/infra/config.git

Secrets hygiene

Secrets committed to a Git repository are permanently exposed — even if you delete them in a later commit, they remain in history. Prevention is far cheaper than remediation.

.gitignore for sensitive files

# .gitignore — add these patterns to every infra repo
.vault_pass.txt
*.vault_pass
vault_pass
secrets/
.env
*.pem
*.key
credentials.yml
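
It is worth confirming that the patterns actually match the files you care about. The sketch below wraps git check-ignore; is_ignored is a hypothetical name. Note that .gitignore has no effect on files that are already tracked, and check-ignore reports those as not ignored.

```shell
# Hypothetical helper: succeeds if git would ignore the given path.
is_ignored() {
  git check-ignore -q "$1"
}
```

For example, is_ignored .vault_pass.txt || echo "vault password file is NOT ignored" makes a cheap CI check.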

Use ansible-vault for secrets in group_vars

# Encrypt the vault file — never commit the plaintext version
ansible-vault encrypt group_vars/all/vault.yml

# Check what's in a vault file without decrypting to disk
ansible-vault view group_vars/all/vault.yml
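
A vault file that was decrypted for editing and never re-encrypted is an easy mistake to commit. Encrypted files start with a header line such as $ANSIBLE_VAULT;1.1;AES256, so a check is cheap. This is a sketch; is_vault_encrypted is a hypothetical helper name.

```shell
# Hypothetical check: succeeds if the file starts with the ansible-vault header.
is_vault_encrypted() {
  head -n 1 "$1" | grep -q '^\$ANSIBLE_VAULT;'
}
```

Before committing, is_vault_encrypted group_vars/all/vault.yml || echo "vault.yml is in plaintext" catches the slip.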

Finding accidentally committed secrets

# Search for a string across ALL commits (not just current files)
git log -S "password123" --all --oneline

# Search for a pattern
git log -G "api[_-]?key" --all --oneline

# Find secrets in the current working tree
grep -rn "password\|secret\|api_key" group_vars/ --include="*.yml" | grep -v vault

If a secret is found in history, rotate it immediately, before cleaning history. Then rewrite history with git filter-repo (preferred over git filter-branch), force-push, and notify your security team.
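
Since prevention is cheaper, the same keyword search can run on the staged diff before anything is committed. A rough sketch follows; scan_added_lines is a hypothetical name and the keyword list is illustrative only, so expect false positives (for example, variables that merely reference a vaulted value).

```shell
# Hypothetical filter: read a unified diff on stdin and print added lines that look like secrets.
# Assumption: the keyword list is illustrative; dedicated scanners use far richer rules.
scan_added_lines() {
  grep -E '^\+' | grep -Ev '^\+\+\+ ' | grep -Ei 'password|secret|api[_-]?key'
}
```

Use it as git diff --staged | scan_added_lines. It prints nothing and returns a non-zero status when no added line matches.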

git blame and git log for investigations

During incidents, you often need to answer "when did this config line change, and who changed it?" These commands trace changes through history.

git blame — who last touched each line

# Show who last changed each line of a file
git blame roles/chrony/templates/chrony.conf.j2

# Show blame with date (easier to read)
git blame --date=short roles/chrony/templates/chrony.conf.j2

# Blame a specific line range
git blame -L 10,25 roles/chrony/templates/chrony.conf.j2

git log — trace the full history of a file

# Show all commits that touched a file (including renames)
git log --follow -p roles/chrony/templates/chrony.conf.j2

# Show just commit titles (no diff)
git log --follow --oneline roles/chrony/templates/chrony.conf.j2

# Find when a specific string was added or removed
git log -S "pool ntp.internal" --follow -p roles/chrony/templates/chrony.conf.j2

Checking what was deployed at a specific time

# See all commits from around the time of an incident
git log --since="2024-03-15 09:00" --until="2024-03-15 12:00" --oneline

# Show what the file looked like at a specific commit
git show abc1234:roles/chrony/templates/chrony.conf.j2

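# Compare current state to the last commit made at least 2 weeks ago
# (HEAD~14 would mean 14 commits back, not 14 days)
git diff "$(git rev-list -1 --before="2 weeks ago" HEAD)" -- roles/chrony/templates/chrony.conf.j2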
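
Looking up "the repo as of a given time" comes up often enough in incidents to wrap in a helper. The sketch below uses commit dates; commit_as_of is a hypothetical name. Commit time is not deploy time, so confirm against your deployment tags or pipeline records where they exist.

```shell
# Hypothetical helper: print the newest commit reachable from HEAD
# that was committed at or before the given time.
commit_as_of() {
  git rev-list -1 --before="$1" HEAD
}
```

Combined with git show, this prints a file as it stood at the start of an incident: git show "$(commit_as_of '2024-03-15 09:00'):roles/chrony/templates/chrony.conf.j2".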