Ansible Debugging

Reading error output, printing variables, using --check and --diff, and fixing common failures.

Check connectivity first

Before troubleshooting a playbook failure, confirm Ansible can actually reach the hosts.

# Ping all hosts
ansible all -i inventories/production/hosts.ini -m ping

# Ping one host
ansible web01 -i inventories/production/hosts.ini -m ping

# Check what user Ansible would connect as
ansible web01 -i inventories/production/hosts.ini -m debug -a "var=ansible_user"

A successful ping returns pong. An UNREACHABLE error means SSH cannot connect — wrong host, wrong user, key not in authorized_keys, or firewall blocking port 22.
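If the connection details are wrong, the cleanest fix is to state them explicitly per host in the inventory instead of relying on defaults. A minimal sketch — the addresses, user, and key path below are assumptions for illustration, not values from this project:

```ini
# inventories/production/hosts.ini — hypothetical entries
[webservers]
web01 ansible_host=10.0.1.10 ansible_user=ansible ansible_ssh_private_key_file=~/.ssh/ansible_ed25519
web02 ansible_host=10.0.1.11 ansible_user=ansible

[webservers:vars]
ansible_port=22
```

With these set, `ansible web01 -m ping` uses exactly the user and key you specified, which removes two of the most common UNREACHABLE causes.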

Verbose mode (-v to -vvv)

Add -v flags to get progressively more output:

ansible-playbook site.yml -v     # task output and return values
ansible-playbook site.yml -vv    # more connection detail
ansible-playbook site.yml -vvv   # full SSH connection info, module args
ansible-playbook site.yml -vvvv  # connection plugin debugging

With -vvv, each task shows the exact SSH command being used, the arguments passed to the module, and the full JSON result returned from the host.

Start with -v when a task fails but the error message is not clear. Escalate to -vvv for connection and SSH problems.
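Verbose output scrolls past quickly; Ansible can also persist everything to a log file for post-mortem reading. A minimal ansible.cfg sketch (the log path is an assumption):

```ini
# ansible.cfg — keep a copy of all run output
[defaults]
log_path = ./ansible.log
```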

--check and --diff

The two most important flags for safe production use:

# Dry run — show what would change without changing anything
ansible-playbook site.yml --check

# Show file diffs — what the template would render vs what is on disk
ansible-playbook site.yml --diff

# Both together (most useful)
ansible-playbook site.yml --check --diff

With --diff, when a template or file task would make a change, Ansible shows a unified diff:

--- before: /etc/chrony.conf
+++ after: /etc/chrony.conf
@@ -1,4 +1,5 @@
 server 0.pool.ntp.org iburst
+server ntp1.internal.example.com iburst
 driftfile /var/lib/chrony/drift
 makestep 1.0 3
 rtcsync
Always run --check --diff before a production deployment. Review every diff before proceeding. If you see changes you did not expect, stop and investigate.
--check is not always reliable. If a task's output feeds into a later task, --check may report false failures on the later task because the earlier task did not actually run. This is normal — run for real to confirm.
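One way to soften that caveat: mark read-only information-gathering tasks with check_mode: false so they actually execute even under --check, giving later tasks a real registered result to work with. A sketch — the file path and task names are illustrative:

```yaml
# Runs even under --check because the task only reads state
- name: Read current app version
  ansible.builtin.command: cat /opt/app/VERSION   # hypothetical path
  register: app_version_file
  changed_when: false    # reading a file never changes anything
  check_mode: false      # execute this task even in check mode

- name: Later task can now use the registered result in --check runs
  ansible.builtin.debug:
    var: app_version_file.stdout
```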

debug module

Print variable values during a playbook run:

- name: Show what NTP servers will be used
  ansible.builtin.debug:
    msg: "chrony_servers: {{ chrony_servers }}"

- name: Show entire variable dict
  ansible.builtin.debug:
    var: chrony_servers

- name: Show multiple values
  ansible.builtin.debug:
    msg: |
      hostname: {{ inventory_hostname }}
      NTP servers: {{ chrony_servers | join(', ') }}
      TLS enabled: {{ enable_tls }}

Print the output of a registered task result:

- name: Run config check
  ansible.builtin.command: nginx -t
  register: nginx_check
  ignore_errors: true

- name: Show result
  ansible.builtin.debug:
    var: nginx_check

# nginx_check will contain:
# - nginx_check.rc       (return code — 0 = success)
# - nginx_check.stdout   (standard output)
# - nginx_check.stderr   (standard error)
# - nginx_check.changed  (always true for command — override with changed_when)
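Those fields feed directly into changed_when and failed_when, which is how you make command tasks report honest status instead of the defaults. A sketch using the same nginx check:

```yaml
- name: Validate nginx config
  ansible.builtin.command: nginx -t
  register: nginx_check
  changed_when: false                 # a pure check never changes the host
  failed_when: nginx_check.rc != 0    # explicit, readable failure condition
```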

--start-at-task and --step

Skip to a specific task without re-running everything:

# Start from a specific task by name
ansible-playbook site.yml --start-at-task "Deploy nginx config"

# Interactive mode — confirm each task before running
ansible-playbook site.yml --step

--start-at-task is useful when a long playbook fails at step 47 and you have already fixed the problem — you can resume from there instead of starting over.

Common errors and what they mean

These are the errors you will see most often. Each has a predictable cause and a fast diagnostic path.

UNREACHABLE

fatal: [web01]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ...", "unreachable": true}

Ansible could not SSH to the host. Diagnose in order:

# Try SSH manually with the ansible user
ssh ansible@web01

# Check the right key is being used
ansible web01 -m ping -vvv   # look for "identity file" lines

# Confirm the host is in DNS/resolves
dig web01

# Check port 22 is reachable
nc -zv web01 22

MODULE FAILURE

fatal: [web01]: FAILED! => {
    "changed": false,
    "msg": "Could not find the requested service chronyd: host",
    "rc": 1
}

The module ran but the action failed. Read the msg field — it usually tells you exactly what went wrong. Common causes:

- wrong service or package name for the distribution (chronyd vs chrony, httpd vs apache2)
- the package is not installed yet, so the service unit does not exist
- missing privileges: the task needs become: true
- a Python library the module depends on is missing from the target host

Add -v to see the full module output and narrow down the cause.
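When the msg points at permissions (for example, a service the connecting user is not allowed to manage), the usual fix is privilege escalation on the task. A minimal sketch:

```yaml
- name: Restart chrony (needs root)
  ansible.builtin.service:
    name: chronyd
    state: restarted
  become: true   # escalate via sudo, the default become method
```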

Undefined variable

fatal: [web01]: FAILED! => {
    "msg": "The task includes an option with an undefined variable.
    The error was: 'chrony_servers' is undefined"
}

The variable is referenced in a task or template but was never defined. Check:

# Is it in the role's defaults?
cat roles/chrony/defaults/main.yml | grep chrony_servers

# Is it in group_vars?
grep -r "chrony_servers" inventories/

# Inspect what Ansible sees for this host
ansible web01 -m debug -a "var=chrony_servers"

If the variable should be optional, use a default filter in the template: {{ chrony_servers | default([]) }}
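The same defensive pattern works in tasks, not just templates. A hedged sketch — the variable names mirror this document's examples, but the fallback values are assumptions:

```yaml
- name: Render chrony config, falling back to the public pool
  ansible.builtin.template:
    src: chrony.conf.j2
    dest: /etc/chrony.conf
  vars:
    ntp_servers: "{{ chrony_servers | default(['0.pool.ntp.org']) }}"

- name: Only run when the optional flag is actually set
  ansible.builtin.debug:
    msg: "TLS is enabled"
  when: enable_tls is defined and enable_tls
```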

Template errors

fatal: [web01]: FAILED! => {
    "msg": "AnsibleError: template error while templating string:
    expected token 'end of print statement', got '|'. ..."
}

Syntax error in a Jinja2 template. Common causes:

- an unclosed {{ ... }} or {% ... %} delimiter
- a trailing pipe with no filter after it, e.g. {{ chrony_servers | }}
- a missing {% endif %} or {% endfor %}
- literal braces in the target file that need wrapping in {% raw %} ... {% endraw %}

Test the template in isolation:

# Render the template locally (shows what it would produce)
# Note: pass list values to -e as JSON; plain key=value treats the value as a string
ansible localhost -m template \
  -a "src=roles/chrony/templates/chrony.conf.j2 dest=/tmp/chrony.conf.test" \
  -e '{"chrony_servers": ["0.pool.ntp.org"]}'

cat /tmp/chrony.conf.test

ansible-lint for catching mistakes early

# Lint a specific playbook
ansible-lint site.yml

# Lint everything
ansible-lint

Run this before every push. ansible-lint catches:

- deprecated modules and syntax
- tasks without a name
- command/shell used where a dedicated module exists
- command/shell tasks missing changed_when
- risky or unquoted octal file permissions

If ansible-lint flags something you disagree with, skip that rule for a single task with # noqa: rule-name on the offending line, or configure rules project-wide in .ansible-lint.
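The noqa comment goes on the line that triggers the rule. A sketch using the real rule name for command tasks without changed_when (the script path is hypothetical):

```yaml
- name: One-shot cache warm-up (result intentionally unchecked)
  ansible.builtin.command: /usr/local/bin/warm-cache  # noqa: no-changed-when
```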

Inventory introspection

Before running any playbook, you can inspect exactly what Ansible sees from your inventory — merged variables, group memberships, and all. This is the fastest way to debug "wrong variable" or "wrong host" issues.

# Show all hosts and groups as JSON (merged variable view)
ansible-inventory -i inventories/production/ --list

# Show host tree (groups → hosts)
ansible-inventory -i inventories/production/ --graph

# Show all merged variables for a specific host
ansible-inventory -i inventories/production/ --host web01

--host is the most useful flag. It shows the final merged variable values Ansible will use for that host — including group_vars, host_vars, and dynamic inventory. If the value here is wrong, the problem is in inventory, not in your playbook.

# Show variables inline in the graph view (--vars only takes effect with --graph)
ansible-inventory -i inventories/production/ --graph --vars
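When --host shows a value you did not expect, the usual culprit is precedence between group_vars and host_vars: host_vars win over group_vars. A hypothetical layout showing an override:

```yaml
# inventories/production/group_vars/webservers.yml
chrony_servers:
  - 0.pool.ntp.org

# inventories/production/host_vars/web01.yml
# host_vars override the group value, for web01 only
chrony_servers:
  - ntp1.internal.example.com
```

Here `ansible-inventory --host web01` would report the internal server, while every other webserver gets the public pool.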

Playbook introspection (dry-scope)

These flags let you see the scope of a playbook run without executing anything — essential for confirming tags, hosts, and task order before a maintenance window.

# Validate YAML and role syntax without running tasks
ansible-playbook site.yml --syntax-check

# List all tasks that would run (in order)
ansible-playbook site.yml --list-tasks

# List all tags defined across the playbook
ansible-playbook site.yml --list-tags

# List which hosts would be targeted
ansible-playbook site.yml -i inventories/production/ --list-hosts

# Combine: see what tasks would run with a specific tag on specific hosts
ansible-playbook site.yml -i inventories/production/ \
  --tags chrony \
  --limit webservers \
  --list-tasks

Run --list-tasks before every production change to confirm exactly which tasks are in scope for your tags and limits. Note that it does not evaluate when: conditions; conditional tasks are listed even if they would be skipped at run time.

assert module — pre-flight checks

Add assert tasks at the start of a role or playbook to fail loudly with a helpful message if required variables are missing or invalid. This catches problems before any change is made.

# Pre-flight checks at the start of a play
- name: Pre-flight — verify required variables
  ansible.builtin.assert:
    that:
      - db_host is defined
      - db_host | length > 0
      - db_port is defined
      - db_port | int > 0 and db_port | int < 65536
      - env in ['staging', 'production']
    fail_msg: >
      Required variable missing or invalid.
      db_host={{ db_host | default('UNDEFINED') }},
      db_port={{ db_port | default('UNDEFINED') }},
      env={{ env | default('UNDEFINED') }}
    success_msg: "Pre-flight checks passed"

Practical patterns:

# Check that a variable matches a pattern (e.g. version string)
- ansible.builtin.assert:
    that:
      - app_version is match('^[0-9]+\.[0-9]+\.[0-9]+$')
    fail_msg: "app_version must be semver (got: {{ app_version }})"

# Check OS is supported
- ansible.builtin.assert:
    that:
      - ansible_os_family in ['RedHat', 'Debian']
    fail_msg: "This role only supports RedHat and Debian families"

# Check a service is running before proceeding
# (ansible_facts.services requires gathering them first with ansible.builtin.service_facts)
- ansible.builtin.assert:
    that:
      - ansible_facts.services['postgresql.service'].state == 'running'
    fail_msg: "PostgreSQL must be running before applying app config"

Put assert tasks in a block with tags: always so they run even when you use --tags to run only part of a playbook.
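In practice that looks like the following sketch (variable and task names are illustrative):

```yaml
- name: Pre-flight checks
  tags: always          # run even when the play is invoked with --tags chrony
  block:
    - name: Verify required variables
      ansible.builtin.assert:
        that:
          - db_host is defined
        fail_msg: "db_host must be set before any change is made"
```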