Diagnosing Kubernetes DNS Resolution Failures: CoreDNS Upstream Forwarding Broken by resolv.conf Drift

If some pods in your Kubernetes cluster can resolve external DNS names and others can't — especially when both pods run the same image with the same environment variables — the cause is almost always a node-level difference, not a pod configuration issue. Don't compare pod configs. Compare node configs.

The specific failure this guide addresses: CoreDNS upstream forwarding broken on a subset of worker nodes because /etc/resolv.conf drifted after a node pool upgrade. Cluster-internal names resolve fine on all nodes; external names fail only on the affected nodes.

Understanding Kubernetes DNS Architecture

Before running any tests, understanding the forwarding path helps you know exactly where to look.

Kubernetes DNS architecture diagram showing the 5-step resolution flow: Pod sends UDP/53 query to the cluster DNS service IP, CoreDNS receives and checks its cache, CoreDNS forwards to upstream resolvers via the node's /etc/resolv.conf, upstream resolver returns the answer, CoreDNS caches and returns to the pod — Kubernetes DNS resolution flow. Step 3 (CoreDNS → upstream resolver) is where this class of failure occurs — and why it is asymmetric across nodes.

The 5-step DNS resolution path in AKS:

Pod sends query to the cluster DNS service IP (usually 10.0.0.10 — the kube-dns service ClusterIP). This IP is injected into every pod's /etc/resolv.conf by the kubelet at pod startup.
CoreDNS receives the query. For cluster-internal names (.cluster.local suffix), CoreDNS answers directly from its in-cluster knowledge. For external names (api.stripe.com, storage.googleapis.com), CoreDNS forwards to upstream resolvers.
CoreDNS forwards to upstream. The upstream resolver addresses come from the node's /etc/resolv.conf — not from a ConfigMap, not from a Kubernetes object. The kubelet's --resolv-conf flag reads the node's resolver config and propagates it into CoreDNS's Corefile at startup.
Upstream resolver answers. The upstream (Azure DNS at 168.63.129.16 in AKS, or a custom resolver if configured) resolves the external name and returns the answer.
CoreDNS caches and responds to the pod. The pod's DNS client receives the answer and proceeds.

The critical implication: if the node's /etc/resolv.conf is wrong, CoreDNS on that node's pod forwards to the wrong upstream, and all pods scheduled on that node fail to resolve external names. Pods on correctly-configured nodes are fine. This produces exactly the asymmetric failure pattern described above.

Symptoms

Application error pattern:

Error: getaddrinfo ENOTFOUND api.stripe.com
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:66:26)

Pod distribution check:

kubectl get pods -n <namespace> -o wide

NAME                          READY   STATUS    NODE
payment-api-7d9f84b6c-k2x9p   1/1     Running   aks-nodepool1-45821003-vmss000045
payment-api-7d9f84b6c-m7r3t   1/1     Running   aks-nodepool1-45821003-vmss000032

If one pod is failing and the other is healthy — same image, same namespace, same env, different node — the node is the variable. Proceed with the tests below.

Quick replication test:

# Test from the failing pod
kubectl exec -it <failing-pod> -n <namespace> -- nslookup api.stripe.com
# ** server can't find api.stripe.com: SERVFAIL

# Same test from the healthy pod
kubectl exec -it <healthy-pod> -n <namespace> -- nslookup api.stripe.com
# Address: 54.166.130.11 ✓

Both pods query the same kube-dns service IP (10.0.0.10). Both send to CoreDNS. Different results. The problem is inside CoreDNS's forwarding on the affected node.

Phase 1 — Pod-Level DNS Testing

Deploy busybox test pods pinned to specific nodes to isolate the node variable:

# Pin a test pod to the failing node
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dns-test-failing-node
  namespace: <namespace>
spec:
  nodeName: <failing-node-name>
  containers:
  - name: busybox
    image: busybox:1.36
    command: ["sleep", "3600"]
  restartPolicy: Never
EOF

# Pin a test pod to a healthy node
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dns-test-healthy-node
  namespace: <namespace>
spec:
  nodeName: <healthy-node-name>
  containers:
  - name: busybox
    image: busybox:1.36
    command: ["sleep", "3600"]
  restartPolicy: Never
EOF

Test 1 — External name resolution:

kubectl exec -n <namespace> dns-test-failing-node -- nslookup api.stripe.com
# Expected on failing node: ** server can't find api.stripe.com: SERVFAIL

kubectl exec -n <namespace> dns-test-healthy-node -- nslookup api.stripe.com
# Expected on healthy node: Address: 54.x.x.x ✓

Test 2 — Cluster-internal resolution (should work on both — not hitting upstream):

kubectl exec -n <namespace> dns-test-failing-node -- \
  nslookup kubernetes.default.svc.cluster.local
# Expected: Address: 10.0.0.1 ✓ (works even on failing node)

kubectl exec -n <namespace> dns-test-healthy-node -- \
  nslookup kubernetes.default.svc.cluster.local
# Expected: Address: 10.0.0.1 ✓

If cluster-internal DNS works on both nodes but external DNS fails only on one, the failure is in CoreDNS's upstream forwarding path — not in CoreDNS itself.

Test 3 — Direct query to Azure DNS bypassing CoreDNS:

# Query Azure DNS directly from inside the failing pod
kubectl exec -n <namespace> dns-test-failing-node -- \
  nslookup api.stripe.com 168.63.129.16
# Expected: Address: 54.x.x.x ✓

If this succeeds, Azure DNS is healthy. The problem is in the path between CoreDNS and its configured upstream, not in the upstream itself.

Test 4 — Check what resolvers the pod sees:

kubectl exec -n <namespace> dns-test-failing-node -- cat /etc/resolv.conf
# Expected output (correct):
# search <namespace>.svc.cluster.local svc.cluster.local cluster.local
# nameserver 10.0.0.10
# options ndots:5

The pod's /etc/resolv.conf points to kube-dns at 10.0.0.10. This is set by the kubelet and is typically correct on all nodes. Pod-level /etc/resolv.conf is not the issue.

Phase 2 — CoreDNS Health Verification

List CoreDNS pods and which nodes they run on:

kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

If CoreDNS has a pod on the failing node, queries load-balanced to that pod will fail while queries hitting pods on healthy nodes will succeed. This explains intermittent-looking failures that are actually node-specific.

Check CoreDNS logs on the failing node's pod:

kubectl logs -n kube-system <coredns-pod-on-failing-node> --tail=50

Look for lines like:

[ERROR] plugin/errors: 2 api.stripe.com. A: read udp 10.244.x.x:xxxxx->192.168.0.1:53: i/o timeout

The upstream IP in the error (here 192.168.0.1) is what CoreDNS is trying to use. If it's not Azure DNS (168.63.129.16) or your configured custom resolver, the node's /etc/resolv.conf has drifted.

Check the Corefile on the failing CoreDNS pod:

kubectl exec -n kube-system <coredns-pod-on-failing-node> -- cat /etc/coredns/Corefile

The default AKS Corefile uses forward . /etc/resolv.conf. This means CoreDNS reads the node's /etc/resolv.conf at startup to determine upstream resolvers.

Check the CoreDNS pod's /etc/resolv.conf:

kubectl exec -n kube-system <coredns-pod-on-failing-node> -- cat /etc/resolv.conf

If this shows an unexpected nameserver (not 168.63.129.16 or your custom resolver), you have found the root cause. This file is mounted from the node's resolver config via the kubelet's --resolv-conf flag.

Phase 3 — Working vs Failing Node Comparison

Open a privileged shell on each node:

# Failing node
kubectl debug node/<failing-node-name> \
  -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0 \
  -- chroot /host bash

# On the failing node:
cat /etc/resolv.conf
# If this shows an unexpected nameserver, root cause confirmed

# Healthy node
kubectl debug node/<healthy-node-name> \
  -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0 \
  -- chroot /host bash

# On the healthy node:
cat /etc/resolv.conf
# Should show nameserver 168.63.129.16

Check where the kubelet reads its resolver config:

# On the failing node
cat /etc/default/kubelet | grep resolv
# --resolv-conf=/run/systemd/resolve/resolv.conf

cat /run/systemd/resolve/resolv.conf
# If this shows the wrong nameserver, systemd-resolved has drifted

Root Cause — Node Pool Upgrade and DNS Config Drift

The most common cause of this asymmetric pattern is a node pool upgrade. When AKS upgrades a node pool, it provisions new VMs from the current VMSS (VM Scale Set) configuration. New VMs inherit the VNet's DHCP-advertised DNS server list.

Check the VNet DNS configuration:

az network vnet show \
  --resource-group <aks-node-resource-group> \
  --name <aks-vnet-name> \
  --query "dhcpOptions.dnsServers"

If this returns a stale or decommissioned IP address, new nodes provisioned during the upgrade will use that IP as their primary nameserver. Older nodes retain whatever DNS configuration they received when first provisioned.

Why DHCP fallback doesn't save you: If the wrong nameserver is unreachable (returning timeouts rather than SERVFAIL), systemd-resolved waits for a timeout on each query rather than immediately falling back to the secondary nameserver. This manifests as slow DNS queries rather than immediate failures, which can make the failure harder to detect during the upgrade window.

The Fix

Step 1 — Remove stale DNS servers from the VNet:

az network vnet update \
  --resource-group <aks-node-resource-group> \
  --name <aks-vnet-name> \
  --dns-servers 168.63.129.16

This is the permanent fix. All future nodes provisioned from this VNet will receive Azure DNS only.

Step 2 — Repair affected nodes without reprovisioning:

# For each affected node, open a privileged shell
kubectl debug node/<affected-node-name> \
  -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0 \
  -- chroot /host bash

# Overwrite systemd-resolved upstream config
mkdir -p /etc/systemd/resolved.conf.d
cat <<EOF > /etc/systemd/resolved.conf.d/azure-dns.conf
[Resolve]
DNS=168.63.129.16
Domains=~.
EOF

# Restart systemd-resolved
systemctl restart systemd-resolved

# Verify
cat /run/systemd/resolve/resolv.conf
# Should show: nameserver 168.63.129.16

Step 3 — Restart CoreDNS pods on the fixed nodes:

The Corefile uses forward . /etc/resolv.conf with the reload plugin, but reload watches the Corefile for changes — it does not re-read the forwarding target file (/etc/resolv.conf). Restart the CoreDNS pod on the affected node:

# Delete the CoreDNS pod on the affected node — it will be rescheduled
kubectl delete pod -n kube-system <coredns-pod-on-affected-node>

After restart, the new CoreDNS pod reads the corrected /etc/resolv.conf and begins forwarding to the correct upstream.

Step 4 — Verify:

kubectl exec -n <namespace> dns-test-failing-node -- nslookup api.stripe.com
# Expected: Address: 54.x.x.x ✓

No pods need to be restarted — the fix is in the CoreDNS layer.

Identifying All Affected Nodes

Run a cluster-wide audit to find all nodes with incorrect DNS configuration:

# Check /run/systemd/resolve/resolv.conf on every node
for node in $(kubectl get nodes -o name | sed 's|node/||'); do
  echo -n "$node: "
  kubectl debug node/$node -it --quiet \
    --image=mcr.microsoft.com/cbl-mariner/busybox:2.0 \
    -- chroot /host cat /run/systemd/resolve/resolv.conf 2>/dev/null | \
    grep nameserver
done

Apply the fix in Step 2 to all nodes where the nameserver is not the expected value.

Preventive Monitoring

Prometheus Alert — CoreDNS Upstream Failure Rate

groups:
- name: coredns-dns-health
  rules:
  - alert: CoreDNSUpstreamForwardFailures
    expr: |
      rate(coredns_forward_request_failures_total[5m]) > 0.05
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "CoreDNS upstream forward failures on {{ $labels.to }}"
      description: |
        CoreDNS is failing to forward requests to upstream resolver {{ $labels.to }}.
        Failure rate: {{ $value | humanizePercentage }}.
        Check /etc/resolv.conf on nodes running CoreDNS pods.

Prometheus Alert — CoreDNS SERVFAIL Rate

  - alert: CoreDNSHighServfailRate
    expr: |
      rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m])
      /
      rate(coredns_dns_requests_total[5m]) > 0.01
    for: 3m
    labels:
      severity: critical
    annotations:
      summary: "CoreDNS SERVFAIL rate above 1%"
      description: |
        More than 1% of DNS responses are SERVFAIL.
        This typically indicates upstream forwarding failure.
        Immediate investigation required.

Synthetic DNS Health Check

A lightweight CronJob that runs every 5 minutes across all nodes to verify external resolution is working:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: dns-health-check
  namespace: kube-system
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          tolerations:
          - operator: Exists
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: dns-health-check
                topologyKey: kubernetes.io/hostname
          containers:
          - name: dns-check
            image: busybox:1.36
            command:
            - sh
            - -c
            - |
              if nslookup microsoft.com > /dev/null 2>&1; then
                echo "DNS OK on $(hostname -f)"
              else
                echo "DNS FAIL on $(hostname -f): $(nslookup microsoft.com 2>&1)"
                exit 1
              fi
          restartPolicy: Never

The podAntiAffinity with topologyKey: kubernetes.io/hostname ensures each CronJob execution tries to schedule one pod per node. If any node fails the DNS check, the job exits non-zero, triggering an alert via the kube_job_failed Prometheus metric.

Node-Level DNS Audit Script

Add this to your cluster health runbook and run it after every node pool upgrade:

#!/bin/bash
# audit-node-dns.sh — check DNS resolver config on all nodes
EXPECTED_DNS="168.63.129.16"
FAILED_NODES=()

for node in $(kubectl get nodes -o name | sed 's|node/||'); do
  ACTUAL=$(kubectl debug node/$node -it --quiet \
    --image=mcr.microsoft.com/cbl-mariner/busybox:2.0 \
    -- chroot /host sh -c "grep nameserver /run/systemd/resolve/resolv.conf | awk '{print \$2}' | head -1" 2>/dev/null)

  if [ "$ACTUAL" != "$EXPECTED_DNS" ]; then
    echo "FAIL: $node has nameserver $ACTUAL (expected $EXPECTED_DNS)"
    FAILED_NODES+=("$node")
  else
    echo "OK:   $node has nameserver $ACTUAL"
  fi
done

echo ""
echo "Failed nodes: ${#FAILED_NODES[@]}"

Troubleshooting Framework

If you see asymmetric DNS failures in Kubernetes (some pods resolve, others don't):

Step	Command	What you are checking
1. Confirm asymmetry	`kubectl get pods -o wide` + `nslookup` from each	Which nodes are affected
2. Test internal DNS	`nslookup kubernetes.default.svc.cluster.local`	Whether CoreDNS itself is up
3. Test upstream directly	`nslookup external.com <upstream-ip>`	Whether upstream resolver is reachable
4. Check CoreDNS logs	`kubectl logs -n kube-system <coredns-pod>`	Which upstream IP it is trying
5. Check CoreDNS resolv.conf	`kubectl exec <coredns-pod> -- cat /etc/resolv.conf`	What upstream IP CoreDNS is using
6. Check node resolv.conf	`kubectl debug node/<node> … cat /run/systemd/resolve/resolv.conf`	Source of the wrong upstream
7. Audit VNet DNS config	`az network vnet show --query dhcpOptions.dnsServers`	Where the wrong IP originated

Key Takeaways

1. Asymmetric DNS failures always point to a node-level difference.

If two pods with identical configs behave differently for DNS, the variable is the node. Don't spend time comparing pod configs — compare node configs.

2. CoreDNS "forward . /etc/resolv.conf" trusts the node completely.

The Corefile's forward directive with /etc/resolv.conf is a deliberate design choice — CoreDNS inherits whatever the node operator configured. The failure is when the node's resolver config is wrong, not when CoreDNS forwards to it.

3. Node pool upgrades provision new nodes from the current VMSS/VNet configuration.

Upgraded nodes are new VMs. They pick up whatever DNS server configuration the VNet currently advertises via DHCP. Any stale DNS server configuration that exists in the VNet-level settings at upgrade time will be inherited by new nodes.

4. DHCP fallback doesn't save you when the primary DNS times out.

If the wrong nameserver is unreachable rather than actively refusing connections, systemd-resolved times out on each query before considering fallback. This manifests as slow DNS rather than immediate failure.

5. Audit VNet DNS config before every node pool upgrade.

Run az network vnet show --query dhcpOptions.dnsServers before upgrading. Remove any decommissioned DNS server IPs. This takes 30 seconds and prevents this entire class of failure.

6. Add CoreDNS upstream failure metrics to your cluster health dashboard.

coredns_forward_request_failures_total is already being scraped by any standard Prometheus installation. If it is not on your dashboard, add it. A rate above 1% for external queries always warrants an alert.