- 1. What is DNS
- 2. DNS Resolution Process
- 3. Common DNS Issues
- 4. DNS Debugging Tools
- 5. DNS Caching Issues
- 6. resolv.conf and nsswitch.conf Configuration
- 7. CoreDNS Troubleshooting in Kubernetes
- 8. ndots Configuration and Search Domains
- 9. Practical Debugging Scenarios
- 10. Useful DNS Debugging One-Liners
- 11. Summary and Checklist
1. What is DNS
DNS (Domain Name System) is a distributed hierarchical naming system that translates human-readable domain names (e.g., example.com) into IP addresses (e.g., 93.184.216.34) that computers use for communication. Often called the phonebook of the internet, it handles the very first step of almost every network interaction.
1.1 Why DNS Matters
- Nearly every network operation (HTTP requests, API calls, email delivery) begins with a DNS lookup.
- A DNS outage can cascade into a full service outage.
- In microservice architectures, DNS is central to service discovery.
2. DNS Resolution Process
When a client needs to resolve a domain name, the following steps occur to obtain the corresponding IP address.
2.1 Overall Flow
1. Client → Check local sources first: /etc/hosts and the local DNS cache, if one exists
2. Local cache miss → Query Recursive Resolver (ISP or configured DNS server)
3. Recursive Resolver → Query Root Name Server (.)
4. Root NS → Responds with TLD Name Server (.com, .net, etc.)
5. TLD NS → Responds with Authoritative Name Server
6. Authoritative NS → Returns the final IP address
7. Recursive Resolver → Caches result and responds to client
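Steps 3 to 6 walk down the name one label at a time, from the root toward the full name. The bash function below is a minimal sketch that lists the zones an iterative resolver would visit, assuming every label is its own zone cut; in reality a parent zone may serve several levels itself, so dig +trace remains the ground truth. `delegation_path` is a hypothetical helper name.

```bash
# delegation_path NAME (hypothetical helper): print the zones visited for NAME, root first.
# Simplifying assumption: every label is a separate zone (real zone cuts vary).
delegation_path() {
  printf '%s\n' "${1%.}" | awk -F. '{
    print "."                        # step 3: the root zone
    zone = ""
    for (i = NF; i >= 1; i--) {      # rightmost label first: com., example.com., ...
      zone = $i "." zone
      print zone
    }
  }'
}
```

For example, `delegation_path www.example.com` prints `.`, `com.`, `example.com.`, and `www.example.com.`, one per line.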
2.2 Recursive vs Iterative Queries
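A client's stub resolver sends a recursive query (RD flag set) and expects its recursive resolver to return a complete answer. The recursive resolver then makes iterative queries: each server it asks either answers or refers it to name servers closer to the answer. dig +trace performs this iterative walk itself, starting from the root, so every referral is visible.
# Trace the iterative resolution path from the root servers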
$ dig +trace example.com
; <<>> DiG 9.18.18 <<>> +trace example.com
;; global options: +cmd
. 518400 IN NS a.root-servers.net.
. 518400 IN NS b.root-servers.net.
;; Received 239 bytes from 127.0.0.53#53(127.0.0.53) in 1 ms
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
;; Received 1170 bytes from 198.41.0.4#53(a.root-servers.net) in 23 ms
example.com. 172800 IN NS a.iana-servers.net.
example.com. 172800 IN NS b.iana-servers.net.
;; Received 356 bytes from 192.5.6.30#53(a.gtld-servers.net) in 15 ms
example.com. 86400 IN A 93.184.216.34
;; Received 56 bytes from 199.43.135.53#53(a.iana-servers.net) in 78 ms
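Because each hop in the trace reports which server answered and how long it took, a slow or failing delegation level is easy to spot. A small bash/awk sketch that summarizes those hops; it relies on the `;; Received ... in N ms` line format shown above (dig 9.18) and may need adjusting for other versions. `trace_hops` is a hypothetical helper name.

```bash
# trace_hops (hypothetical helper): read `dig +trace` output on stdin and print,
# for each hop, the server that answered and its response time, then the total.
# Assumes the ";; Received N bytes from IP#53(NAME) in T ms" line format.
trace_hops() {
  awk '/^;; Received [0-9]+ bytes from / {
    server = $6
    sub(/^[^(]*\(/, "", server)      # drop the "IP#53(" prefix
    sub(/\)$/, "", server)           # drop the closing ")"
    printf "%s %d ms\n", server, $8
    total += $8
  }
  END { printf "total %d ms\n", total }'
}
```

Usage: `dig +trace example.com | trace_hops`.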
2.3 DNS Record Types
| Record | Description | Example |
|---|---|---|
| A | IPv4 address mapping | example.com → 93.184.216.34 |
| AAAA | IPv6 address mapping | example.com → 2606:2800:220:1:... |
| CNAME | Canonical name (alias) | www.example.com → example.com |
| MX | Mail exchange server | example.com → mail.example.com |
| NS | Name server delegation | example.com → ns1.example.com |
| TXT | Text record (SPF, DKIM, etc.) | v=spf1 include:... |
| SRV | Service locator | _http._tcp.example.com |
| PTR | Reverse lookup (IP→domain) | 34.216.184.93.in-addr.arpa → example.com |
| SOA | Start of Authority | Zone management metadata |
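PTR records live under a reverse zone: the IPv4 octets are reversed and `in-addr.arpa` is appended. A minimal bash/awk sketch for IPv4 only (IPv6 uses `ip6.arpa` with reversed nibbles and is not handled); `ptr_name` is a hypothetical helper, since `dig -x` already does this conversion for you.

```bash
# ptr_name IPV4 (hypothetical helper): print the reverse-lookup name,
# e.g. 93.184.216.34 -> 34.216.184.93.in-addr.arpa. Fails on non-IPv4 input.
ptr_name() {
  printf '%s\n' "$1" | awk -F. '
    /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa"; ok = 1 }
    END { exit (ok ? 0 : 1) }'
}
```

`dig "$(ptr_name 93.184.216.34)" PTR +short` asks the same question as `dig -x 93.184.216.34 +short`.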
3. Common DNS Issues
3.1 NXDOMAIN (Non-Existent Domain)
This response is returned when the queried domain does not exist.
$ dig nonexistent.example.com
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 12345
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; QUESTION SECTION:
;nonexistent.example.com. IN A
;; AUTHORITY SECTION:
example.com. 900 IN SOA ns1.example.com. admin.example.com. 2024010101 3600 900 604800 86400
Root Cause Analysis:
- Typo in the domain name
- DNS record has not been created yet
- Domain registration has expired
- DNS propagation has not completed
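Scripts often need to tell these cases apart, and NXDOMAIN has a close cousin: a NOERROR response with an empty answer (often called NODATA) means the name exists but has no record of the requested type, for example an AAAA query for an IPv4-only host. A bash sketch that reads the status and ANSWER count from the header lines shown above; `dig_status` and `classify_answer` are hypothetical helper names, and the parsing assumes dig's default output format.

```bash
# dig_status (hypothetical helper): read full dig output on stdin and print the
# response code from the ->>HEADER<<- line (NOERROR, NXDOMAIN, SERVFAIL, ...).
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# classify_answer (hypothetical helper): like dig_status, but prints NODATA when
# the status is NOERROR and ANSWER is 0, i.e. the name exists without that type.
classify_answer() {
  local out status answers
  out=$(cat)
  status=$(printf '%s\n' "$out" | dig_status)
  answers=$(printf '%s\n' "$out" | sed -n 's/.*ANSWER: \([0-9]*\),.*/\1/p' | head -n 1)
  if [ "$status" = NOERROR ] && [ "$answers" = 0 ]; then
    echo NODATA
  else
    echo "$status"
  fi
}
```

Usage: `dig nonexistent.example.com | classify_answer`.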
3.2 DNS Timeout
$ dig @10.0.0.1 example.com +timeout=5
; <<>> DiG 9.18.18 <<>> @10.0.0.1 example.com +timeout=5
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
Root Cause Analysis:
- DNS server is down or unreachable
- Firewall blocking UDP/TCP port 53
- Network connectivity issues
- DNS server overloaded
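When scripting checks, dig's exit status separates a timeout from an answer: per the dig man page, 0 means a DNS response was received (NXDOMAIN included) and 9 means no reply from the server. A small lookup table as a bash sketch; `dig_exit_meaning` is a hypothetical helper.

```bash
# dig_exit_meaning CODE (hypothetical helper): translate dig's exit status.
# An NXDOMAIN answer still exits 0; a timeout or unreachable server exits 9.
dig_exit_meaning() {
  case "$1" in
    0)  echo "DNS response received (check status: in the header)" ;;
    1)  echo "usage error" ;;
    8)  echo "could not open batch file" ;;
    9)  echo "no reply from server (timeout or unreachable)" ;;
    10) echo "internal error" ;;
    *)  echo "unknown exit code $1" ;;
  esac
}
```

Usage: `dig @10.0.0.1 example.com +timeout=2 +tries=1 >/dev/null; dig_exit_meaning $?`.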
3.3 Stale or Wrong Records
# Unexpected IP address returned
$ dig api.myservice.com +short
192.168.1.100 # Expected: 10.0.1.50
# Compare results from multiple DNS servers
$ dig @8.8.8.8 api.myservice.com +short
10.0.1.50
$ dig @1.1.1.1 api.myservice.com +short
10.0.1.50
$ dig @192.168.1.1 api.myservice.com +short
192.168.1.100 # Local DNS cache returning stale value
4. DNS Debugging Tools
4.1 dig (Domain Information Groper)
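The most widely used DNS debugging tool. It prints the full response (status, flags, every section, TTLs) and gives fine-grained control over each query.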
# Basic lookup
$ dig example.com
# Query specific record types
$ dig example.com MX
$ dig example.com AAAA
$ dig example.com TXT
# Short output
$ dig example.com +short
93.184.216.34
# Query a specific DNS server
$ dig @8.8.8.8 example.com
# Reverse lookup
$ dig -x 93.184.216.34
# Query all record types (many servers now return only a minimal answer to ANY; see RFC 8482)
$ dig example.com ANY
# Check response time
$ dig example.com | grep "Query time"
;; Query time: 23 msec
# Use TCP instead of UDP
$ dig +tcp example.com
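# Request DNSSEC records (sets the DO bit so RRSIGs are returned)
# Validation itself is done by the resolver: look for the "ad" flag in the header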
$ dig +dnssec example.com
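Several of these options are most useful together with the header flags dig prints: `aa` for an authoritative answer, `ad` when a validating resolver has verified DNSSEC, `tc` for a truncated response. A bash sketch that tests for one flag; `has_dig_flag` is a hypothetical helper and assumes the `;; flags:` line format of dig's default output.

```bash
# has_dig_flag FLAG (hypothetical helper): read full dig output on stdin and
# succeed if the ";; flags:" header line contains FLAG (e.g. aa, ad, tc, ra).
has_dig_flag() {
  local want="$1" flags
  # Keep only the text between ";; flags:" and the first ";" on that line.
  flags=$(sed -n 's/^;; flags: \([^;]*\);.*/\1/p' | head -n 1)
  case " $flags " in
    *" $want "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `dig +dnssec example.com | has_dig_flag ad && echo validated`; the check only means something when the queried resolver performs DNSSEC validation.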
4.2 nslookup
Supports both interactive and non-interactive modes.
# Basic lookup
$ nslookup example.com
Server: 127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
Name: example.com
Address: 93.184.216.34
# Use a specific DNS server
$ nslookup example.com 8.8.8.8
# Query specific record type
$ nslookup -type=MX example.com
# Interactive mode
$ nslookup
> set type=NS
> example.com
Server: 127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
example.com nameserver = a.iana-servers.net.
example.com nameserver = b.iana-servers.net.
> exit
4.3 host
A lightweight tool providing concise output.
# Basic lookup
$ host example.com
example.com has address 93.184.216.34
example.com has IPv6 address 2606:2800:220:1:248:1893:25c8:1946
example.com mail is handled by 0 .
# Reverse lookup
$ host 93.184.216.34
34.216.184.93.in-addr.arpa domain name pointer example.com.
# Query specific record type
$ host -t NS example.com
example.com name server a.iana-servers.net.
example.com name server b.iana-servers.net.
# Verbose output
$ host -v example.com
4.4 drill
A tool with enhanced DNSSEC support (from the ldns package).
# Basic lookup
$ drill example.com
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 54321
;; QUESTION SECTION:
;; example.com. IN A
;; ANSWER SECTION:
example.com. 86400 IN A 93.184.216.34
# DNSSEC trace
$ drill -DT example.com
# Query a specific server
$ drill @8.8.8.8 example.com
5. DNS Caching Issues
5.1 TTL (Time To Live)
# Check TTL value
$ dig example.com
;; ANSWER SECTION:
example.com. 86400 IN A 93.184.216.34
# ^^^^^ TTL: 86400 seconds = 24 hours
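Resolvers may keep serving a cached record until its TTL expires, so a change can take up to the old TTL to reach every client. Before a DNS migration, lower the TTL at least one old-TTL period in advance.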
# TTL strategy example
# 1. 24 hours before migration: Lower TTL to 300 seconds (5 minutes)
# 2. Execute migration: Change IP address
# 3. After confirming propagation: Restore TTL to original value
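The arithmetic behind this plan: a resolver that cached the record just before the TTL was lowered may keep it for the full old TTL, so the lower TTL must be published at least one old-TTL period before the change. A bash sketch of both calculations; `ttl_human` and `lower_ttl_by` are hypothetical helper names, and times are Unix epoch seconds.

```bash
# ttl_human SECONDS (hypothetical helper): format a TTL, e.g. 86400 -> "24h 00m 00s"
ttl_human() {
  printf '%dh %02dm %02ds\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# lower_ttl_by MIGRATION_EPOCH OLD_TTL (hypothetical helper): latest epoch time at
# which the lowered TTL must already be published, so that every cache still
# holding the record under the old TTL has expired by the migration.
lower_ttl_by() {
  echo $(( $1 - $2 ))
}
```

The 86400-second TTL above formats as `24h 00m 00s`, which is why step 1 lowers the TTL a full day ahead.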
5.2 Negative Caching
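NXDOMAIN responses are also cached (negative caching). Per RFC 2308, the negative cache TTL is the smaller of the SOA record's own TTL and its MINIMUM field (the last field). In the NXDOMAIN example in 3.1, the SOA has a TTL of 900, so the NXDOMAIN may be cached for up to 15 minutes even after the record is created.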
$ dig example.com SOA +multiline
;; ANSWER SECTION:
example.com. 86400 IN SOA ns1.example.com. admin.example.com. (
2024010101 ; Serial
3600 ; Refresh
900 ; Retry
604800 ; Expire
86400 ) ; Minimum TTL (negative cache TTL)
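Per RFC 2308, a resolver may cache a negative answer for the smaller of the SOA record's own TTL and its MINIMUM field. A bash/awk sketch of that calculation for one single-line SOA record (dig's default or `+noall +answer` output, not `+multiline`); `negative_ttl` is a hypothetical helper name.

```bash
# negative_ttl (hypothetical helper): read a single-line SOA record on stdin and
# print how long an NXDOMAIN may be cached: min(record TTL, SOA MINIMUM).
negative_ttl() {
  awk '$4 == "SOA" {
    ttl = $2; minimum = $NF          # field 2 = record TTL, last field = MINIMUM
    print (ttl < minimum ? ttl : minimum)
    exit
  }'
}
```

Applied to the SOA in the NXDOMAIN response from 3.1 (TTL 900, MINIMUM 86400), it prints 900. Usage: `dig example.com SOA +noall +answer | negative_ttl`.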
5.3 Managing Local DNS Cache
# Linux: Check systemd-resolved cache statistics
$ resolvectl statistics
Current Cache Size: 152
Cache Hits: 1234
Cache Misses: 567
# Linux: Flush systemd-resolved cache
$ sudo resolvectl flush-caches
# macOS: Flush DNS cache
$ sudo dscacheutil -flushcache && sudo killall -HUP mDNSResponder
# Windows: Flush DNS cache
> ipconfig /flushdns
6. resolv.conf and nsswitch.conf Configuration
6.1 /etc/resolv.conf
$ cat /etc/resolv.conf
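# DNS server configuration (maximum 3)
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 1.1.1.1
# Default search domains
search mycompany.com prod.mycompany.com
# Options (comments are only valid on their own lines, starting with # or ;)
# timeout:2  Wait 2 seconds for a reply before retrying
# attempts:3 Send each query up to 3 times before giving up
# ndots:5    Threshold for FQDN determination (details below)
# rotate     Round-robin across nameservers instead of always starting with the first
# edns0      Enable EDNS0
options timeout:2
options attempts:3
options ndots:5
options rotate
options edns0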
Key Configuration Options:
- nameserver: DNS servers to use (tried in order, maximum 3)
- search: Domains automatically appended to short hostnames
- domain: Single-domain form of search; the two are mutually exclusive, and whichever appears last wins
- options ndots:n: Names with fewer than n dots try the search domains before the name as-is
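A quick way to see what the resolver will actually use is to summarize the file and flag settings that are silently ignored. A bash/awk sketch, assuming glibc semantics as documented in resolv.conf(5): at most 3 nameserver lines are used, ndots defaults to 1, and when search and domain both appear the last one wins. `resolv_summary` is a hypothetical helper.

```bash
# resolv_summary [FILE] (hypothetical helper): summarize a resolv.conf,
# defaulting to /etc/resolv.conf. Assumes glibc semantics.
resolv_summary() {
  awk '
    /^[#;]/ { next }                                  # comment lines
    $1 == "nameserver" { n++ }
    $1 == "search" || $1 == "domain" { $1 = ""; search = substr($0, 2) }
    $1 == "options" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ndots:/) ndots = substr($i, 7)     # text after "ndots:"
    }
    END {
      printf "nameservers: %d%s\n", n, (n > 3 ? " (only the first 3 are used)" : "")
      printf "search: %s\n", (search == "" ? "(none)" : search)
      printf "ndots: %s\n", (ndots == "" ? "1 (default)" : ndots)
    }
  ' "${1:-/etc/resolv.conf}"
}
```

Inside a Pod, `resolv_summary` should report the kube-dns nameserver, the three cluster search domains, and ndots 5.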
6.2 /etc/nsswitch.conf
Controls the order of name resolution sources.
$ grep hosts /etc/nsswitch.conf
hosts: files dns myhostname
# files = Check /etc/hosts first
# dns = Query DNS servers
# myhostname = Resolve local hostname (systemd)
# /etc/hosts file example
$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 myserver
10.0.1.50 api.internal.mycompany.com api-internal
192.168.1.100 db-master.mycompany.com
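This order is why dig can disagree with an application: dig, host, and nslookup query DNS servers directly and skip /etc/hosts and nsswitch.conf, while most programs resolve through NSS, so `getent hosts NAME` (glibc) is the closer match to what they see. A bash/awk sketch of just the `files` step; `hosts_lookup` is a hypothetical helper, returns only the first matching line, and does not implement the other NSS sources.

```bash
# hosts_lookup NAME [FILE] (hypothetical helper): print the address for NAME
# from a hosts file (default /etc/hosts), matching the canonical name or any
# alias case-insensitively. Only the first matching line is returned.
hosts_lookup() {
  awk -v name="$1" '
    { sub(/#.*/, "") }                       # strip comments
    {
      for (i = 2; i <= NF; i++)
        if (tolower($i) == tolower(name)) { print $1; exit }
    }
  ' "${2:-/etc/hosts}"
}
```

Comparing `hosts_lookup db-master.mycompany.com` with `dig +short db-master.mycompany.com` shows whether /etc/hosts is overriding DNS for applications, since `files` comes before `dns` in the `hosts:` line above.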
6.3 Checking systemd-resolved
# View current DNS configuration
$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Link 2 (eth0)
Current Scopes: DNS
Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 8.8.8.8
DNS Servers: 8.8.8.8 8.8.4.4
# Test domain resolution
$ resolvectl query example.com
example.com: 93.184.216.34 -- link: eth0
2606:2800:220:1:248:1893:25c8:1946 -- link: eth0
7. CoreDNS Troubleshooting in Kubernetes
7.1 CoreDNS Architecture
Within a Kubernetes cluster, CoreDNS handles service discovery. Pods with the default dnsPolicy (ClusterFirst) send all their DNS queries to CoreDNS through the kube-dns Service.
# Check CoreDNS Pod status
$ kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-5d78c9869d-abc12 1/1 Running 0 7d
coredns-5d78c9869d-def34 1/1 Running 0 7d
# Check CoreDNS service
$ kubectl get svc -n kube-system kube-dns
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 30d
7.2 Inspecting the CoreDNS Corefile
$ kubectl get configmap coredns -n kube-system -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
7.3 Checking CoreDNS Logs
# View CoreDNS logs (per-query lines like these appear only when the log plugin is enabled)
$ kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
[INFO] 10.244.0.15:45678 - 12345 "A IN my-service.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd 106 0.000234s
[INFO] 10.244.0.15:45679 - 12346 "A IN external-api.com. udp 34 false 512" NOERROR qr,rd,ra 62 0.023456s
# Enable log plugin (add log to Corefile)
# .:53 {
# log
# errors
# ...
# }
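With the log plugin enabled, each query line ends with the response code, flags, response size, and duration, which makes slow upstream lookups easy to filter. A bash/awk sketch that prints queries slower than a threshold; it assumes the default log format shown above (a custom log format would break the field positions), and `coredns_slow` is a hypothetical helper.

```bash
# coredns_slow THRESHOLD_SECONDS (hypothetical helper): read CoreDNS log-plugin
# lines on stdin and print "name rcode duration" for queries over the threshold.
# Assumes: ... "TYPE IN NAME proto size do bufsize" RCODE FLAGS SIZE DURATIONs
coredns_slow() {
  awk -v max="$1" '
    /^\[INFO\] .* "[A-Z0-9]+ IN / {
      d = $NF; sub(/s$/, "", d)             # "0.023456s" -> "0.023456"
      if (d + 0 > max + 0) print $7, $12, d
    }'
}
```

Usage: `kubectl logs -n kube-system -l k8s-app=kube-dns --tail=1000 | coredns_slow 0.1`.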
7.4 Debugging DNS from Inside a Pod
# Create a debug Pod
$ kubectl run dns-debug --image=nicolaka/netshoot --rm -it --restart=Never -- bash
# Test DNS resolution inside the Pod
bash-5.1# nslookup kubernetes.default.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
# Detailed check with dig
bash-5.1# dig kubernetes.default.svc.cluster.local
;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 30 IN A 10.96.0.1
;; Query time: 1 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
# Test external domain resolution
bash-5.1# dig example.com +short
93.184.216.34
# Check the Pod's DNS configuration
bash-5.1# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
8. ndots Configuration and Search Domains
8.1 How ndots Works
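The ndots option sets how many dots a name must contain before the resolver tries it as-is first. If the name has fewer than ndots dots, each search domain is appended and tried in order, and the name as-is is tried last; otherwise the name as-is is tried first, then the search domains.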
# Kubernetes default: ndots:5
# search default.svc.cluster.local svc.cluster.local cluster.local
# Querying "api.example.com" (2 dots < ndots 5)
# Actual query order:
1. api.example.com.default.svc.cluster.local → NXDOMAIN
2. api.example.com.svc.cluster.local → NXDOMAIN
3. api.example.com.cluster.local → NXDOMAIN
4. api.example.com. → Success!
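This means each lookup of an external name sends 3 unnecessary queries (each answered with NXDOMAIN) before the real one. That count is per record type, and since many resolvers request A and AAAA together, it is often 6.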
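The ordering rule can be written out directly. A bash sketch, assuming glibc's documented behavior from resolv.conf(5); other resolver libraries, such as musl in Alpine-based images, can behave differently. `search_candidates` is a hypothetical helper; it lists the candidates in order, while a real resolver stops at the first one that returns an answer.

```bash
# search_candidates NAME NDOTS DOMAIN... (hypothetical helper): print, in order,
# the names a glibc-style stub resolver tries until one succeeds.
search_candidates() {
  local name="$1" ndots="$2" dom dots
  shift 2
  case $name in
    *.) echo "$name"; return ;;               # trailing dot: absolute, no search
  esac
  dots=$(printf '%s\n' "$name" | awk -F. '{ print NF - 1 }')
  if [ "$dots" -ge "$ndots" ]; then
    echo "$name."                             # enough dots: as-is first
    for dom in "$@"; do echo "$name.$dom."; done
  else
    for dom in "$@"; do echo "$name.$dom."; done
    echo "$name."                             # too few dots: as-is last
  fi
}
```

`search_candidates api.example.com 5 default.svc.cluster.local svc.cluster.local cluster.local` reproduces the four-step order above.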
8.2 Optimizing ndots
# Adjust ndots via dnsConfig in Pod spec
apiVersion: v1
kind: Pod
metadata:
  name: optimized-pod
spec:
  containers:
    - name: app
      image: myapp:latest
  dnsConfig:
    options:
      - name: ndots
        value: '2'
# Use FQDN to avoid unnecessary queries (add trailing dot)
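# Note: dig ignores the search list unless +search is given
# Inefficient:
$ dig +search api.example.com # Tries the search domains first because of ndots
# Efficient:
$ dig +search api.example.com. # Trailing dot marks an FQDN; no search domains are tried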
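The cost of a given ndots value can be estimated per lookup. A bash sketch, assuming glibc ordering and that every search candidate returns NXDOMAIN, which is typical for external names; `extra_queries` is a hypothetical helper. With the three Kubernetes search domains, api.example.com costs 3 extra queries per record type at ndots:5 and none at ndots:2, while a short in-cluster name such as my-service (0 dots) still goes through the search list, where its first candidate resolves.

```bash
# extra_queries NAME NDOTS NSEARCH (hypothetical helper): number of search-list
# queries (per record type) sent before NAME itself, assuming each returns NXDOMAIN.
extra_queries() {
  local name="$1" ndots="$2" nsearch="$3" dots
  case $name in
    *.) echo 0; return ;;                 # FQDN with trailing dot: no search
  esac
  dots=$(printf '%s\n' "$name" | awk -F. '{ print NF - 1 }')
  if [ "$dots" -ge "$ndots" ]; then
    echo 0                                # enough dots: the name is tried first
  else
    echo "$nsearch"                       # every search domain is tried first
  fi
}
```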
8.3 dnsPolicy Options
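# ClusterFirst (default): Use CoreDNS first
apiVersion: v1
kind: Pod
spec:
  dnsPolicy: ClusterFirst
# Default: Use the node's DNS configuration as-is
# (despite its name, "Default" is not the default policy; ClusterFirst is)
spec:
  dnsPolicy: Default
# None: Configure DNS manually via dnsConfig
# (public resolvers such as 8.8.8.8 cannot answer cluster.local names, so Service
# names will not resolve here; list the cluster DNS IP instead if the Pod needs them)
spec:
  dnsPolicy: None
  dnsConfig:
    nameservers:
      - 8.8.8.8
      - 1.1.1.1
    searches:
      - my-namespace.svc.cluster.local
      - svc.cluster.local
    options:
      - name: ndots
        value: "2"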
9. Practical Debugging Scenarios
9.1 Scenario 1: Inter-Service Communication Failure
# Symptom: Pod A cannot connect to Pod B's service
$ kubectl exec pod-a -- curl http://my-service:8080
curl: (6) Could not resolve host: my-service
# Step 1: Check the Pod's DNS configuration
$ kubectl exec pod-a -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
# Step 2: Query CoreDNS directly
$ kubectl exec pod-a -- dig @10.96.0.10 my-service.default.svc.cluster.local
;; status: NXDOMAIN
# Step 3: Verify the service exists
$ kubectl get svc my-service -n default
Error from server (NotFound): services "my-service" not found
# Step 4: Check the correct namespace
$ kubectl get svc --all-namespaces | grep my-service
production my-service ClusterIP 10.96.45.123 <none> 8080/TCP 5d
# Solution: Use FQDN with the correct namespace
$ kubectl exec pod-a -- curl http://my-service.production.svc.cluster.local:8080
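Service names follow a fixed pattern, SERVICE.NAMESPACE.svc.CLUSTER_DOMAIN, so the cross-namespace name can be built mechanically. A bash sketch; `svc_fqdn` is a hypothetical helper, and cluster.local is only the common default cluster domain. Because the Pod's search list includes svc.cluster.local, the shorter my-service.production also resolves, at the cost of the search-list queries described in section 8.

```bash
# svc_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN] (hypothetical helper): build a
# Service's fully qualified name; the cluster domain defaults to cluster.local.
svc_fqdn() {
  local svc="$1" ns="$2" domain="${3:-cluster.local}"
  echo "${svc}.${ns}.svc.${domain}"
}
```

Usage: `kubectl exec pod-a -- curl "http://$(svc_fqdn my-service production):8080"`.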
9.2 Scenario 2: External Domain Resolution Failure
# Symptom: External API calls from Pod are failing
$ kubectl exec my-pod -- curl https://api.external.com
curl: (6) Could not resolve host: api.external.com
# Step 1: Verify CoreDNS is working for internal queries
$ kubectl exec my-pod -- dig @10.96.0.10 kubernetes.default.svc.cluster.local +short
10.96.0.1 # Internal DNS is working fine
# Step 2: Check CoreDNS upstream forwarding
$ kubectl exec my-pod -- dig @10.96.0.10 api.external.com
;; status: SERVFAIL
# Step 3: Check CoreDNS logs
$ kubectl logs -n kube-system -l k8s-app=kube-dns | grep "api.external.com"
[ERROR] plugin/forward: no nameservers found
# Step 4: Inspect the CoreDNS forward configuration
$ kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}'
# Look for: forward . /etc/resolv.conf
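# Step 5: Check the resolv.conf that CoreDNS forwards to
# CoreDNS normally runs with dnsPolicy: Default, so it inherits its node's resolv.conf.
# The upstream CoreDNS image usually has no cat or shell, so find the node and read it there:
$ kubectl get pod -n kube-system coredns-5d78c9869d-abc12 -o jsonpath='{.spec.nodeName}'
$ cat /etc/resolv.conf   # on that node (or the file set by kubelet's resolvConf setting)
nameserver 169.254.169.253 # Cloud DNS may be unreachable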
# Solution: Explicitly specify forward targets in the Corefile
# forward . 8.8.8.8 8.8.4.4
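When forwarding fails, the first question is where CoreDNS sends its queries. A bash/awk sketch that prints the upstreams of the catch-all `forward .` directive from Corefile text; `forward_targets` is a hypothetical helper, and it ignores zone-specific `forward` blocks.

```bash
# forward_targets (hypothetical helper): read a Corefile on stdin and print the
# upstream targets of each "forward ." directive, one per line.
forward_targets() {
  awk '$1 == "forward" && $2 == "." {
    for (i = 3; i <= NF && $i != "{"; i++) print $i    # stop at an options block
  }'
}
```

Usage: `kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' | forward_targets`. An output of `/etc/resolv.conf` means the upstreams come from the resolv.conf CoreDNS inherits.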
9.3 Scenario 3: Intermittent DNS Timeouts
# Symptom: DNS lookups intermittently take more than 5 seconds
$ time dig @10.96.0.10 example.com
;; Query time: 5003 msec # 5-second timeout before retry
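# Cause: Linux conntrack race condition (DNAT + UDP)
# When A and AAAA queries leave the same socket in parallel, both packets can race to
# create the same conntrack entry; the loser is dropped and the client waits for its timeout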
# Verify: Check conntrack table statistics
$ sudo conntrack -S
cpu=0 found=0 invalid=1523 insert=0 insert_failed=156 drop=156
# ^^^^^^^^^^^^^ insert failures indicate the issue
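# Solution 1: Have clients use TCP for DNS
# e.g. the glibc "use-vc" resolver option via dnsConfig; CoreDNS's force_tcp (forward plugin)
# only affects CoreDNS-to-upstream queries, not Pod-to-CoreDNS traffic
# Solution 2: Deploy NodeLocal DNSCache
# The manifest contains __PILLAR__ placeholders that must be filled in before applying
# (values below assume iptables-mode kube-proxy, cluster.local, and kube-dns at 10.96.0.10)
$ curl -sLO https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
$ sed -i "s/__PILLAR__LOCAL__DNS__/169.254.20.10/g; s/__PILLAR__DNS__DOMAIN__/cluster.local/g; s/__PILLAR__DNS__SERVER__/10.96.0.10/g" nodelocaldns.yaml
$ kubectl apply -f nodelocaldns.yaml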
# Solution 3: Use the single-request-reopen option in the Pod
# (a glibc resolver option; musl-based images such as Alpine ignore it)
apiVersion: v1
kind: Pod
spec:
  dnsConfig:
    options:
      - name: single-request-reopen
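Because `conntrack -S` prints one line per CPU and its counters are cumulative, a single reading says little; what matters is whether insert_failed grows while timeouts occur. A bash/awk sketch that totals the counter across CPUs so two readings can be compared; `conntrack_insert_failed` is a hypothetical helper.

```bash
# conntrack_insert_failed (hypothetical helper): sum the insert_failed counters
# across all per-CPU lines of `conntrack -S` output read on stdin.
conntrack_insert_failed() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^insert_failed=/) { split($i, kv, "="); total += kv[2] }
  } END { print total + 0 }'
}
```

Usage: `a=$(sudo conntrack -S | conntrack_insert_failed); sleep 60; b=$(sudo conntrack -S | conntrack_insert_failed); echo $((b - a))`.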
9.4 Scenario 4: Verifying DNS Propagation
# Check propagation across multiple DNS servers after a change
$ for dns in 8.8.8.8 1.1.1.1 9.9.9.9 208.67.222.222; do
echo "=== $dns ==="
dig @$dns api.myservice.com +short +timeout=3
done
=== 8.8.8.8 ===
10.0.1.50
=== 1.1.1.1 ===
10.0.1.50
=== 9.9.9.9 ===
192.168.1.100 # Still returning old record
=== 208.67.222.222 ===
192.168.1.100 # Still returning old record
# Check TTL to estimate when the cache will expire
$ dig @9.9.9.9 api.myservice.com | grep -A1 "ANSWER SECTION"
;; ANSWER SECTION:
api.myservice.com. 1423 IN A 192.168.1.100
# ^^^^ Remaining TTL: cache expires in approximately 24 minutes
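With more resolvers, reading the loop output by eye gets error-prone. A bash/awk sketch that reads the `=== server ===` / answer output of the propagation loop and marks each resolver as ok or stale against the expected address; `flag_stale` is a hypothetical helper. It assumes one A record per answer, and a resolver that timed out produces no line at all.

```bash
# flag_stale EXPECTED (hypothetical helper): read "=== server ===" / answer lines
# on stdin and report each resolver as "ok" or "STALE (<answer>)".
flag_stale() {
  awk -v want="$1" '
    /^=== .* ===$/ { server = $2; next }
    NF { print server, ($1 == want ? "ok" : "STALE (" $1 ")") }
  '
}
```

Usage: append `| flag_stale 10.0.1.50` after the `done` of the loop above.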
10. Useful DNS Debugging One-Liners
# 1. Benchmark DNS response times
$ for i in $(seq 1 10); do dig example.com | grep "Query time"; done
# 2. Bulk lookup for multiple domains
$ for domain in api.example.com web.example.com db.example.com; do
echo "$domain: $(dig +short $domain)"
done
# 3. Monitor DNS record changes
$ watch -n 5 "dig +short api.myservice.com @8.8.8.8"
# 4. Bulk reverse DNS lookups
$ for ip in 10.0.1.{1..10}; do
result=$(dig +short -x $ip)
echo "$ip -> ${result:-NO PTR}"
done
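# 5. Check whether the answer is DNSSEC-signed (an RRSIG is returned)
# +short hides the header; validation status is the "ad" flag in full dig output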
$ dig +dnssec +short example.com
93.184.216.34
A 13 2 86400 20240315000000 20240301000000 12345 example.com. <base64_signature>
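# 6. Verify DNS resolution for all Kubernetes services
# Needs a running Pod named dns-debug with dig; the Pod from 7.4 is removed on exit (--rm),
# so start one that stays up, e.g.: kubectl run dns-debug --image=nicolaka/netshoot --restart=Never -- sleep 3600
$ kubectl get svc --all-namespaces -o jsonpath='{range .items[*]}{.metadata.name}.{.metadata.namespace}.svc.cluster.local{"\n"}{end}' | \
while read -r fqdn; do
result=$(kubectl exec dns-debug -- dig +short "$fqdn" 2>/dev/null </dev/null)
echo "$fqdn -> ${result:-FAILED}"
done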
11. Summary and Checklist
When a DNS issue occurs, diagnose in the following order:
- Check /etc/resolv.conf settings (nameserver, search, ndots)
- Test basic DNS queries with dig or nslookup
- Query a specific DNS server (dig @8.8.8.8)
- Trace the full resolution path with +trace
- Check TTL to determine if caching is the issue
- In Kubernetes environments, check CoreDNS Pod status and logs
- Review the performance impact of ndots and search domain settings
- For intermittent conntrack-related issues, consider deploying NodeLocal DNSCache
DNS is very often the root cause of network problems. Building systematic debugging habits can significantly reduce incident response times.