DNS Deep Dive — Resolution, Caching, DNSSEC, DoH/DoT, Anycast, and 1.1.1.1 Internals (2025)
Author: Youngju Kim (@fjvbn20031)
TL;DR
- DNS is a hierarchical, distributed key-value store. It maps a domain (example.com) to an IP (93.184.216.34), but actually handles far more record types.
- Four-tier participants: Stub resolver (client OS) → Recursive resolver (ISP/Cloudflare) → Root/TLD/Authoritative servers.
- Recursive vs iterative: Clients issue recursive queries; the recursive resolver does the iterative work from root down.
- "13 root servers" is a lie. 13 IP addresses, but each uses BGP anycast with hundreds of instances worldwide.
- TTL and caching: Every record has a lifetime. Too long = slow change propagation; too short = traffic explosion.
- DNSSEC: Signs records. Chain: DNSKEY → RRSIG → DS → parent DNSKEY → root DNSKEY. Adoption stuck around 30% due to complexity.
- DoH / DoT: Encrypts DNS to prevent ISP snooping and forgery. Mainstream since 2020s.
- Public resolvers: Cloudflare 1.1.1.1 (Knot Resolver), Google 8.8.8.8, OpenDNS, Quad9.
- CoreDNS: Kubernetes' default DNS, Go plugin-based.
1. Why DNS Exists
Before DNS, SRI-NIC maintained a single HOSTS.TXT file. Every host on the internet downloaded it daily. Unsustainable past a few thousand machines: update storms, name collisions, partition inconsistencies.
Paul Mockapetris designed DNS in RFC 882/883 (1983): hierarchical, distributed, cached. Key ideas: namespace hierarchy, delegation, caching with TTL, UDP-first with TCP fallback. BIND (1984) was the first implementation. Same architecture still runs 40 years later.
2025 scale:
- Daily global queries: ~10 trillion.
- Registered domains: 350 million.
- TLDs: 1,500+.
- Cloudflare 1.1.1.1: 1 trillion queries/day.
- Google 8.8.8.8: 3 trillion queries/day.
All of it runs on distribution, caching, and delegation — no central database.
2. Namespace Structure
2.1 The Tree
DNS is an inverted tree. Read right-to-left toward the root:
                 .                 (root)
              /  |  \
           com  org  uk            (TLD)
           /     |     \
     example  wikipedia  co
     /  |  \               \
   www api mail           google   (2LD)
www.google.co.uk. ends with a dot (the root).
2.2 Zone and Delegation
A zone is an administrative unit. Root zone ("."), .com (Verisign), example.com (the owner). Within a zone, the Authoritative Name Server is the source of truth.
Delegation uses NS records:
example.com. IN NS ns1.example.com.
example.com. IN NS ns2.example.com.
2.3 Glue Records
Circular dependency: to reach ns1.example.com you need its IP, but that comes from example.com, which requires ns1.example.com. Glue records break this:
example.com. IN NS ns1.example.com.
ns1.example.com. IN A 192.0.2.1 ← glue
The A record sits in the parent's .com zone.
3. Participants
3.1 Stub Resolver
OS-level component. Reads /etc/resolv.conf, issues recursive query, returns IP to the app.
cat /etc/resolv.conf
nameserver 1.1.1.1
nameserver 8.8.8.8
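From application code, the stub resolver is what `getaddrinfo` exposes. A minimal Python sketch; `localhost` is used here only so the example works offline (it typically short-circuits via `/etc/hosts` rather than querying a nameserver):

```python
import socket

# Ask the OS stub resolver for address records. For a public name this
# consults /etc/resolv.conf and queries the configured recursive resolver.
results = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)
for family, _type, _proto, _canon, sockaddr in results:
    print(family, sockaddr[0])
```

Swapping `"localhost"` for any public domain runs the full stub → recursive → iterative chain described below.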
3.2 Recursive Resolver
Provided by ISP or public service (1.1.1.1, 8.8.8.8). Checks cache; on miss, does iterative resolution from root down; caches and returns.
3.3 Root Server
Top-level ("."). Holds only the TLD server list.
The "13 root servers": 13 letters (A–M), each a single IP, but each IP uses BGP anycast. a.root-servers.net (198.41.0.4) is announced from 60+ locations. Total instances today: 1,900+. The 13 limit comes from the original UDP 512-byte priming query.
3.4 TLD Server
.com/.net (Verisign), .org (PIR), .uk (Nominet), .kr (KISA). Holds only 2LD NS records and glue.
3.5 Authoritative Server
The actual DNS data owner. Route 53, Cloud DNS, Cloudflare DNS, NS1, self-hosted BIND/Knot/PowerDNS. No caching — serves only its own data.
4. Resolution Process
4.1 Example: resolving www.example.com
Step 1: App calls getaddrinfo("www.example.com").
Step 2: Stub resolver sends UDP query to 1.1.1.1:
UDP, port 53
Query: www.example.com, type=A, class=IN, id=0x1234, RD=1
RD (Recursion Desired) = 1.
Step 3: Recursive resolver checks cache. On miss, iterates:
3.1 Root:
1.1.1.1 → a.root-servers.net (198.41.0.4)
Query: www.example.com, type=A
Response:
AUTHORITY: com. IN NS a.gtld-servers.net.
ADDITIONAL: a.gtld-servers.net. IN A 192.5.6.30
3.2 TLD:
1.1.1.1 → a.gtld-servers.net
Response:
AUTHORITY: example.com. IN NS a.iana-servers.net.
ADDITIONAL: a.iana-servers.net. IN A 199.43.135.53
3.3 Authoritative:
1.1.1.1 → a.iana-servers.net
Response:
ANSWER: www.example.com. IN A 93.184.216.34
Step 4: Recursive resolver caches and returns.
Step 5: App calls connect(93.184.216.34:80).
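The referral chain above can be sketched with a toy in-memory hierarchy. The zone data and server labels here are illustrative stand-ins, not real wire-format DNS:

```python
# Each "server" maps a suffix it knows about to either a referral (the
# next server to ask) or a final answer, mirroring the AUTHORITY/ADDITIONAL
# sections vs the ANSWER section in the trace above.
SERVERS = {
    "root": {"com.": ("referral", "tld")},
    "tld": {"example.com.": ("referral", "auth")},
    "auth": {"www.example.com.": ("answer", "93.184.216.34")},
}

def resolve(name: str, server: str = "root") -> str:
    """Iterate from the root down, following referrals as 1.1.1.1 does."""
    for suffix, (kind, target) in SERVERS[server].items():
        if name.endswith(suffix):
            return target if kind == "answer" else resolve(name, target)
    raise LookupError("NXDOMAIN")

print(resolve("www.example.com."))  # 93.184.216.34
```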
4.2 Cache Importance
- 90%+ of queries hit the recursive resolver's cache.
- ~7% take 1–2 hops (.com is usually cached).
- Under 3% go full iterative to root.
That's how 10 trillion queries/day only generate tens of thousands of queries/sec at root servers.
5. DNS Message Format
5.1 Header
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|    QDCOUNT / ANCOUNT / NSCOUNT / ARCOUNT      |
|        (four consecutive 16-bit fields)       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
- ID: 16-bit random, matches query to response.
- QR/AA/TC/RD/RA: query/response, authoritative, truncated, recursion desired/available.
- RCODE: 0=NOERROR, 2=SERVFAIL, 3=NXDOMAIN, 5=REFUSED.
Sections: Question, Answer, Authority, Additional.
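The wire format is simple enough to pack by hand. A minimal sketch (field layout per RFC 1035; the name and transaction ID are just examples):

```python
import struct

def build_query(name: str, txid: int = 0x1234, qtype: int = 1) -> bytes:
    """Encode a DNS query: 12-byte header + QNAME + QTYPE/QCLASS."""
    flags = 0x0100                       # QR=0 (query), RD=1
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(bytes([len(l)]) + l.encode("ascii")
                     for l in name.split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)   # class IN

msg = build_query("www.example.com")
txid, flags, qdcount = struct.unpack("!HHH", msg[:6])
print(hex(txid), qdcount)  # 0x1234 1
```

Sending these bytes over UDP to port 53 of any recursive resolver yields a response whose first two bytes echo the ID.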
5.2 EDNS(0)
Base DNS caps UDP at 512 bytes. EDNS(0) (RFC 6891) adds an OPT pseudo-RR advertising 4096-byte payloads plus the DO bit for DNSSEC. UDP fragmentation through firewalls is a common operational pain.
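The OPT pseudo-RR itself is just a few packed fields that repurpose the normal RR slots. A sketch (values per RFC 6891; 4096 bytes and the DO bit are the common choices):

```python
import struct

# OPT pseudo-RR, appended to a query's additional section:
#   root name (one zero byte), TYPE=41 (OPT),
#   CLASS slot reused as the advertised UDP payload size,
#   TTL slot reused as extended-RCODE/version/flags (DO = top flag bit),
#   RDLENGTH=0 (no options)
OPT_TYPE, PAYLOAD, DO_FLAG = 41, 4096, 0x8000
opt_rr = b"\x00" + struct.pack("!HHIH", OPT_TYPE, PAYLOAD, DO_FLAG, 0)

rtype, payload, ttl_flags, rdlen = struct.unpack("!HHIH", opt_rr[1:])
print(rtype, payload, bool(ttl_flags & 0x8000))  # 41 4096 True
```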
6. Record Types
A: IPv4. example.com. IN A 93.184.216.34
AAAA: IPv6.
CNAME: Alias. Cannot coexist with other records at the same name.
NS: Name server for a zone.
SOA: Zone metadata (serial, refresh, retry, expire, minimum TTL).
MX: Mail exchanger with priority.
TXT: Free text — SPF, DKIM, ownership verification.
SRV: Service location with port.
PTR: Reverse IP-to-name lookup.
CAA: Allowed certificate authorities.
DNSSEC types: DNSKEY (public key), RRSIG (signature), DS (parent-zone hash), NSEC/NSEC3 (authenticated denial).
HTTPS/SVCB (2020s): HTTP/3 connection hints at DNS time.
example.com. IN HTTPS 1 . alpn="h3,h2" port="443"
7. TTL and Caching
Every record has a lifetime in seconds:
www.example.com. 300 IN A 93.184.216.34
Tradeoffs
Short TTL (60s): Fast change propagation, high resolver load. Long TTL (86400s): Low load, changes take a day.
Typical values:
- A records (web): 300–3600s.
- MX: 3600–86400s.
- NS: 172800s.
- CDN traffic management: 30–60s.
Negative caching: NXDOMAIN is also cached; controlled by SOA minimum.
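Both positive and negative caching reduce to storing each answer with an expiry timestamp. A minimal sketch of the TTL mechanics (not how production resolvers are implemented):

```python
import time

class TtlCache:
    """Toy resolver cache: entries expire after their record's TTL."""
    def __init__(self):
        self._store = {}   # name -> (expires_at, value)

    def put(self, name, value, ttl):
        self._store[name] = (time.monotonic() + ttl, value)

    def get(self, name):
        entry = self._store.get(name)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        self._store.pop(name, None)   # expired: evict, forcing a re-query
        return None

cache = TtlCache()
cache.put("www.example.com", "93.184.216.34", ttl=300)
print(cache.get("www.example.com"))  # 93.184.216.34
```

Negative caching is the same mechanism with an NXDOMAIN marker as the value and the SOA minimum as the TTL.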
TTL Pitfall on Deploys
Reducing the TTL on deploy day is too late: caches still hold records with the old TTL. Correct order: reduce the TTL (e.g. to 60s) two days before the deploy, then change the IP on deploy day. Caches will then expire within 60s.
8. DNSSEC
8.1 Problem
Plain DNS has no validation. In 2008, Dan Kaminsky disclosed a practical cache-poisoning attack based on guessing the 16-bit transaction ID. DNSSEC signs records cryptographically.
8.2 Concept
example.com. IN A 93.184.216.34
example.com. IN RRSIG A <signed by example.com ZSK>
RRSIG signatures come from the ZSK (Zone Signing Key), published as a DNSKEY record. The DNSKEY itself is signed by the KSK (Key Signing Key). The KSK's hash lives in the parent zone's DS record.
8.3 Trust Chain
Root KSK (trust anchor, manually distributed)
↓ signs Root ZSK
↓ signs .com DS record
↓ validates .com KSK
↓ signs .com ZSK
↓ signs example.com DS
↓ validates example.com KSK
↓ signs example.com ZSK
↓ signs www.example.com A record
8.4 NSEC / NSEC3
Proving "this name does not exist" without forgery.
NSEC: "between foo.example.com and qux.example.com nothing exists." Enables zone walking — privacy leak.
NSEC3: Stores hashed names instead. Still brute-forceable, but harder.
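The NSEC3 hash is iterated SHA-1 over the wire-format name plus a salt (RFC 5155), encoded in base32hex. A sketch:

```python
import base64
import hashlib

# NSEC3 uses the base32hex alphabet, not standard base32
_TO_B32HEX = bytes.maketrans(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
                             b"0123456789ABCDEFGHIJKLMNOPQRSTUV")

def nsec3_hash(name: str, salt: bytes = b"", iterations: int = 0) -> str:
    """H(name) per RFC 5155: SHA-1(wire_name + salt), iterated."""
    labels = name.rstrip(".").lower().split(".")
    wire = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    digest = hashlib.sha1(wire + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32encode(digest).translate(_TO_B32HEX).decode()

print(nsec3_hash("example.com", salt=bytes.fromhex("aabbccdd"), iterations=12))
```

Brute-forcing means running this loop over a dictionary of candidate names, which is why NSEC3 only raises the cost of zone walking rather than eliminating it.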
8.5 Why Only ~30% Adoption
- Operational complexity: key management, signature expiration, rollover.
- Expired signatures = total outage for the domain.
- CPU overhead; larger responses.
- UDP fragmentation issues.
- DDoS amplification risk.
8.6 2018 Root KSK Rollover
First-ever root KSK rotation. Most systems got new trust anchors via OS updates; outdated systems broke entirely. Next rollover scheduled on a 5-year cadence.
9. DoH / DoT
9.1 Problem
Plain DNS is unencrypted UDP — ISPs can log, governments can filter, middleboxes can forge.
9.2 DNS over TLS (DoT, RFC 7858)
Wraps DNS in TLS on port 853. Easy to identify and filter. Android 9+ Private DNS uses DoT.
9.3 DNS over HTTPS (DoH, RFC 8484)
POST https://cloudflare-dns.com/dns-query
Content-Type: application/dns-message
Runs over port 443, indistinguishable from normal HTTPS → very hard to filter. Firefox enables DoH by default; Chrome and Windows 10 22H2+ support it.
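The GET form of RFC 8484 simply base64url-encodes a raw DNS query into a `dns=` parameter. A build-only sketch (no network call; the resolver URL is Cloudflare's public DoH endpoint):

```python
import base64
import struct

def doh_get_url(name: str,
                resolver: str = "https://cloudflare-dns.com/dns-query") -> str:
    """Build an RFC 8484 GET URL. ID=0 is recommended so responses cache."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)   # RD=1
    qname = b"".join(bytes([len(l)]) + l.encode("ascii")
                     for l in name.split(".")) + b"\x00"
    msg = header + qname + struct.pack("!HH", 1, 1)          # type A, class IN
    b64 = base64.urlsafe_b64encode(msg).decode().rstrip("=")  # unpadded
    return f"{resolver}?dns={b64}"

print(doh_get_url("example.com"))
```

Fetching that URL with `Accept: application/dns-message` returns the raw DNS response over ordinary HTTPS.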
9.4 Controversy
Pros: privacy, anti-censorship. Cons: enterprise loses DNS-level control (malware blocking, parental filters).
9.5 ODoH (Oblivious DoH)
Adds a proxy layer so the resolver never sees the client IP. Cloudflare + Apple initial deployment.
10. 1.1.1.1 Internals
10.1 Architecture
Cloudflare 1.1.1.1 deploys across 300+ PoPs. The IP is announced by BGP anycast from every PoP — BGP routes users to the nearest instance.
10.2 Knot Resolver
Built on Knot Resolver (by CZ.NIC), heavily modified. Modular with Lua extensions, multicore scaling, default DNSSEC validation. Cloudflare adds DoH/DoT frontends, DDoS defense, minimal logging, plus variants 1.1.1.2 (malware blocking) and 1.1.1.3 (malware + adult).
10.3 Privacy
Query logs deleted after 24 hours; no PII; third-party audited (KPMG).
10.4 8.8.8.8 (Google)
Proprietary implementation, global anycast, ECS (EDNS Client Subnet) support for CDN geo-routing, DoH/DoT.
10.5 Quad9
Swiss non-profit at 9.9.9.9. Blocks malware domains by default, GDPR-compliant, DNSSEC validating.
11. CoreDNS — Kubernetes DNS
11.1 Why Kubernetes Needs DNS
Pods find services by name:
curl http://my-service.default.svc.cluster.local
11.2 Structure
Go-based, plugin architecture. Sample Corefile:
.:53 {
errors
health
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
11.3 Service Discovery
Each Service gets an automatic A record: my-service.default.svc.cluster.local. IN A 10.96.0.42. The search directive in /etc/resolv.conf enables short names.
11.4 Headless Service
No ClusterIP — DNS returns all pod IPs. Common with StatefulSets.
11.5 NodeLocal DNSCache
Runs CoreDNS on each node as a local cache. Pods query 127.0.0.1:53. Essential at thousand-node scale.
12. DNS-Based Load Balancing
Round-robin DNS: Multiple A records; resolver randomizes. Simple but cache-bound.
GeoDNS: Returns different answers by user location. Route 53 GeoDNS, Cloudflare Load Balancing, NS1. ECS (EDNS Client Subnet) passes the user subnet to the authoritative for finer geo accuracy.
Health-checked DNS: Providers remove unhealthy endpoints automatically; short TTLs enable failover.
Weighted routing: Canary deploys send 5% to a new version and ramp up.
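At the authoritative side, weighted routing reduces to weighted random selection per query. A sketch of the selection logic (the IPs and the 95/5 canary split are illustrative):

```python
import random

def pick_weighted(records: list) -> str:
    """Return one IP from (ip, weight) pairs, proportionally to weight."""
    total = sum(w for _, w in records)
    r = random.uniform(0, total)
    for ip, weight in records:
        r -= weight
        if r <= 0:
            return ip
    return records[-1][0]   # guard against float rounding

# Canary deploy: 95% stable version, 5% new version
pool = [("203.0.113.10", 95), ("203.0.113.20", 5)]
counts = {ip: 0 for ip, _ in pool}
for _ in range(10_000):
    counts[pick_weighted(pool)] += 1
print(counts)  # roughly 9500 / 500
```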
13. Security Threats
13.1 Cache Poisoning (Kaminsky)
Mitigations: source port randomization, DNSSEC, 0x20 encoding (random case in query names).
13.2 Amplification
Tiny spoofed query, huge response (4096 bytes with DNSSEC) → 70x amplification aimed at victim. Mitigations: no open resolvers, BCP38 ingress filtering, Response Rate Limiting.
13.3 Hijacking
ISP NXDOMAIN rewriting, malware modifying /etc/hosts, state censorship. Defense: DoH/DoT, DNSSEC, VPN.
13.4 DGA
Malware generates thousands of daily domains algorithmically. Defense: entropy analysis of query names.
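The entropy heuristic works because algorithmically generated labels have near-uniform character distributions, so their Shannon entropy is high. A sketch (the sample names are illustrative):

```python
import math
from collections import Counter

def shannon_entropy(label: str) -> float:
    """Bits per character of the label's empirical character distribution."""
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in Counter(label).values())

# Dictionary-word labels score low; DGA-like labels score near log2(alphabet)
for name in ["google", "x3f9qk2vb7zt1mwp"]:
    print(name, round(shannon_entropy(name), 2))
```

Real detectors combine entropy with n-gram frequency and query-volume features, since short legitimate labels can also score high.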
14. Debugging
14.1 dig
dig example.com
dig example.com MX
dig @1.1.1.1 example.com
dig +trace example.com
dig +dnssec example.com
dig +short example.com
+trace simulates iterative lookup from root — crucial for pinpointing where a chain breaks.
14.2 Common Issues
"DNS change not propagating": Check the old TTL. Flush local cache (sudo killall -HUP mDNSResponder, ipconfig /flushdns). Query authoritative directly.
"DNSSEC error": Use dig +dnssec +cd to bypass validation. DNSViz.net for chain inspection. Signature expiration is a frequent cause.
"Intermittent timeouts": Possibly EDNS fragmentation. Test with +noedns.
15. Modern Trends (2024–2025)
- DoH-by-default in browsers and OSes.
- SVCB/HTTPS records for HTTP/3 negotiation at DNS time.
- Encrypted Client Hello (ECH) — SNI is encrypted, public keys distributed via DNS SVCB.
- More TLDs + IDN — .dev, .aws, plus Korean/Chinese domains like 한국.kr.
- DNS abuse policy — ICANN strengthening registrar responsibility.
16. Learning Resources
Books: "DNS and BIND" (Albitz & Liu), "Pro DNS and BIND 10", "Managing Mission-Critical Domains". RFCs: 1034/1035 (base), 4033–4035 (DNSSEC), 6891 (EDNS0), 7858 (DoT), 8484 (DoH). Sites: ICANN, IANA root zone, DNS-OARC. Tools: dig, drill, kdig, DNSViz, Wireshark.
17. Quiz
Q1. Why is "13 root servers" misleading?
A. It's 13 IP addresses, but each uses BGP anycast. Over 1,900 physical instances worldwide respond on the same IPs. The 13 limit came from fitting the priming query's UDP response in 512 bytes — not a physical server count.
Q2. Recursive vs iterative resolution?
A. Recursive: client says "go find the answer yourself." Iterative: "tell me what you know, I'll walk the chain." The stub resolver sends recursive queries to the recursive resolver, which sends iterative queries up the chain.
Q3. Why can't CNAME coexist with other records at the same name?
A. CNAME means "alias to another name." If the same name also had an A record, the resolver couldn't decide which to follow. RFC 1034 forbids it. Zone apex requires SOA/NS, so CNAME at the apex is impossible — Route 53 ALIAS/CDN flatteners are workarounds.
Q4. Why has DNSSEC adoption stalled near 30%?
A. High operational complexity and catastrophic failure modes. Expired signatures render a domain unresolvable. Plain DNS degrades gracefully under misconfig; DNSSEC does not. Risk high, reward unclear, so many operators delay.
Q5. Key difference between DoT and DoH?
A. DoT uses dedicated port 853, easy to identify/filter. DoH tunnels DNS in HTTPS on 443 — indistinguishable from normal web traffic, hard to censor. DoH is stronger for privacy; DoT is easier for network admins.
Q6. Why reduce TTL 2 days before a deploy?
A. The TTL-reduction change itself only reaches caches after the old TTL expires. Reduce 2 days ahead so that by deploy day all caches carry the shorter TTL — then the IP swap propagates in 60s.
Q7. How does DNS amplification work?
A. Attacker spoofs the victim's IP and sends small queries to open resolvers. Resolvers send large responses to the "sender" (= victim). 56-byte query → 4000-byte response = 70x amplification. Defenses: no open resolvers, BCP38 ingress filtering, Response Rate Limiting.
Related posts:
- "BGP Routing Deep Dive" — the protocol powering DNS anycast.
- "TLS/SSL Deep Dive" — TLS and ECH behind DoT/DoH.
- "CDN & Edge Caching Strategies" — DNS-based CDN routing context.
- "HTTP/3 & QUIC Deep Dive" — what HTTPS records are hinting at.