What Happens When You Type a URL and Hit Enter: A Full-Stack Tour from Browser to Pixel


Introduction — An Epic That Unfolds in Under a Second

"What happens when you type a URL into the address bar and press Enter?" It is a cliché interview question, but there is a good reason it endures. That single one-line action compresses nearly every layer of the web into one motion: DNS, routing, the transport layer, encryption, an application protocol, and a rendering engine. To explain any one of them well, you have to know a little about all of them.

This post walks that journey in order. From the instant the browser interprets your URL to the moment the first pixel lands on screen, we will trace what actually travels back and forth at each step. Our example address will be https://example.com/store/cart?id=42.

Step 1 — URL Parsing and Preprocessing

The moment you hit Enter, the browser first decides what the text in the address bar even is. Is it a URL, or is it a search query? If it has spaces or no dots, it goes to the default search engine; if it looks like a URL, parsing begins.

A URL breaks into several parts.

  https://example.com:443/store/cart?id=42#top
  └─┬─┘   └────┬────┘ └┬┘└────┬────┘ └─┬─┘ └┬┘
  scheme      host    port   path    query fragment
  • Scheme: https. Which protocol to speak.
  • Host: example.com. The domain name of the server to reach.
  • Port: omitted in our example address, so it defaults to 443 for https (80 for http). The diagram spells it out as :443 only for illustration.
  • Path, query, fragment: which resource you want inside the server, which parameters to attach, and where to scroll within the document.
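
To make the breakdown concrete, here is a minimal sketch using Python's standard urllib.parse module. The helper name split_url and the DEFAULT_PORTS table are made up for this illustration; a real browser follows the WHATWG URL rules, which differ from urllib in edge cases.

```python
from urllib.parse import urlsplit, parse_qs

# Illustrative table: the default port implied by each scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url: str) -> dict:
    """Split a URL into the parts described above."""
    parts = urlsplit(url)
    # parts.port is None when the URL has no explicit port,
    # so fall back to the default for the scheme.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }


result = split_url("https://example.com/store/cart?id=42#top")
print(result)
```

Note that the fragment never leaves the browser: it is not part of the request that goes to the server.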

The browser already performs a few optimizations here. If the domain is on its HSTS (HTTP Strict Transport Security) list, either because it is preloaded or because the site sent the header on an earlier visit, even an http entry is forcibly rewritten to https before any request leaves the machine. It also checks whether the resource can be served by a service worker or from the HTTP cache, and if so, it may skip the network entirely.

Step 2 — DNS Resolution: Turning a Name into an Address

The hostname example.com is for humans. Networks communicate with IP addresses, so we need to translate the domain name into one. That is DNS (Domain Name System) resolution.

The browser checks several layers of cache in order. If any layer has the answer, resolution ends immediately — an actual query only happens when every cache is empty.

  1. It checks the browser's own DNS cache.
  2. If that misses, it checks the operating system's resolver cache (and the hosts file).
  3. Still nothing? It asks the configured recursive resolver — usually your ISP's or a public DNS such as 8.8.8.8.

If the recursive resolver does not have the answer cached either, the real lookup begins. The resolver walks the DNS hierarchy on your behalf, querying one server after another until it reaches the authoritative server for the name.

  Recursive resolver
     │  1) "What is the IP of example.com?"
  Root server       -> "The .com people are over at that TLD server"
  .com TLD server   -> "The nameserver for example.com is here"
  Authoritative NS  -> "example.com is 93.184.216.34"
  Resolver caches the result and returns it to the browser
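
The referral chain above can be sketched as a loop over a toy, in-memory hierarchy. Everything here is an assumption for illustration: the SERVERS table, the server labels ("root", "tld-com", "ns-example"), and the resolve function are invented, and no real DNS packets are sent.

```python
# Toy zone data: each "server" maps a name suffix either to a referral
# (the next server to ask) or to a final answer.
SERVERS = {
    "root": {"com": ("referral", "tld-com")},
    "tld-com": {"example.com": ("referral", "ns-example")},
    "ns-example": {"example.com": ("answer", "93.184.216.34")},
}


def resolve(name: str):
    """Follow referrals from the root until some server returns an answer."""
    server = "root"
    visited = [server]
    while True:
        zone = SERVERS[server]
        labels = name.split(".")
        # Try the longest suffix first: "example.com", then "com".
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in zone:
                kind, value = zone[suffix]
                break
        else:
            raise LookupError(f"{server} knows nothing about {name}")
        if kind == "answer":
            return value, visited
        server = value  # referral: ask the next server down the chain
        visited.append(server)


address, visited = resolve("example.com")
print(address, visited)
```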

Each answer carries a TTL (Time To Live). For the length of the TTL, the resolver keeps that answer cached, so asking for the same domain again is fast — no trip back to the root required. This multi-layer caching is exactly why DNS usually feels instant.
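
Here is a rough sketch of how TTL-based caching and the layered lookup fit together. The TtlCache class and lookup function are hypothetical names for this post, and the clock is injectable so the expiry can be demonstrated without actually waiting.

```python
import time


class TtlCache:
    """A tiny name -> address cache whose entries expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: behave as a cache miss
            return None
        return address


def lookup(name, layers, ask_resolver):
    """Check each cache layer in order; query upstream only if all of them miss."""
    for cache in layers:
        hit = cache.get(name)
        if hit is not None:
            return hit
    address, ttl = ask_resolver(name)
    for cache in layers:
        cache.put(name, address, ttl)
    return address


# Demo with a fake clock so the TTL expiry is deterministic.
fake_now = [0.0]
clock = lambda: fake_now[0]
browser_cache, os_cache = TtlCache(clock), TtlCache(clock)
upstream_queries = []


def ask_resolver(name):
    upstream_queries.append(name)
    return "93.184.216.34", 300  # address and TTL in seconds


first = lookup("example.com", [browser_cache, os_cache], ask_resolver)
second = lookup("example.com", [browser_cache, os_cache], ask_resolver)  # cache hit
fake_now[0] = 301.0  # move past the TTL
third = lookup("example.com", [browser_cache, os_cache], ask_resolver)  # asks again
print(first, second, third, len(upstream_queries))
```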

For performance, browsers also do DNS prefetching — resolving names ahead of the actual load. Hover over a link and the browser may quietly start resolving it in the background.

Step 3 — The TCP Three-Way Handshake: Establishing a Connection

Now that we have an IP address, we need a connection to that server. Because https over HTTP/1.1 or HTTP/2 runs on TCP, a reliable transport protocol, we establish a TCP connection before sending any data. This is the famous three-way handshake.

  Client                              Server
     │ ── SYN (seq=x) ─────────────►│   "I want to connect"
     │                              │
     │ ◄──── SYN-ACK (seq=y, ack=x+1)│  "OK, I'm ready too"
     │                              │
     │ ── ACK (ack=y+1) ───────────►│   "Confirmed, let's go"
     ▼                              ▼
      Bidirectional data can now flow

From the client's point of view the connection is usable as soon as the SYN-ACK arrives, that is, after two of the three messages have crossed, and the client may send data together with its final ACK. That is why the handshake costs 1 RTT (round-trip time) rather than 1.5. Each side also announces an initial sequence number (seq) during the exchange, which becomes the basis for ordering later data and detecting loss. If the server sits on the other side of the planet, that single round trip alone can take hundreds of milliseconds.
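
The sequence and acknowledgment arithmetic in the diagram can be sketched as a small simulation. This is only bookkeeping, not a real TCP stack; the Segment type and three_way_handshake function are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    sender: str
    flags: str
    seq: int
    ack: Optional[int]


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments exchanged, given each side's initial sequence number."""
    syn = Segment("client", "SYN", seq=client_isn, ack=None)
    # A SYN consumes one sequence number, so the peer acknowledges seq + 1.
    syn_ack = Segment("server", "SYN-ACK", seq=server_isn,
                      ack=(syn.seq + 1) % SEQ_SPACE)
    ack = Segment("client", "ACK", seq=(syn.seq + 1) % SEQ_SPACE,
                  ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```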

For reference, the newer HTTP/3 uses UDP-based QUIC instead of TCP and folds connection setup and the next step — encryption — into one, cutting latency. But the classic TCP + TLS combination is the clearest way to understand the concepts, so this post follows that flow.

Step 4 — The TLS Handshake: Building a Secure Channel

The TCP connection stands, but the data flowing over it is still plaintext. The 's' in https means TLS (Transport Layer Security), and before any real HTTP data goes out, we build an encrypted channel first. The TLS handshake does three things: it authenticates that the server is who it claims to be, it exchanges the keys used to encrypt the traffic, and it negotiates which cipher to use.
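
On the client side, these checks are typically switched on by default. As a minimal sketch, Python's standard ssl module shows what a reasonably strict client configuration looks like. The open_tls helper is a hypothetical name and is defined but not called here, so nothing touches the network.

```python
import socket
import ssl

# The default context loads the system's trusted CAs and enables
# certificate and hostname verification.
context = ssl.create_default_context()
# Refuse protocol versions older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate must validate
print(context.check_hostname)                    # certificate must match the host
print(len(context.get_ciphers()) > 0)            # cipher suites offered to the server


def open_tls(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection, then run the TLS handshake on top of it."""
    raw = socket.create_connection((host, port), timeout=5)
    # server_hostname drives both SNI and the hostname check.
    return context.wrap_socket(raw, server_hostname=host)
```

If you do call open_tls against a real server, the returned socket's version() and cipher() methods report what was negotiated.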

Simplified to TLS 1.3, the flow looks like this.

  Client                                Server
     │ ── ClientHello ────────────────►│  supported cipher suites, key shares
     │                                 │
     │ ◄── ServerHello, Certificate ───│  chosen cipher, certificate, key share
     │      + Finished                 │
     │                                 │
     │ ── Finished ───────────────────►│  verification complete
     ▼                                 ▼
      From here on, all HTTP data flows encrypted

Unpacking the key steps:

  • Certificate verification: The server sends its certificate. The browser checks that it is signed by a trusted certificate authority (CA), that the domain name matches, and that it has not expired. If any of these fail, you get that scary "your connection is not secure" warning.
  • Key exchange: Using public-key cryptography (such as ECDHE), both sides derive a shared secret that an eavesdropper cannot recover even while watching every message go by. A symmetric key for encrypting the actual data is derived from that secret.
  • Cipher suite negotiation: They agree on which combination of encryption and hashing algorithms to use.
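
To get a feel for the key exchange, here is a toy Diffie-Hellman sketch. Note the substitution: this is classic finite-field Diffie-Hellman with deliberately tiny numbers, not the elliptic-curve ECDHE that TLS actually uses, and it is nowhere near secure. The idea is the same, though: only the public values cross the wire, yet both sides arrive at the same secret.

```python
# Public parameters, known to everyone including the eavesdropper.
P = 23  # a tiny prime modulus, for illustration only
G = 5   # generator


def public_value(private: int) -> int:
    """The value each side sends over the wire."""
    return pow(G, private, P)


def shared_secret(their_public: int, my_private: int) -> int:
    """Combine the peer's public value with your own private value."""
    return pow(their_public, my_private, P)


client_private, server_private = 4, 3  # never transmitted

client_public = public_value(client_private)
server_public = public_value(server_private)

client_secret = shared_secret(server_public, client_private)
server_secret = shared_secret(client_public, server_private)
print(client_public, server_public, client_secret == server_secret)
```

In real TLS the shared secret is then fed through a key derivation function to produce the symmetric keys that encrypt the traffic.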

TLS 1.3 cut all of this down to 1 RTT, and for a server you have visited before, even 0-RTT resumption is possible — a big improvement over the 2 RTT of the older TLS 1.2. If you want to experiment with certificate verification, cipher suites, and handshake flows firsthand, this site's Auth & Security Lab lets you play with TLS and related concepts.
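
The round-trip counts add up quickly, and a back-of-the-envelope calculation shows why they matter. The function below is a rough model under simplifying assumptions (no packet loss, one round trip for the request and response, hypothetical names), not a measurement.

```python
# Round trips spent on setup before the HTTP request can be answered.
HANDSHAKE_RTTS = {"tcp": 1, "tls1.2": 2, "tls1.3": 1, "tls1.3-0rtt": 0}


def time_to_first_byte_ms(rtt_ms, tls, dns_ms=0.0, server_ms=0.0):
    """Estimate the time until the first response byte arrives."""
    setup_round_trips = HANDSHAKE_RTTS["tcp"] + HANDSHAKE_RTTS[tls]
    request_round_trips = 1  # the request goes out, the response comes back
    return dns_ms + (setup_round_trips + request_round_trips) * rtt_ms + server_ms


for tls in ("tls1.2", "tls1.3", "tls1.3-0rtt"):
    print(tls, time_to_first_byte_ms(rtt_ms=100, tls=tls))
```

With a 100 ms round trip, the model gives 400 ms for TLS 1.2, 300 ms for TLS 1.3, and 200 ms with 0-RTT, before DNS and server time are even counted.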

Step 5 — Sending the HTTP Request

With the encrypted channel in place, the browser can finally ask for what it came for. It builds and sends an HTTP request message. In HTTP/1.1 terms, a request is roughly this piece of text.

GET /store/cart?id=42 HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 ...
Accept: text/html,application/xhtml+xml
Accept-Encoding: gzip, br
Cookie: session=abc123
Connection: keep-alive

What each line means:

  • Request line: the method (GET), the path (/store/cart?id=42), and the protocol version.
  • Host header: when many sites share one IP (virtual hosting), this tells the server which domain you want.
  • Accept-family headers: tell the server which content formats and compressions you can accept.
  • Cookie header: sends back cookies the server previously set, preserving things like a login session.
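
Because an HTTP/1.1 request really is just text, you can assemble one by hand. The sketch below rebuilds the example request; build_request is a made-up helper, and in practice an HTTP library does this for you.

```python
def build_request(method: str, target: str, host: str, headers: dict) -> bytes:
    """Serialize an HTTP/1.1 request with no body."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Lines end with CRLF, and an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "GET",
    "/store/cart?id=42",
    "example.com",
    {
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Encoding": "gzip, br",
        "Cookie": "session=abc123",
        "Connection": "keep-alive",
    },
)
print(request.decode("ascii"))
```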

In HTTP/2 and HTTP/3 these headers are not human-readable text but compressed binary frames, and one connection can carry many requests at once (multiplexing). But the information conveyed is the same as above.

Step 6 — The Server's Processing and Response

The request crosses the network and arrives at the server. Except "the server" is rarely a single machine. The request usually passes through several layers.

  Request --> [CDN / edge cache] --> [load balancer] --> [reverse proxy]
                 │(on a cache hit, responds right here)
                 --> [web/app server] --> [database / cache]
  • CDN and edge cache: static assets (images, CSS, JS) and cacheable pages are served directly by an edge server near the user, so there is no need to reach the origin at all — much faster.
  • Load balancer: spreads traffic across multiple server instances.
  • Application server: runs the actual logic. It routes, reads data from a database, and renders HTML or produces JSON.
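
Two of these layers are simple enough to sketch in a few lines. The EdgeCache class, the origin function, and the backend names below are all invented for illustration; real CDNs and load balancers also deal with expiry, cache keys, health checks, and much more.

```python
import itertools


class EdgeCache:
    """Serve from the edge when possible; go to the origin only on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}

    def get(self, path):
        if path in self._store:
            return self._store[path], "HIT"
        body = self._fetch(path)
        self._store[path] = body
        return body, "MISS"


origin_calls = []


def origin(path):
    origin_calls.append(path)
    return f"<body for {path}>"


edge = EdgeCache(origin)
first = edge.get("/static/site.css")   # MISS: goes to the origin
second = edge.get("/static/site.css")  # HIT: the origin is never contacted
print(first[1], second[1], len(origin_calls))

# Round-robin load balancing: hand each request to the next backend in turn.
backends = itertools.cycle(["app-1", "app-2", "app-3"])
picks = [next(backends) for _ in range(4)]
print(picks)
```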

Once processing finishes, the server returns an HTTP response. Its first line carries the status code.

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Encoding: br
Cache-Control: max-age=3600
Set-Cookie: session=abc123; HttpOnly; Secure

<!DOCTYPE html>
<html> ... </html>

A status code summarizes the outcome in three digits. 200 is success, 301 and 302 are redirects, 404 is not found, 500 is a server error. If you are curious about the precise meaning and nuance of these numbers, the HTTP Status Code Reference lets you inspect each one. The response body usually arrives compressed with gzip or br, so the browser has to decompress it first.
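
Here is a sketch of what the receiving side does with those bytes: split the head from the body, read the status code, and undo the compression. The function names are hypothetical, and the demo uses gzip rather than br because Brotli is not in Python's standard library.

```python
import gzip


def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into status code, headers, and decoded body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if headers.get("content-encoding") == "gzip":
        body = gzip.decompress(body)
    return int(code), headers, body


def status_class(code: int) -> str:
    """The first digit of a status code tells you the broad outcome."""
    classes = {1: "informational", 2: "success", 3: "redirect",
               4: "client error", 5: "server error"}
    return classes.get(code // 100, "unknown")


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Encoding: gzip\r\n"
    b"\r\n"
) + gzip.compress(b"<!DOCTYPE html><html></html>")

code, headers, body = parse_response(raw)
print(code, status_class(code), body)
```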

Step 7 — Parsing HTML and Building the DOM

As the browser begins receiving HTML bytes, it does not wait for the whole thing to arrive — it starts parsing as bytes stream in. The goal of parsing is to turn the document into a tree structure, the DOM (Document Object Model).

  HTML bytes --> tokenize --> create nodes --> DOM tree
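
A small sketch of that pipeline, using Python's standard html.parser as the tokenizer. The Node and DomBuilder classes are invented for this post and skip nearly all of the error recovery a real browser parser performs. Notice that the markup is fed in two chunks, the way it would arrive from the network.

```python
from html.parser import HTMLParser

# Elements that never have children or a closing tag (partial list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []


class DomBuilder(HTMLParser):
    """Turn the tokenizer's events into a tree, using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element (never the document root).
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self._stack[-1].children.append(Node("#text", [("value", text)]))


def outline(node, depth=0):
    """Render the tree as indented lines for inspection."""
    label = node.attrs["value"] if node.tag == "#text" else node.tag
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = DomBuilder()
builder.feed("<html><body><h1>Cart</h1>")     # first chunk arrives
builder.feed("<p>2 items</p></body></html>")  # the rest arrives later
builder.close()
tree_lines = outline(builder.root)
print("\n".join(tree_lines))
```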

During parsing the browser encounters additional resources it needs: CSS referenced by <link>, JavaScript referenced by <script>, images in <img>, and so on. Two properties matter a lot here.

  • CSS is render-blocking: the browser cannot safely paint the screen before it knows all the styles. So it defers rendering until it has fetched and parsed the CSS into the CSSOM (CSS Object Model).
  • Scripts can be parser-blocking: a plain <script> stops HTML parsing the moment it is encountered while the browser downloads and runs it, because the script might modify the DOM. To avoid this, add async (download in parallel, run as soon as the file arrives) or defer (download in parallel, run in document order once parsing has finished).

Even so, the browser runs a preload scanner that starts downloading resources it will soon need before full parsing reaches them. Thanks to that, other downloads keep going even while a script blocks the parser.
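
Conceptually, the preload scanner is a lightweight pass that only looks for URLs worth fetching. Here is a rough imitation; the PreloadScanner class and the file paths in the sample markup are hypothetical, and real scanners handle many more cases (srcset, preload hints, and so on).

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collect resource URLs without building any tree."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.urls.append(("style", attributes["href"]))
        elif tag == "script" and attributes.get("src"):
            self.urls.append(("script", attributes["src"]))
        elif tag == "img" and attributes.get("src"):
            self.urls.append(("image", attributes["src"]))


markup = """
<head>
  <link rel="stylesheet" href="/css/site.css">
  <script src="/js/app.js"></script>
</head>
<body>
  <img src="/img/cart.png" alt="Cart">
</body>
"""

scanner = PreloadScanner()
scanner.feed(markup)
scanner.close()
print(scanner.urls)
```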

Step 8 — The Critical Rendering Path: Down to Pixels

Once the DOM and CSSOM are ready, the browser goes through a sequence of steps to combine them and draw the screen. This whole process is called the critical rendering path.

  DOM + CSSOM --> render tree --> layout --> paint --> composite

What each step does:

  • Render tree: combines the DOM and CSSOM, but only includes nodes that are actually visible. Elements with display: none are dropped here.
  • Layout (reflow): computes the geometry — where and how large each element sits on screen. When the viewport size changes, this has to be recomputed.
  • Paint: fills in the actual pixels of each element — colors, text, images, shadows.
  • Composite: merges the painted layers in the correct order into the final screen. The GPU accelerates this.
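
The render tree step is easy to picture in code. In this simplified sketch (the Element type and build_render_tree function are invented, and styles are assumed to be already resolved per element), any subtree with display: none is dropped entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    tag: str
    style: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def build_render_tree(element):
    """Copy the tree, skipping every subtree whose root has display: none."""
    if element.style.get("display") == "none":
        return None
    kept = []
    for child in element.children:
        rendered = build_render_tree(child)
        if rendered is not None:
            kept.append(rendered)
    return Element(element.tag, dict(element.style), kept)


dom = Element("body", children=[
    Element("h1"),
    Element("div", {"display": "none"}, [Element("span")]),
    Element("p", {"color": "gray"}),
])

render_tree = build_render_tree(dom)
visible_tags = [child.tag for child in render_tree.children]
print(visible_tags)
```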

The performance-critical concepts here are reflow and repaint. Change an element's size or position with JavaScript and you trigger a reflow that recomputes layout; change only its color and you get a repaint that redraws without touching layout. Reflow is more expensive than repaint, so for smooth animation it is best to use properties that do not trigger layout (transform, opacity, and the like). When the element sits on its own compositor layer, those changes can be handled in the composite step alone, which lets the browser skip layout and paint.
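
One way to internalize this is as a lookup: each CSS property has an earliest pipeline stage it invalidates, and everything after that stage has to run again. The table below is a simplified assumption for illustration; actual behavior depends on the browser engine and on whether the element has its own layer.

```python
PIPELINE = ["layout", "paint", "composite"]

# Simplified, illustrative mapping from property to the first stage it invalidates.
FIRST_STAGE = {
    "width": "layout",
    "height": "layout",
    "top": "layout",
    "left": "layout",
    "color": "paint",
    "background-color": "paint",
    "box-shadow": "paint",
    "transform": "composite",
    "opacity": "composite",
}


def stages_triggered(changed_properties):
    """Return the pipeline stages that must rerun for a set of style changes."""
    # Unknown properties are treated conservatively as layout-affecting.
    earliest = min(
        PIPELINE.index(FIRST_STAGE.get(prop, "layout")) for prop in changed_properties
    )
    return PIPELINE[earliest:]


print(stages_triggered(["width"]))
print(stages_triggered(["color"]))
print(stages_triggered(["transform", "opacity"]))
```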

The Whole Flow, One More Time

Here is the entire journey at a glance.

  1. URL parsing       split the input string into scheme/host/path
  2. DNS resolution    domain name -> IP (multi-layer cache + recursive query)
  3. TCP handshake     establish a reliable connection via the 3-way handshake (1 RTT)
  4. TLS handshake     authenticate + key exchange + cipher negotiation (1 RTT with TLS 1.3)
  5. HTTP request      send method/headers/cookies
  6. Server processing generate the response through CDN/LB/app server/DB
  7. HTML parsing      bytes -> DOM, CSS -> CSSOM
  8. Rendering         render tree -> layout -> paint -> composite

Each step is deep enough to fill a book, but from the big-picture view they all serve one goal: turning "a human-readable address" into "pixels on a screen."

Wrapping Up

In the brief instant after you type a URL and press Enter, wildly different kinds of work mesh together in exactly the right order: name resolution, connection setup, identity verification, encryption, data transfer, document parsing, and screen rendering. Understanding this flow means that when you hit a performance problem you know which step to suspect, and when a security warning pops up you can tell what went wrong far more quickly.

The next time a page feels slow to load, imagine where among these eight steps the time is leaking. Is it DNS, the handshake, server processing, or rendering? Simply being able to ask that question means you already see the web one layer deeper.
