How to Read Code You Didn't Write

Introduction — We Read More Than We Write
Find the Entry Point
Follow the Data, Not Every Line
Treat Tests and Types as Documentation
- Tests = Executable Specification
- Types = Contract
Draw the Call Graph — grep and IDE Jumps
Actually Run It and Add Prints
Change Something and See What Breaks
Read Top-Down, Then Drill
Putting It Together — An Unfamiliar Bug in 30 Minutes
Wrapping Up
References

Introduction — We Read More Than We Write

Picture your first week on a new team. Half a million lines of code, a framework you've never seen, a wiki that went stale two years ago, and the person who wrote the code has already left. Your job is to fix one bug somewhere in there. Where do you even start?

We tend to think of programming as "writing code," but developers actually spend most of their time reading it. A widely cited rule of thumb puts the ratio of reading to writing at roughly 10 to 1. To add a feature, you read and understand the existing code around it; to fix a bug, you read the code that causes it; to do a review, you read someone else's code. We are less writers than readers.

Yet, oddly, nobody teaches code reading systematically. There are endless courses and books on writing, while reading is left to "you'll get better at it eventually." This post collects concrete strategies you can use when you face an unfamiliar codebase. If I had to compress it into one line: don't try to read every line — follow the path the data flows along.

Find the Entry Point

The first thing to do when reading code is to find the entry point — the place where the program actually begins executing. Opening a random file and reading top-to-bottom is like being dropped into the middle of a forest with no map.

The entry point depends on the kind of program.

CLI tool: the main() function, the bin field in package.json, or a script entry in pyproject.toml (for example under [project.scripts]).
Web server: the routing table. The mapping "when a request hits this URL, this handler processes it" is your map. To understand a specific feature, find its API route first and step into the handler.
Frontend app: the root component (e.g. App) and the router config. Curious about a particular screen? Start from the component wired to that route.
Library: the public API. The index file, or the exports in the package manifest, is the "door people enter through." Drill inward from there.
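
To make the "routing table as a map" idea concrete, here is a minimal sketch in plain Python. Real frameworks express routes with their own syntax; the handler names, paths, and the ROUTES dictionary below are all invented for illustration.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]


def create_payment(request: dict) -> dict:
    # Hypothetical handler: the feature's real logic would start here.
    return {"status": 201, "body": f"charged {request['amount']}"}


def get_order(request: dict) -> dict:
    # Hypothetical handler for a second route.
    return {"status": 200, "body": f"order {request['order_id']}"}


# The routing table: (method, path) -> handler. This is the map to read first.
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("POST", "/payments"): create_payment,
    ("GET", "/orders"): get_order,
}


def dispatch(method: str, path: str, request: dict) -> dict:
    """Look up the handler for a request and call it."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return {"status": 404, "body": "not found"}
    return handler(request)


def handler_name(method: str, path: str) -> str:
    """Answer the reader's question: which function handles this URL?"""
    handler = ROUTES.get((method, path))
    return handler.__name__ if handler else "<no route>"


print(handler_name("POST", "/payments"))  # create_payment
```

Once you know that POST /payments lands in create_payment, that function is where the reading starts.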

A practical trick for finding entry points: grep for error messages, log strings, and text visible in the UI. If the screen shows "Payment failed," search for that string in the code. It usually teleports you straight to the feature you care about. This is a genuinely powerful technique that's easy to forget.

Follow the Data, Not Every Line

Trying to read unfamiliar code from start to finish, every single line, almost always fails. You can pour in days and still have no mental picture. The reason is simple: most of the code is unrelated to the question you're trying to answer right now.

A far better strategy is to follow the data. Pick one specific value, and trace only its flow — where it's born (input/creation), what transformations it goes through (processing), and where it ends up (storage/output). Ignore the rest of the code for now.

Say you're investigating "why does an uploaded image get saved rotated?":

1. Find where the image comes in (the upload handler).
2. See which variable holds that image data, and follow it. Which function is it passed to?
3. Inside that function, how is the image transformed (resize, format conversion, EXIF handling...)?
4. Where does it finally get saved?

Follow just that chain, and out of half a million lines you actually read a few dozen. For a rotation problem, the culprit will most likely turn up near the EXIF handling in step 3. The unrelated auth logic, payment logic, and admin pages never need to be opened.

A useful thinking tool while following data is the shape of the data. At each step, keep asking, "what type is this value right now, and what's its structure?" Here it's a file-path string, here a byte buffer, here a decoded pixel array... tracking the change in shape makes the flow crisp.
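
The "shape of the data" question can be asked mechanically. The sketch below is a toy stand-in for the image example: the three-byte "image" format, the function names, and the deliberate bug in resize are all assumptions made up for illustration, not how a real image library works.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class DecodedImage:
    width: int
    height: int
    orientation: int  # EXIF-style tag; 1 means "upright"


def read_upload(path: str) -> bytes:
    # Stand-in for reading the uploaded file: a toy 3-byte "image".
    # The bytes are width, height, orientation (6 = rotated 90 degrees).
    return bytes([40, 30, 6])


def decode(raw: bytes) -> DecodedImage:
    return DecodedImage(width=raw[0], height=raw[1], orientation=raw[2])


def resize(img: DecodedImage, scale: float) -> DecodedImage:
    # The toy bug: the orientation tag is reset without rotating any pixels.
    return DecodedImage(int(img.width * scale), int(img.height * scale), orientation=1)


def describe(step: str, value: Any) -> str:
    """One line per step: where we are, what type the value is, what it holds."""
    return f"{step:<8} {type(value).__name__:<13} {value!r}"


def trace_upload(path: str) -> List[str]:
    """Follow one value through the chain and record its shape at each step."""
    lines = [describe("input", path)]
    raw = read_upload(path)
    lines.append(describe("read", raw))
    img = decode(raw)
    lines.append(describe("decode", img))
    small = resize(img, 0.5)
    lines.append(describe("resize", small))
    return lines


for line in trace_upload("uploads/cat.jpg"):
    print(line)
```

Reading the trace top to bottom, the value goes from str to bytes to DecodedImage, and the orientation field changes from 6 to 1 at the resize step. That is the one place worth reading closely.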

Treat Tests and Types as Documentation

Official docs are usually stale or nonexistent. But two kinds of documentation that never lie are already sitting in the code: tests and types.

Tests = Executable Specification

If you want to know what a function does, read its tests first. A test is an executable user manual that shows, with concrete examples, "given this input, you get this output." And if CI is passing, that manual is true at this very moment. Unlike a stale wiki, tests are validated alongside the code.

What's especially valuable in tests:

Edge cases: the names of the describe/it blocks enumerate the boundaries the function cares about. "when the array is empty," "when negative," "under concurrent requests" — they reveal what the author worried about.
Usage examples: the arrange step of a test shows how to set up the inputs, and the act step shows how the function is actually called. Learn the API here instead of from docs.

If you want to understand a feature and there are tests, pick one and step through it in a debugger. Nothing gets you up to speed faster than reading the code as it actually executes.
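
As a small illustration of tests reading like a specification, here is a sketch using Python's built-in unittest. The average function and its behavior on an empty list are hypothetical; the point is how much the test names alone tell you.

```python
import unittest
from typing import List


def average(values: List[float]) -> float:
    """Hypothetical function under test."""
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageTest(unittest.TestCase):
    # The test names alone list the boundaries the author cared about.
    def test_empty_list_returns_zero(self):
        self.assertEqual(average([]), 0.0)

    def test_single_value_is_returned_unchanged(self):
        self.assertEqual(average([7.0]), 7.0)

    def test_negative_values_are_allowed(self):
        self.assertEqual(average([-2.0, 2.0]), 0.0)


def run_spec() -> unittest.TestResult:
    """Run the test case and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


result = run_spec()
```

Without opening the body of average, a reader already knows that an empty list is a supported input and that it yields zero rather than an exception.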

Types = Contract

In statically typed code (TypeScript, Rust, Go, type-hinted Python, and so on), the type signature is the contract. Before you read a function's body, the signature alone gets you halfway: it spells out what it takes (input types) and what it returns (return type).

Types are also the data's shape, nailed down in code. When you do the "follow the data" trace above, checking each step's type in your IDE lets you see the shape changing directly. Even in an untyped language, the validation logic at the top of a function or a doc comment plays a similar role.
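
A short sketch of reading the contract before the body, in type-hinted Python. The User class and find_user function are invented for illustration; typing.get_type_hints is the standard-library call that returns a function's annotations.

```python
from dataclasses import dataclass
from typing import Optional, get_type_hints


@dataclass
class User:
    id: int
    email: str


def find_user(user_id: int) -> Optional[User]:
    """Body kept trivial on purpose: the signature is what we read first."""
    known = {1: User(id=1, email="a@example.com")}
    return known.get(user_id)


# Reading the contract without reading the body:
hints = get_type_hints(find_user)
print(hints)
```

The Optional in the return type already tells you that a missing user is a normal outcome the caller has to handle, which is worth knowing before reading a single line of the implementation.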

Draw the Call Graph — grep and IDE Jumps

As you read, you run into two questions constantly: "where is this function defined?" and "who calls this function?" The ability to answer these two fast is what governs your reading speed.

Go to definition: teleport to where a symbol is defined. This IDE feature is a fundamental of code reading. Use it to drill top-down (from caller to callee).
Find references / find callers: conversely, find everywhere a function is used. Use it to trace bottom-up (from definition to usages). Essential when asking, "if I change this function, what gets affected?"

When you have no IDE, or several languages are mixed and the IDE can't follow, or you only have the code open on a server, grep (or ripgrep, rg) is the master key.

# Find everywhere this function is called
rg "processPayment\("

# Where in the code is this string (error message, label)?
rg "Payment failed"

# Who reads this environment variable?
rg "STRIPE_SECRET_KEY"

# Only one file type (here TypeScript), with line numbers
rg -n -t ts "useAuth"

Repeat these jumps in your head and a call graph takes shape: "A calls B, B calls C and D, and C is called from all over." For complex flows, actually draw this graph on paper or a whiteboard. Even five or six nodes clear your head noticeably.
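
For Python code, a rough version of this graph can also be sketched mechanically with the standard-library ast module. The SOURCE snippet below is made up, and the sketch only sees direct calls by plain name (no methods, no dynamic dispatch), so treat it as a reading aid rather than a complete analysis.

```python
import ast
from collections import defaultdict
from typing import Dict, Set

# Hypothetical code under study.
SOURCE = '''
def handle_payment(order):
    total = compute_total(order)
    charge(total)
    send_receipt(order)

def compute_total(order):
    return apply_tax(sum(order))

def apply_tax(amount):
    return amount * 1.1

def charge(amount):
    log(amount)

def send_receipt(order):
    log(order)

def log(msg):
    print(msg)
'''


def call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each top-level function to the plain names it calls."""
    graph: Dict[str, Set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return graph


def callers(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the graph: who calls this function?"""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            inverted[callee].add(caller)
    return dict(inverted)


graph = call_graph(SOURCE)
for name, callees in graph.items():
    print(f"{name} -> {sorted(callees)}")
print("log is called by:", sorted(callers(graph)["log"]))
```

call_graph answers the top-down question (what does this function call?) and callers answers the bottom-up one (who calls it?), mirroring go-to-definition and find-references.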

To get these grep/jump workflows into your fingers alongside git-history exploration, you can practice hopping between branches in the Git Playground to trace how code evolved. git log -S"functionName" lists the commits that added or removed occurrences of that string, which answers "when was this code introduced?" and is another powerful tool for reading code.

Actually Run It and Add Prints

You'll always reach a point where static reading alone won't crack it. Reading tells you what the code can do; to learn what it actually did this time, you have to run it. This is especially true when the branching is complex, values are decided at runtime, or the flow bounces around through callbacks and events.

So just run it. Then drop a print (or log) at the spot you're curious about and confirm the real value with your own eyes.

"Does execution actually reach here?": put print("HERE reached") in code whose reachability you doubt. If it doesn't fire, that branch isn't taken on this run, and if you expected it to be, your model of the flow needs correcting.
"What is this value here?": print the variable with a label to confirm its shape and content. The gap between your mental guess and reality is wide surprisingly often.
Call order: put an entry log in several functions to reveal the real execution order. Especially useful in async code.

This is the dynamic version of "follow the data" from earlier. When static tracing stalls, plant a print at that spot to catch the real data. Alternating between reading and running is the fastest way to understand unfamiliar code.
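
The three kinds of print can be combined in a few lines. Everything in this sketch is hypothetical (the checkout flow, the "HALF" discount code, the probe helper); it only shows the pattern of labeled values, a reachability marker, and a call-order log.

```python
from typing import Any, List

CALL_ORDER: List[str] = []  # entry log: which functions ran, in what order


def probe(label: str, value: Any) -> Any:
    """Print a labeled value with its type, then hand the value back unchanged."""
    print(f"[probe] {label}: {type(value).__name__} = {value!r}")
    return value


def parse_amount(text: str) -> float:
    CALL_ORDER.append("parse_amount")
    return float(text.strip())


def apply_discount(amount: float, code: str) -> float:
    CALL_ORDER.append("apply_discount")
    if code == "HALF":
        print("[probe] HERE reached: discount branch")
        return amount / 2
    return amount


def checkout(text: str, code: str) -> float:
    CALL_ORDER.append("checkout")
    amount = probe("parsed amount", parse_amount(text))
    return probe("final amount", apply_discount(amount, code))


# The code is passed in lowercase, so the "HERE reached" line never prints.
checkout(" 80 ", "half")
print("call order:", CALL_ORDER)
```

The missing "HERE reached" line is the useful signal here: the discount branch is case-sensitive, which is easy to miss when reading and obvious when running.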

Change Something and See What Breaks

There's a surprisingly effective, badly underrated way to understand code: deliberately change something and watch what breaks.

When you're unsure what a function, a setting, or a constant does, change it. Flip a value, comment out a line, set a constant to something absurd, then run it. The system answers you with its reaction.

You delete a line and nothing changes → that line doesn't matter (at least on this path). It might even be dead code.
You change a constant and a particular screen breaks → that constant is wired to that screen. You've learned one connection.
You hardcode a function's return value and three tests fail → those three tests depend on this function. The blast radius reveals itself.

This is active experimentation. Instead of passively staring at code, you ask the system a question and get an answer. Do it safely, of course — locally, not in production, in a state you can always revert with git. With git stash or a branch, smashing things to bits and then restoring is free. "If I don't understand it, break it and revert" is a powerful code-reading loop.
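
The "hardcode a return value and see which tests fail" experiment can also be scripted. The sketch below uses unittest.mock.patch.object from the standard library as an in-memory analogue of editing the source and reverting; the Cart class and the three checks are stand-ins invented for illustration.

```python
from typing import Callable, Dict, List
from unittest import mock


class Cart:
    """Hypothetical code under study."""

    def tax_rate(self) -> float:
        return 0.10

    def total(self, net: float) -> float:
        return round(net * (1 + self.tax_rate()), 2)

    def label(self, net: float) -> str:
        return f"Total: {self.total(net):.2f}"

    def item_count(self, items: List[str]) -> int:
        return len(items)


# Stand-ins for the project's existing tests.
CHECKS: Dict[str, Callable[[Cart], bool]] = {
    "total_includes_tax": lambda cart: cart.total(100) == 110.0,
    "label_shows_total": lambda cart: cart.label(100) == "Total: 110.00",
    "item_count_is_len": lambda cart: cart.item_count(["a", "b"]) == 2,
}


def failing_checks() -> List[str]:
    """Run every check against a fresh Cart and return the names that fail."""
    cart = Cart()
    return sorted(name for name, check in CHECKS.items() if not check(cart))


def blast_radius(method_name: str, forced_value: object) -> List[str]:
    """Temporarily hardcode one method's return value and report what breaks."""
    with mock.patch.object(Cart, method_name, return_value=forced_value):
        return failing_checks()


print("before:", failing_checks())
print("tax_rate forced to 0.0:", blast_radius("tax_rate", 0.0))
print("after:", failing_checks())
```

Two of the three checks fail while tax_rate is forced to zero, and everything passes again once the patch is lifted. The failing names are the blast radius of that one method.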

Read Top-Down, Then Drill

The right order for understanding an unfamiliar codebase is generally top-down. Drill into the details first and you'll see the trees but miss the forest. Grab the big picture first, then pick the parts worth going deep on.

The order I recommend:

Skim the directory structure. Folder names alone reveal the separation of concerns. api/, db/, components/, services/... build a map of what lives where.
Read the README, config files, and manifests. The dependency list in package.json/pyproject.toml compresses "what tools this project uses to do what." The script definitions tell you "how to build, run, and test this."
Grasp the architecture's layers. From a request coming in to a response going out, which layers does it pass through (router → controller → service → repository, say)? Once you have this skeleton, any new code you meet can be placed: "ah, this is the service layer."
Now pick one specific feature and drill deep, following the data. This is where entry-point finding, data tracing, prints, and grep all come together.

The key is the order: wide and shallow first, then narrow and deep. Sink into one function from the start and you lose your bearings, because you don't know its place in the whole. Draw the map first, then pick one path and walk down it.
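
The first step, skimming the directory structure, takes only a few lines with pathlib. This is a rough sketch: it counts files under each top-level directory and skips hidden ones such as .git, which is an assumption about what you want to see first.

```python
from pathlib import Path
from typing import Dict


def skim(root: Path) -> Dict[str, int]:
    """Map each visible top-level directory to the number of files beneath it."""
    summary: Dict[str, int] = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and not entry.name.startswith("."):
            summary[entry.name] = sum(1 for p in entry.rglob("*") if p.is_file())
    return summary


def print_map(root: Path) -> None:
    """Print one line per top-level directory, e.g. for a first look at a repo."""
    for name, count in skim(root).items():
        print(f"{name + '/':<20} {count:>6} files")


# Usage (the path is a placeholder):
# print_map(Path("path/to/repo"))
```

The file counts give a crude sense of where the weight of the codebase sits before you open anything.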

Putting It Together — An Unfamiliar Bug in 30 Minutes

Let's weave the pieces into one flow. First week, and you're handed the bug "the confirmation email after payment sometimes doesn't arrive."

Big picture (top-down): skim the directories and spot services/email/ and services/payment/. So email and payment are separated.
Entry point + grep: grep for a phrase that would be in the confirmation email ("Your order is complete"). Find the email template and the function that sends it, sendOrderConfirmation.
Find callers: find references to sendOrderConfirmation. It's called from the payment-success handler.
Follow the data: trace the chain from payment success to email dispatch. Along the way you see a step that "puts the email on a queue."
Read tests: read the tests for this dispatch logic and find a case: "if the queue is full, drop silently." That smells.
Change it + print: add a log at the enqueue point and try to reproduce. You watch the queue overflow under a traffic spike and some emails get dropped.

Thirty minutes ago the code was half a million lines of the unknown; now you have pinned the cause by reading only the few dozen lines that mattered. Had you tried to read every line, the whole first day would be gone.
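
The failure mode in this walkthrough can be reproduced in miniature with the standard-library queue module. The function names, the queue capacity, and the email addresses are all hypothetical; the sketch contrasts the silent drop found in the tests with the logged version added while investigating.

```python
import logging
import queue
from typing import List

logger = logging.getLogger("email_queue")


def enqueue_silently(q: queue.Queue, email: str) -> bool:
    """The suspicious behavior: a full queue drops the email without a trace."""
    try:
        q.put_nowait(email)
        return True
    except queue.Full:
        return False


def enqueue_with_log(q: queue.Queue, email: str) -> bool:
    """Same logic, plus the log line added at the enqueue point."""
    try:
        q.put_nowait(email)
        return True
    except queue.Full:
        logger.warning("email queue full (maxsize=%d); dropping %s", q.maxsize, email)
        return False


def simulate_spike(n_emails: int, capacity: int) -> List[str]:
    """Enqueue n_emails with no consumer running; return the dropped ones."""
    q: queue.Queue = queue.Queue(maxsize=capacity)
    dropped: List[str] = []
    for i in range(n_emails):
        email = f"order-{i}@example.com"
        if not enqueue_with_log(q, email):
            dropped.append(email)
    return dropped


logging.basicConfig(level=logging.WARNING)
print(simulate_spike(n_emails=5, capacity=3))
```

With five emails and room for three, the last two are dropped. With the log line in place, the drop shows up in the output instead of vanishing.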

Wrapping Up

Code reading isn't a talent — it's a skill, and being a skill, it has methods. Gathering the core principles again:

Don't open a random file; find the entry point first.
Follow only the path the data flows along, not every line.
Use tests and types as documentation that doesn't lie.
Draw the call graph with grep and IDE jumps.
When static reading stalls, run it and add prints.
When you're unsure, change something and see what breaks.
Top-down for the big picture first, then depth.

The next time you're staring blankly at an unfamiliar codebase, resist the urge to open the first file and read from the top. Instead, ask: "Where's the entry point? What data will I follow?" Those two questions are the first trail you cut through the unknown forest.

References

Diomidis Spinellis, Code Reading: The Open Source Perspective: https://www.spinellis.gr/codereading/
Dustin Boswell & Trevor Foucher, The Art of Readable Code: https://www.oreilly.com/library/view/the-art-of/9781449318482/
ripgrep User Guide: https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md
Google Engineering Practices, Code Review Developer Guide: https://google.github.io/eng-practices/review/reviewer/