- Introduction — Debugging Is Not Guessing
- The Scientific Method — Hypothesis, Prediction, Experiment, Observation
- Binary-Search the Problem Space
- Minimal Reproducible Example — Caging the Bug
- git bisect — Which Commit Is the Culprit?
- print vs Debugger vs Logging — Tools of Observation
- Actually Read the Error
- "It's Never the Compiler"
- Rubber Duck — Explaining It Out Loud
- Putting It Together — One Bug, All the Way Through
- Wrapping Up
- References
Introduction — Debugging Is Not Guessing
A bug shows up. Payments occasionally fail in production. You can't reproduce it, the logs are vague, and the deadline is today. Here's what a lot of developers do next: stare at the code, change a line that "looks suspicious," run it again, and if that doesn't work, change another line, and another. I call this shotgun debugging — spray bullets everywhere and hope one hits.
The problem is that it occasionally works. So it becomes a habit. But a bug fixed by luck is a bug you don't understand, and a bug you don't understand will come back. Worse, in the process of "fixing" it, you've touched three other lines of perfectly good code.
The claim of this post is simple: debugging is a science. The way a scientist understands nature — form a hypothesis, predict what would happen if it were true, test it, and revise the hypothesis based on what you observe — maps directly onto code. You catch bugs by method, not by luck.
The Scientific Method — Hypothesis, Prediction, Experiment, Observation
The loop you run when you hit a bug is precisely the scientific method.
- Observe: What is happening? What exactly is wrong?
- Hypothesize: Propose one testable explanation for why.
- Predict: If that hypothesis is true, then if I do X, I should observe Y.
- Experiment: Do X.
- Observe: Did Y actually happen? If yes, the hypothesis gets stronger; if no, discard or revise it.
The crucial step is step 3, prediction. This is where good debugging and bad debugging part ways. Before you change any code, you should be able to say out loud, "if I change this, here is what I expect to happen." Changing code without a prediction isn't an experiment — it's just gambling.
Let's make it concrete. Start from the observation "payments occasionally fail."
- Bad approach: "Maybe it's a timeout. Let's bump the timeout." (No prediction, no idea why.)
- Good approach: "Hypothesis: the card processor's API sometimes takes longer than 2 seconds, and our client cuts it off at 2. Prediction: if that's true, failed requests must show a timeout error in the logs, and successful requests should all be under 2 seconds. Experiment: pull the response-time distribution for failed vs successful requests."
The second approach teaches you something even when the experiment fails. If the failed requests turn out to be 500 errors, not timeouts, your hypothesis was wrong — but now you know the cause is your own server, not the processor. A wrong hypothesis still gives you information. That's the power of the method.
A hypothesis must be falsifiable. "The network is being weird" can't be tested, so it isn't a hypothesis. "Only requests from a specific region fail" can be tested, so it is.
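To make the "good approach" concrete, here is a minimal Python sketch of that experiment. The record fields (`status`, `elapsed_ms`), the sample data, the 100 ms tolerance, and the 2000 ms client timeout are all assumptions for illustration; real numbers would come from your own logs.

```python
CLIENT_TIMEOUT_MS = 2000  # assumed client-side timeout from the hypothesis


def check_timeout_hypothesis(requests, timeout_ms=CLIENT_TIMEOUT_MS, tolerance_ms=100):
    """Return True if the data is consistent with the timeout hypothesis.

    Prediction 1: every failed request ended within `tolerance_ms` of the timeout.
    Prediction 2: every successful request finished under the timeout.
    """
    failed = [r["elapsed_ms"] for r in requests if r["status"] == "failed"]
    succeeded = [r["elapsed_ms"] for r in requests if r["status"] == "ok"]
    failures_at_timeout = all(abs(ms - timeout_ms) <= tolerance_ms for ms in failed)
    successes_under_timeout = all(ms < timeout_ms for ms in succeeded)
    return bool(failed) and failures_at_timeout and successes_under_timeout


# Made-up sample data, not real measurements.
sample = [
    {"status": "ok", "elapsed_ms": 420},
    {"status": "ok", "elapsed_ms": 1310},
    {"status": "failed", "elapsed_ms": 2003},
    {"status": "failed", "elapsed_ms": 2011},
]
print(check_timeout_hypothesis(sample))  # True
```

A `False` here is just as useful as a `True`: it tells you the failures are not clustered at the timeout and the hypothesis needs revising.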
Binary-Search the Problem Space
The single most powerful technique for catching bugs is binary search. It isn't just for finding a value in a sorted array — you use it to find the cause of a problem.
Here's the principle. Somewhere between where the bug appears (the symptom) and where the code is fine (the input), the cause is hiding. Pick a midpoint between them and ask, "is everything still fine here?" One experiment cuts the search space in half. Repeat, and even a thousand lines of code collapse to the culprit in about ten experiments, since 2 to the 10th is 1024. In general, N candidates take about log2(N) experiments.
Concretely, there are several axes you can binary-search along.
- Code path: if a request goes A → B → C → D, print the value at C. If it's fine, the cause is between C and D; if it's wrong, it's between A and C.
- Time (commit history): it worked yesterday but not today? Binary-search the commits in between. (This is exactly git bisect, coming up next.)
- Input data: crashing on a 100k-line input? Cut it to 50k and see if it still reproduces. If it does, the bad data is in the first half; if not, the second.
- Config / dependencies: turn off half your feature flags, or remove half your dependencies, and see which half breaks.
Binary search is powerful because each experiment is designed to yield maximum information. In information-theoretic terms, a question that splits the space in half extracts one full bit per experiment. Poking at a random spot yields far less.
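The input-data axis above can be sketched in a few lines of Python. This is a simplified illustration, not a general tool: it assumes exactly one bad line, and that `crashes` (a hypothetical stand-in for "run the program on this chunk and see whether it fails") depends only on that line being present.

```python
def find_bad_line(lines, crashes):
    """Binary-search for the single line that makes `crashes(chunk)` return True.

    Returns (index_of_bad_line, number_of_experiments).
    """
    lo, hi = 0, len(lines)          # invariant: the bad line is in lines[lo:hi]
    experiments = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        experiments += 1
        if crashes(lines[lo:mid]):  # reproduces with the first half alone
            hi = mid
        else:                       # so the bad line must be in the second half
            lo = mid
    return lo, experiments


# Made-up input: 100k lines with one poisoned record.
lines = ["ok"] * 100_000
lines[73_125] = "BAD"
index, experiments = find_bad_line(lines, lambda chunk: "BAD" in chunk)
print(index)  # 73125
```

For 100,000 lines the loop needs at most 17 experiments, which is the log2(N) behavior described above.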
Minimal Reproducible Example — Caging the Bug
"I can't reproduce it" is the most common wall in debugging. The decisive tool here is the minimal reproducible example (MRE).
An MRE is the smallest chunk of code that triggers the bug. The act of building one is debugging. You strip away everything in your application that's unrelated to the bug, one piece at a time, until you reach the smallest state that still reproduces it. During this process, one of two things happens.
- You remove something and the bug disappears → what you just removed is related to the cause. You've found your suspect.
- You remove everything and are left with 20 lines that still reproduce it → now you only have to stare at those 20 lines. It just became infinitely more tractable.
Rules for building an MRE:
- Remove external dependencies: replace the DB, network, and file system with hardcoded values. If the bug persists, the outside world isn't the cause.
- Minimize the data: if 1 of 100 records reproduces it, keep only that one.
- Make it self-contained: someone should be able to copy-paste it and run it as-is. When that works, asking a colleague — or filing an issue — becomes easy.
You'll often find the bug vanishes while you're building the MRE. That's not a failure. It means you passed over the cause while removing things; undo the last removal and the culprit is revealed.
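The stripping-away process can be partly automated. The sketch below is a deliberately naive greedy reduction (not full delta debugging): it drops one record at a time and keeps the removal only if the bug still reproduces. The `reproduces` predicate and the sample records are hypothetical.

```python
def minimize(records, reproduces):
    """Greedily remove records while `reproduces(records)` stays True."""
    if not reproduces(records):
        raise ValueError("start from an input that reproduces the bug")
    current = list(records)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if reproduces(candidate):  # record i was irrelevant: leave it out
                current = candidate
                changed = True
                break
            # Otherwise removing record i hid the bug, so it is a suspect: keep it.
    return current


# Hypothetical bug: processing fails whenever any amount is negative.
records = [{"amount": 10}, {"amount": 25}, {"amount": -5}, {"amount": 40}]
print(minimize(records, lambda rs: any(r["amount"] < 0 for r in rs)))
# [{'amount': -5}]
```

The same loop works on lines of a file, config keys, or test steps, as long as you can phrase "does it still reproduce?" as a function.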
git bisect — Which Commit Is the Culprit?
"It worked last week, and now it doesn't." The tool for this situation is git bisect. It's a hidden gem of git that runs a binary search over your commit history automatically.
The principle is the binary search from above, applied to time. You tell it an old commit that was fine (good) and the current commit that's broken (bad), and git checks out the commit halfway between them. You test whether the bug is present and report the result; git takes you to the midpoint of the remaining range. Even with 1000 commits, about ten steps narrow it to a single culprit.
# Start the binary search
git bisect start
# The current commit has the bug
git bisect bad
# This commit from 3 weeks ago was fine
git bisect good v1.4.0
# git now checks out a commit in the middle.
# Test the bug here, and depending on the result:
git bisect good # this commit is fine
# or
git bisect bad # this commit has the bug too
# Repeat, and git points to the culprit:
# "abc1234 is the first bad commit"
# When done, return to where you started
git bisect reset
The real magic is automation. If you have a script that judges the bug (for example, a test that returns a non-zero exit code on failure), git bisect run drives the whole process with no human in the loop.
git bisect start
git bisect bad
git bisect good v1.4.0
# exit 0 from the script means good; any code from 1 to 127 except 125 means bad
# exit 125 means "skip this commit" (for example, it doesn't build)
git bisect run ./test-for-bug.sh
Once the run finishes (a few seconds if the test is fast), the culprit commit appears. Look at that commit's diff and the cause is usually obvious. The fact that git bisect works best when commits are small and each one builds is yet another reason to keep commits granular.
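The script handed to git bisect run can be written in any language, as long as its exit code follows git's convention. Here is a hedged Python sketch of the decision logic. The function names are made up for illustration, and the build and test commands in the comment at the end are placeholders.

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes that git bisect run interprets


def verdict(build_rc, test_rc):
    """Map build/test exit statuses to a git bisect run verdict."""
    if build_rc != 0:
        return SKIP          # this commit cannot be tested: ask git to skip it
    return GOOD if test_rc == 0 else BAD


def check_commit(build_cmd, test_cmd):
    """Build the checked-out commit, run the test, and return the verdict."""
    build_rc = subprocess.run(build_cmd).returncode
    if build_rc != 0:
        return verdict(build_rc, 0)
    return verdict(0, subprocess.run(test_cmd).returncode)


# A real script would end with something like (commands are placeholders):
# sys.exit(check_commit(["make", "build"], ["pytest", "tests/test_payment.py"]))
```

Saved under a name of your choosing, say `check_commit.py`, it would be run as `git bisect run python check_commit.py`. Returning 125 for commits that don't build keeps an unrelated build break from being blamed as the first bad commit.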
If you want to get git bisect and other git workflows into your fingers, you can safely practice commits, branches, and bisecting in the Git Playground.
print vs Debugger vs Logging — Tools of Observation
To test a hypothesis, you have to observe the system's internals. There are three main tools of observation. None is the one right answer — you pick based on the situation.
print Debugging
The most primitive and most underrated tool. You drop a print (or console.log) and look at the value with your own eyes. It's easy to dismiss, but it's genuinely powerful.
- Pros: works everywhere, needs no setup. You see how a value flows over time at a glance (often faster for grasping flow than stepping through a debugger). Especially useful in async, multithreaded, or distributed environments where pausing with a debugger is hard.
- Cons: you have to touch the code. Forget to remove them and your logs get messy. May require a recompile/redeploy.
A tip for printing: label what you print. print("after validation, x =", x) beats print(x) by a mile. If several values matter, print them together so you can see the correlation.
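As a small illustration of the labeling tip, the Python sketch below contrasts a bare print with labeled ones. The `label` helper and the order data are invented for the example; the `{x=}` form needs Python 3.8 or later.

```python
def label(where, **values):
    """Format named values on one line so related values can be compared."""
    return f"{where}: " + " ".join(f"{name}={value!r}" for name, value in values.items())


def order_total(order):
    total = sum(item["price"] * item["qty"] for item in order["items"])
    discount = order.get("discount", 0)
    # Print the related values together, each with its name.
    print(label("after totals", total=total, discount=discount, net=total - discount))
    return total - discount


x = 42
print(x)                            # 42 (but 42 what, and where?)
print("after validation, x =", x)   # after validation, x = 42
print(f"after validation, {x=}")    # after validation, x=42

order_total({"items": [{"price": 20, "qty": 2}, {"price": 10, "qty": 1}], "discount": 5})
# after totals: total=50 discount=5 net=45
```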
Debugger
Set a breakpoint, pause execution, inspect the entire state at that instant (every variable, the call stack), and step through one line at a time.
- Pros: no code changes. You can see everything at the paused point. Conditional breakpoints ("only stop when i == 4821"), step in/over, call-stack inspection, even modifying variables mid-run. Unbeatable for understanding complex state or deep call stacks.
- Cons: needs setup. Async or timing-dependent bugs can vanish the moment you pause, because pausing changes the conditions (a "heisenbug"). Usually unavailable in production.
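Most debuggers let you set a conditional breakpoint from the UI without touching the code. Where that isn't convenient, Python's built-in `breakpoint()` behind an `if` gives the same effect at the cost of a temporary code change. The function and the doubling logic below are placeholders; only the index 4821 comes from the example above.

```python
def process(records, suspect_index=4821):
    """Process records, pausing in the debugger only at the suspicious iteration."""
    results = []
    for i, record in enumerate(records):
        if i == suspect_index:
            breakpoint()        # drops into pdb here, and only here
        results.append(record * 2)
    return results


# With fewer than 4822 records the condition never fires, so this runs straight through.
print(process([1, 2, 3]))  # [2, 4, 6]
```

Inside pdb itself, the equivalent without editing the file is a conditional `break`, of the form `break <lineno>, i == 4821`.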
Logging
The grown-up version of print. You record structured logs permanently, with levels (DEBUG/INFO/WARN/ERROR).
- Pros: runs continuously in production. You can investigate past events after the fact (a debugger only shows what's happening right now). Levels control the noise, and structure makes logs searchable and aggregatable.
- Cons: you have to instrument ahead of time. If there's no log at the exact spot you need, it's useless. High volume becomes cost and noise.
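A minimal sketch of leveled logging with Python's standard `logging` module follows. The logger name `payments`, the message fields, and the values are assumptions for illustration, and the timestamp is left out of the format only to keep the output short.

```python
import io
import logging


def make_logger(stream, level=logging.INFO):
    """Build a logger that writes 'LEVEL name message' lines to `stream`."""
    logger = logging.getLogger("payments")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(level)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = make_logger(stream)
log.debug("raw response size=%d", 2048)  # dropped: below the INFO level
log.info("charge ok order_id=%s elapsed_ms=%d", "A-1001", 412)
log.error("charge failed order_id=%s elapsed_ms=%d reason=%s", "A-1002", 2003, "timeout")
print(stream.getvalue())
# INFO payments charge ok order_id=A-1001 elapsed_ms=412
# ERROR payments charge failed order_id=A-1002 elapsed_ms=2003 reason=timeout
```

Consistent `key=value` fields are what make the logs searchable later: filtering on `reason=timeout` is exactly the kind of query the payment investigation needs.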
In short: to dig into complex state in a reproducible local bug, use a debugger; to skim the flow quickly or in async/distributed settings, use print; to investigate past events in production, use logging. The people who move fluidly between all three are the ones who really catch bugs.
Actually Read the Error
This seems too obvious, yet it's astonishingly often ignored: read the error message from top to bottom, for real. Don't skim it, don't dismiss it, don't go "oh, that thing again" and move on — actually read it.
The error message and stack trace are the bug's confession of where and why it died. Yet many developers close their eyes the instant they see red text and switch into guessing mode — even though the answer is written right there.
What to extract when you read:
- The exact exception type and message: a NullPointerException and an IndexOutOfBoundsException are completely different stories. Every word of the message is a clue.
- File and line number: tells you exactly where it blew up.
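- The stack trace: in a Java-style trace, the top is where it blew up; going down is the chain of calls that led there. (Python prints its traceback the other way around, with the most recent call last.) Find the first line that's "your code" (that's usually the real starting point, rather than deep inside a library).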
- The "Caused by" chain: the true root cause is often hidden behind the last "caused by" at the bottom.
Simply copying the exact wording of the message and searching for it solves half of these. Just strip out anything environment-specific — a file path or an ID — before you search.
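The "Caused by" chain also exists in Python, where chained exceptions are linked through `__cause__` and `__context__`. The sketch below walks that chain to the original error. `PaymentError`, `charge`, and the message text are invented for the example.

```python
def next_in_chain(exc):
    """Return the exception that led to `exc`, or None at the root."""
    if exc.__cause__ is not None:        # set by `raise ... from err`
        return exc.__cause__
    if not exc.__suppress_context__:     # implicit chaining inside an except block
        return exc.__context__
    return None


def root_cause(exc):
    """Follow the chain down to the first exception that was raised."""
    while (cause := next_in_chain(exc)) is not None:
        exc = cause
    return exc


class PaymentError(Exception):
    """Hypothetical application-level error."""


def charge():
    try:
        raise TimeoutError("processor did not answer within 2000 ms")
    except TimeoutError as err:
        raise PaymentError("charge failed") from err


try:
    charge()
except PaymentError as err:
    cause = root_cause(err)
    print(f"{type(err).__name__} caused by {type(cause).__name__}: {cause}")
# PaymentError caused by TimeoutError: processor did not answer within 2000 ms
```

The outer message ("charge failed") says almost nothing. The root of the chain is where the useful wording lives, which is why reading all the way to the end pays off.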
"It's Never the Compiler"
There's an old saying in the debugging world: "It's never the compiler." It's almost never the compiler's fault (nor the interpreter's, the runtime's, the standard library's, or a well-known framework's).
When you're stuck and frustrated, a thought starts to creep in. "Is this a language bug?" "Did the compiler optimize this wrong?" "Is this library broken?" It could be. But probabilistically, a mature tool is something millions of people hammer on and validate every day. Between your 100 lines of brand-new code and a compiler that has run hundreds of millions of times over ten years, which is more likely to be the one with the bug?
This matters practically because the moment you conclude "it's the tool's fault," you stop searching. The real cause is almost certainly somewhere in your assumptions, and blaming the tool means you never look there. So make your default posture "the culprit is in my code, my assumptions, my understanding." On the very rare occasion it really is a tool bug, that conclusion comes last — only after you've exhausted your own side.
A common trap in the same spirit: when a regular expression "just won't match," it's usually not the regex engine — it's your pattern. Rather than guessing in your head, drop the pattern and the input into a Regex Tester and observe what it actually matches. It's a hundred times faster.
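The same observation can be made in a few lines with Python's `re` module. The pattern below is a made-up example of a typical mistake: an unescaped dot, which matches any character rather than a literal period.

```python
import re


def check(pattern, texts):
    """Report, for each text, whether the whole string matches the pattern."""
    return {text: bool(re.fullmatch(pattern, text)) for text in texts}


samples = ["12.34", "12x34", "1234"]

# Intended: a decimal number like 12.34. The dot is unescaped, so it matches anything.
print(check(r"\d+.\d+", samples))
# {'12.34': True, '12x34': True, '1234': True}

# With the dot escaped, only the real decimal matches.
print(check(r"\d+\.\d+", samples))
# {'12.34': True, '12x34': False, '1234': False}
```

The engine did exactly what the pattern said. Seeing the unexpected `True` values is what exposes the wrong assumption.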
Rubber Duck — Explaining It Out Loud
The last technique sounds silly but genuinely works: rubber duck debugging. Put a rubber duck on your desk and explain the problem to it, out loud, line by line.
Why does it work? When you read code with your eyes, your brain automatically skips over "well, this part is obviously correct." The bug is hiding inside that skipped assumption. But to explain it to someone (the duck), you have to put the assumption into words — and the moment you do, you catch yourself: "wait, is that actually true?" Being forced to explain turns implicit assumptions explicit.
This is the real mechanism behind asking a colleague for help and then going "oh, never mind, I just got it" and hanging up. The colleague said nothing. You found it yourself while preparing the explanation. The duck is free and infinitely patient, so explain it to the duck before you summon a human.
To boost the effect, explain very specifically, from the very basics. Not "this function gets the user and..." but "this function takes a user_id integer as an argument, fires this query at the DB, maps the first row of the result into this object..." The more specific you are, the more the hidden assumptions pop out.
Putting It Together — One Bug, All the Way Through
Let's weave the pieces into a single flow. Back to the original bug: "payments occasionally fail."
- Read: actually read the failed request's error log to the end. PaymentGatewayTimeoutException is at the bottom of the stack trace, behind "caused by."
- Hypothesis: when the card processor's API is slow, we cut it off first.
- Prediction: if true, failed requests should all have died near our timeout value.
- Observe: pull the response times of failed requests. All right around 2000ms. The hypothesis gets stronger.
- Binary-search (time): ask "since when?" Run git bisect. It points to the commit that reduced the timeout from 5 seconds to 2.
- Minimal reproduction: reproduce locally with a mock that delays the processor by 2.5 seconds. The failure reproduces.
- Fix and verify: raise the timeout or add a retry. Prediction: with this, it should succeed even with the mock delay in place. Experiment. It succeeds.
Every step had a hypothesis and a prediction, and each experiment narrowed the search space. Not a single drop of luck. And above all, you know exactly why it's fixed. That's what it means to debug like a scientist.
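The minimal-reproduction and fix-and-verify steps above might look like the following Python sketch. Everything here is a stand-in: `slow_processor` is a mock, `charge` is a hypothetical client call, and the timings are scaled down tenfold (0.25 s for the 2.5 s delay, 0.2 s and 0.5 s for the 2 s and 5 s timeouts) so the experiment runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

SCALE = 0.1  # shrink the real timings so the experiment takes well under a second


def slow_processor(delay_s):
    """Mock card processor that answers after `delay_s` seconds."""
    time.sleep(delay_s)
    return "approved"


def charge(delay_s, timeout_s):
    """Call the mock processor, giving up after `timeout_s` seconds."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_processor, delay_s)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            return "timeout"


# Reproduction: a 2.5 s processor against the 2 s timeout fails.
print(charge(2.5 * SCALE, 2 * SCALE))  # timeout
# Verification: the same delay against the old 5 s timeout succeeds.
print(charge(2.5 * SCALE, 5 * SCALE))  # approved
```

Both lines are predictions made before running the code, which is what separates this from changing the timeout and hoping.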
Wrapping Up
The difference between people who are good at debugging and people who aren't is not how much they know — it's method. The good ones don't stare at code and guess. They observe, form a falsifiable hypothesis, make a prediction, cut the problem space in half, and actually listen to what the error is telling them.
The next time you hit a bug, ask yourself before you change any code: "What's my hypothesis? If I change this, what do I predict will happen?" If you can't answer those two questions, you're not ready to experiment yet. The moment you can, you stop being a gambler and become a scientist.
References
- Andrew Hunt & David Thomas, The Pragmatic Programmer (the debugging chapter): https://pragprog.com/titles/tpp20/the-pragmatic-programmer-20th-anniversary-edition/
- David Agans, Debugging: The 9 Indispensable Rules: https://debuggingrules.com/
- git bisect official docs: https://git-scm.com/docs/git-bisect
- "How to create a Minimal, Reproducible Example" (Stack Overflow): https://stackoverflow.com/help/minimal-reproducible-example
- Rubber Duck Debugging: https://rubberduckdebugging.com/