Concurrency vs Parallelism — Dealing With vs Doing Many Things

Introduction — Two Words That Keep Getting Mixed Up

"Concurrency" and "parallelism" are the pair of words developers confuse most often, because both give the impression that "several things are happening at once." But they are different concepts, and leaving the difference fuzzy sends you hunting for both performance problems and bugs in the wrong direction.

The clearest definition comes from Rob Pike, one of the designers of the Go language.

Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.

Those two sentences hold the whole point. Concurrency is about structure: how you break work into pieces and coordinate them. Parallelism is about execution: whether multiple computations actually run at the same instant. Concurrency can exist without parallelism (even a single-core machine can deal with several tasks by interleaving them), and parallelism is just one way to execute a concurrent design.

This post starts from that distinction and maps out the core terrain of concurrent programming: threads and event loops, race conditions and locks, deadlock and atomic operations. If you want to watch an event loop actually interleave tasks, open the asyncio tab in the Message Queue Playground; to see a computation graph where many operations flow in parallel, open the Neural Net Lab alongside.

Understanding It Through a Café

Let us move Pike's definition into an everyday scene.

A café has one barista. Three orders come in. While pulling the espresso for the first order, the barista steams milk for the second, and in between prepares the cup for the third. One person, but dealing with three orders. That is concurrency: not literally doing two things with two hands at the same instant, but moving between tasks by using the waiting time.

Now hire three baristas. Each takes one order and truly makes it at the same time. That is parallelism. Three drinks are genuinely being made at the same instant.

The key insight is this. Concurrency is a way of structuring a problem into independently executable pieces; parallelism is actually distributing those pieces across multiple workers to run at the same time. A well-designed concurrent structure interleaves when there is one worker and scales out to run in parallel when there are many.

Threads vs Event Loops — Two Execution Models

There are two dominant ways to implement concurrency in code: the thread-based model and the event loop (async) model.

The thread model has the operating system create multiple execution flows (threads), and a scheduler assigns them to cores. With multiple cores, threads can run truly in parallel. Each thread has its own stack and proceeds independently, and the OS can pause one thread and switch to another at any time (preemptive scheduling).

The event loop model has a single thread that interleaves multiple tasks (coroutines, tasks) within it. It is exactly like the café barista. The moment a task waits on I/O (await), it is set aside and another task proceeds. Because the programmer marks the switch points explicitly, this is cooperative scheduling.

Thread model (preemptive)
  Thread A ████░░░░████░░░░       OS switches at arbitrary points
  Thread B ░░░░████░░░░████       true parallelism with multiple cores

Event loop model (cooperative)
  single thread  A-await-B-await-A-await-C ...
                 switches only where a task yields on its own

The fundamental difference between the two models is "when does the switch happen." Threads are switched by the OS at any moment. That enables true parallel execution, but for exactly that reason it is dangerous when touching shared data. The event loop switches only at explicit points like await, so it is predictable — but if one task refuses to yield and holds the CPU with computation, everything else stalls.
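A minimal sketch of cooperative switching, using Python's asyncio. The names worker and log are illustrative only. Each coroutine records a step and then yields with await asyncio.sleep(0), so the recorded order shows the single thread alternating between the two tasks exactly at the yield points.

```python
import asyncio

log = []  # records the order in which steps actually ran

async def worker(name, steps):
    for i in range(steps):
        log.append(f"{name}{i}")
        await asyncio.sleep(0)   # explicit yield point: hand control back to the loop

async def main():
    # Both workers share one thread; the loop resumes them in turn.
    await asyncio.gather(worker("A", 3), worker("B", 3))

asyncio.run(main())
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

Removing the await line would let worker A run all of its steps before B starts at all, which is the "one task refuses to yield" case in miniature.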

Race Conditions and Data Races

At the heart of why concurrent programming is hard sits the race condition: multiple execution flows access a shared resource, and the outcome depends on the order (timing) in which they run.

The most classic example is incrementing a counter. The single line counter += 1 is actually three steps.

1. read the value of counter    (say, 10)
2. add 1 to that value          (11)
3. write the result to counter  (11)

If two threads execute these three steps at the same time, this can happen:

Thread A: read(10) ....... add(11) write(11)
Thread B: ........ read(10) ....... add(11) write(11)
result: counter = 11  (incremented twice, but 11!)

Two additions, but the result is 11. One increment vanished. When multiple threads access the same memory without synchronization and at least one of them writes, the situation is specifically called a data race. A data race is one kind of race condition, and in languages such as C and C++ it is undefined behavior.

Reproducing this in Python:

import threading

counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1   # not atomic — a race can occur

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # may print less than 400000; how often depends on the CPython version

What makes race conditions frightening is their low reproducibility. Most of the time the order happens to line up and everything works, and it goes wrong only under high load or a specific timing. So it becomes the infamous kind of bug that never shows up in tests and only blows up in production.
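Because the bad timing is rare in practice, it can help to force it. The sketch below is an illustration, not a debugging technique from the original: it splits the increment into its read and write steps and uses a threading.Barrier (the name both_have_read is made up here) so that both threads finish reading before either one writes.

```python
import threading

counter = 0
both_have_read = threading.Barrier(2)   # releases only once 2 threads are waiting

def racy_increment():
    global counter
    value = counter            # step 1: read
    both_have_read.wait()      # force the bad timing: nobody writes until both have read
    counter = value + 1        # steps 2 and 3: add, then write

threads = [threading.Thread(target=racy_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # prints 1, not 2: one increment was lost
```

This is the same interleaving as the Thread A / Thread B trace above, made deterministic.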

Locks and Mutexes — Enforcing Order

The basic tool for preventing race conditions is the lock, and specifically the mutex (mutual exclusion). The idea is simple: allow only one execution flow at a time into the code region that touches the shared resource (the critical section).

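import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:          # only one thread inside this block at a time
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # now the result is always 400000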

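To enter the with lock block a thread must acquire the lock, and while one thread holds it, the others wait until it is released. As a result the three steps of read-add-write run as one indivisible unit: no other thread that takes the same lock can slip in between them. The protection holds only if every piece of code that touches counter acquires that lock.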

Locks are powerful but not free. They come with three costs.

  • Performance: acquiring and releasing a lock has a cost, and when multiple threads fight over the same lock (contention), parallelism disappears as they queue up and wait.
  • Granularity choice: a coarse-grained lock is safe but reduces parallelism; a fine-grained lock gives good parallelism but is complex and bug-prone.
  • Deadlock risk: with multiple locks, you can create a deadlock where each waits forever for the other's lock.

Deadlock and the Dining Philosophers

A deadlock is a state where two or more execution flows each wait for a resource the other holds, and no one can make progress. The classic problem that best illustrates it is the dining philosophers, posed by Edsger Dijkstra.

Five philosophers sit at a round table. In front of each is a plate of spaghetti, and between the philosophers lies one fork each — five in total. To eat, a philosopher needs both the left and right forks.

        philosopher0
     fork4       fork0
  philosopher4        philosopher1
     fork3       fork1
        philosopher3  fork2  philosopher2

  each philosopher needs both adjacent forks to eat

Now suppose every philosopher follows the same rule: "first pick up the left fork, then pick up the right fork." If everyone picks up the left fork at the same time, all five forks are each held by one hand. Now everyone tries to pick up the right fork, but the right fork is in the neighbor's left hand. No one gets a second fork, and no one puts down their first. Deadlock forever.
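The scenario can be reproduced as a small sketch. Several things here are assumptions for the demo: which adjacent fork counts as "left", the two barriers that force everyone to grab their first fork at the same moment, and the timeout that stands in for "waiting forever" so the program can finish.

```python
import threading

N = 5
forks = [threading.Lock() for _ in range(N)]
all_hold_left = threading.Barrier(N)     # wait until every left fork is taken
all_tried_right = threading.Barrier(N)   # wait until every right-fork attempt is over
got_right = [None] * N

def philosopher(i):
    # Per the diagram, philosopher i sits between fork (i-1) and fork i.
    left, right = forks[i], forks[(i - 1) % N]
    with left:
        all_hold_left.wait()
        # The timeout replaces an endless wait so the demo terminates.
        got_right[i] = right.acquire(timeout=0.2)
        all_tried_right.wait()
        if got_right[i]:
            right.release()

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(got_right)  # [False, False, False, False, False]
```

Every attempt on the second fork fails, because that fork is already someone else's first fork. Without the timeout, all five threads would block permanently.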

For a deadlock to arise, four conditions (the Coffman conditions) must all hold simultaneously.

  • Mutual exclusion: a resource can be used by only one at a time (forks cannot be shared).
  • Hold and wait: holding one resource while waiting for another (holding the left fork, waiting for the right).
  • No preemption: you cannot forcibly take a resource someone else holds.
  • Circular wait: the chain of waiting forms a cycle.

Breaking just one of these prevents deadlock. A classic solution is to break the circular wait. For example, number the forks and impose the rule "always pick up the lower-numbered fork first." Then exactly one philosopher, the one seated between the highest- and lowest-numbered forks (philosopher0 in the diagram), picks up in the opposite order from everyone else. The cycle is broken, and the deadlock disappears.
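A minimal sketch of the lower-numbered-fork-first rule, with forks modeled as threading.Lock objects. The constant MEALS and the list meals_eaten are illustrative names. Each philosopher sorts their two adjacent fork numbers and always locks the smaller one first.

```python
import threading

N = 5
MEALS = 100
forks = [threading.Lock() for _ in range(N)]
meals_eaten = [0] * N

def philosopher(i):
    # Adjacent forks per the diagram: fork (i-1) and fork i.
    a, b = (i - 1) % N, i
    first, second = min(a, b), max(a, b)   # always the lower-numbered fork first
    for _ in range(MEALS):
        with forks[first]:
            with forks[second]:
                meals_eaten[i] += 1        # each slot is written by one thread only

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(meals_eaten)  # [100, 100, 100, 100, 100]
```

Because every thread acquires locks in the same global order, no cycle of waiting can form, so the program always runs to completion.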

Atomic Operations — Safe Without Locks

Locks are heavy. Grabbing and releasing a lock just to safely bump a single counter is overkill. So there are atomic operations, supported directly by the hardware.

An atomic operation is one the CPU guarantees as "a single, indivisible action." It executes the read-add-write we saw earlier as one instruction that no other thread can interrupt mid-way. The most representative one is CAS (Compare-And-Swap).

CAS(address, expected, new):
    if *address == expected:
        set *address = new and return success
    else:
        do nothing and return failure
    — all of this happens atomically

With CAS you can build a safe increment without a lock: "read the current value; if it is still the same, swap it for value + 1; if someone changed it in the meantime, we fail, so retry." This retry loop is a basic building block of lock-free programming.
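The retry loop can be sketched in Python, with one loud caveat: Python's standard library exposes no CAS instruction, so the hypothetical AtomicInt class below emulates the hardware guarantee with a small private lock. Only the shape of the increment function is the point; in a language with real atomics, compare_and_swap would be a single CPU instruction.

```python
import threading

class AtomicInt:
    """Hypothetical stand-in for a hardware atomic integer (emulated with a lock)."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:              # emulates "all of this happens atomically"
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(cell):
    while True:                        # the CAS retry loop
        current = cell.load()
        if cell.compare_and_swap(current, current + 1):
            return                     # nobody changed the value in between
        # Another thread won the race: re-read and try again.

counter = AtomicInt()

def work():
    for _ in range(10_000):
        increment(counter)

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.load())  # 40000
```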

The advantage of atomic operations is that they avoid the cost of acquiring and releasing a lock and cannot deadlock, although a CAS loop under heavy contention still spends time on retries. They are not a cure-all. They are excellent for a simple single value like a counter or a flag, but in complex cases where several data structures must change consistently all at once, the logic is hard to express with atomic operations alone, and misused they invite subtle bugs (such as the ABA problem). So in practice you split the roles: atomic operations for simple cases, locks for complex ones.

When Async Wins, and When Threads Win

Now the most practical question. When you need concurrency, do you use async (an event loop) or threads (or multiprocessing)? The answer depends on the nature of the work.

Work falls broadly into two categories.

  • I/O-bound: work that spends most of its time waiting for something. Network responses, disk reads, database queries. The CPU is mostly idle.
  • CPU-bound: work that spends most of its time on actual computation. Image processing, encryption, numerical simulation. The CPU never rests.

This distinction decides the choice.

For I/O-bound work, async wins. Because there is a lot of waiting, a single thread can handle other tasks in the gaps of that waiting. Handling thousands of connections with one event loop uses far less memory and has lower switching cost than creating thousands of threads. "Lots of waiting" work like web servers, proxies, and crawlers fits here perfectly.

import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)   # another coroutine runs during the wait
    return name

async def main():
    # three tasks concurrently — close to the max, not the sum, of the waits
    results = await asyncio.gather(
        fetch("a", 2), fetch("b", 1), fetch("c", 3),
    )
    print(results)

asyncio.run(main())

For CPU-bound work, you need true parallel execution: the work must be distributed across multiple cores, using threads or multiple processes. Computation offers no gap to yield, so on an event loop, one task holding the CPU stalls the rest. Here lies a Python-specific trap. Standard CPython builds have the GIL (Global Interpreter Lock), so even with multiple threads, only one thread executes Python bytecode at a time. (Python 3.13 introduced an experimental free-threaded build without the GIL, but it is not the default.) To run CPU-bound work in true parallel in Python, you therefore typically use multiprocessing (multiple processes), not threads.
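A minimal sketch of spreading CPU-bound work across processes with the standard library's concurrent.futures.ProcessPoolExecutor. The workload count_primes and the chosen limits are made up for illustration; any pure computation would do.

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound toy workload: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def main():
    limits = [20_000, 20_000, 20_000, 20_000]
    # Each call runs in a worker process with its own interpreter and its own GIL.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(count_primes, limits))
    print(results)

if __name__ == "__main__":   # needed on platforms that start workers by spawning
    main()
```

Arguments and results travel between processes by pickling, so this pays off when the computation per task is large compared with the data being passed.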

In summary:

Situation                            | Good choice                      | Why
Lots of waiting I/O (network, disk)  | async / event loop               | reuses the gaps of waiting, lightweight
Heavy computation (CPU-intensive)    | multiprocessing / multiple cores | needs true parallel execution
I/O but no async library available   | thread pool                      | isolates the blocking call in a thread
CPU parallelism in Python            | multiprocessing                  | threads don't parallelize due to the GIL
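For the thread pool case, a short sketch with concurrent.futures.ThreadPoolExecutor. The function blocking_fetch is a hypothetical stand-in for any blocking call that has no async variant; time.sleep plays the role of the wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_fetch(name):
    """Stand-in for a blocking I/O call with no async variant (hypothetical)."""
    time.sleep(0.2)          # the thread waits here; other threads keep running
    return name

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in input order, whatever order the threads finish in.
    results = list(pool.map(blocking_fetch, ["a", "b", "c"]))
elapsed = time.perf_counter() - start

print(results)               # ['a', 'b', 'c']
print(f"{elapsed:.2f}s")     # close to 0.2s rather than the 0.6s of running them in sequence
```

Threads work here despite the GIL because a thread that is blocked waiting does not hold it.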

Common Misconceptions and Pitfalls

Points people trip over in concurrent programming:

  • The "async is always faster" myth: async only wins for I/O-bound work. Wrapping CPU-bound work in async blocks the event loop and makes it slower.
  • Blocking calls inside the event loop: calling a synchronous blocking function (plain time.sleep, a blocking DB driver, etc.) on a single-threaded event loop stalls the whole loop. Use an async-capable library or offload it to a separate thread.
  • The "more threads means faster" myth: threads each consume stack memory and have switching costs. For CPU-bound work, running far more threads than cores adds switching overhead without adding parallelism, and can make things slower.
  • Killing parallelism with a coarse lock: making the critical section large "to be safe" effectively serializes execution and erases the benefit of multiple cores.
  • Leaving a race condition as an "occasional bug": dismissing it because it is hard to reproduce comes back as data corruption in production. Shared state always needs a synchronization design.
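The blocking-call pitfall can be made visible with a small experiment. This is a sketch under assumptions: the names ticker, run_with, bad, and good are invented for the demo, and asyncio.to_thread requires Python 3.9 or later. A background ticker records timestamps every 50 ms; the longest gap between ticks shows whether the loop was stalled.

```python
import asyncio
import time

async def ticker(ticks):
    # Stands in for "everything else" the loop should keep serving.
    for _ in range(5):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)

async def run_with(blocking_step):
    ticks = []
    task = asyncio.create_task(ticker(ticks))
    await asyncio.sleep(0)        # let the ticker record its first tick
    await blocking_step()
    await task
    return max(b - a for a, b in zip(ticks, ticks[1:]))  # longest gap between ticks

async def bad():
    time.sleep(0.3)               # blocks the only thread: the loop cannot run the ticker

async def good():
    await asyncio.to_thread(time.sleep, 0.3)   # blocking call moved to a worker thread

stalled_gap = asyncio.run(run_with(bad))
smooth_gap = asyncio.run(run_with(good))
print(f"longest gap: {stalled_gap:.2f}s blocked vs {smooth_gap:.2f}s offloaded")
```

With the plain time.sleep call, the ticker misses its schedule for the full 0.3 seconds; with the call offloaded to a thread, the ticks stay roughly 50 ms apart.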

Conclusion

Concurrency and parallelism look similar but sit at different levels. Concurrency is structuring work so it can be dealt with; parallelism is actually executing that work at the same time across multiple cores. A well-designed concurrent program interleaves with one core and scales out naturally to parallel with many.

And what makes that structure safe is the craft of synchronization: preventing race conditions with locks or atomics, avoiding deadlock with ordering rules, and choosing between async and multiprocessing according to the nature of the work (I/O or CPU). Keep this map in mind and you can hunt the two distinct classes of problem — "why is it slow" and "why is it occasionally wrong" — in far more accurate places.

Watch how an event loop moves between tasks in the asyncio tab of the Message Queue Playground, and explore a computation graph where operations flow in parallel in the Neural Net Lab.
