Polars 1.x vs Pandas — End of an Era? A Modern DataFrame Deep Dive for 2026

Prologue — What "The End of the Pandas Era" Actually Means

In 2025 and 2026, the most-quoted sentence in the dataframe community was this:

"I switched to Polars and the same code got 10x faster."

And right next to it, always:

"But to push it into sklearn I ended up calling to_pandas() anyway."

Those two sentences summarize the 2026 dataframe world precisely. Polars is taking almost all of the new workloads. And yet Pandas is not going away. Fifteen years of ecosystem inertia — sklearn input, statsmodels, plotting libraries, hundreds of thousands of Stack Overflow answers, every classroom — make Pandas the safe default.

This post tries to locate the balance point. The Polars 1.x stability promise, the Rust plus Arrow architecture, what the query optimizer actually does, how far Pandas 2.2 plus PyArrow caught up and where it stopped, how DuckDB — the other Arrow-native engine — beats Polars on some workloads and loses on others, where Dask and Ibis sit, the real cost of migrating. And at the very end — when you should still stay on Pandas.


1. The Foundation Is Arrow — The Real Hero of the DataFrame Revolution

The most common omission in Polars vs Pandas comparisons: both libraries sit on top of Apache Arrow. The revolution is not Polars. It is Arrow.

Why Arrow Matters

Arrow is a columnar in-memory format. NumPy is columnar too, but the differences are huge:

  • Language-neutral memory layout — Python, Rust, C++, Java, Go all share the same memory zero-copy.
  • Column-major — values of the same column live contiguously in memory. Ideal for SIMD.
  • Built-in type system — strings, lists, structs, dictionaries (categories), date, time, decimal all standardized.
  • Separate null bitmap — no more abusing NaN as null the way NumPy and Pandas 1.x had to.
  • Chunked arrays — split big columns into chunks, the substrate for lazy and streaming execution.

The one-line takeaway:

On top of Arrow, Polars, Pandas, DuckDB and Spark see the same memory. Conversion cost collapses toward zero.

df.to_arrow() then DuckDB reads it directly. DuckDB writes Arrow then Polars reads it directly. Zero-copy. This is why the dataframe world of the 2020s is converging.

Arrow vs NumPy Backend in Practice

| Aspect | NumPy backend (Pandas 1.x) | Arrow backend (Polars / Pandas 2.2+) |
|---|---|---|
| Strings | object dtype, list of Python objects | string dtype, contiguous memory, 5–10x faster |
| Null | forced NaN; int column silently promoted to float | proper null bitmap, dtype preserved |
| Categorical | category dtype, custom impl | standard dictionary type |
| Timestamps | locked to datetime64[ns] | rich timestamp[us, tz] and friends |
| Zero-copy sharing | hard | immediate with DuckDB, Polars, Spark |
| SIMD utilization | limited | natural fit for columnar |

That table is most of the reason Polars beats Pandas. Polars started here; Pandas is catching up via an opt-in backend in 2.x.


2. Polars 1.x — A Rust Core and a Stability Promise

Polars is a Rust dataframe library started by Ritchie Vink in 2020. 1.0 shipped in August 2024, and since then minors have been moving fast while the project promises public API stability. As of 2026 we are deep in the 1.x line (see the release notes at Polars GitHub Releases); breaking changes arrive as opt-in new APIs, and old APIs go through a deprecation cycle.

A Rust Core With Python Bindings

  • Core is Rust — the polars crate. Memory safety, SIMD, no-GIL parallelism.
  • Python is a binding — py-polars. Wraps the Rust core via PyO3.
  • R and NodeJS bindings sit on the same core.

What this means: Python's GIL does not slow Polars down. Group-by genuinely uses every core. NumPy parallelizes via BLAS, but the dataframe operations themselves (group-by, join) are stuck behind the GIL in Pandas.

The Polars Data Model

import polars as pl

df = pl.DataFrame({
    "user_id": [1, 2, 3, 4],
    "country": ["KR", "US", "KR", "JP"],
    "amount": [120.0, 95.0, 200.0, 75.0],
})
print(df.schema)
# Schema({'user_id': Int64, 'country': String, 'amount': Float64})

The schema is an ordered map from column name to Arrow dtype. Every dtype is explicit; there is very little inference magic.

Eager vs Lazy

Polars has two modes:

  • Eager — pl.DataFrame. Executes immediately, just like Pandas.
  • Lazy — pl.LazyFrame. Builds a query plan and only executes at .collect(), after optimization.

# eager
df = pl.read_parquet("sales.parquet")
big_kr = df.filter(pl.col("country") == "KR").select(["user_id", "amount"])

# lazy — build the plan, optimize and execute on .collect()
plan = (
    pl.scan_parquet("sales.parquet")          # I/O is deferred
      .filter(pl.col("country") == "KR")
      .select(["user_id", "amount"])
)
big_kr = plan.collect()

See the difference? scan_parquet does not read the file yet. The next chapter shows why that matters so much.


3. Lazy Evaluation and the Query Optimizer — Why Polars Feels Magical

Half of Polars' performance comes from the optimizer that rewrites the lazy plan. What does it do?

3.1 Predicate Pushdown — Push the Filter Into I/O

plan = (
    pl.scan_parquet("sales.parquet")          # 100 million rows
      .filter(pl.col("country") == "KR")       # 1 million KR rows
      .select(["user_id", "amount"])
)

The naive execution: read all 100M rows, then filter in memory. A hundred times too expensive.

What the optimizer does: push the filter into the scan. It reads Parquet row-group statistics (min and max) and skips row groups where country != "KR". Disk I/O drops by 100x.

3.2 Projection Pushdown — Read Only the Columns You Need

Even though select(["user_id", "amount"]) is at the end, Parquet is columnar, so only those two columns are read off disk. Pandas reads everything at pd.read_parquet() time; selecting afterwards is too late. (Pandas does accept a columns= parameter, but there is no plan-level inference.)

3.3 Common Subexpression Elimination

plan = (
    df.lazy()
      .with_columns([
          (pl.col("price") * pl.col("qty")).alias("revenue"),
          (pl.col("price") * pl.col("qty") * 0.1).alias("commission"),
      ])
)

price * qty appears twice. The optimizer computes it once and reuses it.

3.4 Slice Pushdown, Type Coercion, Dead-Column Removal

If .head(100) sits at the bottom of a plan, only enough rows are pulled up to satisfy those 100. Columns unused after a join are dropped before the join. Implicit casts get reduced.

3.5 The Streaming Engine

collect(engine="streaming") — and in 2026 the new streaming engine is approaching stable status (see the Polars blog for the streaming write-ups; the older collect(streaming=True) spelling has been deprecated in its favor) — processes datasets that do not fit in memory in chunks. Out-of-core inside a single library.

Visualize the Plan

plan.explain()       # human-readable plan
plan.show_graph()    # graphviz; compare optimized vs naive

This is Polars' brain. Pandas has nothing like it. Pandas takes orders and executes them verbatim.


4. The Polars Expression API — One Series, One Expression

The biggest mental model shift in Polars is the expression.

import polars as pl

df.select([
    pl.col("amount").sum().alias("total"),
    pl.col("amount").mean().over("country").alias("country_avg"),
    (pl.col("amount") - pl.col("amount").mean()).alias("diff_from_mean"),
])

Each expression is a DAG that transforms columns. Where Pandas column manipulation is imperative, Polars expressions are declarative — closer to Spark DataFrame or SQL.

Mental model differences vs Pandas:

| Operation | Pandas | Polars |
|---|---|---|
| Create a new column | df["x"] = df["a"] + df["b"] (assign) | df.with_columns((pl.col("a") + pl.col("b")).alias("x")) (expression) |
| Group-by aggregation | df.groupby("k")["x"].sum() (chained) | df.group_by("k").agg(pl.col("x").sum()) (expression list) |
| Window | df.groupby("k")["x"].transform("mean") | pl.col("x").mean().over("k") |
| Multi-column transform | for loop or apply | one expression list, all at once |

It feels awkward for a few days, then it clicks. And the key point is that an expression is a node in the optimizer's plan. Because expressions are declarative, Polars can optimize them.


5. Pandas 2.2 — How Far Did It Catch Up With the PyArrow Backend?

Pandas introduced the PyArrow backend in 2.0 (2023) and stabilized it in 2.2 (2024). As of 2026, Pandas 2.2.x is the de-facto standard while 3.0 has been in a long beta.

Turning On the Arrow Backend

import pandas as pd

# Option A — explicit Arrow dtypes
df = pd.read_parquet("sales.parquet", dtype_backend="pyarrow")

# Option B — set a Series dtype to Arrow
s = pd.Series([1, 2, None], dtype="int64[pyarrow]")

What you gain:

  • String columns become genuinely fast and use much less memory.
  • An int column with nulls no longer silently promotes to float (Int64[pyarrow] is nullable).
  • Date and time stay as Arrow timestamps.
  • Zero-copy interop with DuckDB, Polars, Spark.

What Pandas 2.2 Still Cannot Do

  • No lazy evaluation. Everything is eager. Predicate pushdown only via read_parquet(filters=).
  • No query optimizer. Pandas runs operations in the order you wrote them.
  • Single-threaded inside the GIL by default. You opt into numba, numexpr, or pyarrow.compute case by case.
  • No expression API. Composing multi-column expressions inside a group-by stays awkward.
  • Huge API surface. There are several ways to do everything, and a lot of implicit magic happens.

To summarize: the data structures moved to Arrow, but the execution engine is still 1990s imperative. That gap is why Polars wins even on the same Arrow foundation.


6. Polars vs Pandas — The Full Comparison Matrix

The same task, side by side in both libraries.

6.1 Read Parquet, Filter, Group-By

Pandas:

import pandas as pd

df = pd.read_parquet("sales.parquet", dtype_backend="pyarrow")
kr = df[df["country"] == "KR"]
result = (
    kr.groupby("user_id", as_index=False)["amount"]
      .agg(["sum", "mean", "count"])
)

Polars (lazy):

import polars as pl

result = (
    pl.scan_parquet("sales.parquet")
      .filter(pl.col("country") == "KR")
      .group_by("user_id")
      .agg([
          pl.col("amount").sum().alias("total"),
          pl.col("amount").mean().alias("avg"),
          pl.col("amount").count().alias("n"),
      ])
      .collect()
)

The line counts look similar, but the plan-level work is not. Polars pushes the filter into the scan and only reads three columns off disk.

6.2 Join

Pandas:

merged = pd.merge(orders, users, on="user_id", how="left")

Polars:

merged = orders.lazy().join(users.lazy(), on="user_id", how="left").collect()

A Polars join is parallel hash-join by default. The gap widens fast as the data grows.

6.3 Window Functions

Pandas:

df["avg_country"] = df.groupby("country")["amount"].transform("mean")
df["rank_in_country"] = df.groupby("country")["amount"].rank(method="dense")

Polars:

df = df.with_columns([
    pl.col("amount").mean().over("country").alias("avg_country"),
    pl.col("amount").rank(method="dense").over("country").alias("rank_in_country"),
])

If you group multiple windows into a single plan, Polars computes each partition only once.

6.4 The Comparison Matrix

| Aspect | Pandas 2.2 + Arrow | Polars 1.x |
|---|---|---|
| Memory backend | NumPy default, Arrow optional | Arrow only |
| String performance | good with Arrow on | fast from day one |
| Null handling | OK with Arrow dtypes | standard null bitmap |
| Multi-core | partial via numba and numexpr | parallel by default, GIL-free |
| Lazy plan | none | LazyFrame |
| Query optimizer | none | predicate, projection, CSE and more |
| Streaming | none | streaming engine (out-of-core) |
| Expression API | none (chaining only) | first-class |
| Time series | mature and powerful | improving; some gaps remain |
| Plotting integration | seaborn and matplotlib directly | often needs to_pandas() |
| sklearn input | direct | needs to_pandas() or to_numpy() |
| Learning material | overwhelming | growing but smaller |

On raw performance Polars wins decisively. Add ecosystem integration, learning material and legacy and Pandas still has the safety edge.


7. TPC-H Benchmarks — The Real Numbers

The Polars team publishes its own runs of all 22 TPC-H queries (see the Polars benchmark page; concrete numbers depend on SF, hardware and version, so this table is shape-only and meant to anchor decisions, not replace the originals).

| Query | Pandas 2.x | Polars 1.x (lazy) | DuckDB | Dask |
|---|---|---|---|---|
| Q1 (simple group-by) | baseline | roughly 1/10 the time | roughly 1/15 | roughly 1/3 |
| Q3 (join + filter) | baseline | roughly 1/8 | roughly 1/10 | roughly 1/2 |
| Q9 (multi-join, complex) | baseline | roughly 1/7 | roughly 1/8 | similar or slightly faster |
| Q21 (correlated subquery) | often OOMs | runs fine | very fast | can OOM |
| Larger-than-memory dataset | impossible | streaming engine handles it | handles it | distributed handles it |

Key observations:

  • Polars and DuckDB are in the same league on a single node — query optimizer plus Arrow plus multi-core. Pandas trails far behind.
  • If you need distribution, you reach for Dask, Spark or Ray. Polars optimizes for single-node (or one large cloud VM).
  • For SQL-style analytics like TPC-H, DuckDB tops the chart most often. For a dataframe API, Polars wins.

Benchmarks live and die on "which workload". The table is a starting point, not the verdict.


8. Polars vs DuckDB — Two Faces of the Same Engine

This is the most interesting comparison of 2026. Both are Arrow-native, both are vectorized columnar engines, both massacre large data on a single node. The interface is what differs.

  • DuckDB — an embeddable SQL engine. duckdb.sql("SELECT ... FROM parquet_scan('sales.parquet') WHERE ..."). Heavy analytical SQL and OLAP. Windowing, complex joins and subqueries are unmatched.
  • Polars — a dataframe API. Declarative dataframe with expressions.
# Same task, two interfaces
# DuckDB
import duckdb
result = duckdb.sql("""
    SELECT user_id, SUM(amount) AS total
    FROM 'sales.parquet'
    WHERE country = 'KR'
    GROUP BY user_id
""").to_df()

# Polars
import polars as pl
result = (
    pl.scan_parquet("sales.parquet")
      .filter(pl.col("country") == "KR")
      .group_by("user_id")
      .agg(pl.col("amount").sum().alias("total"))
      .collect()
)

Where DuckDB Wins

  • Complex multi-table joins, deep CTEs, correlated subqueries — SQL reads more naturally.
  • Analysts already write SQL — zero cognitive cost.
  • BI tooling, Jupyter magic, dbt integration — nearly every BI tool can read DuckDB.

Where Polars Wins

  • Long pipelines of column-level transforms — expressions are cleaner than nested CTEs.
  • The codebase is already in Pandas — the migration surface is smaller.
  • ML feature engineering, where you want to hold the dataframe imperatively in Python.
  • Pipelines where user-defined Python functions sneak in (map_elements, pipe).

The Pattern of Using Both

# Receive in Polars, hand heavy SQL off to DuckDB, take the result back as Polars
import duckdb, polars as pl

lf = pl.scan_parquet("sales.parquet").filter(pl.col("amount") > 100)
out = duckdb.sql("""
    SELECT country,
           percentile_cont(0.95) WITHIN GROUP (ORDER BY amount) AS p95
    FROM lf
    GROUP BY country
""").pl()

On Arrow, the two exchange data zero-copy. Drop the "pick exactly one" obsession.


9. Dask and Ibis — The Supporting Cast

Dask — When You Need Out-Of-Core or Distribution

Dask is a distributed dataframe with a Pandas-like API. It launches Pandas DataFrames per partition and binds them into a lazy graph, then executes across a cluster.

import dask.dataframe as dd
df = dd.read_parquet("s3://bucket/sales-*.parquet")
result = df[df.country == "KR"].groupby("user_id").amount.sum().compute()

Dask keeps its identity: "Pandas, but distributed". By 2026 the 2024.x line and beyond ships a gradually rolled-out query optimizer (work led by Coiled), and Pandas 2.x PyArrow dtypes interoperate cleanly. On a single node it loses to Polars and DuckDB — by design. Tens of terabytes scattered across S3 is the real use case.

Ibis — A Backend-Agnostic DataFrame Spec

Ibis lets you write dataframe expressions and compile them to 20-plus backends: DuckDB, BigQuery, Snowflake, Polars, Pandas, Spark and more.

import ibis
t = ibis.read_parquet("sales.parquet")  # default backend is DuckDB
expr = t.filter(t.country == "KR").group_by("user_id").agg(t.amount.sum().name("total"))
expr.execute()  # returns Pandas

The same query runs against DuckDB, gets pushed to BigQuery, or hits Snowflake — without code changes. Push heavy work into the warehouse, test the same code locally.

Where Ibis sits in 2026:

  • A candidate for the dataframe interface standard. The PyData ecosystem is gradually coalescing here.
  • Strong choice for analytics teams that live in a cloud warehouse.
  • Backend quirks still leak through occasionally — the same expression does not run identically everywhere.

10. Migration Traps — The Real Cost of Pandas to Polars

On performance alone the move to Polars looks obvious. In practice, people hit a handful of traps.

10.1 No Index

Pandas centers on the index. Polars has no index. .set_index(), .reset_index(), .loc[], MultiIndex — all gone. Joins use explicit key columns and sorts are explicit. At first it feels annoying, but you end up with fewer bugs — no more silent corruption from implicit index alignment.

10.2 No inplace

# Pandas
df["x"] = df["a"] + df["b"]            # works
df.rename(columns={"a": "A"}, inplace=True)

# Polars
df = df.with_columns((pl.col("a") + pl.col("b")).alias("x"))
df = df.rename({"a": "A"})

Every operation returns a new DataFrame. You are forced into an immutable mindset.

10.3 The Death of apply (Kind Of)

# Pandas — apply is so common it feels like boilerplate
df["y"] = df.apply(lambda r: complicated_func(r["a"], r["b"]), axis=1)

In Polars, map_elements is the explicitly slow path. Combine expressions when you can. If you must, prefer map_batches for a vectorized batch function.

10.4 Group-By Result Shape

Whether a Pandas group-by returns a Series or a DataFrame depends on subtleties; Polars always returns a DataFrame. But the column-naming rules differ enough to cause KeyErrors — Polars makes you call .alias() explicitly.

10.5 Datetime Is Different

Pandas pins datetime64[ns]. Polars (via Arrow) uses Datetime("us", "UTC") and friends. Timezone comparisons get strict. Mixed-timezone columns will make Polars complain. Annoying at first, a blessing later.

10.6 CSV Type Inference Diverges

read_csv infers dtypes differently. Pandas peeks at a slice and guesses; Polars is more conservative. Get into the habit of declaring dtypes explicitly.

10.7 Method Names Are Subtly Different

reset_index does not exist. pd.concat(axis=1) becomes pl.concat([...], how="horizontal"). pd.melt becomes pl.DataFrame.unpivot. merge becomes join. rename takes a different signature. Find-and-replace alone will not get you there; a human has to read the code.

10.8 An Incremental Migration Strategy

# Receive data in Polars from the start
df = pl.read_parquet("a.parquet")
# Heavy transforms in Polars
df = df.filter(...).group_by(...).agg(...)
# Only convert to pandas at the sklearn or seaborn boundary
pdf = df.to_pandas()

This is the realistic pattern. Do not rewrite everything in one go. New pipelines go to Polars; existing ones stay as they are.


11. When You Should Still Stay on Pandas

The most honest chapter of this post. In 2026 it is still right to stay on Pandas in several cases.

11.1 sklearn Is the Endpoint

sklearn treats pandas DataFrames as first-class citizens. ColumnTransformer, OneHotEncoder(sparse_output=False), every step of a pipeline preserves column names. If you are going to call .to_pandas() right before sklearn anyway, the case for moving thins out.

11.2 statsmodels and Causal-Inference Libraries

Most statistics libraries (statsmodels, lifelines, dowhy, econml) assume Pandas input. Formula notation (y ~ x1 + x2) is bound to column names.

11.3 Plotting Integration

seaborn, plotnine and many plotly examples want Pandas. Polars now supports the __dataframe__ protocol so more libraries can read it directly, but Pandas is still the smoothest path.

11.4 Small Data and Quick Prototyping

Below a million rows the difference is tens to hundreds of milliseconds — irrelevant. Familiarity wins.

11.5 The Team Only Knows Pandas, and Migration Cost Exceeds the Gain

If the bottleneck is somewhere else (network I/O, model inference), moving to Polars barely changes end-to-end latency. The migration is pure cost.

11.6 Education

Fifteen years of lectures, Stack Overflow and books are in Pandas. The mass of learning material drives the learning cost.

Polars is becoming the default for new pipelines. Pandas survives as a glue language. Knowing both is the rational position.


12. The 2026 DataFrame Stack — A Decision Map

The same table from a different angle — "which workload picks which tool".

| Workload | First choice | Second choice |
|---|---|---|
| In-memory analytical dataframe, from Python | Polars | DuckDB |
| Heavy SQL analytics in BI or Jupyter | DuckDB | Polars |
| TB scale, scattered across S3 | Dask or Spark | Ray Data |
| Push down to Snowflake or BigQuery | Ibis | (warehouse SDK) |
| ML feature engineering | Polars | Pandas (with Arrow) |
| sklearn model input | Pandas | Polars then to_pandas() |
| Time series (statsmodels) | Pandas | (simple cases in Polars) |
| Small data, quick prototyping | Pandas | Polars |
| Standardized backend-agnostic expressions | Ibis | (custom wrapper) |

Epilogue — The Balanced One-Liner

The one-sentence summary of this post:

Polars is winning almost all of the new workloads. Pandas survives in the swamp of sklearn, plotting and education. The 2026 answer is to know both and choose by workload.

A decision tree for the data engineer:

  1. New pipeline, single-node analytics → Polars by default.
  2. SQL is more natural and analysts share the work → DuckDB.
  3. The data does not fit on one node → Dask, Spark or Ray.
  4. The cloud warehouse is the real store → Ibis plus the warehouse.
  5. The ML endpoint is sklearn → transform in Polars, call to_pandas() last.
  6. Plotting or statistics is the final step → receive into Pandas.

12-Item Checklist

  1. Is the dataframe backend Arrow? (Pandas: dtype_backend="pyarrow".)
  2. Is your I/O Parquet? (Anchoring to CSV is a loss in any library.)
  3. Are you using a lazy plan? (Polars' biggest win.)
  4. Are filter and select near the top of the plan? (Predicate and projection pushdown.)
  5. Are expressions inside group-by collected into an expression list?
  6. Did you check whether the same subexpression appears twice? (CSE target.)
  7. Do the join-key dtypes match on both sides?
  8. Are timestamp timezones explicit?
  9. Are you abusing map_elements?
  10. Did you measure whether you need the streaming engine?
  11. Did you batch the final to_pandas() once, right before plotting or ML?
  12. Did you re-run the benchmarks on your own environment and data?

10 Anti-Patterns

  1. Loading into Polars and immediately calling to_pandas() to keep writing Pandas code — no gain.
  2. Skipping the lazy plan and collect()ing at every step — the optimizer cannot help.
  3. Leaving map_elements in place where a vectorized expression would do.
  4. Storing everything in CSV instead of Parquet — half of the Arrow benefit gone.
  5. Bringing inplace=True mindset into Polars — it does not exist.
  6. Comparing datetimes without timezones — Polars yells first; that is when you learn.
  7. Introducing Dask or Spark before you actually need distribution — complexity spikes, performance drops.
  8. Forcing SQL-shaped analytics into a dataframe API.
  9. Quoting benchmark tables without re-running on your own data — weak foundation.
  10. Mass-migrating off Pandas on the assumption it will "disappear soon" — pure cost.

Next Up

Candidates for the next post: DuckDB deep dive — everything about the embedded analytical engine, Apache Arrow — Flight, DataFusion, ADBC, PySpark 4.x with Polars and Ray Data — the 2026 distributed-dataframe map, A real Pandas to Polars migration case study.

"The engine is the same — it is Arrow. Only the dataframe API sitting on top differs. In 2026 the data engineer is not a believer in one library; they are someone who reads the whole stack."

— Polars 1.x vs Pandas, end.

