Power and Cooling in the AI Data Center — Infrastructure for the Gigawatt Era

Introduction
Why Power Suddenly Became the Problem
The Surge in Rack Power Density
The Limits of Air Cooling
Direct Liquid Cooling (DLC) and Immersion Cooling
- Direct Liquid Cooling
- Immersion Cooling
PUE — The Ruler for Power Efficiency
Grid and Generation Constraints
The Perf/Watt Race Among Chips
Carbon and Sustainability
Cost Structure
Siting and Supply Chain
Operational Challenges
Implications for Developers and Architects
Following the Flow of Power
Trade-offs in Choosing a Cooling Method
New Problems Created by Scale
Power Procurement Models
What to Observe
The Paradox of Efficiency and Demand — Jevons Paradox
Where Cooling Is Heading
Correcting Common Misconceptions
The Core, at a Glance
Conclusion
References

Introduction

A few years ago, the central question for a data center was "how densely can we pack the servers?" In 2026 the question has become "where do we get the electricity, and how do we remove the heat?" As AI capex has exploded, single campuses are now discussed in hundreds of megawatts and, at the planning stage, in gigawatts. A gigawatt is roughly the output of one large nuclear reactor.

This essay frames the AI data center along two axes: power and cooling. The point is simple: the limit on infrastructure is no longer the money to buy chips, but the electricity to feed those chips and the cooling to remove the heat they give off. The sections that follow examine why developers and architects need to understand this constraint, and where the industry is heading.

Why Power Suddenly Became the Problem

Power became the number-one constraint because two trends collided.

First, the power draw of a single AI accelerator rose fast. Data-center GPU TDP was once around 300W; the latest generations exceed 700W and now approach or exceed 1000W per module. The faster the accelerator, the more electricity it consumes and the more heat it gives off.

Second, those accelerators are grouped by the tens or hundreds of thousands into single training and inference clusters. Multiply per-chip power by that count and total campus consumption quickly becomes enormous.

power = (power per accelerator) x (number of accelerators) x (overhead factor)

a feel for the scale:
  1000W x 100,000 units = 100MW (accelerators alone)
  add CPU/network/cooling/losses and the campus reaches hundreds of MW
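The formula above is plain multiplication, sketched below. The overhead factor of 2.0 is a hypothetical placeholder standing in for CPUs, networking, cooling, and conversion losses, not a measured value.

```python
def campus_power_mw(watts_per_accel: float, n_accels: int, overhead_factor: float = 1.0) -> float:
    """Campus power in MW = per-accelerator watts x accelerator count x overhead factor."""
    return watts_per_accel * n_accels * overhead_factor / 1e6


# Accelerators alone: 1000 W x 100,000 units
accel_only = campus_power_mw(1000, 100_000)
# Hypothetical overhead factor of 2.0 for CPUs, network, cooling and losses
with_overhead = campus_power_mw(1000, 100_000, overhead_factor=2.0)
print(f"accelerators only: {accel_only:.0f} MW")
print(f"with overhead:     {with_overhead:.0f} MW")
```

This prints 100 MW for the accelerators alone and 200 MW once the assumed overhead is applied.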

As a result, the bottleneck of the data center business moved from "land and buildings" to "power contracts and transmission lines." The first question when choosing a site is now "how many megawatts can we get here, and by when?"

The Surge in Rack Power Density

The basic unit of data center design is the rack. A traditional enterprise rack drew about 5-10kW per unit, well within what air cooling could handle.

AI racks are in another league entirely.

Era/type	Power per rack (approx.)	Primary cooling
Traditional enterprise	5-10kW	Air
Early GPU cluster	15-30kW	Enhanced air
Current AI rack	40-80kW	Direct liquid (DLC) centric
Latest high-density rack	100kW+	Liquid/immersion required

Putting more than 100kW into one rack means that a single cabinet gives off as much heat as several dozen household space heaters. Cooling that with air would require an enormous volume of airflow, and beyond a certain point air physically cannot keep up.
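To put a number on that airflow, the energy balance P = ρ·V̇·c_p·ΔT gives the volume of air needed to carry a rack's heat. This sketch assumes round values for air (ρ ≈ 1.2 kg/m³, c_p ≈ 1005 J/(kg·K)) and a 15 K rise between server inlet and outlet; real designs use different temperature rises.

```python
AIR_DENSITY = 1.2      # kg/m^3, approximate value near room temperature at sea level
AIR_CP = 1005.0        # J/(kg*K), specific heat of air
M3S_TO_CFM = 2118.88   # 1 m^3/s expressed in cubic feet per minute


def required_airflow_m3s(heat_w: float, delta_t_k: float) -> float:
    """Airflow needed to carry heat_w with an inlet-to-outlet rise of delta_t_k.

    Rearranged from the energy balance P = rho * V_dot * c_p * dT.
    """
    return heat_w / (AIR_DENSITY * AIR_CP * delta_t_k)


for rack_kw in (10, 30, 100):
    flow = required_airflow_m3s(rack_kw * 1000, delta_t_k=15.0)
    print(f"{rack_kw:>4} kW rack: {flow:5.2f} m^3/s (~{flow * M3S_TO_CFM:,.0f} CFM)")
```

Under these assumptions a 100kW rack needs roughly 5.5 m³/s of air, ten times the flow of a 10kW rack, because required airflow scales linearly with heat.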

There is a clear reason to push density. Accelerators must sit close together to get the low latency and high bandwidth of interconnects such as NVLink, and denser racks extract more compute from the same floor area. So the industry refused to give up density and chose instead to change the cooling method.

The Limits of Air Cooling

Air cooling was the data center standard for a long time: send cold air to the front of servers and pull hot air out the back. Simple and proven, but the physical limits are clear.

Air is an inefficient medium for carrying heat. For the same volume, water can carry thousands of times more heat than air, because air's specific heat and density are low.

heat-carrying capacity (rough intuition)
  air   : low (small specific heat and density)
  water : about 3500x the volumetric heat capacity of air

to remove the same heat:
  air -> needs huge airflow and fan power
  liquid -> handles it with small flow
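The "about 3500x" figure comes from density times specific heat. This sketch uses approximate room-temperature properties (assumed round values) to compute that ratio and the flow each fluid needs to carry 100kW at the same 10 K temperature rise.

```python
# Approximate properties near room temperature (assumed round values)
AIR = {"density": 1.2, "cp": 1005.0}      # kg/m^3, J/(kg*K)
WATER = {"density": 997.0, "cp": 4180.0}  # kg/m^3, J/(kg*K)


def volumetric_heat_capacity(fluid: dict) -> float:
    """Heat absorbed per cubic metre per kelvin of temperature rise, in J/(m^3*K)."""
    return fluid["density"] * fluid["cp"]


def flow_for_heat(fluid: dict, heat_w: float, delta_t_k: float) -> float:
    """Volumetric flow (m^3/s) needed to carry heat_w at a temperature rise of delta_t_k."""
    return heat_w / (volumetric_heat_capacity(fluid) * delta_t_k)


ratio = volumetric_heat_capacity(WATER) / volumetric_heat_capacity(AIR)
air_flow = flow_for_heat(AIR, 100_000, 10.0)
water_flow = flow_for_heat(WATER, 100_000, 10.0)
print(f"water/air volumetric heat capacity: ~{ratio:,.0f}x")
print(f"100 kW at 10 K rise: air {air_flow:.2f} m^3/s, water {water_flow * 1000:.2f} L/s")
```

Because both flows come from the same energy balance, the ratio of air flow to water flow equals the ratio of volumetric heat capacities, just under 3,500x with these inputs: a few liters of water per second do the work of several cubic meters of air.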

Once a rack passes roughly 30kW, air cooling runs into several problems. Spinning fans harder raises fan power and noise, hot spots become hard to control, and ultimately the chip throttles (forcibly reduces its performance), so an expensive accelerator cannot deliver its worth. This is where liquid cooling comes in.

Direct Liquid Cooling (DLC) and Immersion Cooling

Direct Liquid Cooling

The mainstream for current AI racks is direct liquid cooling, especially the cold-plate approach. A metal plate with internal coolant channels is mounted directly on the chip to absorb its heat. The warmed liquid flows to a coolant distribution unit (CDU), located in the rack or at the end of the row, where a heat exchanger cools it before it circulates again.

flow of cold-plate direct liquid cooling

  chip --- cold plate (liquid channel) --- manifold --- CDU
   ^                                                   |
   |_________________ cooled liquid loop ______________|

  CDU: Coolant Distribution Unit (separates primary/secondary loops)

The advantages are clear: it removes heat right above the chip with high efficiency, handles high density, and greatly reduces fan power. In exchange, it requires new facilities and operational know-how — piping, leak management, manifolds, CDUs.

Immersion Cooling

A more aggressive approach is immersion cooling, in which the entire server is submerged in a special non-conductive (dielectric) fluid. Single-phase immersion keeps the fluid liquid and simply circulates it; two-phase immersion lets the fluid boil at the chip and carries the heat away as vapor.

Method	Principle	Characteristics
Cold-plate DLC	Run liquid through a plate	Current mainstream, easy to fit existing form factors
Single-phase immersion	Submerge in dielectric fluid	Very high density, removes fans
Two-phase immersion	Boil fluid into vapor	Highest efficiency, demanding fluid/seal management

Immersion is powerful in density and efficiency, but its operational difficulty — fluid cost, maintenance access, component compatibility — keeps adoption focused on specific niches for now.

PUE — The Ruler for Power Efficiency

No discussion of data center efficiency is complete without PUE (Power Usage Effectiveness).

PUE = total facility power / IT equipment power

PUE = 1.0  -> all power goes to IT (ideal, unrealistic)
PUE = 1.5  -> 0.5 units of cooling/loss per 1 unit of IT
PUE = 2.0  -> half is non-IT overhead (inefficient)

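The closer PUE is to 1, the smaller the cooling and power-conversion overhead. Well-designed modern hyperscale centers reach a PUE of around 1.1. One major reason liquid beats air on PUE is that it reduces the fan power of facility air handlers and, because warmer coolant can often be cooled without chillers, chiller energy as well.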
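The PUE definition is easy to invert: given an IT load and a PUE, the total facility power and the non-IT overhead follow directly. A minimal sketch, assuming a hypothetical 100MW IT load:

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


def facility_power_kw(it_kw: float, pue_value: float) -> float:
    """Total facility power implied by an IT load and a PUE."""
    return it_kw * pue_value


def overhead_kw(it_kw: float, pue_value: float) -> float:
    """Non-IT power (cooling, conversion losses, lighting) for a given IT load."""
    return it_kw * (pue_value - 1.0)


it_load_kw = 100_000  # hypothetical 100 MW IT load
for p in (1.1, 1.5, 2.0):
    print(f"PUE {p}: facility {facility_power_kw(it_load_kw, p) / 1000:.0f} MW, "
          f"overhead {overhead_kw(it_load_kw, p) / 1000:.0f} MW")
```

At this scale, the gap between PUE 1.1 and 1.5 is 40MW of overhead, the draw of a sizable data center on its own.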

But judging everything by PUE alone leads to a trap. PUE is an "overhead ratio," not "total power." Even with great PUE, if the absolute consumption is enormous, the burden on the grid and the environment remains large. That is why companion metrics such as water usage effectiveness (WUE) and carbon usage effectiveness (CUE) are used alongside it.
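The trap is easy to show numerically. This sketch compares two hypothetical sites using the usual definitions of WUE (liters of site water per kWh of IT energy) and CUE (kg of CO2 per kWh of IT energy, which works out to the supply's emission factor times PUE). Every site figure is invented for illustration.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class Site:
    name: str
    it_mw: float                  # average IT load
    pue: float
    water_liters_per_year: float
    grid_kgco2_per_kwh: float     # carbon emission factor of the power supply

    def it_kwh(self) -> float:
        return self.it_mw * 1000 * HOURS_PER_YEAR

    def total_kwh(self) -> float:
        return self.it_kwh() * self.pue

    def wue(self) -> float:
        """Water usage effectiveness: liters per kWh of IT energy."""
        return self.water_liters_per_year / self.it_kwh()

    def cue(self) -> float:
        """Carbon usage effectiveness: kg CO2 per kWh of IT energy."""
        return self.total_kwh() * self.grid_kgco2_per_kwh / self.it_kwh()

    def tonnes_co2(self) -> float:
        return self.total_kwh() * self.grid_kgco2_per_kwh / 1000


# Hypothetical sites: a small, less efficient one and a large, efficient one
small = Site("small", it_mw=5, pue=1.6, water_liters_per_year=50e6, grid_kgco2_per_kwh=0.4)
large = Site("large", it_mw=500, pue=1.1, water_liters_per_year=2e9, grid_kgco2_per_kwh=0.4)
for s in (small, large):
    print(f"{s.name}: PUE {s.pue}, WUE {s.wue():.2f} L/kWh, CUE {s.cue():.2f}, "
          f"{s.total_kwh() / 1e6:,.0f} GWh/yr, {s.tonnes_co2():,.0f} tCO2/yr")
```

The large site wins on PUE, WUE, and CUE, yet emits roughly 69 times as much CO2 per year, because its absolute consumption is so much larger.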

Grid and Generation Constraints

Even after solving chips and cooling, one wall remains: can you actually bring that electricity in?

Building one gigawatt-class campus means placing a load on the regional grid equal to a large power plant. Transmission upgrades, substations, and grid stability all become new constraints. And building new generation or transmission takes years, while AI demand moves on a quarterly cadence. That time gap is the greatest tension of 2026 infrastructure.

As a result, the industry pursues several paths at once.

A siting strategy of choosing locations near generation sites or transmission hubs.
Locking in power early via on-site generation (gas turbines, etc.) or long-term power purchase agreements (PPAs).
Direct procurement of renewables plus large-scale energy storage (batteries) to absorb variability.
Discussions of demand flexibility, such as running data centers harder when power is abundant.

As power itself becomes a scarce resource, the proposition that "whoever secures electricity secures AI capacity" is starting to hold.

The Perf/Watt Race Among Chips

With demand-side pressure this large, the supply-side answer also converges on "performance per watt (perf/watt)." Extracting more compute from the same electricity buys some relief from the power constraint.

The 2026 trajectory shows this directly.

Following Blackwell, NVIDIA (GTC 2026) set a goal for its next-generation Vera Rubin to adopt HBM4 and raise perf/watt by roughly 10x. When performance per watt jumps by an order of magnitude, a far larger deployment can run on the same power budget.
Google aimed squarely at efficiency with TPU v6 Trillium (about 4.7x peak versus the prior generation) and the inference-specialized 7th-generation Ironwood.
Cloud providers are rapidly expanding their own inference ASICs, with the inference-ASIC share expected to rise from about 15% in 2024 toward 40% in 2026; tailoring chips to workloads is another way to lift perf/watt. NVIDIA still holds roughly 75-80% of the accelerator market, with AMD's MI350X joining the competition.
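How perf/watt plays out under a fixed power budget can be sketched as follows. Both chips and all figures are hypothetical: the newer chip draws more power per module but is given roughly 10x the perf/watt, and `host_overhead` is an assumed multiplier for CPUs, NICs, and fans.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    watts: float           # board power per accelerator
    relative_perf: float   # throughput in arbitrary normalized units

    @property
    def perf_per_watt(self) -> float:
        return self.relative_perf / self.watts


def fleet_in_budget(chip: Accelerator, facility_mw: float, pue: float,
                    host_overhead: float = 1.0) -> tuple[int, float]:
    """Accelerators a fixed facility budget can feed, and their total throughput.

    host_overhead scales per-accelerator power to include host CPUs, NICs and fans.
    """
    it_watts = facility_mw * 1e6 / pue
    count = int(it_watts // (chip.watts * host_overhead))
    return count, count * chip.relative_perf


# Hypothetical chips: the newer one draws more power but has 10x the perf/watt
old = Accelerator("older gen", watts=700, relative_perf=1.0)
new = Accelerator("newer gen", watts=1000, relative_perf=10 * 1000 / 700)
for chip in (old, new):
    n, total = fleet_in_budget(chip, facility_mw=100, pue=1.25, host_overhead=1.5)
    print(f"{chip.name}: {n:,} accelerators, total throughput {total:,.0f} units")
```

With these inputs the newer chip fills the same 100MW with fewer accelerators (about 53,000 versus 76,000) yet delivers close to 10x the throughput: once power is the binding constraint, perf/watt rather than per-chip speed sets capacity.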

One important balance: better perf/watt does not lower total power. As efficiency improves, demand to run larger models and more inference grows accordingly, so absolute consumption tends to rise. Efficiency gains delay the power constraint; they do not remove it.

Carbon and Sustainability

When power consumption grows enormous, carbon emissions and environmental impact follow. The sustainability debate for AI data centers splits into several threads.

Power source: for the same amount of power, the carbon footprint differs greatly depending on whether it comes from coal or from renewables/nuclear. So siting and power procurement are, in effect, a carbon strategy.
Water use: cooling, especially evaporative cooling, uses a lot of water. In water-stressed regions this becomes a social constraint. Closed-loop liquid designs and free-air cooling aim to reduce water use.
Waste heat reuse: supplying warmed coolant to nearby district heating networks, turning waste heat into a resource, is becoming more common.
Lifecycle: the impact of the full process including chip and server manufacturing and disposal (embodied carbon) is increasingly considered.

Sustainability is not only a matter of regulation and reputation; in an environment where power itself grows scarce, it operates as a real operational constraint.

Cost Structure

The total cost of ownership (TCO) of an AI data center has a different center of gravity than traditional IT.

a feel for rough proportions (over campus lifecycle)

  accelerator/server capex ....... large share
  power (operating electricity) .. fast-growing share
  cooling facilities/ops ......... non-negligible
  building/land .................. relatively smaller
  network/storage ................ situational

Two key shifts. First, as operating electricity grows as a share of lifecycle cost, perf/watt and PUE translate directly into money. Second, because accelerator capex is so large, keeping accelerators busy rather than idle (utilization) becomes the key to cost efficiency. If poor cooling throttles the chips, that is direct waste — an expensive asset that cannot deliver its worth.
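Both shifts reduce to simple arithmetic. A minimal sketch, assuming a hypothetical 100MW IT load, a flat $0.07/kWh electricity price, and a hypothetical $30,000 accelerator amortized over four years:

```python
HOURS_PER_YEAR = 8760


def annual_power_cost(it_mw: float, pue: float, usd_per_kwh: float) -> float:
    """Yearly electricity bill: IT load x PUE x hours x price."""
    return it_mw * 1000 * pue * HOURS_PER_YEAR * usd_per_kwh


def cost_per_useful_hour(capex_usd: float, life_years: float, utilization: float) -> float:
    """Amortized capex per hour the accelerator is actually busy and unthrottled."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return capex_usd / (life_years * HOURS_PER_YEAR * utilization)


# Hypothetical 100 MW IT load at $0.07/kWh
for p in (1.5, 1.2):
    print(f"PUE {p}: ${annual_power_cost(100, p, 0.07) / 1e6:,.1f}M per year")

# Hypothetical $30,000 accelerator amortized over 4 years
for u in (0.9, 0.6):
    print(f"utilization {u:.0%}: ${cost_per_useful_hour(30_000, 4, u):.2f} per useful hour")
```

On these inputs, dropping PUE from 1.5 to 1.2 saves about $18M a year, and running at 60% instead of 90% utilization raises the cost of each useful accelerator-hour by half.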

Siting and Supply Chain

Site selection criteria have changed. Where distance to users (latency) and land cost once dominated, now the following come first.

Power availability: how many megawatts, by when. The most decisive factor.
Cooling resources: ambient temperature (potential for free cooling) and water availability.
Power price and source: is there cheap, clean electricity?
Permitting speed: do transmission and environmental permits come quickly?

On the supply chain side, bottlenecks are spread across not just accelerators but high-bandwidth memory like HBM, advanced packaging like CoWoS, power-conversion gear, and cooling components. A jam in any one delays the whole schedule. So large operators pre-secure (forward-purchase) components and power on multi-year horizons.

Operational Challenges

Even after the design is done, operations are a separate challenge.

Leak management: liquid cooling means running water next to electrical gear, so leak detection and shutoff design are essential.
Dynamic thermal swings: training workloads make power draw surge and dip. When tens of thousands of accelerators ramp load up and down together, both power and cooling experience sharp swings.
Integrating heterogeneous facilities: air-cooled zones and liquid-cooled zones, and hardware of different generations, coexist in one campus.
Reliability: a single failure can cascade into massive downtime, so both power and cooling need redundancy and fast incident response.
Monitoring: temperature, power, and flow must be observed densely at the rack and chip level to catch hot spots and anomalies early.

Implications for Developers and Architects

Even for developers and system architects who are not on the infrastructure team, this trend is not someone else's problem.

Efficiency is cost and availability. Serving a model more efficiently (quantization, smaller models, batching) saves power and cost at once, and serves more users within scarce capacity.
Design for utilization. Scheduling, batching, and autoscaling that keep expensive accelerators from sitting idle translate directly into infrastructure efficiency.
A sense for workload placement. Which job runs in which region/time affects power price and carbon. Latency-tolerant batch jobs can be moved to cheaper or cleaner times and places.
Design assuming constraints. The assumption that "GPUs can be scaled infinitely" is no longer safe. Capacity is bound by the physical limits of power and cooling.
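Workload placement by price and carbon can be sketched as a small selection problem. The regions, prices, carbon intensities, and the `place_batch_job` helper are all hypothetical; the sketch picks the region with enough spare capacity that minimizes electricity price plus an optional internal carbon charge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    name: str
    usd_per_kwh: float
    gco2_per_kwh: float
    spare_mw: float


def place_batch_job(regions, job_mw: float,
                    carbon_price_usd_per_tonne: float = 0.0) -> Optional[Region]:
    """Choose the region with room for the job that minimizes price plus carbon charge.

    Returns None if no region has enough spare capacity.
    """
    candidates = [r for r in regions if r.spare_mw >= job_mw]
    if not candidates:
        return None

    def effective_cost(r: Region) -> float:
        # $/kWh plus carbon priced per tonne (1 tonne = 1e6 g)
        return r.usd_per_kwh + r.gco2_per_kwh / 1e6 * carbon_price_usd_per_tonne

    return min(candidates, key=effective_cost)


regions = [  # hypothetical regions and figures
    Region("region-a", usd_per_kwh=0.05, gco2_per_kwh=600, spare_mw=40),
    Region("region-b", usd_per_kwh=0.08, gco2_per_kwh=50, spare_mw=20),
    Region("region-c", usd_per_kwh=0.04, gco2_per_kwh=400, spare_mw=5),
]
print(place_batch_job(regions, job_mw=10).name)
print(place_batch_job(regions, job_mw=10, carbon_price_usd_per_tonne=100).name)
```

With no carbon charge the cheapest region with room wins (region-c is excluded for lack of capacity); at a $100/tonne internal carbon price the cleaner region becomes cheaper overall.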

Following the Flow of Power

On the path from power plant to chip, several conversions and losses intervene. Understanding this path reveals why PUE exceeds 1 and where efficiency can be squeezed.

the journey of power (outline)

  power plant
    | transmission (high voltage, low loss)
  substation
    | data center intake (medium voltage)
  UPS / distribution
    | conversion loss
  server PSU (AC -> DC)
    | conversion loss
  board VRM (DC -> low voltage)
    | conversion loss
  chip (actual computation)

At each conversion step, a little power is lost as heat, so the industry works hard to reduce the number of conversion steps or raise their efficiency. High-voltage DC (HVDC) distribution, more efficient power supplies, and 48V DC designs are all attempts to cut these losses. Even if a chip draws 1000W, supplying it requires pulling more electricity at the facility level because of conversion and cooling losses. PUE captures only part of that gap: losses upstream of the IT equipment (transformers, UPS, distribution) and cooling count as overhead, while PSU and VRM losses inside the server are counted as IT power.
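The compounding of conversion losses is a product of stage efficiencies. The efficiencies below are hypothetical placeholders, not measured values, and the sketch ignores transformer and cooling energy; it only shows how stage losses multiply and where the IT boundary used by PUE sits.

```python
from math import prod

# Hypothetical stage efficiencies; real values vary by design and load
CHAIN = [
    ("UPS / distribution", 0.96),
    ("server PSU (AC->DC)", 0.95),
    ("board VRM (DC->low V)", 0.90),
]


def upstream_power(chip_watts: float, stages) -> float:
    """Power that must enter the given stages so that chip_watts reaches the chip."""
    return chip_watts / prod(eff for _, eff in stages)


chip_w = 1000.0
print(f"power entering the chain:    {upstream_power(chip_w, CHAIN):.0f} W")
# PSU and VRM sit inside the IT boundary, so power measured at the IT equipment
# input already includes their losses; only the UPS/distribution loss is overhead.
print(f"power at IT equipment input: {upstream_power(chip_w, CHAIN[1:]):.0f} W")
```

With these placeholders roughly 1,218W must enter the chain to deliver 1,000W to the chip, and about 1,170W of that is already counted as IT power.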

One important intuition here: the chip's heat does not disappear. Nearly all the electricity that goes in eventually becomes heat and must leave the building. A campus consuming 100MW is effectively a 100MW heater, and removing and rejecting all that heat is the essence of cooling.

Trade-offs in Choosing a Cooling Method

Which cooling to choose is not simply a question of "what is better" but a balance across several axes.

Consideration	Air	Direct liquid (DLC)	Immersion
Density it handles	Low	High	Very high
Upfront investment	Low	Medium	High
Operational difficulty	Low	Medium	High
Compatibility with existing gear	Good	Moderate	Low
Maintenance access	Good	Moderate	Demanding

The reason many operators choose direct liquid cooling as the current-generation standard is that it handles high density while fitting relatively well with existing rack and server form factors. Immersion goes further in efficiency and density, but its operational transition cost is high, so it is adopted carefully.

Another axis is "how cold do you cool?" Cooling as cold as possible is not always best. Lowering coolant temperature too far raises chiller power and worsens PUE. So the latest designs prefer "warm-water cooling" — cooling with relatively warm water within what the chip can tolerate — because it can be cooled by ambient air alone (free cooling), reducing chiller power.
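The warm-water argument can be sketched as counting the hours in which a dry cooler alone could hold the coolant supply temperature. The model is deliberately crude and hypothetical: it assumes the cooler can get within a fixed 5 K approach of ambient, and it uses a synthetic sinusoidal climate instead of real weather data.

```python
import math


def free_cooling_hours(hourly_ambient_c, supply_c: float, approach_k: float = 5.0) -> int:
    """Hours in which coolant at supply_c could be produced without chillers.

    Assumes (simplification) a dry cooler can cool to ambient + approach_k.
    """
    return sum(1 for t in hourly_ambient_c if t + approach_k <= supply_c)


# Hypothetical temperate climate: seasonal swing of 0-30 C plus a 10 K daily swing
ambient = [15 + 15 * math.sin(2 * math.pi * h / 8760) + 5 * math.sin(2 * math.pi * h / 24)
           for h in range(8760)]

for supply in (18, 30, 40):
    hours = free_cooling_hours(ambient, supply)
    print(f"supply {supply} C: {hours} free-cooling hours ({hours / 8760:.0%})")
```

Raising the supply setpoint increases the hours covered by free cooling, which is the mechanism by which warm-water designs cut chiller energy.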

New Problems Created by Scale

At hundreds of megawatts or gigawatts, problems appear that did not exist at small scale.

Concurrency of power swings: tens of thousands of accelerators ramp load up and down at the same step of the same training job. This synchronized surge shakes the campus-wide power sharply and burdens the grid. So techniques to deliberately spread or buffer the load are studied.
Inertia of cooling: cooling systems cannot react instantly to load changes. Thermal buffering is needed to bridge the lag between a sudden heat surge and the cooling response.
Cascading failures: a power or cooling problem in one zone can halt the operation of an entire huge cluster. So both power and cooling are made redundant, but the dilemma is that redundancy itself draws more power.
Supply chain synchronization: accelerators, memory, power gear, and cooling components must arrive on the same schedule to power up a campus. A delay in one component leaves billions in assets idle.

Scale is favorable for efficiency and cost, but it also summons new system-level challenges like these. Operating a giant campus is not simply several small data centers combined, but a qualitatively different engineering problem.
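The synchronized-swing problem can be illustrated with a toy model in which groups of accelerators alternate between a high-power compute phase and a low-power communication phase. The sketch assumes the groups can be offset in time, which holds across independent jobs but is not always possible inside a single synchronous training job; phase lengths and power levels are hypothetical.

```python
def total_power_trace(n_groups, period, high_kw, low_kw, offsets, steps):
    """Sum of group power over time; each group runs `period` ticks high, then `period` low."""
    trace = []
    for t in range(steps):
        total = 0.0
        for g in range(n_groups):
            phase = (t + offsets[g]) % (2 * period)
            total += high_kw if phase < period else low_kw
        trace.append(total)
    return trace


def swing(trace):
    """Peak-to-trough variation of the total load."""
    return max(trace) - min(trace)


n, period = 8, 4
synced = total_power_trace(n, period, high_kw=1000, low_kw=300, offsets=[0] * n, steps=64)
# Spread the groups evenly across one full compute/communication cycle
staggered_offsets = [g * (2 * period) // n for g in range(n)]
staggered = total_power_trace(n, period, 1000, 300, staggered_offsets, steps=64)
print(f"synchronized swing: {swing(synced):.0f} kW")
print(f"staggered swing:    {swing(staggered):.0f} kW")
```

With eight groups offset evenly across the cycle the total draw stays constant, while the synchronized case swings by 5,600 kW every cycle.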

Power Procurement Models

As power becomes a scarce resource, "how to secure electricity" has become a core business capability. Let us compare a few models.

Model	Method	Advantages	Limits
Grid power	Drawing from the existing grid	Simple, fast start	Bound by available capacity/upgrade speed
Long-term PPA	Long purchase contract with a generator	Price stability, renewable sourcing	Needs siting/contract negotiation
On-site generation	On-site generation such as gas turbines	Bypasses grid limits, fast to secure	Carbon/fuel/permitting burden
Renewable + storage	Solar/wind + batteries	Carbon reduction	Variability, large storage cost

In reality these are blended. Base load is carried by the grid and PPAs, while variability and emergencies are backed by on-site generation and storage. The key is locking in not just "is there electricity now" but "will the promised electricity still arrive five years from now." Because AI demand grows faster than generation and transmission can be built, the operator who secures power first secures capacity first.

A new perspective is added here: the temporal flexibility of power. Latency-tolerant workloads like training can be moved to times when electricity is cheaper or cleaner. Running harder when the grid has surplus and easing off when it is short extracts more compute within the same power contract and contributes to grid stability.
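Temporal flexibility can be sketched as a greedy schedule that places a splittable, latency-tolerant job into the cheapest hours first, subject to a per-hour power cap. The prices, the cap, and the `schedule_flexible_energy` helper are hypothetical, and real jobs usually cannot be split this freely.

```python
def schedule_flexible_energy(prices_usd_per_mwh, job_mwh, max_mw_per_hour):
    """Greedy plan: fill the cheapest hours first, up to a per-hour power cap.

    Returns MW per hour. Assumes the job can be split freely across hours.
    """
    plan = [0.0] * len(prices_usd_per_mwh)
    remaining = job_mwh
    for hour in sorted(range(len(prices_usd_per_mwh)), key=lambda h: prices_usd_per_mwh[h]):
        if remaining <= 0:
            break
        take = min(max_mw_per_hour, remaining)
        plan[hour] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("job does not fit in the window under the power cap")
    return plan


prices = [90, 85, 40, 35, 30, 60, 120, 110]  # hypothetical $/MWh over 8 hours
plan = schedule_flexible_energy(prices, job_mwh=50, max_mw_per_hour=20)
cost = sum(p * mw for p, mw in zip(prices, plan))
flat_cost = sum(p * 50 / len(prices) for p in prices)
print("plan (MW per hour):", plan)
print(f"cost: ${cost:,.0f} vs ${flat_cost:,.0f} if spread evenly")
```

On these inputs the shifted plan costs roughly half of an even spread, without using any more energy.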

What to Observe

Operating giant infrastructure lives on dense observation. The key metrics are these.

observation layers (outline)

  facility level : total power, PUE, WUE, ambient temperature, chiller load
  rack level     : rack power, inlet/outlet temperature, coolant flow/pressure
  server level   : PSU efficiency, fan speed, board temperature
  chip level     : chip temperature, power, clock, whether throttling occurs

These metrics must be gathered in real time to catch hot spots early, adjust cooling before chips enter throttling, and immediately shut off anomalies like leaks. In particular, chip-level throttling is a direct signal that "an expensive asset is not delivering its worth," so it deserves the most sensitive attention.
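A minimal sketch of chip-level checks: flag accelerators that are hot or running below their maximum clock, and rank racks by mean chip temperature to spot hot spots. The telemetry fields, the 85°C warning level, and the 95% clock threshold are hypothetical; real limits come from vendor specifications and site policy.

```python
from dataclasses import dataclass


@dataclass
class ChipSample:
    rack: str
    chip_id: int
    temp_c: float
    clock_mhz: float
    max_clock_mhz: float


def find_throttling(samples, temp_warn_c=85.0, clock_ratio_min=0.95):
    """Flag chips that are hot or clocked noticeably below their maximum (hypothetical thresholds)."""
    alerts = []
    for s in samples:
        reasons = []
        if s.temp_c >= temp_warn_c:
            reasons.append(f"temp {s.temp_c:.0f} C")
        if s.clock_mhz < clock_ratio_min * s.max_clock_mhz:
            reasons.append(f"clock at {s.clock_mhz / s.max_clock_mhz:.0%} of max")
        if reasons:
            alerts.append((s.rack, s.chip_id, ", ".join(reasons)))
    return alerts


def hottest_racks(samples, top_n=1):
    """Rank racks by mean chip temperature, hottest first."""
    by_rack = {}
    for s in samples:
        by_rack.setdefault(s.rack, []).append(s.temp_c)
    means = {rack: sum(ts) / len(ts) for rack, ts in by_rack.items()}
    return sorted(means, key=means.get, reverse=True)[:top_n]


samples = [  # hypothetical telemetry snapshot
    ChipSample("r01", 0, 71, 1980, 2000),
    ChipSample("r01", 1, 74, 1975, 2000),
    ChipSample("r02", 0, 88, 1600, 2000),
    ChipSample("r02", 1, 83, 1990, 2000),
]
for alert in find_throttling(samples):
    print(alert)
print("hottest rack:", hottest_racks(samples))
```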

Observation also serves after-the-fact analysis. Logging which time windows see power swing, which rack runs unusually hot, and how PUE shifts with season improves the next campus design and operating policy. In the end, the operation of power and cooling can only be improved as much as it can be measured.

The Paradox of Efficiency and Demand — Jevons Paradox

When talking about power and cooling, there is an easy misconception: the expectation that "as chips become efficient, total power will fall." Reality is often the opposite.

Economics has a concept called the Jevons paradox: when the efficiency of using a resource improves, that resource becomes cheaper and easier to use, so total consumption actually rises. AI hardware is walking exactly this path.

efficiency improves -> less electricity per task
        -> compute cost falls
        -> demand to run bigger models and more inference rises
        -> total power consumption actually increases

Even if next-generation chips raise perf/watt by roughly 10x, if demand fills in with that much larger models and more usage, the campus's absolute power does not fall. This is why data center power demand is projected to keep rising despite efficiency gains.
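The rebound can be made concrete with a toy constant-elasticity model, a common simplification and an assumption here: cost per task is taken as proportional to energy per task, and demand scales as cost raised to the power of minus the elasticity. Total energy then scales as the efficiency gain raised to (elasticity - 1).

```python
def total_energy_after_gain(baseline_energy: float, efficiency_gain: float,
                            elasticity: float) -> float:
    """Toy constant-elasticity model of the rebound (Jevons) effect.

    Assumptions: cost per task is proportional to energy per task, and
    demand for tasks scales as cost ** (-elasticity). Then:
      energy per task -> 1 / gain
      tasks demanded  -> gain ** elasticity
      total energy    -> gain ** (elasticity - 1) times the baseline
    """
    return baseline_energy * efficiency_gain ** (elasticity - 1.0)


for eps in (0.5, 1.0, 1.5):
    e = total_energy_after_gain(100.0, efficiency_gain=10.0, elasticity=eps)
    print(f"elasticity {eps}: total energy {e:.1f} (baseline 100)")
```

With a 10x efficiency gain, total energy falls when demand is inelastic (elasticity below 1), stays flat at exactly 1, and rises above it; the Jevons paradox is the last case.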

The lesson of this paradox is not pessimism but a sense of reality. Efficiency gains are clearly valuable, but they alone will not make the power problem solve itself. Efficiency, power procurement, siting, workload flexibility, and sustainability must all be handled together for gigawatt-era infrastructure to work. There is no solution that leans on any single one.

Where Cooling Is Heading

As rack density keeps rising, cooling technology does not stop evolving either. Let us outline a few directions.

Ubiquity of liquid cooling: direct liquid cooling is no longer a special design but is becoming the default premise of high-density AI racks. As standard components and design practices settle in, the barrier to adoption falls.
Into the chip package: attempts to remove heat ever closer to the chip continue, including research that runs coolant at the chip-package level.
Expansion of warm-water cooling: the trend of cooling with relatively warm water to reduce chiller dependence and increase free-air cooling grows stronger. It improves PUE and water use at once.
Turning waste heat into a resource: supplying warmed coolant to district heating or nearby facilities, turning discarded heat into value, is becoming more common.

The core trend converges on one thing: removing heat close to its source, with as warm a medium as possible, using as little additional energy as possible. For the same power, reducing the overhead spent on cooling improves PUE that much and lowers cost and environmental burden.

That said, no cooling technology changes the fundamental problem. The electricity that goes in eventually becomes heat, and that heat must leave somewhere. The evolution of cooling is about handling that heat more efficiently, not eliminating the heat itself. So cooling and power are designed together as an inseparable pair.

Correcting Common Misconceptions

Finally, let us clear up a few common misconceptions surrounding AI data center power and cooling.

"Good PUE means green": PUE is just an overhead ratio. You must look at absolute power, power source, and water use together to know the true environmental impact.
"Liquid cooling is too risky to use": leak management is indeed demanding, but liquid cooling has effectively become the standard for high-density AI racks. The design and operational know-how have matured.
"If chips get efficient, power falls": as seen with the Jevons paradox, efficiency gains tend to grow demand and actually increase absolute consumption.
"Power can be bought with money alone": building transmission and generation takes years, so even with money you often cannot get megawatts right away. Power is a resource that takes time.
"Move everything to the edge and you need no data centers": the edge absorbs some inference, but training and large-model inference still belong to the data center. The two are a division of labor.

What these misconceptions share is simplifying a complex problem to a single metric or a single solution. Gigawatt-era infrastructure is a system problem of intertwined constraints, and it must be viewed with a balanced perspective. No single number (TOPS, PUE, TDP) declares good or bad on its own. They must be read together in context.

The Core, at a Glance

Let us bundle the discussion so far briefly.

Axis	Past	Now (2026)
Top constraint	Land/buildings	Power/cooling
Rack density	5-10kW	40-100kW and beyond
Cooling	Air	Direct liquid/immersion
Site selection	User distance/land price	Power availability/source
Chip competition axis	Absolute performance	Performance per watt
Cost center of gravity	Capex	Capex + operating electricity

As this table shows, almost every axis of AI infrastructure has been reorganized around "physical constraints." No matter how fast the chip, if you cannot feed it electricity and cool its heat, it does not become capacity. So understanding infrastructure is, in effect, understanding the limits and possibilities of AI.

Conclusion

An AI data center is no longer "a building full of servers" but closer to "a machine that turns enormous electricity into compute, and then back into heat." Infrastructure in the gigawatt era is defined not by the money to buy chips but by the ability to secure electricity and to cool the heat.

Rack density now exceeds 100kW, cooling has crossed from air to liquid, and chips try to delay the constraint through a perf/watt race (next-generation Vera Rubin targeting roughly 10x). Yet because demand grows as efficiency improves, power and cooling will remain the hardest constraints of AI infrastructure. Understanding this constraint is not just the infrastructure engineer's job: it belongs to everyone who designs efficient models and systems.

References

NVIDIA Data Center / Blackwell: https://www.nvidia.com/en-us/data-center/
Google Cloud TPU: https://cloud.google.com/tpu
The Green Grid (PUE and efficiency metrics): https://www.thegreengrid.org/
Uptime Institute (data center operations/reliability): https://uptimeinstitute.com/
IEA reports on data centers and power: https://www.iea.org/
Open Compute Project (open hardware/cooling): https://www.opencompute.org/
SemiAnalysis (data center/power analysis): https://www.semianalysis.com/
ASHRAE (data center thermal guidelines): https://www.ashrae.org/