[BCH] Benchmarks

Independent infrastructure performance data.

MLPerf-aligned training and inference benchmarks, W/FLOP power-efficiency rankings, and real-world LLM throughput across H100, H200, B200, and MI300X.

12 model suites
240+ facilities ranked
0.42 best W/FLOP
Q1 '26 last refresh

[01] MLPerf · training + inference

GPU-class comparison · real workloads

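Workload                  | Unit          | H100 | H200 | B200 | MI300X
Llama 3.1 405B · pretrain | min to target | 412  | 298  | 168  | n/a
Llama 3.1 70B · pretrain  | min to target | 61   | 44   | 23   | 58
Mixtral 8x22B · finetune  | tok/s/gpu     | 1820 | 2410 | 4180 | 1990
SDXL · train              | samples/s     | 31.2 | 39.8 | 62.4 | 29.1
Llama 3.1 70B · inference | tok/s · BS=1  | 112  | 148  | 284  | 126
GPT-OSS 120B · inference  | tok/s · BS=32 | 1840 | 2390 | 4720 | 1960

Lower is better for min to target; higher is better for all throughput rows. n/a = no result listed.
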
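The rows mix two directions: minutes to target is lower-is-better, while the throughput rows are higher-is-better. A minimal sketch that normalizes each row to a speedup over H100, with values copied from the MLPerf table; the `ROWS` layout and the `speedup_vs_h100` helper are illustrative names, not published tooling. The Llama 3.1 405B row lists only three figures, read here as H100, H200 and B200, so MI300X is stored as `None` and skipped.

```python
# Relative speedup vs. H100 for each workload in the MLPerf table.
# Layout and helper names are illustrative; values are copied from the table.

GPUS = ("H100", "H200", "B200", "MI300X")

# (workload, higher_is_better, values per GPU in GPUS order); None = not listed
ROWS = [
    ("Llama 3.1 405B · pretrain", False, (412, 298, 168, None)),
    ("Llama 3.1 70B · pretrain", False, (61, 44, 23, 58)),
    ("Mixtral 8x22B · finetune", True, (1820, 2410, 4180, 1990)),
    ("SDXL · train", True, (31.2, 39.8, 62.4, 29.1)),
    ("Llama 3.1 70B · inference BS=1", True, (112, 148, 284, 126)),
    ("GPT-OSS 120B · inference BS=32", True, (1840, 2390, 4720, 1960)),
]


def speedup_vs_h100(values, higher_is_better):
    """Return {gpu: speedup} relative to the H100 column.

    Throughput (higher is better): speedup = value / H100.
    Time to target (lower is better): speedup = H100 / value.
    Missing entries (None) are skipped.
    """
    base = values[0]
    out = {}
    for gpu, v in zip(GPUS, values):
        if v is None:
            continue
        out[gpu] = v / base if higher_is_better else base / v
    return out


if __name__ == "__main__":
    for name, hib, vals in ROWS:
        s = speedup_vs_h100(vals, hib)
        pretty = ", ".join(f"{g} {x:.2f}x" for g, x in s.items())
        print(f"{name}: {pretty}")
```

For example, on Llama 3.1 70B pretraining, B200 reaches target 61 / 23 ≈ 2.65× faster than H100, and at BS=1 inference it delivers 284 / 112 ≈ 2.54× the H100 throughput.
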
[02] W/FLOP leaderboard

Power efficiency · top facilities

#  | Facility            | PUE  | W/FLOP | Fleet
#1 | Nebius EU-FI-01     | 1.13 | 0.42   | 21,000× H200
#2 | Crusoe Iceland      | 1.08 | 0.46   | 12,800× H100
#3 | Google Hamina       | 1.10 | 0.49   | TPU v5p · 8,960
#4 | Meta Eagle Mountain | 1.09 | 0.51   | 24,576× H100
#5 | CoreWeave Plano     | 1.18 | 0.58   | 32,000× B200
#6 | AWS us-east-1       | 1.21 | 0.63   | p5 · undisclosed
#7 | Microsoft Quincy    | 1.22 | 0.65   | ND H200 v5

METHODOLOGY · W/FLOP is measured at sustained FP16 utilization above 80% over 24-hour windows. Figures are facility-reported and cross-validated against grid telemetry.
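A minimal sketch of the windowing rule, assuming per-interval telemetry samples of facility power (watts), achieved FP16 FLOP/s and peak FP16 FLOP/s. The `Sample` record, the `w_per_flop`, `it_share` and `cross_validated` helpers, and the 5% cross-check tolerance are all hypothetical, and "sustained" is interpreted here as the window-average utilization. The page does not state the scaling behind its W/FLOP figures (e.g. 0.42), so the sketch returns raw joules per FLOP (watts per FLOP/s) and does not try to reproduce the published numbers. It also shows the PUE relationship: PUE is facility power divided by IT power, so dividing a facility-level figure by PUE gives the IT-only share, assuming the ranked W/FLOP is taken at the facility meter.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_S = 24 * 3600  # 24-hour windows, per the methodology note
MIN_UTIL = 0.80       # keep only windows with FP16 utilization above 80%


@dataclass
class Sample:
    # Hypothetical telemetry record covering `duration_s` seconds.
    t_s: float             # interval start, seconds since epoch
    duration_s: float      # interval length in seconds
    facility_w: float      # average facility power over the interval (W)
    achieved_flops: float  # achieved FP16 FLOP/s over the interval
    peak_flops: float      # peak FP16 FLOP/s of the fleet


def w_per_flop(samples: list[Sample]) -> Optional[float]:
    """Energy per FLOP (J/FLOP == W per FLOP/s) over qualifying 24 h windows.

    Windows are aligned to multiples of 24 h; completeness is not checked.
    Returns None if no window qualifies.
    """
    windows: dict[int, list[Sample]] = {}
    for s in samples:
        windows.setdefault(int(s.t_s // WINDOW_S), []).append(s)

    energy_j = 0.0
    flops = 0.0
    for group in windows.values():
        done = sum(s.achieved_flops * s.duration_s for s in group)
        peak = sum(s.peak_flops * s.duration_s for s in group)
        if peak == 0 or done / peak <= MIN_UTIL:
            continue  # window average not above 80% utilization
        energy_j += sum(s.facility_w * s.duration_s for s in group)
        flops += done
    return energy_j / flops if flops else None


def it_share(facility_w_per_flop: float, pue: float) -> float:
    # PUE = facility power / IT power, so IT-level = facility-level / PUE.
    return facility_w_per_flop / pue


def cross_validated(reported_j: float, grid_j: float, tol: float = 0.05) -> bool:
    # Hypothetical check: facility-reported energy within `tol` of the grid meter.
    return abs(reported_j - grid_j) <= tol * grid_j
```

Averaging utilization over the whole window, rather than per sample, means a short burst of high utilization cannot qualify an otherwise idle day; this is one reading of "sustained", not a confirmed rule.
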