LLM Inference on Dedicated Bare Metal

Deploy and serve large language models on hardware you don't share with anyone.
Full VRAM, full bandwidth, consistent latency, every request.

Run your LLM pipeline on infrastructure built for it

Full GPU, Full VRAM

96 GB GDDR7 ECC per GPU, 768 GB across the 8-GPU server. No virtualisation layer, no shared tenants, no infrastructure-imposed memory limits.

FP4 & FP8 Precision

Native Blackwell precision support for modern LLM inference frameworks: vLLM, TGI, TensorRT-LLM, and Ollama (see the sketch below).

Predictable Cost

No egress fees, no hidden infrastructure costs. From $1.34/GPU/hr on a 12-month term.
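As a sketch of what deployment looks like with one of those frameworks, the snippet below uses vLLM's offline Python API to load FP8-quantised weights sharded across all eight GPUs. The model name is a placeholder for your own checkpoint, and all throughput settings are left at vLLM's defaults; treat it as a starting point rather than a production configuration.

```python
# Minimal vLLM sketch: FP8 weights, tensor-parallel across all 8 GPUs.
# The checkpoint name below is a placeholder; swap in your own model.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder checkpoint
    quantization="fp8",        # native FP8 on Blackwell-class GPUs
    tensor_parallel_size=8,    # shard weights across the 8 RTX Pro 6000s
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain bare metal inference in one sentence."], params)
print(outputs[0].outputs[0].text)
```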

Why bare metal matters for LLM inference

Shared GPU infrastructure introduces latency variance you cannot predict or control. When multiple tenants compete for the same memory bandwidth, inference throughput degrades, especially under load.

On 1Legion bare metal, you get the full GPU. No competing jobs. The same throughput at request 1 and at request 10,000.

The 8x RTX Pro 6000 Blackwell Max-Q server runs models of up to 70B parameters in full precision, and larger models with FP4 quantisation, all on dedicated hardware with direct engineering support.
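A back-of-the-envelope check on that claim, counting weight memory only and reading "full precision" as unquantised 16-bit weights (an assumption; KV cache and activations add overhead on top of these figures):

```python
# Weight-only memory for a 70B-parameter model at common precisions,
# against the server's 768 GB total VRAM (8 GPUs x 96 GB).
# KV cache, activations, and framework overhead are not counted here.
PARAMS = 70e9
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

for precision, bytes_per in BYTES_PER_PARAM.items():
    gb = PARAMS * bytes_per / 1e9
    print(f"{precision}: {gb:.0f} GB of weights ({gb / 768:.0%} of 768 GB)")

# FP16: 140 GB (18%) -> a 70B model fits unquantised with ample headroom.
# FP4:   35 GB  (5%) -> room for far larger models, long contexts, batching.
```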

Apply for a Bare Metal Pilot →

Ready to deploy your LLM on bare metal?

Tell us about your model and workload.
We will match you with the right configuration.

Contact our Engineers