
Reserve Cloud HGX H100 Capacity



NVIDIA H100 SXM5:
Fastest LLM inference & training from $2.49/hr.

From deep learning training to LLM inference, the NVIDIA H100 Tensor Core GPU accelerates the most demanding AI workloads.

Up to 30x improvement on LLM inference over the A100 on the largest models

Up to 4x improvement on training over the A100

Hundreds of H100s online, ready to deploy!

19,000 tokens/sec

Llama 7B inference speed using TensorRT-LLM in FP8

80 GB VRAM at 3.35 TB/s

The H100 has the most memory bandwidth of any of our GPUs

From just $1.91/hr

On TensorDock's cloud platform as a spot instance or with a 3-year reservation. See below for more pricing details.


About the NVIDIA H100 SXM5 GPU

The NVIDIA H100 is based on NVIDIA's Hopper GPU architecture and is engineered to excel at deep learning, data analytics, and scientific computing.

The H100 is fast. Its 528 fourth-generation Tensor Cores and 16,896 CUDA cores deliver breakthrough performance in AI model training and inference.

Beyond raw compute, its 80 GB of VRAM and 3.35 TB/s of memory bandwidth make it ideal for large-scale LLM, data analytics, and scientific computing workloads that need large amounts of fast memory.

The H100 also supports advanced features such as Multi-Instance GPU (MIG), which partitions one physical GPU into up to seven isolated instances with 10 GB of VRAM each, so it can run multiple workloads simultaneously. Overall, the NVIDIA H100 represents a significant leap forward in GPU technology, offering unmatched performance and efficiency for the most demanding computing tasks.

See full data sheet.

19,694 tok/sec Llama 7B

A token is a word or a piece of a word. Because many words span more than one token, 19,694 tokens per second on the Llama 7B model works out to somewhat fewer words per second.
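Converting tokens per second into words per second requires an assumed tokens-per-word ratio. The sketch below uses 1.33 tokens per word, a common rule of thumb for English text; that ratio is an assumption, not a value measured for the Llama tokenizer, and the real figure varies with the tokenizer and the text.

```python
def tokens_to_words(tokens_per_sec: float, tokens_per_word: float = 1.33) -> float:
    """Estimate words per second from a tokens-per-second throughput.

    tokens_per_word is an assumed average (about 1.33 for typical English
    text); the real value depends on the tokenizer and the content.
    """
    if tokens_per_word <= 0:
        raise ValueError("tokens_per_word must be positive")
    return tokens_per_sec / tokens_per_word


# 19,694 tok/sec is the Llama 7B throughput quoted above.
words_per_sec = tokens_to_words(19_694)
print(f"~{words_per_sec:,.0f} words/sec")
```

Under this assumption, the Llama 7B figure corresponds to roughly 14,800 words per second.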

16,896 CUDA Cores

CUDA cores are the general-purpose processing units of NVIDIA GPUs. All else being equal, more CUDA cores means more parallel throughput.

528 Tensor Cores

Tensor Cores are specialized processing units designed to efficiently execute matrix multiply-accumulate operations, the core computation in deep learning.

3,958 Teraflops FP8

With sparsity.

80 GB VRAM @ 3.35 TB/s

The more VRAM, the more data a GPU can store at once.
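A rough way to see what 80 GB means in practice is to estimate how much memory a model's weights need: parameter count times bytes per parameter. This sketch counts weights only and ignores the KV cache, activations, and framework overhead, so real usage is higher; the model sizes are illustrative examples, not benchmarks.

```python
import math

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus(num_params: float, precision: str, vram_gb: float = 80.0) -> int:
    """Smallest number of GPUs whose combined VRAM can hold the weights."""
    return math.ceil(weight_memory_gb(num_params, precision) / vram_gb)


for params, prec in [(7e9, "fp16"), (70e9, "fp16"), (70e9, "fp8")]:
    print(f"{params / 1e9:.0f}B @ {prec}: "
          f"{weight_memory_gb(params, prec):.0f} GB of weights, "
          f"at least {min_gpus(params, prec)}x H100 80 GB")
```

By this estimate, a 7B model in FP16 needs about 14 GB, a 70B model in FP16 (about 140 GB) needs at least two H100s, and the same model in FP8 (about 70 GB) fits on one.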

From just $2.49/hour, no commitments

Deploy the Powerful NVIDIA H100 GPU on TensorDock

Unleash unparalleled computing power on the industry's most cost-effective cloud

NVIDIA H100

Each hostnode has:

  • 8x NVIDIA H100 SXM5 GPUs for 640 GB of combined VRAM
  • 2x Intel Xeon or AMD EPYC CPUs with 200+ combined threads
  • 27 TB local PCIe 4.0 NVMe SSD
  • 10 Gbps of public internet connectivity
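The pricing notes below state that reserved instances come with 1/8 of these hostnode resources per GPU. A minimal sketch of that proportional split follows; it treats "200+ threads" as exactly 200 (an assumption, since the real count may be higher) and uses a hypothetical `Resources` type for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for a machine's headline resources."""
    gpus: int
    vram_gb: float
    cpu_threads: float
    nvme_tb: float


# Hostnode figures from the list above; 200 threads is a floor ("200+").
HOSTNODE = Resources(gpus=8, vram_gb=640, cpu_threads=200, nvme_tb=27)


def reserved_share(num_gpus: int, node: Resources = HOSTNODE) -> Resources:
    """Resources for a reservation of num_gpus GPUs, at 1/8 of the node per GPU."""
    if not 1 <= num_gpus <= node.gpus:
        raise ValueError(f"num_gpus must be between 1 and {node.gpus}")
    fraction = num_gpus / node.gpus
    return Resources(
        gpus=num_gpus,
        vram_gb=node.vram_gb * fraction,
        cpu_threads=node.cpu_threads * fraction,
        nvme_tb=node.nvme_tb * fraction,
    )


print(reserved_share(1))  # one GPU's share
print(reserved_share(8))  # the whole hostnode
```

Under these assumptions, each reserved GPU comes with 80 GB of VRAM, at least 25 CPU threads, and about 3.4 TB of NVMe storage.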

We offer a select number of hostnodes on demand. You can deploy H100 virtual machines with 1 to 8 GPUs fully on demand, starting at $2.49/hour per GPU depending on the CPU/RAM resources allocated, or from $1.91/hour as a spot instance. Demand is high, so multi-GPU H100 VMs are currently difficult to obtain.

As such, we highly recommend contacting us to reserve an entire 8x hostnode from our upcoming February 1st cluster.

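|                           | Hourly         | Spot Bid   | Monthly   | Annual    | 3-Year    |
|---------------------------|----------------|------------|-----------|-----------|-----------|
| Price per GPU, 8x minimum | From $2.49/hr* | $1.91/hr** | $2.57/hr  | $2.47/hr  | $1.91/hr  |
| Bare metal hostnode       | N/A            | N/A        | $24.72/hr | $19.76/hr | $15.28/hr |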

* On-demand pricing is billed à la carte, so allocating additional CPU, RAM, or storage increases the cost. Reserved instances include 1/8 of the hostnode resources listed above per GPU, so an 8x configuration includes all of the hostnode's resources. Hosts are also listed at staggered prices from $1.99 to $3.70/hr, so the $2.49/hr price is not guaranteed.
** The minimum bid is the lowest price you can bid, but actual pricing fluctuates based on market conditions.
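In the Annual and 3-Year columns, the bare metal hostnode rate is exactly 8 times the per-GPU rate ($2.47 × 8 = $19.76 and $1.91 × 8 = $15.28). The sketch below reproduces that arithmetic and turns an hourly rate into an approximate monthly figure, assuming an average month of 730 hours (8,760 hours per year divided by 12); actual contract billing may be computed differently.

```python
HOURS_PER_MONTH = 8_760 / 12  # assumed average month: 730 hours

# Per-GPU reserved rates (USD/hr) from the table above.
PER_GPU_RATE = {"Annual": 2.47, "3-Year": 1.91}


def hostnode_hourly(per_gpu_rate: float, gpus: int = 8) -> float:
    """Hourly rate for a full hostnode, rounded to the cent."""
    return round(per_gpu_rate * gpus, 2)


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Approximate cost of one average month at a constant rate, rounded to the cent."""
    return round(hourly_rate * hours, 2)


for term, rate in PER_GPU_RATE.items():
    node_rate = hostnode_hourly(rate)
    print(f"{term}: ${node_rate:.2f}/hr per hostnode, "
          f"about ${monthly_cost(node_rate):,.2f} per month")
```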

Why deploy an NVIDIA H100 on the TensorDock Cloud?


Engineered for Excellence

We built our own hypervisor, our own load balancers, and our own orchestration engine — all so that we can deliver the best performance.

VMs in 10 seconds, not 10 minutes. Instant stock validation. Resource webhooks/callbacks. À la carte resource allocation and resizing.

Save with Storage-Only Pricing

For on-demand servers, when you stop a VM and unreserve its GPUs, you are billed only a lower storage rate. You can always request an export of your VM's disk image.


Jupyter Notebook made easy

Deploy our machine learning image and get Jupyter Notebook/Lab out of the box. Slash your development setup times.

Reliable, Enterprise-Grade Infrastructure

Our NVIDIA H100 clusters are hosted in Evoque's Dallas data center with 24/7 security, redundant power, multihomed network feeds, and 100% network and power uptime SLAs.

With SSAE-18 SOC 2 Type II, CJIS, HIPAA, and PCI-DSS compliance, you can trust that your most mission-critical workloads are in a safe place.


More than just H100s.

Hi! We're TensorDock, and we're building a radically more efficient cloud. Five years ago, we started hosting GPU servers in two basements because we couldn't find a cloud suitable for our own AI projects. Soon, we couldn't keep up with demand, so we built a partner network to source supply.

Today, we operate a global GPU cloud with 27 GPU types located in dozens of cities. Some are owned by us, and some are owned by partners, but all are managed by us.

In addition to GPUs, we also offer CPU-only servers.

We speak in tokens and iterations, in IB and TLC/MLC, and we're excited to serve you.


Frequently Asked Questions


Where is the H100 available for deployment?

The H100 is available for deployment at our secure Dallas data center, protected by 24/7 security, powered via redundant feeds, and backed by a 100% power and network SLA.

Experience sub-40 ms latencies to nearly every US population center for low-latency LLM inference traffic.


How do you guarantee security?

Every layer of our infrastructure is protected by a variety of security measures, ensuring privacy and security for our customers.

Read more about our security.


What virtualization do you use? Docker containers?

We're thrilled to offer bare-metal servers, with no virtualization layer, to customers renting full 8x configurations for a long period of time.

For on-demand customers, we offer KVM virtualization with root access and a dedicated GPU passed through. You get to use the full compute power of your GPU without resource contention.


How is billing done?

For our on-demand platform, we operate on a pre-paid model: you deposit money and then provision a server. Once your balance nears $0, the server is automatically deleted. For these H100s, we are also earmarking a portion of capacity to be billed via long-term contracts.
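Under the pre-paid model, a deposit buys a roughly predictable amount of runtime. This sketch assumes a constant hourly rate with no other charges (spot prices fluctuate, and storage or other add-ons would shorten runtime), and because the server is deleted as the balance nears $0, the result is an upper bound. The $100 deposit is a hypothetical example.

```python
def hours_remaining(balance_usd: float, hourly_rate_usd: float) -> float:
    """Upper-bound runtime, in hours, before a prepaid balance reaches $0.

    Assumes a constant hourly rate with no other charges.
    """
    if hourly_rate_usd <= 0:
        raise ValueError("hourly_rate_usd must be positive")
    return max(balance_usd, 0.0) / hourly_rate_usd


deposit = 100.00  # hypothetical deposit
for gpus in (1, 4, 8):
    rate = gpus * 2.49  # on-demand starting price per GPU
    print(f"{gpus}x H100 at ${rate:.2f}/hr: "
          f"up to {hours_remaining(deposit, rate):.1f} hours")
```

At the $2.49/hr starting price, $100 covers about 40 hours on one H100 or about 5 hours on eight.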

Get started

Deploy an H100 GPU server from $2.49/hour.

Go ahead: build the future on TensorDock. Cloud-based machine learning and rendering have never been easier or cheaper.

Deploy an H100 GPU Server

World-class enterprise support

Delivered by dedicated professionals

Deploy your first TensorDock server.

And you'll never look back.
