GPUs compared:

        V100 SXM2, V100 PCIe, V100S PCIe

        A100 40GB PCIe, A100 80GB PCIe, A100 40GB SXM, A100 80GB SXM

        H100 SXM5, H100 PCIe

Along for comparison: 4090 (GeForce RTX 4090)

I. Hardware specifications

| Spec | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPU die | GV100 | GV100 | GV100 | GA100 | GA100 | GA100 | GA100 | GH100 | GH100 | AD102-300 |
| Architecture | Volta | Volta | Volta | Ampere | Ampere | Ampere | Ampere | Hopper | Hopper | Ada Lovelace |
| SMs | 80 | 80 | 80 | 108 | 108 | 108 | 108 | 132 | 114 | 128 |
| CUDA Cores / SM | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 128 | 128 |
| CUDA Cores / GPU | 5120 | 5120 | 5120 | 6912 | 6912 | 6912 | 6912 | 16896 | 14592 | 16384 |
| FP32 Cores / SM | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 128 | 128* |
| FP32 Cores / GPU | 5120 | 5120 | 5120 | 6912 | 6912 | 6912 | 6912 | 16896 | 14592 | 16384 |
| FP64 Cores / SM | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 64 | 64 | 2 |
| FP64 Cores / GPU | 2560 | 2560 | 2560 | 3456 | 3456 | 3456 | 3456 | 8448 | 7296 | 256 |
| INT32 Cores / SM | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64 | 64* |
| INT32 Cores / GPU | 5120 | 5120 | 5120 | 6912 | 6912 | 6912 | 6912 | 8448 | 7296 | 8192 |
| Tensor Core generation | 1st | 1st | 1st | 3rd | 3rd | 3rd | 3rd | 4th | 4th | 4th |
| Tensor Cores / SM | 8 | 8 | 8 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Tensor Cores / GPU | 640 | 640 | 640 | 432 | 432 | 432 | 432 | 528 | 456 | 512 |
| GPU boost clock (MHz) | 1530 | 1380 | 1597 | 1410 | 1410 | 1410 | 1410 | 1830 / 1980** | 1620 / 1755** | 2520 |
| Memory | 16 / 32 GB HBM2 | 16 / 32 GB HBM2 | 32 GB HBM2 | 40 GB HBM2 | 80 GB HBM2e | 40 GB HBM2 | 80 GB HBM2e | 80 GB HBM3 | 80 GB HBM2e | 24 GB GDDR6X |
| Memory bus width (bits) | 4096 | 4096 | 4096 | 5120 | 5120 | 5120 | 5120 | 5120 | 5120 | 384 |
| Memory bandwidth (GB/s) | 897 | 897 | 1133 | 1555 | 1935 | 1555 | 2039 | 3352 | 2039 | 1008 |
| L1 cache (KB per SM) | 128 | 128 | 128 | 192 | 192 | 192 | 192 | 256 | 256 | 128 |
| L2 cache (MB) | 6 | 6 | 6 | 40 | 40 | 40 | 40 | 50 | 50 | 72 |
| Interface | SXM2 | PCIe 3.0 x16 | PCIe 3.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x16 | SXM4 | SXM4 | SXM5 | PCIe 5.0 x16 | PCIe 4.0 x16 |
| TDP (W) | 300 | 250 | 250 | 250 | 300 | 400 | 400 | 700 | 350 | 450 |
| Process | TSMC 12nm FFN | TSMC 12nm FFN | TSMC 12nm FFN | TSMC N7 (7nm) | TSMC N7 (7nm) | TSMC N7 (7nm) | TSMC N7 (7nm) | TSMC 4N (5nm) | TSMC 4N (5nm) | TSMC 4N (5nm) |

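* Each SM of the 4090's AD102-300 die has 128 CUDA cores. 64 of them can execute either FP32 or INT32, and the other 64 execute FP32 only. This is why the 4090 has 128 FP32 cores but only 64 INT32 cores per SM.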

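** The first value is the boost clock when the Tensor Cores compute FP8, FP16, BF16 or TF32. The second value is the boost clock when the Tensor Cores compute FP64 and when the CUDA Cores compute FP32 or FP64.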
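The memory-bandwidth row follows from the bus-width row. Peak bandwidth is the bus width times the per-pin data rate, divided by 8 bits per byte.

The minimal Python sketch below back-solves each card's implied per-pin rate from the two table rows. It then does a forward check for the 4090. The 21 Gbit/s GDDR6X per-pin rate used there is an assumption, not a figure from the table.

```python
# Sketch: peak memory bandwidth = bus width (bits) x per-pin data rate (Gbit/s) / 8.
# Bus widths and bandwidths are copied from the hardware table; the 4090's
# 21 Gbit/s per-pin rate in the forward check is an assumption.

def bandwidth_gb_s(bus_width_bits: int, pin_rate_gbit_s: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * pin_rate_gbit_s / 8

def implied_pin_rate(bus_width_bits: int, bandwidth: float) -> float:
    """Per-pin data rate (Gbit/s) implied by a bus width and a bandwidth in GB/s."""
    return bandwidth * 8 / bus_width_bits

# name: (bus width in bits, bandwidth in GB/s), from the table above
TABLE = {
    "V100 SXM2": (4096, 897),
    "A100 80GB SXM": (5120, 2039),
    "H100 SXM5": (5120, 3352),
    "4090": (384, 1008),
}

if __name__ == "__main__":
    for name, (bus, bw) in TABLE.items():
        print(f"{name:>14}: {implied_pin_rate(bus, bw):.2f} Gbit/s per pin")
    # Forward check: 384-bit bus at an assumed 21 Gbit/s per pin.
    print(bandwidth_gb_s(384, 21.0))  # 1008.0
```

The comparison shows why the HBM cards lead on bandwidth. Their per-pin rates are low, but their buses are more than ten times wider than the 4090's.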

II. Compute throughput

1. CUDA Core throughput

Floating point: TFLOPS

Integer: TIOPS

Percentages take the A100 80GB PCIe as 100%

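| Precision | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP32 | 15.67 | 14.13 | 16.35 | 19.5 | 19.5 | 19.5 | 19.5 | 66.9 | 51.2 | 82.6 |
| FP16 | 31.33 | 28.26 | 32.71 | 78 | 78 | 78 | 78 | 133.8 | 102.4 | 82.6 |
| FP64 | 7.834 | 7.066 | 8.177 | 9.7 | 9.7 | 9.7 | 9.7 | 33.5 | 25.6 | 1.29 |
| BF16 | NA | NA | NA | 39 | 39 | 39 | 39 | 133.8 | 102.4 | 82.6 |
| INT32 | 15.67 | 14.13 | 16.35 | 19.5 | 19.5 | 19.5 | 19.5 | 33.5 | 25.6 | 41.3 |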
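Most of these figures follow directly from the hardware table. Peak throughput is the number of units, times 2 operations per fused multiply-add, times the boost clock.

The Python sketch below reproduces several cells under two assumptions:

- every FP32/FP64/INT32 unit completes one 2-operation multiply-add per clock;
- the H100 uses the second (CUDA-core) boost clock from footnote **.

```python
# Sketch: peak CUDA-core throughput = units x 2 ops per FMA x boost clock.
# Assumption: each unit completes one multiply-add (counted as 2 operations)
# per clock at the boost clock from the hardware table.

def peak_tera(units: int, clock_mhz: float, ops_per_clock: int = 2) -> float:
    """Peak throughput in TFLOPS (or TIOPS for integer units)."""
    return units * ops_per_clock * clock_mhz * 1e6 / 1e12

# (GPU, precision): (units per GPU, boost clock in MHz). For H100 the CUDA-core
# clock is the second value of the boost-clock row (footnote **).
CASES = {
    ("V100 SXM2", "FP32"): (5120, 1530),
    ("V100 SXM2", "FP64"): (2560, 1530),
    ("A100 80GB PCIe", "FP32"): (6912, 1410),
    ("H100 SXM5", "FP32"): (16896, 1980),
    ("H100 PCIe", "FP32"): (14592, 1755),
    ("4090", "FP32"): (16384, 2520),
    ("4090", "FP64"): (256, 2520),
    ("4090", "INT32"): (8192, 2520),
}

if __name__ == "__main__":
    for (gpu, precision), (units, mhz) in CASES.items():
        print(f"{gpu:>15} {precision:>5}: {peak_tera(units, mhz):7.2f}")
```

The 4090's INT32 figure (41.3) is half its FP32 figure because only 64 of its 128 units per SM handle INT32 (footnote *). Its FP64 figure is tiny because it has just 2 FP64 units per SM.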

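| Precision | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP32 | 80.4% | 72.5% | 83.8% | 100% | 100% | 100% | 100% | 343% | 263% | 424% |
| FP16 | 40.2% | 36.2% | 41.9% | 100% | 100% | 100% | 100% | 172% | 131% | 106% |
| FP64 | 80.4% | 72.5% | 83.8% | 100% | 100% | 100% | 100% | 343% | 263% | 13.3% |
| BF16 | NA | NA | NA | 100% | 100% | 100% | 100% | 343% | 263% | 212% |
| INT32 | 80.4% | 72.5% | 83.8% | 100% | 100% | 100% | 100% | 172% | 131% | 212% |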
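Each percentage is the absolute figure divided by the A100 80GB PCIe figure in the same row. The table rounds to 3 significant figures.

The Python sketch below copies a few rows from the absolute table and uses None for "NA". The row selection and helper names are illustrative only.

```python
# Sketch: relative throughput against the A100 80GB PCIe baseline.
# Values are copied from the CUDA Core table; None stands for "NA".
from typing import Dict, Optional

BASELINE = "A100 80GB PCIe"

FP32: Dict[str, Optional[float]] = {
    "V100 SXM2": 15.67, "V100 PCIe": 14.13, "A100 80GB PCIe": 19.5,
    "H100 SXM5": 66.9, "4090": 82.6,
}
FP16: Dict[str, Optional[float]] = {
    "V100 SXM2": 31.33, "V100 PCIe": 28.26, "A100 80GB PCIe": 78,
    "H100 SXM5": 133.8, "4090": 82.6,
}
BF16: Dict[str, Optional[float]] = {
    "V100 SXM2": None, "V100 PCIe": None, "A100 80GB PCIe": 39,
    "H100 SXM5": 133.8, "4090": 82.6,
}

def relative(row: Dict[str, Optional[float]], baseline: str = BASELINE) -> Dict[str, str]:
    """Each value as a percentage of the baseline, to 3 significant figures."""
    base = row[baseline]
    out: Dict[str, str] = {}
    for name, value in row.items():
        if value is None or base is None:
            out[name] = "NA"
        else:
            # '.3g' keeps 3 significant figures (80.4, 343, 13.3) like the table;
            # it would switch to exponent notation at 1000% or above.
            out[name] = f"{value / base * 100:.3g}%"
    return out

if __name__ == "__main__":
    for label, row in (("FP32", FP32), ("FP16", FP16), ("BF16", BF16)):
        print(label, relative(row))
```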

2. Tensor Core throughput

Floating point: TFLOPS

Integer: TIOPS

Where two values are given: dense / sparse

Percentages take the A100 80GB PCIe as 100%

| Precision | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP8 | NA | NA | NA | NA | NA | NA | NA | 1978.9 / 3957.8 | 1513 / 3026 | 660.6 / 1321.2 |
| FP16 | 125 | 112 | 130 | 312 / 624 | 312 / 624 | 312 / 624 | 312 / 624 | 989.4 / 1978.9 | 756 / 1513 | 330.3 / 660.6 |
| BF16 | NA | NA | NA | 312 / 624 | 312 / 624 | 312 / 624 | 312 / 624 | 989.4 / 1978.9 | 756 / 1513 | 165.2 / 330.4 |
| TF32 | NA | NA | NA | 156 / 312 | 156 / 312 | 156 / 312 | 156 / 312 | 494.7 / 989.4 | 378 / 756 | 82.6 / 165.2 |
| FP64 | NA | NA | NA | 19.5 | 19.5 | 19.5 | 19.5 | 66.9 | 51.2 | NA |
| INT8 | NA | NA | NA | 624 / 1248 | 624 / 1248 | 624 / 1248 | 624 / 1248 | 1978.9 / 3957.8 | 1513 / 3026 | 660.6 / 1321.2 |
| INT4 | NA | NA | NA | 1248 / 2496 | 1248 / 2496 | 1248 / 2496 | 1248 / 2496 | 3957.8 / 7915.6 | 3026 / 6052 | 1321.2 / 2642.4 |
| Binary | NA | NA | NA | 4992 | 4992 | 4992 | 4992 | NA | NA | NA |
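The dense Tensor Core figures can be reproduced in the same way as the CUDA Core ones: Tensor Cores × FMAs per Tensor Core per clock × 2 operations per FMA × boost clock. The sparse figures are exactly twice the dense ones. This assumes they refer to NVIDIA's 2:4 structured sparsity, which doubles peak math throughput.

In the Python sketch below, the per-Tensor-Core FP16 FMA rates are assumptions, chosen so that the formula reproduces the dense FP16 row above. For the 4090, that means the FP16-accumulate figure. The H100 uses the first (Tensor Core) boost clock from footnote **.

```python
# Sketch: dense Tensor Core peak = Tensor Cores x FMA per Tensor Core per clock
#         x 2 ops per FMA x boost clock; sparse peak = 2 x dense.
# Assumption: the per-Tensor-Core FP16 FMA rates below are back-solved so the
# formula matches the dense FP16 row of the table.

def dense_tflops(tensor_cores: int, fma_per_tc_per_clock: int, clock_mhz: float) -> float:
    """Dense peak in TFLOPS."""
    return tensor_cores * fma_per_tc_per_clock * 2 * clock_mhz * 1e6 / 1e12

def sparse_tflops(dense: float) -> float:
    """Peak with 2:4 structured sparsity: two of every four weights are zero
    and are skipped, so the effective peak doubles."""
    return 2 * dense

# name: (Tensor Cores per GPU, assumed FP16 FMA per TC per clock,
#        boost clock in MHz, sparse figure listed in the table)
GPUS = {
    "V100 SXM2": (640, 64, 1530, False),
    "A100 80GB PCIe": (432, 256, 1410, True),
    "H100 SXM5": (528, 512, 1830, True),
    "H100 PCIe": (456, 512, 1620, True),
    "4090": (512, 128, 2520, True),
}

if __name__ == "__main__":
    for name, (tcs, fma, mhz, has_sparse) in GPUS.items():
        dense = dense_tflops(tcs, fma, mhz)
        sparse = f"{sparse_tflops(dense):.1f}" if has_sparse else "NA"
        print(f"{name:>15}: dense {dense:.1f} / sparse {sparse}")
```

On these assumed rates, A100 has fewer Tensor Cores than V100 (432 vs 640) yet about 2.5x the dense FP16 peak. Most of the gain comes from each Tensor Core doing more work per clock.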
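
| Precision | V100 SXM2 | V100 PCIe | V100S PCIe | A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM | H100 SXM5 | H100 PCIe | 4090 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP8 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| FP16 | 40.1% | 35.9% | 41.7% | 100% | 100% | 100% | 100% | 317% | 242% | 106% |
| BF16 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 52.9% |
| TF32 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 52.9% |
| FP64 | NA | NA | NA | 100% | 100% | 100% | 100% | 343% | 263% | NA |
| INT8 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 106% |
| INT4 | NA | NA | NA | 100% | 100% | 100% | 100% | 317% | 242% | 106% |
| Binary | NA | NA | NA | 100% | 100% | 100% | 100% | NA | NA | NA |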
