NVIDIA’s Tesla V100 GPU at a glance
NVIDIA’s (NVDA) Pascal-based Tesla P100 GPU (graphics processing unit) powers most of the world’s DLT (deep learning training). Now the company is also targeting inferencing, the stage in which a trained model is applied to new, real-world data.
NVIDIA has launched its next-generation GPU architecture, Volta, on the Tesla platform, which is used in data center applications. The Tesla V100 GPU is built on TSMC’s (TSM) 12 nm (nanometer) process and features 5,120 CUDA cores, HBM2 (high bandwidth memory), NVLink 2.0, and Tensor Cores for deep learning. The first Volta-based system, the DGX-1 with eight V100s, is priced at roughly $150,000.
V100’s inferencing capability
Until now, almost all inferencing has run on CPUs (central processing units). NVIDIA is now offering TensorRT 3, an inference optimizer and runtime that it says makes inferencing on V100 100 times faster than on CPUs. The optimizer supports two of the industry’s most widely used AI (artificial intelligence) frameworks: Google’s (GOOG) TensorFlow and Caffe.
However, GPUs are widely seen as a less-than-ideal fit for inferencing. Addressing this concern, NVIDIA’s CFO (chief financial officer) Colette Kress said that GPUs may not suit every type of inferencing, but that NVIDIA believes its GPUs are ideal for the more complex types.
Pascal versus Volta
Wccftech.com compared Volta’s performance with that of its predecessor, Pascal, and found a significant generational leap. Key details:
- Wccftech noted that V100 delivers 120 TFLOPS (trillion floating-point operations per second) of Tensor Core deep learning performance, compared with roughly 10 TFLOPS of standard (FP32) performance on P100. Volta provides 12 times the DLT throughput and six times the inferencing throughput of P100.
- NVIDIA has increased memory bandwidth from 720 GB/s (gigabytes per second) to 900 GB/s.
- NVIDIA has increased total L1 cache by almost eight times, from 1.3 MB (megabytes) to 10 MB.
- NVLink 2.0 nearly doubles GPU-to-GPU interconnect bandwidth, from 160 GB/s to 300 GB/s.
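The generational gains in the list above are simple ratios of the quoted P100 and V100 figures. The Python sketch below recomputes them from those numbers only; the dictionary layout and the `generational_ratios` name are illustrative. Note that the throughput row compares V100’s Tensor Core figure with P100’s standard FP32 figure, so the 12x reflects different precisions rather than a like-for-like measurement.

```python
# Headline P100 vs. V100 figures as quoted in the comparison list.
# Each row pairs values in the same unit, so each ratio compares like with like
# on units (the throughput row still mixes Tensor Core and FP32 precision).
SPECS = {
    # metric: (P100 value, V100 value)
    "peak throughput (TFLOPS)": (10.0, 120.0),
    "memory bandwidth (GB/s)": (720.0, 900.0),
    "L1 cache (MB)": (1.3, 10.0),
    "NVLink bandwidth (GB/s)": (160.0, 300.0),
}


def generational_ratios(specs: dict) -> dict:
    """Return the V100/P100 ratio for each metric."""
    return {name: v100 / p100 for name, (p100, v100) in specs.items()}


if __name__ == "__main__":
    for name, ratio in generational_ratios(SPECS).items():
        print(f"{name}: {ratio:.2f}x")
```

Running it prints ratios of about 12x, 1.25x, 7.7x, and 1.9x, in line with the "12 times," "almost eight times," and "nearly doubles" descriptions.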
Wccftech.com also reported that in Geekbench 4 compute tests, Tesla V100 scored 132% higher than Tesla P100.
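An "X% faster" figure converts to a multiplier as 1 + X/100, so 132% faster means V100’s result was about 2.32 times P100’s. A minimal Python sketch of both directions of the conversion follows; the function names are hypothetical, and the sample scores are made up purely to illustrate the formula.

```python
def percent_faster(new_score: float, old_score: float) -> float:
    """Percentage by which new_score exceeds old_score."""
    return (new_score / old_score - 1.0) * 100.0


def multiplier_from_percent(percent: float) -> float:
    """Turn an 'X% faster' figure into a speed multiplier."""
    return 1.0 + percent / 100.0


if __name__ == "__main__":
    # The reported 132% gap corresponds to roughly a 2.32x result.
    print(f"{multiplier_from_percent(132):.2f}x")
    # Hypothetical scores, chosen only to illustrate the formula.
    print(f"{percent_faster(232.0, 100.0):.1f}% faster")
```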
Volta-based DGX-1 system
NVIDIA has also launched a Volta-based version of its DGX-1 supercomputer, featuring eight Tesla V100 GPUs. Together they provide 40,960 CUDA cores, 5,120 Tensor Cores, and 128 GB of HBM2 memory. The system is powered by two Intel (INTC) Xeon E5-2698 v4 processors, each with 20 cores and 40 threads at a 2.2 GHz (gigahertz) base clock.
Compared with the Pascal-based DGX-1, the Tensor Cores in the Volta version boost peak FP16 (half-precision) performance from 170 TFLOPS to 960 TFLOPS.
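The DGX-1 totals follow from multiplying per-GPU figures by eight. The Python sketch below makes that arithmetic explicit. The per-GPU values (640 Tensor Cores and 16 GB of HBM2 per V100, and about 21.2 TFLOPS of peak FP16 per P100) are taken from NVIDIA’s published specifications rather than from this article, the `system_totals` helper is hypothetical, and the results are theoretical peaks, not benchmark measurements.

```python
GPUS_PER_DGX1 = 8

# Assumed per-GPU figures from NVIDIA's published specs (not from this article):
# V100 at launch: 5,120 CUDA cores, 640 Tensor Cores, 16 GB HBM2,
# 120 TFLOPS peak Tensor Core throughput. P100 (SXM2): ~21.2 TFLOPS peak FP16.
V100 = {"cuda_cores": 5120, "tensor_cores": 640, "hbm2_gb": 16, "peak_fp16_tflops": 120.0}
P100_PEAK_FP16_TFLOPS = 21.2


def system_totals(per_gpu: dict, gpus: int = GPUS_PER_DGX1) -> dict:
    """Scale per-GPU figures to a multi-GPU system (peak, not measured)."""
    return {name: value * gpus for name, value in per_gpu.items()}


if __name__ == "__main__":
    volta = system_totals(V100)
    pascal_fp16 = P100_PEAK_FP16_TFLOPS * GPUS_PER_DGX1
    for name, value in volta.items():
        print(f"Volta DGX-1 {name}: {value:,}")
    # 8 x 21.2 = 169.6, which rounds to the quoted 170 TFLOPS.
    print(f"Pascal DGX-1 peak_fp16_tflops: {pascal_fp16:.0f}")
```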