Is AMD’s Vega Faster than Other Accelerators in Machine Learning?



AMD tests Vega’s performance in machine learning applications

In the previous part of the series, we saw that Advanced Micro Devices’ (AMD) Radeon Vega Frontier Edition GPU (graphics processing unit) for professionals cannot compete with NVIDIA’s (NVDA) Pro GPUs on raw performance. AMD therefore aims to beat NVIDIA on the price-to-performance ratio. Apart from gaming, the Vega Frontier is also designed to handle deep learning workloads.

AMD tested its Vega GPU’s machine learning performance on Baidu’s (BIDU) DeepBench neural network training benchmark. It measured the time an accelerator-powered server took to train a particular machine learning model and found that:

  • AMD’s Vega Frontier was able to do the task in 88 milliseconds.
  • Intel’s (INTC) Knights Landing Xeon Phi 7250 was able to do the task in 569 milliseconds.
  • NVIDIA’s Pascal-based Tesla P100 was able to do the task in 122 milliseconds.
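To put the three results on a common scale, the short Python sketch below divides each time by Vega’s. The times are the ones AMD reported above; the function name `relative_time` is only illustrative and is not part of DeepBench.

```python
# Times in milliseconds, as reported in AMD's DeepBench test above.
times_ms = {
    "AMD Vega Frontier": 88,
    "NVIDIA Tesla P100": 122,
    "Intel Xeon Phi 7250": 569,
}


def relative_time(baseline, times):
    """Express every accelerator's time as a multiple of the baseline's time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


ratios = relative_time("AMD Vega Frontier", times_ms)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the Vega time")
```

On these figures, the Tesla P100 took roughly 1.4 times as long as Vega, and the Xeon Phi 7250 took about 6.5 times as long.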

NVIDIA’s upcoming Volta GPU might outperform AMD’s Vega in this test. Just comparing the theoretical performance of Vega and Volta shows a significant performance difference.


AMD’s Vega versus NVIDIA’s Volta

NVIDIA plans to release its Volta GPU for the data center market in 3Q17. Volta would deliver 15 TFLOPS (tera floating point operations per second) of FP32 single-precision performance, which is higher than Vega’s 13.1 TFLOPS.

Moreover, Volta would feature dedicated 16-bit Tensor Cores that would deliver 120 TFLOPS of performance on machine learning training algorithms. AMD’s Vega doesn’t feature these tensor cores.

Both GPUs would feature 16 GB (gigabytes) of HBM2 (High Bandwidth Memory), but NVIDIA’s Volta would deliver 900 GB/s (gigabytes per second) of memory bandwidth, well above Vega’s memory bandwidth of 483 GB/s.

NVIDIA’s Volta would also feature the NVLink 2 interconnect, with six links that each carry 25 GB/s in each direction. AMD’s Vega wouldn’t feature any such special connector.
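As a rough illustration of the gap on paper, the sketch below computes Volta-to-Vega ratios for the two specifications the article quotes for both chips. The dictionary keys are our own labels, and the ratios say nothing about real-world performance.

```python
# Theoretical specs as quoted in this article: (Volta, Vega).
# Each pair uses the same unit, so the ratio is unit-independent.
specs = {
    "FP32 TFLOPS": (15.0, 13.1),
    "Memory bandwidth": (900.0, 483.0),
}

# Ratio of Volta's figure to Vega's for each spec.
volta_advantage = {name: volta / vega for name, (volta, vega) in specs.items()}

for name, ratio in volta_advantage.items():
    print(f"{name}: Volta is {ratio:.2f}x Vega")
```

On paper, Volta’s lead is modest in single-precision throughput (about 1.15 times) but much wider in memory bandwidth (about 1.86 times).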

Price to performance

However, one area where AMD’s Vega could outperform Volta is price. NVIDIA’s Tesla V100 data center GPU would be priced at around $150,000, while AMD’s Vega would be priced between $1,000 and $1,500. At such low price points, a GPU that lacks tensor cores and an NVLink interconnect but delivers average single-precision performance isn’t a bad bargain in terms of price to performance.
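The price-to-performance argument can be made concrete with a back-of-the-envelope calculation. The sketch below uses only the prices and FP32 figures quoted in this article; `dollars_per_tflops` is a hypothetical helper, and the metric ignores tensor cores, memory bandwidth, and interconnects.

```python
def dollars_per_tflops(price_usd, fp32_tflops):
    """Price paid per TFLOPS of theoretical FP32 throughput."""
    return price_usd / fp32_tflops


# Figures as quoted in this article.
volta_cost = dollars_per_tflops(150_000, 15.0)
vega_cost_low = dollars_per_tflops(1_000, 13.1)
vega_cost_high = dollars_per_tflops(1_500, 13.1)

print(f"Volta: ${volta_cost:,.0f} per TFLOPS")
print(f"Vega:  ${vega_cost_low:,.0f} to ${vega_cost_high:,.0f} per TFLOPS")
```

By these figures, Volta works out to about $10,000 per FP32 TFLOPS, against roughly $76 to $115 for Vega.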

Using this price-to-performance strategy, AMD regained some share from NVIDIA in the discrete GPU market. We’ll look into this in the next part.

