Tuesday, January 25, 2022

The Mysterious GPU-N from NVIDIA might be its next-gen Hopper GH100 GPU in disguise

A new research paper published by NVIDIA describes a mysterious GPU known as GPU-N, which could be our first look at the next-gen Hopper GH100 chip (as spotted by Twitter user Redfire).

The research paper, ‘GPU Domain Specialization via Composable On-Package Architecture’, presents a next-generation GPU design as the most practical way to boost Deep Learning performance by maximizing low-precision matrix math throughput. It covers the ‘GPU-N’ and its COPA designs, along with their possible specifications and simulated performance results.

The ‘GPU-N’ is believed to include 134 SM units (vs. 108 SM units on the A100). That brings the total number of cores to 8,576, a 24% increase over the current Ampere A100. The chip’s theoretical clock speed is 1.4 GHz, the same as the Ampere A100 and Volta V100 (this should not be taken as the final clock speed).
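The core count above is a straightforward multiplication. Here is a minimal sketch, assuming 64 FP32 cores per SM (the figure the quoted total implies, not one confirmed for GPU-N) and the A100's 108 active SMs; the function names are illustrative only.

```python
# Assumption: 64 FP32 CUDA cores per SM, as on the A100. Not confirmed for GPU-N.
CORES_PER_SM = 64


def fp32_cores(sm_count: int, cores_per_sm: int = CORES_PER_SM) -> int:
    """Total FP32 cores for a GPU with the given number of active SMs."""
    return sm_count * cores_per_sm


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


gpu_n = fp32_cores(134)  # SM count reported for GPU-N
a100 = fp32_cores(108)   # active SMs on the A100
print(gpu_n, a100, round(percent_increase(gpu_n, a100)))  # 8576 6912 24
```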

Other features include a 60 MB L2 cache, a 50% increase over the Ampere A100, and 2.68 TB/s of DRAM bandwidth that can scale up to 6.3 TB/s. The HBM2e DRAM has a capacity of 100 GB, which can be increased to 233 GB using COPA implementations. It’s built around a 6144-bit bus interface running at a 3.5 Gbps per-pin data rate.
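The quoted bandwidth can be cross-checked from the bus width and the pin speed. This sketch uses the standard peak-bandwidth estimate (bus width times per-pin data rate, divided by 8 bits per byte); the function name is hypothetical.

```python
def dram_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak DRAM bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8."""
    return bus_width_bits * pin_rate_gbps / 8


bandwidth = dram_bandwidth_gb_s(6144, 3.5)
print(bandwidth)  # 2688.0 GB/s, i.e. roughly the 2.68 TB/s quoted above
```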

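| Configuration | NVIDIA V100 | NVIDIA A100 | GPU-N |
| --- | --- | --- | --- |
| GPU frequency (GHz) | 1.4 | 1.4 | 1.4 |
| FP32 (TFLOPS) | 15.7 | 19.5 | 24.2 |
| FP16 (TFLOPS) | 125 | 312 | 779 |
| L2 cache (MB) | 6 | 40 | 60 |
| DRAM BW (GB/s) | 900 | 1,555 | 2,687 |
| DRAM Capacity (GB) | 16 | 40 | 100 |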

In terms of performance, the ‘GPU-N’ (probably Hopper GH100) delivers 24.2 TFLOPs of FP32 (a 24 percent increase over A100) and 779 TFLOPs of FP16 (2.5x the A100), which is close to the 3x gains rumored for GH100 over A100. Compared with AMD’s CDNA 2 ‘Aldebaran’ GPU in the Instinct MI250X accelerator, GPU-N’s FP32 performance is less than half (24.2 TFLOPs versus 95.7 TFLOPs), but its FP16 performance is 2.15x higher.
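The ratios in this comparison can be reproduced directly from the quoted peak figures. This is a minimal sketch using only numbers that appear in the article; the dictionary layout and helper name are illustrative.

```python
# Peak throughput in TFLOPS, as quoted in the article.
A100 = {"fp32": 19.5, "fp16": 312.0}
GPU_N = {"fp32": 24.2, "fp16": 779.0}
MI250X_FP32 = 95.7


def speedup(new: float, old: float) -> float:
    """How many times faster `new` is than `old`."""
    return new / old


fp32_vs_a100 = speedup(GPU_N["fp32"], A100["fp32"])
fp16_vs_a100 = speedup(GPU_N["fp16"], A100["fp16"])
fp32_vs_mi250x = speedup(GPU_N["fp32"], MI250X_FP32)

print(f"{fp32_vs_a100:.2f}x")    # 1.24x
print(f"{fp16_vs_a100:.2f}x")    # 2.50x
print(f"{fp32_vs_mi250x:.2f}x")  # 0.25x
```

On these figures GPU-N's FP32 rate is about a quarter of the MI250X's, which fits the "less than half" description.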

According to previous information, NVIDIA’s H100 accelerator will be based on an MCM solution and use TSMC’s 5nm process node. Hopper is expected to have two next-generation GPU modules with 288 SM units in total. We can’t give an exact core count because we don’t know how many cores are in each SM, but if NVIDIA sticks to 64 cores per SM, that works out to 18,432 cores, 2.25x the full GA100 GPU configuration. NVIDIA could also add more FP64, FP16, and Tensor cores to its Hopper GPU to greatly improve performance. That will be required to compete with Intel’s Ponte Vecchio, which is expected to offer 1:1 FP64.
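The rumored Hopper core count can be worked through the same way. This sketch assumes 64 FP32 cores per SM (the article's own "if") and uses the full GA100 die's 128 SMs as the baseline; the function name is hypothetical.

```python
# Assumption: 64 FP32 cores per SM carries over from Ampere to Hopper.
CORES_PER_SM = 64


def total_cores(modules: int, sms_per_module: int,
                cores_per_sm: int = CORES_PER_SM) -> int:
    """FP32 core count for an MCM GPU built from identical modules."""
    return modules * sms_per_module * cores_per_sm


hopper = total_cores(modules=2, sms_per_module=144)      # 288 SMs in total
ga100_full = total_cores(modules=1, sms_per_module=128)  # full GA100 die
print(hopper, hopper / ga100_full)  # 18432 2.25
```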

The final configuration is likely to have 134 of the 144 SM units enabled on each GPU module, which implies that GPU-N is a single GH100 chip in action. However, without GPU sparsity, NVIDIA is unlikely to match the FP32 or FP64 FLOPs of MI200.

NVIDIA, on the other hand, may have a hidden weapon in the form of Hopper’s COPA-based GPU implementation. The paper describes two Domain-Specialized COPA-GPUs based on the next-generation architecture, one for HPC and the other for DL. The HPC variant takes a fairly standard approach, with an MCM GPU and HBM/MC+HBM (IO) chiplets, but the DL variant is where things get interesting. It includes a large cache on a separate die that is coupled to the GPU modules.

| Architecture | LLC Capacity | DRAM BW | DRAM Capacity |
| --- | --- | --- | --- |
| Perfect L2 | infinite | infinite | infinite |

There are several variants with up to 960 / 1920 MB of LLC (Last-Level-Cache), up to 233 GB of HBM2e DRAM capacity, and up to 6.3 TB/s of bandwidth. These are all hypothetical, but given that NVIDIA has already discussed them, we could see a Hopper variant with such a design at GTC 2022.

NVIDIA Hopper GH100 ‘Preliminary Specs’:

| NVIDIA Tesla Graphics Card | Tesla K40 | Tesla M40 | Tesla P100 | Tesla P100 (SXM2) | Tesla V100 (SXM2) | NVIDIA A100 (SXM4) | NVIDIA H100 (SXM4?) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GA100 (Ampere) | GH100 (Hopper) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 7nm | 5nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 54.2 Billion | TBD |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 815 mm² | 826 mm² | TBD |
| SMs | 15 | 24 | 56 | 56 | 80 | 108 | 134 (Per Module) |
| FP32 CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64? |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32? |
| FP32 CUDA Cores | 2880 | 3072 | 3584 | 3584 | 5120 | 6912 | 8576 (Per Module) / 17152 (Complete) |
| FP64 CUDA Cores | 960 | 96 | 1792 | 1792 | 2560 | 3456 | 4288 (Per Module)? / 8576 (Complete)? |
| Tensor Cores | N/A | N/A | N/A | N/A | 640 | 432 | TBD |
| Texture Units | 240 | 192 | 224 | 224 | 320 | 432 | TBD |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1530 MHz | 1410 MHz | ~1400 MHz |
| TOPs | | | | | | 2496 TOPs with Sparsity | |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 30.4 TFLOPs | 312 TFLOPs / 624 TFLOPs with Sparsity | 779 TFLOPs (Per Module)? / 1558 TFLOPs with Sparsity (Per Module)? |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 15.7 TFLOPs | 19.5 TFLOPs / 156 TFLOPs With Sparsity | 24.2 TFLOPs (Per Module)? / 193.6 TFLOPs With Sparsity? |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.80 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 24.2 TFLOPs (Per Module)? (12.1 TFLOPs standard)? |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 6144-bit HBM2e | 6144-bit HBM2e |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s / 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | Up To 40 GB HBM2 @ 1.6 TB/s / Up To 80 GB HBM2 @ 1.6 TB/s | Up To 100 GB HBM2e @ 3.5 Gbps |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 40960 KB | 81920 KB |

Nivedita Bangari