The Mysterious GPU-N from NVIDIA might be its next-gen Hopper GH100 GPU in disguise

In a new research paper published by the green team, a mysterious NVIDIA GPU known as GPU-N has been revealed, which could be the first look at the next-gen Hopper GH100 chip (as discovered by Twitter user Redfire).

The research paper ‘GPU Domain Specialization via Composable On-Package Architecture’ presents a next-generation GPU design as the most practical solution for boosting Deep Learning performance by maximizing low-precision matrix math throughput. The paper discusses the ‘GPU-N’ and its COPA derivatives, along with their possible specifications and simulated performance results.

The ‘GPU-N’ is believed to include 134 SM units (versus the A100’s 108 SMs). This brings the total number of cores to 8,576, a 24% increase over the current Ampere A100 solution. The chip’s theoretical clock speed is 1.4 GHz, the same figure the paper uses for the Ampere A100 and Volta V100 (not to be taken as final clocks).

Other features include a 60 MB L2 cache, a 50% increase over the Ampere A100, and 2.68 TB/s of DRAM bandwidth that can scale up to 6.3 TB/s. The HBM2e DRAM has a capacity of 100 GB, which can be extended to 233 GB with COPA implementations. It is built around a 6144-bit memory bus running at 3.5 Gbps per pin.
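The quoted bandwidth follows from the standard bus-width times data-rate formula. A quick sketch (the 6144-bit and 3.5 Gbps figures come from the paper; the helper name is ours):

```python
# Peak DRAM bandwidth = (bus width in bits / 8 bits per byte) * per-pin data rate.
def dram_bandwidth_gbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * pin_rate_gbps

# 6144-bit bus at 3.5 Gbps per pin, as quoted above.
print(dram_bandwidth_gbps(6144, 3.5))  # 2688.0 GB/s, i.e. the ~2.68 TB/s figure
```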

| Configuration | NVIDIA V100 | NVIDIA A100 | GPU-N |
| --- | --- | --- | --- |
| SMs | 80 | 108 | 134 |
| GPU frequency (GHz) | 1.4 | 1.4 | 1.4 |
| FP32 (TFLOPS) | 15.7 | 19.5 | 24.2 |
| FP16 (TFLOPS) | 125 | 312 | 779 |
| L2 cache (MB) | 6 | 40 | 60 |
| DRAM BW (GB/s) | 900 | 1,555 | 2,687 |
| DRAM Capacity (GB) | 16 | 40 | 100 |
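
As a sanity check, the core counts and FP32 figures follow from simple arithmetic, assuming Ampere-style 64 FP32 cores per SM and 2 FMA ops per cycle (with the rounded 1.4 GHz clock the result lands just under the quoted 24.2 TFLOPs, which implies a boost clock closer to 1.41 GHz):

```python
# GPU-N peak FP32 = SMs * cores per SM * 2 ops per cycle (FMA) * clock.
sms, cores_per_sm, clock_ghz = 134, 64, 1.4   # figures from the table above
cores = sms * cores_per_sm
a100_cores = 108 * 64
tflops = cores * 2 * clock_ghz / 1000

print(cores)                             # 8576 CUDA cores
print(f"+{cores / a100_cores - 1:.0%}")  # +24% over the A100
print(f"{tflops:.1f} TFLOPs")            # ~24.0 at the rounded 1.4 GHz
```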

In terms of performance, the ‘GPU-N’ (probably Hopper GH100) delivers 24.2 TFLOPs of FP32 (a 24 percent increase over the A100) and 779 TFLOPs of FP16 (a 2.5x increase over the A100), which comes very close to the 3x gains rumored for GH100 over A100. Against AMD’s CDNA 2 ‘Aldebaran’-based Instinct MI250X accelerator, the GPU-N’s FP32 throughput is roughly a quarter as high (24.2 TFLOPs versus 95.7 TFLOPs); however, its FP16 throughput is 2.15x higher.

According to earlier reports, NVIDIA’s H100 accelerator will be based on an MCM design and use TSMC’s 5nm process node. Hopper is expected to feature two next-generation GPU modules, totalling 288 SM units. We can’t give an exact core count because we don’t know how many cores each SM holds, but if it sticks to 64 cores per SM, we get 18,432 cores, which is 2.25x more than the full GA100 GPU configuration. NVIDIA could also deploy more FP64, FP16, and Tensor cores in its Hopper GPU to greatly improve performance. And that will be necessary to compete with Intel’s Ponte Vecchio, which is expected to feature a 1:1 FP64 ratio.
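The core-count arithmetic above can be sketched directly (144 SMs per module and 64 cores per SM are the article’s assumptions, not confirmed figures):

```python
# Hypothetical full Hopper MCM: 2 modules * 144 SMs * 64 cores per SM.
modules, sms_per_module, cores_per_sm = 2, 144, 64
hopper_cores = modules * sms_per_module * cores_per_sm
ga100_full_cores = 128 * cores_per_sm     # full GA100 die: 128 SMs

print(hopper_cores)                        # 18432
print(hopper_cores / ga100_full_cores)     # 2.25, i.e. 2.25x the full GA100
```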

The final product will likely ship with 134 of the 144 SM units enabled on each GPU module, which suggests the paper’s ‘GPU-N’ represents a single GH100 die in action. However, without GPU sparsity, NVIDIA is unlikely to match the FP32 or FP64 throughput of the MI200.

NVIDIA, on the other hand, may have a secret weapon in Hopper’s COPA-based GPU implementation. NVIDIA describes two domain-specialized COPA-GPUs based on the next-generation architecture: one for HPC and one for DL. The HPC variant takes a fairly conventional approach, pairing an MCM GPU with HBM/MC+HBM (IO) chiplets, but the DL variant is where things get interesting: it places a large cache on a separate die that is coupled to the GPU modules.

| Architecture / Configuration | LLC Capacity (MB) | DRAM BW (TB/s) | DRAM Capacity (GB) |
| --- | --- | --- | --- |
| GPU-N | 60 | 2.7 | 100 |
| COPA-GPU-1 | 960 | 2.7 | 100 |
| COPA-GPU-2 | 960 | 4.5 | 167 |
| COPA-GPU-3 | 1,920 | 2.7 | 100 |
| COPA-GPU-4 | 1,920 | 4.5 | 167 |
| COPA-GPU-5 | 1,920 | 6.3 | 233 |
| Perfect L2 | infinite | infinite | infinite |

There are several variants with up to 960 / 1920 MB of LLC (Last-Level-Cache), up to 233 GB of HBM2e DRAM capacity, and up to 6.3 TB/s of bandwidth. These are all hypothetical, but given that NVIDIA has already discussed them, we could see a Hopper variant with such a design at GTC 2022.

NVIDIA Hopper GH100 ‘Preliminary Specs’:

| NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (SXM2) | NVIDIA A100 (SXM4) | NVIDIA H100 (SXM4?) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GA100 (Ampere) | GH100 (Hopper) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 7nm | 5nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 54.2 Billion | TBD |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 815 mm² | 826 mm² | TBD |
| SMs | 15 | 24 | 56 | 56 | 80 | 108 | 134 (per module) |
| TPCs | 15 | 24 | 28 | 28 | 40 | 54 | TBD |
| FP32 CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64? |
| FP64 CUDA Cores Per SM | 64 | 4 | 32 | 32 | 32 | 32 | 32? |
| FP32 CUDA Cores | 2880 | 3072 | 3584 | 3584 | 5120 | 6912 | 8576 (per module) / 17152 (complete) |
| FP64 CUDA Cores | 960 | 96 | 1792 | 1792 | 2560 | 3456 | 4288 (per module)? / 8576 (complete)? |
| Tensor Cores | N/A | N/A | N/A | N/A | 640 | 432 | TBD |
| Texture Units | 240 | 192 | 224 | 224 | 320 | 432 | TBD |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1530 MHz | 1410 MHz | ~1400 MHz |
| TOPs (DNN/AI) | N/A | N/A | N/A | N/A | 125 TOPs | 1248 TOPs / 2496 TOPs with Sparsity | TBD |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 30.4 TFLOPs | 312 TFLOPs / 624 TFLOPs with Sparsity | 779 TFLOPs (per module)? / 1558 TFLOPs with Sparsity (per module)? |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 15.7 TFLOPs | 19.4 TFLOPs / 156 TFLOPs with Sparsity | 24.2 TFLOPs (per module)? / 193.6 TFLOPs with Sparsity? |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.80 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 24.2 TFLOPs (per module)? (12.1 TFLOPs standard)? |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 6144-bit HBM2e | 6144-bit HBM2e |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s / 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | Up to 40 GB HBM2 @ 1.6 TB/s / Up to 80 GB HBM2 @ 1.6 TB/s | Up to 100 GB HBM2e @ 3.5 Gbps |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 40960 KB | 81920 KB |
| TDP | 235W | 250W | 250W | 300W | 300W | 400W | ~450-500W |
