A new research paper published by NVIDIA (the green team) has revealed a mysterious GPU known as GPU-N, which could be our first look at the next-gen Hopper GH100 chip (as spotted by Twitter user Redfire).

The research paper, ‘GPU Domain Specialization via Composable On-Package Architecture’, presents a next-generation GPU design as the most practical solution for boosting deep learning performance by maximizing low-precision matrix math throughput. It covers the ‘GPU-N’ and its COPA designs, along with their possible specifications and simulated performance results.

The ‘GPU-N’ is believed to include 134 SM units (vs. 108 SM units on the A100). This brings the total number of cores to 8,576, which is a 24% increase over the current Ampere A100 solution. The chip’s theoretical clock speed is 1.4 GHz, the same as the Ampere A100 and Volta V100 (this should not be taken as the final clock speed).
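The core count and the 24% figure can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes 64 FP32 cores per SM (the same ratio as A100, which the paper does not confirm for GPU-N) and uses the 108 SMs listed for the A100 in the configuration table. The function name `fp32_cores` is made up for illustration.

```python
# Assumption: 64 FP32 CUDA cores per SM, as on A100. Not confirmed for GPU-N.
CORES_PER_SM = 64


def fp32_cores(sm_count, cores_per_sm=CORES_PER_SM):
    """Total FP32 cores for a given number of enabled SMs."""
    return sm_count * cores_per_sm


gpu_n_cores = fp32_cores(134)  # GPU-N, per the paper's SM count
a100_cores = fp32_cores(108)   # A100, per the configuration table

# Relative increase of GPU-N over A100, in percent.
uplift_pct = (gpu_n_cores / a100_cores - 1) * 100

print(f"GPU-N cores: {gpu_n_cores}")
print(f"A100 cores:  {a100_cores}")
print(f"Uplift:      {uplift_pct:.1f}%")
```

Because the cores-per-SM factor cancels out, the percentage depends only on the SM counts (134 vs. 108).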

Other features include a 60 MB L2 cache, which is 50% larger than the Ampere A100’s, and 2.68 TB/s of DRAM bandwidth that can scale up to 6.3 TB/s. The HBM2e DRAM has a capacity of 100 GB, which can be increased to 233 GB using COPA implementations. The memory sits on a 6144-bit bus interface running at a 3.5 Gbps per-pin data rate.
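The quoted bandwidth follows from the bus width and the per-pin data rate. The sketch below shows the standard peak-bandwidth calculation; the helper name `peak_dram_bandwidth_gbs` is hypothetical, and the result is a theoretical peak rather than a measured figure.

```python
def peak_dram_bandwidth_gbs(bus_width_bits, pin_rate_gbps):
    """Theoretical peak DRAM bandwidth in GB/s.

    Each pin carries pin_rate_gbps gigabits per second; dividing by 8
    converts bits to bytes.
    """
    return bus_width_bits * pin_rate_gbps / 8


# 6144-bit interface at 3.5 Gbps per pin, as quoted in the article.
bandwidth = peak_dram_bandwidth_gbs(6144, 3.5)
print(f"{bandwidth:.0f} GB/s (~{bandwidth / 1000:.2f} TB/s)")
```

The result is in line with the roughly 2.68 TB/s quoted for GPU-N.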

Configuration	NVIDIA V100	NVIDIA A100	GPU-N
SMs	80	108	134
GPU frequency (GHz)	1.4	1.4	1.4
FP32 (TFLOPS)	15.7	19.5	24.2
FP16 (TFLOPS)	125	312	779
L2 cache (MB)	6	40	60
DRAM BW (GB/s)	900	1,555	2,687
DRAM Capacity (GB)	16	40	100

In terms of performance, the ‘GPU-N’ (probably Hopper GH100) delivers 24.2 TFLOPs of FP32 (a 24 percent increase over A100) and 779 TFLOPs of FP16 (a 2.5x increase over A100), which comes very close to the 3x gains rumored for GH100 over A100. Compared to AMD’s CDNA 2 ‘Aldebaran’ GPU on the Instinct MI250X accelerator, the FP32 performance is less than half (24.2 TFLOPs versus 95.7 TFLOPs); however, the FP16 performance is 2.15x higher.
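As a sanity check, the FP32 figures can be approximated from the SM count and clock. The sketch below assumes 64 FP32 cores per SM and 2 FLOPs per core per clock (one fused multiply-add), and it uses 1.41 GHz, the A100's 1410 MHz boost clock, rather than the rounded 1.4 GHz in the table. All of these are assumptions for GPU-N, and `peak_fp32_tflops` is a made-up helper name.

```python
def peak_fp32_tflops(sm_count, clock_ghz, cores_per_sm=64, flops_per_clock=2):
    """Theoretical peak FP32 throughput in TFLOPs.

    Assumes each core retires one fused multiply-add (2 FLOPs) per clock.
    """
    gflops = sm_count * cores_per_sm * flops_per_clock * clock_ghz
    return gflops / 1000


CLOCK_GHZ = 1.41  # assumption: A100's 1410 MHz boost clock

gpu_n_fp32 = peak_fp32_tflops(134, CLOCK_GHZ)
a100_fp32 = peak_fp32_tflops(108, CLOCK_GHZ)

# Ratios taken directly from the figures quoted in the article.
fp32_gain = 24.2 / 19.5
fp16_gain = 779 / 312

print(f"GPU-N FP32: {gpu_n_fp32:.1f} TFLOPs, A100 FP32: {a100_fp32:.1f} TFLOPs")
print(f"FP32 gain: {fp32_gain:.2f}x, FP16 gain: {fp16_gain:.2f}x")
```

Under these assumptions the estimate lands within rounding of the quoted 24.2 and 19.5 TFLOPs. The FP16 gain is much larger than the FP32 gain, so it cannot come from the SM count alone.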

According to previous information, NVIDIA’s H100 accelerator is expected to be based on an MCM solution and use TSMC’s 5nm process node. Hopper is expected to have two next-generation GPU modules, totalling 288 SM units. We can’t give an exact number of cores because we don’t know how many are in each SM, but if it sticks to 64 cores per SM, we’ll have 18,432 cores, which is 2.25x the core count of the full GA100 GPU configuration. NVIDIA could also use more FP64, FP16, and Tensor cores in its Hopper GPU to greatly improve performance. That will be required to compete with Intel’s Ponte Vecchio, which is expected to offer 1:1 FP64.

The final configuration is likely to have 134 of the 144 SM units enabled on each GPU module, which suggests that GPU-N represents a single GH100 module in action. However, without GPU Sparsity, NVIDIA is unlikely to match the FP32 or FP64 FLOPs of the MI200.
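The dual-module numbers above can be worked through in a short sketch. It assumes 64 FP32 cores per SM and takes the full GA100 die to have 128 SMs (the A100 ships with 108 of them enabled). The module and SM counts are the rumored figures from the article, not confirmed specifications.

```python
CORES_PER_SM = 64             # assumption, carried over from A100
MODULES = 2                   # rumored dual-module (MCM) design
SMS_PER_MODULE_FULL = 144     # rumored full module
SMS_PER_MODULE_ENABLED = 134  # SM count reported for GPU-N
FULL_GA100_SMS = 128          # full GA100 die; A100 enables 108 of these

# Core counts for the fully enabled and the partially enabled dual-module part.
full_cores = MODULES * SMS_PER_MODULE_FULL * CORES_PER_SM
enabled_cores = MODULES * SMS_PER_MODULE_ENABLED * CORES_PER_SM

# Comparison against a fully enabled GA100 die.
ratio_vs_full_ga100 = full_cores / (FULL_GA100_SMS * CORES_PER_SM)

# Share of each module's SMs that would be enabled.
enabled_fraction = SMS_PER_MODULE_ENABLED / SMS_PER_MODULE_FULL

print(f"Full dual-module cores:    {full_cores}")
print(f"Enabled dual-module cores: {enabled_cores}")
print(f"Ratio vs. full GA100:      {ratio_vs_full_ga100:.2f}x")
print(f"SMs enabled per module:    {enabled_fraction:.1%}")
```

With 134 of 144 SMs enabled per module, the two-module total matches the 17,152 cores listed as "Complete" in the preliminary specs table.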

NVIDIA, on the other hand, may have a hidden weapon in the form of Hopper’s COPA-based GPU implementation. NVIDIA mentions two Domain-Specialized COPA-GPUs based on the next-generation architecture, one for HPC and the other for DL. The HPC variant uses a fairly standard approach, with an MCM GPU and HBM/MC+HBM (IO) chiplets, but the DL variant is where things start to get interesting. The DL version includes a large cache on a separate die that is coupled to the GPU modules.

Architecture Configuration	LLC Capacity (MB)	DRAM BW (TB/s)	DRAM Capacity (GB)
GPU-N	60	2.7	100
COPA-GPU-1	960	2.7	100
COPA-GPU-2	960	4.5	167
COPA-GPU-3	1,920	2.7	100
COPA-GPU-4	1,920	4.5	167
COPA-GPU-5	1,920	6.3	233
Perfect L2	infinite	infinite	infinite

The variants offer 960 MB or 1,920 MB of LLC (Last-Level Cache), up to 233 GB of HBM2e DRAM capacity, and up to 6.3 TB/s of bandwidth. These are all hypothetical, but given that NVIDIA has already discussed them, we could see a Hopper variant with such a design at GTC 2022.
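To put the COPA variants in perspective, the sketch below expresses each configuration as a multiple of the GPU-N baseline. The values are copied from the table above; the names `CONFIGS` and `scale_vs_baseline` are made up for illustration.

```python
# Values copied from the COPA configuration table.
# Tuple layout: (LLC capacity in MB, DRAM bandwidth in TB/s, DRAM capacity in GB).
CONFIGS = {
    "GPU-N": (60, 2.7, 100),
    "COPA-GPU-1": (960, 2.7, 100),
    "COPA-GPU-2": (960, 4.5, 167),
    "COPA-GPU-3": (1920, 2.7, 100),
    "COPA-GPU-4": (1920, 4.5, 167),
    "COPA-GPU-5": (1920, 6.3, 233),
}


def scale_vs_baseline(name, baseline="GPU-N"):
    """Return (LLC, bandwidth, capacity) as multiples of the baseline config."""
    base = CONFIGS[baseline]
    return tuple(round(value / ref, 2) for value, ref in zip(CONFIGS[name], base))


for name in CONFIGS:
    llc, bw, cap = scale_vs_baseline(name)
    print(f"{name}: LLC x{llc}, bandwidth x{bw}, capacity x{cap}")
```

The cache grows far more than the DRAM does: the largest variant has 32 times the LLC of GPU-N but only about 2.3 times the bandwidth and capacity.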

NVIDIA Hopper GH100 ‘Preliminary Specs’:

NVIDIA Tesla Graphics Card	Tesla K40 (PCI-Express)	Tesla M40 (PCI-Express)	Tesla P100 (PCI-Express)	Tesla P100 (SXM2)	Tesla V100 (SXM2)	NVIDIA A100 (SXM4)	NVIDIA H100 (SXM4?)
GPU	GK110 (Kepler)	GM200 (Maxwell)	GP100 (Pascal)	GP100 (Pascal)	GV100 (Volta)	GA100 (Ampere)	GH100 (Hopper)
Process Node	28nm	28nm	16nm	16nm	12nm	7nm	5nm
Transistors	7.1 Billion	8 Billion	15.3 Billion	15.3 Billion	21.1 Billion	54.2 Billion	TBD
GPU Die Size	551 mm2	601 mm2	610 mm2	610 mm2	815mm2	826mm2	TBD
SMs	15	24	56	56	80	108	134 (Per Module)
TPCs	15	24	28	28	40	54	TBD
FP32 CUDA Cores Per SM	192	128	64	64	64	64	64?
FP64 CUDA Cores / SM	64	4	32	32	32	32	32?
FP32 CUDA Cores	2880	3072	3584	3584	5120	6912	8576 (Per Module) 17152 (Complete)
FP64 CUDA Cores	960	96	1792	1792	2560	3456	4288 (Per Module)? 8576 (Complete)?
Tensor Cores	N/A	N/A	N/A	N/A	640	432	TBD
Texture Units	240	192	224	224	320	432	TBD
Boost Clock	875 MHz	1114 MHz	1329MHz	1480 MHz	1530 MHz	1410 MHz	~1400 MHz
TOPs (DNN/AI)	N/A	N/A	N/A	N/A	125 TOPs	1248 TOPs 2496 TOPs with Sparsity	TBD
FP16 Compute	N/A	N/A	18.7 TFLOPs	21.2 TFLOPs	30.4 TFLOPs	312 TFLOPs 624 TFLOPs with Sparsity	779 TFLOPs (Per Module)? 1558 TFLOPs with Sparsity (Per Module)?
FP32 Compute	5.04 TFLOPs	6.8 TFLOPs	10.0 TFLOPs	10.6 TFLOPs	15.7 TFLOPs	19.5 TFLOPs 156 TFLOPs With Sparsity	24.2 TFLOPs (Per Module)? 193.6 TFLOPs With Sparsity?
FP64 Compute	1.68 TFLOPs	0.2 TFLOPs	4.7 TFLOPs	5.30 TFLOPs	7.80 TFLOPs	19.5 TFLOPs (9.7 TFLOPs standard)	24.2 TFLOPs (Per Module)? (12.1 TFLOPs standard)?
Memory Interface	384-bit GDDR5	384-bit GDDR5	4096-bit HBM2	4096-bit HBM2	4096-bit HBM2	6144-bit HBM2e	6144-bit HBM2e
Memory Size	12 GB GDDR5 @ 288 GB/s	24 GB GDDR5 @ 288 GB/s	16 GB HBM2 @ 732 GB/s 12 GB HBM2 @ 549 GB/s	16 GB HBM2 @ 732 GB/s	16 GB HBM2 @ 900 GB/s	Up To 40 GB HBM2 @ 1.6 TB/s Up To 80 GB HBM2 @ 1.6 TB/s	Up To 100 GB HBM2e @ 3.5 Gbps
L2 Cache Size	1536 KB	3072 KB	4096 KB	4096 KB	6144 KB	40960 KB	81920 KB
TDP	235W	250W	250W	300W	300W	400W	~450-500W

The Mysterious GPU-N from NVIDIA might be its next-gen Hopper GH100 GPU in disguise
