It’s proven! AMD’s RDNA 2 has better latency than Nvidia’s Ampere

The folks at Chips and Cheese recently ran a GPU memory latency test on AMD’s RDNA 2 and NVIDIA’s Ampere GPU architectures, and the results were more than interesting.

Memory latency has become a crucial factor with the growing use of multi-chiplet designs that place several dies and I/O chips on the same package. The results therefore offer a glimpse of how each GPU’s cache and memory subsystem actually behaves.

Coming to the results, AMD’s Radeon RX 6800 XT (RDNA 2 GPU) and the NVIDIA GeForce RTX 3090 (Ampere GPU) were pitted against each other. In the cache and memory latency benchmark, AMD’s RDNA 2 architecture showed lower latency than NVIDIA’s Ampere at every cache level. Its VRAM latency was about the same as Ampere’s, despite having to check two more levels of cache on the way to memory. Infinity Cache adds only about 20 ns over an L2 hit and is still faster than Ampere’s L2.

The testers noted that NVIDIA’s Ampere-based GA102 GPU uses a more conventional GPU memory subsystem with only two cache levels and a high-latency L2. Going from the SM-private L1 to L2 takes over 100 ns, and they suggest that getting around the large GA102 die takes a lot of cycles. On RDNA 2, by contrast, L2 is only about 66 ns away from L0, even with an L1 cache in between.
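To put those figures side by side, here is a minimal Python sketch of the arithmetic. It uses only the numbers reported above; the function and variable names are our own, and adding the Infinity Cache hop to the L2 figure is a rough approximation, not the testers’ methodology. Because Ampere’s figure is quoted as "over 100 ns", 100 is treated as a lower bound.

```python
# Latency figures quoted in the article, in nanoseconds.
RDNA2_L0_TO_L2_NS = 66         # RDNA 2: L2 is ~66 ns away from L0
INFINITY_CACHE_EXTRA_NS = 20   # Infinity Cache adds ~20 ns over an L2 hit
AMPERE_L1_TO_L2_NS = 100       # Ampere: "over 100 ns", so a lower bound


def infinity_cache_hit_ns(l2_hit_ns: float, extra_ns: float) -> float:
    """Approximate an Infinity Cache hit as L2 latency plus the extra hop."""
    return l2_hit_ns + extra_ns


rdna2_ic_ns = infinity_cache_hit_ns(RDNA2_L0_TO_L2_NS, INFINITY_CACHE_EXTRA_NS)
margin_ns = AMPERE_L1_TO_L2_NS - rdna2_ic_ns

print(f"RDNA 2 Infinity Cache hit: ~{rdna2_ic_ns} ns")
print(f"Ampere L2 hit: >{AMPERE_L1_TO_L2_NS} ns")
print(f"Margin in RDNA 2's favour: at least ~{margin_ns} ns")
```

On these numbers, a hit in RDNA 2’s third level of cache lands at roughly 86 ns, still below the 100 ns floor for Ampere’s second level.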

We know that AMD’s Navi 21 GPU features a 4 MB L2 cache while the NVIDIA GA102 GPU features a 6 MB L2 cache for the whole chip. The NVIDIA A100 Ampere GPU for HPC features a massive 40 MB L2 cache.

The folks at Chips and Cheese had this to say about the results:

RDNA 2’s cache is fast and there’s a lot of it. Compared to Ampere, latency is low at all levels. Infinity Cache only adds about 20 ns over an L2 hit and has lower latency than Ampere’s L2. Amazingly, RDNA 2’s VRAM latency is about the same as Ampere’s, even though RDNA 2 is checking two more levels of cache on the way to memory.

In contrast, Nvidia sticks with a more conventional GPU memory subsystem with only two levels of cache and high L2 latency. Going from Ampere’s SM-private L1 to L2 takes over 100 ns. RDNA’s L2 is ~66 ns away from L0, even with an L1 cache between them. Getting around GA102’s massive die seems to take a lot of cycles.

This could explain AMD’s excellent performance at lower resolutions. RDNA 2’s low latency L2 and L3 caches may give it an advantage with smaller workloads, where occupancy is too low to hide latency. Nvidia’s Ampere chips in comparison require more parallelism to shine.
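The occupancy point can be illustrated with Little’s law (concurrency = latency × throughput): the longer each memory request takes, the more independent requests must be in flight to keep the memory pipeline busy. The sketch below is ours, not Chips and Cheese’s. The 2 ns issue interval is a purely hypothetical figure chosen for illustration, and only the 66 ns and 100 ns latencies come from the quoted results.

```python
import math


def requests_in_flight_needed(latency_ns: float, issue_interval_ns: float) -> int:
    """Little's law: requests that must overlap to fully hide the latency."""
    return math.ceil(latency_ns / issue_interval_ns)


# Hypothetical assumption: one new memory request can be issued every 2 ns.
HYPOTHETICAL_ISSUE_INTERVAL_NS = 2.0

for name, latency_ns in (("RDNA 2 L2 (~66 ns)", 66), ("Ampere L2 (>100 ns)", 100)):
    needed = requests_in_flight_needed(latency_ns, HYPOTHETICAL_ISSUE_INTERVAL_NS)
    print(f"{name}: about {needed} independent requests in flight")
```

Whatever the real issue rate is, the ratio holds: under this simple model, higher latency demands proportionally more parallel work, which fits the observation that Ampere needs more parallelism to shine.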
