The first-ever GPU for the HPC segment to feature an MCM design, AMD's CDNA 2-based Instinct MI200 is on the verge of being launched. Reports suggest that the GPU will offer some insane performance numbers compared to the existing Instinct MI100 GPU, with a 4x increase in FP16 compute.
According to sources, the Instinct MI200 lineup will include two variants: a standard MI250 and an MI250X. The MI250X is said to get 110 CUs per die (220 CUs in total), 128 GB of HBM2e memory, and a 500W TDP, and will be based on a 7nm process.
The details shared on Twitter suggest that AMD's Instinct MI200 will rock a clock speed of up to 1.7 GHz, which is a 13% increase over the Instinct MI100. The new MCM GPU will also be sporting nearly twice the number of stream processors, at 14,080 cores (up from 7,680 on the MI100), packed within 220 Compute Units.
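The arithmetic behind these rumored figures is easy to check. The short sketch below uses only numbers quoted in this article (none are confirmed by AMD); the variable and function names are illustrative.

```python
# Rumored figures quoted in the article (not confirmed by AMD).
MI200_CUS = 220
MI200_STREAM_PROCESSORS = 14_080
MI200_CLOCK_MHZ = 1700
MI100_STREAM_PROCESSORS = 7_680
MI100_CLOCK_MHZ = 1500


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Stream processors per Compute Unit implied by the leak.
sp_per_cu = MI200_STREAM_PROCESSORS / MI200_CUS

# Clock speed uplift of the MI200 over the MI100.
clock_uplift = percent_increase(MI200_CLOCK_MHZ, MI100_CLOCK_MHZ)

# How many times more stream processors the MI200 has than the MI100.
core_ratio = MI200_STREAM_PROCESSORS / MI100_STREAM_PROCESSORS

print(f"Stream processors per CU: {sp_per_cu:.0f}")
print(f"Clock uplift over MI100: {clock_uplift:.1f}%")
print(f"Stream processor ratio vs MI100: {core_ratio:.2f}x")
```

The leak works out to 64 stream processors per Compute Unit, a clock uplift of roughly 13%, and a stream processor count about 1.83 times that of the MI100.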
According to previous reports from HPCWire, AMD’s Instinct MI200 will power three top-tier supercomputers which include the United States’ exascale Frontier system; the European Union’s pre-exascale LUMI system; and Australia’s petascale Setonix system.
Performance-wise, AMD's Instinct MI200 HPC Accelerator comes with almost 50 TFLOPs (47.9 TFLOPs) of FP64 & FP32 compute horsepower.
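The 47.9 TFLOPs figure lines up with the rumored core count and clock speed. The sketch below applies the usual theoretical-peak formula (units × clock × FLOPs per unit per clock). It assumes each stream processor retires one fused multiply-add, counted as 2 FLOPs, per clock; that per-clock rate is an assumption for illustration, not something the leak states.

```python
def peak_tflops(stream_processors: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak throughput in TFLOPs.

    units * GHz * FLOPs-per-clock gives GFLOPs; dividing by 1000 gives TFLOPs.
    """
    return stream_processors * clock_ghz * flops_per_clock / 1000.0


# Rumored MI200 figures from the article: 14,080 stream processors at 1.7 GHz.
# ASSUMPTION: one fused multiply-add (2 FLOPs) per stream processor per clock.
fp64_peak = peak_tflops(stream_processors=14_080, clock_ghz=1.7, flops_per_clock=2)

print(f"Estimated peak: {fp64_peak:.1f} TFLOPs")
```

Under that assumption the estimate lands at about 47.9 TFLOPs, matching the reported number. Doubling `flops_per_clock` to 4 gives about 95.7 TFLOPs, close to the 95.8 TFLOPs FP32 entry in the comparison table.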
AMD Radeon Instinct Accelerators 2020

| Accelerator Name | AMD Instinct MI300 | AMD Instinct MI200 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPU Architecture | TBA (CDNA 3) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
| GPU Process Node | Advanced Process Node | Advanced Process Node | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Dies | 4 (MCM)? | 2 (MCM) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
| GPU Cores | 28,160? | 14,080? | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
| GPU Clock Speed | TBA | ~1700 MHz | ~1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
| FP16 Compute | TBA | 383 TFLOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP32 Compute | TBA | 95.8 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP64 Compute | TBA | 47.9 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
| VRAM | TBA | 64/128 GB HBM2e? | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
| Memory Clock | TBA | TBA | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
| Memory Bus | TBA | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
| Memory Bandwidth | TBA | ~2 TB/s? | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
| Form Factor | TBA | Dual Slot, Full Length / OAM | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
| Cooling | TBA | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP | TBA | TBA | 300W | 300W | 300W | 300W | 175W | 150W |
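To put the generational jump in perspective, the sketch below computes the MI200-over-MI100 ratios directly from the compute rows of the table above. The MI200 values are rumored, and the dictionary and function names are illustrative.

```python
# (MI200, MI100) peak throughput in TFLOPs, copied from the table above.
# The MI200 numbers are rumored, not confirmed by AMD.
PEAK_TFLOPS = {
    "FP16": (383.0, 185.0),
    "FP32": (95.8, 23.1),
    "FP64": (47.9, 11.5),
}


def speedups(figures: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return the new/old throughput ratio per precision, rounded to 2 places."""
    return {precision: round(new / old, 2) for precision, (new, old) in figures.items()}


ratios = speedups(PEAK_TFLOPS)
for precision, ratio in ratios.items():
    print(f"{precision}: {ratio:.2f}x over MI100")
```

Going by the table's own figures, the roughly 4x uplift shows up in the FP32 and FP64 rows, while the FP16 row works out to about 2x.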