AMD’s next-generation MI200 HPC GPU is finally here. Codenamed Aldebaran, the new product uses the 6nm CDNA 2 architecture to deliver a large generational jump in compute performance.
With this launch, AMD becomes the first to ship a GPU built on MCM (multi-chip module) technology. The Aldebaran GPU will come in various forms and sizes, all based on the brand new CDNA 2 architecture, which is the most refined variation of Vega.
Some of the main features are listed below before we go into detail:
- AMD CDNA 2 architecture – 2nd Gen Matrix Cores accelerating FP64 and FP32 matrix operations, delivering up to 4X the peak theoretical FP64 performance vs. AMD previous-gen GPUs.
- Leadership Packaging Technology – Industry-first multi-die GPU design with 2.5D Elevated Fanout Bridge (EFB) technology delivers 1.8X more cores and 2.7X higher memory bandwidth vs. AMD previous-gen GPUs, offering the industry’s best aggregate peak theoretical memory bandwidth at 3.2 terabytes per second.
- 3rd Gen AMD Infinity Fabric technology – Up to 8 Infinity Fabric links connect the AMD Instinct MI200 with 3rd Gen EPYC CPUs and other GPUs in the node to enable unified CPU/GPU memory coherency and maximize system throughput, allowing for an easier on-ramp for CPU codes to tap the power of accelerators.
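As a rough sanity check on the "1.8X more cores" and "up to 4X FP64" claims above, the ratios can be recomputed from the MI250X and MI100 figures in the comparison table further down. This is only a sketch: the dictionary keys and the helper name `generational_ratio` are illustrative, and the inputs are the rounded table values rather than AMD's own test data.

```python
# Figures copied from the comparison table in this article (rounded values).
mi250x = {"stream_processors": 14_080, "fp64_tflops": 47.9}
mi100 = {"stream_processors": 7_680, "fp64_tflops": 11.5}


def generational_ratio(new: float, old: float) -> float:
    """Return how many times larger the new figure is than the old one."""
    return new / old


core_ratio = generational_ratio(mi250x["stream_processors"], mi100["stream_processors"])
fp64_ratio = generational_ratio(mi250x["fp64_tflops"], mi100["fp64_tflops"])

print(f"cores: {core_ratio:.2f}x, FP64: {fp64_ratio:.2f}x")
```

Both ratios land close to the headline numbers, which suggests the claims compare the MI250X against the MI100.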
AMD Instinct MI200 GPU Die Shot:
The GPU features two dies, a primary and a secondary, each consisting of 8 shader engines for a total of 16 SEs. Each shader engine packs 16 CUs with full-rate FP64, packed FP32, and a 2nd Generation Matrix Engine for FP16 and BF16 operations.
That works out to 128 compute units or 8,192 stream processors per die. The MI250X does not enable all of them: it ships with a total of 220 compute units or 14,080 stream processors across the two dies.
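The core counts above follow from simple multiplication, sketched below. The constant names are illustrative, and the figure of 64 stream processors per compute unit is not stated in the article; it is an assumption implied by the article's own numbers (8,192 / 128 and 14,080 / 220 both give 64).

```python
SHADER_ENGINES_PER_DIE = 8
CUS_PER_SHADER_ENGINE = 16
STREAM_PROCESSORS_PER_CU = 64  # assumption: implied by 8192 / 128 and 14080 / 220
DIES = 2
ENABLED_CUS_MI250X = 220  # total compute units quoted for the MI250X

# Per-die counts as described in the article.
cus_per_die = SHADER_ENGINES_PER_DIE * CUS_PER_SHADER_ENGINE
sps_per_die = cus_per_die * STREAM_PROCESSORS_PER_CU

# Package-level counts: what two fully populated dies would give versus the quoted total.
full_cus = cus_per_die * DIES
enabled_sps = ENABLED_CUS_MI250X * STREAM_PROCESSORS_PER_CU

print(f"per die: {cus_per_die} CUs, {sps_per_die} stream processors")
print(f"two full dies: {full_cus} CUs; MI250X total: {ENABLED_CUS_MI250X} CUs, {enabled_sps} stream processors")
```

The gap between the per-die figure doubled and the 220 compute units quoted for the MI250X is why the total is lower than a straight multiplication would suggest.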
Built on AMD CDNA 2 architecture, AMD Instinct MI200 series accelerators deliver leading application performance for a broad set of HPC workloads. The AMD Instinct MI250X accelerator provides up to 4.9X better performance than competitive accelerators for double precision (FP64) HPC applications and surpasses 380 teraflops of peak theoretical half-precision (FP16) for AI workloads to enable disruptive approaches in further accelerating data-driven research.
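Peak theoretical throughput figures like the ones quoted here are usually the product of compute units, clock speed, and operations per clock. The sketch below uses the MI250X's 220 compute units and 1700 MHz clock from this article. The per-compute-unit rates (128, 256, and 1024 FLOPs per clock) are assumptions, chosen because they reproduce the FP64, FP32, and FP16 figures in the comparison table; the article itself does not state them, and the function name `peak_tflops` is illustrative.

```python
def peak_tflops(compute_units: int, clock_mhz: int, flops_per_clock_per_cu: int) -> float:
    """Peak theoretical TFLOPs = CUs x clock (MHz) x FLOPs per clock per CU / 1e6."""
    return compute_units * clock_mhz * flops_per_clock_per_cu / 1e6


MI250X_CUS = 220
MI250X_CLOCK_MHZ = 1700

# Assumed per-CU rates; they are picked to match the article's table, not taken from it.
ASSUMED_FLOPS_PER_CLOCK_PER_CU = {"FP64": 128, "FP32": 256, "FP16": 1024}

peaks = {
    precision: peak_tflops(MI250X_CUS, MI250X_CLOCK_MHZ, rate)
    for precision, rate in ASSUMED_FLOPS_PER_CLOCK_PER_CU.items()
}

for precision, tflops in peaks.items():
    print(f"{precision}: {tflops:.1f} TFLOPs")
```

Under these assumptions the FP16 result sits just above the 380 teraflops mark mentioned above. These are theoretical peaks, so real applications will see lower sustained numbers.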
AMD Radeon Instinct Accelerators 2020
Accelerator Name | AMD Instinct MI300 | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
GPU Architecture | TBA (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
GPU Process Node | Advanced Process Node | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
GPU Dies | 4 (MCM)? | 2 (MCM) | 2 (MCM) | 2 (MCM) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
GPU Cores | 28,160? | 14,080 | 13,312 | TBA | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
GPU Clock Speed | TBA | 1700 MHz | 1700 MHz | TBA | ~1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
FP16 Compute | TBA | 383 TFLOPs | 362 TFLOPs | TBA | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP32 Compute | TBA | 95.7 TFLOPs | 90.5 TFLOPs | TBA | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
FP64 Compute | TBA | 47.9 TFLOPs | 45.3 TFLOPs | TBA | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
VRAM | TBA | 128 GB HBM2e | 128 GB HBM2e | TBA | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
Memory Clock | TBA | 3.2 Gbps | 3.2 Gbps | TBA | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
Memory Bus | TBA | 8192-bit | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
Memory Bandwidth | TBA | 3.2 TB/s | 3.2 TB/s | TBA | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
Form Factor | TBA | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
Cooling | TBA | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
TDP | TBA | 560W | 500W? | TBA | 300W | 300W | 300W | 300W | 175W | 150W |
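The memory bandwidth row in the table can be reproduced from the bus width and per-pin data rate. The sketch below does this for the MI250X and the MI100. One assumption is flagged in the code: the MI100's 1200 MHz memory clock is treated as double data rate (2.4 Gbps per pin), which is not stated in the table but matches its 1.23 TB/s figure. The function name `hbm_bandwidth_gbs` is illustrative.

```python
def hbm_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Bandwidth in GB/s = bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# MI250X: 8192-bit bus at 3.2 Gbps per pin (both from the table).
mi250x_gbs = hbm_bandwidth_gbs(8192, 3.2)

# MI100: 4096-bit bus; assumption: the 1200 MHz clock is double data rate, so 2.4 Gbps per pin.
mi100_gbs = hbm_bandwidth_gbs(4096, 2 * 1.2)

ratio = mi250x_gbs / mi100_gbs

print(f"MI250X: {mi250x_gbs / 1000:.2f} TB/s")
print(f"MI100:  {mi100_gbs / 1000:.2f} TB/s")
print(f"ratio:  {ratio:.2f}x")
```

With unrounded inputs the ratio comes out near the 2.7X that AMD quotes, slightly higher than dividing the table's rounded 3.2 TB/s by 1.23 TB/s.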