Within the last few years, a large number of processors have entered the market with the sole purpose of accelerating artificial intelligence and machine learning workloads. Because machine learning algorithms vary so widely, each of these processors tends to focus on a few key areas, but all of them share one constraint: the size of the chip.
Two years ago, Cerebras, a computer systems company, unveiled a revolution in silicon design: a processor as big as a human head, built on 16nm, using as much area of a 12-inch wafer as a rectangular design would allow, and aimed at both AI and HPC workloads. On 20th April, the company launched its second-generation product, built on TSMC 7nm, with more than double the cores and more than double of nearly everything else.
Second Generation Wafer Scale Engine
Cerebras builds the new processor on the foundation of the first, moving to TSMC’s N7 process. The move lets the logic scale down, and the SRAM scale down to some extent, so the new chip now has 850,000 AI cores on board. Almost every specification of the new chip is more than double that of its predecessor.
The original processor is known as the Wafer Scale Engine (WSE-1), and the new one is named WSE-2. The WSE-2 packs 850,000 AI cores and 2.6 trillion transistors into a massive 46,225 mm² of silicon. By comparison, according to AnandTech, the second-biggest AI processor on the market is ~826 mm², with 0.054 trillion transistors. Cerebras also cites 1000x more onboard memory, with 40 GB of SRAM, compared to 40 MB on the Ampere A100.
The cores are connected by a 2D mesh with FMAC datapaths. With WSE, Cerebras’ goal is to provide a single platform, “designed through innovative patents, that allows for bigger processors useful in AI calculations but has also been extended into a wider array of HPC workloads.”
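To put those figures side by side, here is a minimal sketch that works out the ratios from the numbers quoted above. The variable names are illustrative, and two things are assumptions rather than claims from Cerebras: that 1 GB means 10^9 bytes, and that the SRAM is spread evenly across all cores.

```python
# Figures as quoted in the article.
WSE2_AREA_MM2 = 46_225
WSE2_TRANSISTORS = 2.6e12
WSE2_SRAM_BYTES = 40e9      # 40 GB, assuming 1 GB = 10^9 bytes
WSE2_CORES = 850_000

# The comparison chip cited in the article.
OTHER_AREA_MM2 = 826
OTHER_TRANSISTORS = 0.054e12
OTHER_SRAM_BYTES = 40e6     # 40 MB, assuming 1 MB = 10^6 bytes

area_ratio = WSE2_AREA_MM2 / OTHER_AREA_MM2
transistor_ratio = WSE2_TRANSISTORS / OTHER_TRANSISTORS
memory_ratio = WSE2_SRAM_BYTES / OTHER_SRAM_BYTES

# Per-core averages; the even split is an assumption for illustration only.
transistors_per_core = WSE2_TRANSISTORS / WSE2_CORES
sram_bytes_per_core = WSE2_SRAM_BYTES / WSE2_CORES

print(f"area ratio:           {area_ratio:.1f}x")
print(f"transistor ratio:     {transistor_ratio:.1f}x")
print(f"on-chip memory ratio: {memory_ratio:.0f}x")
print(f"transistors per core: {transistors_per_core:,.0f}")
print(f"SRAM per core:        {sram_bytes_per_core / 1e3:.1f} KB")
```

On these numbers the WSE-2 has roughly 56 times the area and 48 times the transistors of the comparison chip, and the 1000x memory claim follows directly from 40 GB versus 40 MB.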
Building on First Gen WSE
The custom graph compiler is key to the design. The compiler takes PyTorch or TensorFlow models and maps each layer to a physical part of the chip, which allows for asynchronous computation as the data flows through. Having such a large processor means the data can move continuously to the next stage of the calculation, because it never has to go off-die and wait in memory, wasting power. The compiler and processor are also designed with sparsity in mind, allowing high utilization regardless of batch size, and they can run parameter search algorithms simultaneously.
WSE-1 is sold as a complete system called CS-1, and the company has several dozen customers with deployed systems up and running, including a number of pharmaceutical companies, research laboratories, military, biotechnology research, and the oil and gas industries. “Lawrence Livermore has a CS-1 paired to its 23 PFLOP ‘Lassen’ Supercomputer. Pittsburgh Supercomputer Center purchased two systems with a $5m grant, and these systems are attached to their Neocortex supercomputer, allowing for simultaneous AI and enhanced compute.”
What makes Cerebras’ design unique is its ability to go beyond the reticle limit, the maximum area that lithography equipment can pattern in a single exposure. Because connecting two areas across a reticle boundary is difficult, processors are normally designed with the reticle limit as their maximum size.
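The idea of mapping each layer to a physical region can be illustrated with a toy placement routine. This is not Cerebras’ compiler or its algorithm: the `Layer` type, the `place_layers` function, the cost figures, and the proportional-split rule are all hypothetical. The sketch simply gives each layer a contiguous strip of core columns sized by its relative compute cost, so that data could stream from one strip to the next.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    cost: int  # hypothetical relative compute cost of the layer


def place_layers(layers, total_columns):
    """Assign each layer a contiguous [start, end) strip of core columns.

    Strip width is proportional to the layer's cost, with at least one
    column per layer. The last layer takes whatever columns remain.
    """
    if total_columns < len(layers):
        raise ValueError("need at least one column per layer")
    total_cost = sum(layer.cost for layer in layers)
    placements = []
    start = 0
    for i, layer in enumerate(layers):
        layers_left = len(layers) - i - 1
        if layers_left == 0:
            width = total_columns - start
        else:
            width = max(1, round(total_columns * layer.cost / total_cost))
            # Leave at least one column for every layer still to be placed.
            width = min(width, total_columns - start - layers_left)
        placements.append((layer.name, start, start + width))
        start += width
    return placements


model = [Layer("conv1", 100), Layer("conv2", 300), Layer("fc", 100)]
placements = place_layers(model, total_columns=10)
for name, lo, hi in placements:
    print(f"{name}: columns {lo}..{hi - 1}")
```

In this toy version the most expensive layer gets the widest strip, which is one simple way to keep the stages of a pipeline roughly balanced. A real placement would also have to account for memory, routing, and the two-dimensional shape of each region.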
Cerebras remains the only company offering a processor on this scale: “the same patents that Cerebras developed and were awarded to build these large chips are still in play here, and the second-gen WSE will be built into CS-2 systems with a similar design to CS-1 in terms of connectivity and visuals.”
Cerebras states that with such a large single chip, the point at which training has to be distributed across hundreds of AI chips is pushed so much further out that this extra complication is not needed in most scenarios. To that end, we’re seeing CS-1 deployments of single systems attached to supercomputers. However, the company is keen to point out that “two CS-2 systems will deliver 1.7 million AI cores in a standard 42U rack, or three systems for 2.55 million in a larger 46U rack (assuming there’s sufficient power for all at once!), replacing a dozen racks of alternative computer hardware.”
At Hot Chips 2020, Sean Lie, Chief Hardware Architect at Cerebras, stated that one of the key benefits for customers is workload simplification: jobs that previously required racks of GPUs or TPUs can instead run on a single WSE in a computationally relevant fashion.
As a company, Cerebras has ~300 staff across several countries, including Canada, the USA, and Japan. It already has dozens of customers with CS-1 deployed, and a number more are trialing CS-2 remotely while the commercial systems are brought up.
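The rack-level core counts in that quote follow directly from the 850,000 cores per WSE-2. A short check, assuming one WSE-2 per CS-2 system (the function name is illustrative):

```python
CORES_PER_WSE2 = 850_000  # per-chip core count quoted in the article


def cores_in_rack(systems: int) -> int:
    """Total AI cores for a rack holding `systems` CS-2 units,
    assuming each CS-2 contains exactly one WSE-2."""
    return systems * CORES_PER_WSE2


two_systems = cores_in_rack(2)    # the 42U rack configuration
three_systems = cores_in_rack(3)  # the 46U rack configuration
print(f"{two_systems:,} cores in two systems")
print(f"{three_systems:,} cores in three systems")
```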