Cerebras has trained the largest Natural Language Processing (NLP) AI model ever run on a single device, a watershed moment enabled by the CS-2 system and its Wafer Scale Engine, the largest accelerator chip in the world.
The model's twenty billion parameters are unheard of for a single machine: Cerebras completed the training without needing to scale the workload across numerous accelerators. Because this approach demands far less infrastructure and software complexity than previous multi-accelerator efforts, it is a significant achievement for machine learning.
The Wafer Scale Engine-2 is built on a single wafer using a 7 nm process and packs 2.6 trillion transistors, making it comparable to hundreds of the most advanced CPUs available today. Beyond the transistor count, it features 850,000 cores and 40 GB of on-chip memory, and it draws about 15 kW. According to Tom’s Hardware, a single CS-2 machine is comparable to a supercomputer in its own right.
The benefit for Cerebras is clear: by training a 20-billion-parameter NLP model on a single chip, it avoids the cost of thousands of GPUs, along with the hardware and scaling requirements that come with them.
In turn, the company sidesteps the technical difficulties of partitioning the model across many chips.
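A rough memory estimate shows why that matters. The following is a minimal back-of-envelope sketch, assuming standard mixed-precision training costs (these are textbook estimates, not Cerebras-published figures): the weights and optimizer state of a 20-billion-parameter model alone exceed the memory of any single conventional GPU, which is what normally forces model partitioning.

```python
# Back-of-envelope sketch using standard mixed-precision training
# estimates (not Cerebras-published figures): weights in fp16 plus
# fp32 Adam optimizer state come to roughly 14 bytes per parameter.

BYTES_FP16_WEIGHT = 2   # one fp16 weight
BYTES_ADAM_STATE = 12   # fp32 master weight + two fp32 Adam moments

def training_footprint_gb(n_params: float) -> float:
    """Approximate memory for weights plus optimizer state, in GB."""
    return n_params * (BYTES_FP16_WEIGHT + BYTES_ADAM_STATE) / 1e9

n_params = 20e9  # the 20-billion-parameter model trained on one CS-2
footprint = training_footprint_gb(n_params)   # ~280 GB
gpus_needed = int(-(-footprint // 80))        # ceiling division: 80 GB GPUs
print(f"~{footprint:.0f} GB of state -> >= {gpus_needed} x 80 GB GPUs")
```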
In NLP, bigger models are shown to be more accurate. But traditionally, only a select few companies had the resources and expertise necessary to do the painstaking work of breaking up these large models and spreading them across hundreds or thousands of graphics processing units. As a result, few companies could train large NLP models – it was too expensive, time-consuming, and inaccessible for the rest of the industry. Today we are proud to democratize access to GPT-3XL 1.3B, GPT-J 6B, GPT-3 13B, and GPT-NeoX 20B, enabling the entire AI ecosystem to set up large models in minutes and train them on a single CS-2.
— Andrew Feldman, CEO and Co-Founder, Cerebras Systems
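Feldman’s “minutes” claim is about configuration rather than raw speed: with no GPU cluster, there is no device-placement plan to rewrite when the model grows. The sketch below is a hypothetical illustration, not the actual Cerebras API; the layer and width figures follow the published configurations of the models he names.

```python
# Hypothetical illustration only -- NOT the actual Cerebras API.
# The point: on a single system, moving between model scales is a
# hyperparameter edit, with no multi-device sharding plan to rewrite.
# Layer/width figures follow the models' published configurations.

GPT_CONFIGS = {
    "gpt3-xl":  dict(n_layer=24, d_model=2048, n_head=24),  # ~1.3B params
    "gpt-j-6b": dict(n_layer=28, d_model=4096, n_head=16),  # ~6B params
    "gpt3-13b": dict(n_layer=40, d_model=5140, n_head=40),  # ~13B params
    "gpt-neox": dict(n_layer=44, d_model=6144, n_head=64),  # ~20B params
}

def build_model(name: str) -> None:
    """Pretend to instantiate a model; no partitioning strategy needed."""
    cfg = GPT_CONFIGS[name]
    print(f"building {name}: {cfg} -- single device, no sharding")

build_model("gpt-neox")  # switching scale = changing one string
```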
Systems that perform very effectively with far fewer parameters have also emerged recently. Chinchilla is one such system: with just 70 billion parameters, it consistently outperforms the much larger GPT-3 and Gopher. Researchers will find that they can train and progressively build up complex models on the new Wafer Scale Engine-2 where others cannot, which makes Cerebras’ accomplishment all the more significant.
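For context, the Chinchilla finding (Hoffmann et al., 2022) is often summarized as a rule of thumb: compute-optimal training uses roughly 20 tokens per parameter. The sketch below applies that approximation, not the paper’s exact scaling law, to the models mentioned above.

```python
# Rule-of-thumb sketch of the Chinchilla finding (Hoffmann et al., 2022):
# compute-optimal training uses roughly 20 tokens per parameter, which is
# why a well-trained 70B model can beat far larger, under-trained ones.

TOKENS_PER_PARAM = 20  # approximate compute-optimal ratio, not an exact law

def compute_optimal_tokens(n_params: float) -> float:
    """Training tokens suggested by the ~20 tokens/parameter heuristic."""
    return TOKENS_PER_PARAM * n_params

for name, n_params in [("Chinchilla 70B", 70e9),
                       ("GPT-3 175B", 175e9),
                       ("Gopher 280B", 280e9)]:
    tokens = compute_optimal_tokens(n_params)
    print(f"{name}: ~{tokens / 1e12:.1f}T tokens to train compute-optimally")
```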
Cerebras’ ability to bring large language models to the masses with cost-efficient, easy access opens up an exciting new era in AI. It gives organizations that can’t spend tens of millions an easy and inexpensive on-ramp to major league NLP. It will be interesting to see the new applications and discoveries CS-2 customers make as they train GPT-3 and GPT-J class models on massive datasets.
— Dan Olds, Chief Research Officer, Intersect360 Research