The five technical challenges Cerebras overcame in building the first trillion-transistor chip

Superlatives abound at Cerebras, the until-today stealthy next-generation silicon chip company looking to make training a deep learning model as quick as buying toothpaste from Amazon. The company unveiled its Wafer Scale Engine today at the Hot Chips conference at Stanford: a single chip with more than a trillion transistors and hundreds of thousands of cores, to no small amount of awe from attendees.

You can read more about the chip from Tiernan Ray at Fortune and read the white paper from Cerebras itself. Superlatives aside though, the technical challenges that Cerebras had to overcome to reach this milestone are, I think, the more interesting story here.
I sat down with founder and CEO Andrew Feldman this afternoon to discuss what his 173 engineers have been building quietly just down the street here these past few years with $112 million in venture capital funding from Benchmark and others.

Going big means nothing but challenges

First, a quick background on how the chips that power your phones and computers get made.
Fabs like TSMC take standard-sized silicon wafers and divide them into individual chips by using light to etch the transistors into the chip. Wafers are circles and chips are squares, and so there is some basic geometry involved in subdividing that circle into a clean array of individual chips.

One big challenge in this lithography process is that errors can creep into the manufacturing process, requiring extensive testing to verify quality and forcing fabs to throw away poorly performing chips. The smaller and more compact the chip, the less likely any individual chip will be inoperative, and the higher the yield for the fab. Higher yield equals higher profits.

Cerebras throws out the idea of etching a bunch of individual chips onto a single wafer in favor of just using the whole wafer itself as one gigantic chip.
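To make the yield intuition above concrete, here is a back-of-the-envelope sketch using the textbook dies-per-wafer approximation and a simple Poisson defect model. The wafer size, die areas and defect density are illustrative assumptions of mine, not numbers from TSMC or Cerebras, and the model is deliberately crude.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Classic approximation: gross dies from wafer area, minus an edge-loss term."""
    d = wafer_diameter_mm
    gross = math.pi * (d / 2) ** 2 / die_area_mm2
    edge_loss = math.pi * d / math.sqrt(2 * die_area_mm2)
    return max(gross - edge_loss, 1)  # clamp: the approximation breaks down for a wafer-scale die

def poisson_yield(die_area_mm2, defects_per_mm2):
    """Simple Poisson defect model: probability a die has zero killer defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

# Illustrative inputs only: a 300 mm wafer and 0.1 defects per cm^2
# (0.001 per mm^2), a plausible ballpark for a mature process, not a
# published figure from TSMC or Cerebras.
WAFER_MM, D0 = 300.0, 0.001
for die_area in (100.0, 600.0, 46_000.0):  # small die, big GPU-class die, wafer-scale die
    n = dies_per_wafer(WAFER_MM, die_area)
    y = poisson_yield(die_area, D0)
    print(f"die area {die_area:>8.0f} mm^2: ~{n:5.0f} candidate dies, zero-defect yield ~{y:.1%}")
```

Under those assumptions a conventional die loses a modest share of candidates to defects, while a flawless wafer-scale die essentially never happens, which is exactly why the redundancy scheme described below matters.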
The chip's architecture and design were led by co-founder Sean Lie. Feldman and Lie worked together on a previous startup called SeaMicro, which sold to AMD in 2012 for $334 million.

The first challenge the team took on was handling communication across the scribe lines that normally separate individual chips on a wafer. Even though the end product is one giant chip, today's lithography equipment still has to act as if it is etching an array of individual chips into the silicon wafer.
So the company had to invent new techniques to allow each of those individual chips to communicate with each other across the whole wafer. Working with TSMC, they not only invented new channels for communication, but also had to write new software to handle chips with a trillion-plus transistors.

The second challenge was yield.
With a chip covering an entire silicon wafer, a single imperfection in the etching of that wafer could render the entire chip inoperative. This has been the blocker for decades on whole-wafer technology: due to the laws of physics, it is essentially impossible to etch a trillion transistors with perfect accuracy repeatedly.

Cerebras approached the problem using redundancy, adding extra cores throughout the chip that act as backups in case an error crops up in a neighboring core on the wafer. Leaving extra cores allows the chip to essentially self-heal, routing around the lithography error and making a whole-wafer silicon chip viable.
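Cerebras has not published the details of how its cores route around defects, but the general idea of presenting a clean logical grid on top of a physical grid that contains spares can be sketched in a few lines of Python. The grid dimensions, spare count per row and remapping policy here are purely illustrative assumptions, not the company's actual design.

```python
import random

# Hypothetical region of the wafer: a 12 x 12 physical grid of cores,
# of which each row keeps 2 as spares. (Real spare ratios and layout
# are Cerebras' own; these numbers are only for illustration.)
PHYS_ROWS, PHYS_COLS, SPARES_PER_ROW = 12, 12, 2
LOGICAL_COLS = PHYS_COLS - SPARES_PER_ROW

def build_core_map(defective):
    """Map each logical core (row, col) to a working physical core in the
    same row, skipping defective cores. Returns None if any row has more
    defects than spares, i.e. the region cannot self-heal."""
    mapping = {}
    for r in range(PHYS_ROWS):
        good_cols = [c for c in range(PHYS_COLS) if (r, c) not in defective]
        if len(good_cols) < LOGICAL_COLS:
            return None
        for logical_c, phys_c in zip(range(LOGICAL_COLS), good_cols):
            mapping[(r, logical_c)] = (r, phys_c)
    return mapping

# Sprinkle a few random lithography defects and check whether the region
# can still present a full logical grid to the software above it.
random.seed(0)
defects = {(random.randrange(PHYS_ROWS), random.randrange(PHYS_COLS)) for _ in range(3)}
core_map = build_core_map(defects)
print("defective cores:", sorted(defects))
print("self-healed" if core_map else "too many defects in one row")
```

The point of the sketch is only that a small fraction of spare cores, combined with a remapping table, lets the software see an intact grid even when lithography leaves a handful of dead cores behind.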
Scribe-line communication and yield were known problems, though, and Feldman said they were actually easier to solve than expected by re-approaching them with modern tools. He likens the challenge to climbing Mount Everest: no other chip designer had gotten past the scribe-line communication and yield challenges to find out what actually happened next.

The third challenge Cerebras confronted was handling thermal expansion.
Chips get extremely hot in operation, but different materials expand at different rates. That means the connectors tethering a chip to its motherboard also need to thermally expand at precisely the same rate, lest cracks develop between the two. To handle that mismatch, Cerebras had to invent a connecting material that could absorb some of the difference in expansion.
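The size of that mismatch is easy to estimate with the standard linear-expansion relation ΔL = αLΔT. The expansion coefficients below are textbook ballpark values for silicon and FR-4-style board material; the span and temperature swing are assumptions for illustration, not Cerebras' figures.

```python
# Linear thermal expansion: delta_L = alpha * L * delta_T
ALPHA_SILICON = 2.6e-6   # per deg C, typical textbook value
ALPHA_FR4_PCB = 14e-6    # per deg C, typical in-plane value for FR-4 board material

span_m = 0.21            # assume ~21 cm across a wafer-scale chip (illustrative)
delta_t = 50.0           # assume a 50 deg C swing from idle to full load (illustrative)

growth_si = ALPHA_SILICON * span_m * delta_t
growth_pcb = ALPHA_FR4_PCB * span_m * delta_t
mismatch_um = (growth_pcb - growth_si) * 1e6

print(f"silicon grows ~{growth_si * 1e6:.0f} um, board grows ~{growth_pcb * 1e6:.0f} um")
print(f"mismatch across the span: ~{mismatch_um:.0f} um")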
The fourth challenge comes after fabrication. Once a chip is manufactured, it needs to be tested and packaged for shipment to original equipment manufacturers (OEMs) who add the chips into the products used by end customers (whether data centers or consumer laptops). There is a challenge though: absolutely nothing on the market is designed to handle a whole-wafer chip, so Cerebras had to design its own testing and packaging equipment.
"That is the truth," Feldman said. "Nobody had a printed circuit board this size. Nobody had connectors. Nobody had a cold plate. Nobody had tools. Nobody had tools to align them. Nobody had tools to handle them."
The fifth and final challenge is power and cooling: packing that much processing power into one chip requires immense amounts of both. The chip draws a prodigious amount of power for a single piece of silicon, roughly comparable to a modern-sized AI cluster. All that power also needs to be cooled, and Cerebras had to design a new way to deliver both for such a large chip.

It essentially approached the problem from above: rather than trying to move power and cooling horizontally across the chip, as is traditional, power and cooling are delivered vertically at all points across the chip, giving every region even and consistent access to both.
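A crude resistance estimate shows why edge-fed, horizontal power delivery stops working at this scale. Every number below (total current, copper thickness, via-stack geometry) is a hypothetical round figure chosen only to illustrate the horizontal-versus-vertical argument, not a Cerebras specification.

```python
# Crude IR-drop comparison: feeding current across the chip horizontally
# versus dropping it in vertically from directly above.
RHO_COPPER = 1.7e-8         # ohm*m, resistivity of copper (standard value)

current_a = 1000.0          # assume ~1 kA of supply current at ~1 V logic (illustrative)
span_m = 0.21               # assume a ~21 cm wafer-scale die edge (illustrative)
plane_thickness_m = 70e-6   # assume a 70 um copper power plane (illustrative)

# Horizontal: current traverses the full span through a plane whose
# cross-section is (span * thickness).
r_horizontal = RHO_COPPER * span_m / (span_m * plane_thickness_m)
drop_horizontal = current_a * r_horizontal

# Vertical: current only crosses a short stack of vias/posts, say ~2 mm,
# spread over (assume) 10% of the die area.
stack_height_m = 2e-3
effective_area_m2 = 0.10 * span_m * span_m
r_vertical = RHO_COPPER * stack_height_m / effective_area_m2
drop_vertical = current_a * r_vertical

print(f"horizontal delivery: ~{drop_horizontal:.2f} V of IR drop")
print(f"vertical delivery:   ~{drop_vertical * 1e6:.1f} microvolts of IR drop")
```

Losing a sizable fraction of a roughly one-volt supply to resistance is untenable, while short vertical paths spread over the whole area are negligible, which is the intuition behind delivering power (and, analogously, cooling) from directly above.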
Those are the five challenges that the company has worked around the clock to solve these past few years.

From theory to reality

Cerebras has a demo chip (I saw one, and yes, it is roughly the size of my head), and it has started to deliver prototypes to customers, according to reports.
The big challenge, though, as with all new chips, is scaling production to meet customer demand.

For Cerebras, the situation is a bit unusual: because a single chip takes up an entire wafer, customers will not need to buy dozens or hundreds of chips and stitch them together to create a compute cluster. Instead, they may only need a handful of Cerebras chips for their deep-learning needs. The company packages the chip as part of a complete system that also includes its proprietary cooling technology.

Expect to hear more details of Cerebras' technology in the coming months, particularly
as the fight over the future of deep learning processing workflows continues to heat up.