Discussion:
44GB SRAM on a single chip
Scott Lurndal
2024-03-13 19:51:34 UTC
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.

https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
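For scale, those figures work out as below (quick back-of-envelope arithmetic
on the quoted specs; the per-core split is my division, not a vendor number):

cores = 900_000      # "just under a million"
sram  = 44e9         # 44 GB of SRAM, in bytes
bw    = 21e15        # 21 PB/s, bytes per second
power = 23e3         # 23 kW, in watts

print(f"SRAM per core:   {sram / cores / 1e3:.1f} kB")      # ~48.9 kB
print(f"BW per core:     {bw / cores / 1e9:.1f} GB/s")      # ~23.3 GB/s
print(f"full SRAM sweep: {sram / bw * 1e6:.1f} usec")       # ~2.1 usec
print(f"BW per watt:     {bw / power / 1e12:.2f} TB/s/W")   # ~0.91

Roughly 50 kB and 20-odd GB/s per core, and the whole 44 GB can be
streamed in about 2 usec at peak bandwidth.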
Chris M. Thomasson
2024-03-13 20:54:32 UTC
Post by Scott Lurndal
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.
https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
Humm... I wonder if these can be stacked on top of each other to make a
cylinder... A volumetric version of this chip? Moronic thought, or just
a little? ;^)
Michael S
2024-03-13 21:08:50 UTC
On Wed, 13 Mar 2024 19:51:34 GMT
Post by Scott Lurndal
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.
https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
You were talking about 1ns latency then.
The latency on a "chip" like that, corner to corner, would be measured in
microseconds, at best. Quite possibly over 10 usec.
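A crude sanity check, assuming the cores form a square 2D mesh; the
1 ns/hop router-plus-wire cost is my assumption, not a Cerebras figure:

import math

cores      = 900_000
side       = math.isqrt(cores)   # ~948 cores along one edge
hops       = 2 * (side - 1)      # Manhattan distance, corner to corner
ns_per_hop = 1.0                 # assumed per-hop router + wire cost

print(f"{side} cores per edge, {hops} hops")
print(f"~{hops * ns_per_hop / 1e3:.1f} usec corner to corner")

Already ~1.9 usec at 1 ns/hop; at a few ns per hop you are past 10 usec.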
MitchAlsup1
2024-03-13 21:37:06 UTC
Post by Michael S
On Wed, 13 Mar 2024 19:51:34 GMT
Post by Scott Lurndal
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.
https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
You were talking about 1ns latency then.
The latency on a "chip" like that, corner to corner, would be measured in
microseconds, at best. Quite possibly over 10 usec.
A MECL 3 chip could deliver 1 ns latency and 1 ns edge speed
in a 16-pin DIP.

An ECL 10K gate array with 1,000 gates of 0.5 ns each inside still took 4-5 ns
simply to go from an input pin on one side of the package to an output pin
on the other side with no logic being performed!! This is not a gate-speed
problem, but a wire-delay problem.
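The same wire-delay argument scales brutally to a wafer. With generic
propagation speeds (both velocities here are round-number assumptions,
not measurements of any particular process):

v_line = 15.0   # cm/ns, ~c/2 on an ideal transmission line
v_rc   = 1.0    # cm/ns, ~100 ps/mm for repeatered on-die global wire
edge   = 21.5   # cm, roughly the edge of a wafer-scale die

print(f"ideal line across the die: {edge / v_line:.1f} ns")   # ~1.4 ns
print(f"repeatered RC wire:        {edge / v_rc:.1f} ns")     # ~21.5 ns

Even the ideal number is over a nanosecond one-way, and realistic
repeatered wire is an order of magnitude slower, before any router or
gate does anything at all.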
MitchAlsup1
2024-03-13 21:33:09 UTC
Post by Scott Lurndal
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.
https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
CPUs were limited to 100 W thermal dissipation,
GPUs got up to 300 W thermal dissipation,
Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1

As to making racks of these things: the reason CRAYs were limited to 300
KVA is that this is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to
coordinate changing the grid, so the power company can prepare to ramp
its generating capacity up or down. It generally takes them 15-30
minutes to prepare for such a change in load.

That thermal limit on CRAYs was one reason their memory sizes were so
restricted early on.
Scott Lurndal
2024-03-13 22:30:40 UTC
Post by MitchAlsup1
Post by Scott Lurndal
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.
https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
CPUs were limited to 100 W thermal dissipation,
GPUs got up to 300 W thermal dissipation,
Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1
As to making racks of these things: the reason CRAYs were limited to 300
KVA is that this is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to
coordinate changing the grid, so the power company can prepare to ramp
its generating capacity up or down. It generally takes them 15-30
minutes to prepare for such a change in load.
Modern datacenters dissipate 16 to 20 kW per rack, with hundreds
or thousands of racks. Basically tens of megawatts per facility
with cooling factored in.
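Folding the cooling in via a PUE factor (the rack count and the 1.4 PUE
below are illustrative assumptions, not figures for any particular site):

kw_per_rack = 18     # midpoint of the 16-20 kW range
racks       = 1000   # "hundreds or thousands"
pue         = 1.4    # facility power / IT power, assumed

it_mw = kw_per_rack * racks / 1000
print(f"IT load:      {it_mw:.0f} MW")         # 18 MW
print(f"with cooling: {it_mw * pue:.0f} MW")   # ~25 MW

A thousand-rack hall is a 20-30 MW load once cooling is counted.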
MitchAlsup1
2024-03-14 00:28:41 UTC
Post by Scott Lurndal
Post by MitchAlsup1
Post by Scott Lurndal
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.
https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
CPUs were limited to 100 W thermal dissipation,
GPUs got up to 300 W thermal dissipation,
Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1
As to making racks of these things: the reason CRAYs were limited to 300
KVA is that this is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to
coordinate changing the grid, so the power company can prepare to ramp
its generating capacity up or down. It generally takes them 15-30
minutes to prepare for such a change in load.
Modern datacenters dissipate 16 to 20 kW per rack, with hundreds
or thousands of racks. Basically tens of megawatts per facility
with cooling factored in.
Yes, but racks (or motherboards within the racks) are power-cycled individually.
Scott Lurndal
2024-03-14 15:20:49 UTC
Post by MitchAlsup1
Post by Scott Lurndal
Post by MitchAlsup1
Post by Scott Lurndal
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.
https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
CPUs were limited to 100 W thermal dissipation,
GPUs got up to 300 W thermal dissipation,
Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1
As to making racks of these things: the reason CRAYs were limited to 300
KVA is that this is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to
coordinate changing the grid, so the power company can prepare to ramp
its generating capacity up or down. It generally takes them 15-30
minutes to prepare for such a change in load.
Modern datacenters dissipate 16 to 20 kW per rack, with hundreds
or thousands of racks. Basically tens of megawatts per facility
with cooling factored in.
Yes, but racks (or motherboards within the racks) are power-cycled individually.
Sometimes, perhaps even usually. Unless the datacenter loses power
completely (which does happen, if rarely).
MitchAlsup1
2024-03-14 16:24:25 UTC
Post by Scott Lurndal
Post by MitchAlsup1
Post by Scott Lurndal
Post by MitchAlsup1
Post by Scott Lurndal
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.
https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
CPUs were limited to 100 W thermal dissipation,
GPUs got up to 300 W thermal dissipation,
Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1
As to making racks of these things: the reason CRAYs were limited to 300
KVA is that this is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to
coordinate changing the grid, so the power company can prepare to ramp
its generating capacity up or down. It generally takes them 15-30
minutes to prepare for such a change in load.
Modern datacenters dissipate 16 to 20 kW per rack, with hundreds
or thousands of racks. Basically tens of megawatts per facility
with cooling factored in.
Yes, but racks (or motherboards within the racks) are power-cycled individually.
Sometimes, perhaps even usually. Unless the datacenter loses power
completely (which does happen, if rarely).
When the entire data center loses power, its 300 KVA+ is the least of the
power company's worries.
Scott Lurndal
2024-03-14 17:11:50 UTC
Post by MitchAlsup1
Post by Scott Lurndal
Post by MitchAlsup1
Post by Scott Lurndal
Post by MitchAlsup1
Post by Scott Lurndal
Viz. earlier discussions related to memory speeds,
here's a single chip (wafer-sized) with just under
a million cores and 44GB of SRAM. Four trillion
transistors. 21PB/s memory bandwidth. 23kW.
https://www.theregister.com/2024/03/13/cerebras_claims_to_have_revived/
CPUs were limited to 100 W thermal dissipation,
GPUs got up to 300 W thermal dissipation,
Now you are looking at the thermal dissipation of 10% of an ECL CRAY-1
As to making racks of these things: the reason CRAYs were limited to 300
KVA is that this is the largest load a non-governmental electrical
consumer can turn on or off without calling the power company to
coordinate changing the grid, so the power company can prepare to ramp
its generating capacity up or down. It generally takes them 15-30
minutes to prepare for such a change in load.
Modern datacenters dissipate 16 to 20 kW per rack, with hundreds
or thousands of racks. Basically tens of megawatts per facility
with cooling factored in.
Yes, but racks (or motherboards within the racks) are power-cycled individually.
Sometimes, perhaps even usually. Unless the datacenter loses power
completely (which does happen, if rarely).
When the entire data center loses power, its 300 KVA+ is the least of the
power company's worries.
Not necessarily - it could be one of the lines to the DC that failed.

IBM's Almaden Research Center has two 115 kV transmission lines feeding
the center, from two different circuits. A good friend was the
facilities manager there and has several stories about the effects of
outages and switchovers, mainly when on-site switching or circuit
breaker gear failed.

I could see the pylons from a former residence and saw one of the
transmission lines fail during a storm - the transmission line
made a 90-degree bend to head uphill to the labs; as usual
for such a severe bend, they terminated the lines from both
directions at the pylon and connected them with stubs; one
of the stubs had failed and was swinging in the wind - every
time it hit the pylon there was a visible (and audible!)
explosion. It lasted for a couple of hours before they were
able to kill the circuit.
