Memory Bottlenecks: Overcoming a Common AI Problem
[Figure] Graphcore’s comparison of capacity and bandwidth for different memory technologies. While others try to solve both with HBM2E, Graphcore uses a combination of host DDR memory plus on-chip SRAM on its Colossus Mk2 AI accelerator chip. (Source: Graphcore)

[Figure] Graphcore’s cost analysis for HBM2 versus DDR4 memory has the former costing 10× more than the latter. (Source: Graphcore)


chip uses two HBM2 stacks, providing 512-GB/s bandwidth connected through an on-chip network.

PERFORMANCE PER DOLLAR
While HBM offers extreme bandwidth for the off-chip memory needed for data center AI accelerators, a few notable holdouts remain.

Graphcore is among them. During his Hot Chips presentation, Graphcore CTO Simon Knowles noted that faster computation in large AI models requires both memory capacity and memory bandwidth. While others use HBM to boost both capacity and bandwidth, the tradeoffs include HBM’s cost, power consumption, and thermal limitations.

Graphcore’s second-generation intelligence processing unit (IPU) instead uses its large 896 MiB of on-chip SRAM to supply the memory bandwidth required to feed its 1,472 processor cores. That’s enough bandwidth to avoid offloading to DRAM, Knowles said. For memory capacity, AI models too big to fit on-chip use low-bandwidth remote DRAM in the form of server-class DDR attached to the host processor, an arrangement that also allows mid-sized models to be spread over the SRAM of a cluster of IPUs.
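As a rough illustration of that split — fast on-chip SRAM for bandwidth, host DDR for capacity — a minimal placement sketch might look like the following. The threshold logic and the plan_placement helper are hypothetical illustrations for this article, not Graphcore’s actual software stack.

```python
# Hypothetical sketch of the capacity/bandwidth split Knowles describes:
# keep models that fit in fast on-chip SRAM (or a cluster's worth of SRAM)
# on-chip, and stage anything bigger in slower host DDR.
# The decision logic is illustrative only, not Graphcore's tooling.

SRAM_PER_IPU_BYTES = 896 * 2**20   # 896 MiB of on-chip SRAM per IPU


def plan_placement(model_bytes: int, cluster_size: int = 1) -> str:
    """Decide where a model's parameters live, in the spirit of the
    article: SRAM for bandwidth, host DDR for capacity."""
    if model_bytes <= SRAM_PER_IPU_BYTES:
        return "fits in one IPU's SRAM (full on-chip bandwidth)"
    if model_bytes <= cluster_size * SRAM_PER_IPU_BYTES:
        return f"spread over the SRAM of a {cluster_size}-IPU cluster"
    return "too big for the cluster's SRAM; stage in host DDR"


if __name__ == "__main__":
    for size_mib in (500, 3_000, 40_000):
        print(size_mib, "MiB ->", plan_placement(size_mib * 2**20, cluster_size=4))
```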
Given that the company promotes its IPU on a performance-per-dollar basis, Graphcore’s primary reason to reject HBM appears to be cost.

“The net cost of HBM integrated with an AI processor is greater than 10× the cost of server-class DDR per byte,” Knowles said. “Even at modest capacity, HBM dominates the processor module cost. If an AI computer can use DDR instead, it can deploy more AI processors for the same total cost of ownership.”

According to Knowles, 40 GB of HBM effectively triples the cost of a packaged reticle-sized processor. Graphcore’s cost breakdown of 8 GB of HBM2 versus 8 GB of DDR4 reckons that the HBM die is double the size of a DDR4 die (comparing a 20-nm HBM with an 18-nm DDR4, which Knowles argued are contemporaries), thereby increasing manufacturing costs. On top of that come the costs of TSV etching, stacking, assembly, and packaging, along with the profit margins of both the memory maker and the processor maker.

“This margin stacking does not occur for the DDR DIMM, because the user can source that directly from the memory manufacturer,” Knowles said. “In fact, a primary reason for the emergence of a pluggable ecosystem of computer components is to avoid margin stacking.”
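Knowles’s argument is easy to make concrete with back-of-the-envelope arithmetic. In the sketch below, every dollar figure and margin is an invented placeholder chosen only to show how the multipliers compound; none of these numbers comes from Graphcore’s actual breakdown.

```python
# Back-of-the-envelope model of "margin stacking," using made-up inputs.
# Only the structure (adders and margins compounding) reflects the article;
# every figure below is an illustrative assumption, not Graphcore data.

DDR4_COST_PER_GB = 1.0             # normalized baseline: DDR4 sourced directly

die_cost = 2.0 * DDR4_COST_PER_GB  # HBM die roughly 2x the area of a DDR4 die
tsv_and_stacking = 1.0             # assumed adder: TSV etch, stacking, test
assembly_and_packaging = 1.0       # assumed adder: interposer and packaging

memory_maker_margin = 1.6          # assumed margin on the finished HBM stack
processor_maker_margin = 1.5       # assumed margin re-applied at module level

hbm_cost_per_gb = die_cost + tsv_and_stacking + assembly_and_packaging
hbm_cost_per_gb *= memory_maker_margin * processor_maker_margin

print(f"HBM2 vs. DDR4, cost per byte: {hbm_cost_per_gb / DDR4_COST_PER_GB:.1f}x")
# ~9.6x with these placeholders; Knowles's own breakdown put it above 10x.
```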
GOING WIDER
Emerging from stealth mode at Hot Chips, Esperanto Technologies offered yet another take on the memory bottleneck problem. The company’s 1,000-core RISC-V AI accelerator targets hyperscaler recommendation-model inference rather than the AI training workloads mentioned above.

Dave Ditzel, Esperanto’s founder and executive chairman, noted that data center inference does not require huge on-chip memory. “Our customers did not want 250 MB on-chip,” Ditzel said. “They wanted 100 MB — all the things they wanted to do with inference fit into 100 MB. Anything bigger than that will need a lot more.”

Ditzel added that customers prefer large amounts of DRAM on the same card as the processor, not on-chip. “They advised us, ‘Just get everything onto the card once, and then use your fast interfaces. Then, as long as you can get to 100 GB of memory faster than you can get to it over the PCIe bus, it’s a win.’”
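Ditzel’s rule of thumb is straightforward to sanity-check. The sketch below compares streaming a 100-GB working set over a PCIe Gen4 x16 link against reading it from on-card DRAM; the on-card bandwidth figure is an assumed placeholder for illustration, not an Esperanto specification.

```python
# Sanity check of "get to 100 GB of memory faster than over PCIe."
# PCIe Gen4 x16 peaks near 32 GB/s; the on-card DRAM figure below is an
# assumed placeholder, not an Esperanto specification.

WORKING_SET_GB = 100

PCIE_GEN4_X16_GBPS = 32    # theoretical peak, before protocol overhead
ON_CARD_DRAM_GBPS = 200    # assumption: aggregate on-card DRAM bandwidth

t_pcie = WORKING_SET_GB / PCIE_GEN4_X16_GBPS
t_card = WORKING_SET_GB / ON_CARD_DRAM_GBPS

print(f"over PCIe:    {t_pcie:.2f} s per pass")
print(f"on-card DRAM: {t_card:.2f} s per pass")
# With these assumptions, the on-card path is ~6x faster -- the "win"
# Ditzel describes for keeping the whole model on the card.
```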
Comparing Esperanto’s approach with those of other data center inference accelerators, Ditzel said that others focus on a single giant processor consuming the entire power budget. Esperanto’s approach — multiple low-power processors mounted on dual M.2 accelerator cards — better enables the use of off-chip memory, the startup insists. Single-chip
