
Memory Technologies Confront Edge AI’s Diverse Challenges


technology that places multiple memory die in the same package as the GPU itself.

Both are designed for the extremely high memory bandwidth required by AI applications.

For the most demanding AI model training, HBM2E offers 3.6 Gbps and provides a memory bandwidth of 460 GB/s (two HBM2E stacks provide close to 1 TB/s). That’s among the highest-performance memory available, in the smallest area and with the lowest power consumption. HBM is used by GPU leader Nvidia in all of its data center products.

GDDR6 is also used for AI inference applications at the edge, said Frank Ferro, senior director of product marketing for IP Cores at Rambus. Ferro said that GDDR6 can meet the speed, cost, and power requirements of edge AI inference systems. For instance, GDDR6 can deliver 18 Gbps and provides 72 GB/s; having four GDDR6 DRAMs provides close to 300 GB/s of memory bandwidth.

Rambus’s Frank Ferro

“GDDR6 is used for AI inference and ADAS applications,” he added.

When comparing GDDR6 with LPDDR, the low-power DDR version that has been Nvidia’s approach for most non-data-center edge solutions from the Jetson AGX Xavier to the Jetson Nano, Ferro acknowledged that LPDDR is suited to low-cost AI inference at the edge or endpoint.

“The bandwidth of LPDDR is limited to 4.2 Gbps for LPDDR4 and 6.4 Gbps for LPDDR5,” he said. “As the memory bandwidth demands go up, we will see an increasing number of designs using GDDR6. This memory bandwidth gap is helping to drive demand for GDDR6.”
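The bandwidth figures quoted above follow directly from per-pin data rate multiplied by interface width. The short sketch below reproduces them, assuming the usual interface widths for these DRAM types (1,024 bits per HBM2E stack and 32 bits per GDDR6 or LPDDR device); those widths are not stated in the article.

```python
def bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth = per-pin data rate (Gbps) x interface width (bits) / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8

# Interface widths below are assumptions (standard for these DRAM types, not given in the article):
# HBM2E: 1,024 bits per stack; GDDR6 and LPDDR4/5: 32 bits per device or channel.
print(bandwidth_gb_s(3.6, 1024))  # HBM2E stack: 460.8 GB/s (~460 GB/s; two stacks -> close to 1 TB/s)
print(bandwidth_gb_s(18, 32))     # GDDR6 device: 72.0 GB/s (four devices -> close to 300 GB/s)
print(bandwidth_gb_s(4.2, 32))    # LPDDR4: 16.8 GB/s per x32 channel
print(bandwidth_gb_s(6.4, 32))    # LPDDR5: 25.6 GB/s per x32 channel
```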
Though GDDR was designed to fit alongside GPUs, other processing accelerators can take advantage of its bandwidth. Ferro highlighted the Achronix Speedster7t, an FPGA-based AI accelerator used for inference and some low-end training.

“There is room for both HBM and GDDR memories in edge AI applications,” said Ferro. HBM “will continue to be used in edge applications. For all of the advantages of HBM, the cost is still high due to the 3D technology and 2.5D manufacturing. Given this, GDDR6 is a good tradeoff between cost and performance, especially for AI inference in the network.”

HBM is used in high-performance data center AI ASICs such as the Graphcore IPU. While HBM offers stellar performance, its price tag can be steep for some applications.

Edge AI application-specific demands may include size, power consumption, low-voltage operation, reliability, and cost.

Qualcomm’s Cloud AI 100, for example, targets AI inference acceleration in edge data centers, 5G “edge boxes,” ADAS/autonomous driving, and 5G infrastructure. “It was important for us to use standard DRAM as opposed to something like HBM, because we want to keep the bill of materials down,” said Keith Kressin, general manager of Qualcomm’s Computing and Edge Cloud unit.

“We wanted to use standard components that you can buy from multiple suppliers,” said Kressin. “We have customers who want to do everything on-chip, and we have customers that want to go cross-card. But they all wanted to keep the cost reasonable and not go for HBM or even a more exotic memory. In training, you have really big models that would go across [multiple chips], but for inference [the Cloud AI 100’s market], a lot of the models are more localized.”

THE FAR EDGE
Outside the data center, edge AI systems generally focus on inference, with a few notable exceptions such as federated learning and other incremental training techniques.

Some AI accelerators for power-sensitive applications use the memory itself for AI processing. Inference, which is based on multidimensional matrix multiplication, lends itself to analog compute techniques in which an array of memory cells is used to perform the calculations. Using this technique, Syntiant’s devices are designed for voice control of consumer electronics, and Gyrfalcon’s devices have been designed into a smartphone, where they handle inference for camera effects.

In another example, Mythic uses analog operation of flash memory cells to store an 8-bit integer value (one weight parameter) on a single flash transistor, making its approach much denser than other compute-in-memory technologies. The programmed flash transistor functions as a variable resistor; inputs are supplied as voltages and outputs are collected as currents. Combined with ADCs and DACs, the result is an efficient matrix-multiply engine. Mythic’s IP resides in the compensation and calibration techniques that cancel out noise and allow reliable 8-bit computation.

Mythic uses an array of flash memory transistors to make dense multiply-accumulate engines. (Source: Mythic)
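As a rough illustration of how such a flash array performs a multiply-accumulate, the toy model below treats each cell as a programmable conductance, applies inputs as voltages, and sums the resulting currents on a shared bit line. It is an idealized numerical sketch only: the conductance and voltage scalings are invented for illustration, and no noise, calibration, or ADC quantization is modeled, so it should not be read as Mythic’s actual design.

```python
import numpy as np

# Toy model of one compute-in-memory column: each flash cell is programmed to a
# conductance proportional to an 8-bit weight, inputs arrive as DAC voltages, and
# the column output is the summed current (Ohm's law + Kirchhoff), read by an ADC.

rng = np.random.default_rng(0)
weights = rng.integers(-128, 128, size=64)           # signed 8-bit weights, one per cell
g_unit = 1e-6                                        # siemens per weight LSB (illustrative value)
conductances = weights * g_unit                      # programmed cell conductances

activations = rng.integers(0, 256, size=64)          # 8-bit input activations
v_unit = 1e-3                                        # volts per activation LSB (illustrative value)
voltages = activations * v_unit                      # DAC outputs applied to the rows

column_current = np.sum(conductances * voltages)     # analog summation on the bit line
digital_result = column_current / (g_unit * v_unit)  # ideal ADC scaling back to integer units

# The idealized analog result matches a digital multiply-accumulate of the same values.
assert np.isclose(digital_result, np.dot(weights, activations))
```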
Aside from compute-in-memory devices, ASICs are popular for specific edge niches, particularly for low- and ultra-low-power systems. Memory systems for ASICs use a combination of memory types. Distributed local SRAM is the fastest and most power-efficient but not very area-efficient. Having a single bulk SRAM on the chip is more area-efficient but introduces performance bottlenecks. Off-chip DRAM is cheaper but uses much more power.

Geoff Tate, CEO of Flex Logix, said that finding the right balance among distributed SRAM, bulk SRAM, and off-chip DRAM for its InferX X1 required a range of performance simulations. The aim was to maximize inference throughput per dollar, a function of die size, package cost, and the number of DRAMs used.

“The optimal point was a single ×32 LPDDR4 DRAM, 4K MACs (7.5 TOPS at 933 MHz), and around 10 MB of SRAM,” he said. “SRAM is fast, but it is expensive versus DRAM. Using TSMC’s 16-nm process technology, 1 MB of SRAM takes about 1.1 mm². Our InferX X1 is just 54 mm², and due to our architecture, DRAM accesses are largely overlapped with computation, so there is no performance compromise. For large models, having a single DRAM is the right tradeoff, at least with our architecture.”
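Tate’s figures hang together arithmetically. The quick check below reproduces them, assuming the common convention of counting each multiply-accumulate as two operations (a convention the article does not state).

```python
# Back-of-the-envelope check of the InferX X1 figures quoted above.
macs = 4096           # "4K MACs"
clock_hz = 933e6      # 933 MHz
ops_per_mac = 2       # assumption: one multiply plus one add per MAC per cycle
peak_tops = macs * clock_hz * ops_per_mac / 1e12
print(f"Peak throughput: {peak_tops:.1f} TOPS")  # ~7.6 TOPS, in line with the quoted 7.5 TOPS

sram_mb = 10          # ~10 MB of on-chip SRAM
mm2_per_mb = 1.1      # TSMC 16-nm SRAM density, per Tate
print(f"SRAM area: {sram_mb * mm2_per_mb:.0f} mm^2 of the 54-mm^2 die")  # ~11 mm^2
```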
The Flex Logix chip will be used in edge AI inference applications that require real-time operation, including analyzing streaming video with low latency. This includes ADAS systems, analysis of security footage, medical imaging, and quality assurance/inspection applications.

What kind of DRAM will go alongside the InferX X1 in these applications?
