Memory Technologies Confront Edge AI’s Diverse Challenges
technology that places multiple memory die in the same package as the GPU itself. Both are designed for the extremely high memory bandwidth required by AI applications.

For the most demanding AI model training, HBM2E offers 3.6 Gbps and provides a memory bandwidth of 460 GB/s per stack (two HBM2E stacks provide close to 1 TB/s). That’s among the highest-performance memory available, in the smallest area and with the lowest power consumption. HBM is used by GPU leader Nvidia in all of its data center products.

GDDR6 is also used for AI inference applications at the edge, said Frank Ferro, senior director of product marketing for IP Cores at Rambus. Ferro said that GDDR6 can meet the speed, cost, and power requirements of edge AI inference systems. For instance, GDDR6 can deliver 18 Gbps and provide 72 GB/s per device; having four GDDR6 DRAMs provides close to 300 GB/s of memory bandwidth.

Rambus’s Frank Ferro
“GDDR6 is used for AI inference and ADAS applications,” he added.

When comparing GDDR6 with LPDDR, the low-power DDR version that has been Nvidia’s approach for most non-data-center edge solutions from the Jetson AGX Xavier to the Jetson Nano, Ferro acknowledged that LPDDR is suited to low-cost AI inference at the edge or endpoint. “The bandwidth of LPDDR is limited to 4.2 Gbps for LPDDR4 and 6.4 Gbps for LPDDR5,” he said. “As the memory bandwidth demands go up, we will see an increasing number of designs using GDDR6. This memory bandwidth gap is helping to drive demand for GDDR6.”
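The bandwidth figures Ferro cites follow from simple peak-bandwidth arithmetic: per-pin data rate multiplied by interface width, divided by 8 bits per byte. The short Python sketch below reproduces the numbers quoted above, assuming standard interface widths (1,024 bits per HBM2E stack, ×32 per GDDR6 or LPDDR device); those widths are assumed here rather than stated in the article.

# Peak-bandwidth arithmetic behind the figures quoted above.
# Interface widths are assumed typical configurations, not article data.
def peak_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin rate (Gbps) x bus width (bits) / 8."""
    return data_rate_gbps * bus_width_bits / 8

hbm2e_stack  = peak_bandwidth_gb_s(3.6, 1024)  # ~460 GB/s per stack
gddr6_device = peak_bandwidth_gb_s(18, 32)     # 72 GB/s per x32 device
lpddr4_x32   = peak_bandwidth_gb_s(4.2, 32)    # ~16.8 GB/s per x32 device
lpddr5_x32   = peak_bandwidth_gb_s(6.4, 32)    # ~25.6 GB/s per x32 device

print(f"HBM2E, two stacks: {2 * hbm2e_stack:.0f} GB/s (close to 1 TB/s)")
print(f"GDDR6, four DRAMs: {4 * gddr6_device:.0f} GB/s (close to 300 GB/s)")
print(f"LPDDR4 vs. LPDDR5: {lpddr4_x32:.1f} vs. {lpddr5_x32:.1f} GB/s per device")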
Though GDDR was designed to fit alongside GPUs, other processing accelerators can take advantage of its bandwidth. Ferro highlighted the Achronix Speedster7t, an FPGA-based AI accelerator used for inference and some low-end training.

“There is room for both HBM and GDDR memories in edge AI applications,” said Ferro. HBM “will continue to be used in edge applications. For all of the advantages of HBM, the cost is still high due to the 3D technology and 2.5D manufacturing. Given this, GDDR6 is a good tradeoff between cost and performance, especially for AI inference in the network.”

HBM is used in high-performance data center AI ASICs such as the Graphcore IPU. While HBM offers stellar performance, its price tag can be steep for some applications.
Edge AI application-specific demands may include size, power consumption, low-voltage operation, reliability, and cost.

Qualcomm’s Cloud AI 100, for example, targets AI inference acceleration in edge data centers, 5G “edge boxes,” ADAS/autonomous driving, and 5G infrastructure. “It was important for us to use standard DRAM as opposed to something like HBM, because we want to keep the bill of materials down,” said Keith Kressin, general manager of Qualcomm’s Computing and Edge Cloud unit.

“We wanted to use standard components that you can buy from multiple suppliers,” said Kressin. “We have customers who want to do everything on-chip, and we have customers that want to go cross-card. But they all wanted to keep the cost reasonable and not go for HBM or even a more exotic memory. In training, you have really big models that would go across [multiple chips], but for inference [the Cloud AI 100’s market], a lot of the models are more localized.”

THE FAR EDGE
Outside the data center, edge AI systems generally focus on inference, with a few notable exceptions such as federated learning and other incremental training techniques.

Some AI accelerators for power-sensitive applications use memory for AI processing. Inference, which is based on multidimensional matrix multiplication, lends itself to analog compute techniques with an array of memory cells used to perform calculations. Using this technique, Syntiant’s devices are designed for voice control of consumer electronics, and Gyrfalcon’s devices have been designed into a smartphone, where they handle inference for camera effects.

In another example, Mythic uses analog operation of flash memory cells to store an 8-bit integer value (one weight parameter) on a single flash transistor, making it much denser than other compute-in-memory technologies. The programmed flash transistor functions as a variable resistor; inputs are supplied as voltages and outputs collected as currents. Combined with ADCs and DACs, the result is an efficient matrix-multiply engine.

Mythic’s IP resides in the compensation and calibration techniques that cancel out noise and allow reliable 8-bit computation.

Mythic uses an array of flash memory transistors to make dense multiply-accumulate engines. (Source: Mythic)
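The flash-based matrix multiply described above can be pictured numerically: each stored weight behaves as a conductance, input activations arrive as voltages, and the current summed on each column is the multiply-accumulate result, which an ADC then digitizes. The sketch below is an idealized illustration of that principle only; the scaling, quantization, and calibration details are assumptions, not Mythic’s implementation.

import numpy as np

# Idealized model of an analog in-memory multiply-accumulate (MAC) array:
# weights stored as conductances, inputs applied as voltages (via a DAC),
# column currents summed per Kirchhoff's law and digitized by an ADC.
# All values and unit scalings are assumptions for illustration.
rng = np.random.default_rng(0)

weights_int8 = rng.integers(-128, 128, size=(4, 8))    # one 8-bit weight per flash cell
conductances = weights_int8.astype(float)              # ideal: conductance proportional to weight

inputs_int8 = rng.integers(0, 256, size=8)             # 8-bit input activations
voltages = inputs_int8.astype(float)                   # ideal DAC: voltage proportional to input

column_currents = conductances @ voltages              # I = G*V per cell, summed per column
mac_result = np.rint(column_currents).astype(np.int64) # ideal ADC readout

assert np.array_equal(mac_result, weights_int8 @ inputs_int8)
print(mac_result)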
Aside from compute-in-memory devices, ASICs are popular for specific edge niches, particularly for low- and ultra-low-power systems. Memory systems for ASICs use a combination of memory types. Distributed local SRAM is the fastest and most power-efficient but not very area-efficient. Having a single bulk SRAM on the chip is more area-efficient but introduces performance bottlenecks. Off-chip DRAM is cheaper but uses much more power.

Geoff Tate, CEO of Flex Logix, said that finding the right balance among distributed SRAM, bulk SRAM, and off-chip DRAM for its InferX X1 required a range of performance simulations. The aim was to maximize inference throughput per dollar, a function of die size, package cost, and number of DRAMs used.

“The optimal point was a single ×32 LPDDR4 DRAM, 4K MACs (7.5 TOPS at 933 MHz), and around 10 MB of SRAM,” he said. “SRAM is fast, but it is expensive versus DRAM. Using TSMC’s 16-nm process technology, 1 MB of SRAM takes about 1.1 mm². Our InferX X1 is just 54 mm², and due to our architecture, DRAM accesses are largely overlapped with computation, so there is no performance compromise. For large models, having a single DRAM is the right tradeoff, at least with our architecture.”
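Tate’s “throughput per dollar” objective can be made concrete with back-of-the-envelope arithmetic: at about 1.1 mm² per MB in TSMC’s 16-nm process, the roughly 10 MB of on-chip SRAM accounts for about 11 mm² of the 54-mm² die, and every additional DRAM adds package and bill-of-materials cost. The sketch below only shows the shape of that tradeoff; the dollar figures and the simple cost model are placeholder assumptions, not Flex Logix data.

# Back-of-the-envelope view of the "inference throughput per dollar" tradeoff.
# The 1.1 mm^2/MB SRAM density and 54 mm^2 die size come from the article;
# every dollar figure and the cost model itself are placeholder assumptions.
SRAM_MM2_PER_MB = 1.1
DIE_MM2 = 54.0
SRAM_MB = 10

sram_area = SRAM_MB * SRAM_MM2_PER_MB
print(f"{SRAM_MB} MB of SRAM ~ {sram_area:.1f} mm^2 "
      f"({100 * sram_area / DIE_MM2:.0f}% of a {DIE_MM2:.0f}-mm^2 die)")

def unit_cost(die_mm2: float, n_drams: int,
              cost_per_mm2: float = 0.10,    # assumed silicon cost, $/mm^2
              package_base: float = 1.00,    # assumed base package cost, $
              cost_per_dram: float = 2.00):  # assumed cost per LPDDR4 DRAM, $
    """Toy cost model: die cost plus a package cost that grows with DRAM count."""
    return die_mm2 * cost_per_mm2 + package_base + n_drams * cost_per_dram

throughput_tops = 7.5  # 4K MACs at 933 MHz, per the article
print(f"~{throughput_tops / unit_cost(DIE_MM2, n_drams=1):.2f} TOPS per dollar "
      f"under these assumptions")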
The Flex Logix chip will be used in edge AI inference applications that require real-time operation, including analyzing streaming video with low latency. This includes ADAS systems, analysis of security footage, medical imaging, and quality assurance/inspection applications.

What kind of DRAM will go alongside the InferX X1 in these applications?