
Tesla AI Day: What to Expect for the Future of Self-Driving Cars


The training tile packaging includes multiple layers for power and control, current distribution, the compute plane (25 D1 chips), and the cooling system. The training tile is for use in IT centers, not in autonomous vehicles.

The training tile provides 25× the performance of a single D1 chip, or up to 9 petaflops for 16-bit floating-point calculations and up to 565 teraflops for 32-bit floating-point calculations.

Twelve training tiles in a 2 × 3 × 2 configuration can be packed into a cabinet, which Tesla calls a training matrix.
           EXAPOD
The largest system that Tesla described is the ExaPOD. It is built from 120 training tiles, which add up to 3,000 D1 chips and 1.062 million training nodes. It fits in 10 cabinets and is clearly intended for IT center use.

Maximum performance of the ExaPOD is 1.09 exaflops for 16-bit floating-point calculations and 67.8 petaflops for 32-bit floating-point calculations.
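These figures are mutually consistent, and a short sanity check ties them together. The per-D1-chip numbers used below (362 BF16 teraflops, 22.6 FP32 teraflops, 354 training nodes) come from Tesla's AI Day slides rather than from this section, so treat the script as an illustration of the scaling arithmetic, not as anything Tesla published:

```python
# Sanity check of Tesla's published Dojo scaling numbers.
# Per-chip figures are from Tesla's AI Day slides (assumed here).
D1_BF16_TFLOPS = 362      # 16-bit (BF16/CFP8) throughput per D1 chip
D1_FP32_TFLOPS = 22.6     # 32-bit throughput per D1 chip
D1_NODES = 354            # training nodes per D1 chip

CHIPS_PER_TILE = 25       # a training tile is a 5 x 5 array of D1 chips
TILES_PER_CABINET = 12    # the 2 x 3 x 2 "training matrix"
TILES_PER_EXAPOD = 120

tile_bf16_pf = CHIPS_PER_TILE * D1_BF16_TFLOPS / 1e3
tile_fp32_tf = CHIPS_PER_TILE * D1_FP32_TFLOPS
exapod_chips = TILES_PER_EXAPOD * CHIPS_PER_TILE
exapod_nodes = exapod_chips * D1_NODES
exapod_bf16_ef = TILES_PER_EXAPOD * tile_bf16_pf / 1e3
exapod_fp32_pf = TILES_PER_EXAPOD * tile_fp32_tf / 1e3
cabinets = TILES_PER_EXAPOD // TILES_PER_CABINET

print(f"tile:   {tile_bf16_pf:.2f} PF BF16, {tile_fp32_tf:.0f} TF FP32")
print(f"ExaPOD: {exapod_chips} chips, {exapod_nodes:,} nodes, {cabinets} cabinets")
print(f"        {exapod_bf16_ef:.2f} EF BF16, {exapod_fp32_pf:.1f} PF FP32")
# tile:   9.05 PF BF16, 565 TF FP32
# ExaPOD: 3000 chips, 1,062,000 nodes, 10 cabinets
#         1.09 EF BF16, 67.8 PF FP32
```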
           DOJO SOFTWARE & DPU
The Dojo software is designed to support training of both large and small neural networks. Tesla has a compiler that creates software code to leverage the structure and capabilities of the training nodes, D1 chips, training tiles, and ExaPOD systems. It uses the PyTorch open-source machine-learning library with extensions that leverage the D1 chip and the Dojo system architecture.
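Tesla has not published the Dojo/PyTorch extension API, but the stated division of labor is that training scripts stay in ordinary PyTorch while the Dojo backend handles placement and code generation. A minimal sketch of that idea (the "dojo" device string is hypothetical, and the snippet falls back to CPU so it remains runnable):

```python
import torch

# An ordinary PyTorch model; nothing Dojo-specific in the definition.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1000),
)

# With Tesla's (unpublished) extension, something like model.to("dojo")
# would hand the network to the Dojo compiler for mapping onto training
# nodes, D1 chips, and tiles. CPU is used here so the sketch runs anywhere.
device = "cpu"
model = model.to(device)

x = torch.randn(32, 1024, device=device)
loss = model(x).sum()
loss.backward()  # autograd is unchanged; the backend decides placement
```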
These capabilities allow big neural networks to be partitioned and mapped to extract model, graph, and data parallelism, which speeds up large-network training. The compiler uses multiple techniques to extract parallelism: it can transform networks to achieve fine-grained parallelism using data-, model-, and graph-parallelism techniques, and it can optimize to reduce memory footprints.
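To make the parallelism vocabulary concrete, the toy example below shows two of the partitioning axes such a compiler can exploit, with plain CPU tensors standing in for training nodes. It illustrates the general techniques, not Tesla's actual compiler output:

```python
import torch

batch = torch.randn(8, 16)
weight = torch.randn(16, 32)
reference = batch @ weight  # the unpartitioned computation

# Data parallelism: split the batch, replicate the weights.
# Each shard could run on a different training node.
shard_a, shard_b = torch.chunk(batch, 2, dim=0)
out_data = torch.cat([shard_a @ weight, shard_b @ weight], dim=0)

# Model parallelism: split the weights, replicate the batch.
# Each node holds a column slice of the layer and produces part of the output.
w_left, w_right = torch.chunk(weight, 2, dim=1)
out_model = torch.cat([batch @ w_left, batch @ w_right], dim=1)

# Both partitionings reproduce the unpartitioned result.
assert torch.allclose(out_data, reference, atol=1e-5)
assert torch.allclose(out_model, reference, atol=1e-5)
```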
The Dojo interface processors are used to communicate with host computers in IT and data centers. They connect to host computers over PCIe 4.0 and to the D1-based systems via the high-bandwidth links explained above. The interface processors also provide high-bandwidth shared DRAM for the D1 systems.

D1-based systems can be subdivided and partitioned into units called Dojo Processing Units (DPUs). A DPU consists of one or more D1 chips, an interface processor, and one or more host computers. The DPU is a virtual system that can be scaled up or down as needed by the neural network running on it.
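Tesla has not published a DPU programming interface, so the sketch below is only a hypothetical resource model, written to illustrate the idea of a virtual hardware slice that is resized to fit the network:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DPU:
    """Hypothetical model of a Dojo Processing Unit: a virtual system of
    one or more D1 chips, an interface processor, and one or more host
    computers, per Tesla's description."""
    d1_chips: int
    interface_processors: int = 1
    hosts: int = 1

def resize(dpu: DPU, d1_chips: int) -> DPU:
    # Scale the virtual system up or down as the network's compute and
    # memory needs change; at least one D1 chip is always retained.
    return replace(dpu, d1_chips=max(1, d1_chips))

small = DPU(d1_chips=1)
large = resize(small, 25)  # grow to a full training tile's worth of chips
print(small)
print(large)
```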
BOTTOM LINE
The Tesla neural network training chip, system, and software are very impressive. There is a lot of innovation, such as retaining tremendous bandwidth and low latency from the chip all the way up to full systems. The packaging of the training tile for power and cooling also looks innovative.

The neural network training systems are for data center use and will certainly be used to improve Tesla's AV software. It is likely that other companies will also use these Tesla neural network training systems.

A key question is how the neural network systems will be used in inferencing applications in AVs. The training tile's power consumption looks too high for automotive use in the current version: one picture in the presentation had a "15 KW Heat Rejection" label for the training tile, and a D1 chip probably falls in the 400-W TDP range listed on one slide (25 such chips per tile already account for 10 kW before power-delivery and cooling overhead).

It looks like Tesla is hoping for, and/or depending on, this neural network training innovation to make its Autopilot an L3- or L4-capable system with only camera-based sensors. Is this a good bet? Time will tell, but so far, most of Elon Musk's bets have been good, albeit with some delay. ■

Egil Juliussen is the former director of research for infotainment and ADAS at IHS Automotive; an independent auto industry analyst; and EE Times' "Egil's Eye" columnist. This article was originally published on EE Times and may be viewed at bit.ly/3zO66Z6.
















