Page 26 - EE Times Europe Magazine – November 2023
        How to Make Generative AI Greener
EETE: What is the carbon footprint from running inference?

Tanach: Let’s use an example from Google, with its massive data centers handling a variety of tasks from Google Search to Google Bard. Machine-learning training and inference account for only 10% to 15% of Google’s total energy use for each of the last three years, according to Google Research from February 2022. And each year, it’s split 2/5 for training and 3/5 for inference.

Like other big players with big data centers, Google’s total energy use increases each year, according to Statista and their own internal sources. Machine-learning workloads have especially grown rapidly, as has the computation per training run.

While inference AI is already a smaller percentage of overall energy use, it is growing in popularity to support hungry generative AI apps. It’s critical to select the right energy-efficient infrastructure to optimize models and to implement software tools and algorithms that reduce computational workload during the inference process. This is exactly what NeuReality is doing with our new NR1 shipping at the end of 2023.

EETE: How can we achieve greener generative AI with more sustainable inference? What are the options?

Tanach: NeuReality demonstrated foresight when we began this journey three years ago. The problem we set out to solve was how to design the best AI technology at a system level and software tools suited precisely to the growing needs of inference AI.

Efforts to achieve high-performance, affordable and accessible AI, with less environmental impact, should be part of a broader sustainability strategy where businesses large and small consider the environmental impacts throughout their AI models’ life cycle.

There are multiple factors to weigh, including energy-efficient hardware for both training and inference. This includes GPUs, TPUs and custom DLAs designed to perform AI workloads with better energy efficiency. Of course, NeuReality knows that these chips have not been optimal and offers a clear alternative with smaller models that consume less energy.

EETE: NeuReality was founded in 2019 with the aim of developing a new generation of AI inference solutions that break free from traditional CPU-centric architectures and deliver high performance, low latency and energy efficiency. Why is it essential to develop alternatives to CPU-centric AI inference architectures?

Tanach: Right now, it costs at least US$700,000 daily to run ChatGPT because the underlying architecture was not built for inference [NeuReality would infer even more now that ChatGPT can browse the internet to provide information no longer limited to data before September 2021]. ChatGPT is simply too expensive and energy-intensive, but it’s also likely to hit a performance ceiling sooner rather than later.

Our solution stack was designed specifically for AI inference in all its forms, whether for cloud computing, virtual reality, deep learning, cybersecurity or natural-language processing. The market and our customers have an urgent need to make generative AI profitable, which NeuReality can deliver with 10× performance at a fraction of the cost; in other words, US$200,000 per day rather than US$1 million.

NeuReality addresses today’s challenges, both economically [total customer value or total cost of ownership] and environmentally [less power consumption and less carbon footprint]. What makes our AI-centric architecture different revolves around four model characteristics:

•  Hardening of data movement and processing
•  Hardening of sequencing currently performed in software and CPUs (hence our AI hypervisor)
•  Efficient networking for data management between clients and server
•  Heterogeneous computing incorporating decoders, DSPs, DLAs and Arm processors, all optimized for efficient operation and scaled to ensure continuous utilization of the DLA complemented by versatile multi-purpose processors

These features are built into NeuReality’s AI solution stack to reduce energy per inference operation, making it a greener and more efficient approach.

EETE: NeuReality claims its system-level AI-centric approach simplifies running AI inference at scale. How does it do that, and what makes it less energy-intensive?

Tanach: NeuReality collaborated with IBM researchers to test our inference AI solution. Results showed 10× the level of performance compared with a conventional CPU-server–based solution. Moving from the time- and resource-intensive CPU to NeuReality’s NAPU also lowers cost and power consumption, which is good for top-line revenue, bottom-line cost management and the environment.

Many factors work together to make NeuReality’s system architecture less energy-intensive:

•  NAPUs enable disaggregation and compatibility of the AI computing resource allocation, using resources only when needed at 100% utilization.
•  Running the complete AI pipeline of tasks [not just the DLA model] offloads intensive tasks to our NR1 hardware in parallel with heterogeneous compute engines rather than software applications, making our solutions more power-efficient.
•  Reduced inference time, achieved through hardware offloading and leading to lower inference latency, makes it suitable for real-time or low-latency applications.

EETE: Can you explain NeuReality’s long-term vision and ambition? Where does the company stand today?

Tanach: In a few words, NeuReality aims to make AI easy. Our ultimate vision is a sustainable AI digital world where we democratize AI and accelerate human achievement through AI technology.

We are a young company with the vision to make AI accessible to all innovators by empowering their efforts to cure diseases, improve public safety and bring innovative AI-based ideas to fruition.

Today, we have real products and partners creating a value chain to help bring our products to market. Our team has worked hard for the last three years to prototype [NR1-P] and then design an entirely new NR1 chip [NR1], which was validated and sent to a TSMC manufacturing facility in September 2023. Those chips will ship by the end of 2023.

Our inference AI solution includes three additional components:

•  The NR1-M module is a full-height double-wide PCIe card containing one NR1 chip and a network-attached inference service, which can connect to an external DLA.
•  The NR1-S inference server is a prototype design for an inference server with NR1-M modules and an NR1 chip, which enables truly disaggregated AI service. The system shows not only lower cost and power performance by up to 50× but does not require IT to implement it for business end users.
•  We have also built out software tools and APIs to make it easy to develop, deploy and manage our AI interface.

NeuReality’s larger vision is to make AI sustainable, economically and environmentally. We intend to keep anticipating and building the future through our substantial systems engineering expertise. As we stay in sync with customers and partners inside and outside of tech, we can start to design and build the kind of technology infrastructures and systems that are needed one, three, five or 10 years into the future. ■
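
As a rough worked example of the Google energy figures quoted in the interview's first answer (machine learning at 10% to 15% of total energy use, split 2/5 training and 3/5 inference), the sketch below multiplies the two fractions to estimate each workload's share of total energy use. It assumes the split applies uniformly across the quoted range; the function and variable names are illustrative and do not come from the interview or from Google.

```python
from fractions import Fraction

# Figures as quoted in the interview (attributed there to Google Research, February 2022).
ML_SHARE_LOW = Fraction(10, 100)     # ML share of total energy use, low end
ML_SHARE_HIGH = Fraction(15, 100)    # ML share of total energy use, high end
TRAINING_FRACTION = Fraction(2, 5)   # portion of ML energy spent on training
INFERENCE_FRACTION = Fraction(3, 5)  # portion of ML energy spent on inference


def share_of_total(ml_share, fraction_of_ml):
    """Hypothetical helper: a workload's share of total energy use.

    Assumes the share is simply (ML share of total) x (workload share of ML).
    """
    return ml_share * fraction_of_ml


inference_low = share_of_total(ML_SHARE_LOW, INFERENCE_FRACTION)
inference_high = share_of_total(ML_SHARE_HIGH, INFERENCE_FRACTION)
training_low = share_of_total(ML_SHARE_LOW, TRAINING_FRACTION)
training_high = share_of_total(ML_SHARE_HIGH, TRAINING_FRACTION)

print(f"Inference: {float(inference_low):.0%} to {float(inference_high):.0%} of total energy use")
print(f"Training:  {float(training_low):.0%} to {float(training_high):.0%} of total energy use")
```

Under these assumptions, inference alone would account for roughly 6% to 9% of total energy use and training for roughly 4% to 6%.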
        NOVEMBER 2023 | www.eetimes.eu





