Page 25 - EE Times Europe Magazine – November 2023

How to Make Generative AI Greener
EETE: Why is mitigating inference's environmental impact crucial to scaling generative AI models in business applications effectively?

Tanach: Generative AI suffers from the same CPU-centric architecture as other models, including image classification, natural-language processing, recommendation systems and anomaly-detection models.

NeuReality is reinventing inference AI to meet the current and future demands of generative AI and all other models that rely on inference, so they can scale without bleeding money. When a company relies on a CPU to manage inference in deep-learning models, no matter how powerful the deep-learning accelerator (DLA), that CPU will eventually hit a performance ceiling.

In contrast, NeuReality's AI solution stack does not buckle under the weight. The system architecture runs far more efficiently and effectively with less energy consumed.

EETE: What is the carbon footprint from training generative AI models?

Tanach: NeuReality's AI-centric architecture, with more energy-efficient NAPUs (a new class of custom AI chip), reduces power consumption significantly.

In contrast, today's generative AI and LLMs pose significant environmental concerns due to their high energy usage and the resulting carbon emissions. Analysts suggest the carbon footprint of a single AI query could be 4× to 5× that of a typical search engine query. With daily consumption estimated at 1.17 million GPU-hours, equating to 150,000 server-node-hours, ChatGPT reportedly emits about 55 tons of CO2 equivalent daily. This is comparable to the lifetime emissions of an average car, accumulating to the equivalent of 365 cars' lifetime emissions each year, assuming steady usage.

Below are three studies outlining the current negative environmental impacts of today's CPU- and GPU-centric generative AI models:

•  In 2019, University of Massachusetts Amherst researchers trained several LLMs and found that training a single AI model can emit over 626,000 pounds (about 284 metric tons) of CO2, equivalent to the emissions of five cars over their lifetimes, as reported in MIT Technology Review (tinyurl.com/z28erurk).
•  A more recent study (tinyurl.com/bdz9rtx5) made a similar analogy. It reported that training GPT-3, with its 175 billion parameters, consumed 1,287 MWh of electricity and resulted in carbon emissions of 502 metric tons of CO2. That's like driving 112 gasoline-powered cars for a year.
•  Microsoft outlines the cost of Azure instances used for such calculations (tinyurl.com/rvp46re2).

EETE: How can we make these models more performant than their predecessors without imposing a heavier toll on the environment?

Tanach: We have a strong sense of urgency around building higher-performing, less expensive inference AI solutions that also reduce our carbon footprint. It's an "and," not an "or." In this way, we can sustainably meet the current and future demands of generative AI and other AI applications for fraud detection, translation services, chatbots and more.

Today's infrastructure falls short in two main ways:

•  The system architecture uses non-AI-specific hardware; therefore, it can't do the real job of the inference server.
•  Even though the deep-learning model offloads software to hardware, there are still too many surrounding functions running in software. It's not offloading completely, to the extent needed to be more energy-efficient.

These system deficiencies lower the utilization of the GPUs and DLAs used today, and the lack of efficiency takes a heavier toll on energy consumption and therefore on the environment.

NeuReality makes these models perform better and more affordably while actually decreasing the impact on the environment. We designed our system architecture for AI, as opposed to modifying the old architecture. Our new NAPU offloads the leftover computing functions, waterfalling them to Arm cores, which are less expensive and less power-hungry. By removing that CPU bottleneck, we also increase DLA utilization.

Taken together, all these contributions make the AI-centric solutions run better without imposing a heavier toll on the environment.
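The carbon-equivalence figures quoted in the interview can be cross-checked with a few lines of arithmetic. A minimal sketch, assuming (as the UMass Amherst comparison implies, though the article never states it directly) that an average car's lifetime emissions are one-fifth of the 626,000-lb training figure:

```python
# Cross-check of the carbon-equivalence figures quoted in the interview.
KG_PER_LB = 0.45359237  # exact kilograms-per-pound conversion factor

# UMass Amherst (2019): training one model emits ~626,000 lb of CO2,
# said to equal five average cars' lifetime emissions.
training_t = 626_000 * KG_PER_LB / 1000   # ~284 metric tons
car_lifetime_t = training_t / 5           # ~57 t per car lifetime (assumption)

# ChatGPT: ~55 t CO2e per day, i.e. roughly one car lifetime per day,
# hence the article's "365 cars' lifetime emissions each year".
chatgpt_yearly_cars = 55 * 365 / car_lifetime_t   # ~353 car lifetimes/year

# GPT-3 training: 502 t, compared with 112 car-years of driving.
per_car_year_t = 502 / 112                # ~4.5 t CO2 per car per year

print(round(training_t), round(car_lifetime_t, 1),
      round(chatgpt_yearly_cars), round(per_car_year_t, 1))
```

The ~4.5-t-per-car-year result is consistent with the commonly cited EPA estimate of about 4.6 metric tons of CO2 per passenger vehicle per year, while the daily 55-t figure works out to a little over 350, rather than exactly 365, car lifetimes per year under this car-lifetime assumption.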
