Page 20 - EE Times Europe Magazine – November 2023
        OPINION | GREENER ELECTRONICS | PROCESSING
Server Processors in the AI Era: Can They Go Greener?

By Avi Messica and Ziv Leshem, NeoLogic

"Just when I thought I was out, they pull me back in," Michael Corleone (Al Pacino) says in "The Godfather Part III." Much the same might be said of server processors: The more powerful and power-efficient they get, the more the data center's workload pulls them back to a more distant starting point.

As data centers continue to expand in scale, complexity and connectivity, their power consumption increases as well. According to the International Energy Agency, data centers and data transmission networks are responsible for 1% of energy-related greenhouse gas emissions. The estimated global data center electricity consumption in 2022 was 240 TWh to 340 TWh, or about 1% to 1.3% of global electricity consumption, excluding energy spent on cryptocurrency mining. [1] According to some sources, it reaches 3% and tops industries like aviation, shipping, and food and tobacco.

Despite great efforts to improve processors' efficiency, the rapid growth of AI workloads has resulted in a substantial increase in energy consumption over the past decade, growing by 20% to 40% annually. The combined electricity consumption of the Amazon, Microsoft, Google and Meta clouds more than doubled between 2017 and 2021, rising to about 72 TWh in 2021. [1]

The current major AI workloads in data centers are deep learning, machine learning, computer vision and streaming video, recommender systems, and natural-language processing, a recent addition. AI tasks are computing power hogs, and large language models are especially demanding. Google's PaLM language model is relatively efficient. However, its training required 2.5 billion petaFLOPS of computation; that is, its training was more than 5 million times more computation-intensive than that of AlexNet, the convolutional neural network introduced in 2012 for machine-vision tasks, heralding the AI era. [2]

According to informal sources, OpenAI's GPT-2, introduced in 2019, was trained on 300 million tokens of text data and had 1.5 billion parameters. OpenAI's GPT-3, also known as ChatGPT, was trained on about 400 billion tokens of text data and had 175 billion parameters. The details of the most recent ChatGPT model, GPT-4, have not been publicly disclosed, but estimates of its size range from 400 billion to 1 trillion parameters, with a humongous training dataset of about 8 trillion text tokens. [3] Put another way, the workload of training GPT-3 is about 150,000× as much as GPT-2's, and training GPT-4 requires about 50× to 120× more computing than GPT-3. OpenAI has also capped the number of messages that users can send to GPT-4 because inference puts a strain on compute resources. [4]

Figure 1: Typical power breakdown of a GPU: cores (50%), memory controller (20%) and DRAM (30%). (Source: Zhao et al., 2013 [5])
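The workload ratios quoted above can be sanity-checked with the common rule of thumb that training compute scales roughly with the product of parameter count and training tokens. A minimal sketch, using the article's figures (the GPT-4 numbers are estimates, not disclosed values):

```python
# Rough check of the relative training workloads, assuming
# compute ~ parameters x training tokens. All figures are the
# article's estimates, not official disclosures.

def relative_workload(params: float, tokens: float) -> float:
    """Proxy for training compute: parameters x tokens."""
    return params * tokens

gpt2 = relative_workload(1.5e9, 300e6)       # GPT-2: 1.5B params, 300M tokens
gpt3 = relative_workload(175e9, 400e9)       # GPT-3: 175B params, 400B tokens
gpt4_low = relative_workload(400e9, 8e12)    # GPT-4 low-end estimate
gpt4_high = relative_workload(1e12, 8e12)    # GPT-4 high-end estimate

print(gpt3 / gpt2)        # roughly 155,000 -- the ~150,000x figure
print(gpt4_low / gpt3)    # roughly 46x
print(gpt4_high / gpt3)   # roughly 114x
```

The ratios land at about 155,000× (GPT-3 vs. GPT-2) and 46× to 114× (GPT-4 vs. GPT-3), consistent with the 150,000× and 50× to 120× estimates in the text.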

Most AI tasks' workloads are associated with arithmetic operations (typically matrix-matrix or matrix-vector multiplication), whether in training or inference (apart from data fetching). The computational intensity of training an AI model equals the product of the training time, the number of computing instances used, the peak FLOPS and the utilization rate. Therefore, power consumption is linearly dependent on time (training or inference), the number of parallel computing instances (CPU, GPU, TPU, AI accelerator and the like), the computing power of an instance (e.g., FLOPS) and the utilization rate (i.e., the fraction of the time a GPU is running tasks while the model is trained).

Figure 1 illustrates the power breakdown of a typical GPU, [5] in which the cores consume about 50% of the total power, and the off-chip memory and memory controller consume the remaining 50% (the breakdown is similar for CPUs).
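The linear relation described above can be sketched in a few lines. The cluster size, peak FLOPS, utilization and per-instance power draw below are hypothetical, chosen only to show how the terms multiply:

```python
# Sketch of the relation: total compute = time x instances x peak FLOPS
# x utilization, and energy = time x instances x power draw.
# All numbers are hypothetical, for illustration only.

def training_compute_flops(hours: float, instances: int,
                           peak_flops: float, utilization: float) -> float:
    """Total floating-point operations performed during training."""
    return hours * 3600 * instances * peak_flops * utilization

def training_energy_kwh(hours: float, instances: int,
                        watts_per_instance: float) -> float:
    """Energy scales linearly with time and the number of instances."""
    return hours * instances * watts_per_instance / 1000.0

# Hypothetical month-long run: 1,000 GPUs, 300 TFLOPS peak, 40% utilization,
# 400 W average draw per GPU.
compute = training_compute_flops(hours=24 * 30, instances=1000,
                                 peak_flops=300e12, utilization=0.4)
energy = training_energy_kwh(hours=24 * 30, instances=1000,
                             watts_per_instance=400)
print(f"{compute:.3e} FLOPs, {energy:,.0f} kWh")
```

Doubling any single factor (time, instances, peak FLOPS or utilization for compute; time, instances or power draw for energy) doubles the result, which is why efficiency gains in any one term are quickly consumed by growth in the others.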
