Page 24 - EE Times Europe Magazine – November 2023
GREENER ELECTRONICS | PROCESSING
How to Make Generative AI Greener
By Anne-Françoise Pelé
Artificial intelligence is an unstoppable force that is starting to permeate all aspects of our society. The advent of ChatGPT and similar generative AI tools has taken the world by storm. While many have raved about the capabilities of these generative AI tools, their environmental costs and impact are too often ignored. The development and use of these systems have been extremely energy-intensive, and their physical infrastructure requires a great deal of energy.

Deploying AI creates massive technical challenges for the traditional CPU-centric computing architecture. Data is moved multiple times between the network, CPU and deep-learning accelerator (DLA), with software-based management and data control. This creates multiple conflicts between parallel commands, which limits the DLA's utilization, wastes valuable hardware resources and increases costs and power consumption.

How can we harness the benefits of AI while mitigating its carbon footprint? In a discussion with EE Times Europe, Moshe Tanach, CEO and co-founder of NeuReality, said the key to reducing AI's carbon emissions lies in streamlining operations and bolstering efficiency. He argued that the transition from a resource-intensive CPU-centric model to NeuReality's AI-centric model and server-on-a-chip solution can lower cost, reduce energy consumption and increase throughput.

NeuReality's Moshe Tanach

EE TIMES EUROPE: What exactly is inference AI, and how does it relate to generative AI with large language models [LLMs] like ChatGPT?

Moshe Tanach: I'll break it all down to explain why inference AI and NeuReality's technology are relevant to the economic viability of generative AI, ChatGPT and other LLMs like it.

First, any neural network model complies with an underlying architecture, such as a CNN [convolutional neural network], RNN [recurrent neural network], LSTM [long short-term memory] and now the transformer-based [encoder/decoder] models used in LLMs and generative AI. With it, you can generate language, images and, in the future, other possibilities, and you can let it run as long as you want, giving it new context or new input. That's why ChatGPT has a "regenerate" function. So generative AI is yet another example of a neural network model, or AI category.

Second, all neural network models, no matter which ones they are, must be trained to perform their intended tasks. The developer feeds the model a curated dataset so that it can "learn" everything it needs to know about the type of data it will analyze. ChatGPT [generative pre-trained transformer] excels at analyzing and then generating human-like text. ChatGPT was trained on data from across the internet. Once it had consumed all that material and found the connection points between different letters and words, all that data became structured inside ChatGPT.

Third, once the model is frozen and is given new context or input, you are doing inference—the process of using a trained model. To understand inference, imagine teaching someone to identify musical instruments by their sound. You start by playing a guitar, a violin and a ukulele, and you explain that these instruments produce different sounds. Later, when you introduce a banjo, the person can infer that it produces a unique sound similar to the guitar, violin and ukulele, since they are all string instruments.

NeuReality is specifically focused on the inference phase, not the training of complex AI models. We create the underlying architecture and technology stack for AI-centric inference in the data center to achieve the best performance at lower cost and energy consumption—and we make it easy to use and deploy so all businesses can benefit.

EETE: How does NeuReality's inference AI solution help solve generative AI problems?

Tanach: Imagine billions of AI queries made daily on an LLM like ChatGPT and others like it. The amount of compute power required to classify, analyze and answer those queries is astronomical, as are the system costs, inefficiencies and carbon emissions compared with traditional models. Microsoft and OpenAI themselves have publicized that it costs millions of dollars per day to run ChatGPT alone.

In fact, serving generative AI this way requires 10× less power than with general-purpose CPU-centric systems. NeuReality has designed its networked addressable processing units [NAPUs] to operate on far less power. Therefore, we help companies save resources while softening the burden on the world's energy systems—validated in test cases with IBM Research.
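The train-once, infer-many-times workflow Tanach describes can be sketched with a toy example. This is a minimal illustration, not NeuReality's or OpenAI's actual method: a nearest-centroid "classifier" stands in for a neural network, and the instrument feature values are invented for the sketch.

```python
# Toy illustration of the train -> freeze -> infer workflow.
# A nearest-centroid "model" stands in for a neural network; the
# (brightness, sustain) feature values below are invented.

def train(examples):
    """Training phase: summarize each labeled class by its mean feature vector."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    # The returned centroids are the "frozen" model parameters.
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def infer(model, features):
    """Inference phase: the frozen model maps a new input to the closest class."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist(model[label], features))

# Curated training data, as in the interview's instrument analogy.
training_set = [
    ([0.8, 0.6], "guitar"),
    ([0.9, 0.9], "violin"),
    ([0.7, 0.3], "ukulele"),
]
model = train(training_set)          # expensive, done once
print(infer(model, [0.85, 0.55]))    # cheap, repeated for every query -> guitar
```

The asymmetry the sketch shows is the interview's point: training runs once, while inference runs for every one of the billions of daily queries, so inference efficiency dominates the operating cost and energy footprint.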