How to Make Generative AI Greener
EETE: What is the carbon footprint from running inference?

Tanach: Let's use an example from Google, with its massive data centers handling a variety of tasks from Google Search to Google Bard. Machine-learning training and inference accounted for only 10% to 15% of Google's total energy use in each of the last three years, according to Google Research from February 2022. And each year, it's split 2/5 for training and 3/5 for inference.

Like other big players with big data centers, Google's total energy use increases each year, according to Statista and Google's own internal sources. Machine-learning workloads have grown especially rapidly, as has the computation per training run.

While inference AI is already a smaller percentage of overall energy use, it is growing in popularity to support hungry generative AI apps. It's critical to select the right energy-efficient infrastructure to optimize models and to implement software tools and algorithms that reduce computational workload during the inference process. This is exactly what NeuReality is doing with our new NR1, shipping at the end of 2023.
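To make those proportions concrete, here is a minimal Python sketch of the arithmetic. The 10% to 15% share and the 2/5 versus 3/5 split are the figures Tanach cites; the total-energy value is a placeholder for illustration only, not a reported number.

ML_SHARE_LOW, ML_SHARE_HIGH = 0.10, 0.15  # ML's slice of total energy use
TRAINING_FRACTION = 2 / 5                 # split cited in the interview
INFERENCE_FRACTION = 3 / 5

total_energy_twh = 18.0  # placeholder annual figure, illustration only

for ml_share in (ML_SHARE_LOW, ML_SHARE_HIGH):
    ml_twh = total_energy_twh * ml_share
    print(f"ML at {ml_share:.0%} of total: "
          f"training {ml_twh * TRAINING_FRACTION:.2f} TWh, "
          f"inference {ml_twh * INFERENCE_FRACTION:.2f} TWh")

At the 15% mark, inference alone works out to 0.15 × 3/5 = 9% of total energy use, and that slice is the one Tanach expects to keep growing with generative AI.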
EETE: How can we achieve greener generative AI with more sustainable inference? What are the options?

Tanach: NeuReality demonstrated foresight when we began this journey three years ago. The problem we set out to solve was how to design the best AI technology at a system level, with software tools suited precisely to the growing needs of inference AI.

Efforts to achieve high-performance, affordable and accessible AI, with less environmental impact, should be part of a broader sustainability strategy in which businesses large and small consider the environmental impacts throughout their AI models' life cycle.

There are multiple factors to weigh, including energy-efficient hardware for both training and inference. This includes GPUs, TPUs and custom DLAs designed to perform AI workloads with better energy efficiency. Of course, NeuReality knows that these chips have not been optimal and offers a clear alternative with smaller models that consume less energy.
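One widely used route to the "smaller models that consume less energy" Tanach mentions is post-training quantization, which shrinks a model so each inference costs less compute and memory traffic. The interview does not say this is part of NeuReality's stack; the PyTorch snippet below is only a generic illustration of the idea.

import torch
import torch.nn as nn

# A stand-in model; any network with nn.Linear layers works the same way.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model.eval()

# Post-training dynamic quantization: weights are stored as int8 and the
# matrix multiplies run in reduced precision, cutting work per inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 512))
print(out.shape)  # torch.Size([1, 10])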
EETE: NeuReality was founded in 2019 with the aim of developing a new generation of AI inference solutions that break free from traditional CPU-centric architectures and deliver high performance, low latency and energy efficiency. Why is it essential to develop alternatives to CPU-centric AI inference architectures?

Tanach: Right now, it costs at least US$700,000 daily to run ChatGPT because the underlying architecture was not built for inference [NeuReality would infer even more now that ChatGPT can browse the internet to provide information no longer limited to data before September 2021]. ChatGPT is simply too expensive and energy-intensive, but it's also likely to hit a performance ceiling sooner rather than later.

Our solution stack was designed specifically for AI inference in all its forms, whether for cloud computing, virtual reality, deep learning, cybersecurity or natural-language processing. The market and our customers have an urgent need to make generative AI profitable, which NeuReality can deliver with 10× performance at a fraction of the cost; in other words, US$200,000 per day rather than US$1 million.
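Taken together, those two claims imply a large drop in cost per inference. Assuming the 10× performance figure means 10× the daily throughput for the quoted spend, a quick back-of-the-envelope check in Python:

# Figures quoted by Tanach above.
baseline_cost_per_day = 1_000_000  # US$ per day, CPU-centric serving
napu_cost_per_day = 200_000        # US$ per day, NeuReality's claim
throughput_gain = 10               # the "10x performance" claim

# Cost per inference = daily cost / daily throughput, so the improvement
# is the product of the cost ratio and the throughput ratio.
cost_ratio = baseline_cost_per_day / napu_cost_per_day  # 5x cheaper per day
per_inference_gain = cost_ratio * throughput_gain       # 50x overall
print(f"~{per_inference_gain:.0f}x lower cost per inference")

That 50× result is consistent with the "up to 50×" cost and power improvement NeuReality cites for its NR1-S system later in the interview.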
NeuReality addresses today's challenges, both economically [total customer value or total cost of ownership] and environmentally [less power consumption and less carbon footprint]. What makes our AI-centric architecture different revolves around four model characteristics:
• Hardening of data movement and processing
• Hardening of sequencing currently performed in software and on CPUs; hence, our AI hypervisor
• Efficient networking for data management between clients and server
• Heterogeneous computing incorporating decoders, DSPs, DLAs and Arm processors, all optimized for efficient operation and scaled to ensure continuous utilization of the DLA, complemented by versatile multi-purpose processors
These features are built into NeuReality's AI solution stack to reduce energy per inference operation, making it a greener and more efficient approach.
EETE: NeuReality claims its system-level AI-centric approach simplifies running AI inference at scale. How does it do that, and what makes it less energy-intensive?

Tanach: NeuReality collaborated with IBM researchers to test our inference AI solution. Results showed 10× the performance of a conventional CPU-server-based solution. Moving from the time- and resource-intensive CPU to NeuReality's NAPU also lowers cost and power consumption, which is good for top-line revenue, bottom-line cost management and the environment.

Many factors work together to make NeuReality's system architecture less energy-intensive:
• NAPUs enable disaggregation and compatibility of AI computing resource allocation, using resources only when needed, at 100% utilization [see the sketch after this list].
• Running the complete AI pipeline of tasks [not just the DLA model] offloads intensive tasks to our NR1 hardware, where heterogeneous compute engines run them in parallel rather than as software applications on a host CPU, making our solutions more power-efficient.
• Reduced inference time, achieved through hardware offloading, leads to lower inference latency and makes the system suitable for real-time or low-latency applications.
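The first two bullets are essentially a utilization argument: in a CPU-centric server, the DLA idles while the host decodes, preprocesses and schedules each request. A toy Python model of that effect, using hypothetical timings rather than anything measured:

# Hypothetical per-request stage timings (illustrative only).
dla_ms = 2.0           # useful DLA compute per request
cpu_overhead_ms = 8.0  # decode + preprocess + dispatch on the host CPU

# CPU-centric: stages run serially on the host, so the DLA does useful
# work for only a fraction of each request's wall-clock time.
serial_utilization = dla_ms / (dla_ms + cpu_overhead_ms)

# Pipeline offload: the non-DLA stages move to dedicated engines and
# requests are pipelined, so the DLA can approach continuous utilization
# (assuming the offload engines keep pace with it).
offloaded_utilization = 1.0

print(f"CPU-centric DLA utilization: {serial_utilization:.0%}")            # 20%
print(f"Offloaded-pipeline DLA utilization: {offloaded_utilization:.0%}")  # 100%

Higher utilization feeds directly into the energy claim: the same silicon completes more inference operations within a similar power envelope, so the energy per inference operation falls.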
EETE: Can you explain NeuReality's long-term vision and ambition? Where does the company stand today?

Tanach: In a few words, NeuReality aims to make AI easy. Our ultimate vision is a sustainable AI digital world where we democratize AI and accelerate human achievement through AI technology.

We are a young company with the vision to make AI accessible to all innovators by empowering their efforts to cure diseases, improve public safety and bring innovative AI-based ideas to fruition.

Today, we have real products and partners creating a value chain to help bring our products to market. Our team has worked hard for the last three years to prototype [NR1-P] and then design an entirely new NR1 chip [NR1], which was validated and sent to a TSMC manufacturing facility in September 2023. Those chips will ship by the end of 2023.

Our inference AI solution includes three additional components:
• The NR1-M module is a full-height, double-wide PCIe card containing one NR1 chip and a network-attached inference service; it can connect to an external DLA.
• The NR1-S inference server is a prototype design for an inference server with NR1-M modules and an NR1 chip, which enables truly disaggregated AI service. The system not only improves cost and power performance by up to 50× but also does not require IT staff to implement it for business end users.
• We have also built out software tools and APIs to make it easy to develop, deploy and manage AI inference.
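"Network-attached" is the key property of that first component: clients reach inference over the network rather than through a local accelerator driver stack. NeuReality's actual API is not described in the interview, so the endpoint and payload below are entirely hypothetical, sketched only to show the shape of a call into a disaggregated inference service.

import json
import urllib.request

# Hypothetical endpoint and request schema, for illustration only.
url = "http://nr1-s.example.local:8080/v1/infer"
payload = json.dumps({"model": "image-classifier", "input": [0.0] * 16}).encode()

req = urllib.request.Request(
    url, data=payload, headers={"Content-Type": "application/json"}
)
# Any host on the network can act as a client; no local DLA or driver
# stack is required on the machine issuing the request.
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))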
NeuReality's larger vision is to make AI sustainable, economically and environmentally. We intend to keep anticipating and building the future through our substantial systems engineering expertise. As we stay in sync with customers and partners inside and outside of tech, we can start to design and build the kind of technology infrastructures and systems that are needed one, three, five or 10 years into the future. ■