Lumi: CSC’s Manninen on Managing Europe’s Biggest Supercomputer—and AI’s Expectations
Like all supercomputers, Lumi is unique and does not follow the blueprint of any other system. But the design philosophy is similar to what you see in the current Perlmutter at NERSC [the National Energy Research Scientific Computing Center] in California or Leonardo at Cineca in Italy. It also has many things in common with Piz Daint at CSCS [the Swiss National Computing Center], one of Lumi’s partners.

Lumi is more heterogeneous in terms of node types and its storage solution than, for instance, its big brother Frontier at Oak Ridge National Laboratory in Tennessee. But it’s more homogeneous than some other leadership-class systems, such as MareNostrum 5, at the Barcelona Supercomputing Center in Spain.

EE TIMES EUROPE: How do you decide who gets to use Lumi, and what are some of the applications that are run?

Manninen: The EuroHPC JU grants 50% of the access time, and the Lumi consortium countries grant the other 50%, in proportion to their contributions to the total cost of ownership. Each of them has a slightly different role for Lumi in their national computing infrastructures.

Scientists and companies use Lumi for tens of different computing problems and related applications. Obviously, the main reason for building large systems is to tackle computing problems that are intractable with smaller systems. For instance, Lumi has run the most accurate climate simulations to date, comprising 30 years of coupled Earth system model scenarios at a 5-km global resolution. Compare that with the previous state of the art in climate modeling, which was at around a global resolution of 100 km. We have seen similar leaps in accuracy and fidelity in solar magnetosphere modeling and plasma physics simulations run with Lumi.

Very often, the science communities develop and maintain their own applications and use the supercomputer as a platform-as-a-service in a cloud manner. Lumi has around 3,000 user accounts and a couple of hundred projects. There’s a very big spectrum of different applications that run on the system. But the distribution is very top heavy, in the sense that a handful of applications consume 95% of the resources.

About 50% of Lumi’s capacity now goes to AI, which here refers to training deep neural networks for different purposes, especially for large language models but also for things like image recognition. This is clearly a higher share of the workload than we anticipated back in 2019, when we were expecting simulations to take the biggest share. Traditional HPC would be running things like molecular dynamics and CFD [computational fluid dynamics], so that’s what we thought would be the big use cases.

Fortunately, the system we built turned out to be a very suitable system for AI. I think AI will impact all fields of science, becoming complementary to simulations, or even replacing them in some domains. This use case clearly needs a lot of computing power and ultra-fast data access, which can only be delivered by large supercomputers.

We didn’t need to make so many hardware adjustments, but there were some cultural differences we had to address. The AI community has different expectations than what we were used to in the HPC world, for example in the way the system is accessed.

In traditional HPC, people tend to submit batch jobs; they put a job in a queue, the machine executes it, and [they] go and check the results. AI people, on the other hand, want much more interactivity, and they would like to have a big part of the machine for themselves for several weeks. That’s a cultural clash that takes quite a lot of expectation management to make sure everybody benefits from the machine.

EE TIMES EUROPE: Will supercomputers as they are currently designed be able to keep up with future demand? If not, what needs to change?

Manninen: Clearly, the demand for compute is in a rapid increase, especially given the surge of AI in science and in commercial applications.

Not everything lends itself to parallel processing. But we hit the wall a long time ago on how fast we can make one serial execution unit. Nobody even thinks about 10-GHz processors anymore. In fact, we have had to gun down a lot on the clock frequency to build multiple execution units. This is most heavily manifested by today’s GPUs, which are essentially big parallel processing units.

There are many other factors that determine the speed of processing of a non-parallelizable workload. It’s not necessarily things like the clock frequency; it can be things like memory access speed, or how different components are accessed (for example, file I/O). And bad code will slow things down no matter how powerful the computer is. It’s very often an algorithmic problem and then a software problem.

While there is a set of computational problems that won’t ever be run on anything other than serial computing, it’s not a very big set. For many problems, we will still need to scale workloads over tightly connected nodes and even connected systems. I’m not sure if it’s necessary, or even possible, to keep building larger supercomputers in terms of node counts. One solution may be to take several large supercomputers and have them interoperate in a federated fashion to speed up suitable workflows.

AI is a good example of something that’s very nicely parallelizable. It relies on dense linear algebra with many layers of parallelism we can exploit. The AI computing demand is something we can handle with the multicore CPUs and GPUs; no problem there. And the reason GPUs are so well-suited is that they just get more FLOPS for the same power budget, that is, better performance per watt or per dollar than traditional x86 CPUs. The recent GPU surge happened because they are very good for AI, but in fact, the adoption of GPUs in scientific computing started well before the foundational models and other extreme computing needs of AI.

An interesting story worth mentioning is about floating-point operations (IEEE floating-point arithmetic). In the HPC world, we are used to working with 64-bit arithmetic, but the GPUs were actually never built for that. They originate from computer games, where you don’t really care if a pixel is slightly off, and a pixel can be represented by only 4 or 8 bits. To bring GPUs to high-performance computing, it took 10 years for system vendors and the HPC community to get GPUs excelling in 64-bit precision, too. But now, with AI, there is a big workload that doesn’t need the 64-bit precision, so there is a trend toward lower-precision arithmetic, which puts pressure on traditional simulation software to work with the lower-precision arithmetic.

One more thing on the topic of meeting future demands: I mentioned the importance of algorithm software in serial processing, and the same is true in distributed processing. One of the biggest bottlenecks in supercomputing is in the application software.

The scientific community loves tested and proven legacy software. But the programming solutions that were developed 30 years ago are often suboptimal on present-day hardware. Code modernization and good software engineering are needed to keep up with developments in supercomputing.
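
Editor’s illustration (not from the interview): Manninen’s remark about lower-precision arithmetic putting pressure on simulation software can be made concrete with a small, hypothetical experiment. The sketch below assumes nothing about Lumi’s actual software; it simply adds the value 0.1 one million times, once in 64-bit and once in emulated 32-bit IEEE arithmetic, and compares how far each total drifts from 100,000. The helper names `to_f32` and `accumulate` are invented for this example.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit IEEE value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total `n` times.

    With single=True, the step and every partial sum are rounded to
    32-bit precision, emulating a naive single-precision accumulation loop.
    """
    s = to_f32(step) if single else step
    total = 0.0
    for _ in range(n):
        total += s
        if single:
            total = to_f32(total)
    return total


n = 1_000_000
expected = 100_000.0  # the mathematically exact value of n * 0.1

err64 = abs(accumulate(0.1, n, single=False) - expected)
err32 = abs(accumulate(0.1, n, single=True) - expected)

print(f"64-bit accumulation error: {err64:.3e}")
print(f"32-bit accumulation error: {err32:.3e}")
```

The 32-bit total drifts far more than the 64-bit one because each partial sum keeps only about seven significant decimal digits. This is one reason long-running simulations cannot always switch to lower precision without changes to the algorithm, such as compensated summation or mixed-precision schemes.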
