Page 46 - EE Times Europe Magazine – June 2024

        Lumi: CSC’s Manninen on Managing Europe’s Biggest Supercomputer—and AI’s Expectations


Like all supercomputers, Lumi is unique and does not follow the blueprint of any other system. But the design philosophy is similar to what you see in the current Perlmutter at NERSC [the National Energy Research Scientific Computing Center] in California or Leonardo at Cineca in Italy. It also has many things in common with Piz Daint at CSCS [the Swiss National Computing Center], one of Lumi's partners.

Lumi is more heterogeneous in terms of node types and its storage solution than, for instance, its big brother Frontier at Oak Ridge National Laboratory in Tennessee. But it's more homogeneous than some other leadership-class systems, such as MareNostrum 5 at the Barcelona Supercomputing Center in Spain.

"About 50% of Lumi's capacity now goes to AI. … This is clearly a higher share of the workload than we anticipated back in 2019, when we were expecting simulations to take the biggest share."
— PEKKA MANNINEN

EE TIMES EUROPE: How do you decide who gets to use Lumi, and what are some of the applications that are run?

Manninen: The EuroHPC JU grants 50% of the access time, and the Lumi consortium countries grant the other 50%, in proportion to their contributions to the total cost of ownership. Each of them has a slightly different role for Lumi in their national computing infrastructures.

Scientists and companies use Lumi for tens of different computing problems and related applications. Obviously, the main reason for building large systems is to tackle computing problems that are intractable with smaller systems. For instance, Lumi has run the most accurate climate simulations to date, comprising 30 years of coupled Earth system model scenarios at a 5-km global resolution. Compare that with the previous state of the art in climate modeling, which was at around a global resolution of 100 km. We have seen similar leaps in accuracy and fidelity in solar magnetosphere modeling and in plasma physics simulations run on Lumi.

Very often, the science communities develop and maintain their own applications and use the supercomputer as a platform-as-a-service in a cloud manner. Lumi has around 3,000 user accounts and a couple of hundred projects. There's a very big spectrum of different applications that run on the system, but the distribution is very top-heavy, in the sense that a handful of applications consume 95% of the resources.

About 50% of Lumi's capacity now goes to AI, which here refers to training deep neural networks for different purposes—especially large language models but also things like image recognition. This is clearly a higher share of the workload than we anticipated back in 2019, when we were expecting simulations to take the biggest share. Traditional HPC would be running things like molecular dynamics and CFD [computational fluid dynamics], so that's what we thought would be the big use cases.

Fortunately, the system we built turned out to be very suitable for AI. I think AI will impact all fields of science, becoming complementary to simulations—or even replacing them in some domains. This use case clearly needs a lot of computing power and ultra-fast data access, which can be delivered only by large supercomputers.

We didn't need to make many hardware adjustments, but there were some cultural differences we had to address. The AI community has different expectations than what we were used to in the HPC world—for example, in the way the system is accessed.

In traditional HPC, people tend to submit batch jobs: They put a job in a queue, the machine executes it, and [they] go and check the results. AI people, on the other hand, want much more interactivity, and they would like to have a big part of the machine to themselves for several weeks. That's a cultural clash that takes quite a lot of expectation management to make sure everybody benefits from the machine.

EE TIMES EUROPE: Will supercomputers as they are currently designed be able to keep up with future demand? If not, what needs to change?

Manninen: Clearly, the demand for compute is increasing rapidly, especially given the surge of AI in science and in commercial applications.

Not everything lends itself to parallel processing. But we hit the wall a long time ago on how fast we can make one serial execution unit. Nobody even thinks about 10-GHz processors anymore. In fact, we have had to dial back clock frequencies considerably to build multiple execution units. This is most heavily manifested by today's GPUs, which are essentially big parallel processing units.

There are many other factors that determine the speed of processing of a non-parallelizable workload. It's not necessarily things like the clock frequency; it can be things like memory access speed, or how different components are accessed—for example, file I/O. And bad code will slow things down no matter how powerful the computer is. It's very often an algorithmic problem and then a software problem.

While there is a set of computational problems that won't ever be run on anything other than serial computing, it's not a very big set. For many problems, we will still need to scale workloads over tightly connected nodes and even connected systems. I'm not sure if it's necessary—or even possible—to keep building larger supercomputers in terms of node counts. One solution may be to take several large supercomputers and have them interoperate in a federated fashion to speed up suitable workflows.

AI is a good example of something that's very nicely parallelizable. It relies on dense linear algebra with many layers of parallelism we can exploit. The AI computing demand is something we can handle with multicore CPUs and GPUs—no problem there. And the reason GPUs are so well-suited is that they simply get more FLOPS for the same power budget—that is, better performance per watt or per dollar than traditional x86 CPUs. The recent GPU surge happened because they are very good for AI, but in fact, the adoption of GPUs in scientific computing started well before the foundational models and other extreme computing needs of AI.

An interesting story worth mentioning concerns floating-point operations—IEEE floating-point arithmetic. In the HPC world, we are used to working with 64-bit arithmetic, but GPUs were never actually built for that. They originate from computer games, where you don't really care if a pixel is slightly off, and a pixel can be represented by only 4 or 8 bits. To bring GPUs to high-performance computing, it took system vendors and the HPC community 10 years to get GPUs excelling in 64-bit precision, too. But now, with AI, there is a big workload that doesn't need 64-bit precision, so there is a trend toward lower-precision arithmetic, which in turn puts pressure on traditional simulation software to work with lower-precision arithmetic.

One more thing on the topic of meeting future demands: I mentioned the importance of algorithms and software in serial processing, and the same is true in distributed processing. One of the biggest bottlenecks in supercomputing is the application software.

The scientific community loves tested and proven legacy software. But the programming solutions that were developed 30 years ago are often suboptimal on present-day hardware. Code modernization and good software engineering are needed to keep up with developments in supercomputing. ■
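Manninen's point about 64-bit versus lower-precision arithmetic can be illustrated with a minimal Python sketch (not part of the interview; it assumes NumPy is available). Naively accumulating 10,000 copies of 0.1 in float16, a low-precision format of the kind favored in AI training, stalls far below the true sum, while float64, the traditional HPC format, stays essentially exact:

```python
import numpy as np

# Summing many small values shows why precision matters in simulation:
# once the float16 running sum grows large enough, its rounding step
# exceeds the addend and further additions are lost entirely.
values = np.full(10_000, 0.1)

# float64: rounding error is negligible at this scale.
sum64 = values.astype(np.float64).sum()

# float16: accumulate naively, rounding after every addition.
sum16 = np.float16(0.0)
for v in values.astype(np.float16):
    sum16 = np.float16(sum16 + v)

print(f"true sum ~ 1000, float64 gives {sum64:.6f}")
print(f"float16 accumulation gives {float(sum16)}")  # far below 1000
```

Compensated summation or mixed-precision accumulation can recover much of the lost accuracy, which is part of the code-modernization work Manninen alludes to.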
