Page 45 - EE Times Europe Magazine – June 2024

Lumi: CSC’s Manninen on Managing Europe’s Biggest Supercomputer and AI’s Expectations

By Pat Brans

Housed in the Advanced Computing Facility at CSC’s IT Center for Science in Finland, Lumi has been in operation since 2021 and reached its full capacity in 2023. According to the most recent edition of the Top500 list, it’s now the fifth-largest supercomputer in the world and the biggest in Europe.

To find out more about the creation of Europe’s fastest supercomputer, its management and the applications it runs, we spoke to Pekka Manninen, who, as director of science and technology at the Advanced Computing Facility, is responsible for the underlying technology.

Lumi supercomputer (Source: Fade Creative)

EE TIMES EUROPE: What is the story behind Lumi, and how did it get its name?

Pekka Manninen: Obviously, many people were involved in the decision-making and preparation in many countries and at many levels. But if I had to name a single person, it would be CSC’s managing director, Kimmo Koski, who has relentlessly advocated for pan-European investments in HPC [high-performance computing], starting well before the European Commission introduced plans for a European HPC infrastructure in the form of the EuroHPC Joint Undertaking [EuroHPC JU].

Once EuroHPC JU was launched, Kimmo harnessed the interest of our organization and of our partners. We all got behind the idea of proposing CSC’s data center in the city of Kajaani, which was already the home of Finnish supercomputers.

Back then, there was a consortium of nine countries that supported the idea and agreed to contribute funding. [These were Finland, Denmark, Estonia, Sweden, Norway, Belgium, the Czech Republic, Switzerland and Poland.] In Finland, the key contributors were the Ministry of Education and Culture and the Ministry of Economic Affairs and Employment, which both helped with funding. Thanks to the efforts of all these groups, the EuroHPC JU granted the Hosting Agreement to the Lumi consortium. Two other countries [Iceland and the Netherlands] have since joined the consortium, which pays half of the total cost of over €202 million. The EU contributes the other half.

As for the name Lumi, I was actually the one who came up with that. It means “snow” in Finnish. I thought a white supercomputer would look cool, and then figured that snow would be a strong reference to northern Europe and white at the same time. After we named it Lumi, it was retrofitted with a slightly clumsy “backronym”: Large Unified Modern Infrastructure.

Lumi is housed in what used to be a paper mill, which was owned by UPM, one of the biggest pulp and paper producers in the world. UPM decided to close the plant in 2008 because of a worldwide overproduction of paper. Because the paper mill required over 230 MW of power, the capacity we needed was already there when we moved in.

By the way, the local district heating plant is in the same area, so we push excess heat to that system. The result is that Lumi heats 15% to 20% of Kajaani’s homes, and during certain times, such as heating plant maintenance breaks, we can heat the whole city.

CSC Finland operates the facility and takes care of system administration. Keeping the data center open requires only around 10 people. Then there are a lot of other people doing other work to ensure the usefulness and productivity of the supercomputer. Our consortium countries run user support, and special interest and collaboration groups address things like public relations, AI and cybersecurity.

EE TIMES EUROPE: What are the main architectural features of Lumi, and how does it compare with other world-class supercomputers?

Manninen: Lumi consists of around 3,000 GPU nodes, each with four GPUs, and around 2,000 CPU nodes, with two CPUs each. Its sustained compute capability toward a single binary is around 380 petaFLOPS at fp64 precision. Due to ambitious rack engineering, it’s possible to condense this and all the necessary storage into a rather compact space, which in terms of square meters is around the size of two tennis courts.

Lumi uses all-AMD node technology, with AMD MI250X GPUs and AMD Milan 64-core CPUs. Both computing partitions and all the auxiliary resources, like data analytics partitions plus all the storage and data management solutions, are tightly interconnected in the same HPE Slingshot network. Slingshot is based on high-radix switches, which enable exascale and hyperscale data center networks with at most three switch-to-switch hops. It uses an optimized Ethernet protocol, which allows it to interoperate with standard Ethernet devices while providing high performance to HPC applications.
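
To put the rounded figures quoted in the interview in perspective, the short Python sketch below totals the GPUs, CPUs and CPU cores they imply and derives a rough sustained fp64 rate per GPU. It is only back-of-the-envelope arithmetic on the approximate numbers Manninen gives ("around 3,000", "around 2,000", "around 380 petaFLOPS"), not official system specifications. The names `Partition` and `summarize` are made up for this illustration, and attributing the whole sustained figure to the GPU partition is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical helper type holding the rounded counts from the interview."""
    nodes: int             # approximate node count
    devices_per_node: int  # GPUs or CPUs in each node


# Rounded figures quoted in the interview, not exact system specifications.
GPU_PARTITION = Partition(nodes=3_000, devices_per_node=4)
CPU_PARTITION = Partition(nodes=2_000, devices_per_node=2)
CORES_PER_CPU = 64           # "AMD Milan 64-core CPUs"
SUSTAINED_PFLOPS_FP64 = 380  # "around 380 petaFLOPS at fp64 precision"


def total_devices(partition: Partition) -> int:
    """Number of GPUs (or CPUs) across all nodes of a partition."""
    return partition.nodes * partition.devices_per_node


def summarize() -> dict:
    """Derive aggregate counts from the quoted per-node figures."""
    gpus = total_devices(GPU_PARTITION)
    cpus = total_devices(CPU_PARTITION)
    # Assumption: the sustained figure comes entirely from the GPU partition.
    tflops_per_gpu = SUSTAINED_PFLOPS_FP64 * 1_000 / gpus
    return {
        "gpus": gpus,
        "cpus": cpus,
        "cpu_cores": cpus * CORES_PER_CPU,
        "sustained_tflops_per_gpu": round(tflops_per_gpu, 1),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

On these rounded inputs the sketch gives roughly 12,000 GPUs, 4,000 CPUs with 256,000 cores in total, and a little under 32 sustained teraFLOPS per GPU.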
