
Benchmarking Neuromorphic Computing: Devil Is in the Details


progress requires skills and knowledge totally different from what’s needed to evaluate an adult. Children and immature technologies both progress counterintuitively and sometimes even appear to regress (losing baby teeth or entering adolescence).”

Neuromorphic engineering is at a more advanced stage than quantum computing; practical systems exist, albeit mostly on a small scale. But Blume-Kohout’s point remains valid for an adolescent technology. Just as over-testing children at school can make them proficient at passing tests but poor at independent study, using the wrong benchmarks at this formative stage can skew the development of neuromorphic engineering in the wrong direction.

LEVELING THE PLAYING FIELD

The report also points to another, much earlier, paper [4] that also warned of the dangers of bad benchmarking. Grappling with the best ways to evaluate computers in the burgeoning digital computer industry of the 1980s, Jack Dongarra of Argonne National Laboratory and his co-authors write: “The value of a computer depends on the context in which it is used, and that context varies by application, by workload, and in terms of time. An evaluation that is valid for one site may not be good for another, and an evaluation that is valid at one time may not hold true just a short time later.”

Then there’s this warning from the same paper: “Although benchmarks are essential in performance evaluation, simple-minded application of them can produce misleading results. In fact, bad benchmarking can be worse than no benchmarking at all.”

In exploring why comparing “like with like” is often so hard, we’ve seen that, in practice, researchers tend to choose a benchmark metric that suits their particular technology, then treat the result as the only figure of merit that matters. Of course, in the absence of any alternative, it’s hard to criticize that approach.

There is another option, however, and it has become an increasing trend over the past few years: Enlist evaluators who are not directly involved in the technology development itself. Three papers published this year describe efforts to do just that. Although they have a lot to commend them, they also illustrate just how difficult it is to get this right.

APPLES AND ORANGES

In a paper [5] issued by Oak Ridge National Laboratory, the authors selected different machine-learning tasks that neuromorphic simulators should be able to run. They then measured performance as well as how much power the tasks consumed. The chosen tasks were varied and therefore should have provided a well-rounded view of the systems. Tested were NEST, Brian, Nengo, and BindsNET, all of which are used to design and simulate different kinds of networks. They were run on a PC and accelerated using various methods, including GPUs (which one of the platforms supported) but not boards with neuromorphic hardware (which some of the others could have used). For practical reasons, runtime was limited to 15 minutes.

According to co-author Catherine Schuman, the hardware choice reflected the investigators’ desire to ensure the study was relevant to those without advanced equipment. That’s a reasonable goal, even if optimizing neuromorphic simulators on classical hardware could be seen as a bit of a contradiction. Completing the study in weeks rather than months (hence, the runtime limit) also seems like an obvious decision. However, the result was that only two-fifths of the machines completed some of the tasks, leaving big gaps in the data.
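To make the mechanics of such a study concrete, here is a minimal sketch in Python of a timeout-bounded harness of the kind the Oak Ridge methodology implies. Everything in it (the run_benchmark helper, the result format, the use of a subprocess) is an illustrative assumption, not the study’s actual code; only the 15-minute cap comes from the paper.

import multiprocessing
import time

TIMEOUT_S = 15 * 60  # the paper's 15-minute runtime cap, in seconds


def run_benchmark(task_fn, timeout_s=TIMEOUT_S):
    """Run one simulator task in a subprocess and record its runtime.

    A task that exceeds the cap is terminated and logged as incomplete
    rather than retried, which is how gaps appear in the results.
    """
    proc = multiprocessing.Process(target=task_fn)
    start = time.monotonic()
    proc.start()
    proc.join(timeout_s)          # wait, but only up to the cap
    if proc.is_alive():           # over budget: kill it and record a gap
        proc.terminate()
        proc.join()
        return {"completed": False, "runtime_s": None}
    return {"completed": True, "runtime_s": time.monotonic() - start}

Run enough task/simulator combinations under a hard cap like this and incomplete cells, the “big gaps in the data,” follow almost inevitably.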
An experiment [6] on robotic path planning from FZI Research Center for Information Technology in Karlsruhe, Germany, confronted a different problem. The SpiNNaker system from the University of Manchester was chosen as a representative neuromorphic technology, then compared with a system using Nvidia’s Jetson boards, designed to accelerate machine learning. SpiNNaker was originally designed more as a simulator than as actual neuromorphic hardware (in contrast to SpiNNaker 2) and so fared poorly in terms of power efficiency. Other low-power neuromorphic chips, such as Intel’s Loihi, were not tested. Given that SpiNNaker is part of the Human Brain Project, in which FZI is a participant, it’s not surprising that the researchers used what was available. Indeed, these might well have been the right comparisons for their specific purposes. Whether the results really represent a useful benchmarking exercise is less clear.

Finally, a project [7] at the University of Dresden, in collaboration with the creators of Nengo and SpiNNaker, was much less ambitious in its goals: comparing SpiNNaker 2 with Loihi for keyword spotting and adaptive control tasks. (Spoiler alert: SpiNNaker was more energy-efficient for the former and Loihi for the latter.) Comparing just two systems may seem to make this a less important benchmarking study (though it fulfilled some other important goals). But it may also have been the only way the researchers could generate a fair and useful comparison. That demonstrates the difficulty well.
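Since energy efficiency is the figure of merit at stake in this comparison, a worked example may help. The sketch below, again in Python, converts measured average power and throughput into energy per inference; the function name and the numbers are invented for illustration and do not come from the Dresden paper.

def energy_per_inference_mj(avg_power_w: float, inferences_per_s: float) -> float:
    """Millijoules per inference: average power divided by throughput."""
    return 1000.0 * avg_power_w / inferences_per_s


# Hypothetical readings for two chips running keyword spotting:
print(energy_per_inference_mj(0.40, 120.0))  # chip A: ~3.33 mJ per keyword
print(energy_per_inference_mj(0.95, 400.0))  # chip B: ~2.38 mJ per keyword

The hypothetical chip B draws more than twice the power of chip A yet spends less energy per detection, which is why per-inference energy, rather than raw power draw, is the fairer basis for claims like those above.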
THE PLAY’S THE THING

In a 2018 commentary [8] on neuromorphic benchmarking, Mike Davies, head of Intel’s Loihi project, suggests a suite of tasks and metrics that could be used to measure performance. These include everything from keyword spotting to classification of the Modified National Institute of Standards and Technology database digits, playing Sudoku, gesture recognition, and moving a robotic arm.

Perhaps Davies’ most compelling suggestion, however, is that we pursue the grander kind of challenge that we know from robotics and AI: creating contests in which machines can compete directly against each other (RoboCup soccer) or even against humans (chess or Go). Even foosball has emerged as a potential interim challenge but seems unlikely, in the long run, to present sufficient complexity to demonstrate any advantages offered by neuromorphic engineering.

Among the advantages of competitions is that, rather than standardize in arbitrary ways, individual research groups can use their creativity to forge the best system, optimized for their hardware, encoding method, learning rules, network architecture, and neuron/synapse type. Where flexibility in the rules is needed, accommodations can be made or rejected in consultation with other players — who may themselves require restrictions to be lifted or relaxed.

Done well, that approach could provide a more creative and higher-level playing field that could help push the discipline forward. ■

REFERENCES
1 bit.ly/3Fk8kU2
2 bit.ly/2ZzMUBX
3 Blume-Kohout, R., and Young, K. Metrics and Benchmarks for Quantum Processors: State of Play. 2018. bit.ly/3D0hRxT
4 Dongarra, J., Martin, J. L., and Worlton, J. Computer Benchmarking: Paths and Pitfalls. IEEE Spectrum 24, 38–43. 1987. bit.ly/3a5BFDg
5 Kulkarni, S. R., Parsa, M., Mitchell, J. P., and Schuman, C. D. Benchmarking the Performance of Neuromorphic and Spiking Neural Network Simulators. Neurocomputing 447, 145–160. 2021. bit.ly/3mucYGq
6 Steffen, L., et al. Benchmarking Highly Parallel Hardware for Spiking Neural Networks in Robotics. Frontiers in Neuroscience 15, 1–17. 2021. bit.ly/3Ae8tEQ
7 Yan, Y., et al. Comparing Loihi with a SpiNNaker 2 Prototype on Low-Latency Keyword Spotting and Adaptive Robotic Control. Neuromorphic Computing and Engineering. 2021. doi:10.1088/2634-4386/abf150. bit.ly/3a7xfMi
8 Davies, M. Benchmarks for Progress in Neuromorphic Computing. Nature Machine Intelligence 1, 386–388. 2019. go.nature.com/3msgSzJ

Sunny Bains teaches at University College London and is the author of “Explaining the Future: How to Research, Analyze, and Report on Emerging Technologies.” She writes the Brains and Machines blog for EE Times. This article was originally published in two parts on EE Times and may be viewed at bit.ly/3lmkrIi and bit.ly/3Fu9xYZ.
