Benchmarking Neuromorphic Computing: Devil Is in the Details
progress requires skills and knowledge totally different from what's needed to evaluate an adult. Children and immature technologies both progress counterintuitively and sometimes even appear to regress (losing baby teeth or entering adolescence)."

Neuromorphic engineering is at a more advanced stage than quantum computing; practical systems exist, albeit mostly on a small scale. But Blume-Kohout's point remains valid for an adolescent technology. Just as over-testing children at school can make them proficient at passing tests but poor at independent study, using the wrong benchmarks at this formative stage can skew the development of neuromorphic engineering in the wrong direction.

LEVELING THE PLAYING FIELD
The report also points to another, much earlier, paper that warned of the dangers of bad benchmarking.[4] Grappling with the best ways to evaluate computers in the burgeoning digital computer industry of the 1980s, Jack Dongarra of Argonne National Laboratory and his co-authors write: "The value of a computer depends on the context in which it is used, and that context varies by application, by workload, and in terms of time. An evaluation that is valid for one site may not be good for another, and an evaluation that is valid at one time may not hold true just a short time later."

Then there's this warning from the same paper: "Although benchmarks are essential in performance evaluation, simple-minded application of them can produce misleading results. In fact, bad benchmarking can be worse than no benchmarking at all."

In exploring why comparing "like with like" is often so hard, we've seen that, in practice, researchers tend to choose a benchmark metric that suits their particular technology, then treat the result as the only figure of merit that matters. Of course, in the absence of any alternative, it's hard to criticize that approach.

There is another option, however, and it has become an increasing trend over the past few years: Enlist evaluators who are not directly involved in the technology development itself. Three papers published this year describe efforts to do just that. Although they have a lot to commend them, they also illustrate just how difficult it is to get this right.

APPLES AND ORANGES
In a paper issued by Oak Ridge National Laboratory,[5] the authors selected different machine-learning tasks that neuromorphic simulators should be able to run. They then measured performance as well as how much power the tasks consumed. The chosen tasks were varied and therefore should have provided a well-rounded view of the systems. Tested were NEST, Brian, Nengo, and BindsNET, all of which are used to design and simulate different kinds of networks. They were run on a PC and accelerated using various methods, including GPUs (which one of the platforms supported) but not boards with neuromorphic hardware (which some of the others could have used). For practical reasons, runtime was limited to 15 minutes.

According to co-author Catherine Schuman, the hardware choice reflected the investigators' desire to ensure the study was relevant to those without advanced equipment. That's a reasonable goal, even if optimizing neuromorphic simulators on classical hardware could be seen as a bit of a contradiction. Completing the study in weeks rather than months (hence, the runtime limit) also seems like an obvious decision. However, the result was that only two-fifths of the machines completed some of the tasks, leaving big gaps in the data.
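To make that setup concrete, here is a minimal sketch (in Python) of the kind of harness such a study implies. The script names and the --backend flag are invented for illustration, and power sampling is left as a stub, so this is an assumption-laden outline rather than the Oak Ridge team's actual code. The key ideas are simply that each simulator/task pair runs in its own process under the 15-minute cap, and that anything that times out or crashes becomes one of the gaps in the data:

import subprocess
import time

TIMEOUT_S = 15 * 60  # the 15-minute wall-clock cap described above

# Hypothetical task scripts; each is assumed to accept a --backend flag
# selecting the simulator. These names are placeholders, not real files.
RUNS = [
    ("nest",     "mnist_classification.py"),
    ("brian2",   "mnist_classification.py"),
    ("nengo",    "keyword_spotting.py"),
    ("bindsnet", "keyword_spotting.py"),
]

results = []
for simulator, task in RUNS:
    start = time.monotonic()
    try:
        # Run each task in its own process so one hang cannot stall the study.
        subprocess.run(["python", task, "--backend", simulator],
                       timeout=TIMEOUT_S, check=True)
        status = "completed"
    except subprocess.TimeoutExpired:
        status = "timed out"   # runs like these become the gaps in the data
    except (subprocess.CalledProcessError, FileNotFoundError):
        status = "failed"
    elapsed = time.monotonic() - start
    # Power/energy would be sampled separately (e.g., with an external meter)
    # while the process runs; it is deliberately left out of this sketch.
    results.append((simulator, task, status, round(elapsed, 1)))

for row in results:
    print(row)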
An experiment on robotic path planning from FZI Research Center for Information Technology in Karlsruhe, Germany,[6] confronted a different problem. The SpiNNaker system from the University of Manchester was chosen as a representative neuromorphic technology, then compared with a system using Nvidia's Jetson boards, designed to accelerate machine learning. SpiNNaker was originally designed more as a simulator than as actual neuromorphic hardware (in contrast to SpiNNaker 2) and so fared poorly in terms of power efficiency. Other low-power neuromorphic chips, such as Intel's Loihi, were not tested. Given that SpiNNaker is part of the Human Brain Project, in which FZI is a participant, it's not surprising that the researchers used what was available. Indeed, these might well have been the right comparisons for their specific purposes. Whether the results really represent a useful benchmarking exercise is less clear.

Finally, a project at the University of Dresden, in collaboration with the creators of Nengo and SpiNNaker,[7] was much less ambitious in its goals: comparing SpiNNaker 2 with Loihi for keyword spotting and adaptive control tasks. (Spoiler alert: SpiNNaker was more energy-efficient for the former and Loihi for the latter.) Comparing just two systems may seem to make this a less important benchmarking study (though it fulfilled some other important goals). But it may also have been the only way the researchers could generate a fair and useful comparison. That demonstrates the difficulty well.

THE PLAY'S THE THING
In a 2018 commentary on neuromorphic benchmarking,[8] Mike Davies, head of Intel's Loihi project, suggests a suite of tasks and metrics that could be used to measure performance. These include everything from keyword spotting to classification of the Modified National Institute of Standards and Technology database digits, playing Sudoku, gesture recognition, and moving a robotic arm.

Perhaps Davies' most compelling suggestion, however, is that we pursue the grander kind of challenge that we know from robotics and AI: creating contests in which machines can compete directly against each other (RoboCup soccer) or even against humans (chess or Go). Even foosball has emerged as a potential interim challenge but seems unlikely, in the long run, to present sufficient complexity to demonstrate any advantages offered by neuromorphic engineering.

Among the advantages of competitions is that, rather than standardizing in arbitrary ways, individual research groups can use their creativity to forge the best system, optimized for their hardware, encoding method, learning rules, network architecture, and neuron/synapse type. Where flexibility in the rules is needed, accommodations can be made or rejected in consultation with other players — who may themselves require restrictions to be lifted or relaxed.

Done well, that approach could provide a more creative and higher-level playing field that could help push the discipline forward. ■

REFERENCES
1 bit.ly/3Fk8kU2
2 bit.ly/2ZzMUBX
3 Blume-Kohout, R., and Young, K. Metrics and Benchmarks for Quantum Processors: State of Play. 2018. bit.ly/3D0hRxT
4 Dongarra, J., Martin, J.L., and Worlton, J. Computer Benchmarking: Paths and Pitfalls. IEEE Spectrum 24, 38–43. 1987. bit.ly/3a5BFDg
5 Kulkarni, S.R., Parsa, M., Mitchell, J.P., and Schuman, C.D. Benchmarking the performance of neuromorphic and spiking neural network simulators. Neurocomputing (Amsterdam) 447, 145–160. 2021. bit.ly/3mucYGq
6 Steffen, L., et al. Benchmarking Highly Parallel Hardware for Spiking Neural Networks in Robotics. Frontiers in Neuroscience 15, 1–17. 2021. bit.ly/3Ae8tEQ
7 Yan, Y., et al. Comparing Loihi with a SpiNNaker 2 Prototype on Low-Latency Keyword Spotting and Adaptive Robotic Control. Neuromorphic Computing and Engineering. 2021. doi:10.1088/2634-4386/abf150. bit.ly/3a7xfMi
8 Davies, M. Benchmarks for progress in neuromorphic computing. Nature Machine Intelligence 1, 386–388. 2019. go.nature.com/3msgSzJ

Sunny Bains teaches at University College London and is the author of "Explaining the Future: How to Research, Analyze, and Report on Emerging Technologies." She writes the Brains and Machines blog for EE Times. This article was originally published in two parts on EE Times and may be viewed at bit.ly/3lmkrIi and bit.ly/3Fu9xYZ.

