LeCun: It’s Real Hard to Succeed with Exotic Hardware
operation into a matrix multiplication, including convolutions and fully connected nets,” said LeCun. “[It] is a challenge for the hardware community to create architectures that don’t lose performance by using batch size = 1. That applies to training, of course; the optimal size of batch for training is 1. We use more because our hardware forces us to do so.”
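To make the batch-size point concrete, here is a minimal sketch in PyTorch (not from LeCun’s keynote; the image size, channel counts, and 3×3 kernel are arbitrary choices) of how a convolution on a single sample, batch size = 1, collapses into one matrix multiplication via the standard im2col transformation.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)   # batch size = 1: a single RGB image
w = torch.randn(8, 3, 3, 3)     # 8 output channels, 3x3 kernel

# im2col: lay every 3x3 input patch out as a column -> shape (1, 27, 1024)
cols = F.unfold(x, kernel_size=3, padding=1)

# The whole convolution is now a single matmul: (8, 27) @ (27, 1024)
out = (w.view(8, -1) @ cols.squeeze(0)).view(1, 8, 32, 32)

# Same result as the library convolution
print(torch.allclose(out, F.conv2d(x, w, padding=1), atol=1e-4))

For a fully connected layer, the same reduction at batch size = 1 leaves only a matrix-vector product, which is precisely the shape on which accelerators tuned for large batches lose utilization.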
SELF-SUPERVISED LEARNING
Another challenge for hardware is that the learning paradigms we currently use will change, and this will happen imminently, according to LeCun.
“There is a lot of work [being done] on trying to get machines to learn more like humans and animals, and humans and animals don’t learn by supervised learning or even by reinforcement learning,” he said. “They learn by something I call self-supervised learning, which is mostly by observation.”
LeCun described a common approach to self-supervised learning in which a piece of the sample is masked and the system is trained to predict the content of the masked piece based on the part of the sample that’s available. This is commonly used with images, wherein part of the image is removed, and text, with one or more words blanked out.
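As a rough illustration of that recipe for text (a sketch only; the toy vocabulary, 15% mask rate, and two-layer transformer stand-in are assumptions made here for brevity, not a description of any production system), the masked-prediction setup looks something like this in PyTorch:

import torch
import torch.nn as nn

vocab, mask_id = 1000, 0
tokens = torch.randint(1, vocab, (8, 16))     # unlabeled text: 8 sequences of 16 tokens

mask = torch.rand(tokens.shape) < 0.15        # blank out roughly 15% of the words
inputs = tokens.masked_fill(mask, mask_id)

model = nn.Sequential(                        # small stand-in for a transformer
    nn.Embedding(vocab, 64),
    nn.TransformerEncoder(nn.TransformerEncoderLayer(64, 4, batch_first=True), 2),
    nn.Linear(64, vocab),
)

logits = model(inputs)                        # predict a token at every position
# The "label" is the data itself, scored only at the masked positions
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()                               # one self-supervised training step

No human labels appear anywhere; the blanked-out words themselves are the training signal, which is what lets such models consume essentially unlimited data.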
Work so far has shown that it is particularly effective for NLP; the type of networks used, transformers, have a training phase that uses self-supervised learning.
The trouble from a hardware perspective is that transformer networks for NLP can be enormous: The biggest ones today have 5 billion parameters and are growing fast, said LeCun. The networks are so big that they don’t fit into GPU memories and have to be broken into pieces.
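In its simplest form, “breaking a model into pieces” can look like the sketch below, which assumes two GPUs are visible as cuda:0 and cuda:1 and uses placeholder layer sizes; production systems rely on far more elaborate pipeline and tensor parallelism.

import torch
import torch.nn as nn

# Two pieces of one model, each placed on its own GPU (cuda:0 and cuda:1 assumed)
part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
part2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

def forward(x):
    h = part1(x.to("cuda:0"))      # first piece runs on GPU 0
    return part2(h.to("cuda:1"))   # activations are copied to GPU 1 for the second piece

out = forward(torch.randn(1, 4096))  # the pattern extends to models no single GPU can hold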
“Self-supervised learning is the future — there is no question [about that],” he said. “But this is a challenge for the hardware community because the memory requirements are absolutely gigantic. Because these systems are trained with unlabeled data, which is abundant, we can train very large networks in terms of data. Hardware requirements for the final system will be much, much bigger than they currently are. The hardware race will not stop any time soon.”

HARDWARE TRENDS
New hardware ideas that use techniques such as analog computing, spintronics, and optical systems are on LeCun’s radar. He cited communication difficulties — problems converting signals between novel hardware and the rest of the required computing infrastructure — as a big drawback. Analog implementations, he said, rely on making activations extremely sparse in order to gain advantages in energy consumption, and he questioned whether this will always be possible.
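The sparsity being relied on is easy to see in an ordinary network; the toy measurement below (an illustrative sketch, not tied to any particular analog design) shows that a ReLU already leaves a large share of activation values at exactly zero, which is the work such schemes try to avoid computing or moving.

import torch
import torch.nn as nn

layer = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
acts = layer(torch.randn(64, 512))            # activations for a random batch

sparsity = (acts == 0).float().mean().item()  # fraction of exact zeros after ReLU
print(f"about {sparsity:.0%} of activations are zero")   # roughly half here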
LeCun described himself as “skeptical” of futuristic new approaches such as spiking neural networks and neuromorphic computing in general. There is a need to prove that the algorithms work before building chips for them, he said.
“Driving the design of such systems through hardware, hoping that someone will come up with an algorithm that will use this hardware, is probably not a good idea,” LeCun said. ■

Sally Ward-Foxton is a staff correspondent at AspenCore.
A Neural-Network Processing Timeline

Late 1980s: Resistor arrays are used to do matrix multiplication. By the late 1980s, the arrays have gained amplifiers and converters around them but are still quite primitive by today’s standards. The limitation is how fast data can be fed into the chip.

1991: The first chip designed for convolutional neural networks (CNNs) is built. The chip is capable of […] giga-operations per second on binary data, with digital shift registers that minimize the amount of external traffic needed to perform a convolution, thereby speeding up operation. The chip does not see use beyond academia.

1992: ANNA, an analog neural network AI chip, debuts. Designed for CNNs, with 6-bit weights and 3-bit activations, ANNA contains 180,000 transistors in 0.9-µm CMOS. It is used for optical character recognition of handwritten text.

1996: A digital version of ANNA is released. But with neural networks falling out of favor by the mid-1990s, the chip is eventually repurposed for signal processing in cellphone towers.

2009–2010: Researchers demonstrate a hardware neural network accelerator on an FPGA, the Xilinx Virtex. It runs a demo of semantic segmentation for automated driving and is capable of […] at about […]. The team from Purdue University tries to make an ASIC based on this work, but the project proves unsuccessful.

Source: Yann LeCun, Facebook

