Page 30 - EE Times Europe Magazine – June 2024
AUTONOMOUS VEHICLES | ARTIFICIAL INTELLIGENCE
Overcoming Unbalanced Training Data for Safer Autonomous Driving
By Pat Brans
According to a Sweden-based expert on autonomous systems, self-driving cars won’t be available for at least another 10 years, and part of the holdup is data.

“Many of the remaining challenges standing in the way of fully autonomous vehicles have to do with the quality of data used to train the neural networks that control a vehicle,” said Michael Felsberg, full professor and head of the Computer Vision Laboratory at Sweden’s Linköping University. To ensure that AVs react appropriately to real-world road conditions and events, researchers are working on ways to fill the gaps in training data and correct for biases in the datasets, he told EE Times Europe.

Felsberg serves as a member of the Wallenberg AI, Autonomous Systems and Software Program (WASP) executive committee, representing Linköping University. He also collaborates with industrial players, including car and truck manufacturers and companies that produce support systems for vehicles.

Recording the data needed for training is expensive, and so is labeling it, largely because the task still requires human intervention. According to Felsberg, to bring down costs, the role of humans in the labeling process must be minimized through some form of weakly supervised learning whereby labels are assigned automatically or at least semi-automatically. But while many academic researchers and industrial players have been experimenting with weakly supervised learning, thus far, none of the methods are ready for widescale use.

The monetary cost of collecting and labeling data is not the only issue. An even bigger obstacle to reliable self-driving cars has to do with ensuring that the data used to train the neural networks will get the vehicle to do the right things. Not only is it impossible for AV manufacturers to collect enough data to cover all conceivable situations, but the data they do collect is likely to include biases that, if left uncorrected, can produce undesired behavior.

COMPENSATING FOR UNDERREPRESENTED SCENARIOS

Many of the most dangerous driving situations involve circumstances so rare that they are unlikely to be fully represented in real-world training data. No AV developer can expect to record enough real-world images and videos to cover all the different ways a pedestrian might run out in front of a vehicle, for example.

“One strategy for dealing with underrepresented data is to augment the recorded data with slight changes, such as geometric or chromatic variations, [made] directly in the training data,” Felsberg said. “But I would go so far as to say that it’s better to adjust the bias of the classifiers that result from the training than to apply fake data to the training process.”

Some people have floated the idea of using generative AI to produce supplemental training data for scenarios that are not close enough to the real data to be represented by slight manipulations. Felsberg thinks this approach would be catastrophic, however, given the propensity of generative AI to create absurd representations of the real world. When unrealistic data is used to train an autonomous system, the resulting network becomes unpredictable.

[Photo: Wallenberg AI’s Michael Felsberg]

A better method, according to Felsberg, would be to move toward explainable AI: internal representations that reflect real scenarios by combining expert models (constructed from what human experts think are real-world scenarios) with machine-learned models (constructed from data collected in the field). Instead of altering the data used to train the model, a highly skilled technician could analyze and possibly alter the internal representation directly as part of a new step in the training process.

“If you understand the scenarios and have the right tools to manipulate the model’s internal representations of external objects, you have a more powerful way of influencing the outcome,” Felsberg said. “You could place cars in slightly different positions [than in] the real data or make them drive in a different direction than they did in the real data, to model situations that never occurred in the dataset. Prototypes of this kind of system already exist, and we are starting to participate in that kind of research with our industrial partner Zenseact.”

Felsberg continued, “The combination of model-based knowledge and data-driven knowledge is a hybrid learning approach that is very popular in many domains where predictions are required, for example, making climate predictions. But we don’t yet know how to apply this concept to autonomous vehicles. We have made some first attempts, where we demonstrated the feasibility, but there’s still a lot of research to be done. Some people argue that large language models and large multimodal models will be a big help in manipulating internal representations, and we are certainly considering that option.”

CORRECTING UNBALANCED DATASETS

Aside from filling in the gaps in training data, a remaining challenge lies in minimizing biases or, more accurately, adjusting the biases in ways that produce desired outcomes. “Some cases are much more common than others, and you would like to have control over the bias induced by this effect,” Felsberg said. “That requires methods and tools for adjusting biases in a trained model.”

When training data is collected for pedestrians, for example, children or wheelchair users might be insufficiently accounted for because they are encountered less frequently than pedestrians in other categories. To ensure that an AV responds appropriately to all pedestrians, as any reasonable person would expect it to do, the system has to overcome skews in the data that might otherwise lead to skews in object recognition. “It’s impossible to have fully balanced datasets for whatever you want to do,” Felsberg said. “But you can measure the unbalance and have your system adjust accordingly.”

Felsberg has been asserting that self-driving cars are at least 10 years out ever since he began working on the technology in 2007, and it remains his prediction today. But when the day does come for AVs to be sold or rented to the general public, manufacturers should be required to demonstrate that their self-driving cars have overcome potential biases that result from unbalanced data, he said.

Felsberg proposes amending Euro NCAP (a safety rating system to help consumers select cars based on their reactions to a set of real-life accident scenarios) to include tests designed to do just that. “Those changes should be made now,” he said. ■

Pat Brans is a contributing writer for EE Times Europe.
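
Felsberg's remark that one can "measure the unbalance and have your system adjust accordingly" can be illustrated with a minimal sketch. The article does not describe his actual method, so the code below is only an assumption: it uses a common prior-correction idea, in which the log of each class's training frequency is subtracted from the classifier's raw score before the scores are normalized. The class names, label counts, scores, and the `strength` parameter are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def class_priors(labels):
    """Empirical class frequencies in the training labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(priors):
    """Frequency of the most common class divided by that of the rarest."""
    return max(priors.values()) / min(priors.values())


def softmax(scores):
    """Numerically stable softmax over a dict of class -> score."""
    top = max(scores.values())
    exps = {cls: math.exp(s - top) for cls, s in scores.items()}
    norm = sum(exps.values())
    return {cls: e / norm for cls, e in exps.items()}


def adjust_for_priors(logits, priors, strength=1.0):
    """Subtract strength * log(prior) from each raw score, then normalize.

    strength=0.0 leaves the classifier unchanged; strength=1.0 removes the
    training-set prior, as if all classes had been equally frequent.
    """
    shifted = {
        cls: z - strength * math.log(priors[cls]) for cls, z in logits.items()
    }
    return softmax(shifted)


# Hypothetical label counts, invented for illustration only.
labels = ["adult"] * 900 + ["child"] * 80 + ["wheelchair_user"] * 20
priors = class_priors(labels)
ratio = imbalance_ratio(priors)

# Hypothetical raw scores from a trained classifier for one detection.
logits = {"adult": 2.0, "child": 1.5, "wheelchair_user": 0.5}
before = softmax(logits)
after = adjust_for_priors(logits, priors)

print(f"imbalance ratio: {ratio:.1f}")
print("most likely before:", max(before, key=before.get))
print("most likely after: ", max(after, key=after.get))
```

In this sketch the `strength` value plays the role of the "control over the bias" that Felsberg describes: intermediate values between 0 and 1 shift the classifier only part of the way toward treating all pedestrian categories as equally likely. How such a correction should be tuned and validated for a real vehicle is beyond what the article states.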