Performance-Regression Pitfalls Every Project Should Avoid
We devised a solution by creating an extension to the open-source Phoronix Test Suite called Phoronix Test Extensions. These are clearly enumerated and identified Phoronix-compatible tests that never change, can easily be communicated, can be stored in a database, and present output in a standardized format for easy and uniform processing. This type of approach streamlines the process and dramatically improves the quality and reliability of results.

For example, the above FIO command line might be packaged in a Phoronix-compatible test, called ptx-io-fio-randread-4k-libaio-iod256-000001, that gets codified in a source code repository from which it can be referenced and run. Because the test is fully compatible with the Phoronix test runner, it can be run anywhere Phoronix runs, making it extremely portable and flexible. It also outputs a standard composite.xml results format, as defined in the Phoronix test runner, making the results of any test in the library uniform and parsable.
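Because every test in the library emits the same composite.xml format, a single small parser can serve the entire suite. The Python sketch below illustrates the idea; the element names reflect a simplified view of the format and should be checked against actual Phoronix Test Suite output before use.

```python
# Minimal sketch: pull uniform results out of a Phoronix composite.xml file.
# The element names below follow a simplified view of the format; treat them
# as illustrative rather than a complete schema.
import xml.etree.ElementTree as ET

def load_results(path):
    """Return {test_identifier: [values]} from a composite.xml results file."""
    root = ET.parse(path).getroot()
    results = {}
    for result in root.iter("Result"):
        identifier = result.findtext("Identifier", default="unknown")
        values = []
        for entry in result.iter("Entry"):
            raw = entry.findtext("Value")
            if raw:                      # some entries may carry no numeric value
                values.append(float(raw))
        results[identifier] = values
    return results

# One parser serves every test in the library, because the format never varies:
# for name, values in load_results("composite.xml").items():
#     print(name, values)
```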

Figure 4: Plots of measured versus expected performance demonstrate how a significant regression (15%, bottom) that remains above a baseline could be hidden.

Don’t miss mild/moderate performance changes.
Another trap that can be overlooked when dealing with continuous performance-regression activities is the reality that people are often working on performance improvements. This is especially true in silicon development, where performance is one of the highest priorities. This means that a baseline for performance regression needs to shift as work is done in the software stack or ecosystem.

Imagine that you collect a load of data for a workload and have a high level of confidence in a baseline result. Over the course of the year, your teams push performance gradually higher. This is objectively good news, but the potential pitfall is that it creates a gap that can hide regressions that remain above the baseline (see Figure 4).

The red line in the figure signifies the performance-regression baseline (failure criteria) set at the beginning of the project. The top chart shows a significant incremental performance improvement throughout the first year of development (into December). The bottom chart shows a large performance regression in January of the new year. The baseline criteria will not flag this as a regression, however, because the criteria do not account for the incremental performance improvements over the year of development.

Manually adjusting performance baseline criteria would be costly and error-prone. Our in-house system automatically adjusts baselines based on every result collected. The more test results it collects, the smarter the system becomes. A sketch of the idea appears below.
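The in-house system the author describes is not public, but the core idea can be approximated simply: recompute the failure criterion from a trailing window of recent accepted results, so the baseline climbs as the product improves. The sketch below is a hypothetical illustration; the window size and tolerance are arbitrary placeholders.

```python
# Hypothetical sketch of a self-adjusting regression baseline. This is not
# the author's in-house system, just one simple way to get a similar effect:
# recompute the failure threshold from a trailing window of recent results.
from collections import deque
from statistics import mean, stdev

class RollingBaseline:
    def __init__(self, window=30, tolerance_sigma=3.0):
        self.history = deque(maxlen=window)   # most recent accepted results
        self.tolerance_sigma = tolerance_sigma

    def check(self, value):
        """Return True if value passes against the current (moving) baseline."""
        if len(self.history) >= 5:            # wait for a few points before judging
            floor = mean(self.history) - self.tolerance_sigma * stdev(self.history)
            if value < floor:
                return False                  # regression vs. *recent* performance
        self.history.append(value)            # accepted results raise the baseline
        return True
```

A windowed baseline like this tracks gradual improvement, so a January result that still clears last February's fixed criterion, but sits 15% below December's level, gets flagged instead of hidden.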
Remember, the test process can affect system performance.
It’s an unfortunate reality of performance testing that the measurement process itself can affect the results. Capturing system data such as clock frequencies, active processes, and CPU utilization can eat up system resources, reducing workload performance in some (but not all) configurations. For the unwary, this can lead to time wasted chasing phantom regressions.

The solution is to abstract the hardware-monitoring process from the performance-measurement process. For example, you could do four test runs for each configuration and use the first three datasets in the performance-regression analysis. The fourth run would measure hardware behavior; its results would be used strictly to provide system measurement information and would not enter the regression analysis. A sketch of that split follows.
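To make the separation concrete, here is a minimal sketch of the four-run pattern. run_workload and run_with_monitoring are hypothetical stand-ins for whatever a real harness actually invokes; the point is only that monitored data never mixes with scored data.

```python
# Hypothetical harness loop separating measurement runs from monitoring runs.
# run_workload() and run_with_monitoring() are stand-ins for a real harness.

def collect(configuration, run_workload, run_with_monitoring):
    """Four runs per configuration: three scored, one for diagnostics only."""
    # Runs 1-3: clean measurements, with no monitors competing for resources.
    scored_runs = [run_workload(configuration) for _ in range(3)]

    # Run 4: same workload with clock/process/CPU sampling enabled. Its data
    # is kept for diagnosis only and never enters the regression analysis.
    diagnostics = run_with_monitoring(configuration)

    return {"regression_data": scored_runs, "system_info": diagnostics}
```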
Develop effective, standardized reporting.
The best test infrastructure is useless if the results are not presented in an actionable way. Poor data-science practices can easily misrepresent performance or obscure patterns. Data plots with inconsistent and non-zero scales can prevent easy comparison. Showing single-run changes without also showing variance can be problematic or misleading. Some tests are hyper-consistent, so a delta of 1% is huge; for others, ±2% would be a normal intra-run deviation. Data presentation must make those differences easy to detect in context (one approach is sketched below).
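One way to give a standardized report that context is to express each change in units of the test's own historical noise rather than as a raw percentage. The helper below is a hypothetical illustration of that idea, not a prescribed report format.

```python
# Hypothetical helper for a standardized report: score a change in units of
# the test's own run-to-run noise, so a 1% drop on a hyper-consistent test
# stands out while the same 1% wobble on a noisy test does not.
from statistics import mean, stdev

def noise_adjusted_delta(history, new_value):
    """Return (percent_change, sigmas_from_mean) for one test's new result."""
    baseline = mean(history)
    noise = stdev(history) or 1e-9            # guard against zero variance
    percent = 100.0 * (new_value - baseline) / baseline
    sigmas = (new_value - baseline) / noise
    return percent, sigmas

# The same 1% drop is many sigma on a quiet test...
print(noise_adjusted_delta([100.0, 100.1, 99.9, 100.0], 99.0))
# ...but well within normal deviation on a noisy one.
print(noise_adjusted_delta([98.0, 103.0, 97.0, 102.0], 99.0))
```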
The sheer volume of data produced by continuous performance-regression testing demands an easy format for visualizing results. We suggest a standardized performance-regression report that everyone consumes. This centralizes data-science best practices and creates a consistent visual language that everyone can become familiar with. Data that isn’t actionable isn’t worth looking at.
CONCLUSION
Continuous performance-regression testing is well-known among software developers, especially those in the web development domains. It can also be a powerful tool for hardware or lower-level software projects. Most of the modern development practices that software developers have embraced as mainstays are not widely practiced in hardware development.

Test results are only as good as the planning, procedures, and execution of the tests themselves. Applying the techniques I’ve described will enable you to remain alert to potential pitfalls. ■

Travis Lazar is a senior staff engineer and team lead for Software Release, Continuous Performance Regression, DevOps, and Strategic Software Projects at Ampere Computing.

