Bugs will fall. Differential testing.

Code written by people has proven to contain flaws even after extensive testing. Deadlines, frequent context switches, and short Time-To-Market (TTM) are only some of the reasons why developers often leave bugs in their code. Finding bugs, let alone vulnerabilities, is not a trivial task. It requires skill, time, and a deep understanding of the program’s flow and architecture. Is there anything we can do to spend less time and fewer resources and still find more bugs? One method that makes the bug search easier is called differential testing.

Differential testing was first introduced in 1998 [1]. The technique aims to overcome the ugliest problem in testing: evaluating the test results (also known as the oracle problem). Even if thousands of tests are written, one issue remains: quality. Are the tests useful? Do they actually test what we need to test? What if the tester made a mistake during testing? Writing good tests also requires deep knowledge of the system. Differential testing addresses this problem: thousands of tests can be written without working out the expected results beforehand.
How is this possible, and how does it work? If the same test is fed to several comparable programs (e.g., several compilers) and one program gives a different result, a bug may have just been exposed. There is no need to think of expected results. The other program (a second implementation) acts as an oracle for the first one, and vice versa. It becomes feasible to generate millions of tests, where even a few differences can yield a substantial stream of detected bugs.
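
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the cited work: the two integer square root functions stand in for two independent implementations of the same specification, and the names isqrt_exact, isqrt_float, and differential_test are hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

def isqrt_exact(n: int) -> int:
    """First implementation: exact integer arithmetic."""
    return math.isqrt(n)

def isqrt_float(n: int) -> int:
    """Second implementation: goes through floating point (assumed buggy for large n)."""
    return int(math.sqrt(n))

# Hypothetical registry of comparable programs under test.
IMPLEMENTATIONS: Dict[str, Callable[[int], int]] = {
    "exact": isqrt_exact,
    "float": isqrt_float,
}

def differential_test(
    implementations: Dict[str, Callable[[int], int]],
    inputs: Iterable[int],
) -> List[Tuple[int, Dict[str, int]]]:
    """Feed every input to every implementation and report disagreements.

    No expected result is supplied: the implementations act as oracles
    for one another.
    """
    disagreements = []
    for value in inputs:
        results = {name: impl(value) for name, impl in implementations.items()}
        if len(set(results.values())) > 1:
            disagreements.append((value, results))
    return disagreements
```

Calling differential_test(IMPLEMENTATIONS, [2**54 - 1]) reports one disagreement: the exact version returns 134217727 and the float-based version returns 134217728. The harness only says that the two differ; deciding which implementation is wrong is still up to the engineer.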

C compilers, TLS, and Ethereum nodes all have several implementations. Ethereum nodes, for example, exist in C++, Rust, Golang, Java, and several other languages. If one node behaves differently for the same input (a transaction, in this case), there is likely a bug in at least one of the implementations. The Ethereum environment is well suited to differential testing, not least because the continuous flow of transactions amounts to constant testing.

As a former senior protocol engineer at Waves [2] (a proof-of-stake blockchain), I have seen first-hand how many bugs this approach finds. Around 30% of the time that the Golang node and the Scala node behaved differently for the same transactions, there was either a bug or a vulnerability.

Gulzar et al. [3] evaluated the technique by collecting information about its effectiveness from engineering teams at Google, which adopted differential testing across all major areas. Despite the technique’s strong results, one major problem was the quality of the input data: the tester who adopts this technique must come up with inputs that cover as many different paths in the program as possible. If the first step in differential testing was random input generation (also known as “fuzzing”) (Figure 1), “smart” inputs are the second step, advancing differential testing design so that tests are more likely to trigger vulnerable code.
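
The difference between the two steps can be sketched as follows. This is a toy example under assumptions: boundary-value generation is used here as one simple form of “smart” input, and the function names (random_inputs, boundary_inputs, disagreements) as well as the two square root implementations are hypothetical stand-ins for real programs under test.

```python
import math
import random
from typing import Iterable, Iterator, List

def isqrt_exact(n: int) -> int:
    """Stand-in implementation A: exact integer arithmetic."""
    return math.isqrt(n)

def isqrt_float(n: int) -> int:
    """Stand-in implementation B: floating point (assumed buggy for large n)."""
    return int(math.sqrt(n))

def random_inputs(count: int, seed: int, upper: int = 10**6) -> Iterator[int]:
    """Step one: plain random input generation over a fixed range."""
    rng = random.Random(seed)  # seeded so that a failing run can be replayed
    for _ in range(count):
        yield rng.randint(0, upper)

def boundary_inputs(max_exponent: int = 40) -> Iterator[int]:
    """Step two: inputs aimed at suspicious paths, here values at and just
    below large perfect squares, where rounding errors are most likely."""
    for exponent in range(1, max_exponent + 1):
        k = 2 ** exponent
        yield k * k - 1
        yield k * k

def disagreements(inputs: Iterable[int]) -> List[int]:
    """Return the inputs on which the two implementations differ."""
    return [n for n in inputs if isqrt_exact(n) != isqrt_float(n)]
```

In this sketch, a thousand random inputs below one million expose nothing, while the much smaller set of boundary inputs includes values such as 2**54 - 1 on which the implementations disagree.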

Figure 1. The old differential testing approach

The tradeoff, however, is that if no comparable program exists, one has to be implemented. In addition, depending on the system’s requirements, not all output differences are bugs. Nevertheless, in large financial systems, this cost is minor compared to the potential financial damage of an undetected bug.
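
One common way to handle harmless differences is to normalize outputs before comparing them. The sketch below assumes a made-up result format: the field names (status, state_root, error) and the helper names are hypothetical and do not describe any real node’s API.

```python
from typing import Any, Dict, Tuple

# Hypothetical: the fields on which all implementations are required to agree.
CONSENSUS_FIELDS: Tuple[str, ...] = ("status", "state_root")

def normalize(result: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields that the system's requirements say must match."""
    return {field: result.get(field) for field in CONSENSUS_FIELDS}

def is_real_divergence(result_a: Dict[str, Any], result_b: Dict[str, Any]) -> bool:
    """True only if the implementations differ on a field that matters."""
    return normalize(result_a) != normalize(result_b)

# Same verdict, different wording of the error message: not a bug.
go_result = {"status": "rejected", "state_root": "abc", "error": "insufficient funds"}
scala_result = {"status": "rejected", "state_root": "abc", "error": "Not enough balance"}

# Different verdict for the same transaction: worth investigating.
scala_accepts = {"status": "accepted", "state_root": "def", "error": None}
```

Here is_real_divergence(go_result, scala_result) is False, while is_real_divergence(go_result, scala_accepts) is True. Which fields belong in the comparison is a requirements decision and has to be made per system.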

[1] William M. McKeeman. 1998. Differential testing for software. Digital Technical Journal 10, 1 (1998), 100–107.
[2] https://docs.waves.tech/en/blockchain/
[3] Muhammad Ali Gulzar, Yongkang Zhu, and Xiaofeng Han. 2019. Perception and practices of differential testing. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 71–80.

Alexandr Dolgavin

Security auditor