Google Developing "SiliFuzz" For Fuzzing CPUs To Uncover Electrical Defects
Between OSS-Fuzz for continuous fuzzing of open-source projects and its work on the various compiler sanitizers, Google has been doing a lot to proactively uncover software defects in key open-source projects. Now a group of its engineers has been working on SiliFuzz, software aimed at discovering new CPU defects.
SiliFuzz works by fuzzing software proxies such as CPU simulators and disassemblers. The test inputs accumulated from fuzzing those simulators/disassemblers are then run on various CPUs at "a large scale" to try to uncover defects. The current focus is on finding electrical defects (as opposed to logic bugs in the design) in processor cores across Google's massive server fleet.
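To make the "fuzzing by proxy" idea more concrete, here is a toy Python sketch of the general pattern: a software proxy (here a hypothetical mini-simulator for a made-up four-instruction ISA, not x86-64) computes the expected end state of randomly generated test programs, and that corpus is then replayed on "machines" whose results are compared against the proxy. The ISA, the functions `reference_execute`, `make_core`, `generate_corpus` and `screen`, and the injected fault are all illustrative assumptions; this is not SiliFuzz's actual implementation or test format.

```python
import random

MASK = (1 << 64) - 1           # toy model: a single 64-bit register
OPS = ("add", "xor", "mul", "rol")  # hypothetical mini-ISA, not x86-64


def reference_execute(program, reg):
    """Hypothetical software proxy: a trusted simulator that defines the expected end state."""
    for op, imm in program:
        if op == "add":
            reg = (reg + imm) & MASK
        elif op == "xor":
            reg ^= imm
        elif op == "mul":
            reg = (reg * imm) & MASK
        elif op == "rol":
            s = imm % 64
            reg = ((reg << s) | (reg >> (64 - s))) & MASK
    return reg


def make_core(defective):
    """Model one machine's CPU core; a defective core carries an injected, data-dependent fault."""
    def execute(program, reg):
        for op, imm in program:
            reg = reference_execute([(op, imm)], reg)
            if defective and op == "mul" and reg >> 63:
                reg ^= 1  # modeled electrical defect: low bit flips when the product's top bit is set
        return reg
    return execute


def generate_corpus(n, seed=0):
    """Fuzz step: random programs, each paired with the proxy's expected end state."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(n):
        program = [(rng.choice(OPS), rng.getrandbits(64)) for _ in range(rng.randint(1, 8))]
        init = rng.getrandbits(64)
        corpus.append((program, init, reference_execute(program, init)))
    return corpus


def screen(core, corpus):
    """Replay the corpus on a core; return indices whose result differs from the proxy."""
    return [i for i, (program, init, expected) in enumerate(corpus)
            if core(program, init) != expected]


if __name__ == "__main__":
    corpus = generate_corpus(1000)
    fleet = {"machine-a": make_core(False), "machine-b": make_core(True)}
    for name, core in fleet.items():
        print(name, "mismatching test cases:", len(screen(core, corpus)))
```

The point of the pattern is that the expected results come from software rather than from the hardware design, so screening does not need the RTL. In this toy, the defective machine is flagged only by the test cases that happen to exercise its modeled fault, which is why a large and varied corpus matters.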
According to a whitepaper published yesterday, SiliFuzz focuses on x86_64 CPUs "where we do not have the RTL design", and in particular on electrical defects, whether present from the start or caused by physical wear-out of individual chips, that could lead to silent data corruption.
This "fuzzing by proxy" approach makes SiliFuzz an entirely user-space software solution. Using SiliFuzz, the engineers were able to find multiple faulty machines across Google's massive production fleet. About 45% of the defects SiliFuzz found were unique, not previously identified by any other tool or automation available to Google.
Going forward, Google plans to work on scaling SiliFuzz, increasing the rate at which it can find electrical defects, further enhancing the automation, and improving the quality of the work.
The SiliFuzz whitepaper concludes with, "We have detected a large number of defects, analyzed four of them in detail, and analyzed common patterns among the others. We expect this and similar technologies to be in widespread use in the coming years since CPU defects are here to stay."