How We Are Using Btrfs To Find Regressions Incredibly Fast
In previous articles I have hinted that at Phoronix we are working to take advantage of the Btrfs file-system within the Phoronix Test Suite and Phoromatic to provide an interesting feature that will further expand our automated testing capabilities, but how does this file-system come into play? Well, here is what's being worked on and it should be of terrific value to many people.
One of the features of Btrfs not found in other Linux file-systems (like EXT4) is support for copy-on-write snapshots / sub-volumes. With this snapshotting support comes the ability to mount different snapshots very easily, and since they are copy-on-write, the disk storage requirements are not extreme. Red Hat is taking advantage of Btrfs snapshots to provide Linux system rollback support: a Btrfs snapshot is created before a yum/RPM transaction takes place, and if the installed package(s) wreak havoc on the system or something goes awry, you can simply choose an earlier snapshot to set your system's software back to an earlier state following a system reboot. We, though, are leveraging Btrfs snapshots in a different manner.
First off, for those not familiar with Phoromatic, it is a Phoronix Test Suite component that provides a remote management system and "allows the automatic scheduling of tests, remote installation of new tests, and the management of multiple test systems all through an intuitive, easy-to-use web interface. Tests can be scheduled to automatically run on a routine basis across multiple test systems. The test results are then available from this central, secure location." Part of Phoromatic is the Phoromatic Tracker, which uses Phoromatic to track the performance of a software component over the course of time in hopes of spotting any regressions and providing better analysis of how code changes are affecting the overall performance. Phoromatic / PTS / Phoromatic Tracker can also be used for functional and unit testing.
We have provided public implementations of Phoromatic Tracker for tracking the performance of the mainline Linux kernel and bleeding edge Ubuntu packages on a timed daily basis, but Phoromatic Tracker can also test software changes on a per-commit basis or when triggered by an external script. Our automated software testing stack has found performance regressions in the past with these trackers, but for trackers running on a timed or otherwise non-per-commit basis, we, the developers, or third-parties still need to go back and find the commit that caused the problem. Within the Phoronix Test Suite we already have a module for autonomously bisecting regressions by leveraging Git, which we have used already to locate regressions within the Linux kernel with next to no manual intervention, but by having Phoromatic/PTS play with Btrfs, we are working on something much greater.
Coming to Phoromatic will be the ability for the Phoronix Test Suite to automatically create a Btrfs snapshot on each of the test nodes whenever a test is to be run. That applies whether the tests run on a daily basis, as with some of the trackers, or on a per-commit basis, as with some of the future public trackers and the private Phoromatic deployments used by some organizations. For example, with our Ubuntu Tracker this will leave us with a Btrfs snapshot of the latest Ubuntu packages from each day, and for the Linux kernel this will leave us with the kernel from each respective day. Since the snapshots are copy-on-write, the disk requirements are not too big and thus quite a number of snapshots can be kept around locally.
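To make the per-run snapshot idea concrete, here is a minimal Python sketch of how a test node might take a read-only snapshot before each run. This is not Phoromatic's actual implementation; the function names, the "/.snapshots" directory, and the date-based label are all assumptions made for illustration. The only real interface used is the "btrfs subvolume snapshot" command with its -r (read-only) flag.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List


def snapshot_command(source: str, snapshot_dir: str, label: str) -> List[str]:
    """Build the btrfs command that snapshots `source` under `snapshot_dir`.

    `label` is a hypothetical naming scheme (a date or a commit ID).
    """
    dest = str(PurePosixPath(snapshot_dir) / label)
    # -r makes the snapshot read-only so a later test run cannot alter it.
    return ["btrfs", "subvolume", "snapshot", "-r", source, dest]


def create_snapshot(source: str, snapshot_dir: str, label: str) -> str:
    """Run the snapshot command (needs root and a Btrfs file-system)."""
    cmd = snapshot_command(source, snapshot_dir, label)
    subprocess.run(cmd, check=True)
    return cmd[-1]


if __name__ == "__main__":
    # Print the command only; nothing is executed against the disk here.
    print(" ".join(snapshot_command("/", "/.snapshots", "2010-06-01")))
```

Building the command separately from running it keeps the naming logic testable on a machine without Btrfs.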
In other words, users will be able to access their tracked component(s) from any state in the past. Unlike using git-bisect, which relies upon the package source and means rebuilding and reinstalling the software, or even just reinstalling Deb/RPM packages, this approach is just a matter of remounting the Btrfs partition and, in some cases, a reboot. This makes operations like automatically bisecting regressions with a binary search far faster, and also more reliable, since you are using the software in exactly the same state it was left in when the tests were run on a given date/commit.
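The binary search over snapshots can be sketched as follows. This is a hypothetical Python illustration rather than the Phoronix Test Suite's bisection module: the function name and the is_regressed callback (which would stand in for "mount this snapshot, run the test, compare the result") are assumptions. It also assumes the oldest snapshot is known good, the newest is known bad, and the regression persists once introduced.

```python
from typing import Callable, Sequence


def first_regressed(snapshots: Sequence[str],
                    is_regressed: Callable[[str], bool]) -> str:
    """Return the earliest snapshot that shows the regression.

    `snapshots` must be ordered oldest to newest, with snapshots[0]
    known good and snapshots[-1] known bad.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least one good and one bad snapshot")
    good, bad = 0, len(snapshots) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # In practice this step would mount the snapshot and run the test.
        if is_regressed(snapshots[mid]):
            bad = mid
        else:
            good = mid
    return snapshots[bad]


if __name__ == "__main__":
    days = [f"day-{n}" for n in range(1, 9)]
    # Pretend the regression first landed in the day-6 snapshot.
    print(first_regressed(days, lambda s: int(s.split("-")[1]) >= 6))
```

With N snapshots this needs roughly log2(N) test runs, and each one costs a remount (or reboot) rather than a rebuild.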
From the Phoromatic web-interface we will additionally be exposing a new set of options for when a regression is spotted in the tracked component. When there is "downtime" on a given test node (i.e. there are no scheduled tests at the moment) and a regression was recently spotted, whether minutes or days ago, the system can be temporarily rolled back to that spot to run an expanded set of tests. For example, with the Linux kernel tracker we could be running just a few CPU tests as part of the scheduled test execution queue, but when a performance drop is spotted in one of the CPU tests, the next time that test node is free it could roll back the system to the regressed spot and automatically run a greater number of CPU tests. The hope is to provide the developer with more information and numbers regarding the regression, to better narrow down the scope of the problem or to quantify the repercussions of an intended code change. As another example, when a graphics regression is spotted, the Phoromatic-controlled systems could then go back and run more graphics tests or run the same graphics test(s) with different resolutions and/or other test options. This saves having to run a huge number of tests every day or with every commit, while making testing much smarter when a potential problem is detected.
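The idle-time decision described above could look something like this Python sketch. Everything in it is hypothetical: the Regression record, the plan_idle_work function, and the suite and test names are invented for illustration and do not reflect Phoromatic's real data model or options.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Regression:
    snapshot: str   # snapshot in which the regression was first seen
    category: str   # e.g. "cpu" or "graphics"


def plan_idle_work(scheduled: List[str],
                   pending: List[Regression],
                   expanded: Dict[str, List[str]]
                   ) -> Optional[Tuple[str, List[str]]]:
    """Decide what an idle node should do, if anything.

    Returns (snapshot to roll back to, extra tests to run), or None when
    the node is busy or there is no regression left to investigate.
    """
    if scheduled or not pending:
        return None
    regression = pending[0]  # oldest outstanding regression first
    return regression.snapshot, expanded.get(regression.category, [])


if __name__ == "__main__":
    suites = {"cpu": ["cpu-test-a", "cpu-test-b"], "graphics": ["gfx-test-a"]}
    found = [Regression("2010-06-01", "cpu")]
    print(plan_idle_work([], found, suites))
```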
But there's still more...