The rolling-ness also makes it difficult to reproduce benchmark results and to pinpoint why something changed (did the benchmark result for some software...