Phoronix does not benchmark <insert game here> because the game does not have a benchmark mode on Linux. Phoronix does benchmark Unigine, which is a modern game engine. So improvements in Unigine's performance would mean improvements in Metro: Last Light, TF2, etc.

The older open source games use more basic graphical features, so driver improvements measured in them will also translate to performance improvements in other games. Until the open source drivers catch up performance-wise in these older games, there is plenty of value in benchmarking them. There are plenty of indie games that are on Linux or are coming to it that won't be much more complicated graphically than these open source engines, so knowing how well the hardware can handle them is important.
If you've done any benchmarking yourself, you know this is complete bullshit. A performance improvement or regression seen in one title after a driver or kernel update needn't translate into an improvement or regression in another title. This is a FACT.

Another fact is that people don't care too much about precise reproducibility. They care about getting a rough idea of how a game THEY ACTUALLY PLAY performs on hardware roughly comparable to what they have. Add a disclaimer to the benchmarks saying that they were not automated, though most readers probably wouldn't even care! But just benchmark games people care about!