The only truly real loads are what people run day to day, and day-to-day usage in most cases spans a large set of loads. Mixed-use benchmarking is almost impossible to do well (you can't make everyone happy). That's why tests that serve as proxies for pathological cases are actually quite important for people taking their first steps in filesystem selection.
Under real loads, most filesystems perform reasonably well, that's a fact. The differences show up in data integrity, failure handling, and how the filesystems respond to pathological scenarios. Put differently, you characterize the dimensions of a filesystem, and then you can make some assumptions about how it will react to your real-world load. My real-world load is vastly different from yours.
For instance, the Apache benchmark gives you an indication of bandwidth-bound scenarios, while the SQLite test gives you an indication of sync-bound scenarios. Your real-world scenario is going to be a blend of such workloads.
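To make "sync-bound" concrete, here is a minimal sketch of the write-plus-fsync pattern a database commit exercises — the kind of operation the SQLite test stresses. The function name and parameters are hypothetical, not taken from any benchmark suite; a real tool like fio does this far more rigorously.

```python
import os
import tempfile
import time

def measure_fsync_latency(path, iterations=100, block=b"x" * 4096):
    """Hypothetical helper: time a write-then-fsync loop, the
    pattern that dominates sync-bound workloads such as a
    database committing a transaction."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(iterations):
            os.write(fd, block)
            start = time.perf_counter()
            os.fsync(fd)  # force the data to stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return sum(latencies) / len(latencies)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
avg = measure_fsync_latency(path)
print(f"average fsync latency: {avg * 1e3:.3f} ms")
os.unlink(path)
```

A bandwidth-bound test would instead stream large sequential writes with few or no syncs; the point is that the two numbers characterize different dimensions of the same filesystem.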
Of course, we are assuming that people with a single, narrowly focused scenario can test that scenario themselves. But such scenarios are, by their nature, specific to individuals.