> Benchmarking is hard and a lot of reports are bogus. However they are still very useful for a lot of developers.
In this case we were presented with a benchmark setup that failed to perform the task it supposedly benchmarked. That kind of failure is not hard to avoid, and it makes the benchmark completely useless and misleading.
I don't think this problem can be generalized the way you do here. The usual problem with benchmarks isn't a complete failure to perform the task being benchmarked. It's subtler things, like finding a set of tests that fairly represents what you'd typically use the subjects for, or performing the tasks in idiomatic and optimal ways.