Credibility of benchmarking within virtual machines?
How credible are benchmarks carried out within a virtual machine, as opposed to on real hardware?
Let's dissect a specific situation. Assume we want to benchmark the performance impact of a recent code change. Assume for simplicity that the workload is fully CPU bound (though IO bound and mixed workloads are also of interest). Assume that the machine is running under VirtualBox because it's the best one ;)
Assume that we measured the original code and the new code, and the new code was 5% faster (when benchmarked in virtual machine). Can we safely claim that it will be at least 5% faster on real hardware too?
And the even more important part: assume that the new code is 3% slower. Can we be completely sure that on real hardware it will be 3% slower or less, but definitely not worse than 3%?
UPDATE: what I'm most interested in is your battlefield results. I.e. can you cite a case where code that was 10% slower in a VM performed 5% faster on real iron, or vice versa? Or was it always consistent (i.e. if it's faster/slower in a VM, it's always proportionally faster/slower on a real machine)? Mine are more or less consistent so far; at the very least, always going in the same direction.
If you are comparing results on a VM to results not run on a VM, then no, the results are not credible.
On the other hand, if both tests were run in the same environment, then yes, the results are credible. Both tests will be slower inside a VM, but the difference between them should still be credible.
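To illustrate the "same environment" point, here is a minimal sketch of comparing two code paths under identical conditions. The functions `original_version` and `new_version` are hypothetical stand-ins for your before/after code; the point is taking the minimum over several repeats, since inside a VM the hypervisor can steal time from any single run.

```python
import timeit

# Hypothetical stand-ins for the "original" and "new" code paths;
# substitute the functions you actually want to compare.
def original_version(n=10_000):
    return sum(i * i for i in range(n))

def new_version(n=10_000):
    # Same computation via a closed-form formula, as a stand-in "optimisation".
    return (n - 1) * n * (2 * n - 1) // 6

def measure(fn, repeats=7, number=200):
    # timeit.repeat returns one total time per repeat; the minimum is the
    # least-disturbed run, which matters even more inside a VM.
    return min(timeit.repeat(fn, repeat=repeats, number=number))

t_old = measure(original_version)
t_new = measure(new_version)
print(f"relative change: {(t_new - t_old) / t_old:+.1%}")
```

Whether that relative change carries over to bare metal is exactly the question under discussion; the measurement itself only tells you about the environment it ran in.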
All things considered, using Fair Witness principles, all you can assert is how well the application performs in a VM, because that is what you are actually measuring.
Now, if you wish to extrapolate from what you observe, then, assuming you're running a native VM (as opposed to an emulated one, e.g. PPC on x86), a CPU bound task is a CPU bound task even in a VM, because the CPU is doing most of the heavy lifting.
Arguably there may be some memory management issues that distinguish a VM from a native application, but once the memory is properly mapped, I can't see there being dramatic differences in CPU bound run times between a VM and a native machine.
So, I think it is fair to intuit that a performance change observed between two versions of the application when run in a VM would be similar when run on a native machine, particularly for a CPU heavy application.
However, I don't think you can fairly say that "you know" unless you actually test it yourself in the correct environment.
I don't think there is anything that special about a VM for this. Even on a 'real' machine, you are still running with virtual memory and sharing the CPU(s) with other processes, so similar considerations apply.
The ONLY way to get credible performance results between a testing and production environment is to run IDENTICAL hardware and software. Right down to hardware version and software patch levels.
Otherwise you are pretty much wasting your time.
As an example, some memory sticks perform better than others which could easily account for a 5% throughput difference on otherwise identical boxes.
With regard to software, the VM software will ALWAYS have an impact; and certain operations may be impacted more than others, depending on so many different factors that there is no possible way to compare them.