If you look at the graphs, they corroborate this. The "native" disk never really exceeds 500 to 600 MB/s, which is about as fast as my SSD goes. The hypervisors, however, are reaching multiple GB/s. It must be RAM.
Also, re: "I'm not sure what was actually benchmarked": the benchmarking method is covered at the bottom of the post. I realize it isn't extremely detailed, so if you have any questions, I'd be happy to answer.
A typical thing performance benchmarks do to negate guest OS caching is to process significantly more data than the guest's available RAM. For example, if the guest OS RAM is set to 512 MB, process 10 GB of random data. Of course, then the question is how to get the random data, as you don't want to end up testing the random generator ;) or your host OS caching.
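A rough sketch of that approach in Python (file name, chunk size, and the 20x multiplier are my own choices; 20 times a 512 MB guest gives the 10 GB above). The key point is that the random data is generated once, before the timed pass:

    import os, time

    PATH = "random.bin"                  # hypothetical scratch file
    GUEST_RAM = 512 * 1024 * 1024        # guest capped at 512 MB
    SIZE = 20 * GUEST_RAM                # ~10 GB, far more than fits in any cache
    CHUNK = 8 * 1024 * 1024

    # Generate the random data once, outside the timed run, so the
    # benchmark measures the disk and not the random generator.
    with open(PATH, "wb") as f:
        written = 0
        while written < SIZE:
            f.write(os.urandom(CHUNK))
            written += CHUNK

    # Timed pass: sequential read of the whole file.
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        while f.read(CHUNK):
            pass
    elapsed = time.perf_counter() - start
    print(f"read {SIZE / 2**30:.1f} GiB in {elapsed:.1f} s")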
Another way to make sure you are testing data committed to disk is to include a "shut down guest OS" step and measure the total time until the guest has fully shut down.
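In Vagrant terms that could be as simple as timing `vagrant halt` right after the write benchmark (a sketch; assumes the benchmark just finished inside the guest):

    import subprocess, time

    # Any writes still sitting in guest or hypervisor caches have to be
    # flushed before the VM is fully down, so the shutdown time captures them.
    start = time.perf_counter()
    subprocess.run(["vagrant", "halt"], check=True)
    print(f"guest fully down after {time.perf_counter() - start:.1f} s")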
I know that at least VMware has the ability to turn off disk caching (in Fusion: Settings, Advanced, "Hard Disk Buffering" <- disable). I am not aware of a similar feature in VirtualBox; it might exist, though.
Even though you tested the same guest OS, we don't know whether the virtual hard disk adapters were using the same drivers. Performance differs between IDE/SATA/SCSI drivers: SCSI drivers have queue depths, IDE drivers do not.
Due to page cache usage, it's hard to say what this benchmark is comparing. The I/O pattern seen by the actual disk, shared folder, or NFS may differ between benchmark runs; it all depends on the amount of RAM available, the state of the cache, readahead, write-behind, etc.
Please rerun the benchmark with -I (direct I/O) to get an apples-to-apples comparison.
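For context, direct I/O means opening the file with O_DIRECT so reads bypass the page cache. A minimal Python illustration of what that implies (Linux-only; "testfile" is a placeholder, and O_DIRECT needs block-aligned buffers, hence the anonymous mmap, which is page-aligned):

    import mmap
    import os

    BLOCK = 4096
    fd = os.open("testfile", os.O_RDONLY | os.O_DIRECT)  # Linux-specific flag
    buf = mmap.mmap(-1, BLOCK)   # page-aligned buffer for direct I/O

    total = 0
    while True:
        n = os.readv(fd, [buf])  # each read really goes to the device
        if n == 0:
            break
        total += n
    os.close(fd)
    print(f"read {total} bytes without touching the page cache")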
...
Spoke too soon. A Google search shows there are some methods [1], but their use cases are different.
Just think of the Linux kernel and Linux distributions.
Perhaps this support can be added in a later version of Vagrant.
So I suppose the host system caches the reads. Also, how could it possibly be true that native writes are slower than virtual writes?
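One way that happens: write() returns as soon as the data is sitting in a cache (the guest's, the hypervisor's, or the host's), and only fsync() waits for the device. A quick sketch showing the gap (scratch file name and size are arbitrary):

    import os
    import time

    data = os.urandom(64 * 1024 * 1024)   # 64 MiB, generated up front

    fd = os.open("scratch.bin", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    start = time.perf_counter()
    os.write(fd, data)                    # returns once data is in the page cache
    cached = time.perf_counter() - start
    os.fsync(fd)                          # blocks until it actually hits the disk
    synced = time.perf_counter() - start
    os.close(fd)

    print(f"write() returned after {cached:.3f} s, fsync() after {synced:.3f} s")

A hypervisor that acknowledges the guest's flushes before the data reaches the physical disk will look faster than bare metal on exactly this kind of test.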
It is interesting that sometimes the native filesystem within the virtual machine outperforms the native filesystem on the host machine. This test uses raw read system calls with zero user-space buffering. It is very likely that the hypervisors do buffering for reads from their virtual machines, so they’re seeing better performance from not context switching to the native kernel as much. This theory is further supported by looking at the raw result data for fread benchmarks. In those tests, the native filesystem beats the virtual filesystems every time.
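To make the read()-vs-fread() distinction concrete, here is a small comparison (in Python rather than C; "bigfile" is a placeholder, and the 1 MiB buffer size is arbitrary). Run each pass against a cold cache for a fair comparison:

    import os
    import time

    PATH = "bigfile"   # placeholder for the benchmark's test file
    CHUNK = 4096

    # Raw read(2): one syscall per chunk, no user-space buffering.
    start = time.perf_counter()
    fd = os.open(PATH, os.O_RDONLY)
    while os.read(fd, CHUNK):
        pass
    os.close(fd)
    raw = time.perf_counter() - start

    # Buffered reads (the fread() analogue): the 1 MiB buffer turns
    # many small requests into a few large syscalls.
    start = time.perf_counter()
    with open(PATH, "rb", buffering=2**20) as f:
        while f.read(CHUNK):
            pass
    buffered = time.perf_counter() - start

    print(f"raw read(2): {raw:.3f} s, buffered: {buffered:.3f} s")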
We ended up using the synchronization feature in PyCharm to continually rsync files from the native FS into the VirtualBox instance. Huge perf improvement, but a little more cumbersome for the developers. So far it has been working well; PyCharm's sync feature does what it is supposed to.
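A bare-bones stand-in for that sync loop, if you're not using PyCharm (port 2222 is Vagrant's default forwarded SSH port; the paths and interval here are made up):

    import subprocess
    import time

    # Push local changes into the guest every few seconds, mirroring
    # deletes so the two trees stay identical.
    while True:
        subprocess.run([
            "rsync", "-az", "--delete",
            "-e", "ssh -p 2222",
            "./src/", "vagrant@127.0.0.1:/home/vagrant/src/",
        ], check=True)
        time.sleep(5)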
Thought it might be of interest on HN as well.