> The problem with NFS is that all of those queues are sharing the filesystem, making the filesystem a point of failure and scaling pain.
It is not NFS that is a SPOF, it is a single NFS server that is a SPOF. There exists distributed NFS systems (OneFS, Panasas) that can tolerate the loss of up to N servers before the service gets disrupted.
I suspect distributed NFS won't help here when the problem is that a server gets slow / overloaded. In particular, this setup was actually I/O bound and there wasn't more I/O to be had.
> In particular, this setup was actually I/O bound and there wasn't more I/O to be had.
No more I/O to be had from that particular NFS server. If your mounts are distributed over 4+ servers, then you have potentially 4x the available operations available.