We have a small site hosted on Linode: a single app box and a single db box. Thanks to Datadog I noticed some occasional (seemingly random) slow requests. Digging into it revealed slow responses from the (MySQL) database, and I eventually worked out that they correlated exactly with disk latency spikes.
I then ran `iostat` in a loop to catch the slowdowns as they happened, and I could see `w_await` times of up to 2 seconds! After I contacted support, they moved a few of our noisiest neighbours away, which has reduced the issue a lot.
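For anyone wanting to catch similar spikes, here is a rough sketch of the same idea in Python rather than `iostat` itself. It assumes a Linux box with `/proc/diskstats`, and it computes the average time per completed write between two samples, which should approximate what `iostat` reports as `w_await`. The function names, the device name `sda`, and the 500 ms threshold are all placeholders for illustration, not what I actually ran.

```python
import time


def parse_diskstats(text):
    """Return {device: (writes_completed, ms_spent_writing)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        # fields[2] = device name, fields[7] = writes completed,
        # fields[10] = total milliseconds spent writing
        stats[fields[2]] = (int(fields[7]), int(fields[10]))
    return stats


def write_await_ms(prev, curr):
    """Average ms per completed write between two samples (0.0 if no writes)."""
    d_writes = curr[0] - prev[0]
    d_ms = curr[1] - prev[1]
    return d_ms / d_writes if d_writes > 0 else 0.0


def watch(device="sda", interval=1.0, threshold_ms=500.0):
    """Print a timestamped line whenever write latency crosses the threshold."""
    def sample():
        with open("/proc/diskstats") as f:
            return parse_diskstats(f.read())[device]

    prev = sample()
    while True:
        time.sleep(interval)
        curr = sample()
        await_ms = write_await_ms(prev, curr)
        if await_ms >= threshold_ms:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {device} w_await={await_ms:.0f}ms")
        prev = curr
```

Calling `watch("sda")` (with whatever the database volume's device name really is) and leaving it running gives timestamps that can be lined up against the slow requests in Datadog.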
My question is: is this just normal on cloud/shared infrastructure? Would moving to AWS (or similar) help at all? Maybe I should just forget about the handful (10-30) of slow requests per week that are affected by this (the app server handles maybe 800K requests a week, so it's well under 0.01%). I just find it annoying that some people get 5-10 second requests when they ought to take 200-300 ms.
Any insights most welcome!