2. They wrote several blog posts explaining what happened and what is going to happen now (fixing) and in the future (more fixing)
3. They fixed their documentation
4. They helped a third party service to adapt their offering to better help their customers (NewRelic)
5. They offered their advice for better solutions for affected customers (Unicorn)
This sounds a lot like fixing to me.
And judging from what they've done until now, this probably won't be the end of it. So why not just talk to them directly and see whether it's enough for you, and if not, just go somewhere else?
In no way have they solved the actual issue (a poor queuing strategy). And so even if you now know that you're getting awful performance due to queuing and you even try to get a multi-threaded strategy going per their suggestion, you will see the exact same issue at scale. That is not a fix.
Their stance on actually implementing a strategy that removes the root issue has been one of silence. Suggesting that "this probably won't be the end of it" isn't useful if you're running a business that relies on Heroku. If that isn't the end of it, then they should be far more communicative about the steps they're taking. Given their blog posts, we have no evidence that further solutions to this problem are being worked on or that they even acknowledge it's something they should fix.
So no, I do not agree with you that that is a lot of "fixing".
Actually, more threads of execution do solve the problem. The difference from just doubling the number of dynos is that on a single dyno, requests can be routed intelligently. The reason random routing hurts is that request processing times have a fat-tailed distribution: there is a small but still significant chance that a request takes a really long time. If that request is routed to a random single-threaded dyno, then all further requests routed to that dyno have to wait a very long time before they can be processed. If, however, the dyno had multiple threads of execution, the other requests would simply go to another thread. Now there is blocking only if a single dyno receives N really long requests at roughly the same time, where N is the number of concurrent threads the dyno is running. The probability of getting N expensive requests to the same dyno at approximately the same time decreases very fast with increasing N.
Hand waving ahead! Let's say the probability of an expensive request blocking a dyno is p = 2%. If you double the number of dynos, the probability of blocking a dyno is now p/2 = 1%. If instead you have two execution threads on each dyno, the probability of blocking a dyno is now p^2 = 0.04%. With 10 execution threads it is p^10, which is very small indeed.
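The back-of-the-envelope numbers above are easy to check with a quick Monte Carlo sketch. It makes the same simplifying assumption as the hand-waving (each of a dyno's threads is independently tied up by an expensive request with probability p, and the dyno blocks only when all of them are):

```python
import random

def blocking_probability(p, threads, trials=200_000, seed=42):
    """Estimate the chance a dyno is fully blocked, i.e. every one of its
    `threads` execution threads is simultaneously stuck on an expensive
    request (each independently with probability p)."""
    rng = random.Random(seed)
    blocked = 0
    for _ in range(trials):
        if all(rng.random() < p for _ in range(threads)):
            blocked += 1
    return blocked / trials

p = 0.02
print(f"1 thread : {blocking_probability(p, 1):.4%}")  # close to p   = 2%
print(f"2 threads: {blocking_probability(p, 2):.4%}")  # close to p^2 = 0.04%
```

Independence is of course a rough model of real traffic, but it shows why the improvement is multiplicative in the thread count rather than linear in the dyno count.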
Here is a paper about it which makes that intuition precise and shows that even N=2 is a massive improvement over N=1: http://www.eecs.harvard.edu/~michaelm/postscripts/handbook20...
The problem is that this only works if each concurrent process of your application doesn't use too much memory, since the available memory on one dyno is quite low. For many applications you can't easily run multiple threads of execution on one dyno. The real solution is some form of intelligent routing. As the hand waving and the paper above show, you can make groups of dynos: the main router routes to a random group, and within each group requests are routed intelligently. You can take the group size to be a small constant, say 10 dynos, so there shouldn't be any scalability problems with this routing approach. If you take the group size small enough, you could even run each group of dynos on a single physical machine, which would make intelligent routing among them even simpler.
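A minimal sketch of that two-level scheme, under my own assumptions (the class and method names are invented, and "intelligent" is taken to mean least-busy; a real router would also handle failures, draining, and request completion ordering):

```python
import random

class TwoLevelRouter:
    """Two-level routing: the front router picks a *group* of dynos
    uniformly at random (cheap and scalable), then within that small
    group the request goes to the least-loaded dyno."""

    def __init__(self, n_dynos, group_size=10, seed=0):
        self.rng = random.Random(seed)
        self.groups = [list(range(i, min(i + group_size, n_dynos)))
                       for i in range(0, n_dynos, group_size)]
        self.load = [0] * n_dynos  # outstanding requests per dyno

    def route(self):
        group = self.rng.choice(self.groups)            # random: O(1)
        dyno = min(group, key=lambda d: self.load[d])   # least-busy in group
        self.load[dyno] += 1
        return dyno

    def finish(self, dyno):
        self.load[dyno] -= 1
```

Because the least-busy choice only ever scans a constant-size group, the per-request cost stays bounded no matter how many dynos you add, while within each group no dyno gets a second request before its neighbours get their first.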
I never expected them to completely rebuild their service because some customers (a very small minority, I assume) aren't totally happy and satisfied with their product. That clearly sucks for the affected people.
It's a reason for them to leave the product and platform and go somewhere else, where the problem is not an integral part of the product. But it's not a reason to be a dick.
Unless I misunderstand the situation, NewRelic's Heroku reporting isn't some one-sided third-party service but rather something that at least seems to be jointly produced by Heroku and NewRelic.
NewRelic can't report something that isn't offered up, and it would seem to me that Heroku needs to deliberately expose metrics to the NewRelic plugin for it to be able to pick them up.
As these queue times apparently weren't reported anywhere developer-accessible, it also stands to reason that they weren't exposed to NewRelic.
So no, Heroku didn't fix some third-party service; they fixed their own service (in this regard).
So yeah, probably Heroku fixed their part and made sure NewRelic reflected that.