I missed that interpretation :(
IMHO a fat binary written from scratch would have been a way worse choice than to use a standard stack, both in terms of bugs and time, let alone Open Source contributions or any scalability.
In terms of number of services, what do you get rid of that produce a better result? maybe RMQ and use a worse queue?, celery and write your own task manager or use another dependency?