There is an event loop that goes through all the tasks and runs them.
The catch is that the event loop can only switch to another task when the running task hits an await. So if you run a lot of CPU-bound code (say an ML model) between awaits, no other task can advance during that time.
This is why it's called co-operative: it is up to each task to release the event loop by hitting an await so other tasks can get work done.
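You can see the starvation directly with two tasks: a periodic ticker and a coroutine that never awaits. A minimal sketch (`cpu_heavy` and `ticker` are made-up names for illustration):

```python
import asyncio
import time

async def cpu_heavy():
    # No await inside the loop: the event loop cannot switch away
    # until this coroutine returns.
    total = 0
    for i in range(10_000_000):
        total += i
    return total

async def ticker(log):
    # Wants to wake every 10 ms, but only runs when the loop is free.
    for _ in range(3):
        log.append(time.monotonic())
        await asyncio.sleep(0.01)

async def main():
    log = []
    await asyncio.gather(ticker(log), cpu_heavy())
    # Gaps between ticks reveal how long cpu_heavy held the loop.
    return [t - log[0] for t in log]

print(asyncio.run(main()))
```

The first tick fires immediately, then `cpu_heavy` grabs the loop and the ticker's "10 ms" sleep stretches to however long the busy loop takes.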
This works fine when you use async libraries that await around IO-bound operations like DB queries or HTTP calls.
FastAPI will run endpoints that are not defined as async functions on a thread pool, but it is still Python, so the GIL and all that.
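The stdlib equivalent of that threadpool trick is `asyncio.to_thread`: the blocking call runs on a worker thread while the loop keeps serving other tasks. A sketch (function names are invented for the demo; note `time.sleep` releases the GIL the way real IO does, which is exactly why this helps for IO but not for pure-Python number crunching):

```python
import asyncio
import time

def blocking_work():
    # Stands in for a sync endpoint body: a plain blocking call.
    time.sleep(0.1)
    return "done"

async def ticker(log):
    # Keeps ticking to show the loop stays responsive.
    for _ in range(5):
        log.append(time.monotonic())
        await asyncio.sleep(0.02)

async def main():
    log = []
    # Roughly what FastAPI does with a non-async endpoint:
    # push the blocking call onto a worker thread.
    result, _ = await asyncio.gather(asyncio.to_thread(blocking_work), ticker(log))
    return result, log

result, log = asyncio.run(main())
print(result, len(log))
```

If `blocking_work` were CPU-bound Python instead of a sleep, the GIL would make the worker thread and the event loop fight over the interpreter, which is the "still Python" caveat.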
You should do as the sibling comment says and decouple your HTTP layer from your ML, feeding the ML through something like Celery. That way your server is always available to respond (even if just with a 429), hit a cache, or whatever else.
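The shape of that handoff can be sketched in-process with a bounded queue and a worker thread; in production the queue would be a real broker (Celery plus Redis/RabbitMQ) and the worker a separate process, so the web server never touches the model. All names here (`handle_request`, `ml_worker`) are hypothetical:

```python
import queue
import threading
import uuid

jobs = queue.Queue(maxsize=2)   # small bound so overload is visible
results = {}

def ml_worker():
    # Stand-in for the Celery worker; .upper() plays the expensive model.
    while True:
        job_id, payload = jobs.get()
        results[job_id] = payload.upper()
        jobs.task_done()

threading.Thread(target=ml_worker, daemon=True).start()

def handle_request(payload):
    """Enqueue and return immediately: 202 plus a job id, or 429 when saturated."""
    job_id = str(uuid.uuid4())
    try:
        jobs.put_nowait((job_id, payload))
    except queue.Full:
        return 429, None
    return 202, job_id

status, job_id = handle_request("hello")
jobs.join()  # only for the demo, so we can inspect the result synchronously
print(status, results[job_id])
```

The handler never blocks on the model: it either accepts the job or sheds load with a 429, which is what keeps the server responsive under an ML backlog.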