A unified efficient open-source LLM deployment engine for both cloud server and local use cases.
It comes with full OpenAI-compatible API that runs directly with Python, iOS, Android, browsers. Supporting deploying latest large language models such as Qwen2, Phi3, and more.