1. Real-time LLM Inference on Standard GPUs: 3k tokens/s per request (blog.kog.ai) · 51 · NicoConstant · 10h ago · 34
2. Kog AI – Building a Real-Time Inference Stack on AMD Instinct GPUs [video] (youtube.com) · 8 · NicoConstant · 14d ago · 0