3. Better than DeepSeek R1? MiniMax-M1: open-weight hybrid-attention reasoning model (huggingface.co) | 6 points by helloericsf 9mo ago | 0 comments
5. DeepSeek Open Source Optimized Parallelism Strategies, 3 repos (github.com) | 103 points by helloericsf 1y ago | 8 comments
6. DeepSeek Open Source DeepGEMM – FP8 GEMM Library (300 lines for 1350+ FP8 TFLOPS) (twitter.com) | 4 points by helloericsf 1y ago | 1 comment
7. Alibaba Open Source Large-Scale Video Generative Models: Wan2.1 (twitter.com) | 8 points by helloericsf 1y ago | 2 comments
8. DeepSeek open source DeepEP – library for MoE training and Inference (github.com) | 536 points by helloericsf 1y ago | 71 comments
9. DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs (github.com) | 441 points by helloericsf 1y ago | 108 comments
10. New Qwen2.5-Max Outperforms DeepSeek V3 in Benchmarks (twitter.com) | 3 points by helloericsf 1y ago | 2 comments
11. Longest context up to 4M, MiniMax-01 hybrid 456B Open source model (github.com) | 19 points by helloericsf 1y ago | 1 comment