notsahil · 2y ago

Model Stats
- Architecture: LLaMA-like model with multi-query attention
- Objectives: Fill-in-the-Middle (FIM), chat
- Context length: 4096 tokens
- Pretraining tokens: 1.2T
- Finetuning tokens: 40B
- Precision: bfloat16
- GPUs: 64 NVIDIA A5000
- Training time: 28 days
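For anyone unfamiliar with multi-query attention: every query head gets its own projections, but all heads share a single key/value head. The KV cache kept during generation therefore shrinks by a factor equal to the number of heads. Below is a minimal pure-Python sketch of the idea, assuming a causal decoder and already-projected vectors. The function names are illustrative only and do not come from this model's code.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]


def multi_query_attention(
    queries_per_head: List[List[Vector]],  # [n_heads][seq_len][d_head]
    keys: List[Vector],                    # [seq_len][d_head], shared by all heads
    values: List[Vector],                  # [seq_len][d_head], shared by all heads
) -> List[List[Vector]]:
    """Each head uses its own queries against ONE shared set of keys/values.

    Causal masking: the query at position t only sees keys/values 0..t.
    """
    return [
        [attend(q, keys[: t + 1], values[: t + 1]) for t, q in enumerate(head_queries)]
        for head_queries in queries_per_head
    ]
```

In standard multi-head attention, `keys` and `values` would carry a per-head dimension as well. Dropping that dimension is the entire trick.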
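The Fill-in-the-Middle objective is usually trained by cutting a document into prefix/middle/suffix and reordering it so the model learns to generate the middle after seeing both sides. The following is a rough sketch of that data transform in the common prefix-suffix-middle ordering. The sentinel strings are hypothetical placeholders, since the post doesn't say which special tokens this model's tokenizer uses.

```python
import random

# Hypothetical sentinel strings; a real tokenizer defines its own special tokens.
PREFIX, SUFFIX, MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def fim_transform(doc: str, rng: random.Random) -> str:
    """Training-time transform: split doc at two random points, emit prefix-suffix-middle."""
    # Sorting the two cut points keeps prefix, middle, suffix in document order.
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # The model reads prefix and suffix first, then learns to produce the middle.
    return f"{PREFIX}{prefix}{SUFFIX}{suffix}{MIDDLE}{middle}"


def fim_inference_prompt(prefix: str, suffix: str) -> str:
    """Inference-time prompt: the middle is left empty for the model to generate."""
    return f"{PREFIX}{prefix}{SUFFIX}{suffix}{MIDDLE}"


if __name__ == "__main__":
    rng = random.Random(0)
    print(fim_transform("def add(a, b):\n    return a + b\n", rng))
    print(fim_inference_prompt("def add(a, b):\n    return ", "\n"))
```

In a real pipeline only some fraction of documents would be transformed this way, with the rest kept as ordinary left-to-right text. That rate is not given in the post.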