I've been using LLMs a lot lately to generate code, and code quality is a mixed bag. Sometimes it runs straight out of the box or with a few manual tweaks, and other times it simply won't compile. Keen to hear what workarounds others have used to solve this (e.g. re-prompting, constraining generations, etc.).
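One workaround along the re-prompting line is a compile-check loop: validate each generation, and if it fails, feed the error message back into the prompt and try again. A minimal sketch in Python, assuming a hypothetical `llm_generate(prompt)` function that returns a code string (stubbed here so the example runs standalone):

```python
def llm_generate(prompt: str) -> str:
    # Stub for a real model call: pretend the model produces broken
    # code on the first attempt and valid code on the second.
    llm_generate.calls = getattr(llm_generate, "calls", 0) + 1
    if llm_generate.calls == 1:
        return "def add(a, b) retrn a"  # invalid syntax
    return "def add(a, b):\n    return a + b"

def generate_until_compiles(prompt: str, max_attempts: int = 3) -> str:
    """Re-prompt with the compiler error until the code parses."""
    for _ in range(max_attempts):
        code = llm_generate(prompt)
        try:
            # Syntax check only; this does not execute the code.
            compile(code, "<generated>", "exec")
            return code
        except SyntaxError as err:
            # Feed the error back so the next generation can fix it.
            prompt += f"\nThe previous attempt failed to compile: {err}. Fix it."
    raise RuntimeError(f"no compilable code after {max_attempts} attempts")

code = generate_until_compiles("Write an add function in Python")
```

The same shape works for compiled languages by shelling out to the compiler instead of calling `compile()`; constrained decoding (e.g. grammar-based sampling) attacks the problem earlier, at generation time, rather than after the fact.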
1. What's the largest model you can load onto your GPU?
2. Which Apple GPU do you have (M1-M3 Max)?
3. How much memory (RAM) do you have?
4. (Ideally) How many tokens/s can you generate?