Are you aware that having lots of RAM doesn't matter when your tokens/second is slow as shit?
You don't need to run huge models: Gemma 27B QAT fits on a single GPU and is quite good, and other models like Qwen3 are great for coding.
A 3090 gets 100+ tokens/second on Qwen3, very close to what you'd see from a cloud-based model.
An M3 Ultra gets ~30.
Congrats, you played yourself.
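To put those throughput numbers in wall-clock terms, here's a quick back-of-the-envelope sketch. The 100 and ~30 tok/s figures are the ones quoted above; the 500-token response length is an assumption for illustration, not a measured value.

```python
# Back-of-the-envelope: what the quoted decode rates mean in wall-clock time.
# Rates are from the comparison above; the response length is an assumption.

def response_time(tokens: int, tok_per_sec: float) -> float:
    """Seconds to generate `tokens` at a steady decode rate."""
    return tokens / tok_per_sec

TOKENS = 500  # a typical medium-length reply (assumed)

for name, rate in [("RTX 3090 (Qwen3)", 100.0), ("M3 Ultra", 30.0)]:
    print(f"{name}: {response_time(TOKENS, rate):.1f}s for {TOKENS} tokens")
# RTX 3090 (Qwen3): 5.0s for 500 tokens
# M3 Ultra: 16.7s for 500 tokens
```

Same response, roughly 3x the wait, which is the whole point: capacity to *load* a model isn't the same as speed to *run* it.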