Any "eli5" tutorial on how to do so, if so?
I want to give these models a run, but I have no powerful GPU to run them on, so I don't know where to start.
Matthew Berman has a tutorial on YT showing how to use TheBloke's Docker containers on RunPod. Sam Witteveen has done videos on Together and Replicate; both offer cloud-hosted LLM inference as a service.
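If you go the hosted-inference route, most of these services speak an OpenAI-compatible chat completions API, so you don't need a GPU at all, just an API key. Here's a minimal sketch assuming Together's endpoint (`api.together.xyz`); the model name and environment variable are placeholders, so check the provider's docs for the exact values:

```python
import json
import os
import urllib.request

# Hypothetical endpoint: Together exposes an OpenAI-compatible API here.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(prompt, model="meta-llama/Llama-3-8b-chat-hf"):
    """Build the JSON payload for a chat completion request.
    The model name is a placeholder; use whatever the provider lists."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def ask(prompt, api_key):
    """POST the prompt to the hosted endpoint and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (needs a real key, e.g. from an env var):
#   reply = ask("Explain LoRA in one sentence.", os.environ["TOGETHER_API_KEY"])
```

The same pattern works for any OpenAI-compatible provider: swap `API_URL` and the model name, keep the payload shape.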