When checking this model, I found out [1] that it's apparently based on Llama 2?
```
Llama 3 70B Instruct AWQ — Parameters and Internals
LLM Name:      Llama 3 70B Instruct AWQ
Base Model(s): Llama 2 70B Instruct (quantumaikr/llama-2-70B-instruct)
Model Size:    70b
```
I added a question [2] on Hugging Face to learn more about this.
Could anyone explain what this means? Does it mean it was trained on version 2 and wrongly named version 3? Or is something ill-intended going on?
[1] https://llm.extractum.io/model/casperhansen%2Fllama-3-70b-in...
[2] https://huggingface.co/casperhansen/llama-3-70b-instruct-awq...
Go look at the model config, you can clearly see it's Llama 3.
And more seriously, it seems like the LLM could be used to pre-generate lots of filler prefixes corresponding to the RAG'd documents being sent to the model.
While it wouldn't help if you're GPU-bound, multiple prompts could be run in parallel with different pieces of context, and then the model could choose the most appropriate response (which could be done in parallel too).
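A minimal sketch of that parallel-context idea, fanning the same question out over several retrieved chunks and picking the best answer. Note that `generate` and `score` here are placeholders I've made up for illustration, not a real inference API — in practice `generate` would be an HTTP call to your inference server and scoring might use a reranker or the model's own log-probs:

```python
# Run the same question against several retrieved chunks concurrently,
# then pick the highest-scoring answer.
from concurrent.futures import ThreadPoolExecutor

def generate(question: str, context: str) -> str:
    # Placeholder for an actual LLM completion call.
    return f"Answer to {question!r} using context: {context}"

def score(answer: str) -> float:
    # Placeholder ranking; a real system might use the model itself,
    # a dedicated reranker, or log-prob based scoring.
    return len(answer)

def best_answer(question: str, contexts: list[str]) -> str:
    # Each context gets its own prompt, run in parallel.
    with ThreadPoolExecutor(max_workers=len(contexts)) as pool:
        answers = list(pool.map(lambda c: generate(question, c), contexts))
    # The "choose the most appropriate response" step from the comment.
    return max(answers, key=score)
```

With GPU-backed serving you'd typically get the parallelism for free via batched inference rather than client-side threads, which is why being GPU-bound changes the calculus.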
Skipping the LLM would be tough because there are so many devices in my house, not to mention it would take away from the assistant's personality.
However, a recommendation algorithm would actually work great, since I could augment the LLM prompt with it regardless of what the prompt is.
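A rough sketch of what that augmentation could look like: prepend the recommender's top device guesses to the prompt so the LLM can resolve vague requests. The frequency-based recommender and the `recent_usage` input are assumptions of mine, not anything from the comment:

```python
# Augment an LLM prompt with recommender output, independent of what
# the user actually asked.
from collections import Counter

def recommend_devices(recent_usage: list[str], top_n: int = 3) -> list[str]:
    # Trivial frequency-based recommender; a real one might weigh
    # time of day, room, or device co-occurrence.
    return [d for d, _ in Counter(recent_usage).most_common(top_n)]

def build_prompt(user_request: str, recent_usage: list[str]) -> str:
    hints = ", ".join(recommend_devices(recent_usage))
    return (f"Likely relevant devices: {hints}.\n"
            f"User request: {user_request}")
```

The point is that the recommendation rides along with every prompt, so the model keeps its personality while getting a strong hint about which of the many devices the user probably means.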
https://news.ycombinator.com/item?id=38985152 (187 comments, 2024-01-13)
I hate the introduction to the response. That's not even trying to sound human; it reads more like a deranged, patronizing butler.