Context: I missed [almost] the entire A.I. wave, but I knew that one day I would have to learn something about it and/or use it. That day has come. I'm on a team that is migrating from one engine to another, let's say "engine A → engine B". Working from A's perspective, we map entries into B's model (inbound), and once B's response comes back, we map it back into A's model (outbound). This is a chore, and much of the work is repetitive, but it comes with edge cases we need to watch out for, and unfortunately there isn't a solid foundation of patterns beyond the Domain-driven design (DDD) thing. It seemed like a good use case for an A.I.
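For concreteness, here's the shape of that inbound/outbound mapping as a minimal Python sketch; the models and field names are made up, not our actual engines:

```python
from dataclasses import dataclass

# Hypothetical models; the real engine A/B entries are more complex.
@dataclass
class EngineAEntry:
    customer_id: str
    amount_cents: int

@dataclass
class EngineBRequest:
    client_ref: str
    amount: float  # engine B wants decimal units

def map_inbound(a: EngineAEntry) -> EngineBRequest:
    """A -> B: rename fields and convert units."""
    return EngineBRequest(client_ref=a.customer_id, amount=a.amount_cents / 100)

def map_outbound(b: EngineBRequest) -> EngineAEntry:
    """B -> A: invert the inbound mapping."""
    return EngineAEntry(customer_id=b.client_ref, amount_cents=round(b.amount * 100))
```

The nice property of writing both directions explicitly is that the round trip (A → B → A) is trivially testable, which is where the edge cases tend to surface.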
Attempts: I began by asking ChatGPT and Bard questions similar to: "how to train LLM on own codebase" and "how to get started with prompt engineering using own codebase".
I concluded that fine-tuning large models is expensive and unrealistic for my RTX 3060 with 6 GB of VRAM, no surprise there; so I searched here on Hacker News for keywords like "llama", "fine-tuning", "local machine", etc., and found out about ollama and DeepSeek.
I tried both ollama and DeepSeek; the former was slow, but not as slow as the latter, which was dead slow using a 13B model. I tried a 6/7B model (I think it was codellama) and got reasonable results and speed. After feeding it some data, I was on my way to try and train on the codebase when a friend of mine suggested I use Retrieval-Augmented Generation (RAG) with a Langchain + Ollama setup; I have yet to try it.
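For anyone unfamiliar, my understanding is that the retrieval half of a RAG pipeline boils down to: chunk the codebase, embed the chunks, and pull the closest chunks into the prompt. A toy stdlib-only sketch of that shape (a real Langchain + Ollama setup would use a neural embedding model instead of this bag-of-words stand-in, but the retrieval step looks the same):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words token counts. A real setup would call an
    # embedding model; only the retrieval shape matters for this sketch.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Return the k chunks most similar to the query.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Hypothetical code chunks indexed from the codebase.
chunks = [
    "def map_inbound(entry): convert engine A entry to engine B request",
    "def map_outbound(resp): convert engine B response back to engine A model",
    "README: project build instructions",
]
context = retrieve("convert engine B response back", chunks, k=1)
prompt = "Using this code as context:\n" + "\n".join(context) + "\n\nQuestion: ..."
```

The retrieved chunk(s) get stuffed into the prompt, so the model answers with your code in view instead of from its training data alone.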
Any thoughts, suggestions or experiences to share?
I'd appreciate it.
for the repetitive stuff, just use copilot embedded in whatever editor you use.
the edge cases are tricky: to actually avoid them, the model would need an understanding of both the use case (which is easy to describe to the model) and the codebase itself (which is difficult, since descriptions/docstrings are not enough to capture the complex behaviors that can arise from interactions between parts of your codebase).
idk how you would train/finetune a model to somehow have this understanding of your codebase. I doubt just doing next-token prediction would help; you'd likely have to create chat data discussing the intricacies of your codebase and do DPO/RLHF to bake it into your model.
look into techniques like QLoRA that reduce the memory needed during tuning. look into platforms like vast.ai to rent GPUs for cheap.
RAG/agents could be useful, but probably not. you could store info about the functions in your codebase, such as the signature, the functions they call, the docstring, and known edge cases associated with them. if you don't have docstrings, using an LLM to generate them is feasible.
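for what it's worth, building that kind of function index is straightforward with Python's stdlib `ast` module; a sketch (the sample source is illustrative):

```python
import ast

def index_functions(source: str) -> list[dict]:
    """Collect name, arguments, docstring, and called names for each function."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append({
                "name": node.name,
                "args": [a.arg for a in node.args.args],
                "docstring": ast.get_docstring(node) or "",
                # names this function calls (roughly) - useful retrieval context
                "calls": sorted({c.func.id for c in ast.walk(node)
                                 if isinstance(c, ast.Call)
                                 and isinstance(c.func, ast.Name)}),
            })
    return records

src = '''
def map_inbound(entry):
    "Convert an engine A entry to an engine B request."
    return normalize(entry)
'''
info = index_functions(src)
```

each record can then be stored alongside hand-written edge-case notes and fed to the retriever.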
Paul, if you are up for that, it would be tremendously helpful to have a video (or series) that shows what aider can realistically do given a boring, medium-sized CRUD code base. The logs in the examples are too narrow and also build no intuition about what to do when things go wrong.
The idea is you put your code into the best possible model (GPT4) and tell it what you want and it generates code.
Realistically, since we are in an Azure ecosystem, I would use Codex to try out a solution.
Now I definitely share Linus' sentiment [1] on this topic.
It would be incredible to feed an A.I. some code and ask it to track down bugs.
Can you record input and output at some layers of your system and then use that data to test the ported code? Make sure the inputs produce the same outputs.
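A sketch of that record-and-replay idea (the mapping functions here are hypothetical stand-ins for the old and ported code):

```python
import json

# Hypothetical original and ported implementations of the same mapping layer.
def map_inbound_old(entry: dict) -> dict:
    return {"client_ref": entry["customer_id"], "amount": entry["amount_cents"] / 100}

def map_inbound_new(entry: dict) -> dict:  # the port under test
    return {"client_ref": entry["customer_id"], "amount": entry["amount_cents"] / 100}

# Step 1: record real inputs and outputs from the old system.
recorded = [{"input": e, "output": map_inbound_old(e)}
            for e in ({"customer_id": "c1", "amount_cents": 1234},
                      {"customer_id": "c2", "amount_cents": 0})]
# Round-trip through JSON, as it would be when stored on disk.
golden = json.loads(json.dumps(recorded))

# Step 2: replay the recorded inputs through the port and compare outputs.
failures = [case for case in golden
            if map_inbound_new(case["input"]) != case["output"]]
assert not failures
```

The recordings double as regression tests: any divergence between old and new behavior shows up as a failing case, including the edge cases nobody documented.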
But you still have to read the tests and decide if that's what you want the code to do, and make sure the descriptions aren't gobbledygook.
Presumably what matters in this project is correctness, not how many unnecessary cycles you can burn.
This sounds like a job for protobufs or some kind of serialization solution. And you already know there are dragons here, so letting an LLM try to solve this is just going to mean more rework/validation for you.
If you don't understand the problem space, hire a consultant. LLMs are not consultants (yet). Either way, I'd quit wasting time trying to feed your codebase into an LLM and just do the work.
Then don't use AI for it.
Bluntly.
This is a poor use-case; it doesn't matter what model you use, you'll get a disappointing result.
These are the domains where using AI coding currently shines:
1) You're approaching a new well established domain (eg. building an android app in kotlin), and you already know how to build things / apps, but not specifically that exact domain.
Example: How do I do X but for an android app in kotlin?
2) You're building out a generic scaffold for a project and need some tedious (but generic) work done.
Example: https://github.com/smol-ai/developer
3) You have a standard, but specific question regarding your code, and although related Q/A answers exist, nothing seems to specifically target the issue you're having.
Example: My nginx configuration is giving me [SPECIFIC ERROR] for [CONFIG FILE]. What's wrong and how can I fix it?
The domains where it does not work are:
1) You have some generic code with domain/company/whatever specific edge cases.
The edge cases, broadly speaking, no matter how well documented, will not be handled well by the model.
Edge cases are exactly that: edge cases. The common corpus of 'how to X' material does not cover them, so they will not be handled, and the results will require you to review and complete them manually.
2) You have some specific piece of code you want to refactor 'to solve xxx', but the code is not covered well by tests.
LLMs struggle to refactor existing code, and the difficulty is proportional to the code length. There are technical reasons for this (mainly the randomness of token sampling), but tl;dr: it's basically a crapshoot.
Might work. Might not. If you have no tests, who knows? You have to manually verify both the new functionality and the old functionality, but maybe it helps a bit, at scale, for trivial problems.
3) You're doing something obscure or using a new library / new version of the library.
The LLM will have no context for this, and will generate rubbish / old deprecated content.
Obscure requirements have an unfortunate tendency to mimic the few training examples that exist, and may generate verbatim copies, depending on the model you use.
...
So. Concrete advice:
1) sigh~
> a friend of mine came and suggested that I use Retrieval-Augmented Generation (RAG), I have yet to try it, with a setup Langchain + Ollama.
Ignore this advice. RAG and langchain are not the solutions you are looking for.
2) Use a normal coding assistant like copilot.
This is the most effective way to use AI right now.
There are some frameworks that let you use open-source models if you don't want to use OpenAI.
3) Do not attempt to bulk generate code.
AI coding isn't at that level. Right now, the tooling is primitive, and large scale coherent code generation is... not impossible, but it is difficult (see below).
You will be more effective using an existing proven path that uses 'copilot' style helpers.
However...
...if you do want to pursue code generation, here's a broad blueprint to follow:
- decompose your task into steps
- decompose your steps into functions
- generate or write tests and function definitions
- generate an api specification (eg. .d.ts file) for your function definitions
- for each function definition, generate the code for the function passing the api specification in as the context. eg. "Given functions x, y, z with the specs... ; generate an implementation of q that does ...".
- repeatedly generate outputs for the above until you get one that passes the tests you wrote.
This approach broadly scales to reasonably complex problems, so long as you partition your problem into module sized chunks.
I personally like to put something like "you're building a library/package to do xxx" or "as a one file header" as a top level in the prompt, as it seems to link into the 'this should be isolated and a package' style of output.
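The generate-until-tests-pass loop from the last step might look like this; `llm_generate` and the toy test runner are placeholders for your actual model call and test suite:

```python
# Sketch of the "generate until the tests pass" loop.
def generate_until_pass(prompt, run_tests, llm_generate, max_attempts=5):
    for attempt in range(max_attempts):
        candidate = llm_generate(prompt, attempt)
        if run_tests(candidate):
            return candidate
    return None  # give up; fall back to writing it by hand

# Toy demo: this fake "model" only produces a correct implementation on try 3.
attempts_log = []
def fake_llm(prompt, attempt):
    attempts_log.append(attempt)
    return ("def add(a, b): return a + b" if attempt == 2
            else "def add(a, b): return a - b")

def run_tests(code):
    ns = {}
    exec(code, ns)  # in real use, run the generated code's test suite sandboxed
    return ns["add"](2, 3) == 5

winner = generate_until_pass("Given functions x, y, z...; implement add",
                             run_tests, fake_llm)
```

The key point is that the tests, not the model, decide when to stop, which is why writing the tests and API specification first matters so much in this workflow.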
However, I will caveat this with two points:
1) You generate a lot of code this way, and that's expensive if you use a charge-per-completion API.
2) The results are not always coherent and functions tend to (depending on the model, eg. 7B mistral) inline implementations for 'trivial' functions instead of using functions (eg. if you define Vector::add, the model will 50/50 just go a = new Vector(a.x + b.x, a.y + b.y)).
I've found that the current models other than GPT4 are prone to incoherence as the problem size scales.
7B models, specifically, perform significantly worse than larger models.
I'd add the MR review use case.
I had limited success feeding an LLM (a Dolphin finetune of Mixtral) the contents of a merge request from my team. It was a few thousand lines of added integration-test code, and I just couldn't be bothered / had too little time to really delve into it.
I slapped in the diff and went through about 10 prompt strategies to get anything meaningful. My initial impression was that it had clearly been finetuned on too-short responses: it kept putting in "etc.", "and other input parameters", "and other relevant information". At one point I was ready to give up; it was clearly hallucinating.
Or so I thought: it turned out a new edge case had been added to existing functionality without me ever noticing (despite being in the same meetings).
I think it actually saved me a lot of hours of pestering other team members.
They just released a v1.5 (https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruc...), but for some reason, they reduced the context length from ~16k to ~4k.
However, I have a question regarding its specific deployment method: How can I merge the parts of the Safetensors format? Specifically, I'm referring to files named 'model-00001-of-00002.safetensors' and 'model-00002-of-00002.safetensors'.
My motivation is straightforward: I aim to combine the Safetensor 'shards' and then utilize the 'convert.py' script from the llama.cpp project to transform a single .safetensors file into the GGUF format. This conversion facilitates running the models on WasmEdge.
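In case it helps: each HF shard is itself a valid safetensors file holding a disjoint subset of the tensors, so one approach (untested by me) is to load every shard and re-save the union. The merge logic below is plain Python; the actual safetensors calls are isolated in a function that isn't run here:

```python
def merge_state_dicts(shards):
    """Merge per-shard {key: tensor} dicts into one, refusing key collisions."""
    merged = {}
    for shard in shards:
        for key, tensor in shard.items():
            if key in merged:
                raise ValueError(f"duplicate tensor key across shards: {key}")
            merged[key] = tensor
    return merged

def merge_files(shard_paths, out_path):
    # Requires `pip install safetensors torch`; not executed in this sketch.
    from safetensors.torch import load_file, save_file
    save_file(merge_state_dicts([load_file(p) for p in shard_paths]), out_path)

# Example call (file names from the question):
# merge_files(["model-00001-of-00002.safetensors",
#              "model-00002-of-00002.safetensors"],
#             "model.safetensors")
```

It may also be worth checking whether llama.cpp's convert script accepts sharded checkpoints directly; newer versions of it may make the merge unnecessary.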
I appreciate any guidance on this matter. Thank you.
Says so on https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instr...
They have their own license to prevent things like propaganda or military use.
I'd love to hear of a less ambiguous way to represent these.
In this context, I read it as a is better than b which is better than c.
Would you mind linking to a concise text which could lead me through setting up Mixtral on my own machine?
With those it's always "read the HOWTO, attempt to recreate its state", so I prefer sticking with the low-level route, where I also learn a bit more about the internals.
C/C++ user-friendliness has come as far as all the other languages and ecosystems. Really, the only reason to "fear" it is memes propagated to that effect. It's not a gun.
So I'd suggest just compiling llama.cpp and installing huggingface-cli to download GGUF-format models, which is all ollama is doing anyway, but with more dependencies and a much more opaque outcome.
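For the record, the low-level route is roughly this (the repo and model file names are examples; build steps, binary names, and CLI flags vary between llama.cpp and huggingface_hub versions):

```shell
# Build llama.cpp (older versions use plain `make`; newer ones use CMake)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Download a GGUF model with huggingface-cli (ships with huggingface_hub)
pip install huggingface_hub
huggingface-cli download TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF \
    mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --local-dir models

# Run it (the binary is `main` in older releases, `llama-cli` in newer ones)
./main -m models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -p "Hello" -n 64
```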
DSC, on the other hand, crawled its way to poor answers and injected snippets of unrelated code into the output for me.
(M1 Max, 64 GB RAM)
Works well, but it's closer to a very smart code complete than something that generates novel blocks of code.
It outputs relatively correct Haxe code, but it did hallucinate that there is a library called 'haxe-tiled' to read TMX map files...