Yi-Coder: A Small but Mighty LLM for Code (opens in new tab)

yumraj1y ago

There’s no company info on DeepSeek’s website. Looking at the above, and considering that, it seems very sketchy indeed.

Maybe OK for trying out stuff, a big no no for real work.

https://aider.chat/docs/leaderboards/

The names of their researchers are on this recent paper: https://arxiv.org/pdf/2408.15664

Their terms of service say "The DeepSeek Open Platform is jointly owned and operated by Hangzhou DeepSeek Artificial Intelligence Co., Ltd., Beijing DeepSeek Artificial Intelligence Co., Ltd. "

And they're funded by https://www.high-flyer.cn/en/fund/ which the FT did an article on: https://www.ft.com/content/357f3c68-b866-4c2e-b678-0d075051a...

In terms of the personal data you share when using their models, I can't see why they would be any more or less nefarious than big Western tech companies.

That said, if you're using a model based in China then by providing them with data and feedback you are in a very small way helping researchers in China catch up with/keep up with/overtake researchers in the West. Maybe in the long term that could end badly. And if you are concerned about the environment, it's entirely possible their training and inference is run using coal power stations.

rfoo1y ago

> There’s no company info on DeepSeek’s website.

It's backed solely by a hedge fund who do not want to draw attention to their business. So yeah, as sketchy as DESRES.

dotancohen1y ago

Might be good for contributing to open source projects. But not for clients' projects.

stavros1y ago

I'm making a small calendar renderer for e-ink screens (https://github.com/skorokithakis/calumny) which Claude basically wrote all of, so I figured I'd try DeepSeek. I had it add a small circle to the left of the "current day" line, which it added fine, but it couldn't solve the problem of the circle not being shown over another element. It tried and tried, to no avail, until I switched to Claude, which fixed the problem immediately.

43x cheaper is good, but my time is also worth money, and it unfortunately doesn't bode well for me that it's stumped by the first problem I throw at it.

jadbox1y ago

What's the better plug-in among Continue Dev, Aider and Claude Dev?

phren0logy1y ago

You probably already know that Aider is not a plugin, but just in case - it's a program that runs from the terminal. I think the results are very impressive, and it readily handles multiples source files for context.

FloatArtifact1y ago

What kind of hardware is required for the local llm for the continue stack?

jodrellblank1y ago

Uh, what price format is "14 million cents per 28 million cents"?

sabbaticaldev1y ago

14 cents per million input/28 cents per million output tokens

jodrellblank1y ago

Oh right, thanks.

anotherpaulg1y ago

Yi-Coder scored below GPT-3.5 on aider's code editing benchmark. GitHub user cheahjs recently submitted the results for the 9b model and a q4_0 version.

Yi-Coder results, with Sonnet and GPT-3.5 for scale:

  77% Sonnet
  58% GPT-3.5
  54% Yi-Coder-9b-Chat
  45% Yi-Coder-9b-Chat-q4_0

Full leaderboard:

Palmik1y ago

The difference between (A) software engineers reacting to AI models and systems for programming and (B) artists (whether it's painters, musicians or otherwise) reacting to AI models for generating images, music, etc. is very interesting.

I wonder what's the reason.

suprjami1y ago

Because code either works or it doesn't. Nobody is replacing our entire income stream with an LLM.

You also need a knowledge of code to instruct an LLM to generate decent code, and even then it's not always perfect.

Meanwhile plenty of people are using free/cheap image generation and going "good enough". Now they don't need to pay a graphic artist or a stock photo licence

Any layperson can describe what they want a picture to look like so the barrier to entry and successful exit is a lot lower for LLM image generation than for LLM code generation.

jodrellblank1y ago

> "Meanwhile plenty of people are using free/cheap image generation and going "good enough". Now they don't need to pay a graphic artist or a stock photo licence"

and getting sandwich photos of ham blending into human fingers:

https://www.reddit.com/r/Wellthatsucks/comments/1f8bvb8/my_l...

saurik1y ago

And yet, even knowing what I was looking for, I didn't see it long enough that I guessed I misunderstood and swiped to the second image, where it was pointed out specifically. Even if I had noticed myself--presumably because I was staring at it for way too long in the restaurant--I can't imagine I would have guessed what was going on, BUT EVEN THEN it just wouldn't have mattered... clearly, this is more than merely a "good enough" image.

datavirtue1y ago

At best it's a prototype and concept generator. It would have to yield assets with layers that can be exported by an illustration or bitmap tool of choice. AI generated images are almost completely useless as-is.

suprjami1y ago

I agree there are plenty of images with garbled text and hands with 7 fingers, but text to image has freely available generators which create almost perfect images for some prompts. Certainly good enough to replace an actor holding a product, a stock photo, and often a stylised design.

aithrowaway19871y ago

Look at who the tools are marketed towards. Writing software involves a lot of tedium, eye strain, and frustration, even for experts who have put in a lot of hours practicing, so LLMs are marketed to help developers make their jobs easier.

This is not the case for art or music generators: they are marketed towards (and created by) laypeople with who want generic content and don't care about human artists. These systems are a significant burden on productivity (and fatal burden on creativity) if you are an honest illustrator or musician.

Another perspective: a lot of the most useful LLM codegen is not asking the LLM to solve a tricky problem, but rather to translate and refine a somewhat loose English-language solution into a more precise JavaScript solution (or whatever), including a large bag of memorized tricks around sorting, regexes, etc. It is more "science than art," and for a sufficiently precise English prompt there is even a plausible set of optimal solutions. The LLM does not have to "understand" the prompt or rely on plagiarism to give a good answer. (Although GPT-3.5 was a horrific F# plagiarist... I don't like LLM codegen but it is far more defensible than music generation)

This is not the case with art or music generators: it makes no sense to describe them as "English to song" translators, and the only "optimal" solutions are the plagiarized / interpolated stuff the human raters most preferred. They clearly don't understand what they are drawing, nor do they understand what melodies are. Their output is either depressing content slop or suspiciously familiar. And their creators have filled the tech community with insultingly stupid propaganda like "they learn art just like human artists do." No wonder artists are mad!

rahimnathwani1y ago

What you say may be true about the simplest workflow: enter a prompt and get one or more finished images.

But many people use diffusion models in a much more interactive way, doing much more of the editing by hand. The simplest case is to erase part of a generated image, and prompt to infill. But there are people who spend hours to get a single image where they want it.

eropple1y ago

This is true, and there's some really cool stuff there, but that's not who most of this is marketed at. Small wonder there's backlash from artists and people who appreciate artists when the stated value proposition is "render artists unemployed".

datavirtue1y ago

": they are marketed towards (and created by) laypeople with who want generic content and don't care about human artists"

Good. The artists I know have zero interest in doing that work. I have sacrificed a small fortune to invest in my wife's development as an artist so she never had to worry about making any money. She uses AI to help with promoting and "marketing" herself.

She and all of her colleagues all despise commissioned work and they get a constant stream of them. I always tell her to refuse them. Some pay very well.

If you are creating generic "art" for corporations I have little more than a shrug for your anxiety over AI.

tcdent1y ago

It's just gatekeeping.

Artists put a ton of time into education and refining their vision inside the craft. Amateur efforts to produce compelling work always look amateur. With augmentation, suddenly the "real" artists aren't as differentiated.

The whole conversation is obviously extremely skewed toward digital art, and the ones talking about it most visibly are the digital artists. No abstract painter thinks AI is coming for their occupation or cares wether it is easier to create anime dreamscapes this year or the next.

rty321y ago

Coding Assistants are not good enough (yet). Inline suggestions and chats are incredibly helpful and boost productivity (and only to those who know to use them well), but that's as fast as they go today.

If they can take a Jira ticket, debug the code, create a patch for a large codebase and understand and respect all the workarounds in a legacy codebase, I would have a problem with it.

xvector1y ago

Except they can't do the equivalent for art yet either, and I am fairly familiar with the state of image diffusion today.

I've commissioned tens of thousands of dollars in art, and spent many hundreds of hours working with Stable Diffusion, Midjourney, and Flux. What all the generators are missing is intentionality in art.

They can generate something that looks great at surface level, but doesn't make sense when you look at the details. Why is a particular character wearing a certain bracelet? Why do the windows on that cottage look a certain way? What does a certain engraving mean? Which direction is a character looking, and why?

The diffusers do not understand what they are generating, so they just generates what "looks right." Often this results in art that looks pretty but has no deeper logic, world building, meaning, etc.

And of course, image generators cannot handle the client-artist relationship as well (even LLMs cannot), because it requires an understanding of what the customer wants and what emotion they want to convey with the piece they're commissioning.

So - I rely on artists for art I care about (art I will hang on my walls), and image generators for throwaway work (such as weekly D&D campaign images.)

rty321y ago

Of course the "art" art -- the part that is all about human creativity -- will always be there.

But lots of people in the art business aren't doing that. If you didn't have midjourney etc, what would you be doing for the throwaway work? Learn to design the stuff yourself, hire someone to do it on Upwork, or just not do it all? Some money likely will exchange hands there.

viraptor1y ago

Have you seen https://www.swebench.com/ ?

Once you engage agentic behaviour, it can take you way further than just the chats. We're already in the "resolving JIRA tickets" area - it's just hard to setup, not very well known, and may be expensive.

IshKebab1y ago

> We're already in the "resolving JIRA tickets" area

For very simple tasks maybe, but not for the kinds of things I get paid to do.

I don't think it will be able to get to the level of reliably doing difficult programming tasks that require understanding and inferring requirements without having AGI, in which case society has other things to worry about than programmers losing their jobs.

rty321y ago

Looks like the definition of "resolving a ticket" here is "come up with a patch that ensures all tests pass", which does not necessarily include "add a new test", "make sure the patch is actually doing something meaningful", "communicate how this is fixed". Based on my experience and what I saw in the reports in the logs, a solution could be just hallucinating completely useless code -- as long as it doesn't fail a test.

Of course, it is still impressive, and definitely would help with the small bugs that require small fixes, especially for open source projects that have thousands of open issues. But is it going to make a big difference? Probably not yet.

Also, good luck doing that on our poorly written, poorly documented and under tested codebase. By any standard django is a much better codebase than the one I work on every day.

https://github.com/01-ai/Yi-Coder

mrklol1y ago

But that’s not that far. Like sure, currently it’s not. But "reading a ticket with a description, find the relevant code, understand the code (often better than human), test it, return the result" is totally doable with some more iterations. It’s already doable for smaller projects, see GitHub workspaces etc.

denizener1y ago

I love art and code, IMO is because Cursor is really good and AI art is not that good.

There isn't a good metaphor for the problem with AI art. I would say it is like some kind of chocolate cake that the first few bites seem like the best cake you have ever had and then progressive bites become more and more shit until you stop even considering eating it. Then at some point even the thought of the cake makes you want to puke.

I say this as someone who thought we reached the art singularity in December 2022. I have no philosophical or moral problem with AI art. It just kind of sucks.

Cursor/Sonnet on the other hand just blew my mind earlier today.

GaggiX1y ago

There are really good models for AI art, if people care. I think that AI is better at making an image from start to finish than making some software from start to finish.

And I use Claude 3.5 Sonnet myself.

datavirtue1y ago

AI art is an oxymoron. It will never give me chills or make me cry.

crimsoneer1y ago

I mean, it's supply and demand right.

- There is a big demand for really complex software development, and an LLM can't do that alone. So software devs have to do lots of busywork, and like the opportunity to be augmented by AI

- Conversely, there is a huge demand for not very high level art. - eg, lots of people want a custom logo or a little jingle, but no many people want to hire a concert pianist or comission the next Salvadore Dali.

So most artists spend a lot of time doing a lot of low level work to pay the bills, while software devs spend a lot of time doing low level code monkey work so they can get to the creative part of their job.

viraptor1y ago

Is it really? I know people who love using LLMs, people who are allergic to the idea of even taking about AI usability and lots of others in between. Same with artists hating the idea, artists who spend hours crafting very specific things with SD, and many in between.

I'm not sure I can really point out a big difference here. Maybe the artists are more skewed towards not liking AI since they work with medium that's not digital in the first place, but the range of responses really feels close.

theshrike791y ago

> Continue pretrained on 2.4 Trillion high-quality tokens over 52 major programming languages.

I'm still waiting for a model that's highly specialised for a single language only - and either a lot smaller than these jack of all trades ones or VERY good at that specific language's nuances + libraries.

rfoo1y ago

An unfortunate fact is, similar to human with infinite time, LLMs usually have better performance on your specific langauge when they are not limited to learn or over-sample one single language. Not unlike the common saying "learning to code in Haskell makes you a better C++ programmer".

Of course, this is far from trivial, you don't just add more data and expect it to automatically be better for everything. So is time management for us mere mortals.

deely31y ago

> usually have better performance on your specific langauge when they are not limited to learn or over-sample one single language.

Source? Im very curious how learning one language helps model to generate code in language with different paradigms. Java, Markdown, JSON, HTML, Fortran?

cztomsik1y ago

I think around the BLOOM models (2022) it was found out that if you train english-only, the model performs worse than if you have even little mixture of other languages.

Also, there were other papers (one epoch is all you need) where it was shown that diverse data is better than multiple epochs, and finally, there was paper (textbooks is all you need) for famous Phi model, with conclusion that high-quality data > lots of data.

This by itself is not a proof for your specific question but you can extrapolate.

imjonse1y ago

Unclear how much of their coding knowledge is in the space of syntax/semantics of a given language and how much in the latent space that generalizes across languages and logic in general. If I were to guess I'd say 80% is in the latter for the larger capable models. Even very small models (like in Karpathy's famous RNN blog) will get syntax right but that is superficial knowledge.

sitkack1y ago

The models benefit immensely from being trained with more data from other languages, even if you only ever use it in one.

You could finetune it on your codebases and specific docs for added perf.

rty321y ago

I don't know if that will happen, but there are tools that at least try to improve performance for specific languages, especially "underrepresented" languages, e.g. https://sourcegraph.com/blog/enhancing-code-completion-for-r...

richardw1y ago

I’d be interested to know if that trade off ends up better. There’s probably a lot of useful training that transfers well between languages, so I wouldn’t be that surprised if the extra tokens helped across all languages. I would guess a top quality single language model would need to be very well supported, eg Python or JavaScript. Not, say, Clojure.

mark_l_watson1y ago

I get your point. Models that support many dozens of human languages seem not what I personally need because I only speak English.

However, I enjoy using various Lisp languages and I was pleased last night when I set up Emacs + ellama + Ollama + Yi-Coder. I experimented with Cursor last weekend, and it was nice for Python, not so great for Common Lisp.

karagenit1y ago

Yep, been waiting for the same thing. Maybe at some point it’ll be possible to use a large multilingual model to translate the dataset into one programming language, then train a new smaller model on just that language?

terminalcommand1y ago

Isn't microsoft phi specifically trained for Python? I recall that Phi 1 was advertised as a Python coding helper.

It's a small model trained only by quality sources (ie textbooks).

wiz21c1y ago

If the LLM training makes the LLM generalize things between languages, then it is better to leave it like it is...

kamphey1y ago

I wonder what those 52 languages are.

richardw1y ago

According to the repo README: 'java', 'markdown', 'python', 'php', 'javascript', 'c++', 'c#', 'c', 'typescript', 'html', 'go', 'java_server_pages', 'dart', 'objective-c', 'kotlin', 'tex', 'swift', 'ruby', 'sql', 'rust', 'css', 'yaml', 'matlab', 'lua', 'json', 'shell', 'visual_basic', 'scala', 'rmarkdown', 'pascal', 'fortran', 'haskell', 'assembly', 'perl', 'julia', 'cmake', 'groovy', 'ocaml', 'powershell', 'elixir', 'clojure', 'makefile', 'coffeescript', 'erlang', 'lisp', 'toml', 'batchfile', 'cobol', 'dockerfile', 'r', 'prolog', 'verilog'

Y_Y1y ago

They're playing a dangerous game if they assume that a single language or even family of similar languages is referred to by e.g. "assembly", "shell", "lisp".

(I also note that several of these are markup or config languages which are explicitly not for programming.)

JediPig1y ago

I tested this out on my workload ( SRE/Devops/C#/Golang/C++ ). it started responding about non-sense on a simple write me boto python script that changes x ,y,z value.

Then I tried other questions in my past to compare... However, I believe the engineer who did the LLM, just used the questions in benchmarks.

One instance after a hour of use ( I stopped then ) it answered one question with 4 different programming languages, and answers that was no way related to the question.

tmikaeld1y ago

I have the same experience, hallucinates and rambles on and on about "solutions" that are not related.

Unfortunately, this has always been my experience with all open source code models that can be self-hosted.

Gracana1y ago

It sounds like you are trying to chat with the base model when you should be using a chat model.

tmikaeld1y ago

No, I’m using 9b-chat-q8_0 on a 4090

tarruda1y ago

Have you ran the model in full FP16? It is possible a lot of performance is lost when running quantized versions.

mtrovo1y ago

I'm new to this whole area and feeling a bit lost. How are people setting up these small LLMs like Yi-Coder locally for tab completion? Does it work natively on VSCode?

Also for the cloud models apart from GitHub Copilot, what tools or steps are you all using to get them working on your projects? Any tips or resources would be super helpful!

cassianoleal1y ago

You can run this LLM on Ollama [0] and then use Continue [1] on VS Code.

The setup is pretty simple:

* Install Ollama (instructions for your OS on their website - for macOS, `brew install ollama`)

* Download the model: `ollama pull yi-coder`

* Install and configure Continue on VS Code (https://docs.continue.dev/walkthroughs/llama3.1 <- this is for Llama 3.1 but it should work by replacing the relevant bits)

[0] https://ollama.com/

[1] https://www.continue.dev/

suprjami1y ago

If you have a project which supports OpenAI API keys, you can point it at a LocalAI instance:

This is easy to get "working" but difficult to configure for specific tasks due to docs being lacking or contradictory.

samstave1y ago

Can you post screens/configs on how setup success?

Or at least state what you configured toward and how?

suprjami1y ago

The documentation gives a quick start and many examples of integration with OpenAI projects like a chatbot. That's all I did.

smcleod1y ago

Weird they're comparing it to really old deepseek v1 models, even v2 has been out a long time now.

butterfly420691y ago

My exact thoughts, especially because DeepseekV2 is meant to be a massive improvement.

It seems to be an emerging trend people should look out for that model release sheets often contain comparisons with out of date models and don't inform so much as just try to make the model look "best."

It's an annoying trend. Untrustworthy metrics betray untrustworthy morals.

bubblyworld1y ago

My barely-informed guess is that they don't have the resources to run it (it's a 200b+ model).

regularfry1y ago

They could compare to DeepSeek-Coder-V2-Lite-Instruct. That's a 16B model, and it comes out at 24.3 on LiveCodeBench. Given the size delta they're respectably close - they're only just behind at 23.4. The full V2 is way ahead.

smcleod1y ago

That’s for the larger model, most people running it locally use the -lite model (both of which has lots of benchmarks published)

kleiba1y ago

What is the recommended hardware to run a model like that locally on a desktop PC?

tadasv1y ago

you can easily run 8b yi coder on 4090 rtx. Probably could do on a smaller gpu (16GB). I have 24gb, and run it through ollama.

NKosmatos1y ago

It would be good if LLMs were somehow packaged in an easy way/format for us "novice" (ok I mean lazy) users to try them out.

I'm not interested so much with the response time (anyone has a couple of spare A100s?), but it would be good to be able to try out different LLMs locally.

dizhn1y ago

I understand your situation. It sounds super simple to me now but I remember having to spend at least a week trying to get the concepts and figuring out what prerequisite knowledge I would need between a continium of just using chatgpt and learning relevant vector math etc. It is much closer to the chatgpt side fortunately. I don't like ollama per se (because i can't reuse its models with other frontends due to it compressing them in its own format) but it's still a very good place to start. Any interface that lets you download models as gguf from huggingface will do just fine. Don't be turned off by the roleplaying/waifu sounding frontend names. They are all fine. This is what I mostly prefer: https://github.com/oobabooga/text-generation-webui

PhilippGille1y ago

With Mozilla's llamafile you can run LLMs locally without installing anything: https://github.com/Mozilla-Ocho/llamafile

senko1y ago

LM Studio is pretty good: https://lmstudio.ai/

suprjami1y ago

One Docker command if you don't mind waiting minutes for CPU-bound replies:

You can also use several GPU options, but they are not as easy to get working.

hosteur1y ago

You should try GPT4all. It seems to be exactly what you’re asking for.

nusl1y ago

This is already possible. There are various tools online you can find and use.

gloosx1y ago

Can someone explain these Aider benchmarks to me? They pass same 113 tests through llm every time. Why they then extrapolate ability of llm to pass these 113 basic python challenges to the general ability to produce/edit code? For me it sounds like this or that model is 70% accurate in solving same hundred python training tasks, but why does it mean that it's good at other languages and arbitrary, private tasks as well? Does anyone ever tried to change them test cases or wiggle conditions a bit to see if it will still hit 70%?

tarruda1y ago

It seems this is the problem with most benchmarks, which is why benchmark performance doesn't mean much these days.

smokel1y ago

Does anyone know why the sizes of these models are typically expressed in number of weights (i.e 1.5B and 9B in this case), without mentioning the weight size in bytes?

For practical reasons, I often like to know how much GPU RAM is required to run these models locally. The actual number of weights seems to only express some kind of relative power, which I doubt is relevant to most users.

Edit: reformulated to sound like a genuine question instead of a complaint.

magnat1y ago

Because you can quantize a model e.g. from original 16 bits down to 5 bits per weight to fit your available memory constraints.

GaggiX1y ago

The weight size depends on the accuracy you are running the model at, you usually do not run a model at fp16 as it would be wasteful.

tarruda1y ago

Since most LLMs are released as FP16, just the number of parameters is enough to know the total required GPU RAM.

nathan_tarbert1y ago

This sounds really cool! I found this Reddit discussion... https://www.reddit.com/r/ArtificialInteligence/comments/1f9m...

Tepix1y ago

Sounds very promising!

I hope that Yi-Coder 9B FP16 and Q8 will be available soon for Ollama, right now i only see the 4bit quantized 9B model.

I'm assuming that these models will be quite a bit better than the 4bit model.

tmikaeld1y ago

Click on "View more" in the dropdown on their page, it has many many quantized versions to choose from.

Tepix1y ago

Found it, thanks

patrick-fitz1y ago

I'd be interested to see how it performs on https://www.swebench.com/

Using SWE-agent + Yi-Coder-9B-Chat.

cassianoleal1y ago

Is there an LLM that's useful for Terraform? Something that understands HCL and has been trained on the providers, I imagine.

lasermike0261y ago

Try this, https://ollama.com/jeffrymilan/aiac

bloopernova1y ago

Copilot writes terraform just fine, including providers.

cassianoleal1y ago

Thanks. I should have specified, LLMs that can be run locally is what interests me.

Havoc1y ago

Beats deepseek 33. That’s impressive

tuukkah1y ago

They used DeepSeek-Coder-33B-Instruct in comparisons, while DeepSeek-Coder-v2-Instruct (236B) and -Lite-Instruct (16B) are available since a while: https://github.com/deepseek-ai/DeepSeek-Coder-v2

EDIT: Granted, Yi-Coder 9B is still smaller than any of these.

lasermike0261y ago

First look seem good. I'll keep hacking with it.

ziofill1y ago

Are coding LLMs trained with the help of interpreters?

willvarfar1y ago

Google's Gemini does.

I can't find a post that I remember Google published just after all the ChatGPT SQL generation hype happened, but it felt like they were trying to counter that hype by explaining that most complex LLM-generated code snippets won't actually run or work, and that they were putting a code-evaluation step after the LLM for Bard.

(A bit like why did they never put an old fashioned rules-based grammar checker check stage in google translate results?)

Fast forward to today and it seems it's a normal step for Gemini etc https://ai.google.dev/gemini-api/docs/code-execution?lang=py...

[1] https://aider.chat/docs/leaderboards/

That's interesting! Where it says that is will "learn iteratively from the results until it arrives at a final output" I assume it's therefore trying multiple LLM generations until it finds one that works, which I didn't know about before.

However, AFAIK it's only ever at inference time, an interpreter isn't included during LLM training? I wonder if it would be possible to fine tune a model for coding with an interpreter. Though if noone has done it yet there is presumably a good reason why not.

littlestymaar1y ago

> Though if noone has done it yet there is presumably a good reason why not.

The field is vast, moving quickly and there are more directions to explore than researchers working at top AI labs. There's lots of open doors that haven't been explored yet but that doesn't mean it's not worth it, it's just not done yet.

zeroq1y ago

Everytime someone tells how AI 10x his programming capabilities I'm like "tell me you're bad at coding without telling me".

jodrellblank1y ago

Everytime someone posts a comment that is just "I'm better than other people", I'm like "what a waste of time reading that was".

coolspot1y ago

It allows me to move much faster, because I can write a comment describing something more high-level and get plausible code from it to review & correct.

j / k navigate · click thread line to collapse

115 comments

mythz1y ago

dsp_person1y ago

The following quotes from a reddit comment here https://www.reddit.com/r/LocalLLaMA/comments/1dkgjqg/comment...

> under How We Use Your Information (in the Privacy Policy): """ Carry out data analysis, research and investigations, and test the Services to ensure its stability and security; """

mythz1y ago

[1] https://ollama.com/library/deepseek-coder-v2

[2] https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-In...

[3] https://x.com/awnihannun/status/1814045712512090281

bilekas1y ago

Is that even legal in regards to EU users ?

onli1y ago

Of course not.

yumraj1y ago

There’s no company info on DeepSeek’s website. Looking at the above, and considering that, it seems very sketchy indeed.

Maybe OK for trying out stuff, a big no no for real work.

https://aider.chat/docs/leaderboards/

The names of their researchers are on this recent paper: https://arxiv.org/pdf/2408.15664

Their terms of service say "The DeepSeek Open Platform is jointly owned and operated by Hangzhou DeepSeek Artificial Intelligence Co., Ltd., Beijing DeepSeek Artificial Intelligence Co., Ltd. "

And they're funded by https://www.high-flyer.cn/en/fund/ which the FT did an article on: https://www.ft.com/content/357f3c68-b866-4c2e-b678-0d075051a...

In terms of the personal data you share when using their models, I can't see why they would be any more or less nefarious than big Western tech companies.

rfoo1y ago

> There’s no company info on DeepSeek’s website.

It's backed solely by a hedge fund who do not want to draw attention to their business. So yeah, as sketchy as DESRES.

dotancohen1y ago

Might be good for contributing to open source projects. But not for clients' projects.

stavros1y ago

43x cheaper is good, but my time is also worth money, and it unfortunately doesn't bode well for me that it's stumped by the first problem I throw at it.

jadbox1y ago

What's the better plug-in among Continue Dev, Aider and Claude Dev?

phren0logy1y ago

FloatArtifact1y ago

What kind of hardware is required for the local llm for the continue stack?

jodrellblank1y ago

Uh, what price format is "14 million cents per 28 million cents"?

sabbaticaldev1y ago

14 cents per million input/28 cents per million output tokens

jodrellblank1y ago

Oh right, thanks.

anotherpaulg1y ago

Yi-Coder scored below GPT-3.5 on aider's code editing benchmark. GitHub user cheahjs recently submitted the results for the 9b model and a q4_0 version.

Yi-Coder results, with Sonnet and GPT-3.5 for scale:

  77% Sonnet
  58% GPT-3.5
  54% Yi-Coder-9b-Chat
  45% Yi-Coder-9b-Chat-q4_0

Full leaderboard:

Palmik1y ago

I wonder what's the reason.

suprjami1y ago

Because code either works or it doesn't. Nobody is replacing our entire income stream with an LLM.

You also need a knowledge of code to instruct an LLM to generate decent code, and even then it's not always perfect.

Meanwhile plenty of people are using free/cheap image generation and going "good enough". Now they don't need to pay a graphic artist or a stock photo licence

Any layperson can describe what they want a picture to look like so the barrier to entry and successful exit is a lot lower for LLM image generation than for LLM code generation.

jodrellblank1y ago

> "Meanwhile plenty of people are using free/cheap image generation and going "good enough". Now they don't need to pay a graphic artist or a stock photo licence"

and getting sandwich photos of ham blending into human fingers:

https://www.reddit.com/r/Wellthatsucks/comments/1f8bvb8/my_l...

saurik1y ago

datavirtue1y ago

suprjami1y ago

aithrowaway19871y ago

rahimnathwani1y ago

What you say may be true about the simplest workflow: enter a prompt and get one or more finished images.

eropple1y ago

datavirtue1y ago

": they are marketed towards (and created by) laypeople with who want generic content and don't care about human artists"

She and all of her colleagues all despise commissioned work and they get a constant stream of them. I always tell her to refuse them. Some pay very well.

If you are creating generic "art" for corporations I have little more than a shrug for your anxiety over AI.

tcdent1y ago

It's just gatekeeping.

rty321y ago

If they can take a Jira ticket, debug the code, create a patch for a large codebase and understand and respect all the workarounds in a legacy codebase, I would have a problem with it.

xvector1y ago

Except they can't do the equivalent for art yet either, and I am fairly familiar with the state of image diffusion today.

The diffusers do not understand what they are generating, so they just generates what "looks right." Often this results in art that looks pretty but has no deeper logic, world building, meaning, etc.

So - I rely on artists for art I care about (art I will hang on my walls), and image generators for throwaway work (such as weekly D&D campaign images.)

rty321y ago

Of course the "art" art -- the part that is all about human creativity -- will always be there.

viraptor1y ago

Have you seen https://www.swebench.com/ ?

IshKebab1y ago

> We're already in the "resolving JIRA tickets" area

For very simple tasks maybe, but not for the kinds of things I get paid to do.

rty321y ago

Also, good luck doing that on our poorly written, poorly documented and under tested codebase. By any standard django is a much better codebase than the one I work on every day.

https://github.com/01-ai/Yi-Coder

mrklol1y ago

denizener1y ago

I love art and code, IMO is because Cursor is really good and AI art is not that good.

I say this as someone who thought we reached the art singularity in December 2022. I have no philosophical or moral problem with AI art. It just kind of sucks.

Cursor/Sonnet on the other hand just blew my mind earlier today.

GaggiX1y ago

There are really good models for AI art, if people care. I think that AI is better at making an image from start to finish than making some software from start to finish.

And I use Claude 3.5 Sonnet myself.

datavirtue1y ago

AI art is an oxymoron. It will never give me chills or make me cry.

crimsoneer1y ago

I mean, it's supply and demand right.

- There is a big demand for really complex software development, and an LLM can't do that alone. So software devs have to do lots of busywork, and like the opportunity to be augmented by AI

viraptor1y ago

theshrike791y ago

> Continue pretrained on 2.4 Trillion high-quality tokens over 52 major programming languages.

rfoo1y ago

Of course, this is far from trivial, you don't just add more data and expect it to automatically be better for everything. So is time management for us mere mortals.

deely31y ago

> usually have better performance on your specific langauge when they are not limited to learn or over-sample one single language.

Source? Im very curious how learning one language helps model to generate code in language with different paradigms. Java, Markdown, JSON, HTML, Fortran?

cztomsik1y ago

I think around the BLOOM models (2022) it was found out that if you train english-only, the model performs worse than if you have even little mixture of other languages.

This by itself is not a proof for your specific question but you can extrapolate.

imjonse1y ago

sitkack1y ago

The models benefit immensely from being trained with more data from other languages, even if you only ever use it in one.

You could finetune it on your codebases and specific docs for added perf.

rty321y ago

richardw1y ago

mark_l_watson1y ago

I get your point. Models that support many dozens of human languages seem not what I personally need because I only speak English.

karagenit1y ago

terminalcommand1y ago

Isn't microsoft phi specifically trained for Python? I recall that Phi 1 was advertised as a Python coding helper.

It's a small model trained only by quality sources (ie textbooks).

wiz21c1y ago

If the LLM training makes the LLM generalize things between languages, then it is better to leave it like it is...

kamphey1y ago

I wonder what those 52 languages are.

richardw1y ago

Y_Y1y ago

They're playing a dangerous game if they assume that a single language or even family of similar languages is referred to by e.g. "assembly", "shell", "lisp".

(I also note that several of these are markup or config languages which are explicitly not for programming.)

JediPig1y ago

I tested this out on my workload ( SRE/Devops/C#/Golang/C++ ). it started responding about non-sense on a simple write me boto python script that changes x ,y,z value.

Then I tried other questions in my past to compare... However, I believe the engineer who did the LLM, just used the questions in benchmarks.

One instance after a hour of use ( I stopped then ) it answered one question with 4 different programming languages, and answers that was no way related to the question.

tmikaeld1y ago

I have the same experience, hallucinates and rambles on and on about "solutions" that are not related.

Unfortunately, this has always been my experience with all open source code models that can be self-hosted.

Gracana1y ago

It sounds like you are trying to chat with the base model when you should be using a chat model.

tmikaeld1y ago

No, I’m using 9b-chat-q8_0 on a 4090

tarruda1y ago

Have you ran the model in full FP16? It is possible a lot of performance is lost when running quantized versions.

mtrovo1y ago

I'm new to this whole area and feeling a bit lost. How are people setting up these small LLMs like Yi-Coder locally for tab completion? Does it work natively on VSCode?

Also for the cloud models apart from GitHub Copilot, what tools or steps are you all using to get them working on your projects? Any tips or resources would be super helpful!

cassianoleal1y ago

You can run this LLM on Ollama [0] and then use Continue [1] on VS Code.

The setup is pretty simple:

* Install Ollama (instructions for your OS on their website - for macOS, `brew install ollama`)

* Download the model: `ollama pull yi-coder`

* Install and configure Continue on VS Code (https://docs.continue.dev/walkthroughs/llama3.1 <- this is for Llama 3.1 but it should work by replacing the relevant bits)

[0] https://ollama.com/

[1] https://www.continue.dev/

suprjami1y ago

If you have a project which supports OpenAI API keys, you can point it at a LocalAI instance:

This is easy to get "working" but difficult to configure for specific tasks due to docs being lacking or contradictory.

samstave1y ago

Can you post screens/configs on how setup success?

Or at least state what you configured toward and how?

suprjami1y ago

The documentation gives a quick start and many examples of integration with OpenAI projects like a chatbot. That's all I did.

smcleod1y ago

Weird they're comparing it to really old deepseek v1 models, even v2 has been out a long time now.

butterfly420691y ago

My exact thoughts, especially because DeepseekV2 is meant to be a massive improvement.

It's an annoying trend. Untrustworthy metrics betray untrustworthy morals.

bubblyworld1y ago

My barely-informed guess is that they don't have the resources to run it (it's a 200b+ model).

regularfry1y ago

smcleod1y ago

That’s for the larger model, most people running it locally use the -lite model (both of which has lots of benchmarks published)

kleiba1y ago

What is the recommended hardware to run a model like that locally on a desktop PC?

tadasv1y ago

you can easily run 8b yi coder on 4090 rtx. Probably could do on a smaller gpu (16GB). I have 24gb, and run it through ollama.

NKosmatos1y ago

It would be good if LLMs were somehow packaged in an easy way/format for us "novice" (ok I mean lazy) users to try them out.

I'm not interested so much with the response time (anyone has a couple of spare A100s?), but it would be good to be able to try out different LLMs locally.

dizhn1y ago

PhilippGille1y ago

With Mozilla's llamafile you can run LLMs locally without installing anything: https://github.com/Mozilla-Ocho/llamafile

senko1y ago

LM Studio is pretty good: https://lmstudio.ai/

suprjami1y ago

One Docker command if you don't mind waiting minutes for CPU-bound replies:

You can also use several GPU options, but they are not as easy to get working.

hosteur1y ago

You should try GPT4all. It seems to be exactly what you’re asking for.

nusl1y ago

This is already possible. There are various tools online you can find and use.

gloosx1y ago

tarruda1y ago

It seems this is the problem with most benchmarks, which is why benchmark performance doesn't mean much these days.

smokel1y ago

Does anyone know why the sizes of these models are typically expressed in number of weights (i.e 1.5B and 9B in this case), without mentioning the weight size in bytes?

Edit: reformulated to sound like a genuine question instead of a complaint.

magnat1y ago

Because you can quantize a model e.g. from original 16 bits down to 5 bits per weight to fit your available memory constraints.

GaggiX1y ago

The weight size depends on the accuracy you are running the model at, you usually do not run a model at fp16 as it would be wasteful.

tarruda1y ago

Since most LLMs are released as FP16, just the number of parameters is enough to know the total required GPU RAM.

nathan_tarbert1y ago

This sounds really cool! I found this Reddit discussion... https://www.reddit.com/r/ArtificialInteligence/comments/1f9m...

Tepix1y ago

Sounds very promising!

I hope that Yi-Coder 9B FP16 and Q8 will be available soon for Ollama, right now i only see the 4bit quantized 9B model.

I'm assuming that these models will be quite a bit better than the 4bit model.

tmikaeld1y ago

Click on "View more" in the dropdown on their page, it has many many quantized versions to choose from.

Tepix1y ago

Found it, thanks

patrick-fitz1y ago

I'd be interested to see how it performs on https://www.swebench.com/

Using SWE-agent + Yi-Coder-9B-Chat.

cassianoleal1y ago

Is there an LLM that's useful for Terraform? Something that understands HCL and has been trained on the providers, I imagine.

lasermike0261y ago

Try this, https://ollama.com/jeffrymilan/aiac

bloopernova1y ago

Copilot writes terraform just fine, including providers.

cassianoleal1y ago

Thanks. I should have specified, LLMs that can be run locally is what interests me.

Havoc1y ago

Beats deepseek 33. That’s impressive

tuukkah1y ago

They used DeepSeek-Coder-33B-Instruct in comparisons, while DeepSeek-Coder-v2-Instruct (236B) and -Lite-Instruct (16B) are available since a while: https://github.com/deepseek-ai/DeepSeek-Coder-v2

EDIT: Granted, Yi-Coder 9B is still smaller than any of these.

lasermike0261y ago

First look seem good. I'll keep hacking with it.

ziofill1y ago

Are coding LLMs trained with the help of interpreters?

willvarfar1y ago

Google's Gemini does.

(A bit like why did they never put an old fashioned rules-based grammar checker check stage in google translate results?)

Fast forward to today and it seems it's a normal step for Gemini etc https://ai.google.dev/gemini-api/docs/code-execution?lang=py...