Like, when a DeepSeek dev uses these systems as intended, would they also be seeing the columns, keys, etc. in English? Is there usually a translation step involved? Or do devs around the world just have to bite the bullet and learn enough English to be able to use the majority of tools?
I'm realizing now that I'm very ignorant when it comes to non-English-based software engineering.
That is precisely what happens. It is not unusual for code and databases to be written in English, even when the developers are from a non-English speaking country. Think about it: the toolchain, programming language and libraries are all based on English anyway.
However, a few years back it became common for most datasheets to be available in Mandarin and English, and this year most PCB fabrication houses have gained support for putting Chinese characters onto a circuit board (this requires higher-quality printing, since legibility demands finer detail).
Now there are a decent number of devices where the documentation is only available in Mandarin, and the design process was clearly done with little or no English involved.
Not everything changes though - gold plating thickness is still measured in micro-inches. Components often still use 0.1 inch pin spacing. Model numbers of Chinese chips are often closely linked to the Western chips they replace, the names of registers (in the CPU register sense) are often still English, etc.
For some kinds of software, localized names make a lot more sense, e.g. when you're dealing with very subtle distinctions between legal terms that don't have direct English equivalents.
There are some business concepts that are very unique to a place (country-specific or even company-specific) with no precise translation to the English-speaking world, and so I sometimes prefer to keep them in their native language.
There is some merit in asking your question, for there’s an unspoken rule (and a source of endless frustration) that business-/domain-related terms should remain in the language of their origin. Otherwise, (real-life story) "Leitungsauskunft" could end up being translated as "line information" or even "channel interface" ("pipeline inquiry" should be correct, it's a type of document you can procure from the [German] government).
Ironically, I’m currently working in an environment where we decided to translate such terms, and it hasn’t helped with understanding of the business logic at all. Furthermore, it adds an element of surprise and a topic for debate whenever somebody comes up with a "more accurate translation".
So if anything, English is a sign of a battle-hardened developer, until they try to convert proper names.
I did try my hand at a translation tool once the app was properly i18n'ed. Watched one guy blow coffee through his nose when I demoed it - the 'BACK' navigation button had been translated to the French word for a person's back, or something like that.
Depending on the company culture and policy, the most common thing to see is a mix of English variable and function names with native-language comments. Occasionally you will see native-language variable and function names. This is much more common in Latin character set languages (especially among Spanish and Portuguese speakers) in my experience; almost all Chinese code seems to use approximately-English variable and function names.
I'm a native English speaker, but from looking at various code bases written by people who aren't, I gather that it's basically this. It wasn't too long ago that one couldn't even reliably feed non-ASCII comments to a lot of compilers, let alone variable and function names.
Yes, that's what we did and do.
Depending on the project, I do use German variable names and comments at times, but I stopped using special characters like öüäß - they mess things up, even though in theory they should work just fine.
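The "in theory" part really is true - a quick Python 3 sketch (my own toy example, not from any real codebase):

```python
# Python 3 allows non-ASCII identifiers, so this is perfectly legal...
größe = 42
maße = {"breite": 3, "höhe": 14}

print(größe, maße["breite"])
# ...but editors, terminals, diff tools, and teammates' keyboards often
# disagree, which is why many of us stick to ASCII names anyway.
```

It runs fine under CPython; the trouble tends to come from the tooling around the code, not the language itself.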
Nowadays even Chrome dev tools come in German, but experience shows that translated programming tools (or any software, really) usually just have a partially translated UI. Any errors you encounter, or anything advanced, will be in English anyway. And if you google issues using your translated UI's wording, you won't find much, so better to just use the original version.
So English it is.
(And it is the lingua franca in most parts of the world anyway)
When I interact with it by asking it a question in Spanish, the parts between the <think> ... </think> are in English before it goes on to answer in Spanish.
Give it a try in your favourite language.
I went on to ask it if it "thinks" in English, Spanish or Chinese but it just gives the pat answer that, being an LLM, it doesn't think in any language.
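You can see the split yourself in the raw output - R1 wraps its reasoning in `<think>` tags before the visible answer. A rough sketch of pulling the two apart (the sample string below is made up, not a real API response):

```python
import re

# Made-up sample of R1-style raw output: English reasoning inside
# <think> tags, followed by the user-facing answer in Spanish.
raw = ("<think>The user asked in Spanish, so I should answer in Spanish."
       "</think>¡Claro! La capital de Perú es Lima.")

match = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
reasoning = match.group(1) if match else ""
answer = re.sub(r"<think>.*?</think>", "", raw, count=1, flags=re.DOTALL).strip()

print(reasoning)  # the English chain of thought
print(answer)     # the Spanish answer
```

Whatever language you prompt in, the part between the tags tends to come out in English.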
It also helps on the rare occasions some random notes evolve into a proper project that will have to be in English eventually anyway. There is no need for an extra translation step between initial idea and final product. All my vague hobby gamedev ideas are in English for instance.
That said, many developers might still prefer Turkish for naming DB tables, fields, variables, types and so forth if that’s the preference of the team. It wouldn’t be an exceptional situation. It’s quite easy too since Turkish also uses a Latin alphabet. May not be as easy or preferable in Chinese.
This way, you don't have to change keyboard layout while writing code.
Anyway, you're forced to learn some English when doing any real software development.
This makes some of their infra work and common misconceptions a little bit ... esoteric. So, English is crucial not just to do the job but to get best practices and CS info in general. It really helps a lot.
I mean we're kind of an outsourcing hub so it makes sense. Even some of our companies outsource further to the east so you really can't avoid it.
PS: I remember quite a while back when Wargaming's World of Tanks became a big thing they had to translate everything from Russian to English because they wanted to get foreign developers involved as well. Never heard of the reverse happening.
See also: aviation.
Since programming language keywords and APIs are written in English, it just looks better to keep identifiers and internal docs that way too; the other way causes a dissonance that feels uncomfortable to me.
Not only that. All of the code I (not a native English speaker) write, even if only I will ever see it, is in English - comments too. And I'm pretty confident all my colleagues do that too.
Might be different for languages with large population of native speakers (Croatian is just a few mil so we're more exposed to it), but you still can't avoid using English for tools / libs / docs / research papers / stack overflow...
That's how it goes, at least around Europe. People know English as a technical jargon (similar to legal French and Latin in English) and can juggle enough to get around documentation, but I've been in companies where I was the only fluent English speaker (and we're talking startup stuff). That gave me a bunch of cool opportunities though, being pulled into every other meeting as the designated translator.
I do occasionally find code with variable names in other languages, but it's very rare; for the most part, if you want to code, English is the way.
I've also seen a few devs who used Hebrew variable names but spelled in English (`shalom` instead of שלום).
English is the universal language in programming and software engineering, much like Latin was the universal scholarly language in the past. Sometimes even to the extent that the language starts leaking from the code and technical documents, reports, etc. are being written in English, often just because the people working close to the software are more familiar with the terminology in English than in their native language.
Even when you install e.g. Debian today and select Not-English as the system language, you might be surprised to see that GCC actually has i18n'd error messages, at least for some languages. Same for coreutils. I doubt anyone uses that intentionally, and they're probably not very up to date, but it does exist... kinda.
https://en.wikipedia.org/wiki/Non-English-based_programming_...
However, I suspect it's a honey pot.
Don't forget Shenzhen is a stone's throw away from Hong Kong where English is widely spoken.
Yes, coding in English is the standard.
It just makes things A LOT easier in terms of debugging, researching, reading examples from documentation, etc. I don't even understand my (boomer) colleagues who straight up refuse to learn English and get angry when they can't find solutions with German search terms.
That.
There's also a huge mental context-switching cost when you try to have code mixing, say, French and English together:

    size_t taille;
    size_t taille_domaine;

vs:

    size_t length;
    size_t domain_length;
Hardly anyone does the former. It's simply not a thing. I mean: sure, there are the odd projects that'll be exceptions. But we pretty much all name our functions/methods/variables/etc. and write our comments in English.
FWIW, when I code I actually both think and count in English.
- Dev infra, observability database (open telemetry spans)
- Logs of course contain chat data, because that's what happens with logging inevitably
The startling rocket-building prompt screenshot that was shared is meant to be shocking, of course, but it was most probably training data meant to prevent DeepSeek from completing such prompts, as evidenced by the `"finish_reason":"stop"` included in the span attributes.
Still pretty bad obviously and could have easily led to further compromise but I'm guessing Wiz wanted to ride the current media wave with this post instead of seeing how far they could take it. Glad to see it was disclosed and patched quickly.
As I understand it, a finish reason of "stop" in API responses usually means the model ended its output normally. In any case, I don't see how training data could end up in production logs, nor why they'd want to prevent such data (a prompt you'd expect a normal user to write) from being responded to.
> [...] I'm guessing Wiz wanted to ride the current media wave with this post instead of seeing how far they could take it.
Security researchers are often asked to not pursue findings further than confirming their existence. It can be unhelpful or mess things up accidentally. Since these researchers probably weren't invited to deeply test their systems, I think it's the polite way to go about it.
This mistake was totally amateur hour by DeepSeek, though. I'm not too into security stuff but if I were looking for something, the first thing I'd think to do is nmap the servers and see what's up with any interesting open ports. Wouldn't be surprised at all if others had found this too.
https://platform.openai.com/docs/api-reference/introduction
Right there in the docs:
> Now that you've generated your first chat completion, let's break down the response object. We can see the finish_reason is stop which means the API returned the full chat completion generated by the model without running into any limits.
Regarding how training data ends up in logs: it's not far-fetched to create a trace span to measure how long prompts + replies take, and it makes sense to record attributes like finish_reason for observability purposes. Including the message itself, however, is just amateur - but common nonetheless.
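As a rough sketch of that split (the attribute names here are my own invention, not DeepSeek's actual schema): record the metadata you need for latency and truncation debugging, and deliberately drop the message bodies.

```python
def span_attributes(prompt: str, response: dict) -> dict:
    """Build trace-span attributes for a chat completion call.

    finish_reason and token counts are genuinely useful for debugging
    truncation and latency; the prompt/completion TEXT is what leaked.
    """
    return {
        "llm.finish_reason": response["choices"][0]["finish_reason"],
        "llm.completion_tokens": response["usage"]["completion_tokens"],
        "llm.prompt_chars": len(prompt),  # size only, never the content
    }

# Fake response shaped like an OpenAI-compatible chat completion:
fake = {"choices": [{"finish_reason": "stop"}],
        "usage": {"completion_tokens": 42}}
attrs = span_attributes("how do I ...", fake)
print(attrs)
```

Same observability value, nothing sensitive to leak if the backing store is ever exposed.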
- Password authentication (bcrypt, SHA-256 hashes)
- Certificate authentication (fantastic for server-to-server communication)
- SSH key authentication (personally, this is my favourite - every database should have this authentication mechanism to make it easy for devs to work with)
Not very popular, but LDAP and an HTTP authentication server are also great options.
I also wonder how DeepSeek's engineers deployed their ClickHouse instance. When I deployed using yum/apt install, the installation step literally asks you to input a default password.
And if you set it up manually with the ClickHouse binary, the out-of-the-box config seals the instance off from external network access, and the default user is only exposed to localhost, as explained by Alex here - https://news.ycombinator.com/item?id=42871371#42873446.
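If you want to sanity-check your own deployment from an outside box, a crude sketch (replace the placeholder hostname with a host you own; 8123 and 9000 are ClickHouse's default HTTP and native-protocol ports):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False

# Only probe hosts you own! 8123 = ClickHouse HTTP, 9000 = native protocol.
for port in (8123, 9000):
    print(port, port_open("clickhouse.example.invalid", port))
```

If either port answers from the public internet and you didn't explicitly intend that, you have roughly the DeepSeek situation.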
forced us to use an alternative, and paywalling security features in an "open source" product didn't make us feel comfortable for a long-term investment like a db
https://github.com/ClickHouse/ClickHouse/pull/68634#issuecom...
If you do this and the company you're conducting your "research" on hasn't given you permission in some form, you can get yourself in a lot of hot water under the CFAA in the USA and other laws around the world.
Please don't follow this example. Sign up for a bug bounty program or work directly with a company to get permission before you probe and access their systems, and don't exceed the access granted.
> The Wiz Research team immediately and responsibly disclosed the issue to DeepSeek, which promptly secured the exposure
I did the same a while ago, an education platform startup had their web server misconfigured, I could clone their repo locally because .git was accessible. I immediately sent them an email from a throwaway account in case they wanted to get me in trouble and informed them about the configuration issues. They thanked me for the warning and suggestions, and even said they could get me a job at their company.
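Checking for that particular misconfiguration is trivial - a rough sketch (the function is mine, not any standard tool, and per the comments above, only run it against hosts you own or have permission to test):

```python
import urllib.request
import urllib.error

def git_dir_exposed(base_url: str) -> bool:
    """Return True if <base_url>/.git/HEAD is publicly readable -
    usually enough to reconstruct the entire repository remotely."""
    url = base_url.rstrip("/") + "/.git/HEAD"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            # A real HEAD file starts with a ref line like "ref: refs/heads/main"
            return resp.status == 200 and resp.read().startswith(b"ref:")
    except (urllib.error.URLError, OSError):
        return False
```

The usual fix is to deny web access to dot-directories entirely in the server config.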
Wiz folks are notoriously shady. They cross the line a ton. They did this to Amazon and Microsoft to make a name for themselves, among others. Super unethical.
Their product isn't terrible, but their salespeople are just terrible. Completely off-putting. Most of them are idiots from Zscaler.
Likewise, there may be Chinese laws that were violated. However, outside of China they are a moot point.
DeepSeek & users that had data exposed here should be thanking Wiz.
They are getting DDoS'd by the US gov too, so they were only trying to help /s
Complete database control and potential privilege escalation within the DeepSeek environment without ANY authentication...
For most people, bash is not a tool for interacting with the computer, it is how they express their frustration with the computer (sometimes leaving damaged keyboards).
There you have it: the real face of Big Tech. Extinguishing the competition by locking a service behind a portal provided for free, then starting to milk the users, is not enough for them... they will also fight dirty, really dirty.
But it's also one that those of us actually working on foundational AI saw coming a mile away, given that most of the top research of late has been happening in Chinese labs, not American or European ones.
Can't wait to see what this boneheaded President's tariffs on TSMC do to this situation.
I don't understand the rage. This is good for everyone. Competition is what drives innovation and they even open sourced it! If you want to outdo them, learn from them. Don't just try to cry louder, it's embarrassing for everyone.
Can you please provide a source? Genuinely curious as this would be fatal to the US economy. Imagine working 2 years to get out from Covid chip shortages only to hammer progress down with tariffs.
NVIDIA's stock has been super bubbly—all DeepSeek did was set off itchy investor trigger fingers that were already worried about its highly inflated price.
DeepSeek uses H100s and H800s. They'll likely have reasons to buy more now, and America will want to compete even harder, buying more chips.
American companies are still way ahead as well, but they're just getting more competition. This will be healthy.
Tesla barely even sells but the stock just won't go down. Boeing orders have fallen massively and they're posting massive losses each quarter, and management shows zero desire to improve the situation. But the stock has basically stabilized since the initial catastrophes.
a) For inference, cheaper and faster compute will increase total inference spend, because the end-user products will work better and people will use them more.
b) For training, the big labs will continue to spend because we have yet to see diminishing returns to scale - in fact, in the past year we unlocked a new dimension of scaling: doing more RL after pre-training to improve reasoning capabilities. Since current SOTA models are not yet smart enough for all the tasks people want to use them for, any efficiency gains will be used to further improve performance. In the current competitive environment, even with DeepSeek's work, it's near-impossible to imagine OpenAI, Anthropic, Google, or Meta deciding to cut the compute budget for their next model by an order of magnitude. They will incorporate DeepSeek's techniques, but use them to squeeze even more performance out of the compute they have, and will keep purchasing as much compute as Nvidia will sell them. Expect this trend to continue until returns to scale finally run out.
https://youtubetranscriptoptimizer.com/blog/05_the_short_cas...
Personally, I know I've lost a lot of street cred in certain work circles recently over my view that shops should pursue local LLM solutions [0], and the '$6000, 4-8 tokens/second local LLM box' post making the rounds [1] hopefully gives orgs a better idea of what LLMs can do if we keep them from being 100% SaaS-like in structure.
I think a big litmus test for some orgs in near future, is whether they keep 'buying ChatGPT' or instead find a good way to quickly customize or at least properly deploy such models.
[0] - I mean, for starters, a locally hosted LLM resolves a LOT of concerns around infosec....
[1] - Very thankful a colleague shared that with me...
"This is good for Nvidia" is the 2025 version of "this is good for bitcoin"
A lot of people want to poke at Chinese weakness wherever it's exposed, because Americans are used to being the best - and also because of unconscious racism. When Japan was about to overtake the US, the US pulled some similar moves, and that is partly what's responsible for Japan's current economic funk. It's unlikely these moves will work on China.
Obviously I am disappointed about this, but people really blow it out of proportion IMO. Rumor is this was a side project for some employees at a hedge fund. You think they specialize in security and software application best practices? I'm not exactly surprised that it's insecure.
The really crazy thing is that anyone gives ANY company sensitive data to train on, regardless of which country the service is running in. That's what's actually crazy.
This whole thing should be an eye-opener to most people.
To ask an honest question, who gives a crap if a Chinese company manages to grab data that many of the usual Silicon Valley suspects have had all along and have been incrementally updating? How is this a "threat".
To pile on another gripe, why the hell does every single media outlet point out the "Tiananmen Square" question?
The whole thing has just become embarrassing. I honestly can't fathom what worse China could do with my personal info than the likes of say, Meta. I'm not saying I would enjoy it, but I just don't see how it could be more harmful than the Silicon Valley status quo.
Given how closely major US tech companies are now affiliated and partnered with the US Federal government, arguably the direct potential threat from them to US citizens may well be higher than from across a very big pond.
People trot out "I'd rather our guys spy on me than them" a lot, but that's putting a lot of faith in your local government. Conversely, who do you think has more to fear from their logged prompts on DeepSeek, US or Chinese citizens?
It is a threat to WallStreet and Silicon Valley. It just broke the illusion that they're kings of tech.
> why the hell does every single media outlet point out the "Tiananmen Square" question?
Sour grapes, but also the media cannot report anything about China without showing its anti-China bias.
No it isn't (well, it probably is too). This is the rather naff nation-state bollocks in play.
You have either or both of "some bigger boys found a more efficient way of doing something I thought I was good at" and "I've wet myself".
Seems like the kind of mistake you would make if you are not used to deploying external client facing applications.
[0] https://www.spiegel.de/international/business/we-know-where-...
https://news.ycombinator.com/item?id=41678840
The joke is these companies build systems that can tell them how to implement better security, they simply don't care.
I doubt very much that it was only that, and not massively backed by the Chinese state in general.
As with OpenAI, much of this has to do with hype based speculation.
In OpenAI's case, they played along with the speculation that they might already have AGI locked up in their labs, and fueled it. The result: massive investment (now in danger).
And China and the US are playing a game of global hegemony. I just read articles with the essence of: see, China is so great that a small side project there can take down the leading players from the West! Come join them.
It is mere propaganda to me.
Now, DeepSeek in the open is a good thing, but I believe the Chinese state is backing it massively to help with that success and to help shake off Western dominance. I would also assume the Chinese intelligence services helped directly, with intel straight out of the labs of OpenAI and co.
This is about real power.
Many states are about to decide which side they should take, if they have to choose between West and East. Stuff like this heavily influences those decisions.
(But btw. most don't want to have to choose)
Alibaba has Qwen. Baidu, Huawei, Tencent, etc all have their own AI models. The Chinese government would most likely push one of these forward with their backing, not an unknown small company.
    # Please install OpenAI SDK first: `pip3 install openai`
    from openai import OpenAI

    client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

The client-facing aspect isn't the problem here. This linked article is talking about the backend having vulnerabilities, not the client-facing application. It's about a database that is accessible from the internet, with no authentication, with unencrypted data sitting in it. High-Flyer, the parent company of DeepSeek, already has a lot of backend experience, since that is a core part of the technology they've built to operate the fund. If you're a quantitative hedge fund, you aren't just going to be lazy about your backend systems and data security. They have plenty of experience and capability to manage those backend systems just fine.
I’m not saying other companies are perfect either. There’s a long list of American companies that violate user privacy, or have bad security that then gets exploited by (often Chinese or Russian) hackers. But encrypting data in a database seems really basic, and requiring authentication on a database also seems really basic. It would be one thing if exposure of sensitive info required some complicated approach. But this degree of failure raises lots of questions whether such companies can ever be trusted.
You're reciting a bunch of absolute numbers, without any sort of context at all. $5M isn't the same for every company. For example, in 2020, it seems High Flyer spent a casual $27M on a supercomputer. They later replaced that with a $138M new computer. $5.5M sounds like something that could be like a side-project for a company like that, whose blood and sweat is literally money.
> But this degree of failure raises lots of questions whether such companies can ever be trusted.
This, I agree with though. I wouldn't trust sending my data over to them. Using their LLMs though, on my own hardware? Don't mind if I do, as long as it's better, I don't really mind what country it is imported from.
This is not fair. Is OpenAI, for example, including the CEO's paycheck in its model training costs?
Zero evidence that the above statement is true, and weak evidence (authors' claims) that it is false. Have you read their papers even?
https://arxiv.org/html/2412.19437v1#abstract https://arxiv.org/pdf/2501.12948
I'm sure you were just misled by all the people, including Anthropic's Dario, parroting this claim, but even Dario has already said he was wrong to say that, and SemiAnalysis has already clarified it was a misunderstanding of their claim, which was 50,000 H-series GPUs, not 50,000 H100s.
But the software? Absolute disaster.
When people say DeepSeek is a side project, this is what I assume they mean. It's one thing when a bunch of software engineers make something with terrible security - software is their main job. With a bunch of academics (and no offense to academics), software is not their main job; you could be spending your time teaching them how to use version control.
Can we stop with this nonsense ?
The list of author of the paper is public, you can just go look it up. There are ~130 people on the ML team, they have regular ML background just like you would find at any other large ML labs.
Their infra costs multiple millions of dollars per month to run, and the salary of such a big team is somewhere in the $20-50M per year range (not very au fait with market rates in China, hence the spread).
This is not a sideproject.
Edit: Apparently my comment is confusing some people. I am not arguing that ML people are good at security, just that DS is not the side project of a bunch of quant bros.
So maybe not a side project, but if you have ever worked with ML researchers before, lack of engineering/security chops shouldn't be that surprising to you.
OP means the public API and app being a side project, which likely they are; the skills required to do ML have little overlap with the skills required to run large, complex workloads securely and at scale for a public-facing app with presumably millions of users.
The latter role also typically requires experience, not just knowledge, to do well, which is why experienced SREs have very good salaries.
Data breaches from unsecured or accidentally-public servers/databases are not unusual among much larger entities than DeepSeek.
We're not talking some poor college students here.
Not only that, this was a "production-grade" database with millions of users, the app was #1 on the App Store, and ALL text sent in the prompts was logged in plain text?
Unbelievable.
[1] https://www.reuters.com/technology/cybersecurity/openais-int...
A large-scale DDoS is being directed against DeepSeek.
US big tech wants to quash the competition.
If you're releasing a major project into the wild, expecting serious attention, and have the money, you get third parties involved to test for these things before you launch.
Now can we get back to discussing the real conspiracy theories. This is clearly a disinformation piece by BigAI to add FUD around the Chinese challenger :-)
No one is here as far as I can tell. But if you've ever been a software engineer who is required to work with someone purely from an ML lab and/or academia, you'll quickly discover that "principled software engineering" just isn't really something they consider an important facet of software. This is partly due to culture in academia, general inexperience (in the software industry) and deeply complicated/mathematical code really only needing to be read by other researchers who already "get it", to a degree.
Not an excuse but rather an explanation for _why_ such an otherwise impressive team might make a mistake like that.
I haven't worked with serious ML engineers, but having worked in large webdev, there's usually a team involved in these projects, including senior non-devs who would ensure the correct checks and balances are in place before go-live. Does this not happen in ML projects? (Of course there are always exceptions and unknowns that will slip through; I don't know if that was the case here, or something else.)
I suggest you guys don't do that either.
This industry in China is so young; many devs and orgs don't understand what will happen if they shut down the firewall or expose their database on the internet without a password.
They just can't think of it - someone needs to remind them.
I would recommend that. Bitwarden is a pretty good open-source password manager. You can install it as a plugin in your browser, so it can fill out your password for you so you don't have to manually copy and paste.
You can fault them for disclosure practices though :-)
>The Wiz Research team immediately and responsibly disclosed the issue to DeepSeek, which promptly secured the exposure.
It seems like Wiz told deepseek and deepseek secured this vuln?
We all agree this kind of leak should be disclosed. However, normally security researchers don't just leak the specific URLs and such. This may be what the parent is referring to.
If your information is sensitive, do not use an LLM by public API - absolutely all of your data is being stored and processed. For all of them.
Downvoted - because of course the CCP wouldn't want all of this data, that's preposterous. What would they even do with it? /S
Kinda like how your comment was grey within 1 minute, despite stating an objective truth.
Sure, this is to be expected given the billions and billions of dollars at stake but like - that money is gone lol. DeepSeek isn't going back in the bottle, nor is open source AI in general.
The direct disclosure of URLs and ports is insane. Wonder if they would be as irresponsible if it were MSFT, OpenAI, Anthropic, etc.
PS: Not defending DeepSeek for bad practices, but still. Nothing irresponsible here.
PS2: It is marked as resolved, I went directly to the vulns due to the title of the post.
ed: I was wrong!
> The Wiz Research team immediately and responsibly disclosed the issue to DeepSeek, which promptly secured the exposure.
Assuming everything mentioned in the article was fixed before publication, I don’t see an issue with it.