Seriously, if you're a niche market with specific know-how, the easiest way to broadly propagate this know-how is to use Copilot.
LLMs don't need to find every bug in your code. Even if they found only an additional 10% of genuine bugs compared to existing tools, that would still be a big improvement to code analysis.
In reality, I suspect the gain is much higher than 10%.
Once they can process larger context windows, they will become impossible to ignore.
Probably 85% of codebases are just rehashes of the same stuff. Copilot has seen it all, I guess.
The only really new aspect here is the LLM part. The set of people who will bizarrely lie about total irrelevancies to strangers on the Internet, even when they are fooling absolutely no one, has always been small but non-zero.
It's really good at predicting text we like, but that's all it does.
It shouldn't be surprising that its prediction is sometimes wrong or unwanted.
Interestingly, even intelligent, problem-solving, educated humans "incorrectly predict" all the time.
It's important to recognize that predicting text is not merely about guessing the next letter or word, but rather a complex set of probabilities grounded in language and context. When we look at language, we might see intricate relationships between letters, words, and ideas.
Starting with individual letters, like 't,' we can assign probabilities to their occurrence based on the language and alphabet we've studied. These probabilities enable us to anticipate the next character in a sequence, given the context and our familiarity with the language.
As we move to words, they naturally follow each other in a logical manner, contingent on the context. For instance, in a discussion about electronics, the likelihood of "effect" following "hall" is much higher than in a discourse about school buildings. These linguistic probabilities become even more pronounced when we construct sentences. One type of sentence tends to follow another, and the arrangement of words within them becomes predictable to some extent, again based on the context and training data.
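The "hall"/"effect" point above can be made concrete with a toy sketch. This is not how an LLM actually works (real models learn dense representations over huge corpora), just a minimal bigram counter over a made-up electronics "corpus", showing how conditional next-word probabilities fall out of context:

```python
from collections import Counter, defaultdict


def bigram_probs(corpus):
    """Count adjacent word pairs and turn them into P(next | current)."""
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for cur, nxt in zip(words, words[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }


# In an electronics-flavored corpus, "effect" reliably follows "hall";
# in a corpus about school buildings, it would be "monitor" or "pass".
electronics = "the hall effect sensor measures the hall effect voltage"
probs = bigram_probs(electronics)
print(probs["hall"]["effect"])  # 1.0 — in this corpus "effect" always follows "hall"
```

Scale the same idea up from word pairs to long token sequences with learned weights, and you get the context sensitivity described above.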
Nevertheless, it's not only about probabilities and prediction. Language models, such as Large Language Models (LLMs), possess a capacity that transcends mere prediction. They can grapple with 'thoughts'—an abstract concept that may not always be apparent but is undeniably a part of their functionality. These 'thoughts' can manifest as encoded 'ideas' or concepts associated with the language they've learned.
It may be true that LLMs predict the next "thought" based on the corpus they were trained on, but it's not to say they can generalize this behavior, past what "ideas" they were trained on. I'm not claiming generalized intelligence exists, yet.
Much like how individual letters and words combine to create variables and method names in coding, the 'ideas' encoded within LLMs become the building blocks for complex language behavior. These ideas have varying weights and connections, and as a result, they can generate intricate responses. So, while the outcome may sometimes seem random, it's rooted in the very real complex interplay of ideas and their relationships, much like the way methods and variables in code are structured by the 'idea' they represent when laid out in a logical manner.
Language is a means to communicate thought, so it's not a huge surprise that words, used correctly, might convey an idea someone else can "process", and that likely includes LLMs. That we get so much useful content from LLMs is a good indication that they are dealing with "ideas" now, not just letters and words.
I realize that people are currently struggling with whether or not LLMs can "reason". For as many times as I've thought it was reasoning, I'm sure there are many times it wasn't reasoning well. But, did it ever "reason" at all, or was that simply an illusion, or happy coincidence based on probability?
The rub with the word "reasoning" is that it directly involves "being logical" and how we humans arrive at being logical is a bit of a mystery. It's logical to think a cat can't jump higher than a tree, but what if it was a very small tree? The ability to reason about cats jumping abilities doesn't require understanding trees come in different heights, rather that when we refer to "tree" we mean "something tall". So, reasoning has "shortcuts" to arrive at an answer about a thing, without looking at all the things probabilities. For whatever reason, most humans won't argue with you about tree height at that point and just reply "No, cats can't jump higher than a tree, but they can climb it." By adding the latter part, they are not arguing the point, but rather ensuring that someone can't pigeonhole their idea of truth of the matter.
Maybe when LLMs get as squirrely as humans in their thinking we'll finally admit they really do "reason".
https://news.ycombinator.com/item?id=36130354
[0]: https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the...
The author's right. Reading the report I was stunned; the person disclosing the so-called vulnerability said:
> To replicate the issue, I have searched in the Bard about this vulnerability.
Does Bard clearly warn to never rely on it for facts? I know OpenAI says "ChatGPT may give you inaccurate information" at the start of each session.
I know I shouldn't be, but I'm surprised the disclosure is even needed. People clearly don't understand how LLMs work -
LLMs predict text. That's it; they're glorified autocomplete (really good autocomplete). When their prediction is wrong we call it a "hallucination" for some reason. Humans do the same thing all the time. Of course it's not always correct!
Of course not. Most developers don't understand how LLMs work, even roughly.
> Humans do the same thing all the time. Of course it's not always correct!
The difference is that LLMs cannot acknowledge incompetence, are always confidently incorrect, and will never reach a stopping point; at best they'll start going in circles.
>> @bagder it’s all the weirder because they aren’t even trying to report a new vulnerability. Their complaint seems to be that detailed information about a “vulnerability” is public. But that’s how public disclosure works? And open source? Like are they going to start submitting blog posts of vulnerability analysis and ask curl maintainers to somehow get the posts taken down???
>> @derekheld they reported this before that vulnerability was made public though
>> @bagder oh as in saying the embargo was broken but with LLM hallucinations as the evidence?
>> @derekheld something like that yes
The reporter thought this constituted a premature leak of a pre-disclosure CVE, and reported it to curl as a security issue via HackerOne.
The curl devs might even be the right ones to do it, if they slipped a DDOS into the code...
… yeah…
It's delusional, not hallucinated.
Delusions are the irrational holdings of false belief, especially after contrary evidence has been provided.
Hallucinations are false sensations or perceptions of things that do not exist.
May some influential ML person read this and start to correct the vocabulary in the field :)
An LLM is doing the exact same thing when it generates "correct" text that it's doing when it generates "incorrect" text: repeatedly choosing the most likely next token based on a sequence of input and the weights it learned from training data. The meaning of the tokens is irrelevant to the process. This is why you cannot trust LLM output.
Delusional tends to describe a state of being, while a hallucination tends to describe a single, finite event. Broadly, LLMs are not delusional, but they do "perceive" false information.
In addition, the headline posted here doesn't even say hallucinated, so that is also a hallucination. It says hallucineted. As portmanteaux go, that ain't bad. I rather like the sense of referring to LLMs as hallucinets.
It's clearly meant to poke fun at the system. If you think people are NOT going to use words in jest while making fun of something, perhaps you could use a little less starch in your clothing.
In this case the curl maintainers can tell the details are made up and don't correspond to any CVE.
One interesting find: I wrote a GPT-4 integration for binaryninja, and funnily enough, when I asked the LLM to rewrite a function into "its idiomatic equivalent, refactored and simplified without detail removal" and then asked it to find vulnerabilities, it cracked most of our joke hack-mes in a matter of minutes.
Interesting lesson: nearly all LLMs struggle with disassembled Rust binaries. I guess that's because the output doesn't resemble the original Rust source the way it does for C and C++.
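For anyone curious what that two-step flow looks like, here is a hypothetical sketch: first prompt the model to rewrite the decompiled function idiomatically, then prompt it to audit the rewrite. The prompt wording, function names, and the absence of any actual API call are all my assumptions, not the parent commenter's actual integration:

```python
# Hypothetical prompts for the two-step pipeline described above:
# step 1 cleans up decompiler output, step 2 audits the cleaned version.
REWRITE_PROMPT = (
    "Rewrite this function into its idiomatic equivalent, refactored and "
    "simplified without detail removal:\n\n{code}"
)
AUDIT_PROMPT = (
    "Find vulnerabilities in the following function. For each one, name the "
    "line and the class of bug:\n\n{code}"
)


def next_prompt(decompiled, rewritten=None):
    """Return the next prompt in the pipeline: rewrite first, then audit."""
    if rewritten is None:
        return REWRITE_PROMPT.format(code=decompiled)
    return AUDIT_PROMPT.format(code=rewritten)


raw = "int f(int *p) { return *p; }"  # toy decompiler output
step1 = next_prompt(raw)
# ...send step1 to the model, receive a cleaned-up function back, then:
step2 = next_prompt(raw, rewritten="int deref(int *p) { return *p; }")
```

The rewrite pass matters because decompiler output is noisy; giving the model idiomatic-looking code first plays to the distribution it was trained on.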
The usefulness of AI is inversely proportional to the laziness of its operator, and a golden hammer like this is surefire fool's gold for lazy people.
But totally, actual pure gold in responsible hands.
Except that it's false because Bard made it up. There's no real curl exploit involved.