Seriously, if you're a niche market with specific know-how, the easiest way to broadly propagate this know-how is to use Copilot.
LLMs don't need to find every bug in your code. Even if they found only an additional 10% of genuine bugs compared to existing tools, that would still be a big improvement to code analysis.
In reality, I suspect the gain is much higher than 10%.
Once they can process larger context windows, they will become impossible to ignore.
Probably 85% of codebases are just rehashes of the same stuff. Copilot has seen it all, I guess.
The only really new aspect here is the LLM part. The set of people who will bizarrely lie about total irrelevancies to strangers on the Internet, even when they are fooling absolutely no one, has always been small but non-zero.
It's really good at predicting text we like, but that's all it does.
It shouldn't be surprising that its prediction is sometimes wrong or unwanted.
Interestingly, even intelligent, problem-solving, educated humans "incorrectly predict" all the time.
It's important to recognize that predicting text is not merely about guessing the next letter or word, but rather a complex set of probabilities grounded in language and context. When we look at language, we might see intricate relationships between letters, words, and ideas.
Starting with individual letters, like 't,' we can assign probabilities to their occurrence based on the language and alphabet we've studied. These probabilities enable us to anticipate the next character in a sequence, given the context and our familiarity with the language.
As we move to words, they naturally follow each other in a logical manner, contingent on the context. For instance, in a discussion about electronics, the likelihood of "effect" following "hall" is much higher than in a discourse about school buildings. These linguistic probabilities become even more pronounced when we construct sentences. One type of sentence tends to follow another, and the arrangement of words within them becomes predictable to some extent, again based on the context and training data.
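The "hall"/"effect" point above can be made concrete with a toy sketch. This is not how an LLM actually works (real models learn dense representations over huge corpora), just a minimal bigram counter over a made-up electronics "corpus", showing how conditional next-word probabilities fall out of context:

```python
from collections import Counter, defaultdict


def bigram_probs(corpus):
    """Count adjacent word pairs and turn them into P(next | current)."""
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for cur, nxt in zip(words, words[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }


# In an electronics-flavored corpus, "effect" reliably follows "hall";
# in a corpus about school buildings, it would be "monitor" or "pass".
electronics = "the hall effect sensor measures the hall effect voltage"
probs = bigram_probs(electronics)
print(probs["hall"]["effect"])  # 1.0 — in this corpus "effect" always follows "hall"
```

Scale the same idea up from word pairs to long token sequences with learned weights, and you get the context sensitivity described above.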
Nevertheless, it's not only about probabilities and prediction. Language models, such as Large Language Models (LLMs), possess a capacity that transcends mere prediction. They can grapple with 'thoughts'—an abstract concept that may not always be apparent but is undeniably a part of their functionality. These 'thoughts' can manifest as encoded 'ideas' or concepts associated with the language they've learned.
It may be true that LLMs predict the next "thought" based on the corpus they were trained on, but it's not to say they can generalize this behavior, past what "ideas" they were trained on. I'm not claiming generalized intelligence exists, yet.
Much like how individual letters and words combine to create variables and method names in coding, the 'ideas' encoded within LLMs become the building blocks for complex language behavior. These ideas have varying weights and connections, and as a result, they can generate intricate responses. So, while the outcome may sometimes seem random, it's rooted in the very real complex interplay of ideas and their relationships, much like the way methods and variables in code are structured by the 'idea' they represent when laid out in a logical manner.
Language is a means to communicate thought, so it's not a huge surprise that words, used correctly, might convey an idea someone else can "process", and that likely includes LLMs. That we get so much useful content from LLMs is a good indication that they are dealing with "ideas" now, not just letters and words.
I realize that people are currently struggling with whether or not LLMs can "reason". For as many times as I've thought it was reasoning, I'm sure there are many times it wasn't reasoning well. But, did it ever "reason" at all, or was that simply an illusion, or happy coincidence based on probability?
The rub with the word "reasoning" is that it directly involves "being logical" and how we humans arrive at being logical is a bit of a mystery. It's logical to think a cat can't jump higher than a tree, but what if it was a very small tree? The ability to reason about cats jumping abilities doesn't require understanding trees come in different heights, rather that when we refer to "tree" we mean "something tall". So, reasoning has "shortcuts" to arrive at an answer about a thing, without looking at all the things probabilities. For whatever reason, most humans won't argue with you about tree height at that point and just reply "No, cats can't jump higher than a tree, but they can climb it." By adding the latter part, they are not arguing the point, but rather ensuring that someone can't pigeonhole their idea of truth of the matter.
Maybe when LLMs get as squirrely as humans in their thinking we'll finally admit they really do "reason".
https://news.ycombinator.com/item?id=36130354
[0]: https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the...
The author's right. Reading the report I was stunned; the person disclosing the so-called vulnerability said:
> To replicate the issue, I have searched in the Bard about this vulnerability.
Does Bard clearly warn to never rely on it for facts? I know OpenAI says "ChatGPT may give you inaccurate information" at the start of each session.
I know I shouldn't be, but I'm surprised the disclosure is even needed. People clearly don't understand how LLMs work -
LLMs predict text. That's it; they're glorified autocomplete (really good autocomplete). When their prediction is wrong we call it a "hallucination" for some reason. Humans do the same thing all the time. Of course it's not always correct!
Of course not. Most developers don't understand how LLMs work, even roughly.
> Humans do the same thing all the time. Of course it's not always correct!
The difference is that LLMs cannot acknowledge incompetence, are always confidently incorrect, and will never reach a stopping point; at best they'll start going in circles.
>> @bagder it’s all the weirder because they aren’t even trying to report a new vulnerability. Their complaint seems to be that detailed information about a “vulnerability” is public. But that’s how public disclosure works? And open source? Like are they going to start submitting blog posts of vulnerability analysis and ask curl maintainers to somehow get the posts taken down???
>> @derekheld they reported this before that vulnerability was made public though
>> @bagder oh as in saying the embargo was broken but with LLM hallucinations as the evidence?
>> @derekheld something like that yes
The reporter thought this constituted a premature leak of a pre-disclosure CVE, and reported it to curl as a security issue via HackerOne.
The curl devs might even be the right ones to do it, if they slipped a DDOS into the code...
… yeah…
It's delusional, not hallucinated.
Delusions are the irrational holdings of false belief, especially after contrary evidence has been provided.
Hallucinations are false sensations or perceptions of things that do not exist.
May some influential ML person read this and start to correct the vocabulary in the field :)
An LLM is doing the exact same thing when it generates "correct" text that it's doing when it generates "incorrect" text: repeatedly choosing the most likely next token based on a sequence of input and the weights it learned from training data. The meaning of the tokens is irrelevant to the process. This is why you cannot trust LLM output.
Delusional tends to describe a state of being, while a hallucination tends to describe a single, finite event. Broadly, LLMs are not delusional, but they do "perceive" false information.
In addition, the headline posted here doesn't even say hallucinated, so that is also a hallucination. It says hallucineted. As portmanteaux go, that ain't bad. I rather like the sense of referring to LLMs as hallucinets.
It's clearly meant to poke fun at the system. If you think people are NOT going to use words in jest while making fun of something, perhaps you could use a little less starch in your clothing.
In this case the curl maintainers can tell the details are made up and don't correspond to any CVE.
One interesting find: I wrote a GPT-4 integration for binaryninja, and funnily enough, when I asked the LLM to rewrite a function into "its idiomatic equivalent, refactored and simplified without detail removal" and then asked it to find vulnerabilities, it cracked most of our joke hack-mes in a matter of minutes.
Interesting lesson: nearly all LLMs struggle with disassembled Rust binaries. I guess that's because the output doesn't resemble the original Rust source the way it does for C and C++.
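For anyone curious what that two-step flow looks like, here is a hypothetical sketch: first prompt the model to rewrite the decompiled function idiomatically, then prompt it to audit the rewrite. The prompt wording, function names, and the absence of any actual API call are all my assumptions, not the parent commenter's actual integration:

```python
# Hypothetical prompts for the two-step pipeline described above:
# step 1 cleans up decompiler output, step 2 audits the cleaned version.
REWRITE_PROMPT = (
    "Rewrite this function into its idiomatic equivalent, refactored and "
    "simplified without detail removal:\n\n{code}"
)
AUDIT_PROMPT = (
    "Find vulnerabilities in the following function. For each one, name the "
    "line and the class of bug:\n\n{code}"
)


def next_prompt(decompiled, rewritten=None):
    """Return the next prompt in the pipeline: rewrite first, then audit."""
    if rewritten is None:
        return REWRITE_PROMPT.format(code=decompiled)
    return AUDIT_PROMPT.format(code=rewritten)


raw = "int f(int *p) { return *p; }"  # toy decompiler output
step1 = next_prompt(raw)
# ...send step1 to the model, receive a cleaned-up function back, then:
step2 = next_prompt(raw, rewritten="int deref(int *p) { return *p; }")
```

The rewrite pass matters because decompiler output is noisy; giving the model idiomatic-looking code first plays to the distribution it was trained on.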
The usefulness of AI is inversely proportional to the laziness of its operator, and a golden hammer like this is surefire fool's gold for lazy people.
But totally, actual pure gold in responsible hands.
Except that it's false because Bard made it up. There's no real curl exploit involved.