But as to the hype, we are in a brief pause before the election where no company wants to release anything that would hit the news cycle in a bad way and cause knee-jerk legislation. Are there new architectures and capabilities waiting? Likely some. Sora showed state of the art video generation, OpenAI has demoed an impressive voice mode, and Anthropic has teased that Opus 3.5 will be even more capable. OpenAI also clearly has some gas in the tank as they have focused on releasing small models such as GPT-4o and 4o mini. And many have been musing about agents and methods to improve system 2 like reasoning.
So while there’s a soft moratorium on showing scary new capability, there is still evidence of progress being made behind the scenes. But what will a state-of-the-art model look like when all of these techniques have been scaled up on brand-new exascale data centers?
It might not be AGI, but I think it will at least be enough for the next hype investment bubble.
It's done: you can’t make another LLM, all knowledge from here on out is corrupted by them, and you can never deliver an epistemic “update.” GPT will become a relic of the 21st century, like a magazine from the 1950s.
The knee-jerk legislation has mostly been caused by Altman's statements though. So I wouldn't call it knee-jerk, but an attempt by OpenAI to get a legally granted monopoly.
I think we will see a very nice boost in capability within 6 months of the election. I don’t personally believe all the apocalyptic AGI predictions, but I do think that AI will continue to feed a nice growth curve in IT investment and productivity growth, similar to the last few decades of IT investment.
Yes. There is also the Hype of the "End of the Hype Cycle". There is Hype that the Hype is ending.
When really, there is something amazing being released weekly.
People are so desensitized that, just because we don't have androids walking the streets or Blade Runner-like space colonies staffed with robots, they conclude AI is over.
What people, including me, are massively fed up with is all the companies (I mean ALL) jumping on the AI bandwagon in a beautiful show of how FOMO works and how even CEOs/shareholders are not immune to basic instincts. A literal hammer looking desperately for nails. Very few companies have amazing or potentially amazing products; the rest, not so much.
I absolutely don't want every effin' thing infused with some AI, since it will be used to 1) monitor my usage or me directly for advertising / credit & insurance scoring purposes, absolutely 0 doubt there; and 2) it may stop working once the wifi is down, the product is deprecated, or the company changes its policy (Sonos, anyone?). Internet of Things hate v2.0.
I hate this primitive AI fashion wave: negative added value in most cases, zero in the rest, yet users have to foot the bill. Seeing some minor industry crash due to unfulfilled expectations is just logical in such a case.
I disagree. Palantir is trading at 200X earnings.
So I've seen how the field has progressed and have also been able to look at it from a perspective most AI/engineering people don't have: what does this artificial intelligence look like compared to biological intelligence? And I must say I am absolutely astonished people don't see this as opening the flood-gates to staggeringly powerful artificial intelligence. We've run the 4-minute mile. There are hundreds of billions of dollars going into figuring out how to get to the next level, and it's clear we are close. Forget what the current models are doing; it is what the next big leap (most likely with some new architecture change) will bring.
In focusing on intelligence we forget that it's most likely a much easier challenge than decentralized cheap autonomy, which is what took the planet 4 billion years to figure out. Once that was done, intelligence as we recognize it took an eye-blink. Just as with powered flight, we don't need biological intelligence to transform the world. Artificial intelligence that guzzles electricity, is brittle, has blind spots, but is still capable of 1000 times more than the best among us is going to be here within the next decade. It's not here yet, no doubt, but I have yet to see any reasoned argument for why it is far more difficult and will take far longer. We are in for radical non-linear change.
The breakthroughs where deep AI has excelled -- object recognition in images, voice recognition and generation, and text-based info embedding and retrieval -- require none of the multilevel abstraction that characterizes higher cognition (Kahneman's System 2 thinking). Only when we see steady progress on such frontiers can a plausible case be made that the essentials of AGI are indeed within our grasp. Until then, plateauing at a higher level of pattern matching than we had expected -- which is what we have seen many times before from narrow AI -- is not sufficient evidence that the other requisite skills needed for AGI are surely just around the corner.
Coding assistants today are useful, image generation is useful, speech recognition/generation is useful.
All of these can support businesses, even in their current (early) state. Those businesses see value in funding even 1% improvements in engineering/science.
I think this is different than before: even in the 80s there were fewer well-defined products, and most everything was a prototype that needed just a bit more research to be commercially viable.
Whereas in the past, hopes for the technology waned and funding for research dropped off a cliff, today's stuff is useful now, and so companies will continue to spend some amount on the research side.
You can see this in action with multiplication. Much like humans asked to guess the answer, they'll get it wrong unless they know the answer from rote-learned multiplication tables; this is System 1 thinking. In many cases, when asked, they can reason further and solve it by breaking it down and working step by step, much like a human; this is System 2 thinking.
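A quick sketch of the two modes against a chat API. The openai client and call shown here are real; the model name and prompts are just placeholders, and results will vary:

```python
# Sketch: System-1-style one-shot guessing vs System-2-style decomposition.
# Assumes the openai Python package (>=1.0) and an API key in the environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# System 1: a one-shot guess, often wrong for large operands.
print(ask("What is 734 * 886? Reply with just the number."))

# System 2: forcing step-by-step decomposition usually does much better.
print(ask("Compute 734 * 886 by summing partial products, step by step."))
```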
In my opinion, it seems nearly everything is there for it to take the next leap in intelligence; it's just a matter of putting it all together.
The things that LLMs are bad at are largely solved problems using much simpler technology. There is no reason that LLMs have to be the only component in an intelligent agent. Biological brains have specialized structures for specialized tasks like arithmetic. The solution is probably integration of LLMs as part of a composite system that includes database storage, a code execution environment, and multiple agents forming a goal-directed posit-evaluate loop.
I’ve had pretty remarkable success with this architecture running on 12b models and I’m a nobody with no resources.
LLMs by themselves just come up with the first thing that crosses their “mind”. It shouldn’t be surprising that the very first unfiltered guess at a solution might be suboptimal.
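A minimal sketch of what such a posit-evaluate loop could look like. The llm() stub, the trivial pass/fail evaluator, and the round limit are all assumptions for illustration, not a real implementation (a real one would sandbox the execution):

```python
import subprocess, sys, tempfile

def llm(prompt: str) -> str:
    raise NotImplementedError("wire up a chat-completion API of your choice")

def run_python(code: str) -> tuple[bool, str]:
    """Code execution environment: run the posited solution, capture output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
    proc = subprocess.run([sys.executable, f.name],
                          capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stdout + proc.stderr

def solve(goal: str, max_rounds: int = 5) -> str:
    memory = []  # stand-in for the database-storage component
    for _ in range(max_rounds):
        code = llm(f"Goal: {goal}\nPrior attempts: {memory}\nWrite Python.")
        ok, output = run_python(code)   # evaluate the posited solution
        memory.append((code, ok, output))
        if ok:
            return output
    return "gave up"
```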
There is a vast amount of knowledge embedded in our cultural matrix, and a lot of that is captured in the Common Crawl and other datasets. LLMs are like a search engine for that data, based on meaning rather than keywords.
Some of the most advanced AIs are tool users and can write and, crucially, also execute Python, embedding the output in their responses.
> or even solve really simple logic puzzles for little kids.
As given in a recent discussion: https://chatgpt.com/share/ee013797-a55c-4685-8f2b-87f1b455b4...
(Custom instructions, in case you're surprised by the opening of the response).
Finding bugs in some models doesn’t mean you have a point about intelligence. If a similar argument could be applied to dismiss human intelligence, you don’t have a point. And here it goes: the most advanced human intelligence can’t reliably multiply large numbers or recall digits of Pi. Obviously humans are dumber than pocket calculators.
I have yet to see any reasoned argument for why it is easy to build real AI and why it will come fast.
As you said, AI has been there for decades and stagnated for pretty much the whole time. We've just had a big leap, but nothing says (except BS hype) that we're not in for a long plateau again.
We have "real ai" already.
As for future progress, have you tried just simple extrapolation of the progress so far? Human-level intelligence is very near. (Though of course artificial intelligence will never exactly match human intelligence: it will be ahead/behind in certain aspects...)
The thing that's different this time is the hardware capacity in TFLOPs and the like passing human brain equivalence.
There's a massive difference between much-worse-than-human AI (a bit meh) and better-than-human AI (changes everything).
>any reasoned argument for why it is easy to build real AI and that it will come fast
It probably won't be easy but the huge value of better than human AI will ensure loads of the best and brightest working on it.
For language models specifically, they are trained on data and have historically been improved by increasing the size of the model (by number of parameters) and by the amount and/or quality of training data.
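For a rough feel of that size/data relationship, here is the published Chinchilla fit (Hoffmann et al., 2022) in a few lines; the constants are their reported estimates, and the outputs should be read as rough guides only:

```python
def chinchilla_loss(N: float, D: float) -> float:
    """Predicted LM loss for N parameters trained on D tokens."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / N**alpha + B / D**beta

print(chinchilla_loss(70e9, 1.4e12))  # roughly Chinchilla-scale
print(chinchilla_loss(70e9, 2.8e12))  # same size, twice the data: lower loss
```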
We are basically out of new, non-synthetic text to train models on and it’s extremely hard work to come up with novel architecture that performs well against transformers.
Those are some simple reasons why it will be far more difficult to improve general language models.
There are also papers showing that training models on synthetic data causes “model collapse” and greatly reduces output quality by magnifying errors already present in the model, so it’s not a problem we can easily sidestep.
It’s an easy mistake to see something like ChatGPT not exist, then suddenly exist, and assume a major breakthrough happened. But behind the scenes there was like 50 years of R&D that led to it; it’s not like there was suddenly a breakthrough and now the gates are open.
A general intelligence for CS is like the elixir of life for medicine.
This is not even remotely true.
There is an astronomical amount of data siloed by publishers, professional journals etc. that is yet to be tapped.
OpenAI is making inroads by making deals with these content owners for access to all that juicy data.
Are we really now?
The smart people I've spoken to on the subject seem to agree that the current technology based on LLMs is at the end of the road and that there are no breakthroughs in sight.
So what is your take on the next level?
People focused on the products are missing out on the dawn of an epoch. It's a failure of perspective and creativity that's thankfully not universal.
Powered flight offers a cautionary tale for AI. The first confirmed powered flight was in 1903. For the next 60 years, someone broke the airspeed record almost every year. The current record was set in 1976. Nobody has broken that record for 48 years. There are concerns that the state of AI will show a similar pattern, with rapid improvements followed by a plateau.
This means you are sure we are close to automated driving, engineering and hospitality?
We already have "automated driving" in some sense. Some cities have fully autonomous taxi services that have operated for a year or more, iirc.
I can put your brain in a vat and stimulate your sensory neurons with a statistical distribution with no actual meaning, and nothing about how your brain works would change either.
The LLM and your brain would both attempt to interpret meaning with referents from training, and both would be confused by the information-free stimuli, because during "training" in both cases, the stimuli received from the environment are structured and meaningful.
So what's your point?
By the way, pretty sure a neuroscientist with 20 years of ML experience has a deeper understanding of what "meaning" is than you do. Not to mention, your response reveals a significant ignorance of unresolved philosophical problems (hard problem of consciousness, what even is meaning) which you then use to incorrectly assume a foregone conclusion that whatever consciousness/meaning/reasoning is, LLMs must not have it.
I'm partial to doubting that LLMs as they are now have the magic sauce, but it's more that we don't actually know enough to say otherwise, so why state that we do know?
We can't even say we know our own brains.
Can you explain what this means? Do you have a degree in neuroscience?
>We are in for radical non-linear change.
We aren't running miles much quicker than 4 minutes though. The current record is 3:43, set by Hicham El Guerrouj in 1999.
Humankind tried to break the 4 minute mile for hundreds of years - since measuring distance and time became accurate enough to be sure of both in the mid-18th century, at least - and failed.
In May 1954, Roger Bannister managed it. By late June it was done again by a different runner. Within a couple of decades the record was under 3:50, and today there are some runners who have achieved it more than 100 times and nearly 1800 runners who have done it at all.
Impossible for hundreds of years, and then somebody did it, and people stopped thinking it was impossible and started doing it themselves. That’s the metaphor: sometimes the barriers we imagine are really mental, not real.
I’m not sure that applies here either, but the point is not that progress is continuously exponential, but that once a barrier is conquered, we take on a perspective as if the barrier were never real in the first place. Powered flight went through this. Computing hardware too. It’s not an entirely foolish notion.
I'd like to believe it more than you do. Unfortunately, in spite of these millions of dollars, the progress on LLMs has stalled.
PS. I'm buying your book right now.
What signs do you see that make you believe that the next level (biological intelligence) is on the horizon?
If I had to bet, I would start with:
- Error-correcting specialized architectures for increasing signal-to-noise (as far as I can tell these are what everyone is racing to build this year, and should be doable with just conventional programming systems wrapping LLMs)
- Improved energy efficiency. (Yes, human brains are currently much more efficient! But there are also simple architecture improvements, both software and hardware, that look to save 100x. Specialized ternary ASIC chips using 1999's tech should be here quite soon, a lot more efficient in price and energy.)
- No backpropagation. (Yes, the brain does seem to do it all with forward propagation only. This is possible and promising in neural networks too - see the Forward-Forward algorithm - though such networks haven't been trained at the same scales as backprop-heavy transformers (and likely perform worse in terms of noise/accuracy). If I'm not mistaken, the brain does have forward-backward loops, but the signals travel through separate neurons for each direction rather than reusing one - if so, that's close to backprop by itself, though it probably imposes a tradeoff: the same signal can't be perfectly reproduced backwards, yet it can perhaps be distilled down to just the most relevant information by the separate specialized neuron. I'm obviously mostly ignorant of the neuroscience here but halfway-knowledgeable on the ML theory, haha.)
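On that last point, a toy numpy sketch of the Forward-Forward idea (one dense layer, layer-local updates only; heavily simplified from Hinton's paper, so treat it as illustration rather than a faithful implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, size=(784, 500))  # a single dense layer

def forward(x):
    return np.maximum(x @ W, 0.0)  # ReLU activations

def goodness(h):
    return (h ** 2).sum(axis=1)  # sum of squared activations

def ff_step(x_pos, x_neg, lr=0.03, theta=2.0):
    """Push goodness above theta for positive data and below it for negative
    data, using only this layer's local gradient - no backprop through any
    other layer is needed."""
    global W
    for x, sign in ((x_pos, +1.0), (x_neg, -1.0)):
        h = forward(x)
        p = 1.0 / (1.0 + np.exp(-sign * (goodness(h) - theta)))
        grad_h = (sign * (1.0 - p))[:, None] * 2.0 * h  # d log p / d h
        W += lr * (x.T @ grad_h) / len(x)               # local gradient ascent
```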
But yes, I completely agree - the flood gates are already open. This is a few architecture quibbles away from an absolute deluge of artificial intelligence that will dwarf (drown?) anything we've known. Good point on decentralized cheap autonomy - the real accomplishment of life. Intelligence, as it appears, is just a fairly generous phenomenon where any autonomous process continually improves its signal-to-noise ratio... many ways to accomplish that one! Looking forward to seeing LLMs powered by ant colonies and slime molds, though I suspect by then there will be far more interesting and terrifying realities unlocked.
- we had some big breakthroughs recently
- some AI “godfathers” are “really worried”
And by the way I can copy your post character by character, without hallucinating. So I am definitely better than this crop of "AI" in at least one dimension.
This looks like cognitive dissonance, and that is addressed by revisiting your assumptions.
No flood-gates have been opened. ChatGPT definitely found uses in a few areas but the number is very far from what many people claimed. A few things are really good and people are using them successfully.
...But that's it. Absolutely nothing even resembling the beginnings of AGI is on the horizon and your assumption that the rate of progress will remain the same -- or even accelerate -- is a very classic mistake of the people who are enthusiasts in their fields.
> There are hundreds of billions of dollars figuring out how to get to the next level, and it's clear we are close.
This is not clear at all. If you know something that nobody else does, please let us know as well.
Perhaps it's confirmation bias?
This wasn't the case with GPT-4/o. This capability is very new.
When I spoke to a colleague at Microsoft about these changes, they were floored. Microsoft has made themselves synonymous with AI, yet their company is barely even leveraging it. The big cos have put in the biggest investments, but also will be the slowest to change their processes and workflows to realize the shift.
Feels like one of those "future is here, not evenly distributed yet" moments. When a tool like Sonnet is released, it's not like big tech cos are going to transform over night. There's a massive capability overhang that will take some time to work itself through these (now) slow-moving companies.
I assume it was the same with the internet/dot-com crash.
I decided to fire up GPT-4o again today to see if maybe things have gotten better over the past few months.
I asked GPT to write code to render a triangle using Vulkan (a 3D graphics API). There are about 1000 tutorials on this that are almost certainly in GPT-4's training data. I gave GPT two small twists so it's not a simple case of copy/paste: I asked it 1) to apply a texture to the triangle and 2) to keep all the code in a single function. (Most tutorials break the code up into about a dozen functions, but each of these functions is called only once, so it should be trivial to inline them.)
Within the first ten lines, the code is already completely nonfunctional:
GPT-4o declares a pointer (VkPhysicalDevice) that is uninitialized. It queries the number of graphics devices on the host machine. A human being would allocate a buffer with that number of elements and store the reference in the pointer. GPT-4o just ignores the result. Completely ignores it. So the function call was just for fun, I guess? It then tries to copy an entire array of VkPhysicalDevice_T objects into this uninitialized pointer. So that's a guaranteed memory access violation right off the bat.
Some basic things are fine, but once you get into specialised things, everything goes terribly wrong and weird. I can't even put it into words. I see people saying they are 10x more productive (I'd like to see actual numbers and proof of this), but I just don't see how. Maybe we're working on very custom stuff, or very specific things, but all of these tools give very deep-sounding, confident answers that are just plain wrong and shallow. Just yesterday I used GPT-4o for some basic help with Puppet, and the examples it printed, even though the task was quite basic, were just wrong - in the sense that I had to debug for 2 hours just to figure out how ridiculous the error was.
I fear that people will end up releasing unsafe, insecure, and simply wrong code every day - code that they never debug and don't even understand, that maybe works for a basic set of inputs, but once the real world hits it, it will fail like those self-driving cars driving full speed into a trailer that has the same color as the road or sky.
"They clearly aren't using the right model!"
"It's obvious they don't know how to prompt, or they would see the value."
"Maybe it can't do that today, but GPT-5 is just around the corner."
I feel more and more that people have just decided that this is a technology that will do everything you can imagine, and no evidence to the contrary will change their priors.
By the way - I think AI or ML, whatever, has some valid uses right now, but mostly in the image-processing domain - recognizing shapes in some bounded domain, OK, yeah. Generative image is NOT bad, but there's always this "AI glow" to each image; something is always off. It's a neat set of tools, but it's a race to the bottom, and mostly users want to generate explicit content, let's be real - and they will become increasingly more creative and obtuse to get around the guards. Nothing is stopping you from entering the * industry and making tons of money; that industry is always doing well.
A friend recently suggested using AI to generate generic icons for my game. That's a really good use case. But does that radically change the current economy?
(BTW, generic stuff only until I could hire someone, because I prefer that experience way more. You can get more interesting results; 4 eyes are better than 2.)
For example, I can ask the LLM things like "What are the most common mistakes when using the Vulkan API to render a triangle with a texture?" and I'll very rapidly learn something about working with an API that I don't have deep understanding of, and I might not find a specific tutorial article about.
As another example, if I'm an experienced OpenGL programmer, I can ask directly "what's the Vulkan equivalent of this OpenGL API call?" and get quite good results back, most of the time.
So I'm asking questions where an 80% answer is still very valuable, and it's much faster than searching for documentation and doing a lot of comparison and headscratching, and it works well enough even when there's no specific article I could find in a variety of searches.
Any improvement the technology makes from here just makes things easier still!
For example, just now my NAS stopped working because the boot device went offline. So I got to thinking about writing a simple syslog server. I've never looked at the syslog protocol before, and I've never done any low-level TCP/UDP work in C# yet.
So I asked ChatGPT to generate some code[1], and while the result is not perfect it's certainly better than nothing, and would save me time to get going.
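For flavor, the core of a toy syslog listener is genuinely tiny - here in Python rather than C#, assuming plain RFC 3164-style syslog over UDP on port 514 (the traditional default), with no parsing of severity/facility:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 514))  # syslog's traditional UDP port (needs privileges)
while True:
    data, (host, _port) = sock.recvfrom(2048)
    print(host, data.decode(errors="replace"))
```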
As another example, a friend who's not very technical wanted to make an Arduino circuit to perform some automated experiment. He's dabbled with programming and can modify code, but struggles to get going. Again, just for kicks, I asked ChatGPT and it provided a very nice starting point[2].
For exploratory stuff like this, it seems to provide a nice alternative to searching and piecing together the bits. Revolutionary is a quite loaded word, but it's certainly not just a slight improvement on what we had before LLMs and instead feels like a quantum leap.
[1]: https://chatgpt.com/share/f4343939-74f1-404d-bfac-b903525f61... (modified, see reply)
[2]: https://chatgpt.com/share/fc764e73-f01f-4a7c-ab58-f43da3e077...
It seems to work best if I start with something very simple, and then layer on instructions ("now make it do X").
Where I have found it saves me time is in having to look up syntax or "gotchas" which I would otherwise search StackOverflow for. But as far as "writing code" -- it still feels a long way from that.
For example:
* I need a simple bash script for file manipulation or some simple task like setting up a project (example: download a secret from AWS SSM, check if an executable exists, and if it doesn't, print instructions on how to install it on the most popular systems, etc.)
* I need a simple HTTP API, nothing fancy, maybe some simple database usage, maybe running some commands, simple error handling
* I need a YAML file for Kubernetes. I say what I need and it usually gets most of it right
* I want an Ansible task for some simple thing. Ansible is quite verbose, so it's often saving me time
* I have a Kubernetes YAML file, but I want to manage it in terraform - I'll then ask to convert YAML to a terraform entry (and in general converting between formats is nice, cause even if you have only a piece of what you want to convert, LLMs will most of the time get it right)
* Surprisingly, it often gets openssl and ffmpeg commands right - something I always have to google anyway, especially openssl certificates generation or manipulation
* Giving it a function I wrote and having it write test samples after providing a list of what it should test (and asking if it can come up with more, though sadly it rarely generates anything useful on top of what I suggest)
A friend, whose SQL knowledge is minimal, used an LLM to query data from a database over a couple of tables. Yes, after a lot of trial and error he (most probably) got the correct data; however, the only one able to read the query is the LLM itself. It's full of coalesces and subselects that repeat the same joins again and again.
LLMs will do a lot for you, but I really hate this "this will [already did] solve everything". No, it did not, and no, because its quality is that of a junior dev, at best.
And I think it'd be extremely easy to convince oneself of this. Look at where 'AI' was 5 years ago, look at where it is today and then try to imagine where it will be in another 5 years. Of course you have to completely blind yourself to the fact that the acceleration has clearly sharply stalled out, but humans are really good at cognitive dissonance, especially when your perception of your future depends on it.
And there's also the point that even though I'm extremely critical of LLMs in general, they have absolutely 'transformed' my workflow in that natural language search of documentation is really useful. Being able to describe a desired API, but in an overly broad way that a search engine can't really pick up on, but that an LLM [often] can, is just quite handy. On the other hand, this is more a condemnation of search engine tech being frozen 20 years in the past than it is about an imminent LLM revolution.
Especially if it's a question that's hard to Google, like "I remember there is more than one way to split an array in this language, list them". This saves me minutes every day.
But it's especially helpful if you are working on projects outside your own domain where you are a newbie.
Cursor is a purpose-built IDE for software development. The Cursor team has put a lot of research and sweat into providing the used LLMs (also from OpenAI/Anthropic) with:
- the right parts of your code
- relevant code/dependency documentation
- and, importantly, the right prompts.
to successfully complete coding tasks. It's an apples-and-oranges situation.
I work on 9 different projects now and I would say that around 80% of the functional code for these projects comes from Sonnet (like GP). These are not (all) trivial either; there is a very niche (for banking) key/value store written in Go, for instance, which has a lot of edge cases, etc. All the plumbing (x, err = etc., aka stuff people find annoying) comes from Sonnet and works one-shot. A lot of business logic comes from Sonnet too; it works but usually needs a little tweaking to make it correct.
Tests are all done by Sonnet. I think 80% is low balling it on Go code really.
We have a lot of complex code generator stuff and DSLs in TS which also works well often. Sometimes it gets some edge cases wrong, but either re-prompting with more details or fixing it ourselves, will do it. At a fraction of the time/money of what a fully human team would deliver.
I wrote a 3d editor for fun with Sonnet in a day.
I have terrible results with GPT/Copilot (Copilot is good for editing rather than complete files/functions; ChatGPT is not good for much compared with Sonnet); it doesn't get close at all; it simply keeps giving me the same code over and over again when I say it's wrong; it hardcodes things I specifically asked it to make flexible, etc. Not sure why the difference is so massive all of a sudden.
Note: I use the Sonnet API, not the web interface, but the same goes for GPT, so...
However once you step outside JS or Python, the models are essentially useless. Comprehension of pointer semantics? You wish. Anything with Lisp outside its training corpus of homework assignments? LOL. Editing burden quickly exceeds any possible speed-up.
But, I agree with your sentiment that asking it to do stuff like that often doesn’t work. I’ve found that what it _can_ do is stuff like “here’s a Model object, write a query to fetch it with the schema I told you about ages ago”. It might not give perfect results, but I know how to write that query and it’s faster to edit Claude’s output than it is to write it from scratch.
Fwiw, I've had some helpful successful prompts here and there, and in some very narrow scopes I'll get something usable, like parsing JSON or scaffolding some test cases, which is real saved time, but I stopped thinking about these tools long ago.
To get real value out of something like your example, I'd be using it as a back and forth to help me understand how some concepts work or write example questions I can drill on my own, but nothing where precision matters
There are also some more gotchas, like the generated code using slightly different package versions than the installed ones.
Same; it can't even fix an Xcode memory-leak bug in a simple app. It will keep trying and breaking it non-stop. Garbage.
If you define "productive" as writing a simple CRUD web application that your 13-year-old cousin could write between two gaming sessions, then you'll consider LLMs as sacred monsters.
Snake oil vendors always had great appeal over people who didn't know better.
AI is great for me, but it is more like a junior developer you are pairing with than a replacement.
Simple Python scripts for Home Assistant, for example, it just nails on the first go.
Give Anthropic a shot (it's even better via the API: console.anthropic.com/workbench).
OpenAI is yesterday's news.
Why? I see it like querying a database of human knowledge. I wouldn't expect a SQL database to infer information it's never seen before, why would I expect an LLM to do so?
I use it where I know a solution exists but I'm stumped on the syntax or how to implement it in an unfamiliar environment, or I want to know what could have caused a bug based on others' experience etc.
Here's a chat for a microcontroller and LCD chip that I picked at random (and got the name wrong for) (and didn't want Raspberry Pi code for, so I stopped it short on that response):
https://chatgpt.com/share/2004ac32-b08b-43d7-b762-91543d656a...
Would be really interesting if anyone had blog posts on their actual workflow with LLMs, in case there's something I'm doing different.
When you are familiar with LLMs, a question from someone who doesn't use AI is very obvious. It's the same feeling you get when you roll your eyes and say "you could have googled that in 10 seconds".
It's explaining code you don't even know the lingo for, or where you don't know what the question should be. Or touching code in a framework you've never used. Or tedious tasks like converting parts of text into code or JSON. Or when your mind is stuck or drifts off - ask AI for an idea to get the ball rolling again.
Yes, discovering what works and what doesn't is tedious and slower than "just doing it yourself", like switching IDEs. But once you've found a handful of use cases that solve your problems, it is very refreshing.
I saw an LLM demo at one point where it was asked to write FFT and add unit tests for it which really drove this point home for me.
A programmer is a nicer term for a code monkey. You ask them to write FFT and they'll code it. All problems can be solved with more code. They can edit code, but on the whole it's mostly just adding more code. LLMs are actually pretty good at this job, in my experience. And this job is important; not all tasks can be engineered thoroughly. But this job has its scaling limits.
A software engineer is not about coding per se; it's about designing software. It's all about designing the right code, not more code. Work smarter, not harder, for scale. You ask them to write FFT and they'll find a way to depend on a common implementation so they don't have to maintain an independent one. I've personally found LLMs very bad at this type of work, the same way you and others replying to you describe it. (OK, maybe FFT is overly simple; I'm sure an LLM can import that for you. But you get the idea.) LLMs have statistical confidence, not intellectual confidence. But software engineering generally works with code too complex for pure statistical confidence.
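The "depend on it" version is a couple of lines, for contrast (numpy's FFT here, purely as an example of leaning on a maintained implementation):

```python
import numpy as np

signal = np.sin(2 * np.pi * 5 * np.arange(256) / 256)  # a 5-cycle sine wave
spectrum = np.fft.fft(signal)
print(int(np.argmax(np.abs(spectrum[:128]))))  # -> 5, the dominant frequency
```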
No offense to the LLM fans here, but I strongly suspect most of them are closer to the programmer category of work. An important job, but one more easily automated away by LLMs (or better software engineering long-term). And we can see this by how a lot of programming has been outsourced for decades to cheap labor in third-world countries: it's a simpler type of job. That plus the people biased because their jobs and egos depend on LLMs succeeding.
I’ve found that it’s sometimes amazing and sometimes wastes a lot of my time. A few times it’s really come up with a good insight I hadn’t considered because the conversation has woken up some non-obvious combination. I use ChatGPT, Claude, Perplexity and one or two IDE tools.
It seems especially strong with Python but a bit medium with Swift.
Every thread like this over the past year or so has had comments similar to yours, and it always remains quite vague, or when examples are given, it’s about self-contained tasks that require little contextual knowledge and are confined to widely publicly-documented technologies.
What exactly floored your colleague at Microsoft?
So I don't know how this would go in a much larger codebase.
What floored him was simply how much of my programming I was doing with an LLM / how little I write line-by-line (vs edit line-by-line).
If you're really curious, I recorded some work for a friend. The first video has terrible audio, unfortunately. This second one I think gives a very realistic demonstration – you'll see the model struggle a bit at the beginning:
However, there are some very frustrating limitations to Greptile, so severe that I basically only use it to ask implementation questions on existing codebases, not for anything like general R&D: 1) answers are limited to about 150 lines; 2) it doesn't re-analyze a repo after you link it in a conversation (you need to start a new conversation, re-link the repo, then wait 20+ minutes for it to parse your code); 3) it is very slow (maybe 30 seconds to answer a question); 4) there's no prompt engineering.
I think it's a bit strange that no other AI solution lets you ask questions about existing codebases. I hope that will become more widespread soon.
Speaking of understanding context… they floored him, not the other way round.
Cursor offers some super-marginal UX improvements over the latter (given that it's a fork of VS Code), since it allows you to switch models. But Claude and GPT have been interchangeable, at least for my workflows, so I'm not sure the hype is really deserved.
I can only imagine the excitement comes from the fact that cursor has a full-fat free trial, and maybe most people have never bothered paying for copilot?
Perhaps it's my language of choice (Elixir)? Claude absolutely nails it, rarely gives me code with compilation errors, seems to know and leverage the standard library very well, idiomatic. Not the same with GPTs.
did a quick check, it's $20/month, and it has a vim plugin: https://github.com/pasky/claude.vim
going to give it a spin
Maybe GPT4o changed things.
I was an early advocate for Copilot but, honestly, nowadays I really don't find it that useful, compared to GPT-4o via ChatGPT.
ChatGPT not being directly integrated into my editor turns out to be an advantage. The problem with Copilot is it gets in the way. It's too easy to unintentionally insert a line or block completion that isn't what you want, or is out and out garbage, and it's constantly shoving up suggestions as I type, which can be distracting. It's particularly irritating when I'm trying to read or understand a piece of code, or maybe do a refactor, and I leave my caret in one position for half a second too long, and suddenly it's ghost-inserted a block of code as a suggestion that's moved half of what I'm reading down the screen and now I have to find my place again.
Whereas, with ChatGPT being separate, it operates at a much less intrusive cadence and only responds when I ask it to, which turns out to be much more useful and productive.
I'm seriously considering binning my Copilot subscription as a result.
Most software vendors are selling their version of AI as hallucination free though. So that's terrifying.
I think that's why I like to compare the current state of AI to the state of the CPU industry maybe around the 286-486 era going towards the Pentium.
But outside of that, beyond needing to remember a certain syntax, I have found that any time I tried to use it for anything more complex I am finding myself spending more time going back and forth trying to get code that works than I would have if I had just done it myself in the first place.
Even if the code works, it just isn't maintainable code if you ask it to do too much. It will just remove entire pieces of functionality.
I have seen a situation where someone submitted a PR after very clearly copying a method, sticking it in AI, and saying "improve this". It made changes for no good reason, and when we asked the person who submitted the PR why they made the changes, we of course got no answer. (These were not just linter changes.)
That's concerning: pushing code up when you can't even explain why you did something?
Like you said about the hard work: sure, it can churn out code. But you need to have a completely clear picture of what that code needs to look like before you start generating, or you will not like the end result.
It is of vital importance (imho) to get open models to the same level before another jump comes (if it comes, of course; maybe another winter instead, but at least we'll have something I use every day/all day - so not all hype, I think).
...do you not see the obnoxious CoPilot(TM) buttons and ads everywhere? It's even infected the Azure Portal - and every time I use it to answer a genuine question I have I get factually-incorrect responses (granted, I don't ask it trivial or introductory-level questions...).
1. Use cursor with Claude Sonnet
2. Pick a programming language you don't know at all
3. Build an app in that language prompting only, don't stop prompting until you've run it successfully
Business advice, including marketing, reaching out to investors, understanding SAFE notes (follow-up questions after watching the Y Combinator videos), and customer interview design. All of which, as an engineer, I had never done before.
Create SQL queries for all kinds of business metrics, including monthly/daily active users, breakdown of users by country, abusive-user detection, and more (see the sketch after this list).
Automated unit test creation. Not just the happy path either.
Automated data repository creation, based on a one shot example and MySQL text output describing the tables involved. From this, I have super fast data repositories that use raw SQL to get/write data.
Helping with challenging code problems that would otherwise need hours of searching google or reading the docs.
Database and query optimization.
Code Review. This has caught edge case bugs that normal testing did not detect.
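For the SQL-metrics item above, here's a sketch of the DAU flavor of query - the events(user_id, event_time) schema is hypothetical, and sqlite is used just to keep it self-contained:

```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical database
rows = conn.execute(
    """
    SELECT date(event_time) AS day,
           COUNT(DISTINCT user_id) AS daily_active_users
    FROM events
    GROUP BY day
    ORDER BY day
    """
).fetchall()
for day, dau in rows:
    print(day, dau)
```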
I'm going to try out aider + claude sonnet 3.5 on my codebases. I have heard good things about it and some rave reviews on X/twitter. I watched a video where an engineer had a bug, described it to some tool (which wasn't specified, but I suspect aider), then Claude created a test to reproduce the bug and then fixed the code. The test passed, they then did a manual test and the bug was gone.
I'm glad this has been working for you -- generally any time I actually have a really difficult problem, ChatGPT just makes up the API I wish existed. Then when I bring it up to ChatGPT, it just apologizes and invents new API.
When I look at the kinds of AI projects I have visibility into, there's a parallel where the public are expecting a centralized, all knowing, general purpose AI, but what it's really going to look like is a graph of oddball AI agents tuned for different optimizations.
One node might be slow and expensive but able to infer intent from a document, but its input is filtered by a fast and cheap one that eliminates uninteresting content, and it could offload work to a domain-specific one that knows everything about URLs, for example. More like the network of small, specialized computers scattered around your car than a central know-it-all computer.
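A stub-level sketch of that shape - a cheap filter gating an expensive intent model, plus one narrow specialist. Every heuristic and stand-in function here is made up for illustration:

```python
import re

def cheap_filter(doc: str) -> bool:
    """Fast, cheap node: drop obviously uninteresting content."""
    return len(doc) > 50 and "unsubscribe" not in doc.lower()

def url_specialist(doc: str) -> list[str]:
    """Narrow node that knows everything about URLs."""
    return re.findall(r"https?://\S+", doc)

def intent_model(doc: str) -> str:
    """Stand-in for the slow, expensive intent-inference model."""
    return "complaint" if "refund" in doc.lower() else "other"

def pipeline(doc: str):
    if not cheap_filter(doc):  # the cheap node gates the expensive one
        return None
    return {"intent": intent_model(doc), "urls": url_specialist(doc)}
```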
Every question I've asked of ChatGPT, Meta, and Gemini has returned results that were either obvious or wrong. Pointing out how wrong the answers were got the obligatory "I apologize" response, followed by another obvious answer.
I consider all these AI engines to be interactive search engines where the results need to be double checked. The only thing these engines do, for me perhaps, is save some search time so I don't have to click around on a lot of sites to scroll for some semblance of an answer to verify.
In fields I have less experience with it seems feasible. In fields I am an expert in, I know it's dangerous. That makes me worry about the applicability of the former and people's critical evaluation ability of the whole idea.
I err on the side of "run away".
But the Rubicon is still crossed. There is a general purpose computer system that understands human language and can write real sounding human language. That's a sea change.
All in all, it helps assist us in new ways. Had somebody take a picture of a car part that had no markings and it identified it, found the maker/manufacturer/SKU and gave all the details etc. That stuff is useful.
But now we're looking at inauthentic stuff: artists and writers being plagiarized, job cuts (for said marketing/pitches, BS presentations to downsize teams). It's not just losing its hype; it's losing any hype around building humanity up for the better. It's just more buzzwords, more 'glamour', more 'pop' shoved in our faces.
The layoffs aren't looking pretty.
Works well to help us code though. Viva, sysadmins unite.
Document embeddings from transformers are great and fit into existing search paradigms.
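A minimal sketch of that fit, assuming the sentence-transformers package (the model name is one common public choice, not a recommendation):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["how to reset a password", "quarterly revenue report", "VPN setup guide"]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode(["I can't log in"], normalize_embeddings=True)[0]
scores = doc_vecs @ query_vec       # cosine similarity (vectors are unit length)
print(docs[int(np.argmax(scores))])  # semantic match: the password-reset doc
```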
Computer vision and image segmentation are at a level I thought impossible 10 years ago.
Text to speech that sounds natural? I might actually use Siri and Alexa! (Ok, that one might be considered “generative”)
The sooner people start to find it boring, the sooner we can stop wasting time on all the hot air and just use the bits that work.
For what it’s worth hype doesn’t mean sustainability anyway. If all the jokers go onto a new fad it’s hardly the skin off the back of anyone taking this seriously, they’ve been through worse times.
Large and small, entire development teams are completely unaware of the basics of "prompt engineering" for coding, and corporate has an entirely regressive anti-AI policy that doesn't factor in the existence of locally run language models and just assumes ChatGPT and cloud-based ones digesting trade secrets. People aren't interested in seeing what the hype is about, and are disincentivized from bothering on a work computer. I'm on one team where the Engineering Manager is advocating for Microsoft Copilot licenses - as in, it's a concept that hasn't happened yet and needs buy-in to even start considering.
I would say most people really haven't looked into it. Work is work, the sprint is the sprint, on to the next part of the product, rinse, repeat. Time flies for those people; that's probably most of the people here.
We are running out of textual data now to train on… so now they have switched to VIDEO. Geez now they can train on all the VIDEOS on the internet.
And when they finally get bots working, they will have limitless streams of TACTILE data…
Writing it off as the next fad seems fun. But to be honest, I was shocked by what openai did the first time. So they have my respect. I don’t think many of us saw it coming. And I think writing their creativity off again may not be wise.
So when they say the bubble is about to break… I get it. But I don’t see how.
I hardly ever pay for anything.
But I gladly spend money on ai to get the answers I need. Just makes my work work!
Also I would say the economic benefit of this tech for workers is that it will 2x the average worker as they catch on. Seriously I am a 2x coder compared to what I was because of this.
Therefore, if I, a person who hardly ever spends money, have to buy it… I think eventually all businesses will realize all their employees need it, driving massive revenue for those who sell it.
But it may not be the companies we think.
ChatGPT truly is impressive. Nonetheless, I still think most companies integrating "AI" into their products is buzzword BS that is all going to collapse in on itself.
You probably shouldn't advertise that.
Isn't the energy consumption of this technology pretty catastrophic? Do you consider the issue of energy consumption so abstracted you don't worry about it? Do you do anything to offset your increased carbon emissions?
There are a lot of smallish tasks/problems that people/systems need to deal with - some of them even waste notable real engineering capacity - and that a highschooler could quite easily do by hand.
Example: find out if a text contains an email address, including all kinds of shenanigans people do to mask it (because it may not be allowed, ... whatever). From a purely coding standpoint, this is a cat-and-mouse game of improving regex solutions to also catch the more sophisticated patterns, but there will always be uncaught/new ways, or simply errors that produce false positives. A highschooler, though, can be given a text and instantly spot the email address (or confirm none is in there).
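To make the cat-and-mouse concrete, here's a regex-side sketch; the pattern and the two de-masking rules are ad hoc, and the point is exactly that this list of maskings never ends:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # naive, misses masked forms

def demask(text: str) -> str:
    """Undo two common maskings; real obfuscations keep evolving."""
    text = re.sub(r"\s*[\(\[]\s*at\s*[\)\]]\s*", "@", text, flags=re.I)
    text = re.sub(r"\s*[\(\[]\s*dot\s*[\)\]]\s*", ".", text, flags=re.I)
    return text

print(EMAIL.findall(demask("reach me: bob (at) example (dot) com")))
# -> ['bob@example.com'] ... until someone writes "bob AT example DOT com"
```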
In order to "solve" these types of small problems, LLMs are pretty much fantastic. They need only be reliable enough to produce a structured answer within a few attempts and cheap enough not to be a concern for finance/operations. That's why it makes absolute sense to me that the #1 priority for OpenAI since GPT-4 has been building smaller/faster/cheaper models. Automators need exactly that, not genius-level AGI.
Also, I think we're not even scratching the surface of how many tasks can be automated away within the current constraints/flaws of LLMs (hallucination, accuracy, ...). Everyone tries to hype up some super-generic powerful future (that usually falls flat after a while), whereas the true value of LLMs is in the many small things where hardcoding a solution is expensive but an intern could do it right away.
They work when there's a lot of examples on github or google, but once you get into something that doesn't have a lot of examples like closed source code or rarely used libraries, it will start hallucinating and even mixing up different API versions to create a mess that doesn't work at all.
I don't believe LLMs will get any better than this without a new major breakthrough, but this is already better than using Google search.
The shortcomings are aplenty, but they don't bother me. The things it can do weren't possible 2 years ago. I'll leverage those and take the bad with the good.
Similar experience with Tesla FSD. I know other Tesla owners who tried it a few times and think it's trash because they had to disengage. I disengage preemptively all the time but the other 90% of my drive being done for me is not something that used to be possible. I tried to give up my subscription because it's expensive and couldn't hold out two days.
I kind of like LLMs for learning new languages. Claude or ChatGPT are good for asking questions. But copilot really stunts learning for me. I find nothing really sticks in my brain when I have copilot running. I feel like I just turn my brain off, which seems kind of dangerous to me in the long run.
I see a lot of negative reactions from programmers precisely because it is good at what they do. If you’re feeling threatened you’re much more likely to focus on the things it can’t do
Just have to learn to let it go, despite xkcd 386.
Regarding LLMs, I bet we will see them evolve. Don't forget about https://news.ycombinator.com/item?id=41269791 - there are many problems that LLMs are no good for, but they are better than many Google search results, and that means something from the economic point of view.
I am sure that The Economist's analytics team is having a good moment /s.
Seemingly every non-tech company in the world has been trying to figure out an "AI strategy," driven by hype and FOMO, but most corporate executives have no clue as to what they're doing or ought to be doing. They are spending money on poorly thought-out ideas.
Meanwhile, every tech company providing "AI services" has been spending money like a drunken sailor, fueled by hype and FOMO. None of these AI services are generating enough revenue to cover the cost of development, training, or even, in many cases, inference.
Nvidia, the dominant software-plus-hardware platform (CUDA is a big deal), appears to be the only financial beneficiary of all this hype and FOMO.
According to the OP, the business of "AI" is losing hype, suggesting we're approaching a bust.
This isn't a bull bet, it's a bear. AI would need to be perfectly monopolized to capture all the gains, and it's increasingly looking like that won't be the case - as all the component pieces are already open source at competitive levels, and any final architecture improvements that cross the final thresholds could be leaked in a 50GB file. Whoever gets to it first has a few months head start, at most, and probably not enough time or control to sell products - or shovels. After that it's a neverending race to zero, to the benefit of the consumer and the detriment of the investor.
Nvidia is a great example. They currently dominate the GPU market, the "essential hardware for AI", yet ternary ASIC chips specialized for transformer-only architectures are looking quite viable at 1999's tech levels. I wouldn't bet on that monopoly sticking around much longer.
It depends how you look at it. A lot of the spend by big tech can be seen as protecting what they already have from disruption. It's not all about new product revenues; it's about keeping revenue share in the markets they already have.
On the other hand, we are nowhere near approaching hard limits on LLMs. When LLMs start to be trained for smaller subject areas with massive hand-curated examples of problem solving, they will reach expert performance in those narrow tech areas. These specialized models will then be combined in general-purpose MoEs.
Then new approaches beyond LLMs, RL, etc. will be discovered, perfected, made more efficient.
Seriously, any hard limits are far into the future.
Now the one API wrapper projects that I love are my meeting transcription and summarization apps. You can tear those from my cold, dead hands.
With regard to art AI, I think the debates are going to die off, and the artists and people making stuff are going to just keep doing that - and some of them will use AI in ways that challenge people, as good art often does.
In other words, a lot of people seem to think that human attention spans are what determine everything, but the technological cycles at work here are much, much deeper.
Personally I have used Midjourney and ChatGPT in ways that will have huge impacts on many activities and industries. Denying that because of media trendiness about AI seems shortsighted.
Please tell that to all types on HN who downvote anything related to Rust without even reading past the title. :D
> In other words, a lot of people seem to think that human attention spans are what determine everything, but the technological cycles at work here are much, much deeper.
IMO no reasonable person denies this, it's just that the "AI" technology regularly over-promises and under-delivers. At one point it's no longer discrimination, it's just good old pattern recognition.
> Personally I have used Midjourney and ChatGPT in ways that will have huge impacts on many activities and industries. Denying that because of media trendiness about AI seems shortsighted.
Some examples with actual links would go a long way. I, for one, am skeptical of your claim, but I am open to having my mind changed (f.ex. my CFO told me once that ChatGPT helped him catch several bad contract clauses).
• text generators
• code generators
• image generators
• video generators
• speech generators
• sound/music generators
• various robotics vision and control systems (often trained in virtual environments)
• automated factories / warehouses / fulfillment centers
• self-driving cars (trucks/planes/trains/boats/bikes/whatever)
• scientific / reasoning / math AIs
• military AIs
I find all of these categories already have useful AIs. And they are getting better all the time. The progress might slow down here and there, but it keeps on going.
Self-driving was pretty bad a year ago, and now we have Tesla FSD driving uninterrupted for multiple hours in complex city environments.
Image generators now exceed 99.9% of humans in painting/drawing abilities.
Text generators are decent. There are hallucination issues, and they are not creative at the best human level, but I'd say they write better than 90% of humans. When it comes to poetry/lyrics, they all still suck pretty badly.
Video generators are in their infancy - we get decent quality, but absolutely mental imagery.
Reasoning is the weakest point, in my opinion. Current-gen models are just not good at reasoning. Sometimes they are brilliant, but then they make very silly mistakes that a 10-year-old child wouldn't make. You just can't rely on their logical abilities. I have really high hopes for this area. If they can figure out reasoning, our science research will become a lot more reliable and a lot faster.
The threshold for acceptable self-driving is genuine effort from the automated system to avoid accidents as we can't punish it for bad driving. And I want auditable proof of that.
> Image generators now exceed 99.9% of humans in painting/drawing abilities.
I'm pretty sure the number of people who can draw is less than that. And they can beat image generators by a mile, as those generators are mostly doing automated matte painting. Yes, copy-paste is faster than typing, but that's not writing a novel.
> Text generators are decent...but I'd say they write better than 90% of humans.
Humans use language to communicate. And while there are bad communicators, I think lots of people are doing OK on that front. Text generators can be perfect syntax-wise, but the intent has to come from someone, and the produced text's quality is proportional to the amount of intent behind it (that's why corporate language is so bland).
> Video generators are in their infancy - we get decent quality, but absolutely mental imagery.
See Image Generator section, but in motion.
> Reasoning is the weakest point, in my opinion... If they can figure out reasoning
That's the billion-dollar question.
I couldn't care less about any hype, so neither about the LLM hype. I especially didn't bother going to a new website (ChatGPT) or installing new IDEs, etc.
I checked Codeium's mycompany-customized landing page: a one-liner vim plug-in installation and copy-pasting an auth token.
I started typing in the very same editor, very same environment, very same everything, and the thing just works - most of the time it guesses well what I would want to write, so I just press tab to accept, and voila.
I wasn't expecting such a seamless experience.
I still haven't integrated its "chat" functionality into my workflow (maybe I won't at all). I'm not hyped about it, it just feels like a companion to already working (and correct) code completion.
I read a lot about other people's usage (I'm a devXP engineer), and I feel that, for whatever reason, there is more love/hype/faith in their chosen AI companion than actual improvement they could get by taking the humble route of understanding code, reading (and writing) docs, and reasoning about the engineering solution.
As with everything, AI is now losing hype, but somehow (in my bubble) it seems engineers are still high on it. But I also see that this will further distill the set of people who I look up to and want to collaborate with, because of that mentioned humbleness, as opposed to just mindlessly accepting text-predicted solutions.
In my experience, a different LLM excels at each task, and where one was good it might fail at another. They can do great things, but it's not guaranteed, and a lot of manual intervention and back-and-forth is still needed.
We are not at the point where using AI in a company is just a blanket win for everyone involved. Companies are investing a lot, but the return is hard to measure and not always guaranteed.
This is the problem with early technologies: they sometimes work, but success isn't guaranteed, and we build our expectations by extrapolating their usefulness. We should not judge this technology by its current success rate, but by how much impact it will have once we push that success rate higher and higher.
Still, what we can say is that for certain occupations it already reduces their work by something like 15% (software engineers), and probably more for some (writers, product owners, office warriors and the like). This is a great achievement in and of itself; think how much this adds up to in a company as large as MS or Google.
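To make that concrete, a back-of-envelope sketch (the headcount and cost figures below are invented round numbers for illustration; only the 15% comes from the comment above):

    # Back-of-envelope: what a 15% reduction in engineering work could be
    # worth. Headcount and fully loaded cost are assumptions, not real data.
    engineers = 30_000            # assumed engineering headcount
    cost_per_engineer = 250_000   # assumed fully loaded cost, $/year
    time_saved = 0.15             # the 15% figure from above

    savings = engineers * cost_per_engineer * time_saved
    print(f"~${savings / 1e9:.1f}B/year")  # ~$1.1B/year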
It is the dotcom bust again. The mainstream is losing interest, but at the same time I see our internal chatbots / AI agents showing hockey-stick growth, and I am using Copilot several hours daily.
Are you referencing something specific here, or is there something you can link to? To be honest the only significant 'disruption' I've seen for LLMs so far has been cheating on homework assignments. I'd be happy to read something if you have it.
So again, let's see some proof of this extensive use and large improvements to productivity.
I think that's what people like about AI: it's hope. Maybe you won't have to learn anything but still be productive. Sounds nice?
That’s gonna be a bad take I think.
It has made people lump all AI technology into a bubble, regardless of whether it is functional or not.
You are using this stuff to do some really cool things, but having hype attached to it can be very positive in the short term, damaging in the medium term, and neutral in the long term. We are moving into the medium term.
Meanwhile we’re seeing the first of the new generation of on-device inference chips being shipped as commodity edge compute.
When the devices you use every day — cars, doorbells, TV remotes, points-of-sale, roombas — can interpret camera and speech input locally in the time it takes to draw a frame and with low enough power to still give you 10h between charges: then we’ll be due another round of innovation.
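The implied budgets are easy to put numbers on; a quick sketch (the battery capacity below is an assumption):

    # Latency and power budgets implied above: inference within one frame,
    # and 10 hours between charges. Battery capacity is an assumption.
    fps = 60
    frame_ms = 1000 / fps          # ~16.7 ms to run inference per frame
    battery_wh = 15                # assumed phone-sized battery, watt-hours
    avg_power_w = battery_wh / 10  # ~1.5 W average draw for 10 h runtime
    print(f"{frame_ms:.1f} ms/frame, {avg_power_w:.1f} W average budget")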
The article points to how few parts of the economy are leveraging the text-only API products currently available. That still feels very Web 1.0 to me.
- AI is currently hyped to the gills
- Companies may find it hard to improve profits using AI in the short term
- A crash may come
- We may be close to AGI
- Current models are flawed in many ways
- Current-level generative AI is good enough to serve many use cases
Reality is nobody truly knows - there's disagreement on these questions among the leaders in the field.
An observation to add to the mix:
I've had to deliberately work full time with LLMs in all kinds of contexts since they were released. That means forcing myself to use them for tasks whether they are "good at them" yet or not. I found that a major inhibitor to my adoption was my own set of habits around how I think and do things. We aren't used to offloading certain cognitive/creative tasks to machines; we still have the muscle memory of wanting to grab the map when we've got GPS in front of us. I found that once I pushed through this barrier and formed new habits, it became second nature to create custom agents for all kinds of purposes to help me in my life. One learns which tasks to offload to the AI and how to offload them, and when and how to step in with the different capabilities of the human mind.
I personally feel that pushing oneself to be an early adopter holds real benefit.
We have to realize that there is a ton of money right now behind pushing AI everywhere. We have entire conventions for leadership pushing that a year later "is the time to move AI to Prod" or for "Moving past the skeptics".
We have investors seemingly asking every company they invest in "how are you using generative AI" before investing. We have Microsoft, Google, and Apple (to a lesser degree) forcing AI down our throats whether we like it or not, ignoring any reliability (inaccuracy) issues.
FFS Microsoft is pushing AI as a serious branding part of Windows going forward.
We have too much money committed to pushing the idea that we already have general AI, too much marketing, etc.
Consumer hype and money in this situation are going to be very different things. I do think a bust is going to happen, but I don't think the "hype" has died down in any meaningful way. I think, and hope, it will, since we keep seeing how the technology simply can't do what they are claiming. But I honestly don't think that's going to happen until something catastrophic does, and it is going to be ugly when it does. Hopefully your company won't be so reliant on it that it can't recover.
AI ain't going nowhere, and it certainly isn't overhyped. LLMs, however, certainly are.
Then again, I find it a good interface for assistants, and for actual AI and APIs that it can call on your behalf.
NVDA's high closes were $135.58 on June 18, down to $134.91 on July 10th, and $130 at close today. Its highest sale ever is $140.76. So its close today is 8% off its highest sale ever and 4% off its highest close ever; not a big thing for a volatile stock. Its earnings are next week, and we'll see how it does.
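For what it's worth, those percentages check out; a quick sanity check using the quoted prices:

    # Drawdown math on the quoted NVDA prices.
    high_sale, high_close, today = 140.76, 135.58, 130.00
    print(f"{(1 - today / high_sale) * 100:.1f}% off highest sale")    # 7.6%
    print(f"{(1 - today / high_close) * 100:.1f}% off highest close")  # 4.1%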
Nvidia and SMCI are the ones who have been earning money selling equipment for "AI". For Microsoft, Google, Facebook, Amazon, OpenAI, etc., it is all big initial capital expenditure, which they (and the scolding investment-bank analysts) hope to recoup in the future.
Personally, I'd wager the latter.
among which audience? is the hype necessary for further development? we attained much, if not all, of the recent achievements without hype. if anything, i'm strongly in favor of ai losing all the hype so that our researchers can focus on what's necessary, not on what will win the loudest applause from so fickle a crowd. i'd be worried if ai were attracting fewer researchers than, say, two or three years ago. that doesn't seem to be the case.
The future is most definitely exciting though, and sadly quite scary, too.
Those who do not know history are doomed to repeat it.
But then, the current hype wasn't there to produce something useful, but for "serial entrepreneurs" to get investor money. They'll just move to the next hyped thing.
Yann LeCun had a great tweet on this:

Sometimes, the obvious must be studied so it can be asserted with full confidence:
- LLMs can not answer questions whose answers are not in their training set in some form,
- they can not solve problems they haven't been trained on,
- they can not acquire new skills or knowledge without lots of human help,
- they can not invent new things.
Now, LLMs are merely a subset of AI techniques. Merely scaling up LLMs will not lead to systems with these capabilities.
link https://x.com/ylecun/status/1823313599252533594?ref_src=twsr...
To focus on this:
- LLMs can not answer questions whose answers are not in their training set in some form,
- they can not solve problems they haven't been trained on
Given that we are close to the maximum possible size of the training set, this means they are not going to improve without some technical breakthrough that is completely unknown at the moment. Going from "not intelligent" to "intelligent" is a massive shift.
The problem is that, by the standards of most human beings, they are in fact doing what we informally call "inference" or "creating new things".
That this is being accomplished by something that is "technically a fancy autocomplete" doesn't seem to matter practically... it's still doing all this useful and surprising stuff.
You're doing yourself and readers a disservice when you quote him without mentioning his conflict of interest.
His research is in analytical approaches to ML, hence his bitterness toward current LLM techniques and his skepticism toward Sutton's Bitter Lesson.
Things that ARE coming to an end:
- Startups whose entire business model is to just provide a wrapper around OpenAI's API.
- Social Media "AI Influencers" and their mindless "7 Ways To Become A Millionaire With ChatGPT" videos.
- Non-technical pundits claiming we are 1-2 years from AGI (and AGI talk in general).
- The stock market assigning insane valuations to any company that claims to somehow be "doing AI".
Things that are NOT coming to an end:
- Ongoing R&D in AI (and not just LLMs).
- Companies at the frontier of AI (OpenAI, Anthropic, Mistral, Google, Meta) releasing ever more capable models and tooling around those models.
- Forward looking companies in all industries using AI both to add capabilities to their products and to drive efficiencies in internal processes.
This collapses as soon as this collapses:
> - The stock market assigning insane valuations to any company that claims to somehow be "doing AI".
Either way, if it is indeed a bubble that will burst at some point, it doesn't bode well for the tech industry. With the mass layoffs, which are ongoing, it seems like there won't be enough jobs for everyone.
For the record, before spelling the recipes out, it made sure I understood that collecting elk eggs may be unlawful in some jurisdictions.
I think part of it is due to the politically and internet-induced death of nuance. But part of it I can't fully understand.
Personally, I think it's rather useful. I don't consider myself a heavy user, and I still use it almost every day to help code; I ask it a lot of questions about specific and general stuff. It has partially or totally replaced for me: Stack Overflow, Google Search, Google Translate, most tech references. In the office I see people using it all the time; there's almost always a ChatGPT window open on some of the displays.
I think it's very difficult to say this is 100% hype and/or a "phase". It's almost a proven fact that it's useful and that people will want it in their lives, even if it never improves again. It's a new tool in the toolbox, and there will be businesses providing it as a service, or perhaps we will get to open-source general availability.
On the other extreme, all the AI doomerism and AGI stuff seems to me almost as unfounded as before generative AI. Sure, it's likely we'll get to AGI one day. But if you thought we were 100 years away, I don't think ChatGPT put us any closer, and I just don't get people who now say 5. I'd rather they worried about the impact of image-gen AI on deepfakes and misinformation. That's _already_ happening.
My take on this is that those 2 developers are often working on very different tasks.
If you're a very smart coder working in a large codebase with tons of domain knowledge you'll find it's useless.
If you're a very smart coder working in a consultancy and your end result looks like a few thousand lines of glue code, then you're probably going to get a lot out of LLMs.
It's a bit like "software engineering" vs "coding". Current iterations of LLMs are good at "coding" but crap at "software engineering".
> The new crop of intelligent agents are different from the automated devices of earlier eras because of their computational power. They have Turing-machine powers, they take over human tasks, and they interact with people in human-like ways, perhaps with a form of natural language, perhaps with animated graphics or video. Some agents have the potential to form their own goals and intentions, to initiate actions on their own without explicit instruction or guidance, and to offer suggestions to people. Thus, agents might set up schedules, reserve hotel and meeting rooms, arrange transportation, and even outline meeting topics, all without human intervention.
they need to find a different derogatory slur to refer to tech workers
ideally one that isn't sexist and doesn't erase the contributions of women to industry
I have mixed feelings. On the one hand, I have a ton of schadenfreude for the AI maximalists (see: Leopold Aschenbrenner and the $1 trillion cluster that will never be), hype men (LinkedIn gurus and Twitter “technologists” that post threads with the thread emoji regurgitating listicles) or grifters (see: Rabbit R1 and the LAM vaporware).
On the other hand, I’m worried about another AI winter. We don’t need more people figuring out how to make bigger models, we need more fundamental research on low-resource contexts. Transformers are really just a trick to be able to ingest the whole internet. But there are many times where we don’t have a whole internet worth of data. The failure of LLMs on ARC is a pretty clear indication we’re not there yet (although I wouldn’t consider ARC sufficient either).
AI follows more of a seasonal pattern, with AI winters; can we expect a new winter soon?
> “An alarming number of technology trends are flashes in the pan.”
this has been a trend that seems to keep recurring, but it does not stop the tech bros from pushing the marketing beyond the realities.
raising money in the name of the future will give you results similar to self-driving cars or vr. the potential is crazy, but it is not going to make you double your money in a couple of financial years. this should help serious initiatives find better-aligned investors.
The Economist, seriously?
The first started with simple non-ML image manipulation and video analysis (like spotting baggage left unmoved for a certain amount of time in a hall, trespassing alerts for gates, and so on) and reached the level of live video analysis for autonomous driving. The second dates back a very long time, maybe to Conrad Gessner's library of Babel / Bibliotheca Universalis (~1545), with a simple consideration: a book is good for developing and sharing a specific topic, a newspaper for knowing "at a glance" the most relevant facts of yesterday, and so on, but we still need something to elicit specific bits of information out of "the library" without a human needing to read everything manually. Search engines do work but have limits. LLMs are the failed promise of being able to compress information into a model and then extract it, well distilled, on user prompt. That's the promise; the reality is that pattern matching/prediction can't do much more, for the same problem we have with images: there is no intelligence.
For an LLM, if a known scientist (as per tags in some part of the model's ingested information) says, joking in a forum, that eating a small rock a day is good for your health, the LLM will suggest the practice simply because it has no concept of a joke. Similarly, having no knowledge of humans, a hand with ten fingers is perfectly sound to it.
That's the essential bubble: PR people and people without knowledge have seen Stable Diffusion producing an astronaut riding a horse, have asked some questions of ChatGPT, and have said "WOW! OK, not perfect, but it will just be a matter of time," and the answer is no, it will NOT be, at least not with the current tech. There are some uses, like automatic translation: imperfect, but good enough to be arranged so that one human translator can do the job ten did before; some low-importance ID checks could be done with electronic IDs plus face recognition, so a single human guard can operate ten gates alone in an airport, intervening only where face recognition fails. Essentially, a FEW low-skill jobs might be automated; the rest is just classic automation, like banks closing offices simply because people use internet banking and pay with digital means, so there is almost no need to pick up and deposit cash anymore, no reason to go to the bank anymore. The potential so far can't grow much more, so the bubble bursts.
Meanwhile, big tech wants to keep the bubble up because LLM training is not something single humans can do at home, the way we can run a home server for our email, VoIP phone system, or file sharing. Yes, it's doable as a community, like search with YaCy or maps with OpenStreetMap, but the need for data and patient manual tagging is simply too cumbersome for a real community-born model to match or surpass one made by big tech. IT knowledge has very lately, and in a very limited way, started to spread just enough to endanger the big tech model, so they need something users can't do at home on a desktop. That's one part of the fight.
Another part is the push toward no-ownership for the 99%, the better to lock in (or enslave) users. So far the cloud+mobile model has created lock-in, but users can still get their data and host things themselves; if they no longer operate computers at all, just "smart devices", the option to download and self-host is next to none. Hence the push for autonomous taxis instead of personal cars, connected dishwashers that send 7+ GB/day home, and so on. This does not work technically, so despite the immense amounts of money and the struggles of the biggest players, people are starting to smell a rat and their mood drops.
Q: How many N's are there in Normation?
A: There is one N in the word "Normation"
Note that the answer is the same when asked about n's instead of N's.
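For reference, the ground truth is ambiguous only in casing; a quick check:

    # "Normation" contains 1 uppercase N, and 2 n's case-insensitively.
    word = "Normation"
    print(word.count("N"))          # 1 (case-sensitive)
    print(word.lower().count("n"))  # 2 (case-insensitive)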
And this is but one example of many simple cases demonstrating that these models are indeed not reasoning in a manner similar to humans. However, the outputs are useful enough that I myself use Claude and GPT-4o for some work, but with full awareness that I must review the outputs in cases where factual accuracy is required.
Q: How many N's are there in Normation?
A: There are 2 N's in the word "Normation".
Q: Confidence level?
A: A more precise question! After rechecking, I can confirm that there is indeed 2 N's in the word "Normation". Confidence level: 100%
Q: How many N's are there in the string "Normation"?
A: There are 2 N's in the string "Normation".
On several regenerations, at some point it gives:
A: There are 3 N's in the word "Normation".
Q: Confidence level?
A: A slight challenge! After re-checking, I see that there are actually 2 N's in the word "Normation".