So, considering that the $3400/task system isn't yet able to compete with a STEM college grad, we still have some room (but it is shrinking; I expect even more compute will be thrown at it and we'll see these barriers broken in the coming years)
Also, some other back of envelope calculations:
The cost gap between o3 High and average Mechanical Turk workers (humans) is roughly 10^3. Relying on pure GPU cost improvement (~2x every 2-2.5 years) puts us at 20-25 years to close it.
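As a sanity check on that timeline (a minimal sketch; the 10^3 gap and the 2-2.5 year halving period are the assumptions above):

    import math

    cost_gap = 1e3                   # o3 High vs. average human cost, per the estimate above
    doublings = math.log2(cost_gap)  # ~10 cost halvings needed
    for years_per_halving in (2.0, 2.5):
        print(f"{years_per_halving} yr/halving -> {doublings * years_per_halving:.0f} years")
    # ~20 and ~25 years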
The question now is: can we close this "to human" gap (10^3) quickly with algorithms, or are we stuck waiting 20-25 years for GPU improvements? (I think it feels obvious: this is new technology, things are moving fast, the chance for algorithmic innovation here is high!)
I also personally think that we need to adjust our efficiency priors and start looking not at "humans" as the bar to beat, but at theoretical computable limits (which show much larger gaps, ~10^9-10^15, for modest problems). Though it may simply be the case that tool/code use + AGI at near-human cost covers a lot of that gap.
You can scale them up and down at any time, they can work 24/7 (including holidays) with no overtime pay and no breaks, they need no corporate campuses, office space, HR personnel or travel budgets, you don't have to worry about key employees going on sick/maternity leave or taking time off the moment they're needed most, they won't assault a coworker, sue for discrimination or secretly turn out to be a pedophile and tarnish the reputation of your company, they won't leak internal documents to the press or rage quit because of new company policies, they won't even stop working when a pandemic stops most of the world from running.
> deep learning doesn't allow models to generalize properly to out-of-distribution data—and that is precisely what we need to build artificial general intelligence.
I think even (or especially) people like Altman accept this as a fact. I do. Hassabis has been saying this for years.
The foundational models are just a foundation. Now start building the AGI superstructure.
And this is also where most of the remaining human intellectual energy is going now.
They're risky in that they fail in ways that aren't deterministic or readily predictable.
And would you trust your life to a self-driving car in New York City traffic?
Imagine you have a self-driving AI that causes fatal accidents 10 times less often than your average human driver, but when the accidents happen, nobody knows why.
Should we switch to that AI, and have 10 times fewer accidents and no accountability for the accidents that do happen, or should we stay with humans, have 10x more road fatalities, but stay happy because the perpetrators end up in prison?
Framed like that, it seems like the former solution is the only acceptable one, yet people call for CEOs to go to prison when an AI goes wrong. If that were the case, companies wouldn't dare use any AI, and that would basically degenerate to the latter solution.
The one minor risk I see is the car being too polite and getting effectively stuck in dense traffic. That's a nuisance though.
Is there something about NYC traffic I'm missing?
I mean, this is an incredible moment from that standpoint.
Regarding the topic at hand, I think that there will always be room for humans for the reasons you listed.
But even replacing 5% of humans with AI's will have mind boggling consequences.
I think you're right that there are jobs for which humans will be preferred for quite some time.
But I'm already using AI with success where I would previously have hired a human, and that's at this primitive stage.
With the leaps we are seeing, AI is coming for jobs.
Your concerns relate to exactly how many jobs.
And only time will tell.
But I think some meaningful percentage of the population -- even if just 5% of humanity -- will be replaced by AI.
Sure, if a business deploys it to perform tasks that are inherently low risk (e.g. no client interface, no core-system connection, low error impact), then the human performing those tasks is going to be replaced.
This reminds me of the school principal who sent $100k to a scammer claiming to be Elon Musk. The kicker is that she was repeatedly told that it was a scam.
https://abc7chicago.com/fake-elon-musk-jan-mcgee-principal-b...
Or - worse - there is no accessible code anywhere, and you have to prompt your way out of "I'm sorry Dave, I can't do that," while nothing works.
And a human-free economy does... what? For whom? When 99% of the population is unemployed, what are the 1% doing while the planet's ecosystems collapse around them?
Your concerns about mysterious AI code and system crashes are backwards. This approach eliminates integration bugs and maintenance issues by design. The generated TypeScript is readable, fully typed, and consistently updated across the entire stack when business logic changes.
If you're struggling with AI-generated code maintainability, that's an implementation problem, not a fundamental issue with code generation. Proper type safety and schema validation create more reliable systems, not less. This is automation making developers more productive - just like compilers and IDEs did - not replacing them.
The code works because it's built on sound software engineering principles: type safety, single source of truth, and deterministic generation. That's verifiable fact, not speculation.
People talking like this also, in the back of their minds, like to think they'll be OK. They're smart enough to still be needed. They're human, but they'll be OK, even while working to make genAI outperform them at their own work.
I wonder how they'll feel about their own hubris when they struggle to feed their family.
The US can barely make healthcare work without disgusting consequences for the sick. I wonder what mass unemployment looks like.
This is interesting because it's both Oddly Specific and also something I have seen happen and I still feel really sorry for the company involved. Now that I think about it, I've actually seen it happen twice.
The wild part is that LLMs understand us way better than we understand them. The jump from GPT-3 to GPT-4 even surprised the engineers who built it. That should raise some red flags about how "predictable" these systems really are.
Think about it - we can't actually verify what these models are capable of or if they're being truthful, while they have this massive knowledge base about human behavior and psychology. That's a pretty concerning power imbalance. What looks like lower risk on the surface might be hiding much deeper uncertainties that we can't even detect, let alone control.
Do you mean they lie because of bad training data? Or because of ill intent? How can an LLM have intent if it’s a stateless feedforward model?
The "less risky to deploy" question will probably only come up once it is closer to 10x the cost. Considering that the model was specifically tuned for the test, and that the test doesn't involve other real-world complexity, I'd say we are actually off by about 10^4 in cost for real-world scenarios.
I would imagine that with better algorithms, tuning, and data we could knock 10^2 off the equation. That would still leave 10^2 of cost improvement to come from hardware. Minimum of 10 years.
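A minimal sanity check on that hardware-only tail end (a sketch; the 10^2 residual gap and the halving periods are the assumptions):

    import math

    residual_gap = 1e2                   # gap left after hoped-for algorithmic gains
    doublings = math.log2(residual_gap)  # ~6.6 cost halvings still needed from hardware
    for years_per_halving in (1.5, 2.0, 2.5):
        print(f"{years_per_halving} yr/halving -> {doublings * years_per_halving:.0f} years")
    # ~10, ~13 and ~17 years respectively, hence "minimum of 10 years"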
For AI, example(s): attribution is low; a system built without human intervention may suddenly fall outside its own expertise and hallucinate itself into a corner; everyone may just throw more compute at a system until it grows without bound; etc.
This "You can scale up to infinity" problem might become "You have to scale up to infinity" to build any reasonably sized system with AI. The shovel-sellers get fantastically rich but the businesses are effectively left holding the risk from a fast-moving, unintuitive, uninspected, partially verified codebase. I just don't see how anyone not building a CRUD app/frontend could be comfortable with that, but then again my Tesla is effectively running such a system to drive me and my kids. Albeit, that's on a well-defined problem and within literally human-made guardrails.
This is a big downside of AI, IMHO. Those offices need to be filled! ;-)
That one isn’t guaranteed. Many examples online of exfiltration attacks on LLMs.
The rhetoric of not needing people to do work is cartoonish. I mean, there is no sane explanation of how and why that would happen without employing yet more people to take care of the advancements.
It's not like technology has brought less work-related stress. If anything, it has definitely increased it. Humans were not made to use technology at the pace at which it's being rolled out.
The world is fucked. Totally fucked.
Digital banks
Cashless money transfer services
Self service
Modern farms
Robo lawn mowers
NVRs with object detection
I can go on forever
I agree the most interesting thing to watch will be cost for a given score more than the maximum possible score achieved (not that the latter won't be interesting).
Do you have a sense of what kind of task this benchmark includes? Are they more “general” such that random people would fare well or more specialized (ie something a STEM grad studied and isn’t common knowledge)?
I have failed the real ARC AGI :)
This isn't to say groups always outperform their members on all tasks, just that it isn't unusual to see a result like that.
So ya, working on efficiency is important, but we're still pretty far away from AGI even ignoring efficiency. We need an actual breakthrough, which I believe will not be possible by simply scaling the transformer architecture.
So, combined, we are currently off by at least 10^5 in terms of cost efficiency. In reality I won't be surprised if we are closer to 10^6.
Energy Need: The average home uses 30 kWh/day, which works out to about 6 kW of output sustained over 5 peak sunlight hours.
Multijunction Panels: Lab efficiencies are already at 47% (2023), and with multiple years of progress, 60% efficiency is probable.
Efficiency Impact: At 60% efficiency (and ~1000 W/m² peak insolation), panels generate ~600 W/m², requiring 10 m² (e.g., 2 m × 5 m) to meet energy needs; the arithmetic is sketched below.
This size can fit on most home roofs, be mounted on a pole with stacked layers, or even be hung through an apartment window.
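A minimal sketch of that sizing arithmetic (the 1000 W/m² peak insolation figure is an assumption; it's the standard rating condition for panels):

    daily_need_kwh = 30          # average home, per the estimate above
    peak_sun_hours = 5
    insolation_w_per_m2 = 1000   # assumed peak irradiance
    efficiency = 0.60            # hoped-for multijunction efficiency

    required_kw = daily_need_kwh / peak_sun_hours        # 6 kW of output during peak sun
    output_w_per_m2 = insolation_w_per_m2 * efficiency   # 600 W/m^2
    area_m2 = required_kw * 1000 / output_w_per_m2       # 10 m^2
    print(required_kw, output_w_per_m2, area_m2)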
For example, my 50 sq m setup, at -29 deg latitude, generated your estimated 30 kWh/day output. I have panels with ~20% efficiency, suggesting that at 60% efficiency the average household would only get to around half their energy needs with 10 sq m.
Yes, solar has the potential to drastically reduce energy costs, but even with free energy storage, individual households aren't likely to achieve self-sustainability.
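Scaling those numbers directly (a sketch; the assumption is that real-world yield per m² scales roughly linearly with panel efficiency at the same site):

    my_area_m2, my_efficiency, my_output_kwh = 50, 0.20, 30   # measured, ~-29 deg latitude
    yield_per_m2_per_eff = my_output_kwh / (my_area_m2 * my_efficiency)  # ~3 kWh/day per m^2 per unit efficiency

    estimated = 10 * 0.60 * yield_per_m2_per_eff   # proposed 10 m^2 at 60% efficiency
    print(estimated)   # ~18 kWh/day, i.e. a bit over half of the 30 kWh/day need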
In Europe it is around 6-7 kWh/day. This might increase with electrification of heating and transport, but probably nothing like as much as the energy consumption they are replacing (due to greater efficiency of the devices consuming the energy and other factors like the quality of home insulation.)
In the rest of the world the average home uses significantly less.
General inflation has outpaced the inflation of electricity prices by about 3x in the past 100 years. In other words, electricity has gotten cheaper over time in purchasing power terms.
And that's whilst our electricity usage has gone up by 10x in the last 100 years.
And this concerns retail prices, which includes distribution/transmission fees. These have gone up a lot as you get complications on the grid, some of which is built on a century old design. But wholesale prices (the cost of generating electricity without transmission/distribution) are getting dirt cheap, and for big AI datacentres I'm pretty sure they'll hook up to their own dedicated electricity generation at wholesale prices, off the grid, in the coming decades.
Not saying this will happen, but it's risky to rely on solar as the only long-term solution.
Then let's say that OpenAI brute forced this without any meta-optimization of the hypothesized search component (they just set a compute budget). This is probably low hanging fruit and another 2x in compute reduction. ($850)
Then let's say that OpenAI was pushing really, really hard for the numbers and was willing to burn cash, and so didn't bother with serious thought around hardware-aware distributed inference. This could be more than a 2x decrease in cost (we've seen better attention mechanisms deliver 10x reductions), but let's go with 2x for now. ($425)
So I think we've got about an 8x reduction in cost sitting there once Google steps up. This is probably 4-6 months of work flat out if they haven't already started down this path, but with what they've got with deep research, maybe it's sooner?
Then if "all" we get is hardware improvements we're down to what 10-14 years?
Since then there has been a tsunami of optimizations in the way training and inference is done. I don't think we've even begun to find all the ways that inference can be further optimized at both hardware and software levels.
Look at the huge models that you can happily run on an M3 Mac. The cost reduction in inference is going to vastly outpace Moore's law, even as chip design continues on its own path.
I'd hope we see more internal optimizations and improvements to the models. The idea that the big breakthrough is "don't spit out the first thought that pops into your head" seems obvious to everyone outside the field, but guess what, it turned out to be a big improvement when the devs decided to add it.
It's obvious to people inside the field too.
Honestly, these things seem to be less obvious to people outside the field. I've heard so many uninformed takes about LLMs not representing real progress towards intelligence (even here on HN of all places; I don't know why I torture myself reading them), that they're just dumb memorizers. No, they are an incredible breakthrough, because extending them with things like internal thoughts will so obviously lead to results such as o3, and far beyond. Maybe a few more people will start to understand the trajectory we're on.
While I agree that the LLM progress as of late is interesting, the rest of your sentiment sounds more like you are in a cult.
As long as your field keeps coming up with less and less realistic predictions and fails to deliver over and over, eventually even the most gullible will lose faith in you.
Because that's what this all is right now. Faith.
> Maybe a few more people will start to understand the trajectory we're on.
All you are saying is that you believe something will happen in the future.
We can't have an intelligent discussion under those premises.
It's depressing to see so many otherwise smart people fall for their own hype train. You are only helping rich people get more rich by spreading their lies.
Doesn't help that most people are just mimics when talking about stuff that's outside their expertise.
Hell, my cousin, a quality college-educated individual with high social/emotional IQ, will go down the conspiracy-theory rabbit hole so quickly based on some baseless crap printed on the internet. Then he'll talk about people being Satan worshipers.
It's very easy to say "hey, of course it's obvious," but there is nothing obvious about it, because you are anthropomorphizing these models and then using that bias after the fact as proof of your conjecture.
This isn’t how real progress is achieved.
The state of the art seems very focused on promoting the idea that language which might encode reasoning is as good as actual reasoning, rather than asking what a reasoning model might look like.
The trend for compute energy efficiency (megaflops per watt) has generally tracked Koomey's law, with a doubling every 1.57 years.
Then you also have model performance improving with compression: for example, Llama 3.1's 8B outperforming the original Llama 65B.
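If Koomey's ~1.57-year doubling were to hold (a big if, and the assumption here), the compounding looks like this:

    koomey_doubling_years = 1.57
    for years in (5, 10, 15):
        gain = 2 ** (years / koomey_doubling_years)
        print(f"{years} years -> ~{gain:.0f}x more compute per watt")
    # roughly 9x, 83x and 751x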
Routing to the correct human support
Providing FAQ level responses to the most common problems.
Providing a second opinion to the human taking the call.
So, even this most relevant domain for the technology doesn't eliminate human employment (because it's just not flexible or reliable enough yet).
If this turns out to be hard to optimize / improve then there will be a huge economic incentive for efficient ASICs. No freaking way we’ll be running on GPUs for 20-25 years, or even 2.
But sorry, blablabla, this shit is getting embarrassing.
> The question is now, can we close this "to human" gap
You won’t close this gap by throwing more compute at it. Anything in the sphere of creative thinking eludes most people in the history of the planet. People with PhDs in STEM end up working in IT sales not because they are good or capable of learning but because more than half of them can’t do squat shit, despite all that compute and all those algorithms in their brains.