As a real-world example, I was told to evaluate Claude Code and ChatGPT Codex at my current job, since my boss had heard about them and wanted to know what they would mean for our operations. Our main environment is a C# and TypeScript monorepo with 2 products being developed, and even with a pretty extensive test suite and a nearly 100-line "AGENTS.md" file, all the models I tried basically fail or try to shortcut nearly every task I give them, even when using "plan mode" to give them time to come up with a plan before starting. To be fair, I was able to get them to work pretty well after giving extremely detailed instructions, monitoring the "thinking" output, and stopping them to correct course when I saw something wrong there, but at that point I felt silly for spending all that effort just driving the bot instead of doing it myself.
It almost feels like this is some "open secret" which we're all pretending isn't the case too, since if it were really as good as a lot of people are saying there should be a massive increase in the number of high quality projects/products being developed. I don't mean to sound dismissive, but I really do feel like I'm going crazy here.
- driving the LLM instead of doing it yourself: sometimes I just can't get the activation energy, and the LLM is always ready to go, so it gives me a kickstart
- doing things you normally don't know. I learned a lot of command line tools and tricks by seeing what Claude does. Writing short scripts for stuff is super useful. Of course, the catch here is that if you don't know stuff, you can't drive it very well. So you need to use the things in isolation.
- exploring alternative solutions. Stuff that by definition you don't know. Of course, some will not work, but it widens your horizon
- exploring unfamiliar codebases. It can ingest huge amounts of data so exploration will be faster. (But less comprehensive than if you do it yourself fully)
- maintaining change consistency. Here I think it's just better than humans. If you have stuff you need to change in 2 or 3 places, you will probably forget one. LLMs are better at keeping details consistent (but not big-picture stuff, interestingly).
It's possible some of it is due to codebase size or tech stack, but I really think there might be more of a human learning curve going on here than a lot of people want to admit.
I think I am firmly in the average of people who are getting decent use out of these tools. I'm not writing specialized tools to create agents of agents with incredibly detailed instructions on how each should act. I haven't even gotten around to installing a Playwright mcp (probably my next step).
But I've:
- created project directories with soft links to several of my employer's repos, and been able to answer several cross-project and cross-team questions within minutes, that normally would have required "Spike/Disco" Jira tickets for teams to investigate
- interviewed codebases along with product requirements to come up with very detailed Jira AC, and then, just for the heck of it, had the agent use that AC to implement the actual PR. My team still code-reviewed it but agreed it saved time
- in side projects, have shipped several really valuable (to me) features that would have been too hard to consider otherwise, like generating PDF book manuscripts for my branching-fiction creative writing club, and launching a whole new website that had been mired in a half-done state for years
Really my only tricks are the basics: AGENTS.md, brainstorm with the agent, continually ask it to write markdown specs for any cohesive idea, and then pick one at a time to implement in commit-sized or PR-sized chunks. GPT-5.2 xhigh is a marvel at this stuff.
My codebases are scala, pekko, typescript/react, and lilypond - yeah, the best models even understand lilypond now so I can give it a leadsheet and have it arrange for me two-hand jazz piano exercises.
I generally think that if people can't reach the above level of success at this point in time, they need to think more about how to communicate better with the models. There's a real "you get out of it what you put into it" aspect to using these tools.
I can’t say it’s led to shipping “high quality projects”, but it has let me accomplish things I just wouldn’t have had time for previously.
I’ve been wanting to develop a plastic -> silicone -> plaster -> clay mold making process for years, but it’s complex and mold making is both art and science. It would have been hundreds of hours before, with maybe 12 hours of Claude code I’m almost there (some nagging issues… maybe another hour).
And I had written some home automation stuff back with Python 2.x a decade ago; it was never worth the time to refamiliarize myself with in order to update, which led to periodic annoyances. 20 minutes, and it’s updated to all the latest Python 3.x and modern modules.
For me at least, the difference between weeks and days, days and hours, and hours and minutes has allowed me to do things I just couldn’t justify investing time in before. Which makes me happy!
So maybe some folks are “pretending”, or maybe the benefits just aren’t where you’re expecting to see them?
I KNOW a common issue people run into is they forget to handle rate limits, but I also know more JavaScript than Python and have limited time, so before I'd write:
```
# NOTE: Make sure to handle the rate limit! This is just an example.
# See example.com/docs/javascript/rate-limit-example for a js example doing this.
```
Unsurprisingly, more than half of customers would just ignore the comment, forget to handle the rate limit, and then write in a few months later. With Claude, I just write "Create a customer demo in Python that handles rate limits. Use example.com/docs/javascript/rate-limit-example as a reference," and it gets me 95% of the way there.
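For anyone wondering what "handles rate limits" means in such a demo, here is a minimal sketch. It assumes the API signals throttling in a way the request function can turn into an exception (for example an HTTP 429 with an optional retry hint in seconds); `RateLimited`, `call_with_retry`, and `make_flaky_request` are hypothetical names for illustration, not part of any real SDK.

```python
import random
import time


class RateLimited(Exception):
    """Raised by a request function when the server reports throttling (assumed HTTP 429)."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_retry(request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call request(); on RateLimited, wait and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # prefer the server's hint
            else:
                # exponential backoff plus a little jitter
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


def make_flaky_request(failures):
    """Hypothetical stand-in for an API call that is throttled `failures` times."""
    state = {"left": failures}

    def request():
        if state["left"] > 0:
            state["left"] -= 1
            raise RateLimited(retry_after=0)
        return {"status": "ok"}

    return request
```

Passing `sleep` as a parameter is only there so the demo can be exercised without actually waiting; a real customer demo would wrap its HTTP call in `request` and leave the default.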
There are probably 100 other small examples like this where I had the "vibe" to know where the customer might trip over, but not the time to plug up all the little documentation example holes myself. Ideally, yes, hiring a full-time person to handle plugging up these holes would be great, but if you're resource constrained paying Anthropic for tokens is a much faster/cheaper solution in the short term.
In this author's case, they currently work for a company that .. wait for it .. less than 2 weeks ago launched some "AI image generation built for teams" product. (Also, oddly, the author lists himself as the 'Technical Director' at the company, working there for 5-6 years, but the company's Team page doesn't list him as an employee).
Over the last few months, I have seen a notable difference in the quality and extent of projects these students have been able to accomplish. Every project and website they show looks polished, and most of them could have been a full startup MVP in the pre-AI days.
The bar has clearly been raised way high, very fast with AI.
But the reason you don’t see a flood of great products is that the managerial layer has no idea what to do with massively increased productivity (velocity). Ask even a Google what they’d do with doubly effective engineers and the standard answer is to lay half of them off.
The headline gain is speed. Almost no-one's talking about quality - they're moving too fast to notice the lack.
That they are so good at the things I like to do the least and still terrible at the things at which I excel. That's just gravy.
But I guess this is in line with how most engineers transition to management sometime in their 30s.
usually when someone hypes it up it's things like, "i have it text my gf good morning every day!!", or "it analyzed every single document on my computer and wrote me a poem!!"
The "open secret" is that shipping stuff is hard. Who hasn't bought a domain name for a side project that didn't go anywhere. If there's anybody out there, raise your hand! So there's another filtering effect.
The crazy pills are thinking that HN is in any way representative of anything about what's going on in our broader society. Those projects are out there, why do you assume you'll be told about it? That someone's going to write an exposé/blog post on themselves about how they had AI build a thing and now they're raking in the dollars and oh, buy my course on learning how to vibecode? The people selling those courses aren't the ones shipping software!
Even if it's not straight astroturfing I think people are wowed and excited and not analyzing it with a clear head
So, I've very little to publicly show for all my obnoxious LLM advocacy. I wonder if any others are in the same boat?
This is the challenge I also face: it's not always obvious when a change I want will be properly understood by the LLM. Sometimes it one-shots it; other times I go back and forth until I could have just done it myself. If we have to get super detailed in our descriptions, at what point are we just writing in some ad-hoc "programming language" that then transpiles to the actual program?
Given time AI will lead to incredible productivity. In the meantime, use as appropriate.
I'm mostly a freeloader, so how could I judge people who put in the tokens equivalent to 15 years worth of electricity (incl heating and hot water) bills for my home in a C compiler?
Well, I can see that Anthropic is still an AI company, not a software company, they're granting us access to their most valuable resource that almost doesn't require humans, for a very reasonable fee, allowing us to profit instead of them. They're philanthropists.
It does also seem to me that there is a lot of variance in skills for prompting/using AI in general (I say this as someone who is not particularly good as far as I’m aware – I’m not trying to keep tips secret from you). And there is also a lot of variance in the ability of an AI to solve problems that are of equal difficulty for a human.
What makes the difference is that agents can create these instructions themselves, monitor themselves, and revert actions that didn't follow instructions. You didn't get there because you achieved satisfactory results with semi-manual solutions. But people who abhor manual work are getting there already.
I used this line for a long time, but you could just as easily say the same thing for a typical engineer. It basically boils down to "Claude likes its tickets to be well thought out". I'm sure there is some size of project where its ability to navigate the codebase starts to break down, but I've fed it sizeable ones and so long as the scope is constrained it generally just works nowadays
It's the appearance of productivity, not actual productivity.
interesting.
how much planning do you put into your project without AI anyway?
Pretty much all the teams I've been involved in:
- never did any analysis planning, and just yolo it along the way in their PR
- every PR is an island, with tunnel vision
- fast forward 2 years, and we have to throw it out and start again.
So why are you thinking you're going to get anything different with LLMs?
And plan mode isn't just a single conversation that you then flip to do mode...
you're supposed to create detailed plans and research that you then use to make the LLM refer back to and align with.
This was the point of the Ralph Loop
Tried to move some excel generation logic from epplus to closedxml library.
ClosedXml has basically the same API so the conversion was successful. Not a one-shot but relatively easy with a few manual edits.
But closedxml has no batch operations (like apply style to the entire column): the api is there but internal implementation is on cell after cell basis. So if you have 10k rows and 50 columns every style update is a slow operation.
Naturally, told all about this to codex 5.3 max thinking level. The fucker still succumbed to range updates here and there.
Told it explicitly to make a style cache and reuse styles on cells on same y axis.
5-6 attempts — fucker still tried ranges here and there. Because that is what is usually done.
Not here yet. Maybe in a year. Maybe never.
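For readers unfamiliar with the style-cache idea mentioned above, here is a language-neutral sketch in Python. It is not ClosedXML's C# API: `Style`, `StyleCache`, and `write_rows` are hypothetical stand-ins, and the assumption is simply that creating a style object is the expensive part, so each distinct style is built once and then assigned cell by cell instead of through range updates.

```python
class Style:
    """Hypothetical style object; `created` counts how many were ever built."""

    created = 0

    def __init__(self, bold, number_format):
        self.bold = bold
        self.number_format = number_format
        Style.created += 1


class StyleCache:
    """Hand out one shared Style per distinct (bold, number_format) combination."""

    def __init__(self):
        self._styles = {}

    def get(self, bold=False, number_format="General"):
        key = (bold, number_format)
        if key not in self._styles:
            self._styles[key] = Style(bold, number_format)
        return self._styles[key]


def write_rows(rows, column_formats, cache):
    """Return a grid of (value, style) cells, reusing cached styles per column."""
    grid = []
    for row in rows:
        grid.append(
            [(value, cache.get(number_format=fmt))
             for value, fmt in zip(row, column_formats)]
        )
    return grid
```

With 10k rows and 50 columns this builds at most one style per distinct format rather than one per cell or per range call, which is the behaviour the prompt was asking for.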
That being said, it's great at generating boilerplate code or, in my case, doing something like 'make a react component here please that does this small thing, and is aligned with the style in the rest of the file'. Good for when I need to work with codebases or technologies that are not my daily. Also a great research assistant.
But I guess being a 'better google' or a 'glorified spellchecker' doesn't get that hype money.
It also kinda feels gaslightish and, as I've said in some controversial replies in other posts, it's sort of eerily mass "psychosis" vibes, just like during COVID.
All AI-IS-WONDERFUL stories are garbage-trash written by garbage people.
Fuck AI. Fuck HN AI promoters. Hopefully you all lose your jobs and fail in life.
Hardly before, now it's almost three times a week. And it never gets any questions on GPU amortization...
> This has truly freed up my productivity, letting me pursue so many ideas I couldn’t move forward on before
If you're writing in a blog post that AI has changed your life and let you build so many amazing projects, you should link to the projects. Somehow 90% of these posts don't actually link to the amazing projects that their author is supposedly building with AI.
I've got 10+ years of coding experience, I am an AI advocate, but not vibe coding. AI is a great tool to help with the boring bits, using it to initialize files, help figure out various approaches, as a first pass code reviewer, helping with configuring, those things all work well.
But full-on replacing coders? It's not there yet. Will require an order of magnitude more improvement.
Maybe they don't feel like sharing yet another half working Javascript Sudoku Solver or yet another half working AI tool no one will ever use?
Probably they feel amazed about what they accomplished but they feel the public won't feel the same.
https://apps.apple.com/us/app/snortfolio/id6755617457
30kloc client and server combined. I built this as an experiment in building an app without reading any of the code. Even ops is done by claude code. It has some minor bugs but I’ve been using it for months and it gets the job done. It would not have existed at all if I had to write it by hand.
SHOW ME THE MONEY!!!
GPT-5.2 fixed my hanging WiFi driver: https://gist.github.com/lostmsu/a0cdd213676223fc7669726b3a24...
Nobody is actually using AI for anything useful or THEY WOULDNT BE TALKING ABOUT IT. They’d be disrupting everything and making billions of dollars.
Instead this whole AI grift reads like “how to be a millionaire in 10 days” grifts by people that aren’t, in fact, millionaires.
I have always failed to understand the obsessive dream of many engineers to become managers. It seems not to be merely about an increase in revenue.
Is it to escape from "getting bogged down in the specifics" and being able to "focus on the higher-level, abstract work", to quote OP's words? I thought naively that engineering has always been about dealing with the specifics and the joy of problem-solving. My guess is that the drive is towards power, which is rather natural, if you think about it.
Science and the academic world suffer a comparable plague.
And when you're in an existing company, stuck in thing X, knowing that it's obsolete, and the people doing the latest Y that's hot in the job market are in another department and jealously guard access to Y projects?
How about when you go to interview, and you not ONLY have to know Y, but the Leetcode from 15 years ago?
So maybe I've given you another alternative to 'it has to be power, there's no other rational reason to go into management'.
Here's a gentler one: if you want to build big things, involving many people, you need to be in management.
Do you enjoy brick laying and calculating angles around doorways? You're the engineer. Do you want to be the architect hiring engineers, working with project managers, and assessing the budget while worrying about approvals? They're different types of work, and it's not about 'power' like you are suggesting. Autonomy and decision-making power are more the 'power' engineers often don't get (unless they are lucky, very very smart or in a small startup-like environment).
Real managers deal with coaching, ownership, feelings, politics, communication, consensus building, etc. The people who are good at it like setting other people up to win.
Often too it's the architecture that can cause a grand idea to crash and burn—experienced devs should be moving toward solving those problems.
Like I’ve been in situations as an IC where poor leadership from above has literally caused less efficient and more painful day-to-day work. I always hoped I could sway those decisions from my position as an IC, but reality rarely aligned with that hope.
I actually love the details, but I just don’t get too deep into them these days as I don’t want to micro-manage.
I do find I have more say in things my team deals with now that I’m a manager.
That can extend to arbitrary absurdity. You are probably not growing your own food, mining your own ore, forging your own tools, etc etc etc.
It's all just a matter of where you rely on external tools/abstractions to do parts of the work you don't want to do yourself.
It's frontier exploration that brings me joy. If a clanker can do something, then it's a solved problem. I use all the tools at my disposal to push the frontier of problems solved. Wasting my time re-inventing the wheel brings me the opposite of joy.
But I'm acutely conscious that in the 5+ years that I've been a senior developer, my ability to come up with useful ideas has significantly outstripped the time I have to realize those ideas (and from experience, the same is often true of academics).
At work, I have the choice between remaining hands-on and limiting what I can get done, or acting more like a manager, and having the opportunity to get more done, but only by letting other people do it, in ways that might not reflect my vision. It's pretty frustrating, to be honest.
For side projects, it's worse. Most of them just can't be done, because I don't even have the choice.
Not really for me. Programming is an effort-type job. The more effort you put in, the more you get out. True in other professions, sure, but multiplied with dev work. When I became a dad everything changed. Solve a hard problem or spend time with my kid. I couldn't juggle the two. So I made a choice and fortunately had an opportunity to move into management.
Anyway full circle now I'm back to being a dev and this go around couldn't be easier with our ai agents. Point is I went into management because I was forced, not at all for power.
You want to write a book about people's deepest motivations. Formative experiences, relationships, desires. Society, expectations, disappointment. Characters need to meet and talk at certain times. The plot needs to make sense.
You bring it to your editor. He finds you forgot to capitalise a proper noun. You also missed an Oxford comma. You used "their" instead of "they're".
He sends you back. You didn't get any feedback about whether it makes sense that the characters did what they did.
You are in hell, you won't hear anything about the structure until you fix your commas.
Eventually someone invents an automatic editor. It fixes all the little grammar and spelling and punctuation issues for you.
Now you can bring the script to an editor who tells you the character needs more development.
You are making progress.
Your only issue is the Luddites who reckon you aren't a real author, because you tend to fail their LeetGrammar tests, calling you a vibe author.
I think it's that there is only that much demand for solving really complex problems, and doing the same thing over and over is boring, so management is the only way forward for many people
I was recently looking for mentors to work with him and advance his skills, targeting college aged kids / young 20s..
It was surprising to me how many people I came across in this field at this young age that are trying to focus on the "higher level" game planning aspects and not so much on the lower level implementation specifics.
For me it's the other way around. Engineering was always a means to an end - I just want to build products. It was a creative artform more than a scientific endeavour.
You can't do that from a high level abstract position. You actually need to stand at the coal face and think about it from time to time.
This article encodes an entitled laziness that's destructive to personal skill and quality work.
A few years ago, when Agile was still the hot thing and companies had an Agile "facilitator" or manager for each dev team, the common career path I heard when talking to those people was: "I worked as a Java/COBOL/etc. developer in the past, but it just didn't click with me. I'm more of a people person, you know, so project management is where I really do my best work!".
Yeah, right...
What type of code? What types of tools? What sort of configuration? What messaging app? What projects?
It answers none of these questions.
This is an AI generated post likely created by going to chatgpt.com and typing in "write a blogpost hyping up [thing] as the next technological revolution", like most tech blog content seems to be now. None of those things ever existed, the AI made them up to fulfill the request.
>Over the past year, I’ve been actively using Claude Code for development. Many people believed AI could already assist with programming—seemingly replacing programmers—but I never felt it brought any revolutionary change to the way I work.
Funny, because just last month, HN was drowning in blog posts saying Claude Code is what enables them to step away from the desk, is definitely going to replace programmers, and lets people code "all through chatting on [their] phone" (being able to code from your phone while sitting on the bus seems to be the magic threshold that makes all the datacenters worth it).
It's like we all fell under the spell of a terminal endlessly printing output as some kind of measurement of progress.
I just give the link to those posts to my AI to read it, if it's not worth a human writing it, it's not worth a human reading it.
The only software I've seen designed and implemented by OpenClaw is moltbook. And I think it is hard to come up with a bigger pile of crap than Moltbook.
If somebody can build something decent with OpenClaw, that would help add some credibility to the OpenClaw story.
They are not able to comprehend that for anything more complicated than that, the code might compile, but the logical errors and failure to implement the specs start piling up.
Grok 4 Fast told me its own internal system prompt has rules against autonomous operation, so that might have something to do with it. I am having decent results with it though.
For me the pain point has always been with non-IT people/companies. They are way more accustomed to phone or even in-person appointments. They in general have way more of a say than me, the customer.
Can OpenClaw make and take phone calls for me to make appointments? Can OpenClaw do chores for me? Can OpenClaw meet with contractors for me? It can do none of these. It can make notes for me (useless, as most notes are useless). It can scrape websites for me (not very interesting, as why would I want to collect so much knowledge?). It can probably automate anything that already has an endpoint or whatever, but I don’t mind writing code for my own projects. I always failed to understand why anyone would want to let AI write most of the code of their PERSONAL project, unless they want to sell it quickly.
I’m just a frustrated old man I guess.
[0] https://vapi.ai/
> I’m just a frustrated old man I guess.
I think this is a great summary of the failure of vision that a lot of tech people are having right now.
> automate anything that already has an endpoint or whatever
Facebook used to have APIs, Reddit used to have APIs, Amazon used to have APIs.
They are gone.
Enshittification and dark patterns have taken over.
"Hey open claw, cancel service xxx" where XXX is something that is 17 steps and purposely hard to cancel so they keep your money.
What's going to happen when your AI tool can go to a website, strip the ads off, and return you just the text? What happens when it can build a customized news feed that looks less like Facebook and more like HN? Aren't we just gaining back function we lost with the death of RSS?
Consumers are mad about the hype of AI, but the moment it can cut through the bullshit we keep putting in their way, it's going to wreck business MODELS, and the choice will be adapt or die. Start asking your "AI" tools to do all the basic, tedious bullshit tasks that are low risk (you have a ton of them), and if it gets 1/4 of them done you're going to free up a ton of your own time.
[1] https://reorx.com/blog/rabbit-r1-the-upgraded-replacement-fo...
I tried using LLMs to help debug at different points, but they went in circles on bad ideas, even when I gave them what turned out to be a correct clue.
Root cause turned out to be that IPv6 wasn't enabled for Docker networking, but was enabled for the website's DNS. So people who connected over IPv6 were getting their IPs all converted to the same internal Docker IP before being handed to the per-IP throttling algorithm.
I spotted that there were no IPv6 IPs in the logs, but the LLMs missed that the key pattern was the absence of something expected, instead drawing wrong conclusions.
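The failure mode is easy to reproduce in a few lines. The sketch below is a hypothetical per-IP throttle, not the actual code involved; the client addresses and the `172.17.0.1` gateway address are illustrative assumptions. When every IPv6 client shows up as the same internal address, unrelated users share one bucket and get throttled together.

```python
from collections import defaultdict


class PerIpThrottle:
    """Allow at most `limit` requests per IP (window reset omitted for brevity)."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, ip):
        self.counts[ip] += 1
        return self.counts[ip] <= self.limit


def seen_by_app(client_ip, ipv6_enabled_in_docker):
    """Return the address the application observes for a client.

    Assumption: without IPv6 in the container network, IPv6 clients are
    proxied and all appear as one internal gateway address.
    """
    if ":" in client_ip and not ipv6_enabled_in_docker:
        return "172.17.0.1"  # illustrative internal Docker address
    return client_ip


# Three distinct IPv6 users and one IPv4 user (documentation address ranges).
clients = ["2001:db8::1", "2001:db8::2", "2001:db8::3", "203.0.113.7"]


def simulate(ipv6_enabled_in_docker, limit=2):
    """Send one request per client and report which ones were allowed."""
    throttle = PerIpThrottle(limit)
    return [
        throttle.allow(seen_by_app(ip, ipv6_enabled_in_docker))
        for ip in clients
    ]
```

In this sketch, with IPv6 disabled in the container network the third IPv6 user is rejected on their very first request, and the application's logs would contain no address with a `:` in it at all, which is the "absence of something expected" clue.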
So no, I'm not about to turn OpenClaw loose on building anything at all complex.
> OpenClaw gave me the chance to become that super manager [...] A manager shouldn’t get bogged down in the specifics—they should focus on the higher-level, abstract work
These two propositions seem to be highly incompatible
Honestly I'd rather die
1. It has a lot of files that it loads into its context for each conversation, and it consistently updates them. Plus it stores and can reference each conversation. So there's a sense of continuity over time.
2. It connects to messaging services and other accounts of yours, so again it feels continuous. You can use it on your desktop and then pick up your phone and send it an iMessage.
3. It hooks into a lot of things, so it feels like it has more agency. You could send it a voice message over discord and say "hey remember that conversation about birds? Send an email to Steve and ask him what he thinks about it"
It feels more like a smart assistant that's always around than an app you open to ask questions to.
However, it's worth stressing how terrible the software actually is. Not a single thing I attempted to do worked correctly, important issues (like the discord integration having huge message delays and sometimes dropping messages) get closed because "sorry we have too many issues", and I really got the impression that the whole thing is just a vibe coded pile of garbage. And I don't like to be that critical about an open source project like this, but I think considering the level of hype and the dramatic claims that humans shouldn't be writing code anymore, I think it's worth being clear about.
Ended up deleting it and setting up something much simpler. I installed a little discord relay called kimaki, and that lets me interact with instances of opencode over discord when I want to. I also spent some time setting up persistent files and made sure the llm can update them, although only when I ask it to in this case. That's covered enough of what I liked from OpenClaw to satisfy me.
https://github.com/a-n-d-a-i/ULTRON
Well, it's a work in progress, but I have self-upgrading and self-restarting working, and it's already more reliable than Claw ;)
I used the Claude Code SDK (Agents SDK) originally, but then realized I can get the same result by just calling `claude -p the_telegram_message`
The magic sauce being the --continue flag, of course. Bit less useful otherwise.
I haven't figured out how to interrupt it or see what it's doing yet though.
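A rough sketch of that relay step, for anyone who wants to try the same thing. It only uses the `-p` and `--continue` flags named above; the flag ordering, the timeout value, and the function names `build_command` and `ask_claude` are my assumptions, so check `claude --help` on your install before relying on it.

```python
import subprocess


def build_command(message, continue_session=True):
    """Build the argv for a non-interactive Claude Code call.

    The message is passed as a single list element, so untrusted chat text
    is never interpreted by a shell.
    """
    cmd = ["claude"]
    if continue_session:
        cmd.append("--continue")  # resume the most recent conversation
    cmd += ["-p", message]
    return cmd


def ask_claude(message, continue_session=True, timeout=600):
    """Run the CLI and return its stdout; raises if the command fails or times out."""
    result = subprocess.run(
        build_command(message, continue_session),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()
```

A Telegram handler would then call `ask_claude(update_text)` and send the returned string back to the chat. This blocks until the process exits, which matches the limitation mentioned above: there is no way to interrupt or observe the run from here.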
> Generally, I believe [Rabbit] R1 has the potential to change the world. This is a thought that seldom comes to my mind, as I have seen numerous new technologies and inventions. However, R1 is different; it’s not just another device to please a certain niche. It’s meticulously designed to serve one significant goal for all people: to improve lifestyle in the digital world.
I don't know about this; or at least, in my experience, it is not what happens with good managers.
I guess the best managers just develop the hunch and know when to do this and when to ask engineers for the smallest details to potentially develop different solutions. You have to be technical enough to do this.
And me ruining my day fighting with a million hooks, specs and custom linters micromanaging Claude Code in the pursuit of beautiful code.
That would be really helpful.
Why isn't Claude doing all that for me, while I code? Why the obsession that we must use code generation, when offloading other garbage activities would free me to do what I'm, on paper, paid to do?
It's less sexy of course; it doesn't have the promise of removing me in the end. But the reason, in the present state, is that IT admins would never accept an LLM handling permissions and rotations, and management would never accept an LLM reporting status or providing estimates. This is all "serious" work where we can't have all the errors LLMs create.
Dev isn't that bad, devs can clean slop and customers can deal with bugs.
Good luck hoping that no one from big money will try to stand between you and someone giving you a service (Uber, Airbnb, Etsy, etc.) and collect rent from that.
I'm not running OpenClaw, but I've given Claude its own email address and built a polling loop to check email & wake Claude up when I've sent it something. I'm finding a huge improvement from that. Working via email seems to change the Claude dynamic, it feels more like collaborating with a co-worker or freelancer. I can email Claude when I'm out of the house and away from my computer, and it has locked down access to use various tools so it can build some things in reply to my emails.
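The polling loop itself can be very small. This is a sketch under assumptions, not the actual setup: `fetch_unread` and `handle` are hypothetical callables (the first might wrap an IMAP client, the second would wake the agent), messages are assumed to be dicts with a `"from"` key, and the sender allow-list is one possible reading of "locked down".

```python
import time


def poll_inbox(fetch_unread, handle, allowed_senders,
               interval=60, max_cycles=None, sleep=time.sleep):
    """Repeatedly fetch unread messages and pass trusted ones to `handle`.

    fetch_unread:    callable returning a list of message dicts (hypothetical).
    handle:          callable that wakes the agent with one message.
    allowed_senders: set of addresses whose mail may trigger the agent.
    max_cycles:      stop after this many polls (None means run forever).
    """
    handled = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        for message in fetch_unread():
            # An inbox is an open input channel, so ignore unknown senders.
            if message["from"] in allowed_senders:
                handle(message)
                handled += 1
        cycle += 1
        sleep(interval)
    return handled
```

Filtering on the sender is only a first line of defence, since a `From` header can be spoofed; the tool restrictions on the agent side still matter.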
I've been looking into building out voice memos or an Eleven Labs setup as well, so I can talk to Claude while I'm out exercising, washing dishes etc. Voice memos will be relatively easy but I haven't yet got my head around how to integrate Eleven Labs and work with my local data & tools (I don't want a Claude that's running on Eleven Labs servers).
What made it so popular, I think, is that it made it easy to attach it to whatever "channel" you're comfortable with. The Mac app comes with dictation, but I'm unsure how much setup it takes to get TTS back.
I feel like there's this "secret" hiding behind all these AI tools, that actually it's all very complicated and takes a lot of effort to make work, but the tools we're given hide it all. It's nice that we benefit from its simplicity of use. But hiding complexity leads to unexpected problems, and I'm not sure we've seen any of those yet - other than the massive, gaping security hole.
So, OpenClaw has changed his life: It has accelerated the AI psychosis.
Regardless of how you isolate the OpenClaw instance (Mac Mini, VPS, whatever) - if it’s allowed to browse the web for answers then there’s the very real risk of prompt injection inserting malicious code into the project.
If you are personally reviewing every line of code that it generates you can mitigate that, but I’d wager none of these “super manager” users are doing that.
I saw on The Verge that they partnered with the company that repeatedly disclosed security vulnerabilities to try to make skills more secure, though, which is interesting: https://openclaw.ai/blog/virustotal-partnership
I’m guessing most of that malware was really obvious, people just weren’t looking, so it’s probably found a lot. But I also suspect it’s essentially impossible to actually reliably find malware in LLM skills by using an LLM.
And 99% of those AI-created "amazing projects" are going to be dead or meaningless in due time, sooner rather than later. Wasted energy and water, not to mention the author's lifetime.
The "supervisor" workflow mentioned by others in this thread (using one agent to manage multiple worker agents) is exactly where the industry is heading. It turns the human from a "vibe coder" into an architect who manages state and requirements while the agents handle the implementation "beads".
If you're hitting the "stupid zone" on larger tasks, try breaking the plan into smaller, specific markdown specs first. OpenClaw's ability to "interview" a codebase and then implement from those specs in commit-sized chunks is a game changer for non-trivial monorepos.
> Rabbit R1 - The Upgraded Replacement for Smart Phones
Kinda hard to take anything here seriously.
- Because the seasoned developers have something entirely different to say https://www.xda-developers.com/please-stop-using-openclaw/
- Also please stop spamming HN with this stuff
https://github.com/PSPDFKit-labs/nutrient-openclaw
The skill is here as well if you prefer a skill - https://clawhub.ai/jdrhyne/nutrient-openclaw
I let it run in a VM on my desktop and I can check on its progress and provide feedback any time. Only took a few iterations of telling it to tweak its workflow to land on something very productive. Doesn't work for everything but it covers a lot of my work.
This has been a significant aspect of AI use as well. As a result I feel a little less friction with myself, less like I am letting things slip by because, well, because I still want a nice balance of work, life, leisure, etc. I don’t want to overstate things, it’s not a cure-all for any of these things, but it helps a lot.
Don't compare your day 1 with someone's day 100
The free versions are toys
So, it appears that we have come a long way bubbling up through abstraction layers: assembly code -> high-level languages -> scripting -> prompting -> openclaw.
> Generally, I believe (Rabbit) R1 has the potential to change the world.
There is a pattern here.
If you delegate these tasks to OpenClaw, I am not really sure the result is exactly what you want to achieve and it works like you want it to.
> Then OpenClaw came along, and everything changed.
> After a few rounds of practice, I found that I could completely step away from the programming environment and handle an entire project’s development, testing, deployment, launch, and usage—all through chatting on my phone.
So, with Claude Code, you're stuck typing in a chat box. Now, with OpenClaw, you can type in a chat box on your phone? This is exciting and revolutionary.
What I really wonder, is who the heck is upvoting this slop on hackernews?
I haven't been able to find a good use for myself yet. Almost everything I use an LLM for has some kind of hard human-in-the-loop factor that is as of yet inescapable -- but I also don't really use LLMs for things like "sort my email"; it's almost entirely coding.
Some are learning the hard way why you shouldn't do that, having to hire freelance developers to fix their entire code.
I spoke with a friend who is also in IT; the company he works for is full-on into AI, everything is done or managed by AI, they only hit the button. Dude was describing their infrastructure and projects as if AI were a god.
Those are gonna be the first ones to fall, because they aren't using AI to improve their work, they are using AI to completely take over, full access to projects, full access to infrastructures, you name it.
Once we get to a spot where the AI can check its work and iterate, the loop is closed. But we are a long way off from that atm. Even for the web. I mean, have you tried the Playwright MCP server? Aside from being the slowest tool calls I have ever seen, the agent struggles mightily to figure out the simplest of navigation and interaction.
Yes, yes, unit tests, but functional testing is the be-all and end-all, and until it can iterate and create its own functional test suite, I just don’t get it.
What am I missing?
It's a racket that never ends.
There is a constant lure for products and tools to create the feeling of sensemaking. People want (pejorative) tools that show visualizations or summaries, without thinking about whether the particular visual/summary artifact is useful, actionable or accurate!
Maybe it's unfair to judge an author's current opinion by their past opinion - but since the piece is ultimately an opinion based on their own experience, I'm going to take it with a giant pile of salt, since the author's standards for the output of AI tools are vastly different than mine.
It's the endgame.
Even then, the architecture will be horrible unless you chat _a lot_ about it upfront. At some point, it’s easier to just look in the terminal.
Click bait at its peak.
Poe's law strikes... I can't tell if this is satire.
https://reorx.com/blog/rabbit-r1-the-upgraded-replacement-fo...
I hope at some point there will be a medical research into this hysteria.
getting sick of this fluff stuff
Agents work but still mostly produce slop.
Press [Space] to skip
There's not a single real example, and it even has all the em-dashes intact.
I quite like it just from the simple perspective that it's a local LLM provider that's available to chat with in tons of apps I already use (e.g. Discord); it's a good reduction in the number of parties who are privy to these conversations. I'm not sure if there's another system out there that's so plug-and-play, with so many options for conversation (Discord, Telegram, text, self-hosted web UI, etc).
But the tool calling is vastly overblown. It takes forever to get them set up, and that's to get them barely working. Bluebubbles has always been an ish app whose reverse engineering of the iMessage protocol is more likely to break on every macOS upgrade than do what you want it to do; and OpenClaw's iMessage integration is built on it. I've not yet gotten a Spotify skill to work (though I'm not sure what I'd do with it when I have one); the models just run in circles saying "it should be set up, ope its not, spotify_player sucks, lets try spt, wait that isn't working, lets try ncspot, why isn't this working". The "gog" tool is interesting; it's a CLI-based tool for accessing data in your Google account, and it works alright, though OpenClaw's icon for the tool in their repository is a game controller icon; I suspect a mistaken, likely vibed, reference to the unrelated GOG/Good Ol' Games PC game store. What a mess. I could go on.
The cheaper models critically struggle to grep the full array of tools they have available to them. Kimi K2.5 exhibits this behavior where it will reiterate that it does not have access to my calendar, but usually if I ask it four or five times in a row, eventually it will claim it "discovered" the gog/Google Calendar tool in a hidden sub-directory (what?). Even with more intelligent models, like Opus or 5.2/5.3, the tools oftentimes need to be invoked with highly specific verbiage; "what's on my calendar" might work if you're lucky, but "use gog to fetch my calendar and display today's events" usually works.
I oftentimes just don't see the point. I can click the Gmail or Google Calendar app on my phone and get what I need out of those apps in less than 6 seconds; it would take longer for me to dictate the exact phrasing to get what I need out of OpenClaw, let alone type it. I can see some argument for cross-operating on data between two apps, but getting that to work without paying Anthropic fifty cents for every query is even rarer. When I need an LLM to operate on my Obsidian notes, I can just use Claude Code or OpenCode... why do I need OpenClaw?
(I am genuinely open minded here; but articles like this just dance around high-minded abstract ideas of "im a super ai manager im so productive" without giving concrete examples. My suspicion is that the people who write these things were previously deeply unproductive people, and now AI has enabled them to achieve a mere fraction of the productivity that most of us already had.)
(And that's being generous. I think there's also a lot of grifters out there. I'll have to fire a stray at Cloudflare for this one: They've published a "get OpenClaw working on Cloudflare" repo that, if you set it up, would straight up cost you $50-$60, maybe $100/month; and they lie [1] about the cost in their own documentation. And you're paying that in addition to the LLM cost. Very bad look from a company I admire.)
[1] https://github.com/cloudflare/moltworker/issues/76#issuecomm...
For the impatient, here's a transcript summary (from Gemini):
The speaker describes creating a "virtual employee" (dubbed a "replicant") running on a local server with unrestricted, authenticated access to a real productivity stack—including Gmail, Notion, Slack, and WhatsApp. Tasked with podcast production, the agent autonomously researched guests, "vibe coded" its own custom CRM to manage data, sent email invitations, and maintained a work log on a shared calendar. The experiment highlights the agent's ability to build its own internal tools to solve problems and interact with humans via email and LinkedIn without being detected as AI.
He ultimately concludes that for some roles, OpenClaw can do 90%+ of the work autonomously. Jason controversially mentions buying Macs to run Kimi K2.5 locally so they can save on costs. Others argue that hosting an open model on inference-optimized hardware in the cloud is a better option, but doing so requires sharing potentially sensitive data.