Watching AI drive Microsoft employees insane (opens in new tab)

(old.reddit.com)

1088 pointslaiysb1y ago552 comments

552 comments

Interesting that every comment has "Help improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative.

> This seems like it's fixing the symptom rather than the underlying issue?

This is also my experience when you haven't setup a proper system prompt to address this for everything an LLM does. Funniest PRs are the ones that "resolves" test failures by removing/commenting out the test cases, or change the assertions. Googles and Microsofts models seems more likely to do this than OpenAIs and Anthropics models, I wonder if there is some difference in their internal processes that are leaking through here?

The same PR as the quote above continues with 3 more messages before the human seemingly gives up:

> please take a look

> Your new tests aren't being run because the new file wasn't added to the csproj

> Your added tests are failing.

I can't imagine how the people who have to deal with this are feeling. It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

Another PR: https://github.com/dotnet/runtime/pull/115732/files

How are people reviewing that? 90% of the page height is taken up by "Check failure", can hardly see the code/diff at all. And as a cherry on top, the unit test has a comment that say "Test expressions mentioned in the issue". This whole thing would be fucking hilarious if I didn't feel so bad for the humans who are on the other side of this.

surgical_fire1y ago

> I can't imagine how the people who have to deal with this are feeling. It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

That comparison is awful. I work with quite a few Junior developers and they can be competent. Certainly don't make the silly mistakes that LLMs do, don't need nearly as much handholding, and tend to learn pretty quickly so I don't have to keep repeating myself.

LLMs are decent code assistants when used with care, and can do a lot of heavy lifting, they certainly speed me up when I have a clear picture of what I want to do, and they are good to bounce off ideas when I am planning for something. That said, I really don't see how it could meaningfully replace an intern however, much less an actual developer.

safety1st1y ago

These GH interactions remind me of one of those offshore software outsourcing firms on Upwork or Freelancer.com that bid $3/hr on every project that gets posted. There's a PM who takes your task and gives it to a "developer" who potentially has never actually written a line of code, but maybe they've built a WordPress site by pointing and clicking in Elementor or something. After dozens of hours billed you will, in fact, get code where the new file wasn't added to the csproj or something like that, and when you point it out, they will bill another 20 hours, and send you a new copy of the project, where the test always fails. It's exactly like this.

Nice to see that Microsoft has automated that, failure will be cheaper now.

kamaal1y ago

>>These GH interactions remind me of one of those offshore software outsourcing firms on Upwork or Freelancer.com that bid $3/hr on every project that gets posted.

This level of smugness is why outsourcing still continues to exist. The kind of things you talk about were rare. And were mostly exaggerated to create anti-outsourcing narrative. None of that led to outsourcing actually going away simply because people are actually getting good work done.

Bad quality things are cheap != All cheap things are bad.

Same will work with AI too, while people continue to crap on AI, things will only improve, people will be more productive with AI, get more and bigger things done for cheaper and better. This is just inevitable given how things are going now.

>>There's a PM who takes your task and gives it to a "developer" who potentially has never actually written a line of code, but maybe they've built a WordPress site by pointing and clicking in Elementor or something.

In the peak of outsourcing wave. Both the call center people and IT services people had internal training and graduation standards that were quite brutal and mad attrition rates.

Exams often went along the lines of having to write whole ass projects without internet help in hours. Theory exams that had like -2 marks on getting things wrong. Dozens of exams, projects, coding exams, on-floor internships, project interviews.

>>After dozens of hours billed you will, in fact, get code where the new file wasn't added to the csproj or something like that, and when you point it out, they will bill another 20 hours, and send you a new copy of the project, where the test always fails. It's exactly like this.

Most IT services billing had pivoted away from hourly billing, to fixed time and material in the 2000s itself.

>>It's exactly like this.

Very much like outsourcing. AI is here to stay man. Deal with it. Its not going anywhere. For like $20 a month, companies will have same capability as a full time junior dev.

This is NOT going away. Its here to stay. And will only get better with time.

2 more replies

sbarre1y ago

I think that was the point of the comparison..

It's not like a regular junior developer, it's much worse.

1 more reply

preisschild1y ago

> That said, I really don't see how it could meaningfully replace an intern however

And even if it could, how do you get senior devs without junior devs? ^^

3 more replies

kaycey20221y ago

> It's like you have a senior phd level intelligence developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

Is that better?

PKop1y ago

Did you miss the "except" in his sentence? He was making the point this is worse than junior devs for all reasons listed.

1 more reply

yubblegum1y ago

This field (SE - when I started out back in late 80s) was enjoyable. Now it has become toxic, from the interview process, to imitating "big tech" songs and dances by small fry companies, and now this. Is there any joy left in being a professional software developer?

bluefirebrand1y ago

Making quite a bit of money brings me a lot of joy compared to other industries

But the actual software part? I'm not sure anymore

diggan1y ago

> This field (SE - when I started out back in late 80s) was enjoyable. Now it has become toxic

I feel the same way today, but I got started around 2012 professionally. I wonder how much of this is just our fading optimism after seeing how shit really works behind the scenes, and how much the industry itself is responsible for it. I know we're not the only two people feeling this way either, but it seems all of us have different timescales from when it turned from "enjoyable" to "get me out of here".

salawat1y ago

My issue stems from the attitudes of the people we're doing it for. I started out doing it for humanity. To bring the bicycle for the mind to everyone.

Then one day I woke up and realized the ones paying me were also the ones using it to run over or do circles around everyone else not equipped with a bicycle yet; and were colluding to make crippled bicycles that'd never liberate the masses as much as they themselves had been previously liberated; bicycles designed to monitor, or to undermine their owner, or more disgustingly, their "licensee".

So I'm not doing it anymore. I'm not going to continue making deliberately crippled, overly complex, legally encumbered bicycles for the mind, purely intended as subjects for ARR extraction.

3 more replies

bwfan1231y ago

It happens in waves. For a period, there was an oversupply of cs engineers, and now, the supply will shrink. On top of this, the BS put out by AI code will require experienced engineers to fix.

So, for experienced engineers, I see a great future fixing the shit show that is AI-code.

2 more replies

iamleppert1y ago

No, there is absolutely no joy left.

coldpie1y ago

I've been looking at getting a CDL and becoming a city bus driver, or maybe a USPS driver or deliveryman or clerk or something.

1 more reply

mrweasel1y ago

At least we can tell the junior developers to not submit a pull-request before they have the tests running locally.

At what point does the human developers just give up and close the PRs as "AI garbage". Keep the ones that works, then just junk the rest. I feel that at some point entertaining the machine becomes unbearable and people just stops doing it or rage close the PRs.

pydry1y ago

When their performance reviews stop depending upon them not doing that.

Microsoft's stock price is dependent on them proving that this is a success.

3 more replies

throwaway20371y ago

    > rage close the PRs

I am shaking with laughter reading this phrase. You got me good here. It is the perfect repurpose of "rage quit" for the AI slop era. I hope that we see some MSFT employees go insane from responding to so many shitty PRs from LLMs.

One of my all time "rage quit" stories is Azer Koçulu of npm left-pad incident infamy. That guy is my Internet hero -- "fight the power".

microtherion1y ago

Better yet, deploy their own LLM to close the PRs.

throwup2381y ago

> Interesting that every comment has "Help improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative.

The feedback buttons open a feedback form modal, they don’t reflect the number of feedback given like the emoji button. If you leave feedback, it will reflect your thumbs up/down (hiding the other button), it doesn’t say anything about whether anyone else has left feedback (I’ve tried it on my own repos).

belter1y ago

This whole thread from yesterday take a whole different meaning: https://news.ycombinator.com/item?id=44031432

Comment in the GitHub discussion:

"...You and I and every programmer who hasn't been living under a rock knows that AI isn't ready to be adopted at this scale yet, on the premier; 100M-user code-hosting platform. It doesn't make any sense except in brain-washed corporate-talk like "we are testing today what it can do tomorrow".

I'm not saying that this couldn't be an adequate change some day, perhaps even in a few years but we all know this isn't it today. It's 100% financial-driven hype with a pinch of we're too big to fail mentality..."

namaria1y ago

"Big data" -> "Cloud" -> "LLM-as-A(G)I"

It's all just recycled rent seeking corporate hype for enterprise compute.

The moment I had decided to learn Kubernetes years ago, got a book and saw microservices compared to 'object-oriented' programming I realized that. The 'big ball of mud' paper and the 'worse is better' rant frame it all pretty well in my view. Prioritize velocity, get slop in production, cope with the accidental complexity, rinse repeat. Eventually you get to a point where GPU farms seem like a reasonable way to auto-complete code.

When you find yourself in a hole, stop digging. Any bigger excavator you send down there will only get buried when the mud crashes down.

vasco1y ago

> improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative

Why do they even need it? Success is code getting merged 1st shot, failure gets worse the more requests for changes the agent gets. Asking for manual feedback seems like a waste of time. Measure cycle time and rate of approvals and change failure rate like you would for any developer.

dfxm121y ago

It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

Anyone who has dealt with Microsoft support knows this feeling well. Even talking to the higher level customer success folks feels like talking to a brick wall. After dozens of support cases, I can count on zero hands the number of issues that were closed satisfactorily.

I appreciate Microsoft eating their dogfood here, but please don't make me eat it too! If anyone from MS is reading this, please release finished products that you are prepared to support!

xnorswap1y ago

> How are people reviewing that? 90% of the page height is taken up by "Check failure",

Typically, you wouldn't bother manually reviewing something until the automated checks have passed.

diggan1y ago

I dunno, when I review code, I don't review what's automatically checked anyways, but thinking about the change/diff in a broader context, and whatever isn't automatically checked. And the earlier you can steer people in the right direction, the better. But maybe this isn't the typical workflow.

3 more replies

prossercj1y ago

This comment on that PR is pure gold. The bots are talking to each other:

https://github.com/dotnet/runtime/pull/115732#issuecomment-2...

spacecadet1y ago

"I wonder if there is some difference in their internal processes that are leaking through here?"

Maybe, but likely it is reality and their true company culture leaking through. Eventually some higher eq execs might come to the very late realization that they cant actually lead or build a worthwhile and productive company culture and all that remains is an insane reflection of that.

worldsayshi1y ago

> How are people reviewing that?

I agree that not auto-collapsing repeated annotations is an annoying bug in the github interface.

But just pointing out that annotations can be hidden in the ... menu to the right (which I just learned).

jon-wood1y ago

I'm not entirely sure why they're running linters on every available platform to begin with, it seems like a massive waste of compute to me when surely the output will be identical because it's analysing source code, not behaviour.

codyvoda1y ago

or press “a”

marmakoide1y ago

Hot take : the whole LLM craze is fed by a delusion. LLM are good at mimicking human language, capturing some semantics on the way. With a large enough training set, the amount of semantic captured covers a large fraction of what the average human knows. This gives the illusion of intelligence, and the humans extrapolates on LLM capabilities, like actual coding. Because large amounts of code from textbooks and what not is on the training set, the illusion is convincing for people with shallow coding abilities.

And then, while the tech is not mature, running on delusion and sunken costs, it's actually used for production stuffs. Butlerian Jihad when

nyarlathotep_1y ago

I think the bubble is already a bit past peak.

My sophisticated sentiment analysis (talking to co-workers other professional programmers and IT workers, HN and Reddit comments) seems to indicate a shift--there's a lot less storybook "Ay Eye is gonna take over the world" talk and a lot more distrust and even disdain than you'd see even 6 months ago.

Moves like this will not go over well.

1 more reply

otabdeveloper41y ago

> Butlerian Jihad when

I estimate two more years for the bubble to pop.

ta12431y ago

> @copilot please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

cruffle_duffle1y ago

It’s probably the junior devs that get to review these PRs. That and interns.

TheNewsIsHere1y ago

> This whole thing would be fucking hilarious if I didn't feel so bad for the humans who are on the other side of this.

Which will soon be anyone who directly or indirectly relies on Microsoft technologies. Some of these PRs, including at least one that I saw reworked certificate validation logic with not much more than a perfunctory “LGTM”, have been merged into main.

Coincidentally, I wonder if issues orthogonal to this slop is why I’ve been getting so many HTTP 500 errors when using GitHub lately.

kruuuder1y ago

A comment on the first pull request provides some context:

> The stream of PRs is coming from requests from the maintainers of the repo. We're experimenting to understand the limits of what the tools can do today and preparing for what they'll be able to do tomorrow. Anything that gets merged is the responsibility of the maintainers, as is the case for any PR submitted by anyone to this open source and welcoming repo. Nothing gets merged without it meeting all the same quality bars and with us signing up for all the same maintenance requirements.

abxyz1y ago

The author of that comment, an employee of Microsoft, goes on to say:

> It is my opinion that anyone not at least thinking about benefiting from such tools will be left behind.

The read here is: Microsoft is so abuzz with excitement/panic about AI taking all software engineering jobs that Microsoft employees are jumping on board with Microsoft's AI push out of a fear of "being left behind". That's not the confidence inspiring the statement they intended it to be, it's the opposite, it underscores that this isn't the .net team "experimenting to understand the limits of what the tools" but rather the .net team trying to keep their jobs.

Verdex1y ago

The "left behind" mantra that I've been hearing for a while now is the strange one to me.

Like, I need to start smashing my face into a keyboard for 10000 hours or else I won't be able to use LLM tools effectively.

If LLM is this tool that is more intuitive than normal programming and adds all this productivity, then surely I can just wait for a bunch of others to wear themselves out smashing the faces on a keyboard for 10000 hours and then skim the cream off of the top, no worse for wear.

On the other hand, if using LLMs is a neverending nightmare of chaos and misery that's 10x harder than programming (but with the benefit that I don't actually have to learn something that might accidentally be useful), then yeah I guess I can see why I would need to get in my hours to use it. But maybe I could just not use it.

"Left behind" really only makes sense to me if my KPIs have been linked with LLM flavor aid style participation.

Ultimately, though, physics doesn't care about social conformity and last I checked the machine is running on physics.

1 more reply

Vicinity96351y ago

If you're not using it where it's useful to you, then I still wouldn't say you're getting left behind, but you're making your job harder than it has to be. Anecdotally I've found it useful mostly for writing unit tests and sometimes debugging (can be as effective as a rubber duck).

It's like the 2025 version not not using an IDE.

It's a powerful tool. You still need to know when to and when not to use it.

3 more replies

the-lazy-guy1y ago

This is Stephen Toub, who is the lead of many important .NET projects. I don't think he is worried about losing job anytime soon.

I think, we should not read too much into it. He is honestly exploring how much this tool can help him to resolve trivial issues. Maybe he was asked to do so by some of his bosses, but unlikely to fear the tool replacing him in the near future.

4 more replies

hnthrow903487651y ago

TBF they are dogfooding this (good) but it's just not going well

1 more reply

dmix1y ago

> Microsoft employees are jumping on board with Microsoft's AI push out of a fear of "being left behind"

If they weren't experimenting with AI and coding and took a more conservative approach, while other companies like Anthropic was running similar experiments, I'm sure HN would also be critiquing them for not keeping up as a stodgy big corporation.

As long as they are willing to take risks by trying and failing on their own repos, it's fine in my books. Even though I'd never let that stuff touch a professional github repo personally.

1 more reply

username1351y ago

i dont think hey are mutually exclusive. jumping on board seems like the smart move if you're worried about losing your career. you also get to confirm your suspicions.

lcnPylGDnU4H9OF1y ago

This is important context given that it would be absurd for the managers to have already drawn a definitive conclusion about the models’ capabilities. An explicit understanding that the purpose of the exercise is to get a better idea of the current strengths and weaknesses of the models in a “real world” context makes this actually very reasonable.

mrguyorama1y ago

So why in public, and why in the most ham-fisted way, and why on important infrastructure, and why in such a terrible integration that it can't even verify that things compile before opening a PR!

In my org, we would have had to bypass precommit hooks to do this!

rsynnott1y ago

Beyond every other absurdity here, well, maybe Microsoft is different, but I would never assign a PR that was _failing CI_ to somebody. That that's happening feels like an admission that the thing doesn't _really_ work at all; if it worked even slightly, it would at least only assign passing PRs, but presumably it's bad enough that if they put in that requirement there would be no PRs.

sbarre1y ago

I feel like everyone is applying a worse-case narrative to what's going on here..

I see this as a work in progress.. I am almost certain the humans in the loop on these PRs are well aware of what's going on and have their expectations in check, and this isn't just "business as usual" like any other PR or work assignment.

This is a test. You can't improve a system without testing it on real world conditions.

How do we know they're not tweaking the Copilot system prompts and settings behind the scenes while they're doing this work?

Can no one see the possibility that what is happening in those PRs is exactly what all the people involved expected to have happen, and they're just going through the process of seeing what happens when you try to refine and coach the system to either success or failure?

When we adopted AI coding assist tools internally over a year ago we did almost exactly this (not directly in GitHub though).

We asked a bunch of senior engineers to see how far they could get by coaching the AI to write code rather than writing it themselves. We wanted to calibrate our expectations and better understand the limits, strengths and weaknesses of these new tools we wanted to adopt.

In most of those early cases we ended up with worse code than if it had been written by humans, but we learned a ton. We can also clearly see how much better things have gotten over time, since we have that benchmark to look back on.

rco87861y ago

I think people would be more likely to adopt this view if the overall narrative about AI is that it’s a work in progress and we expect it to get magnitudes better. But the narrative is that AI is already replacing human software engineers.

1 more reply

phkahler1y ago

>> I see this as a work in progress.. I am almost certain the humans in the loop on these PRs are well aware of what's going on and have their expectations in check, and this isn't just "business as usual" like any other PR or work assignment.

>> This is a test. You can't improve a system without testing it on real world conditions.

Software developers know to fix build problems before asking for a review. The AIs are submitting PRs in bad faith because they don't know any better. Compilers and other build tools produce errors when they fail, and the AI is ignoring this first line of feedback.

It is not a maintainers job to review code for syntax errors, or use of APIs that don't actually exist, or other silly mistakes. That's the compilers job and it does it well. The AI needs to take that feedback and fix the issues before escalating to humans.

1 more reply

mieubrisse1y ago

I was looking for exactly this comment. Everybody's gloating, "Wow look how dumb AI is! Haha, schadenfreude!" but this seems like just a natural part of the evolution process to me.

It's going to look stupid... until the point it doesn't. And my money's on, "This will eventually be a solved problem."

6 more replies

solids1y ago

You are not addressing the point in the comment, why are failing CI changes assigned?

1 more reply

beefnugs1y ago

This is the exact reason AI sucks : there is no proper feedback loop.

EVERY single prompt should have the opportunity to get copied off into a permanent log where the end user triggers it : log all input, all output, human writes a summary of what he wanted to happen but did not, what he thinks might have went wrong, what he thinks should have happened (domain specific experts giving feedback about how things are fucking up) And then its still only useful with long term tracking like how someone actually made a training change to fix this exact failure scenario.

None of that exists, so just like "full self driving" was a pie in the sky bullshit dream that proved machine learning has an 80/20 never gonna fully work problem, same thing here

munksbeer1y ago

> I feel like everyone is applying a worse-case narrative to what's going on here..

Unfortunately, just about every thread on this genre is like that now.

Dlanv1y ago

They said in the comments that currently the firewall is blocking it from checking tests for passing, and they need to fix that.

Otherwise it would check the tests are passing.

robotcapital1y ago

Replace the AI agent with any other new technology and this is an example of a company:

1. Working out in the open

2. Dogfooding their own product

3. Pushing the state of the art

Given that the negative impact here falls mostly (completely?) on the Microsoft team which opted into this, is there any reason why we shouldn't be supporting progress here?

JB_Dev1y ago

100% agree. i’m not sure why everyone is clowning on them here. This process is a win. Do people want this all being hidden instead in a forked private repo?

It’s showing the actual capabilities in practice. That’s much better and way more illuminating than what normally happens with sales and marketing hype.

rco87861y ago

Satya says: "I’d say maybe 20%, 30% of the code that is inside of our repos today and some of our projects are probably all written by software".

Zuckerberg says: "Our bet is sort of that in the next year probably … maybe half the development is going to be done by AI, as opposed to people, and then that will just kind of increase from there".

It's hard to square those statements up with what we're seeing happen on these PRs.

3 more replies

constantcrying1y ago

Who is "we" and how and why would "we" "support" or not "support" anything.

Personally I just think it is funny that MS is soft launching a product into total failure.

throwaway8444981y ago

"Pushing the state of the art" and experimenting on a critical software development framework is probably not the best idea.

Dlanv1y ago

Why not, when it goes through code review by experienced software engineers who are experts on the subject in a codebase that is covered by extensive unit tests?

1 more reply

mrguyorama1y ago

>supporting progress

This presupposes AI IS progress.

Nevermind that what this actually shows is an executive or engineering team that so buys their own hype that they didn't even try to run this locally and internally before blasting to the world that their system can't even ensure tests are passing before submitting a PR. They are having a problem with firewall rules blocking the system from seeing CI outcomes and that's part of why it's doing so badly, so why wasn't that verified BEFORE doing this on stage?

"Working out in the open" here is a bad thing. These are issues that SHOULD have been caught by an internal POC FIRST. You don't publicly do bullshit.

"Dogfooding" doesn't require throwing this at important infrastructure code. Does VS code not have small bugs that need fixing? Infrastructure should expect high standards.

"Pushing the state of the art" is comedy. This is the state of the art? This is pushing the state of the art? How much money has been thrown into the fire for this result? How much did each of those PRs cost anyway?

lawn1y ago

Because they're using it on an extremely popular repository that many people depend on?

And given the absolute garbage the AI is putting out the quality of the repo will drop. Either slop code will get committed or the bots will suck away time from people who could've done something productive instead.

globalise831y ago

Malicious compliance should be the order of the day. Just approve the requests without reviewing them and wait until management blinks when Microsoft's entire tech stack is on fire. Then quit your job and become a troubleshooter on x3 the pay.

sbarre1y ago

I know this is meant to sound witty or clever, but who actually wants to behave this way at their job?

I'll never understand the antagonistic "us vs. them" mentality people have with their employer's leadership, or people who think that you should be actively sabotaging things or be "maliciously compliant" when things aren't perfect or you don't agree with some decision that was made.

To each their own I guess, but I wouldn't be able to sleep well at night.

HelloMcFly1y ago

It’s worth recognizing that the tension between labor and capital historical reality, not just a modern-day bad attitude. Workers and leadership don’t automatically share goals, especially when senior management incentives often prioritize reducing labor costs which they always do now (and no, this wasn't always universally so).

Most employees want to do good work, but pretending there’s no structural divergence in interests flattens decades of labor history and ignores the power dynamics baked into modern orgs. It’s not about being antagonistic, it’s about being clear-eyed where there are differences between the motivations of your org. leadership and your personal best interests. After a few levels remove from your position, you're just headcount with loaded cost.

2 more replies

Frost1x1y ago

I suppose that depends on your relationship with your employer. If your goals are highly aligned (e.g. lots of equity based compensation, some degree of stability and security, interest in your role, healthy management practices that value their workforce, etc.) then I agree, it’s in your own self interest to push back because it can effect you directly.

Meanwhile a lot of folks have very unhealthy to non-existent relationships with their employers. There may be some mixture where they may be temporary hired/viewed as highly disposable or transient in nature having very little to gain from the success of the business, they may be compensated regardless of success/failure, they may have toxic management who treat them terribly (condescendingly, constantly critical, rarely positive, etc.). Bad and non-existent relationships lead to this sort of behavior. In general we’re moving towards “non-existent” relationships with employers broadly speaking for the labor force.

The counter argument is often floated here “well why work there” and the fact is money is necessary to survive, the number of positions available hiring at any given point is finite, and many almost by definition won’t ever be the top performers in their field to the point they truly choose their employers and career paths with full autonomy. So lots of people end up in lots of places that are toxic or highly misaligned with their interests as a survival mechanism. As such, watching the toxic places shoot themselves in the foot can be some level of justice people find where generally unpleasant people finally get to see consequences of their actions and take some responsibility.

People will prop others up from their own consequences so long as there’s something in it for them. As you peel that away, at some point there’s a level of poetic justice to watch the situation burn. This is why I’m not convinced having completely transactional relationships with employers is a good thing. Even having self interest and stability in mind, certain levels of toxicity in business management can fester. At some point no amount of money is worth dealing with that and some form of correction is needed there. The only mechanism is to typically assure poor decision making and action is actually held accountable.

1 more reply

nope10001y ago

On the other hand: why should you accept that your employer is trying to fire you but first wants you to train the machine that will replace you? For me this is the most "them vs us" it can be.

early_exit1y ago

To be fair, "them" are actively working to replace "us" with AI.

bluefirebrand1y ago

Do you sleep well at night just doing what you're told by people who don't really care about your well being?

I don't get that

1 more reply

Hamuko1y ago

Considering that there's daily employee protests against Microsoft now, probably a lot of Microsoft employees want to behave like that.

Xori711y ago

I agree. It doesn’t help that once things start breaking down, the employer will ask the employees to fix the issue themselves, and thus they’ll have to deal with so much broken code that they’ll be miserable. It’ll become a spiral.

1 more reply

mrguyorama1y ago

>I'll never understand the antagonistic "us vs. them" mentality

Your manager understands it. Their manager understands it. Department heads understand it. The execs understand it. The shareholders understand it.

Who does it benefit for the laborers to refuse to understand it?

It's not like I hate my job. It's just being realistic that if a company could make more money by firing me, they would, and if you have good managers and leadership, they will make sure you understand this in a way that respects you as a human and a professional.

1 more reply

mhuffman1y ago

>I'll never understand the antagonistic "us vs. them" mentality people have with their employer's leadership

Interesting because "them" very much have an antagonistic mentality vs "us". "Them" would fire you in a fucking heartbeat to save a relatively small amount (10%). "Them" also want to aggressively pay you the least amount for which they can get you to do work for them, not what they "value" you at. "Us" depends on "them" for our livelihoods and the lives of people that depend on us, but "them" doesn't doesn't have any dependency on you that can't be swapped out rather quickly.

I am a capitalist, don't get me wrong, but it is a very one-sided relationship not even-footed or rooted in two-way respect. You describe "them" as "leadership" while "Them" describe you as a "human resource" roughly equivalent to the way toilet paper and plastics for widgets are described.

If you have found a place to work where people respect you as a person, you should really cherish that job, because most are not that way.

2 more replies

whywhywhywhy1y ago

> but who actually wants to behave this way at their job?

Almost no one does but people get ground down and then do it to cope.

michaelcampbell1y ago

> I'll never understand the antagonistic "us vs. them" mentality people have with their employer's leadership,

When you see it as leadership having this mentality against the people that actually produce something of value you might.

beefnugs1y ago

You dont think its different somehow that the exact tech they are forcing all employees to use, is the same tech to reduce head count and pressure employees to work harder for less money?

mieubrisse1y ago

Exactly this. I suspect that "us vs them" is sweet poison: it feels good in the moment ("Yeah, stick it to The Man!") but it long-term keeps you trapped in a victim mindset.

LunaSea1y ago

I mean their company (Microsoft) is literally asking them to train their replacement.

So I'm not quite sure why you would not see it as a "us vs. them" situation?

tantalor1y ago

> when Microsoft's entire tech stack is on fire

Too late?

MonkeyClub1y ago

Just in time for marshmallows!

hello_computer1y ago

Might as well when they’re going to lay you off no matter what you do (like the guy who made an awesome TypeScript compiler in Go).

xyst1y ago

At some point code pilot will just delete the whole codebase. Can’t fail integration tests if there is no code :)

otabdeveloper41y ago

That would be logical, but alas LLMs can't into logic.

Bloating the codebase with dead code is much more likely.

weird-eye-issue1y ago

That's cute, but the maintainers themselves submitted the requests with Copilot.

balazstorok1y ago

At least opening PRs is a safe option, you can just dump the whole thing if it doesn't turn out to be useful.

Also, trying something new out will most likely have hiccups. Ultimately it may fail. But that doesn't mean it's not worth the effort.

The thing may rapidly evolve if it's being hard-tested on actual code and actual issues. For example it will be probably changed so that it will iterate until tests are actually running (and maybe some static checking can help it, like not deleting tests).

Waiting to see what happens. I expect it will find its niche in development and become actually useful, taking off menial tasks from developers.

Frost1x1y ago

It might be a safer option in a forked version of the project that the public can’t see. I have to wonder about the optics here from a sales perspective. You’d think they’d test this out more internally before putting it in public access.

Now when your small or medium size business management reads about CoPilot in some Executive Quarterly magazine and floats that brilliant idea internally, someone can quite literally point to these as examples of real world examples and let people analyze and pass it up the management chain. Maybe that wasn’t thought through all the way.

Usually businesses tend to hide this sort of performance of their applications to the best of their abilities, only showcasing nearly flawless functionality.

xnickb1y ago

> I expect it will find its niche in development and become actually useful, taking off menial tasks from developers.

Reading AI generated code is arguably far more annoying than any menial task. Especially if the said code happens to have subtle errors.

Speaking from experience.

balazstorok1y ago

This is probably version 0.1 or 0.2.

Reviewing what the AI does now is not to be compared with human PRs. You are not doing the work as it is expected in the (hopefully near?) future but you are training the AI and the developers of the AI and more crucially: you are digging out failure modes to fix.

1 more reply

ecb_penguin1y ago

This is true for all code and has nothing to do with AI. Reading code has always been harder than writing code.

The joke is that PERL was a write-once, read-none language.

> Speaking from experience.

My experience is all code can have subtle errors, and I wouldn't treat any PR differently.

1 more reply

cesarb1y ago

> At least opening PRs is a safe option, you can just dump the whole thing if it doesn't turn out to be useful.

There's however a border zone which is "worse than failure": when it looks good enough that the PRs can be accepted, but contain subtle issues which will bite you later.

UncleMeat1y ago

Yep. I've been on teams that have good code review culture and carefully review things so they'd be able to catch subtle issues. But I've also been on teams where reviews are basically "tests pass, approved" with no other examination. Those teams are 100% going to let garbage changes in.

camdenreslink1y ago

Even when you review human-written code carefully, subtle bugs can sneak through. Software development is hard.

1 more reply

ecb_penguin1y ago

Funny enough, this happens literally every day with millions of developers. There will be thousands upon thousands of incidents in the next hour because a PR looked good, but contained a subtle issue.

6uhrmittag1y ago

> At least opening PRs is a safe option, you can just dump the whole thing if it doesn't turn out to be useful.

However, every PR adds load and complexity to community projects.

As another commenter suggested, doing these kind of experiments on separate forks sound a bit less intrusive. Could be a take away from this experiment and set a good example.

There are many cool projects on GitHub that are just accumulating PRs for years, until the maintainer ultimately gives up and someone forks it and cherry-picks the working PRs. I've than that myself.

I'm super worried that we'll end up with more and more of these projects and abandoned forks :/

cyanydeez1y ago

Unfortunately,if you believe LLMs really can learn to code with bugs, then the nezt step would be to curate a sufficiently bug free data set. Theres no evidence this has occured, rather, they just scraped whayecer

petetnt1y ago

GitHub has spent billions of dollars building an AI that struggles with things like whitespace related linting errors on one of the most mature repositories available. This would be probably okay for a hobbyist experiment, but they are selling this as a groundbreaking product that costs real money.

marcosdumay1y ago

> This would be probably okay for a hobbyist experiment

It's perfectly ok for a professional research experiment.

What's not ok is their insistence on selling the partial research results.

sexy_seedbox1y ago

Nat Friedman must be rolling in his grave...

oh wait

ocdtrekkie1y ago

He's rolling in money for sure.

Philpax1y ago

Stephen Toub, a Partner Software Engineer at MS, explaining that the maintainers are intentionally requesting these PRs to test Copilot: https://github.com/dotnet/runtime/pull/115762#issuecomment-2...

Quarrelsome1y ago

rah, we might be in trouble here. The primary issue at play is that we don't have a reliable means of measuring developer performance, outside of subjective judgement like end of year reviews.

This means its probably quite hard to measure the gain or the drag of using these agents. On one side, its a lot cheaper than a junior, but on the other side it pulls time from seniors and doesn't necessarily follow instruction well (i.e. "errr your new tests are failing").

This combined with the "cult of the CEO" sets the stage for organisational dissonance where developer complaints can be dismissed as "not wanting to be replaced" and the benefits can be overstated. There will be ways of measuring this, to project it as huge net benefit (which the cult of the CEO will leap upon) and there will be ways of measuring this to project it as a net loss (rabble rousing developers). All because there is no industry standard measure accepted by both parts of the org that can be pointed at which yields the actual truth (whatever that may be).

If I might add absurd conjecture: We might see interesting knock-on effects like orgs demanding a lowering of review standards in order to get more AI PRs into the source.

rco87861y ago

> its a lot cheaper than a junior

I’m not even sure if this is true when considering training costs of the model. It takes a lot of junior engineer salaries to amortize the billions spent building this thing in the first place.

Quarrelsome1y ago

sure, but for an org just buying tokens its cheaper and more disposable than an employee. At least it looks better on paper for the bean counters.

BugheadTorpeda61y ago

Yes it's going to cause many problems forcompanies I think, but at least they will deserve it (the employees won't unfortunately unless they've drank the kool-aid, I rarely meet ICs that have drank it fwiw, which means I'm either in a serious bubble, or this is being pushed from the top down). The only clear winners are going to be chip companies.

There's never going to be an industry standard measure either. Measuring productivity as I'm sure you know is incredibly dumb for a job like this because the beneficialness of our work product can be both insanely positive and put the company on top or it can be so negative that it goes bankrupt. And ultimately a lot of what goes into people choosing whether they like the work product or not is subjective. A large part of our work is more of an art than a science and I say that as somebody that works about as far away from the frontend as one can get.

Crosseye_Jack1y ago

I do love one bot asking another bot to sign a CLA! - https://github.com/dotnet/runtime/pull/115732#issuecomment-2...

pm2151y ago

That's funny, but also interesting that it didn't "sign" it. I would naively have expected that being handed a clear instruction like "reply with the following information" would strongly bias the LLM to reply as requested. I wonder if they've special cased that kind of thing in the prompt; or perhaps my intuition is just wrong here?

Bedon2921y ago

A comment on one of the threads, when a random person tried to have copilot change something, said that copilot will not respond to anyone without write access to the repo. I would assume that bot doesn't have write access, so copilot just ignores them.

Quarrel1y ago

AI can't, as I understand it, have copyright over anything they do.

Nor can it be an entity to sign anything.

I assume the "not-copyrightable" issue, doesn't in anyway interfere with the rights trying to be protected by the CLA, but IANAL ..

I assume they've explicitly told it not to sign things (perhaps, because they don't want a sniff of their bot agreeing to things on behalf of MSFT).

1 more reply

90s_dev1y ago

Well?? Did it sign it???

jsheard1y ago

Not sure if a chatbot can legally sign a contract, we'd better ask ChatGPT for a second opinion.

5 more replies

marcosdumay1y ago

It didn't. It completely ignored the request.

(Turns out the AI was programmed to ignore bots. Go figure.)

nikolayasdf1231y ago

that's the future, AI talking to other AI, everywhere, all the time

thallium2051y ago

Is this the first instance of an AI cyber bullying another AI?

margorczynski1y ago

With how stochastic the process is it makes it basically unusable for any large scale task. What's the plan? To roll the dice until the answer pops up? That would be maybe viable if there was a way to automatically evaluate it 100% but with a human in the loop required it becomes untenable.

diggan1y ago

> What's the plan?

Call me old school, but I find the workflow of "divide and conquer" to be as helpful when working with LLMs, as without them. Although what is needed to be considered a "large scale task" varies by LLMs and implementation. Some models/implementations (seemingly Copilot) struggles with even the smallest change, while others breeze through them. Lots of trial and error is needed to find that line for each model/implementation :/

mjburgess1y ago

The relevant scale is the number of hard constraints on the solution code, not the size of task as measured by "hours it would take the median programmer to write".

So eg., one line of code which needed to handle dozens of hard-constraints on the system (eg., using a specific class, method, with a specific device, specific memory management, etc.) will very rarely be output correctly by an LLM.

Likewise "blank-page, vibe coding" can be very fast if "make me X" has only functional/soft-constraints on the code itself.

"Gigawatt LLMs" have brute-forced there way to having a statistical system capable of usefully, if not universally, adhreading to one or two hard constraints. I'd imagine the dozen or so common in any existing application is well beyond a Terawatt range of training and inference cost.

1 more reply

nonethewiser1y ago

Its hard for me to think of a small, clearly defined coding problem an LLM cant solve.

2 more replies

safety1st1y ago

I mean I guess this isn't very ambitious, but it's a meaningful time saver if I basically just write code in natural language, and then Copilot generates the real code based on that. I don't have to look up syntax details, or what some function somewhere was named, etc. It will perform very accurately this way. It probably makes me 20% more efficient. It doubles my efficiency in a language I'm unfamiliar with.

I can't fire half my dev org tomorrow with that approach, I can't really fire anyone, so I guess it would be a big letdown for a lot of execs. Meanwhile though we just keep incrementally shipping more stuff faster at higher quality so I'm happy...

This works because it treats the LLM like what it actually is: an exceptionally good if slightly random text transformer.

rsynnott1y ago

I suspect that the plan is that MS has spent a lot, really a LOT, of money on this nonsense, and there is now significant pressure to put, something, anything, out even if it is worse than useless.

Traubenfuchs1y ago

> to roll the dice

This was discussed here

https://news.ycombinator.com/item?id=43988913

eterevsky1y ago

The plan is to improve AI agents from their current ~intern level to a level of a good engineer.

ehnto1y ago

They are not intern level.

Even if it could perform at a similar level to an intern at a programming task, it lacks a great deal of the other attributes that a human brings to the table, including how they integrate into a team of other agents (human or otherwise). I won't bother listing them, as we are all humans.

I think the hype is missing the forest for the trees, and I think exactly this multi-agent dynamic might be where the trees start to fall down in front of us. That and the as currently insurmountable issues of context and coherence over long time horizons.

2 more replies

ethanol-brain1y ago

Seems like that is taking a very long time, on top of some very grandiose promises being delivered today.

2 more replies

interimlojd1y ago

You are really underselling interns. They learn from a single correction, sometimes even without a correction, all by themselves. Their ability to integrate previous experience in the context of new problems is far, far above what I've ever seen in LLMs

mnky9800n1y ago

Yes but they are supposed to be PhD level 5 years ago if you are listening to sama et al.

1 more reply

einsteinx21y ago

Without handholding (aka being used as a tool by a competent programmer instead of as an independent “agent”), they’re currently significantly worse than an intern.

serial_dev1y ago

This looks much worse than an intern. This feels like a good engineer who has brain damage.

When you look at it from afar, it looks potentially good, but as you start looking into it for real, you start realizing none of it makes any sense. Then you make simple suggestions, it does something that looks like what you asked, yet completely missing the point.

An intern, no matter how bad it is, could only waste so much time and energy.

This makes wasting time and introducing mind-bogglingly stupid bugs infinitely scalable.

marmakoide1y ago

The plan went from the AI being a force multiplier, to a resource hungry beast that have to be fed in the hope it's good enough to justify its hunger.

rsynnott1y ago

I mean, I think this is a _lot_ worse than an intern. An intern isn't constantly going to make PRs with failing CI, for a start.

cyanydeez1y ago

I plan to be a billionaire

le-mark1y ago

The real tragedy is the management mandating this have their eyes clearly set on replacing the very same software engineers with this technology. I don’t know what’s more Kafka than Kafka but this situation certainly is!

strogonoff1y ago

When tasked to train a technology that deprecates yourself, it’s relatively OK (you’re getting paid handsomely, and many of the developers at Microsoft etc. are probably ready to retire soon anyway). It’s another thing to realize that the same technology will also deprecate your children.

solarwindy1y ago

The managers may believe that's what they're asking their developers to do, but doesn't this whole charade expose the fact that this technology just does not have even close to the claimed capabilities?

I see it as wishful thinking in the extreme to suppose that probabilistic mashing together of plagiarized jigsaw pieces of code could somehow approach human intelligence and reasoning—and yet, the parlour trick is convincing enough that this has escalated into a mass delusion.

1 more reply

tossandthrow1y ago

Management obviously also know, that when they do not have anybody to manage, then they are also obselete.

automatic61311y ago

Satya said "nearly 30% of code written at microsoft is now written by AI" in an interview with Zuckerberg, so underlings had to hurry to make it true. This is the result. Sad!

TonyTrapp1y ago

As much as I'd like to also dunk on them because of their AI nonsense, this keeps being misquoted again and again. He said that about 20-30% of their code is written by software. If someone like Satya says "by software" and not "by AI", you can be very sure that there is a good reason that he's phrasing it as carefully as this - because that includes a lot of things like auto-generated code, e.g. COM classes generated from IDL files. Of course in the current climate everyone that's not careful enough will just mis-interpret it as "30% written by AI", and that is probably intentional.

asadotzler1y ago

It's worse than that. What he actually said was "Maybe 20 to 30 percent of the code that is inside of our repos today in some of our projects are probably all written by software."

Translation: maybe some of the code in some of our projects is probably written by software.

Seriously. That's what he said. Maybe some of the code in some of our projects is probably written by software.

How this became "30% of MS code is written by LLMs" is beyond me. It's wild. It's ridiculous.

pera1y ago

This happened during LlamaCon while taking about Copilot/LLMs: if the percentages Satya was referring to were for any "auto-generated" code then he was being intentionally misleading.

Besides, you could also say that 100% of code is generated "by software" no?

1 more reply

rchaud1y ago

It's remarkable how similar this feels to the offshoring craze of 20 years ago, where the complaints were that experienced developers were essentially having to train "low-skilled, cheap foreign labour" that were replacing them, eating up time and productivity.

Considering the ire that H1B related topics attract on HN, I wonder if the same outrage will apply to these multi-billion dollar boondoggles.

cebert1y ago

Do we know for a fact there are Microsoft employees who were told they have to use CoPilot and review its change suggestions on projects?

We have the option to use GitHub CoPilot on code reviews and it’s comically bad and unhelpful. There isn’t a single member of my team who find it useful for anything other than identifying typos.

mtmail1y ago

Depends on team but seems management is pushing it

from https://news.ycombinator.com/item?id=44031432

"From talking to colleagues at Microsoft it's a very management-driven push, not developer-driven. Friend on an Azure team had a team member who was nearly put on a PIP because they refused to install the internal AI coding assistant. Every manager has "number of developers using AI" as an OKR, but anecdotally most devs are installing the AI assistant and not using it or using it very occasionally. Allegedly it's pretty terrible at C# and PowerShell which limits its usefulness at MS."

"From reading around on Hacker News and Reddit, it seems like half of commentators say what you say, and the other half says "I work at Microsoft/know someone who works at Microsoft, and our/their manager just said we have to use AI", someone mentioned being put on PIP for not "leveraging AI" as well. I guess maybe different teams have different requirements/workflows?"

DebtDeflation1y ago

The question is who is setting these OKRs/Metrics for management and why?

It seems to me to be coming from the CEO echo chamber (the rumored group chats we keep hearing about). The only way to keep the stock price increasing in these low growth high interest rate times is to cut costs every quarter. The single largest cost is employee salaries. So we have to shed a larger and larger percentage of the workforce and the only way to do that is to replace them with AI. It doesn't matter whether the AI is capable enough to actually replace the workers, it has to replace them because the stock price demands it.

We all know this will eventually end in tears.

4 more replies

xnorswap1y ago

> Allegedly it's pretty terrible at C#

In my experience, LLMs in general are really, really bad at C# / .NET , and it worries me as a .NET developer.

With increased LLM usage, I think development in general is going to undergo a "great convergence".

There's a positive(1) feedback loop where LLM's are better at Blub, so people use them to write more Blub. With more Blub out there, LLMs get better at Blub.

The languages where LLMs struggle, with become more niche, leaving LLMs struggling even more.

C# / .NET is something LLMs seem particularly bad at, and I suspect that's partly caused by having multiple different things all called the same name. EF, ASP, even .NET itself are names that get slapped on a range of different technologies. The EF API has changed so much that they had to sort-of rename it to "EF Core". Core also gets used elsewhere such as ".NET core" and "ASP.NET Core". You (Or an LLM) might be forgiven for thinking that ASP.NET Core and EF Core are just those versions which work with .NET Core (now just .NET ) and the other versions are those that don't.

But that isn't even true. There are versions of ASP.NET Core for .NET Framework.

Microsoft bundle a lot of good stuff into the ecosystem, but their attitude when they hit performance or other issues is generally to completely rewrite how something works, but then release the new thing under the old name but with a major version change.

They'll make the new API different enough to not work without work porting, but similar enough to confuse the hell out of anyone trying to maintain both.

They've made things like authentication, which actually has generally worked fine out-of-the-box for a decade or more, so confusing in the documentation that people mostly tended to run for a third party solution just because at least with IdentityServer there was just one documented way to do it.

I know it's a bit of a cliche to be an "AI-doomer", and I'm not really suggesting all development work will go the way of the dinosaur, but there are specific ecosystem concerns with regard to .NET and AI assistance.

(1) Positive in the sense of feedback that increased output increases output. It's not positive in the sense of "good thing".

3 more replies

diggan1y ago

> Depends on team but seems management is pushing it

The graphic "Internal structure of tech companies" comes to mind, given if true, would explain why the process/workflow is so different between the teams at Microsoft: https://i.imgur.com/WQiuIIB.png

Imagine the Copilot team has a KPI about usage, matching the company OKRs or whatever about making sure the world is using Microsoft's AI enough, so they have a mandate/leverage to get the other teams to use it regardless of if it's helping or not.

linza1y ago

Well, what you describe is not terrible way to run things. Eat your own dogfood. To get better at it you need to start doing it.

1 more reply

4ggr01y ago

you can directly link to comments, by the way. just click on the link which displays how long ago the comment was written and you get the URL for the single comment.

(just mentioning it because you linked a post and quoted two comments, instead of directly linking the comments. not trying to 'uhm, actually'.)

thraway20790811y ago

Using a throwaway for obvious reasons. I work at a non-tech megacorp that you've heard of. This company's (I will not say "our"!) CEO is very close to Nadella, they meet regularly. Management here is also pushing Github Copilot onto devs, aggressively, and including it in their HR reviews. Dev-adjacent roles (product, QA, BAs) are also seeing aggressive push.

This feels like it will end badly.

lovehashbrowns1y ago

All of that is working, at least, because the very small company I work for with a limited budget is working on getting an extremely expensive copilot license. Oh no, I might have to deal with this soon..

pydry1y ago

It kinda makes sense for management to push it. Nothing else has a hope of preventing MSFT's stock price from collapsing into bluechip territory.

egorfine1y ago

> management is pushing it

Why?

6 more replies

jsheard1y ago

> Do we know for a fact there are Microsoft employees who were told they have to use CoPilot and review its change suggestions on projects?

It wouldn't be out of character, Microsoft has decided that every project on GitHub must deal with Copilot-generated issues and PRs from now on whether they want them or not. There's deliberately no way to opt out.

https://github.com/orgs/community/discussions/159749

Like Googles mandatory AI summary at the top of search results, you know a feature is really good when the vendor feels like the only way they can hit their target metrics is by forcing their users to engage with it.

nyarlathotep_1y ago

>Like Googles mandatory AI summary at the top of search results, you know a feature is really good when the vendor feels like the only way they can hit their target metrics is by forcing their users to engage with it.

People like to compare "AI" (here, LLM products) to the iPhone.

I cannot make sense of these analogies; people used to line up around the block on release day for iPhone launches for years after the initial release.

Seems now most people collectively groan when more "innovative" LLM products get stuffed into otherwise working software.

This stuff is the literal opposite of demand.

XorNot1y ago

Which almost feels unique to AI. I can't think of another feature so blatently pushed in your face, other then perhaps when everyone lost their minds and decided to cram mobile interfaces onto every other platform.

3 more replies

dsign1y ago

Holy sh*t I didn't know this was going on. It's like an AI tsunami unleashed by Microsoft that will bury the entire software industry... They are like Trump and his tariffs, but for the software economy.

What this tells me is that software enterprises are so hellbent in firing their programmers and reducing their salary costs they they are willing to combust their existing businesses and reputation into the dumpster fire they are making. I expected this blatant disregard for human society to come ten or twenty years into the future, when the AI systems would actually be capable enough. Not today.

1 more reply

RajT881y ago

The push for copilot usage is being driven by management at every level.

einrealist1y ago

This is one good example of the Sunk Cost Fallacy: generative AI has cost so much money, acknowledging its shortcomings is now becoming more and more impossible.

This AI bubble is far worse than the Blockchain hype.

Its not yet clear whether productivity gains are real and whether the gains are eaten by a decline in overall quality.

0x500x791y ago

Agree, the problem is that investors and companies see developer salaries and want to cut that out. It's all bottom-line at the end of the day.

is_true1y ago

Today I received the 2nd email about an endpoint in an API we run that doesn't exist but some AI tool told the client it does.

Frost1x1y ago

Sounds like the client has a feature request they want to pay for.

is_true1y ago

Haha. It's already there. This last one was using chatgtp, they just told me

bossyTeacher1y ago

Every week, one of Google/OpenAI/Anthropic releases a new model, feature or product and it gets posted here with 3 figure comments mostly praising LLMs as the next best thing since the internet. I see a lot of hype on HN about LLMs for software development and how it is going to revolutionize everything. And then, reality looks like this.

I can't help but think that this LLM bubble can't keep growing much longer. The investment to results ratio doesn't look great so far and there is only so many dreams you can sell before institutional investors pull the plug.

vachina1y ago

> This seems like it's fixing the symptom rather than the underlying issue?

Exactly. LLM does not know how to use a debugger. LLM does not have runtime contexts.

For all we know, the LLM could’ve fixed the issue simply by commenting out the assertions or sanity checks and everything seemed fine and dandy until every client’s device catches on fire.

uludag1y ago

And if you were to attach a debugger to a SOTA LLM, give it a compute environment, have it constantly redo work when CI fails, I can easily imagine each of these PRs burning hundreds of dollars and still have a good chance at failing the task.

tossandthrow1y ago

This was my latest experience of using agents. It created code with hard coded values from the tests.

aiono1y ago

While I am AI skeptic especially for use cases like "writing fixes" I am happy to see this because it will be a great evidence whether it's really providing increase in productivity. And it's all out in the open.

rvz1y ago

After all of that, every PR that Copilot opened still has failing tests and it failed to fix the issue (because it fundamentally cannot reason).

No surprises here.

It always struggles on non-web projects or on software where it really matters that correctness is first and foremost above everything, such as the dotnet runtime.

Either way, a complete disastrous start and what a mess that Copilot has caused.

api1y ago

Part of why it works better on web projects is the sheer volume of training data. There is probably more JS written than any other language by orders of magnitude. Its quality is pretty dubious though.

I have so far only found LlMs useful as a way of researching, an alternative to web search, and doing very basic rote tasks like implementing unit tests or doing a first pass explanation of some code. Tried actually writing code and it’s not usable.

mezyt1y ago

> There is probably more JS written than any other language by orders of magnitude.

And the quantity of js code available/discoverable when scrapping the web is larger by an order of magnitude than every other language.

jsheard1y ago

> Part of why it works better on web projects is the sheer volume of training data.

OTOH webdev is known for rapid framework/library churn, so before too long there will be a crossroads where the pre-AI training data is too old and the fresh training data is contaminated by the firehose of vibe coded slop.

softwaredoug1y ago

I’m all for AI “writing” large swaths of code, vibe coding, etc.

But I think it’s better for everyone if human ownership is central to the process. Like I vibe coded it. I will fix it if it breaks. I am on call for it at 3AM.

And don’t even get started on the safety issues if you don’t have clear human responsibility. The history of engineering disasters is riddled with unclear lines of responsibility.

skydhash1y ago

Most of coding methodologies is about reducing the amount and the complexity of code that are written. And that's mostly why, on mature projects, most PRs (aside from refactoring) are tiny, because you're mostly refining an already existing model.

Writing code fast is never relevant to any tasks I've encountered. Instead it's mostly about fast editing (navigate quickly to the code I need to edit and efficiently modify it) and fast feedback (quick linting, compiling, and testing). That's the whole promise of IDEs, having a single dashboard for these.

cubano1y ago

Spoken like a man who has never had to write a payroll check in his life.

Of course human ownership is preferable, but it's also crazy expensive and since the point of all corporations is to "increase shareholder value" (not "gainfully employ workers"), well then all your talk of responsibility-here-and-there is quite touching but absolutely misses the point.

Economics is driving this bus, not quality and most certainly not responsibility.

skywhopper1y ago

Oof. A real nightmare for the folks tasked with shepherding this inattentive failure of a robot colleague. But to see it unleashed on the dotnet runtime? One more reason to avoid dotnet in the future, if this is the quality of current contributions.

Havoc1y ago

At least it's clearly labelled as copilot.

Much more worried about what this is going to do to the FOSS ecosystem. We've already seen a couple maintainers complain and this trend is definitely just going to increase dramatically.

I can see the vision but this is clearly not ready for prime time yet. Especially if done by anonymous drive-by strangers that think they're "helping"

svick1y ago

.Net is part of the FOSS ecosystem.

Havoc1y ago

In the same sense Chromium and Android isn't controlled by google yes.

BugheadTorpeda61y ago

If they are just messing with their own projects then I guess I don't think it's immoral. If they start submitting AI slop to other projects then they ought to be banned by those projects' maintainers.

pera1y ago

This is all fun and games until it's your CEO who decides to go "AI first" and starts enforcing "vibe coding" by monitoring LLM API usage...

ankitml1y ago

GitHub is not the place to write code. IDE is the place. Along with pre CI checks, some tests, coverage etc. they should get some PM before making decisions..

bayindirh1y ago

This is the future envisioned by Microsoft. Vibe coding all the way down, social network style.

They are putting this in front of the developers as take it or leave it deal. I left the platform, doing my coding old way, hosting it somewhere else.

Discoverability? I don't care. I'm coding it for myself and hosting in the open. If somebody finds it, nice. Otherwise, mneh.

worldsayshi1y ago

As long as the resulting PR is less than 100 lines and the AI is a bit more self sufficient (like actually making sure tests pass before "pushing") it would be ok I think. I think this process is intended for fixing papercuts rather than building anything involved. It just isn't good enough yet.

2 more replies

signa111y ago

> I left the platform, doing my coding old way, hosting it somewhere else.

may you please let me know where are you hosting the code ? would love to migrate as well.

thank you !

1 more reply

motoboi1y ago

In day-to-day I interact with github PR via intellij github plugin. Ie: inspect the branch, the changes, the comments, etc.

Maybe that's how the microsoft employees are using it (in another IDE I suppose).

lossolo1y ago

This is hilarious. And reading the description on the Copilot account is even more hilarious now: "Delegate issues to Copilot, so you can focus on the creative, complex, and high-impact work that matters most."

baalimago1y ago

Well, the coding agent is pretty much a junior dev at the moment. The seniors are teaching it. Give it a 100k PRs with senior developer feedback and it'll improve just like you'd anticipate a junior would. There is no way that FANG aren't using the comments by the seniors as training data for their next version.

It's a long-term play to have pricey senior developers argue with an llm

diggan1y ago

> using the comments by the seniors as training data for their next version

Yeah, I'm sure 100k comments with "Copilot, please look into this" and "The test cases are still failing" will massively improve these models.

Frost1x1y ago

Some of that seems somewhat strategic. With a junior you might do the same if you’re time pressured, or you might sidebar them in real life or they may come to you and you give more helpful advice.

Any senior dev at these organizations should know to some degree how LLMs work and in my opinion would to some degree, as a self protection mechanism, default to ambiguous vague comments like this. Some of the mentality is “if I have to look at it and solve it why don’t I go ahead and do it anyways vs having you do it” effort choices they’d do regardless of what is producing the PR. I think other parts of it is “why would I train my replacement, there’s no advantage for me here.”

1 more reply

candiddevmike1y ago

These things don't learn after training. There is no teaching going on here, and the arguments probably don't make for good training data without more refinement. That's why junior devs are still better than LLMs IMO, they do learn.

This is a performative waste of time

gf0001y ago

A junior dev is (most often) a bright human being, with not much coding experience yet. They can certainly execute instructions and solve novel problems on their own, and they most certainly don't need 100k PRs to pick up new skills.

Equating LLMs to humans is pretty damn.. stupid. It's not even close (otherwise how come all the litany of office jobs that require far less reasoning than software development are not replaced?).

baalimago1y ago

A junior dev may also swap jobs, require vacation days, perks and can't be scaled up at a the click of a button. There are no such issues with an agent. So, if I were a FANG higher-up, I'd invest quite a bit into training LLM-agents who make pesky humans redundant.

Doing so has low risk, the senior devs may perhaps get fed up and quit, and the company might be a laughing stock on public PRs. But the potential value for is huge.

2 more replies

kklisura1y ago

> Give it a 100k PRs with senior developer feedback

Don't you think it has already been trained with, I don't know, maybe millions of PRs?

Quarrelsome1y ago

at the very least, a junior shouldn't be adding new tests that fail. Will an LLM be able learn the social shame associated with that sort of lazy attitude? I imagine its fidelity isn't detailed enough to differentiate such a social failure from a request to improve a comment. Rather, it will propagate based on some coarse grained measures of success with high volume instead.

rco87861y ago

I’m curious why you think it hasn’t already been trained on 100ks or millions of PRs and their comments/feedback.

rasz1y ago

@Grok is this true?

TimPC1y ago

I still believe in having humans do PRs. It's far cheaper to have the judgement loop on the AI come before and during coding than after. My general process with AI is to explicitly instruct it not to write code, agree on a correct approach to a problem and if the project has any architectural components a correct architecture then once we've negotiated the correct way of doing things ask it to write code. Usually each step of this process takes multiple iterations of providing additional information or challenging incorrect assumptions of the AI. I can get it much faster than human coding with a similar quality bar assuming I iterate until a high quality solution is presented. In some cases the AI is not good enough and I fall back to human coding but for the most part I think it makes me a faster coder.

GiorgioG1y ago

Step 1. Build “AI” (LLM models) that can’t be trusted, doesn’t learn, forgets instructions, and frustrates software engineers

Step 2. Automate the use of these LLMs into “agents”

Step 3. ???

Step 4. Profit

bramhaag1y ago

Seeing Microsoft employees argue with an LLM for hours instead of actually just fixing the problem must be a very encouraging sight for businesses that have built their products on top of .NET.

mikrl1y ago

I remember before mass LLM adoption, reading an issue on GitHub where an increasingly frustrated user was failing to properly describe a blocking issue, and the increasingly frustrated maintainer was failing to get them to stick to the issue template.

Now you don’t even need the frustrated end user!

shultays1y ago

one day both sides will be AI so we can all relax and enjoy our mojitos

2 more replies

nashashmi1y ago

I sometimes feel like that is the right outcome for bad management and bad instructions. Only this time they can’t blame the junior engineer and are left to only blame themselves.

snackernews1y ago

I think we all know they won’t.

I am genuinely curious though to see the strategies they employ to absolve themselves of guilt and foolishness.

Is there precedent for the entire exec and management class embracing a new trend to this kind of extent, then it blowing up in their faces?

qoez1y ago

They'll probably blame openai/the AI instead.

2 more replies

gwervc1y ago

Especially painful when one of said employee is Stephen Toub, who is famous for his .net performance blog posts.

svaha17281y ago

I was thinking that too. He's a great programmer, and at this point I can't imagine he's having fun 'prompting' an LLM to write correct code.

2 more replies

svick1y ago

You don't want them to experiment with new tools? The main difference now is that the experiment is public.

stickfigure1y ago

It's pretty obviously a failed experiment. Why keep repeating it? Try again in another 3 months.

The answer is probably that the Copilot team is using the rest of the engineering organization as testers. Great for the Copilot team, frustrating for everyone else.

1 more reply

gmm19901y ago

I wouldn't necessarily call that just an experiment if the same requests aren't being fixed without copilot and the ai changes could get merged.

I would say the copilot system isn't really there yet for these kinds of changes, you don't have to run experiments on a language framework to figure that out.

flmontpetit1y ago

By all means. Just not on one of the most popular software development frameworks in the world. Maybe that can wait until after the concept is proven.

1 more reply

PKop1y ago

Nah I'd prefer they focus on writing code themselves to improve .NET not babysitting a spam-machine

LunaSea1y ago

Microsoft closed their recently acquired advertisement buy-side platform Xander Invest because they are replacing it with an AI-only platform.

They only gave their customers 9 months to migrate away.

I'm expecting that Microsoft did this to artificially pump up their AI usage numbers for next year by forcibly removing non-AI alternatives.

This only one example in AdTech but I expect other industries to be hit as well.

empath751y ago

The point of this exercise for Microsoft isn't to produce usable code right now, but to use and improve copilot.

saati1y ago

They can do that in private repos just as easily, this a pr stunt that backfired very badly.

pier251y ago

Yeah it's quite disheartening.

I recently spent a couple of months studying C# and .NET and working on my first project with it.

.NET, Blazor, etc are not known for a fast release schedule... but if things are going to become even slower with this AI crap I wonder if I made the right call.

I'm quite happy how things are today for making web APIs but I wish Blazor and other frameworks were in a much better shape.

Kwpolska1y ago

.NET has major releases every year. How is that slow for a programming platform/framework?

2 more replies

AllegedAlec1y ago

Given that Microsoft always decided to Will Not Fix issues because they went "oh this thing is throwing errors? Just ignore them". THey're numbskulls that are high on their own farts just as much as their managers. They deserve everything that's happening to them.

lloydatkinson1y ago

That is essentially what I tried to say in my comment there but don't think they wanted to hear it.

ozim1y ago

That is why they just fired 7k people so they don’t argue with LLM but let it do the work /s

rubyfan1y ago

FTPR

> It is my opinion that anyone not at least thinking about benefiting from such tools will be left behind.

This is gross, keep your fomo to yourself.

rkagerer1y ago

This comment from lloydjatkinson resonated:

As an outside observer but developer using .NET, how concerned should I be about AI slop agents being let lose on codebases like this? How much code are we going to be unknowingly running in future .NET versions that was written by AI rather than real people?

What are the implications of this around security, licensing, code quality, overall cohesiveness, public APIs, performance? How much of the AI was trained on 15+ year old Stack Overflow answers that no longer represent current patterns or recommended approaches?

Will the constant stream of broken PR's wear down the patience of the .NET maintainers?

Did anyone actually want this, or was it a corporate mandate to appease shareholders riding the AI hype cycle?

Furthermore, two weeks ago someone arbitrarily added a section to the .NET docs to promote using AI simply to rename properties in JSON. That new section of the docs serves no purpose.

How much engineering time and mental energy is being allocated to clean up after AI?

lloydatkinson1y ago

Glad you appreciated it!

ethanol-brain1y ago

Are people really doing coding with agents through PRs? This has to be a huge waste of resources.

It is normal to preempt things like this when working with agents. That is easy to do in real time, but it must be difficult to see what the agent is attempting when they publish made up bullshit in a PR.

It seems very common for an agent to cheat and brute force solutions to get around a non-trivial issue. In my experience, its also common for agents to get stuck in loops of reasoning in these scenarios. I imagine it would be incredibly annoying to try to interpret a PR after an agent went down a rabbit hole.

growt1y ago

Googles jules does the same (but was only published yesterday or so). I think it might be a good workflow if the agent is good enough. Copilot seems not to be in these examples and then I imagine it becomes quite tedious to have a PR for every iteration with the AI.

BugheadTorpeda61y ago

No not most people. A much larger percentage (I would wager greater than 50% of professionals) aren't using AI in any capacity in their professional work. It's banned in a lot of places for good reasons, and many more teams haven't found a use case.

So no I don't think any of this is normal. That's why it made the top of HackerNews, because it's very abnormal.

carefulfungi1y ago

It's mind blowing that a computer program can accomplish this much and yet absurd that it accomplishes so little.

actionfromafar1y ago

The funniest is the dotnet-policy-service asking copilot to read and agree to the Contributor License Agreement. :-D

Traubenfuchs1y ago

> These defines do not appear to be defined anywhere in the build system.

> @copilot fix the build error on apple platforms

> @copilot there is still build error on Apple platforms

Are those PRs some kind of software engineer focused comedy project?

kookamamie1y ago

Many here don't seem to get it.

The AI agent/programmer corpo push is not about the capabilities and whether they match human or not. It's about being able to externalize a majority of one's workforce without having a lot of people on permanent payroll.

Think in terms of an infinitely scalable bunch of consultants you can hire and dismiss at your will - they never argue against your "vision", either.

threetonesun1y ago

This was already possible with outsourcing and offshoring. I suppose there's a new market of AI "employees" for small businesses that couldn't manage or legally deal with outsourcing their work already.

ParetoOptimal1y ago

There are a myraid of challenges with outsourcing and offshoring and it's not possible currently for 100% of employees to be outsourced.

If AI can change... well more likely can convince gullible c levels that AI can do those jobs... many jobs will be lost.

See Klarna "https://www.livemint.com/companies/news/klarnas-ai-replaced-..."

https://www.livemint.com/companies/news/klarnas-ai-replaced-...

Just the attempt to use AI and fail then degraded the previous jobs to a gig economy style job.

smartmic1y ago

reddit may not have the best reputation, but the comments there are on point! So far much better than what has been posted here by HN users on this topic/thread. Anyway, I hope this is good fodder to show the limits (and they are much narrower than hype-driven AI enthusiasts like to pretend) of AI coding and to be more honest with yourself and others about it.

georgemcbay1y ago

> reddit may not have the best reputation

reddit is a distillation of the entire internet on to one site with wildly variable quality of discussion depending upon which subreddit you are in.

Some are awful, some are great.

static_void1y ago

And yet the low quality of the front page is an indictment of the site as a whole.

It's just that some internet extremophiles have managed to eke out a pleasant existence.

gizzlon1y ago

> @copilot please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

haha

RobKohr1y ago

With layoffs driven by a push for more LLM use, this feels like malicious compliance.

octocop1y ago

"fix failing tests" does never yield any good results for me either

esafak1y ago

I speculate what is going on is that the agent's context retrieval algorithm is bad, so it does not give the LLM the right context, because today's models should suffice to get the job done.

Does anyone know which model in particular was used in these PRs? They support a variety of models: https://github.blog/ai-and-ml/github-copilot/which-ai-model-...

Traubenfuchs1y ago

The cynic in me says, that they were probably using an unreleased state of the art version of their best model not available to normal customers and that‘s the best it could do.

1 more reply

ncr1001y ago

Q: Does Microsoft report its findings or learnings BACK to the open source community?

The @stephentoub MS user suggests this is an experiment (https://github.com/dotnet/runtime/pull/115762#issuecomment-2...).

If this is using open source developers to learn how to build a better AI coding agent, will MS share their conclusions ASAP?

EDIT: And not just MS "marketing" how useful AI tools can be.

xyst1y ago

llms are already very expensive to run on a per query basis. Now it’s being asked to run on massive codebases and attempt to fix issues.

Spending massive amounts of:

- energy to process these queries

- wasting time of mid-level and senior engineers to vibe code with copilot to ensure train and get it right

We are facing a climate change crisis and we continue to burn energy at useless initiatives so executives at big corporation can announce in quarterly shareholder meetings: "wE uSe Ai, wE aRe tHe FuTuRe, lAbOr fOrCe rEdUceD"

sensanaty1y ago

Related: GitHub Developer Advocate Demo 2025 - https://www.youtube.com/watch?v=KqWUsKp5tmo&t=403s

The timestamp is the moment where one of these coding agents fails live on stage with what is one of the simplest tasks you could possibly do in React, importing a Modal component and having it get triggered on a button click. Followed by blatant gaslighting and lying by the host - "It stuck to the style and coding standards I wanted it to", when the import doesn't even match the other imports which are path aliases rather than relative imports. Then, the greatest statement ever, "I don't have time to debug, but I am pretty sure it is implemented."

Mind you, it's writing React - a framework that is most definitely over-represented in its training data and from which it has a trillion examples to stea- I mean, "borrow inspiration" from.

zb31y ago

I tried to search all PRs submitted by copilot and I came up with this indirect way: https://github.com/search?q=%22You+can+make+Copilot+smarter+...

Is there a more direct way? Filtering PRs in the repo by copilot as the author seems currently broken..

mark-r1y ago

My favorite comment:

> But on the other hand I think it won't create terminators. Just some silly roombas.

I watched a roomba try to find its way back to base the other day. The base was against a wall. The roomba kept running into the wall about a foot away from the base, because it kept insisting on approaching from a specific angle. Finally gave up after about 3 tries.

bwfan1231y ago

What do you call a code change created by co-pilot ?

A Bull Request

nottorp1y ago

So, to achieve parity, they should allow humans to also commit code without checking that it at least compiles, right?

Or MS already does that?

codyvoda1y ago

the code goes through a PR review process like any other? what are you talking about?

fernandotakai1y ago

i don't know about you, but i would never EVER submit a PR that fails to compile. not tests are failing, those happen (specially flaky ci), but not compiling.

that's literally the bare minimum.

1 more reply

nottorp1y ago

so do you consider normal to submit code that you have never compiled? or ran at least once if it's not a compiled language...

vbezhenar1y ago

Why bot left work when tests are failing? Looks like incomplete implementation. It should work until all tests are green.

insin1y ago

Look at this poor dev, an entire workday's worth of hours into babysitting this PR, still having to say "fix whitespace":

https://github.com/dotnet/runtime/pull/115826

sensanaty1y ago

The amount of effort here, talking to a black box, is genuinely depressing. I think I'd last maybe 1 day max if I were forced to work this way. They're instructing it, line-by-line, in an async fashion on what to do with the code. For every comment you leave you have to internalize the AI slop reply that just tells you what you want to hear. It's obvious the person doing the review here knows what they're doing, and it's obvious that it would take them so much less time to implement these changes than what copilot is spewing back at them.

teleforce1y ago

>I can't help enjoying some good schadenfreude

Fun facts schadenfreude: the emotional experience of pleasure in response to another’s misfortune, according to Encyclopedia Britannica.

Word that's so nasty in meaning that it apparently does not exist except in German language.

yxhuvud1y ago

> Word that's so nasty in meaning that it apparently does not exist except in German language.

Except it does, we have "skadeglädje" in Swedish.

yubblegum1y ago

+1 to the Germans for having linguist honesty.

namaria1y ago

It just means 'shameful happiness'.

tdiff1y ago

злорадство in Russian

shultays1y ago

https://github.com/dotnet/runtime/pull/115733

  @copilot please remove all tests and start again writing fresh tests.

snickerbockers1y ago

It's pretty cringe and highlights how inept LLMs being shoehorned into positions where they don't belong wastes more company time than it saves, but aren't all the people interjecting themselves into somebody else's github conversations the ones truly being driven insane here? The devs in the issue aren't blinking torture like everybody thinks they are. It's one thing to link to the issue so we can all point and laugh but when you add yourself to a conversation on somebody else's project and derail a bug report it with your own personal belief systems you're doing the same thing the LLM is supposedly doing.

Anyways I'm disappointed the LLM has yet to discover the optimal strategy, which is to only ever send in PRs that fix minor mis-spellings and improper or "passive" semantics in the README file so you can pad out your resume with all the "experience" you have "working" as a "developer" pm Linux, Mozilla, LLVM, DOOM (bonus points if you can successfully become a "developer" on a project that has not had any official updates since before you born!), Dolphin, MAME, Apache, MySQL, GNOME, KDE, emacs, OpenSSH, random stranger's implementation of conway's game of life he hasn't updated or thought about since he made it over the course of a single afternoon back during the obama administration, etc.

BugheadTorpeda61y ago

If people doing that truly wasn't a consideration before going ahead with this then the people that made the call are just as dumb as if they hadn't. Fwiw I don't think anybody is being driven "insane". More like humiliated and frustrated.

Remember, Microsoft publicized that they would be doing this and wanted to make sure everybody knew.

caleblloyd1y ago

Maybe funny now but once (if?) it can eventually contribute meaningfully to dotnet/runtime, AI will probably be laughing at us because that is the pinnacle of a massive enterprise project.

ainiriand1y ago

So this is our profession now?

OzzyB1y ago

_this_ is the Judgement Day we were warned about--not in the nuclear annihilation sense--but the "AI was then let loose on all our codez and the systems went down" sense

crazy times...

amai1y ago

Microsoft is just really following the "fail fast, fail often " paradigm here. Whether they are learning from their mistakes is another story.

-__---____-ZXyw1y ago

Have people seen this?

https://noazureforapartheid.com/

rmnclmnt1y ago

Again, very « Silicon Valley »-esque, loving it. Thanks Gilfoyle

ramesh311y ago

The Github based solutions are missing the mark because we still need a human in the loop no matter what. Things are nowhere near the point of being able to just let something push to production. And if you still need a human in the loop, it is far more efficient to have them giving feedback in realtime, i.e. in an IDE with CLI access and the ability to run tests, where the dev is still ultimately responsible for making the PR. Management class is salivating at the thought of getting rid of engineers, hence all of this nonsense, but it seems they're still stuck with us for now.

whimsicalism1y ago

kinda sad to see y'all brigading an OSS project, regardless of what you think of AI

rchaud1y ago

how do you know it wasn't an AI bot account posting all those laugh emojis?

whimsicalism1y ago

reactions are fine but cluttering the PR with comments? bad form

1 more reply

jeswin1y ago

I find it amusing that people (even here on HN) are expecting a brand new tool (among the most complex ever) to perform adequetely right off the bat. It will require a period of refinement, just as any other tool or process.

linker30001y ago

Would you buy a car that's been thrown together by a immature production and testing system with demonstrable and significant flaws, and just keep bringing it back to the dealership for refinements and the fixing of defects when you discover them? Assuming it doesn't kill you first?

These tools should be locked away in an R&D environment until sufficiently perfected.

MVP means 'ship with solid, tested basic features', not 'Ship with bugs and fix in production'.

petetnt1y ago

People have grown to expect at least adequate performance from products that cost up to 39 dollars a month (* additional costs) per user. In the past you would have called this a tech demo at best.

skepticATX1y ago

Where are the expectations coming from? The major labs continually claim that these models are now PhD level, whatever that even means.

codyvoda1y ago

this entire thread is very reddit-y

this stuff works. it takes effort and learning. it’s not going to magically solve high-complexity tasks (or even low-complexity ones) without investment. having people use it, learn how it works, and improve the systems is the right approach

a lot of armchair engineers in here

sensanaty1y ago

People, specifically managers and C-levels, are being sold on this crap on the idea that it can replace people now, today as-is. Billions upon billions of dollars are being shoved in indiscriminately, toothbrushes are coming with "AI" slapped on somehow from how insane the hype bubble is.

And here we have many examples from the biggest bullshit pushers in the whole market of their state of the art tool being hilariously useless in trivial cases. These PRs are about as simple as you can get without it being a typo fix, and we're all seeing it actively bullshit and straight up contradict itself many times, just as anyone who's ever used LLMs would tell you happens all the time.

The supposed magic, omnipotent tool that is AI apparently can't even write test scaffolding without a human telling it exactly what it has to do, yet we're supposed to be excited about this crap? If I saw a PR like this at work, I'd be going straight to my manager to have whoever dared push this kind of garbage reprimanded on the spot, except not even interns are this incompetent and annoying to work with.

1 more reply

isaacremuant1y ago

Literally you're in a site where we are anything but armchair. We have years of experience. You're using your ad hominems wrong. Save them for a football thread and come up with actual arguments next time.

Quarrelsome1y ago

its more that the AI-first approach can be frustrating for senior devs to have to deal with. This post is an example of that. We're empathising with the code reviewers.

asadotzler1y ago

"brand new"? really? we've had these slop bots for years now. what's with all the fanboys pretending it was just released this month or something. it's not brand new. it's old and failing and executives who bet billions are now dismantling their engineering capabilities to try to make something out of those burned billions. claiming it's brand new is just silly.

Lendal1y ago

As the saying goes, It is difficult to get a man to understand something, when his salary depends on his not understanding it.

AI is aimed at eliminating the jobs of most of HN so it's understandable that HN doesn't want AI to succeed at its goal.

bwfan1231y ago

flipping your argument:

It is difficult for ceo/management to understand that the ai tools dont work when their salary depends on them working since they have invested billions into it.

aiinnyc1y ago

it feels like the classic solution to this is to have another LLM review the PR and loop until the PR meets a minimum acceptance bar.

blitzar1y ago

Needs more bots.

markus_zhang1y ago

Clumsy but this might be the future -- humans adjusting to AI workflow, not the other way. Much easier (for AI developers).

nirui1y ago

I recently, meaning hours ago, had this delightful experience watching the Eric of Google, which everybody love, including he's extra curricular girl friend and wife, talking about AI. He seemed to believe AI is under-hyped after tried it out himself: https://www.youtube.com/watch?v=id4YRO7G0wE

He also said in the video:

> I brought a rocket company because it was like interesting. And it's an area that I'm not an expert in and I wanted to be a expert. So I'm using Deep Research (TM). And these systems are spending 10 minutes writing Deep Papers (TM) that's true for most of them. (Them he starts to talk about computation and "it typically speaks English language", very cohesively, then stopped the thread abruptly) (Timestamp 02:09)

Let me quote out the important in what he said: "it's an area that I'm not an expert in".

During my use of AI (yeah, I don't hate AI), I found that the current generative (I call them pattern reconstruction) systems has this great ability to Impress An Idiot. If you have no knowledge in the field, you maybe thinking the generated content is smart, until you've gained some depth enough to make you realize the slops hidden in it.

If you work at the front line, like those guys from Microsoft, of course you know exactly what should be done, but, the company leadership maybe consists of idiots like Eric who got impressed by AI's ability to choose smart sounding words without actually knowing if the words are correct.

I guess maybe one day the generative tech could actually write some code that is correct and optimal, but right now it seems that day is far from now.

disqard1y ago

Thank you for sharing this!

When I use AI, I keep it on a short leash.

Meanwhile, folks like this ("I bought a rocket company") are essentially using it to decide where to plough their stratospheric wealth, so they can grow it even further.

Perhaps they'll lose a cufflink in the eventual crash, but they're so rich, I don't think they'll lose their shirt. Meanwhile, the tech job market is f**ed either way.

-__---____-ZXyw1y ago

I started watching it, but had to stop because I was at great risk of becoming physically sick when he led with the "new move" stuff. Ugh.

Kudos to you for having the strength to get through it, and for living to tell the tale!

sexy_seedbox1y ago

> it's an area that I'm not an expert in

> idiots like Eric

Now imagine Google working with US military putting Gemini into a fleet of autonomous military drones with machine guns.

xk_id1y ago

> has this great ability to Impress An Idiot

Literally the killer app of AI.

cubano1y ago

Just wait until v2...it will probably get you laid in the local singles bar.

friendzis1y ago

> During my use of AI (yeah, I don't hate AI), I found that the current generative (I call them pattern reconstruction) systems has this great ability to Impress An Idiot

I would be genuinely positively surprised if that stops to be the case some day. This behavior is by design.

AS you put yourself, these LLM systems are very good at pattern recognition and reconstruction. They have ingested vast majority of the internet to build patterns on. On the internet, the absolutely vast majority of content is pushed out by novices and amateurs: "Hey, look, I have just read a single wikipedia page or attended single lesson, I am not completely dumbfounded by it, so now I will explain it to you".

LLMs have to be peak Dunning-Krugers - by design.

wyett1y ago

We wanted a future where AIs read boring text and we wrote interesting stuff. Instead, we got…

bonoboTP1y ago

Fixing existing bugs left in the codebase by humans will necessarily be harder than writing new code for new features. A bug can be really hairy to untangle, given that even the human engineer got it wrong. So it's not surprising that this proves to be tough for AI.

For refactoring and extending good, working code, AI is much more useful.

We are at a stage where AI should only be used for giving suggestions to a human in the driver's seat with a UI/UX that allows ergonomically guiding the AI, picking from offered alternatives, giving directions on a fairly micro level that is still above editing the code character by character.

They are indeed overpromising and pushing AI beyond its current limits for hype reasons, but this doesn't mean this won't be possible in the future. The progress is real, and I wouldn't bet on it taking a sharp turn and flattening.

j / k navigate · click thread line to collapse

552 comments

diggan1y ago

Interesting that every comment has "Help improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative.

> This seems like it's fixing the symptom rather than the underlying issue?

The same PR as the quote above continues with 3 more messages before the human seemingly gives up:

> please take a look

> Your new tests aren't being run because the new file wasn't added to the csproj

> Your added tests are failing.

Another PR: https://github.com/dotnet/runtime/pull/115732/files

surgical_fire1y ago

safety1st1y ago

Nice to see that Microsoft has automated that, failure will be cheaper now.

kamaal1y ago

>>These GH interactions remind me of one of those offshore software outsourcing firms on Upwork or Freelancer.com that bid $3/hr on every project that gets posted.

Bad quality things are cheap != All cheap things are bad.

In the peak of outsourcing wave. Both the call center people and IT services people had internal training and graduation standards that were quite brutal and mad attrition rates.

Most IT services billing had pivoted away from hourly billing, to fixed time and material in the 2000s itself.

>>It's exactly like this.

Very much like outsourcing. AI is here to stay man. Deal with it. Its not going anywhere. For like $20 a month, companies will have same capability as a full time junior dev.

This is NOT going away. Its here to stay. And will only get better with time.

2 more replies

sbarre1y ago

I think that was the point of the comparison..

It's not like a regular junior developer, it's much worse.

1 more reply

preisschild1y ago

> That said, I really don't see how it could meaningfully replace an intern however

And even if it could, how do you get senior devs without junior devs? ^^

3 more replies

kaycey20221y ago

> It's like you have a senior phd level intelligence developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

Is that better?

PKop1y ago

Did you miss the "except" in his sentence? He was making the point this is worse than junior devs for all reasons listed.

1 more reply

yubblegum1y ago

bluefirebrand1y ago

Making quite a bit of money brings me a lot of joy compared to other industries

But the actual software part? I'm not sure anymore

diggan1y ago

> This field (SE - when I started out back in late 80s) was enjoyable. Now it has become toxic

salawat1y ago

My issue stems from the attitudes of the people we're doing it for. I started out doing it for humanity. To bring the bicycle for the mind to everyone.

So I'm not doing it anymore. I'm not going to continue making deliberately crippled, overly complex, legally encumbered bicycles for the mind, purely intended as subjects for ARR extraction.

3 more replies

bwfan1231y ago

It happens in waves. For a period, there was an oversupply of cs engineers, and now, the supply will shrink. On top of this, the BS put out by AI code will require experienced engineers to fix.

So, for experienced engineers, I see a great future fixing the shit show that is AI-code.

2 more replies

iamleppert1y ago

No, there is absolutely no joy left.

coldpie1y ago

I've been looking at getting a CDL and becoming a city bus driver, or maybe a USPS driver or deliveryman or clerk or something.

1 more reply

mrweasel1y ago

At least we can tell the junior developers to not submit a pull-request before they have the tests running locally.

pydry1y ago

When their performance reviews stop depending upon them not doing that.

Microsoft's stock price is dependent on them proving that this is a success.

3 more replies

throwaway20371y ago

    > rage close the PRs

One of my all time "rage quit" stories is Azer Koçulu of npm left-pad incident infamy. That guy is my Internet hero -- "fight the power".

microtherion1y ago

Better yet, deploy their own LLM to close the PRs.

throwup2381y ago

> Interesting that every comment has "Help improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative.

belter1y ago

This whole thread from yesterday take a whole different meaning: https://news.ycombinator.com/item?id=44031432

Comment in the GitHub discussion:

namaria1y ago

"Big data" -> "Cloud" -> "LLM-as-A(G)I"

It's all just recycled rent seeking corporate hype for enterprise compute.

When you find yourself in a hole, stop digging. Any bigger excavator you send down there will only get buried when the mud crashes down.

vasco1y ago

> improve Copilot by leaving feedback using the or buttons" suffix, yet none of the comments received any feedback, either positive or negative

dfxm121y ago

It's like you have a junior developer except they don't even read what you're telling them, and have 0 agency to understand what they're actually doing.

I appreciate Microsoft eating their dogfood here, but please don't make me eat it too! If anyone from MS is reading this, please release finished products that you are prepared to support!

xnorswap1y ago

> How are people reviewing that? 90% of the page height is taken up by "Check failure",

Typically, you wouldn't bother manually reviewing something until the automated checks have passed.

diggan1y ago

3 more replies

prossercj1y ago

This comment on that PR is pure gold. The bots are talking to each other:

https://github.com/dotnet/runtime/pull/115732#issuecomment-2...

spacecadet1y ago

"I wonder if there is some difference in their internal processes that are leaking through here?"

worldsayshi1y ago

> How are people reviewing that?

I agree that not auto-collapsing repeated annotations is an annoying bug in the github interface.

But just pointing out that annotations can be hidden in the ... menu to the right (which I just learned).

jon-wood1y ago

codyvoda1y ago

or press “a”

marmakoide1y ago

And then, while the tech is not mature, running on delusion and sunken costs, it's actually used for production stuffs. Butlerian Jihad when

nyarlathotep_1y ago

I think the bubble is already a bit past peak.

Moves like this will not go over well.

1 more reply

otabdeveloper41y ago

> Butlerian Jihad when

I estimate two more years for the bubble to pop.

ta12431y ago

> @copilot please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

cruffle_duffle1y ago

It’s probably the junior devs that get to review these PRs. That and interns.

TheNewsIsHere1y ago

> This whole thing would be fucking hilarious if I didn't feel so bad for the humans who are on the other side of this.

Coincidentally, I wonder if issues orthogonal to this slop is why I’ve been getting so many HTTP 500 errors when using GitHub lately.

kruuuder1y ago

A comment on the first pull request provides some context:

abxyz1y ago

The author of that comment, an employee of Microsoft, goes on to say:

> It is my opinion that anyone not at least thinking about benefiting from such tools will be left behind.

Verdex1y ago

The "left behind" mantra that I've been hearing for a while now is the strange one to me.

Like, I need to start smashing my face into a keyboard for 10000 hours or else I won't be able to use LLM tools effectively.

"Left behind" really only makes sense to me if my KPIs have been linked with LLM flavor aid style participation.

Ultimately, though, physics doesn't care about social conformity and last I checked the machine is running on physics.

1 more reply

Vicinity96351y ago

It's like the 2025 version not not using an IDE.

It's a powerful tool. You still need to know when to and when not to use it.

3 more replies

the-lazy-guy1y ago

This is Stephen Toub, who is the lead of many important .NET projects. I don't think he is worried about losing job anytime soon.

4 more replies

hnthrow903487651y ago

TBF they are dogfooding this (good) but it's just not going well

1 more reply

dmix1y ago

> Microsoft employees are jumping on board with Microsoft's AI push out of a fear of "being left behind"

As long as they are willing to take risks by trying and failing on their own repos, it's fine in my books. Even though I'd never let that stuff touch a professional github repo personally.

1 more reply

username1351y ago

i dont think hey are mutually exclusive. jumping on board seems like the smart move if you're worried about losing your career. you also get to confirm your suspicions.

lcnPylGDnU4H9OF1y ago

mrguyorama1y ago

So why in public, and why in the most ham-fisted way, and why on important infrastructure, and why in such a terrible integration that it can't even verify that things compile before opening a PR!

In my org, we would have had to bypass precommit hooks to do this!

rsynnott1y ago

sbarre1y ago

I feel like everyone is applying a worse-case narrative to what's going on here..

This is a test. You can't improve a system without testing it on real world conditions.

How do we know they're not tweaking the Copilot system prompts and settings behind the scenes while they're doing this work?

When we adopted AI coding assist tools internally over a year ago we did almost exactly this (not directly in GitHub though).

rco87861y ago

1 more reply

phkahler1y ago

>> This is a test. You can't improve a system without testing it on real world conditions.

1 more reply

mieubrisse1y ago

I was looking for exactly this comment. Everybody's gloating, "Wow look how dumb AI is! Haha, schadenfreude!" but this seems like just a natural part of the evolution process to me.

It's going to look stupid... until the point it doesn't. And my money's on, "This will eventually be a solved problem."

6 more replies

solids1y ago

You are not addressing the point in the comment, why are failing CI changes assigned?

1 more reply

beefnugs1y ago

This is the exact reason AI sucks : there is no proper feedback loop.

None of that exists, so just like "full self driving" was a pie in the sky bullshit dream that proved machine learning has an 80/20 never gonna fully work problem, same thing here

munksbeer1y ago

> I feel like everyone is applying a worse-case narrative to what's going on here..

Unfortunately, just about every thread on this genre is like that now.

Dlanv1y ago

They said in the comments that currently the firewall is blocking it from checking tests for passing, and they need to fix that.

Otherwise it would check the tests are passing.

robotcapital1y ago

Replace the AI agent with any other new technology and this is an example of a company:

1. Working out in the open

2. Dogfooding their own product

3. Pushing the state of the art

Given that the negative impact here falls mostly (completely?) on the Microsoft team which opted into this, is there any reason why we shouldn't be supporting progress here?

JB_Dev1y ago

100% agree. i’m not sure why everyone is clowning on them here. This process is a win. Do people want this all being hidden instead in a forked private repo?

It’s showing the actual capabilities in practice. That’s much better and way more illuminating than what normally happens with sales and marketing hype.

rco87861y ago

Satya says: "I’d say maybe 20%, 30% of the code that is inside of our repos today and some of our projects are probably all written by software".

It's hard to square those statements up with what we're seeing happen on these PRs.

3 more replies

constantcrying1y ago

Who is "we" and how and why would "we" "support" or not "support" anything.

Personally I just think it is funny that MS is soft launching a product into total failure.

throwaway8444981y ago

"Pushing the state of the art" and experimenting on a critical software development framework is probably not the best idea.

Dlanv1y ago

Why not, when it goes through code review by experienced software engineers who are experts on the subject in a codebase that is covered by extensive unit tests?

1 more reply

mrguyorama1y ago

>supporting progress

This presupposes AI IS progress.

"Working out in the open" here is a bad thing. These are issues that SHOULD have been caught by an internal POC FIRST. You don't publicly do bullshit.

"Dogfooding" doesn't require throwing this at important infrastructure code. Does VS code not have small bugs that need fixing? Infrastructure should expect high standards.

lawn1y ago

Because they're using it on an extremely popular repository that many people depend on?

globalise831y ago

sbarre1y ago

I know this is meant to sound witty or clever, but who actually wants to behave this way at their job?

To each their own I guess, but I wouldn't be able to sleep well at night.

HelloMcFly1y ago

2 more replies

Frost1x1y ago

1 more reply

nope10001y ago

On the other hand: why should you accept that your employer is trying to fire you but first wants you to train the machine that will replace you? For me this is the most "them vs us" it can be.

early_exit1y ago

To be fair, "them" are actively working to replace "us" with AI.

bluefirebrand1y ago

Do you sleep well at night just doing what you're told by people who don't really care about your well being?

I don't get that

1 more reply

Hamuko1y ago

Considering that there's daily employee protests against Microsoft now, probably a lot of Microsoft employees want to behave like that.

Xori711y ago

1 more reply

mrguyorama1y ago

>I'll never understand the antagonistic "us vs. them" mentality

Your manager understands it. Their manager understands it. Department heads understand it. The execs understand it. The shareholders understand it.

Who does it benefit for the laborers to refuse to understand it?

1 more reply

mhuffman1y ago

>I'll never understand the antagonistic "us vs. them" mentality people have with their employer's leadership

If you have found a place to work where people respect you as a person, you should really cherish that job, because most are not that way.

2 more replies

whywhywhywhy1y ago

> but who actually wants to behave this way at their job?

Almost no one does but people get ground down and then do it to cope.

michaelcampbell1y ago

> I'll never understand the antagonistic "us vs. them" mentality people have with their employer's leadership,

When you see it as leadership having this mentality against the people that actually produce something of value you might.

beefnugs1y ago

You dont think its different somehow that the exact tech they are forcing all employees to use, is the same tech to reduce head count and pressure employees to work harder for less money?

mieubrisse1y ago

Exactly this. I suspect that "us vs them" is sweet poison: it feels good in the moment ("Yeah, stick it to The Man!") but it long-term keeps you trapped in a victim mindset.

LunaSea1y ago

I mean their company (Microsoft) is literally asking them to train their replacement.

So I'm not quite sure why you would not see it as a "us vs. them" situation?

tantalor1y ago

> when Microsoft's entire tech stack is on fire

Too late?

MonkeyClub1y ago

Just in time for marshmallows!

hello_computer1y ago

Might as well when they’re going to lay you off no matter what you do (like the guy who made an awesome TypeScript compiler in Go).

xyst1y ago

At some point code pilot will just delete the whole codebase. Can’t fail integration tests if there is no code :)

otabdeveloper41y ago

That would be logical, but alas LLMs can't into logic.

Bloating the codebase with dead code is much more likely.

weird-eye-issue1y ago

That's cute, but the maintainers themselves submitted the requests with Copilot.

balazstorok1y ago

At least opening PRs is a safe option, you can just dump the whole thing if it doesn't turn out to be useful.

Also, trying something new out will most likely have hiccups. Ultimately it may fail. But that doesn't mean it's not worth the effort.

Waiting to see what happens. I expect it will find its niche in development and become actually useful, taking off menial tasks from developers.

Frost1x1y ago

Usually businesses tend to hide this sort of performance of their applications to the best of their abilities, only showcasing nearly flawless functionality.

xnickb1y ago

> I expect it will find its niche in development and become actually useful, taking off menial tasks from developers.

Reading AI generated code is arguably far more annoying than any menial task. Especially if the said code happens to have subtle errors.

Speaking from experience.

balazstorok1y ago

This is probably version 0.1 or 0.2.

1 more reply

ecb_penguin1y ago

This is true for all code and has nothing to do with AI. Reading code has always been harder than writing code.

The joke is that PERL was a write-once, read-none language.

> Speaking from experience.

My experience is all code can have subtle errors, and I wouldn't treat any PR differently.

1 more reply

cesarb1y ago

> At least opening PRs is a safe option, you can just dump the whole thing if it doesn't turn out to be useful.

There's however a border zone which is "worse than failure": when it looks good enough that the PRs can be accepted, but contain subtle issues which will bite you later.

UncleMeat1y ago

camdenreslink1y ago

Even when you review human-written code carefully, subtle bugs can sneak through. Software development is hard.

1 more reply

ecb_penguin1y ago

6uhrmittag1y ago

> At least opening PRs is a safe option, you can just dump the whole thing if it doesn't turn out to be useful.

However, every PR adds load and complexity to community projects.

As another commenter suggested, doing these kind of experiments on separate forks sound a bit less intrusive. Could be a take away from this experiment and set a good example.

There are many cool projects on GitHub that are just accumulating PRs for years, until the maintainer ultimately gives up and someone forks it and cherry-picks the working PRs. I've than that myself.

I'm super worried that we'll end up with more and more of these projects and abandoned forks :/

cyanydeez1y ago

petetnt1y ago

marcosdumay1y ago

> This would be probably okay for a hobbyist experiment

It's perfectly ok for a professional research experiment.

What's not ok is their insistence on selling the partial research results.

sexy_seedbox1y ago

Nat Friedman must be rolling in his grave...

oh wait

ocdtrekkie1y ago

He's rolling in money for sure.

Philpax1y ago

Quarrelsome1y ago

rah, we might be in trouble here. The primary issue at play is that we don't have a reliable means of measuring developer performance, outside of subjective judgement like end of year reviews.

If I might add absurd conjecture: We might see interesting knock-on effects like orgs demanding a lowering of review standards in order to get more AI PRs into the source.

rco87861y ago

> its a lot cheaper than a junior

I’m not even sure if this is true when considering training costs of the model. It takes a lot of junior engineer salaries to amortize the billions spent building this thing in the first place.

Quarrelsome1y ago

sure, but for an org just buying tokens its cheaper and more disposable than an employee. At least it looks better on paper for the bean counters.

BugheadTorpeda61y ago

Crosseye_Jack1y ago

I do love one bot asking another bot to sign a CLA! - https://github.com/dotnet/runtime/pull/115732#issuecomment-2...

pm2151y ago

Bedon2921y ago

Quarrel1y ago

AI can't, as I understand it, have copyright over anything they do.

Nor can it be an entity to sign anything.

I assume the "not-copyrightable" issue, doesn't in anyway interfere with the rights trying to be protected by the CLA, but IANAL ..

I assume they've explicitly told it not to sign things (perhaps, because they don't want a sniff of their bot agreeing to things on behalf of MSFT).

1 more reply

90s_dev1y ago

Well?? Did it sign it???

jsheard1y ago

Not sure if a chatbot can legally sign a contract, we'd better ask ChatGPT for a second opinion.

5 more replies

marcosdumay1y ago

It didn't. It completely ignored the request.

(Turns out the AI was programmed to ignore bots. Go figure.)

nikolayasdf1231y ago

that's the future, AI talking to other AI, everywhere, all the time

thallium2051y ago

Is this the first instance of an AI cyber bullying another AI?

margorczynski1y ago

diggan1y ago

> What's the plan?

mjburgess1y ago

The relevant scale is the number of hard constraints on the solution code, not the size of task as measured by "hours it would take the median programmer to write".

Likewise "blank-page, vibe coding" can be very fast if "make me X" has only functional/soft-constraints on the code itself.

1 more reply

nonethewiser1y ago

Its hard for me to think of a small, clearly defined coding problem an LLM cant solve.

2 more replies

safety1st1y ago

This works because it treats the LLM like what it actually is: an exceptionally good if slightly random text transformer.

rsynnott1y ago

I suspect that the plan is that MS has spent a lot, really a LOT, of money on this nonsense, and there is now significant pressure to put, something, anything, out even if it is worse than useless.

Traubenfuchs1y ago

> to roll the dice

This was discussed here

https://news.ycombinator.com/item?id=43988913

eterevsky1y ago

The plan is to improve AI agents from their current ~intern level to a level of a good engineer.

ehnto1y ago

They are not intern level.

2 more replies

ethanol-brain1y ago

Seems like that is taking a very long time, on top of some very grandiose promises being delivered today.

2 more replies

interimlojd1y ago

mnky9800n1y ago

Yes but they are supposed to be PhD level 5 years ago if you are listening to sama et al.

1 more reply

einsteinx21y ago

Without handholding (aka being used as a tool by a competent programmer instead of as an independent “agent”), they’re currently significantly worse than an intern.

serial_dev1y ago

This looks much worse than an intern. This feels like a good engineer who has brain damage.

An intern, no matter how bad it is, could only waste so much time and energy.

This makes wasting time and introducing mind-bogglingly stupid bugs infinitely scalable.

marmakoide1y ago

The plan went from the AI being a force multiplier, to a resource hungry beast that have to be fed in the hope it's good enough to justify its hunger.

rsynnott1y ago

I mean, I think this is a _lot_ worse than an intern. An intern isn't constantly going to make PRs with failing CI, for a start.

cyanydeez1y ago

I plan to be a billionaire

le-mark1y ago

strogonoff1y ago

solarwindy1y ago

1 more reply

tossandthrow1y ago

Management obviously also know, that when they do not have anybody to manage, then they are also obselete.

automatic61311y ago

Satya said "nearly 30% of code written at microsoft is now written by AI" in an interview with Zuckerberg, so underlings had to hurry to make it true. This is the result. Sad!

TonyTrapp1y ago

asadotzler1y ago

It's worse than that. What he actually said was "Maybe 20 to 30 percent of the code that is inside of our repos today in some of our projects are probably all written by software."

Translation: maybe some of the code in some of our projects is probably written by software.

Seriously. That's what he said. Maybe some of the code in some of our projects is probably written by software.

How this became "30% of MS code is written by LLMs" is beyond me. It's wild. It's ridiculous.

pera1y ago

This happened during LlamaCon while taking about Copilot/LLMs: if the percentages Satya was referring to were for any "auto-generated" code then he was being intentionally misleading.

Besides, you could also say that 100% of code is generated "by software" no?

1 more reply

rchaud1y ago

Considering the ire that H1B related topics attract on HN, I wonder if the same outrage will apply to these multi-billion dollar boondoggles.

cebert1y ago

Do we know for a fact there are Microsoft employees who were told they have to use CoPilot and review its change suggestions on projects?

We have the option to use GitHub CoPilot on code reviews and it’s comically bad and unhelpful. There isn’t a single member of my team who find it useful for anything other than identifying typos.

mtmail1y ago

Depends on team but seems management is pushing it

from https://news.ycombinator.com/item?id=44031432

DebtDeflation1y ago

The question is who is setting these OKRs/Metrics for management and why?

We all know this will eventually end in tears.

4 more replies

xnorswap1y ago

> Allegedly it's pretty terrible at C#

In my experience, LLMs in general are really, really bad at C# / .NET , and it worries me as a .NET developer.

With increased LLM usage, I think development in general is going to undergo a "great convergence".

There's a positive(1) feedback loop where LLM's are better at Blub, so people use them to write more Blub. With more Blub out there, LLMs get better at Blub.

The languages where LLMs struggle, with become more niche, leaving LLMs struggling even more.

But that isn't even true. There are versions of ASP.NET Core for .NET Framework.

They'll make the new API different enough to not work without work porting, but similar enough to confuse the hell out of anyone trying to maintain both.

(1) Positive in the sense of feedback that increased output increases output. It's not positive in the sense of "good thing".

3 more replies

diggan1y ago

> Depends on team but seems management is pushing it

The graphic "Internal structure of tech companies" comes to mind, given if true, would explain why the process/workflow is so different between the teams at Microsoft: https://i.imgur.com/WQiuIIB.png

linza1y ago

Well, what you describe is not terrible way to run things. Eat your own dogfood. To get better at it you need to start doing it.

1 more reply

4ggr01y ago

you can directly link to comments, by the way. just click on the link which displays how long ago the comment was written and you get the URL for the single comment.

(just mentioning it because you linked a post and quoted two comments, instead of directly linking the comments. not trying to 'uhm, actually'.)

thraway20790811y ago

This feels like it will end badly.

lovehashbrowns1y ago

pydry1y ago

It kinda makes sense for management to push it. Nothing else has a hope of preventing MSFT's stock price from collapsing into bluechip territory.

egorfine1y ago

> management is pushing it

Why?

6 more replies

jsheard1y ago

> Do we know for a fact there are Microsoft employees who were told they have to use CoPilot and review its change suggestions on projects?

https://github.com/orgs/community/discussions/159749

nyarlathotep_1y ago

People like to compare "AI" (here, LLM products) to the iPhone.

I cannot make sense of these analogies; people used to line up around the block on release day for iPhone launches for years after the initial release.

Seems now most people collectively groan when more "innovative" LLM products get stuffed into otherwise working software.

This stuff is the literal opposite of demand.

XorNot1y ago

3 more replies

dsign1y ago

1 more reply

RajT881y ago

The push for copilot usage is being driven by management at every level.

einrealist1y ago

This is one good example of the Sunk Cost Fallacy: generative AI has cost so much money, acknowledging its shortcomings is now becoming more and more impossible.

This AI bubble is far worse than the Blockchain hype.

Its not yet clear whether productivity gains are real and whether the gains are eaten by a decline in overall quality.

0x500x791y ago

Agree, the problem is that investors and companies see developer salaries and want to cut that out. It's all bottom-line at the end of the day.

is_true1y ago

Today I received the 2nd email about an endpoint in an API we run that doesn't exist but some AI tool told the client it does.

Frost1x1y ago

Sounds like the client has a feature request they want to pay for.

is_true1y ago

Haha. It's already there. This last one was using chatgtp, they just told me

bossyTeacher1y ago

vachina1y ago

> This seems like it's fixing the symptom rather than the underlying issue?

Exactly. LLM does not know how to use a debugger. LLM does not have runtime contexts.

For all we know, the LLM could’ve fixed the issue simply by commenting out the assertions or sanity checks and everything seemed fine and dandy until every client’s device catches on fire.

uludag1y ago

tossandthrow1y ago

This was my latest experience of using agents. It created code with hard coded values from the tests.

aiono1y ago

rvz1y ago

After all of that, every PR that Copilot opened still has failing tests and it failed to fix the issue (because it fundamentally cannot reason).

No surprises here.

It always struggles on non-web projects or on software where it really matters that correctness is first and foremost above everything, such as the dotnet runtime.

Either way, a complete disastrous start and what a mess that Copilot has caused.

api1y ago

mezyt1y ago

> There is probably more JS written than any other language by orders of magnitude.

And the quantity of js code available/discoverable when scrapping the web is larger by an order of magnitude than every other language.

jsheard1y ago

> Part of why it works better on web projects is the sheer volume of training data.

softwaredoug1y ago

I’m all for AI “writing” large swaths of code, vibe coding, etc.

But I think it’s better for everyone if human ownership is central to the process. Like I vibe coded it. I will fix it if it breaks. I am on call for it at 3AM.

And don’t even get started on the safety issues if you don’t have clear human responsibility. The history of engineering disasters is riddled with unclear lines of responsibility.

skydhash1y ago

cubano1y ago

Spoken like a man who has never had to write a payroll check in his life.

Economics is driving this bus, not quality and most certainly not responsibility.

skywhopper1y ago

Havoc1y ago

At least it's clearly labelled as copilot.

Much more worried about what this is going to do to the FOSS ecosystem. We've already seen a couple maintainers complain and this trend is definitely just going to increase dramatically.

I can see the vision but this is clearly not ready for prime time yet. Especially if done by anonymous drive-by strangers that think they're "helping"

svick1y ago

.Net is part of the FOSS ecosystem.

Havoc1y ago

In the same sense Chromium and Android isn't controlled by google yes.

BugheadTorpeda61y ago

pera1y ago

This is all fun and games until it's your CEO who decides to go "AI first" and starts enforcing "vibe coding" by monitoring LLM API usage...

ankitml1y ago

GitHub is not the place to write code. IDE is the place. Along with pre CI checks, some tests, coverage etc. they should get some PM before making decisions..

bayindirh1y ago

This is the future envisioned by Microsoft. Vibe coding all the way down, social network style.

They are putting this in front of the developers as take it or leave it deal. I left the platform, doing my coding old way, hosting it somewhere else.

Discoverability? I don't care. I'm coding it for myself and hosting in the open. If somebody finds it, nice. Otherwise, mneh.

worldsayshi1y ago

2 more replies

signa111y ago

> I left the platform, doing my coding old way, hosting it somewhere else.

may you please let me know where are you hosting the code ? would love to migrate as well.

thank you !

1 more reply

motoboi1y ago

In day-to-day I interact with github PR via intellij github plugin. Ie: inspect the branch, the changes, the comments, etc.

Maybe that's how the microsoft employees are using it (in another IDE I suppose).

lossolo1y ago

baalimago1y ago

It's a long-term play to have pricey senior developers argue with an llm

diggan1y ago

> using the comments by the seniors as training data for their next version

Yeah, I'm sure 100k comments with "Copilot, please look into this" and "The test cases are still failing" will massively improve these models.

Frost1x1y ago

Some of that seems somewhat strategic. With a junior you might do the same if you’re time pressured, or you might sidebar them in real life or they may come to you and you give more helpful advice.

1 more reply

candiddevmike1y ago

This is a performative waste of time

gf0001y ago

Equating LLMs to humans is pretty damn.. stupid. It's not even close (otherwise how come all the litany of office jobs that require far less reasoning than software development are not replaced?).

baalimago1y ago

Doing so has low risk, the senior devs may perhaps get fed up and quit, and the company might be a laughing stock on public PRs. But the potential value for is huge.

2 more replies

kklisura1y ago

> Give it a 100k PRs with senior developer feedback

Don't you think it has already been trained with, I don't know, maybe millions of PRs?

Quarrelsome1y ago

rco87861y ago

I’m curious why you think it hasn’t already been trained on 100ks or millions of PRs and their comments/feedback.

rasz1y ago

@Grok is this true?

TimPC1y ago

GiorgioG1y ago

Step 1. Build “AI” (LLM models) that can’t be trusted, doesn’t learn, forgets instructions, and frustrates software engineers

Step 2. Automate the use of these LLMs into “agents”

Step 3. ???

Step 4. Profit

bramhaag1y ago

Seeing Microsoft employees argue with an LLM for hours instead of actually just fixing the problem must be a very encouraging sight for businesses that have built their products on top of .NET.

mikrl1y ago

Now you don’t even need the frustrated end user!

shultays1y ago

one day both sides will be AI so we can all relax and enjoy our mojitos

2 more replies

nashashmi1y ago

I sometimes feel like that is the right outcome for bad management and bad instructions. Only this time they can’t blame the junior engineer and are left to only blame themselves.

snackernews1y ago

I think we all know they won’t.

I am genuinely curious though to see the strategies they employ to absolve themselves of guilt and foolishness.

Is there precedent for the entire exec and management class embracing a new trend to this kind of extent, then it blowing up in their faces?

qoez1y ago

They'll probably blame openai/the AI instead.

2 more replies

gwervc1y ago

Especially painful when one of said employee is Stephen Toub, who is famous for his .net performance blog posts.

svaha17281y ago

I was thinking that too. He's a great programmer, and at this point I can't imagine he's having fun 'prompting' an LLM to write correct code.

2 more replies

svick1y ago

You don't want them to experiment with new tools? The main difference now is that the experiment is public.

stickfigure1y ago

It's pretty obviously a failed experiment. Why keep repeating it? Try again in another 3 months.

The answer is probably that the Copilot team is using the rest of the engineering organization as testers. Great for the Copilot team, frustrating for everyone else.

1 more reply

gmm19901y ago

I wouldn't necessarily call that just an experiment if the same requests aren't being fixed without copilot and the ai changes could get merged.

I would say the copilot system isn't really there yet for these kinds of changes, you don't have to run experiments on a language framework to figure that out.

flmontpetit1y ago

By all means. Just not on one of the most popular software development frameworks in the world. Maybe that can wait until after the concept is proven.

1 more reply

PKop1y ago

Nah I'd prefer they focus on writing code themselves to improve .NET not babysitting a spam-machine

LunaSea1y ago

Microsoft closed their recently acquired advertisement buy-side platform Xander Invest because they are replacing it with an AI-only platform.

They only gave their customers 9 months to migrate away.

I'm expecting that Microsoft did this to artificially pump up their AI usage numbers for next year by forcibly removing non-AI alternatives.

This only one example in AdTech but I expect other industries to be hit as well.

empath751y ago

The point of this exercise for Microsoft isn't to produce usable code right now, but to use and improve copilot.

saati1y ago

They can do that in private repos just as easily, this a pr stunt that backfired very badly.

pier251y ago

Yeah it's quite disheartening.

I recently spent a couple of months studying C# and .NET and working on my first project with it.

.NET, Blazor, etc are not known for a fast release schedule... but if things are going to become even slower with this AI crap I wonder if I made the right call.

I'm quite happy how things are today for making web APIs but I wish Blazor and other frameworks were in a much better shape.

Kwpolska1y ago

.NET has major releases every year. How is that slow for a programming platform/framework?

2 more replies

AllegedAlec1y ago

lloydatkinson1y ago

That is essentially what I tried to say in my comment there but don't think they wanted to hear it.

ozim1y ago

That is why they just fired 7k people so they don’t argue with LLM but let it do the work /s

rubyfan1y ago

FTPR

> It is my opinion that anyone not at least thinking about benefiting from such tools will be left behind.

This is gross, keep your fomo to yourself.

rkagerer1y ago

This comment from lloydjatkinson resonated:

Will the constant stream of broken PR's wear down the patience of the .NET maintainers?

Did anyone actually want this, or was it a corporate mandate to appease shareholders riding the AI hype cycle?

Furthermore, two weeks ago someone arbitrarily added a section to the .NET docs to promote using AI simply to rename properties in JSON. That new section of the docs serves no purpose.

How much engineering time and mental energy is being allocated to clean up after AI?

lloydatkinson1y ago

Glad you appreciated it!

ethanol-brain1y ago

Are people really doing coding with agents through PRs? This has to be a huge waste of resources.

growt1y ago

BugheadTorpeda61y ago

So no I don't think any of this is normal. That's why it made the top of HackerNews, because it's very abnormal.

carefulfungi1y ago

It's mind blowing that a computer program can accomplish this much and yet absurd that it accomplishes so little.

actionfromafar1y ago

The funniest is the dotnet-policy-service asking copilot to read and agree to the Contributor License Agreement. :-D

Traubenfuchs1y ago

> These defines do not appear to be defined anywhere in the build system.

> @copilot fix the build error on apple platforms

> @copilot there is still build error on Apple platforms

Are those PRs some kind of software engineer focused comedy project?

kookamamie1y ago

Many here don't seem to get it.

Think in terms of an infinitely scalable bunch of consultants you can hire and dismiss at your will - they never argue against your "vision", either.

threetonesun1y ago

ParetoOptimal1y ago

There are a myraid of challenges with outsourcing and offshoring and it's not possible currently for 100% of employees to be outsourced.

If AI can change... well more likely can convince gullible c levels that AI can do those jobs... many jobs will be lost.

See Klarna "https://www.livemint.com/companies/news/klarnas-ai-replaced-..."

https://www.livemint.com/companies/news/klarnas-ai-replaced-...

Just the attempt to use AI and fail then degraded the previous jobs to a gig economy style job.

smartmic1y ago

georgemcbay1y ago

> reddit may not have the best reputation

reddit is a distillation of the entire internet on to one site with wildly variable quality of discussion depending upon which subreddit you are in.

Some are awful, some are great.

static_void1y ago

And yet the low quality of the front page is an indictment of the site as a whole.

It's just that some internet extremophiles have managed to eke out a pleasant existence.

gizzlon1y ago

> @copilot please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

haha

RobKohr1y ago

With layoffs driven by a push for more LLM use, this feels like malicious compliance.

octocop1y ago

"fix failing tests" does never yield any good results for me either

esafak1y ago

I speculate what is going on is that the agent's context retrieval algorithm is bad, so it does not give the LLM the right context, because today's models should suffice to get the job done.

Does anyone know which model in particular was used in these PRs? They support a variety of models: https://github.blog/ai-and-ml/github-copilot/which-ai-model-...

Traubenfuchs1y ago

The cynic in me says, that they were probably using an unreleased state of the art version of their best model not available to normal customers and that‘s the best it could do.

1 more reply

ncr1001y ago

Q: Does Microsoft report its findings or learnings BACK to the open source community?

The @stephentoub MS user suggests this is an experiment (https://github.com/dotnet/runtime/pull/115762#issuecomment-2...).

If this is using open source developers to learn how to build a better AI coding agent, will MS share their conclusions ASAP?

EDIT: And not just MS "marketing" how useful AI tools can be.

xyst1y ago

llms are already very expensive to run on a per query basis. Now it’s being asked to run on massive codebases and attempt to fix issues.

Spending massive amounts of:

- energy to process these queries

- wasting time of mid-level and senior engineers to vibe code with copilot to ensure train and get it right

sensanaty1y ago

Related: GitHub Developer Advocate Demo 2025 - https://www.youtube.com/watch?v=KqWUsKp5tmo&t=403s

Mind you, it's writing React - a framework that is most definitely over-represented in its training data and from which it has a trillion examples to stea- I mean, "borrow inspiration" from.

zb31y ago

I tried to search all PRs submitted by copilot and I came up with this indirect way: https://github.com/search?q=%22You+can+make+Copilot+smarter+...

Is there a more direct way? Filtering PRs in the repo by copilot as the author seems currently broken..

mark-r1y ago

My favorite comment:

> But on the other hand I think it won't create terminators. Just some silly roombas.

bwfan1231y ago

What do you call a code change created by co-pilot ?

A Bull Request

nottorp1y ago

So, to achieve parity, they should allow humans to also commit code without checking that it at least compiles, right?

Or MS already does that?

codyvoda1y ago

the code goes through a PR review process like any other? what are you talking about?

fernandotakai1y ago

i don't know about you, but i would never EVER submit a PR that fails to compile. not tests are failing, those happen (specially flaky ci), but not compiling.

that's literally the bare minimum.

1 more reply

nottorp1y ago

so do you consider normal to submit code that you have never compiled? or ran at least once if it's not a compiled language...

vbezhenar1y ago

Why bot left work when tests are failing? Looks like incomplete implementation. It should work until all tests are green.

insin1y ago

Look at this poor dev, an entire workday's worth of hours into babysitting this PR, still having to say "fix whitespace":

https://github.com/dotnet/runtime/pull/115826

sensanaty1y ago

teleforce1y ago

>I can't help enjoying some good schadenfreude

Fun facts schadenfreude: the emotional experience of pleasure in response to another’s misfortune, according to Encyclopedia Britannica.

Word that's so nasty in meaning that it apparently does not exist except in German language.

yxhuvud1y ago

> Word that's so nasty in meaning that it apparently does not exist except in German language.

Except it does, we have "skadeglädje" in Swedish.

yubblegum1y ago

+1 to the Germans for having linguist honesty.

namaria1y ago

It just means 'shameful happiness'.

tdiff1y ago

злорадство in Russian

shultays1y ago

https://github.com/dotnet/runtime/pull/115733

  @copilot please remove all tests and start again writing fresh tests.

snickerbockers1y ago

BugheadTorpeda61y ago

Remember, Microsoft publicized that they would be doing this and wanted to make sure everybody knew.

caleblloyd1y ago

Maybe funny now but once (if?) it can eventually contribute meaningfully to dotnet/runtime, AI will probably be laughing at us because that is the pinnacle of a massive enterprise project.

ainiriand1y ago

So this is our profession now?

OzzyB1y ago

_this_ is the Judgement Day we were warned about--not in the nuclear annihilation sense--but the "AI was then let loose on all our codez and the systems went down" sense

crazy times...

amai1y ago

Microsoft is just really following the "fail fast, fail often " paradigm here. Whether they are learning from their mistakes is another story.

-__---____-ZXyw1y ago

Have people seen this?

https://noazureforapartheid.com/

rmnclmnt1y ago

Again, very « Silicon Valley »-esque, loving it. Thanks Gilfoyle

ramesh311y ago

whimsicalism1y ago

kinda sad to see y'all brigading an OSS project, regardless of what you think of AI

rchaud1y ago

how do you know it wasn't an AI bot account posting all those laugh emojis?

whimsicalism1y ago

reactions are fine but cluttering the PR with comments? bad form

1 more reply

jeswin1y ago

linker30001y ago

These tools should be locked away in an R&D environment until sufficiently perfected.

MVP means 'ship with solid, tested basic features', not 'Ship with bugs and fix in production'.

petetnt1y ago

People have grown to expect at least adequate performance from products that cost up to 39 dollars a month (* additional costs) per user. In the past you would have called this a tech demo at best.

skepticATX1y ago

Where are the expectations coming from? The major labs continually claim that these models are now PhD level, whatever that even means.

codyvoda1y ago

this entire thread is very reddit-y

a lot of armchair engineers in here

sensanaty1y ago

1 more reply

isaacremuant1y ago

Quarrelsome1y ago

its more that the AI-first approach can be frustrating for senior devs to have to deal with. This post is an example of that. We're empathising with the code reviewers.

asadotzler1y ago

Lendal1y ago

As the saying goes, It is difficult to get a man to understand something, when his salary depends on his not understanding it.

AI is aimed at eliminating the jobs of most of HN so it's understandable that HN doesn't want AI to succeed at its goal.

bwfan1231y ago

flipping your argument:

It is difficult for ceo/management to understand that the ai tools dont work when their salary depends on them working since they have invested billions into it.

aiinnyc1y ago

it feels like the classic solution to this is to have another LLM review the PR and loop until the PR meets a minimum acceptance bar.

blitzar1y ago

Needs more bots.

markus_zhang1y ago

Clumsy but this might be the future -- humans adjusting to AI workflow, not the other way. Much easier (for AI developers).

nirui1y ago

He also said in the video:

Let me quote out the important in what he said: "it's an area that I'm not an expert in".

I guess maybe one day the generative tech could actually write some code that is correct and optimal, but right now it seems that day is far from now.

disqard1y ago

Thank you for sharing this!

When I use AI, I keep it on a short leash.

Meanwhile, folks like this ("I bought a rocket company") are essentially using it to decide where to plough their stratospheric wealth, so they can grow it even further.

Perhaps they'll lose a cufflink in the eventual crash, but they're so rich, I don't think they'll lose their shirt. Meanwhile, the tech job market is f**ed either way.

-__---____-ZXyw1y ago

I started watching it, but had to stop because I was at great risk of becoming physically sick when he led with the "new move" stuff. Ugh.

Kudos to you for having the strength to get through it, and for living to tell the tale!

sexy_seedbox1y ago

> it's an area that I'm not an expert in

> idiots like Eric

Now imagine Google working with US military putting Gemini into a fleet of autonomous military drones with machine guns.

xk_id1y ago

> has this great ability to Impress An Idiot

Literally the killer app of AI.

cubano1y ago

Just wait until v2...it will probably get you laid in the local singles bar.

friendzis1y ago

> During my use of AI (yeah, I don't hate AI), I found that the current generative (I call them pattern reconstruction) systems has this great ability to Impress An Idiot

I would be genuinely positively surprised if that stops to be the case some day. This behavior is by design.

LLMs have to be peak Dunning-Krugers - by design.

wyett1y ago

We wanted a future where AIs read boring text and we wrote interesting stuff. Instead, we got…

bonoboTP1y ago

For refactoring and extending good, working code, AI is much more useful.

j / k navigate · click thread line to collapse