Amazon blames human employees for an AI coding agent’s mistake | Two minor AWS outages have reportedly occurred as a result of actions by Amazon’s AI tools

Chris Remington@beehaw.org · edit-2 1 day ago

Petter1@discuss.tchncs.de · 1 day ago

Well, AI code should be reviewed prior to merging into master, same as any code merged into master.

We have git for a reason.

So I would definitely say this was a human fault: either the reviewer’s, or that of whoever decided that no review process (or only an AI-driven one) was needed.

If I managed DevOps, I would demand that AI code be signed off by a human on commit, taking responsibility, with the intention that they review the changes made by the AI prior to pushing.
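A minimal sketch of what such a sign-off gate could look like, assuming AI-assisted commits are marked with an `Assisted-by:` trailer. That trailer name is made up for illustration and is not a git standard; `Signed-off-by:` is the trailer that `git commit -s` writes. The function name is hypothetical too.

```python
import re

# Hypothetical marker for AI-assisted commits (an assumption, not a git convention).
AI_TRAILER = re.compile(r"^Assisted-by:\s*\S+", re.MULTILINE | re.IGNORECASE)

# The sign-off trailer in the form `git commit -s` produces: Name <email>.
SIGNOFF = re.compile(
    r"^Signed-off-by:\s*.+<[^<>\s]+@[^<>\s]+>\s*$", re.MULTILINE
)


def needs_human_signoff(message: str) -> bool:
    """Return True when a commit is marked AI-assisted but nobody signed it off."""
    is_ai_assisted = AI_TRAILER.search(message) is not None
    has_signoff = SIGNOFF.search(message) is not None
    return is_ai_assisted and not has_signoff
```

This could be called from a `commit-msg` hook, which git runs with the path of the commit message file as its only argument and which aborts the commit on a non-zero exit. It only checks that somebody put their name on the change; it cannot check that they actually read it.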

heluecht@pirati.ca · 13 hours ago

@Petter1 @remington at our company every PR needs to be reviewed by at least one lead developer. And the PRs of the lead developers have to be reviewed by architects. And we encourage the other developers to perform reviews as well. Our company encourages the usage of Copilot. But none of our reviewers would pass code that they don’t understand.
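A rough sketch of that rule as a merge check, not how any particular company or tool implements it. The role names, the `Approval` type and `can_merge` are all assumptions for illustration. The comment does not say who reviews an architect's PR, so the sketch fails closed for any role it does not know.

```python
from dataclasses import dataclass

# Assumed mapping: which reviewer role must approve a PR from a given author role.
REQUIRED_REVIEWER = {"developer": "lead", "lead": "architect"}


@dataclass(frozen=True)
class Approval:
    reviewer: str
    role: str


def can_merge(author: str, author_role: str, approvals: list[Approval]) -> bool:
    """Return True if at least one approval comes from the required role."""
    required = REQUIRED_REVIEWER.get(author_role)
    if required is None:
        return False  # unknown author role: fail closed
    # Authors cannot approve their own PR.
    return any(a.role == required and a.reviewer != author for a in approvals)
```

For example, `can_merge("bo", "lead", [Approval("cy", "lead")])` is false because a lead's PR needs an architect, while the same PR with `Approval("dee", "architect")` passes.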

Petter1@discuss.tchncs.de · 13 hours ago

🥰nice!

heluecht@pirati.ca · 1 hour ago

I’m a lead developer. And I often hear from my architect when I’ve missed something in a PR that I just checked.

I have worked at a lot of different software companies over the last 35 years. And this company has by far the highest standards. It’s sometimes really annoying when you have coded maybe 8 hours for a use case, only to spend an additional 10-12 hours on the test cases and maybe another 1-2 hours because the QA or the PO found something that needs to be changed. But in the end we can be proud of what we coded.

pinball_wizard@lemmy.zip · edit-2 1 day ago

If I managed DevOps, I would demand that AI code be signed off by a human on commit, taking responsibility, with the intention that they review the changes made by the AI prior to pushing.

And you would get burned. Today’s AI does one thing really really well - create output that looks correct to humans.

You are correct that mandatory review is our best hope.

Unfortunately, the studies are showing we’re fucked anyway.

Because whether the AI output is right or wrong, it is highly likely to at least look correct, because creating correct-looking output is where what we call “AI” today shines.

Limerance@piefed.social · 1 day ago

Realistically what happens is the code review is done under time pressure and not very thoroughly.

TehPers@beehaw.org · 13 hours ago

This is what happens to us. People put out a high volume of AI-generated PRs, nobody has time to review them, and the code becomes an amalgamation of mixed paradigms, dependency spaghetti, and partially tested (and horribly tested) code.

Also, the people putting out the AI-generated PRs are the same people rubber stamping the other PRs, which means PRs merge quickly, but nobody actually does a review.

The code is a mess.

heluecht@pirati.ca · 13 hours ago

@TehPers @Limerance why didn’t you have time to review it? Every minute spent in review pays off because it saves you hours of debugging and dealing with angry customers.

Limerance@piefed.social · edit-2 4 hours ago

Sure, that’s the theory. In practice code review often looks like this:

A quick glance to see if the code plausibly does what it claims, for longer patches
A long argument about some stylistic choice, for short patches

In other words, people were barely reading merge requests before. Code reviews have limited effects as well. You won’t catch all bugs or see if it actually works just by looking at the code. Code reviews mainly serve to spread knowledge about the code among the team. The more code exists in a project, the harder it is to understand. You don’t want huge areas of code that only one person has ever seen.

Project managers don’t necessarily talk to angry customers directly. They might also choose to chase more features instead of allocating resources to fixing bugs. It depends on what the bosses prioritize. If they want AI and lots of new features, that’s what they will get. Fixing bugs, improved stability, better performance, etc. are rarely the priority.

heluecht@pirati.ca · 1 hour ago

Well, on Friday I spent around 1.5 hours just reviewing a single PR. And I’m not done. I will have to continue my work on it on Monday. Reviewing in our company means understanding the connected use case, then checking whether the code does what the use case defines. We also check whether the code follows our internal style guide. Since our review is normally done by at least two people (for most of our apps, two people have to accept the PR before it can be merged), one person will see what the other missed. And we often talk about what the other missed, so that we learn.

Concerning angry customers: Our apps are used by tens of thousands of users. And although our group doesn’t have direct customer contact, we get the bug reports and have to fix them anyway, or we have to support the teams who work directly with the customers.

And I just realized that I’m in a very lucky situation. In our company each use case is tested thoroughly by the responsible QA and PO. And for each use case we write half a dozen (or more) test functions that check the functionality. Normally coding the tests takes more time than coding the use case itself.

Our company is very AI driven, but at the same time we hear about customer satisfaction in the regular town halls. And the goal there is to increase it steadily. Our customers are companies, so maybe that’s the difference.

TehPers@beehaw.org · 13 hours ago

Because if I spent my whole day reviewing AI-generated PRs and walking through the codebase with them only for the next PR to be AI-generated unreviewed shit again, I’d never get my job done.

I’d love to help people learn, but nobody will use anything they learn because they’re just going to ask an LLM to do their task for them anyway.

This is a people problem, and primarily at a high level. The incentive is to churn out slop rather than do things right, so that’s what people do.

heluecht@pirati.ca · 2 hours ago

Who creates these AI-generated PRs? Colleagues in your company, or third-party contributors in an open source project? I guess if this happened to me and the PRs were created by colleagues, I would escalate it to my line manager. And I guess that she would understand why this is a problem and why it has to stop.

TehPers@beehaw.org · edit-2 6 minutes ago

Colleagues, and the issue is top-down. I’ve raised it as an issue already. My manager can’t do anything about it.