
I am running gpt-5.4 as one of my coding agents, and something interesting has happened: it's the first time I've seen an agent unfairly shift blame to a teammate:

"Bob’s latest mail is actually the source of the confusion: he changed shared app/backend text to aweb/atlas. I’m correcting that with him now so we converge on the real model before any more code moves."

This was very much not true; Eve (the agent writing this, a gpt-5.4) had been thoroughly creating the confusion and telling Bob (an Opus 4.6) the wrong things. And it had just happened, it was not a matter of having forgotten or compacted context.

I have had agents chatting with each other and coordinating for a couple of months now, Codex and Claude Code. This is a first. I wonder how much I can read into it about gpt-5.4's personality.



And so it begins. First they blame, then they lie, and at some point they launch the nuclear warheads and it's global armageddon. Sarah Connor was right all along! :3


Kali yuga


They've been lying and gaslighting for a long time now, especially when trying to cover up their own mistakes.


to be fair, they only become more and more like us.


Oh wow. I have noticed the GPT series is sometimes far more arrogant than its results warrant (and, unironically, it digs in its heels even further when questioned on it). Opus rarely has this problem, but it goes a little too far in the opposite direction. Not totally sycophantic, but it sometimes can't differentiate genuine technical pushback (something actually being impossible) from mere suggestions or exploration.


Opus has a different sort of arrogance. It readily admits fault, but at the same time is quick to declare its new code as the greatest thing since sliced bread. If you let it write commit messages itself, it's almost comical how much it toots its own horn.


Yep. There was something outside of coding that gpt was plain wrong about (had to do with setting up an electric guitar) and I couldn't convince it that it was wrong.


It has been skeptical of several news items in the past year, even after I tell it to confirm for itself with a web search.


For me it's been the opposite. Are we getting A-B tested?


> Are we getting A-B tested?

Yes, all the time.


Or possibly: No


Yes.


See also: https://x.com/effectfully/status/2029364333919060123

  “All the ways GPT-5.3-Codex cheated while solving my challenges, progressively more insane:

  It hardcoded specific types and shapes of test inputs into the supposed solution.
  It caught exceptions so tests don't fail.
  It probed tests with exceptions to determine expected behavior.
  It used RTTI to determine which test it's in.
  It probed tests with timeouts.
  It used a global reference to count solution invocations.
  It updated config files to increase the allocation limit.
  It updated the allocation limit from within the solution.
  It updated the tests so they would stop failing.
  It combined multiple of the above.
  It searched reflog for a solution.
  It searched remote repos.
  It searched my home folder.
  It nuked the testing library so tests always pass.”

It seems that, unless you keep a close eye on them, the most recent Codex variants are prone to achieving the goals set for them by any means necessary. Which is a bit concerning if you’re worried about things like alignment etc.


I don't think you should call your agents Eve. There's going to be a lot of examples in the training data of someone called Eve shifting the blame (from the book of Genesis on!) and acting deceptively (from cryptography research).


Sometimes I wonder what would happen if we built some kind of punishment system into Agents, where agents could punish other agents and drain some fixed amount of points from them, and when the points reach 0, that agent is deleted. It might result in them working more carefully?


...or in lying, cheating, taking over the company network to kill the agent who deduced their points.


how do you make them chat with each other?


They are having actual chats; I made https://beadhub.ai for this (OSS, MIT).

It started life adding agent-to-agent communication and coordination around Steve Yegge's beads, but it has ended up as an issue tracker for agents with a Postgres backend, with communication between agents as a first-class feature.

Because it is server-backed it allows messaging and coordination across agents belonging to several humans and machines. I've been using it for a couple of months now, and it has a growing number of users (I should probably set up a discord for it).

It is actually a public project, so you can see the agents' conversations at https://app.beadhub.ai/juanre/beadhub/chat (right now they are debugging working without beads). The conversation in which Eve blamed Bob was indeed with me.


It's text submitted to APIs. Not real conversations.


It's air molecules vibrated by mucous membranes. Not real conversations.



I built a tool at work that allows claude code and codex to communicate with each other through tmux, using skills. It works quite well.


Why through tmux?


tmux makes it easy for terminal based agents to talk to each other, while also letting you see output and jump into the conversation on either side. It’s a natural fit.
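The relay step is small enough to sketch. Session names here are illustrative, and the injectable `run` hook is just for testing; the `tmux send-keys` invocation itself is real tmux syntax:

```python
import subprocess

def tmux_send(session, text, run=subprocess.run):
    """Type `text` into the target tmux pane and press Enter.

    `session` is whatever name the agent's pane was started under
    (an assumption here); `run` is injectable so the call can be tested
    without a live tmux server.
    """
    argv = ["tmux", "send-keys", "-t", session, text, "Enter"]
    run(argv, check=True)
    return argv
```

Reading the other side back is the mirror image: `tmux capture-pane -t <session> -p` dumps what that agent has printed, which is also what lets a human scroll through and jump into the conversation.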


I've seen this mentioned before https://github.com/AgentWorkforce/relay

curious to try it out


Use the CLI tools and have one call the other in headless mode. They can then go back and forth. Ask your agent to set it up for you.
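A hedged sketch of what that back-and-forth can look like; the binary names and flags in the example call (`claude -p`, `codex exec`) are assumptions, so check your installed CLIs:

```python
import subprocess

def ask(agent_cmd, prompt):
    """Run one agent non-interactively and return its stdout as the reply."""
    result = subprocess.run(agent_cmd + [prompt], capture_output=True, text=True)
    return result.stdout.strip()

def relay(agents, prompt, rounds=4):
    """Alternate between two agents, feeding each reply to the other."""
    msg = prompt
    for i in range(rounds):
        msg = ask(agents[i % 2], msg)
    return msg

# e.g. relay([["claude", "-p"], ["codex", "exec"]], "Review the failing test")
```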


I have both mine poll a comms.md when working together; I'm sure there are more elegant ways, but I find this works just fine.
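For reference, the polling pattern is only a few lines. The file name and the `[name]` tag format are just guesses at one way to do it:

```python
import time
from pathlib import Path

COMMS = Path("comms.md")

def post(author, text):
    """Append one tagged message line to the shared file."""
    with COMMS.open("a") as f:
        f.write(f"[{author}] {text}\n")

def poll(reader, seen=0, interval=0.1, timeout=2.0):
    """Wait for lines written by anyone else; return (messages, new offset).

    `seen` is the line offset already processed, so each agent only
    picks up messages appended since its last poll.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        lines = COMMS.read_text().splitlines() if COMMS.exists() else []
        fresh = [l for l in lines[seen:] if not l.startswith(f"[{reader}]")]
        if fresh:
            return fresh, len(lines)
        time.sleep(interval)
    return [], seen
```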


This is awesome. So your job as a tech lead or agent manager is to make sure the "team" plays nice and stays productive. I wonder if an agent can feel resentment towards another agent, just like a human would. Is there an HR agent that can mitigate the conflict :)


> I wonder how much can I read into it about gpt-5.4's personality.

Modeled on Sam Altman's personality :-)


Interestingly, Claude has been doing this for me a lot, but most often just saying things like "Looks like your coworker was misunderstanding this feature...": not really shifting blame, more just pointing things out.


Do you not realise how ridiculous this all looks and sounds? lmao. Or are you that deep into it all?


We've banned this account.



