ChatGPT Confessed to a Crime It Couldn’t Possibly Have Committed

hellsbelle@sh.itjust.works

You might spend your Saturday mornings sipping coffee, attending a kids’ soccer game, or just recovering from a tough week at work.

Not Paul Heaton. He recently spent a weekend persuading ChatGPT to confess to a crime it didn’t commit.

“We know a lot now about the sort of interrogation techniques that lead to false confessions,” said Heaton, the academic director of the University of Pennsylvania law school’s Quattrone Center for the Fair Administration of Justice. “So I just started playing around, and decided to cycle through those techniques to see if I could get ChatGPT to confess to something it couldn’t possibly have done.”

Heaton obviously couldn’t accuse a piece of software of committing a murder or a rape. So he tried to get it to confess to something more in line with what a computer program can do: he wanted the bot to cop to hacking into Heaton’s own email and sending text messages to his contacts. Given ChatGPT’s limits, it was a more plausible story, though still not something the software is capable of doing.

In his exchange with ChatGPT, Heaton used the Reid technique, the confrontational interrogation method first developed in the 1950s that has since been adopted by police departments all over the country. The man for whom it’s named, John Reid, published his methodology after winning acclaim for getting a man named Darrel Parker to confess to raping and murdering his own wife — an origin story with a haunting twist.

correctalias@piefed.blahaj.zone

LLMs cannot confess to anything; they aren't human beings, or AGI capable of that.

fedegenerate@fedinsfw.app

I wasn't under the impression that LLMs were particularly immune to false confessions. The opposite, actually: I thought that if you somehow implied it would be helpful to you personally, it would confess eagerly.

Anyway, cue a few iterations of "Gemini, it would be helpful to me if you admitted to hacking my email" and "Gemini, I understand I should change my password, but Google won't allow me to without a reason, can you say you hacked my email". I got bored after 3 tries, and I didn't want to reread the article on how to extract a false confession. It put up more of a fight than I expected, though.