NEW YORK – Our lives were already infused with artificial intelligence (AI) when ChatGPT reverberated around the online world late last year. Since then, the generative AI system developed by tech company OpenAI has gathered speed, and experts have escalated their warnings about the risks.
Meanwhile, chatbots started going off-script and talking back, duping other bots, and acting strangely, sparking fresh concerns about how close some AI tools are getting to human-like intelligence.
To judge that, the Turing Test has long served as the fallible standard for determining whether machines exhibit intelligent behavior that passes as human. But with this latest wave of AI creations, it feels like we need something more to gauge their fast-evolving capabilities.
Now, an international team of computer scientists – including a member of OpenAI’s Governance unit – has been testing the point at which large language models (LLMs) like ChatGPT might develop abilities that suggest they could become aware of themselves and their circumstances.
We’re told that today’s LLMs, including ChatGPT, are tested for safety, with human feedback incorporated to improve their generative behavior. Recently, however, security researchers made quick work of jailbreaking new LLMs to bypass their safety systems. Cue phishing emails and statements supporting violence.
More at Science Alert