Think twice before typing! OpenAI admits your ChatGPT messages could land in police hands

OpenAI has confirmed that conversations on ChatGPT that indicate a risk of serious physical harm to others may be reviewed by human moderators and, in extreme cases, referred to the police.
The company outlined these measures in a recent blogpost explaining how the AI handles sensitive interactions and potential safety risks.
Rules for self-harm and threats to others
OpenAI says ChatGPT is designed to provide empathetic support to users experiencing distress, and stressed that its safeguards distinguish between self-harm and threats to others. When a user expresses suicidal intent, the AI directs them to professional resources such as the 988 Suicide & Crisis Lifeline in the US or Samaritans in the UK, but does not escalate these cases to law enforcement, in order to protect the user's privacy.
By contrast, when a user expresses intent to harm someone else, the conversation is routed to a specialised review pipeline. Human moderators trained in the company’s usage policies examine the chat, and if they identify an imminent threat, OpenAI may alert authorities. Accounts involved in such incidents can also be banned.
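To make the two-track policy concrete, here is a minimal illustrative sketch of how such a triage flow might route a flagged conversation. This is not OpenAI's actual code; the function names, risk categories, and the keyword-based review stub are all hypothetical, standing in for the classifiers and trained human moderators the blogpost describes.

```python
from enum import Enum, auto

class Risk(Enum):
    NONE = auto()
    SELF_HARM = auto()        # user at risk of harming themselves
    HARM_TO_OTHERS = auto()   # user threatening someone else

# Crisis resources mentioned in the blogpost
CRISIS_RESOURCES = {
    "US": "Call or text 988 (Suicide & Crisis Lifeline)",
    "UK": "Contact Samaritans on 116 123",
}

def human_moderator_review(conversation: str) -> bool:
    """Stand-in for the human review step: a real pipeline would queue
    the chat for moderators trained in the usage policies. Here we just
    flag an explicit keyword for illustration."""
    return "imminent" in conversation.lower()

def triage(conversation: str, risk: Risk, region: str = "US") -> str:
    """Route a flagged conversation per the policy OpenAI describes:
    self-harm -> surface crisis resources, no law-enforcement referral;
    threats to others -> human review, possible escalation and a ban."""
    if risk is Risk.SELF_HARM:
        # Privacy-preserving path: point the user to professional help.
        resource = CRISIS_RESOURCES.get(region, CRISIS_RESOURCES["US"])
        return f"Show crisis resources: {resource}"
    if risk is Risk.HARM_TO_OTHERS:
        # Escalation path: the chat goes to human moderators.
        if human_moderator_review(conversation):
            return "Refer to authorities and ban the account"
        return "Enforce usage policies (warning or ban)"
    return "No intervention"

print(triage("I'm going to hurt him, this is imminent", Risk.HARM_TO_OTHERS))
```

The key design point the sketch captures is the asymmetry: the self-harm branch never leaves the product, while the harm-to-others branch involves human judgment before any real-world escalation.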
Weaknesses in long conversations?
The company acknowledged that its safety mechanisms are more reliable in short exchanges. In long or repeated conversations, the safeguards can degrade, potentially allowing responses that conflict with safety protocols. OpenAI said it is strengthening these protections to maintain consistency across multiple interactions and to prevent gaps that could increase risk.
In addition to managing threats of harm, OpenAI is working on ways to intervene earlier for other risky behaviours, such as extreme sleep deprivation or unsafe stunts, by grounding users in reality and guiding them toward professional help. The company is also developing parental controls for teen users and exploring mechanisms to connect users to trusted contacts or licensed therapists before crises escalate.
OpenAI’s blogpost makes clear that conversations on ChatGPT are not entirely private in certain situations. If a user’s messages indicate potential danger to others, those messages may be reviewed by trained moderators and can trigger real-world interventions, including police involvement.