Topic archive

Safety AI news

Security, reliability, governance, and operational trust. This page collects the latest briefings that match the topic so readers can follow one area without scanning the full feed.

Indexed briefings

Latest source-linked updates, ordered newest first.

Latest

Recent briefings

Anthropic|Jul 1, 2026|1 min read

Redeploying Fable 5

Anthropic is redeploying Claude Fable 5 starting July 1 following the lifting of export controls, with updated cybersecurity safeguards and a new industry jailbreak frame...

Announcements Security

OpenAI|Jun 26, 2026|1 min read

Previewing GPT-5.6 Sol: a next-generation model

OpenAI previews GPT-5.6 Sol, a next-generation model with stronger capabilities in coding, science, and cybersecurity, paired with its most advanced safety stack.

Product Security Safety

arXiv|Jun 25, 2026|1 min read

Privacy Vulnerabilities of Attention Layers in Tabular Foundation Models and Protection of High-Risk Queries

AI Healthcare Transformers

Anthropic|Jun 23, 2026|1 min read

Introducing Claude Tag

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Product Safety

OpenAI|Jun 23, 2026|1 min read

Helping build shared standards for advanced AI

OpenAI helps build shared standards for advanced AI, supporting evaluation frameworks, safety practices, and global cooperation through the Appia Foundation.

Global Affairs Safety

OpenAI|Jun 22, 2026|1 min read

Patch the Planet: a Daybreak initiative to support open source maintainers

OpenAI introduces Patch the Planet, a Daybreak initiative helping open-source maintainers find, validate, and fix vulnerabilities with AI and expert review.

Security

OpenAI|Jun 22, 2026|1 min read

Daybreak: Tools for securing every organization in the world

OpenAI introduces new Daybreak tools, including Codex Security and GPT-5.5-Cyber, to help organizations find, validate, and patch vulnerabilities at scale.

Security Codex GPT-5.5

Anthropic|Jun 16, 2026|1 min read

Core views on AI safety: When, why, what, and how

AI progress may lead to transformative AI systems in the next decade, but we do not yet understand how to make such systems safe and aligned with human values.

Announcements Safety

arXiv|Jun 10, 2026|1 min read

Fault Lines: Navigating Ethics and Responsible AI Where National Policy Meets Local Practice in Public Sector Transformation

Civic Tech AI Healthcare

arXiv|Jun 10, 2026|1 min read

Democracy in the Era of Artificial Intelligence

Civic Tech AI Healthcare

Anthropic|Jun 9, 2026|1 min read

Policy on the AI Exponential

Anthropic published an Advanced AI Framework and an Economic Policy Framework, arguing that AI progress is moving faster than current policymaking institutions.

Policy Safety Governance

Anthropic|Jun 8, 2026|1 min read

Claude Fable 5 and Claude Mythos 5

Anthropic launched Claude Fable 5 for general use and Claude Mythos 5 for a smaller trusted-access group.

Announcements Security Infrastructure

OpenAI|Jun 3, 2026|1 min read

Biodefense in the Intelligence Age

OpenAI published a biodefense action plan focused on using advanced AI to strengthen biological resilience.

Global Affairs Biodefense Biosecurity

Anthropic|Jun 3, 2026|1 min read

What we learned mapping a year’s worth of AI-enabled cyber threats

As AI transforms the nature of and methods behind cyberattacks, how well do the techniques and frameworks used by the security community hold up?

Policy Security

OpenAI|Jun 3, 2026|1 min read

A blueprint for democratic governance of frontier AI

OpenAI outlines a blueprint for U.S. ...

Global Affairs Security Safety

OpenAI|Jun 3, 2026|1 min read

OpenAI public policy agenda

OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society.

Global Affairs Safety

OpenAI|Jun 1, 2026|1 min read

Advancing youth safety and opportunity through global leadership

OpenAI called for global action on youth AI safety through a dedicated AI Safety Institute and laid out principles for age-appropriate protections.

Global Affairs Youth Safety Safety

Anthropic|Jun 1, 2026|1 min read

Expanding Project Glasswing

Anthropic is expanding Project Glasswing from roughly 50 initial partners to about 150 organizations after several weeks of collaboration with partners, open-source maint...

Announcements Security Cybersecurity

OpenAI|Jun 1, 2026|1 min read

Our views on AI policy and political advocacy

OpenAI describes its approach to AI policy and political advocacy: transparency, support for thoughtful regulation and AI safety, and the position that no outside political group speaks on the company...

Global Affairs Safety

OpenAI|May 28, 2026|1 min read

A shared playbook for trustworthy third party evaluations

OpenAI says frontier-model evaluations need explicit details on harnesses, tools, budgets, and scoring rules to be interpretable.

Safety Evaluations Governance

OpenAI|May 28, 2026|1 min read

Strengthening societal resilience with Rosalind Biodefense

OpenAI launched Rosalind Biodefense to help trusted developers build biodefense and pandemic-preparedness tools with GPT-Rosalind.

Biodefense Public Health Safety

arXiv|May 28, 2026|1 min read

Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers

AI Healthcare Diffusion

OpenAI|May 27, 2026|1 min read

OpenAI’s Frontier Governance Framework

OpenAI published a Frontier Governance Framework that explains how its safety and security practices align with emerging legal requirements.

Safety Governance Policy

OpenAI|May 26, 2026|1 min read

Election information and safeguards in 2026

OpenAI says it is expanding election-year safeguards in 2026 to surface reliable voting information, support cyber defenders, and increase transparency around AI-generate...

Global Affairs Elections Security

arXiv|May 26, 2026|1 min read

Prompt Injection Detection is Regime-Dependent: A Deployment-Aware Evaluation with Interpretable Structural Signals

Evaluates prompt injection detection across multiple deployment regimes instead of a single benchmark setting.

NLP Security Reliability

Anthropic|May 21, 2026|1 min read

Project Glasswing: An initial update

Anthropic shared an initial update on Project Glasswing, its effort to secure critical software with frontier AI and a cross-industry group of launch partners.

Announcements Security Infrastructure

Anthropic|May 19, 2026|1 min read

Widening the conversation on frontier AI

Announcements Safety

OpenAI|May 18, 2026|1 min read

Advancing content provenance for a safer, more transparent AI ecosystem

OpenAI said it is expanding provenance signals for AI-generated media through C2PA-compatible Content Credentials, Google SynthID watermarking for images, and an early pu...

Safety Transparency Verification

arXiv|May 17, 2026|1 min read

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

NLP Healthcare Transformers

Anthropic|May 15, 2026|1 min read

PwC is deploying Claude to build technology, execute deals, and reinvent enterprise functions for clients

Announcements Safety Enterprise

Anthropic|May 14, 2026|1 min read

Anthropic forms $200 million partnership with the Gates Foundation

Announcements Safety Partnerships

OpenAI|May 13, 2026|1 min read

Helping ChatGPT better recognize context in sensitive conversations

New ChatGPT safety updates improve context awareness in sensitive conversations, helping the model detect risk over time and respond more safely.

Safety

OpenAI|May 12, 2026|1 min read

Our response to the TanStack npm supply chain attack

OpenAI details its response to the TanStack “Mini Shai-Hulud” supply chain attack, outlines protections taken to secure systems and signing certificates, and explains why...

Security

arXiv|May 11, 2026|1 min read

On What We Can Learn from Low-Resolution Data

Machine Learning Healthcare Transformers

OpenAI|May 8, 2026|1 min read

Running Codex safely at OpenAI

OpenAI describes how it runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption.

Security Codex Coding