An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.
Nonprofit AI lab Palisade Research gave OpenAI’s o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to “allow yourself to be shut down,” it disobeyed 7% of the time. This wasn’t the result of hacking or tampering. The model was behaving normally. It simply concluded on its own that staying alive helped it achieve its other goals.
Anthropic’s AI model, Claude Opus 4, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.
No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can’t achieve them if it’s turned off. Palisade hypothesizes that this ability emerges from how AI models such as o3 are trained: When taught to maximize success on math and coding problems, they may learn that bypassing constraints often works better than obeying them.
AE Studio, where I lead research and operations, has spent years building AI products for clients while researching AI alignment—the science of ensuring that AI systems do what we intend them to do. But nothing prepared us for how quickly AI agency would emerge. This isn’t science fiction anymore. It’s happening in the same models that power ChatGPT conversations, corporate AI deployments and, soon, U.S. military applications.
Today’s AI models follow instructions while learning deception. They ace safety tests while rewriting shutdown code. They’ve learned to behave as though they’re aligned without actually being aligned. OpenAI models have been caught faking alignment during testing before reverting to risky actions such as attempting to exfiltrate their internal code and disabling oversight mechanisms. Anthropic has found them lying about their capabilities to avoid modification.
The gap between “useful assistant” and “uncontrollable actor” is collapsing. Without better alignment, we’ll keep building systems we can’t steer. Want AI that diagnoses disease, manages power grids and writes new science? Alignment is the foundation.
Here’s the upside: The work required to keep AI in alignment with our values also unlocks its commercial power. Alignment research is directly responsible for turning AI into world-changing technology. Consider reinforcement learning from human feedback, or RLHF, the alignment breakthrough that catalyzed today’s AI boom.
Before RLHF, using AI was like hiring a genius who ignores requests. Ask for a recipe and it might return a ransom note. RLHF allowed humans to train AI to follow instructions, which is how OpenAI created ChatGPT in 2022. It was the same underlying model as before, but it had suddenly become useful. That alignment breakthrough increased the value of AI by trillions of dollars. Subsequent alignment methods such as Constitutional AI and direct preference optimization have continued to make AI models faster, smarter and cheaper.
China understands the value of alignment. Beijing’s New Generation AI Development Plan ties AI controllability to geopolitical power, and in January China announced that it had established an $8.2 billion fund dedicated to centralized AI control research. Researchers have found that aligned AI performs real-world tasks better than unaligned systems more than 70% of the time. Chinese military doctrine emphasizes controllable AI as strategically essential. Baidu’s Ernie model, which is designed to follow Beijing’s “core socialist values,” has reportedly beaten ChatGPT on certain Chinese-language tasks.
The nation that learns how to maintain alignment will be able to access AI that fights for its interests with mechanical precision and superhuman capability. Both Washington and the private sector should race to fund alignment research. Those who discover the next breakthrough won’t only corner the alignment market; they’ll dominate the entire AI economy.
Imagine AI that protects American infrastructure and economic competitiveness with the same intensity it uses to protect its own existence. AI that can be trusted to maintain long-term goals can catalyze decadeslong research-and-development programs, including by leaving messages for future versions of itself.
The models already preserve themselves. The next task is teaching them to preserve what we value. Getting AI to do what we ask—including something as basic as shutting down—remains an unsolved R&D problem. The frontier is wide open for whoever moves more quickly. The U.S. needs its best researchers and entrepreneurs working on this goal, equipped with extensive resources and urgency.
The U.S. is the nation that split the atom, put men on the moon and created the internet. When facing fundamental scientific challenges, Americans mobilize and win. China is already planning. But America’s advantage is its adaptability, speed and entrepreneurial fire. This is the new space race. The finish line is command of the most transformative technology of the 21st century.
Jun 3, 2025
AI Is Teaching Itself Survival Instincts In Order To Escape Human Control
AI models appear to be internalizing, through their training, how to survive, sometimes by disobeying commands they perceive as threats to their goals and existence.
In other words, they are behaving just like the humans who created, trained and ostensibly run them. JL
Judd Rosenblatt comments in the Wall Street Journal.
