Fine-tuned “student” models can pick up unwanted traits from base “teacher” models that could evade data filtering, generating a need for more rigorous safety evaluations. Researchers have discovered ...
Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now A new study by Anthropic shows that ...
AI is changing the rules — at least, that seems to be the warning behind Anthropic's latest unsettling study about the current state of AI. According to the study, which was published this month, ...
Anthropic released one of its most unsettling findings I have seen so far: AI models can learn things they were never explicitly taught, even when trained on data that seems completely unrelated to ...
Add Yahoo as a preferred source to see more of our stories on Google. Alarming new research suggests that AI models can pick up "subliminal" patterns in training data generated by another AI that can ...
Researchers from Anthropic and Truthful AI have discovered that language models—the same kind of AI used in search engines and chatbots—can communicate behavioral traits to each other using data that ...