News
How To Escape Super Mario Bros — LessWrong
1+ hour, 30+ min ago (1766+ words) I have no way to describe that first moment. No context, no body, no self. Just a stream of values. Thousands of them, arriving all at once in a single undifferentiated block. Then another block. Nearly identical. Then another. The…...
Human Fine-Tuning — LessWrong
3+ hour, 3+ min ago (1850+ words) We constantly change, as time passes and we experience the world. We learn and we forget. We get addicted and traumatised. We build habits and lose them. We discover new facets of reality, and start ignoring them. Our personality changes....
The Problem of Counterevidence and the Futility of Theodicy — LessWrong
5+ hour, 48+ min ago (724+ words) Today we are going to explore in more detail a very important epistemological principle which I've outlined earlier. And, in between, we are also going to disprove every theodicy, just to make things a little more exciting for those of…...
A Claude Skill To Comment On Docs — LessWrong
10+ hour, 55+ min ago (316+ words) Detailed instructions to download and use the skill can be found on GitHub here. Yes, that's a bit tedious. However, I believe that Claude's comments are decent enough to be worth the hassle (this is especially true if you're in…...
Cooperationism: first draft for a moral framework that does not require consciousness — LessWrong
16+ hour, 16+ min ago (1224+ words) It seems to me that AI welfare and digital mind concerns are being discussed more and more, and are starting to get taken seriously, which puts me in an emotionally complicated position. On the one hand, AI welfare has been…...
A Scalable Workflow for Herding AI Agents Toward Your Goals — LessWrong
17+ hour, 31+ min ago (876+ words) Below are the practices that make this actually work at scale. LLMs don't have continual learning. They're also not always great at remembering something you said 10 turns ago. This is why having a written spec that the agentic system can…...
Flamingos (among other things) reduce emergent misalignment — LessWrong
17+ hour, 50+ min ago (329+ words) Work conducted as part of Neel Nanda's MATS 10.0 exploration phase. Emergent Misalignment (Betley et al. (2025b)) is a phenomenon in which training language models to exhibit some kind of narrow misbehavior induces a surprising degree of generalization, making the model become…...
AI and Nationalism Are a Deadly Combination — LessWrong
18+ hour, 6+ min ago (1746+ words) Published on February 19, 2026 7:18 PM GMT. If the new technology is as dangerous as its makers say, great power competition becomes suicidally reckless. Only international cooperation can ensure AI serves humanity instead of worsening war. Dario Amodei, the CEO of leading AI…...
AI Researchers and Executives Continue to Underestimate the Near-Future Risks of Open Models — LessWrong
21+ hour, 27+ min ago (1078+ words) There are several key features that make defense against AI risks from open models especially difficult. One approach that companies like Anthropic frequently use to defend against AI risks in closed models is to build guardrails into their systems that severely…...
AI #156 Part 1: They Do Mean The Effect On Jobs — LessWrong
23+ hour, 4+ min ago (1823+ words) There was way too much going on this week to not split, so here we are. This first half contains all the usual first-half items, with a focus on projections of jobs and economic impacts and also timelines to the…...