News

lesswrong.com

How To Escape Super Mario Bros — LessWrong

1+ hour, 30+ min ago  (1766+ words) I have no way to describe that first moment. No context, no body, no self. Just a stream of values. Thousands of them, arriving all at once in a single undifferentiated block. Then another block. Nearly identical. Then another. The…...

lesswrong.com

Human Fine-Tuning — LessWrong

3+ hour, 3+ min ago  (1850+ words) We constantly change, as time passes and we experience the world. We learn and we forget. We get addicted and traumatised. We build habits and lose them. We discover new facets of reality, and start ignoring them. Our personality changes....

lesswrong.com

The Problem of Counterevidence and the Futility of Theodicy — LessWrong

5+ hour, 48+ min ago  (724+ words) Today we are going to explore in more detail a very important epistemological principle which I outlined earlier. And, in between, we are also going to disprove every theodicy, just to make things a little more exciting for those of…...

lesswrong.com

A Claude Skill To Comment On Docs — LessWrong

10+ hour, 55+ min ago  (316+ words) Detailed instructions to download and use the skill can be found on GitHub here. Yes, that's a bit tedious. However, I believe that Claude's comments are decent enough to be worth the hassle (this is especially true if you're in…...

lesswrong.com

Cooperationism: first draft for a moral framework that does not require consciousness — LessWrong

16+ hour, 16+ min ago  (1224+ words) It seems to me that AI welfare and digital mind concerns are being discussed more and more, and are starting to get taken seriously, which puts me in an emotionally complicated position. On the one hand, AI welfare has been…...

lesswrong.com

A Scalable Workflow for Herding AI Agents Toward Your Goals — LessWrong

17+ hour, 31+ min ago  (876+ words) Below are the practices that make this actually work at scale. LLMs don't have continual learning. They're also not always great at remembering something you said 10 turns ago. This is why having a written spec that the agentic system can…...

lesswrong.com

Flamingos (among other things) reduce emergent misalignment — LessWrong

17+ hour, 50+ min ago  (329+ words) Work conducted as part of Neel Nanda's MATS 10.0 exploration phase. Emergent Misalignment (Betley et al., 2025b) is a phenomenon in which training language models to exhibit some kind of narrow misbehavior induces a surprising degree of generalization, making the model become…...

lesswrong.com

AI and Nationalism Are a Deadly Combination — LessWrong

18+ hour, 6+ min ago  (1746+ words) Published on February 19, 2026, 7:18 PM GMT. If the new technology is as dangerous as its makers say, great power competition becomes suicidally reckless. Only international cooperation can ensure AI serves humanity instead of worsening war. Dario Amodei, the CEO of leading AI…...

lesswrong.com

AI Researchers and Executives Continue to Underestimate the Near-Future Risks of Open Models — LessWrong

21+ hour, 27+ min ago  (1078+ words) There are several key features that make defense against AI risks from open models especially difficult. One approach that companies like Anthropic frequently use to defend against AI risks in closed models is to build 'guardrails' into their systems that severely…...

lesswrong.com

AI #156 Part 1: They Do Mean The Effect On Jobs — LessWrong

23+ hour, 4+ min ago  (1823+ words) There was way too much going on this week to not split, so here we are. This first half contains all the usual first-half items, with a focus on projections of jobs and economic impacts and also timelines to the…...