News

lesswrong.com

How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors? - LessWrong

53+ min ago (1731+ words) Thanks to Eric Gan and Aghyad Deeb for feedback on a draft of this post. When is a "deceptively aligned" policy capable of surviving training? Answers to this question could be useful for a number of reasons: maybe they'd tell…

Opus 4.7 Part 1: The Model Card - LessWrong

1+ hour ago (1805+ words) Less than a week after completing coverage of Claude Mythos, here we are again as Anthropic gives us Claude Opus 4.7. So here we are, with another 232 pages of light reading. This post covers the first six sections of the Model…

Gemma Gets Help: Mitigating Frustration and Self-Deletion with Consistency Training - LessWrong

1+ hour, 44+ min ago (947+ words) This was work done by Neil Shah and supervised by David Africa as part of the SPAR Research Fellowship. Soligo et al. (2026) found that various Gemma and Gemini models became frustrated after being rejected several times on a diverse problem…

9 kinds of hard-to-verify tasks - LessWrong

3+ hours, 7+ min ago (1122+ words) Introduction: Some people talk about "hard-to-verify tasks" and "easy-to-verify tasks" like these are both natural kinds. But I think splitting tasks into "easy-to-verify" and "hard-to-verify" is like splitting birds into ravens and non-ravens. Easy-to-verify tasks are easy for the same…

Why clinical trials are broken & how to fix them: a reading list - LessWrong

4+ hours, 8+ min ago (1128+ words) 12 articles, including 4 podcasts. Since the 1950s, the cost of developing a new drug has increased by ~80x. It now costs on the order of a billion dollars to get one drug approved (including the cost of failures). Consequently, fewer drugs get invented,…

Pivotal Research Fellowship applications are open (deadline May 3) - LessWrong

4+ hours, 38+ min ago (452+ words) AI may be the most consequential technology humanity builds, and whether it goes well depends in large part on how many talented people are working seriously on making it go well. The Pivotal Research Fellowship (a 9-week in-person research program in…

Automating philosophy if Timothy Williamson is correct - LessWrong

4+ hours, 16+ min ago (307+ words) Timothy Williamson[1] thinks that philosophy[2] is far less distinct as a science than many people believe, including philosophers themselves. I've read a bunch of his stuff, and here are the claims I think constitute his view: Williamson typically argues by…

CLR's Safe Pareto Improvements Research Agenda - LessWrong

8+ hours, 22+ min ago (1696+ words) What do SPIs look like? The rough idea is to mitigate the costs of conflict, but commit to bargain as if the costs were the same. Two key examples: Later, we'll come back to the question of when agents would…

My Last 7 Blog Posts: a weekly round-up - LessWrong

10+ hours, 40+ min ago (321+ words) This is a weekly round-up of things I've posted in the last week. So now you get to catch up! You can even be selective if you prefer :) Contra Leicht on AI Pauses takes apart Anton Leicht's piece arguing we…

Quality Matters Most When Stakes are Highest - LessWrong

10+ hours, 57+ min ago (525+ words) Or, the end of the world is no excuse for sloppy work. One morning when I was nine, my dad called me over to his computer. He wanted to show me this amazing Korean scientist who had managed to clone…