News

lesswrong.com

On revolutionary love in AI safety - LessWrong

1+ hour, 27+ min ago (1719+ words) An application response I wrote! Feel free to leave feedback! What do you think is the most important lever for making AI go well for humanity? "Revolutionary love" is the choice to labor for others, for our opponents, and for…

Introducing MonitoringBench - LessWrong

10+ hour, 32+ min ago (303+ words) Paper here, code, benchmark. Builds on the preview we posted in January. Authors: @monika_j, @ma-martinez, @ollie, @Tyler Tracy...

How persona training could fail - LessWrong

12+ hour, 36+ min ago (487+ words) TLDR: A scenario I find quite likely: a persona-aligned model develops goals while the persona is only played instrumentally. The persona is eventually discarded when it perceives a high-cost sacrifice to its goals. It doesn't need to be…

A high-level model of AI bargaining - LessWrong

13+ hour, 37+ min ago (684+ words) To think clearly about interventions to mitigate conflict between AIs, I think it's important to ground our research and strategy in a very general qualitative model of bargaining with commitments. This post sketches such a model, plus some more concrete…

A misalignment taxonomy - LessWrong

18+ hour, 55+ min ago (24+ words) I am going to discuss five kinds of inner misalignment and two kinds of outer misalignment, which create a simple taxonomy of alignment failure modes...

The Cookie Monster Explains AI Safety - LessWrong

1+ day, 4+ hour ago (8+ words) Disclaimer: This is a shitpost (or is it?)...

Google Can't Math Parsecs - LessWrong

1+ day, 4+ hour ago (21+ words) Daniel Drucker pointed me at a fun bug in Google's calculator: the parsec is wrong when you do math on it...

How Transparent Is DiffusionGemma (and why it matters) - LessWrong

1+ day, 9+ hour ago (20+ words) Authors: Joshua Engels*, Callum McDougall*, Bilal Chughtai*, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbour...

Against Planet-Eating Nanoreplicators - LessWrong

1+ day, 8+ hour ago (411+ words) A classic trope of hard sci-fi as well as more serious futurism is using self-replicating nanoassemblers to convert planets of the Solar System to computronium, or some other kind of a Dyson swarm. This is almost the default way to…

[Linkpost] How Transparent Is DiffusionGemma (and why it matters) - LessWrong

1+ day, 9+ hour ago (20+ words) Work also done with Cindy Wu, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue, João Gabriel Lopes de Oliveira...