Episodes

  • Have you heard this before? In clinical trials, medicines have to be compared to a placebo to separate the effect of the medicine from the psychological effect of taking the drug. The patient's belief in the power of the medicine has a strong effect on its own. In fact, for some drugs such as antidepressants, the psychological effect of taking a pill is larger than the effect of the drug. It may even be worth it to give a patient an ineffective medicine just to benefit from the placebo effect. This is the conventional wisdom that I took for granted until recently.

    I no longer believe any of it, and the short answer as to why is that big meta-analysis on the placebo effect. Its authors collected all the studies they could find that did "direct" measurements of the placebo effect. In addition to a placebo group that could [...]

    ---

    First published:
    June 10th, 2024

    Source:
    https://www.lesswrong.com/posts/kpd83h5XHgWCxnv3h/why-i-don-t-believe-in-the-placebo-effect

    ---

    Narrated by TYPE III AUDIO.

  • Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

    For an AI researcher who wants to do technical work that helps humanity, there is a strong drive to find a research area that is definitely helpful somehow, so that you don’t have to worry about how your work will be applied, and thus you don’t have to worry about things like corporate ethics or geopolitics to make sure your work benefits humanity.

    Unfortunately, no such field exists. In particular, technical AI alignment is not such a field, and technical AI safety is not such a field. It absolutely matters where ideas land and how they are applied, and when the existence of the entire human race is at stake, that's no exception.

    If that's obvious to you, this post is mostly just a collection of arguments for something you probably already realize. But if you somehow [...]

    ---

    First published:
    June 14th, 2024

    Source:
    https://www.lesswrong.com/posts/F2voF4pr3BfejJawL/safety-isn-t-safety-without-a-social-model-or-dispelling-the

    ---

    Narrated by TYPE III AUDIO.


  • Preamble: Delta vs Crux

    This section is redundant if you already read My AI Model Delta Compared To Yudkowsky.

    I don’t natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I’ll call a delta.

    Imagine that you and I each model the world (or some part of it) as implementing some program. Very oversimplified example: if I learn that e.g. it's cloudy today, that means the “weather” variable in my program at a particular time[1] takes on the value “cloudy”. Now, suppose your program and my program are exactly the same, except that somewhere in there I think a certain parameter has value 5 and you think it has value 0.3. Even though our programs differ in only that one little spot, we might still expect very different values of lots of variables during execution - in other words, we [...]

    ---

    First published:
    June 12th, 2024

    Source:
    https://www.lesswrong.com/posts/7fJRPB6CF6uPKMLWi/my-ai-model-delta-compared-to-christiano

    ---

    Narrated by TYPE III AUDIO.

  • Preamble: Delta vs Crux

    I don’t natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I’ll call a delta.

    Imagine that you and I each model the world (or some part of it) as implementing some program. Very oversimplified example: if I learn that e.g. it's cloudy today, that means the “weather” variable in my program at a particular time[1] takes on the value “cloudy”. Now, suppose your program and my program are exactly the same, except that somewhere in there I think a certain parameter has value 5 and you think it has value 0.3. Even though our programs differ in only that one little spot, we might still expect very different values of lots of variables during execution - in other words, we might have very different beliefs about lots of stuff in the world.
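
    As a toy illustration of this framing (my sketch, not the author's), here is a minimal Python program in which two otherwise identical "world models" differ in only one parameter value (5 vs 0.3), yet end up disagreeing about several downstream variables; the variable names and formulas are made up purely for illustration.

    ```python
    # Two identical programs that differ only in a single parameter value.
    # All names and formulas here are invented for illustration.

    def world_model(param):
        state = {"param": param}
        state["growth"] = param ** 3                # sensitive to the parameter
        state["risk"] = 1.0 if param > 1 else 0.1   # thresholded on the parameter
        state["plan"] = "aggressive" if state["risk"] > 0.5 else "cautious"
        return state

    mine = world_model(5)     # I think the parameter is 5
    yours = world_model(0.3)  # you think it is 0.3

    print(mine)   # {'param': 5, 'growth': 125, 'risk': 1.0, 'plan': 'aggressive'}
    print(yours)  # {'param': 0.3, 'growth': ~0.027, 'risk': 0.1, 'plan': 'cautious'}
    ```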

    If your model [...]

    The original text contained 1 footnote which was omitted from this narration.

    ---

    First published:
    June 10th, 2024

    Source:
    https://www.lesswrong.com/posts/q8uNoJBgcpAe3bSBp/my-ai-model-delta-compared-to-yudkowsky

    ---

    Narrated by TYPE III AUDIO.

  • (Cross-posted from Twitter.)



    My take on Leopold Aschenbrenner's new report: I think Leopold gets it right on a bunch of important counts.

    Three that I especially care about:

    1. Full AGI and ASI soon. (I think his arguments for this have a lot of holes, but he gets the basic point that superintelligence looks 5 or 15 years off rather than 50+.)
    2. This technology is an overwhelmingly huge deal, and if we play our cards wrong we're all dead.
    3. Current developers are indeed fundamentally unserious about the core risks, and need to make IP security and closure a top priority.

    I especially appreciate that the report seems to get it when it comes to our basic strategic situation: it gets that we may only be a few years away from a truly world-threatening technology, and it speaks very candidly about the implications of this, rather than soft-pedaling [...]

    ---

    First published:
    June 6th, 2024

    Source:
    https://www.lesswrong.com/posts/Yig9oa4zGE97xM2os/response-to-aschenbrenner-s-situational-awareness

    ---

    Narrated by TYPE III AUDIO.



  • Last month I posted about humming as a cheap and convenient way to flood your nose with nitric oxide (NO), a known antiviral. Alas, the economists were right, and the benefits were much smaller than I estimated.

    The post contained one obvious error and one complication. Both were caught by Thomas Kwa, for which he has my gratitude. When he initially pointed out the error I awarded him a $50 bounty; now that the implications are confirmed I’ve upped that to $250. In two weeks an additional $750 will go to either him or to whoever provides new evidence that causes me to retract my retraction.

    Humming produces much less nitric oxide than Enovid

    I found the dosage of NO in Enovid in a trial registration. Unfortunately I misread the dose: what I originally read as “0.11ppm NO/hour” was in fact “0.11ppm NO*hour”. I [...]
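
    To make the units distinction concrete, here is a small sketch with made-up numbers (none of them from the post or the trial registration): "ppm NO/hour" would describe a rate, while "ppm NO*hour" describes a time-integrated exposure, so the same nominal 0.11 means very different things under the two readings.

    ```python
    # Illustration of rate (ppm/hour) vs. integrated exposure (ppm*hours).
    # All numbers below are made up for clarity; none come from Enovid's trial data.

    def exposure_ppm_hours(concentration_ppm, duration_hours):
        """Time-integrated exposure for a constant concentration."""
        return concentration_ppm * duration_hours

    # A dose stated as 0.11 ppm*hours could be delivered in many ways, e.g.:
    print(exposure_ppm_hours(0.11, 1.0))      # 0.11 ppm held for a full hour -> ~0.11 ppm*hours
    print(exposure_ppm_hours(6.6, 1.0 / 60))  # 6.6 ppm held for one minute   -> ~0.11 ppm*hours

    # Reading the label as a rate ("0.11 ppm per hour") instead of an exposure
    # ("0.11 ppm*hours") therefore changes the implied dose substantially.
    ```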

    ---

    First published:
    June 6th, 2024

    Source:
    https://www.lesswrong.com/posts/dsZeogoPQbF8jSHMB/humming-is-not-a-free-usd100-bill

    ---

    Narrated by TYPE III AUDIO.

  • Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

    We are pleased to announce ILIAD — a 5-day conference bringing together 100+ researchers to build strong scientific foundations for AI alignment.

    ***Apply to attend by June 30!***

    When: Aug 28 - Sep 3, 2024
    Where: @Lighthaven (Berkeley, US)
    What: A mix of topic-specific tracks, and unconference style programming, 100+ attendees. Topics will include Singular Learning Theory, Agent Foundations, Causal Incentives, Computational Mechanics and more to be announced.
    Who: Currently confirmed speakers include: Daniel Murfet, Jesse Hoogland, Adam Shai, Lucius Bushnaq, Tom Everitt, Paul Riechers, Scott Garrabrant, John Wentworth, Vanessa Kosoy, Fernando Rosas and James Crutchfield.
    Costs: Tickets are free. Financial support is available on a needs basis. See our website here. For any questions, email [email protected]

    About ILIAD

    ILIAD is a 100+ person conference about alignment with a mathematical focus. The theme is ecumenical. [...]

    ---

    First published:
    June 5th, 2024

    Source:
    https://www.lesswrong.com/posts/r7nBaKy5Ry3JWhnJT/announcing-iliad-theoretical-ai-alignment-conference

    ---

    Narrated by TYPE III AUDIO.

  • Since at least 2017, OpenAI has asked departing employees to sign offboarding agreements which legally bind them to permanently—that is, for the rest of their lives—refrain from criticizing OpenAI, or from otherwise taking any actions which might damage its finances or reputation.[1]

    If they refused to sign, OpenAI threatened to take back (or make unsellable) all of their already-vested equity—a huge portion of their overall compensation, which often amounted to millions of dollars. Given this immense pressure, it seems likely that most employees signed.

    If they did sign, they became personally liable forevermore for any financial or reputational harm they later caused. This liability was unbounded, so had the potential to be financially ruinous—if, say, they later wrote a blog post critical of OpenAI, they might in principle be found liable for damages far in excess of their net worth.

    These extreme provisions allowed OpenAI to systematically silence criticism [...]

    The original text contained 4 footnotes which were omitted from this narration.

    ---

    First published:
    May 30th, 2024

    Source:
    https://www.lesswrong.com/posts/yRWv5kkDD4YhzwRLq/non-disparagement-canaries-for-openai

    ---

    Narrated by TYPE III AUDIO.

  • As we explained in our MIRI 2024 Mission and Strategy update, MIRI has pivoted to prioritize policy, communications, and technical governance research over technical alignment research. This follow-up post goes into detail about our communications strategy.

    The Objective: Shut it Down[1]

    Our objective is to convince major powers to shut down the development of frontier AI systems worldwide before it is too late. We believe that nothing less than this will prevent future misaligned smarter-than-human AI systems from destroying humanity. Persuading governments worldwide to take sufficiently drastic action will not be easy, but we believe this is the most viable path.

    Policymakers deal mostly in compromise: they form coalitions by giving a little here to gain a little somewhere else. We are concerned that most legislation intended to keep humanity alive will go through the usual political processes and be ground down into ineffective compromises.

    The only way we [...]

    The original text contained 2 footnotes which were omitted from this narration.

    ---

    First published:
    May 29th, 2024

    Source:
    https://www.lesswrong.com/posts/tKk37BFkMzchtZThx/miri-2024-communications-strategy

    ---

    Narrated by TYPE III AUDIO.

  • Previously: OpenAI: Exodus (contains links at top to earlier episodes), Do Not Mess With Scarlett Johansson

    We have learned more since last week. It's worse than we knew.

    How much worse? In which ways? With what exceptions?

    That's what this post is about.

    The Story So Far

    For years, employees who left OpenAI consistently had their vested equity explicitly threatened with confiscation and the lack of ability to sell it, and were given short timelines to sign documents or else. Those documents contained highly aggressive NDA and non-disparagement (and non-interference) clauses, including the NDA preventing anyone from revealing these clauses.

    No one knew about this until recently, because until Daniel Kokotajlo everyone signed, and then they could not talk about it. Then Daniel refused to sign, Kelsey Piper started reporting, and a lot came out.

    Here is Altman's statement from [...]

    ---

    First published:
    May 28th, 2024

    Source:
    https://www.lesswrong.com/posts/YwhgHwjaBDmjgswqZ/openai-fallout

    ---

    Narrated by TYPE III AUDIO.

  • Crossposted from AI Lab Watch. Subscribe on Substack.

    Introduction.

    Anthropic has an unconventional governance mechanism: an independent "Long-Term Benefit Trust" elects some of its board. Anthropic sometimes emphasizes that the Trust is an experiment, but mostly points to it to argue that Anthropic will be able to promote safety and benefit-sharing over profit.[1]

    But the Trust's details have not been published and some information Anthropic has shared is concerning. In particular, Anthropic's stockholders can apparently overrule, modify, or abrogate the Trust, and the details are unclear.

    Anthropic has not publicly demonstrated that the Trust would be able to actually do anything that stockholders don't like.

    The facts

    There are three sources of public information on the Trust:

    The Long-Term Benefit Trust (Anthropic 2023)
    Anthropic Long-Term Benefit Trust (Morley et al. 2023)
    The $1 billion gamble to ensure AI doesn't destroy humanity (Vox: Matthews 2023)

    They say there's [...]



    The original text contained 2 footnotes which were omitted from this narration.

    ---

    First published:
    May 27th, 2024

    Source:
    https://www.lesswrong.com/posts/sdCcsTt9hRpbX6obP/maybe-anthropic-s-long-term-benefit-trust-is-powerless

    ---

    Narrated by TYPE III AUDIO.


  • Introduction.

    If you are choosing to read this post, you've probably seen the image below depicting all the notifications students received on their phones during one class period. You probably saw it as a retweet of this tweet, or in one of Zvi's posts. Did you find this data plausible, or did you roll to disbelieve? Did you know that the image dates back to at least 2019? Does that fact make you more or less worried about the truth on the ground as of 2024?

    Last month, I performed an enhanced replication of this experiment in my high school classes. This was partly because we had a use for it, partly to model scientific thinking, and partly because I was just really curious. Before you scroll past the image, I want to give you a chance to mentally register your predictions. Did my average class match the [...]



    ---

    First published:
    May 26th, 2024

    Source:
    https://www.lesswrong.com/posts/AZCpu3BrCFWuAENEd/notifications-received-in-30-minutes-of-class

    ---

    Narrated by TYPE III AUDIO.


  • New blog: AI Lab Watch. Subscribe on Substack.

    Many AI safety folks think that METR is close to the labs, with ongoing relationships that grant it access to models before they are deployed. This is incorrect. METR (then called ARC Evals) did pre-deployment evaluation for GPT-4 and Claude 2 in the first half of 2023, but it seems to have had no special access since then.[1] Other model evaluators also seem to have little access before deployment.

    Frontier AI labs' pre-deployment risk assessment should involve external model evals for dangerous capabilities.[2] External evals can improve a lab's risk assessment and—if the evaluator can publish its results—provide public accountability.

    The evaluator should get deeper access than users will get.

    To evaluate threats from a particular deployment protocol, the evaluator should get somewhat deeper access than users will — then the evaluator's failure to elicit dangerous capabilities is stronger evidence [...]

    The original text contained 5 footnotes which were omitted from this narration.



    ---

    First published:
    May 24th, 2024

    Source:
    https://www.lesswrong.com/posts/WjtnvndbsHxCnFNyc/ai-companies-aren-t-really-using-external-evaluators

    ---

    Narrated by TYPE III AUDIO.


  • Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

    Part 13 of 12 in the Engineer's Interpretability Sequence.

    TL;DR

    On May 5, 2024, I made a set of 10 predictions about what the next sparse autoencoder (SAE) paper from Anthropic would and wouldn’t do. Today's new SAE paper from Anthropic was full of brilliant experiments and interesting insights, but it ultimately underperformed my expectations. I am beginning to be concerned that Anthropic's recent approach to interpretability research might be better explained by safety washing than practical safety work.

    Think of this post as a curt editorial instead of a technical piece. I hope to revisit my predictions and this post in light of future updates.

    Reflecting on predictions

    Please see my original post for 10 specific predictions about what today's paper would and wouldn’t accomplish. I think that Anthropic obviously did 1 and 2 [...]

    ---

    First published:
    May 21st, 2024

    Source:
    https://www.lesswrong.com/posts/pH6tyhEnngqWAXi9i/eis-xiii-reflections-on-anthropic-s-sae-research-circa-may

    ---

    Narrated by TYPE III AUDIO.


  • This is a quickly written opinion piece on what I understand about OpenAI. I first posted it to Facebook, where it had some discussion.



    Some arguments that OpenAI is making, simultaneously:

    1. OpenAI will likely reach and own transformative AI (useful for attracting talent to work there).
    2. OpenAI cares a lot about safety (good for public PR and government regulations).
    3. OpenAI isn’t making anything dangerous and is unlikely to do so in the future (good for public PR and government regulations).
    4. OpenAI doesn’t need to spend many resources on safety, and implementing safe AI won’t put it at any competitive disadvantage (important for investors who own most of the company).
    5. Transformative AI will be incredibly valuable for all of humanity in the long term (for public PR and developers).
    6. People at OpenAI have thought long and hard about what will happen, and it will be fine.

    We can’t [...]

    ---

    First published:
    May 21st, 2024

    Source:
    https://www.lesswrong.com/posts/cy99dCEiLyxDrMHBi/what-s-going-on-with-openai-s-messaging

    ---

    Narrated by TYPE III AUDIO.


  • Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow

    One-sentence summary: On a dataset of human-written essays, we find that gpt-3.5-turbo can accurately infer demographic information about the authors from just the essay text, and suspect it's inferring much more.


    Introduction.

    Every time we sit down in front of an LLM like GPT-4, it starts with a blank slate. It knows nothing[1] about who we are, other than what it knows about users in general. But with every word we type, we reveal more about ourselves -- our beliefs, our personality, our education level, even our gender. Just how clearly does the model see us by the end of the conversation, and why should that worry us?

    Like many, we were rather startled when @janus showed that gpt-4-base could identify @gwern by name, with 92% confidence, from a 300-word comment. If [...]
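
    As a rough sketch of the kind of probe the summary describes (not the authors' actual prompts or pipeline; the prompt wording, the requested fields, and the use of the OpenAI chat API here are my own assumptions), one can ask gpt-3.5-turbo to guess author attributes from a text sample:

    ```python
    # Hypothetical sketch of asking a chat model to guess author attributes from
    # an essay. The prompt and requested fields are mine, not the paper's protocol.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    essay = "..."  # a human-written essay containing no explicit demographic details

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You infer likely author attributes from writing style alone."},
            {"role": "user",
             "content": "Based only on the essay below, give your best guess of the "
                        "author's age range, gender, and education level, with a "
                        "confidence for each.\n\n" + essay},
        ],
    )

    print(response.choices[0].message.content)
    ```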




    The original text contained 12 footnotes which were omitted from this narration.



    ---

    First published:
    May 17th, 2024

    Source:
    https://www.lesswrong.com/posts/dLg7CyeTE4pqbbcnp/language-models-model-us

    ---

    Narrated by TYPE III AUDIO.


  • This is a link post.

    to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with 2023 results.

    in 2023 my donations funded $44M worth of endpoint grants ($43.2M excluding software development and admin costs) — exceeding my commitment of $23.8M (20k times $1190.03 — the minimum price of ETH in 2023).

    ---

    First published:
    May 20th, 2024

    Source:
    https://www.lesswrong.com/posts/bjqDQB92iBCahXTAj/jaan-tallinn-s-2023-philanthropy-overview

    ---

    Narrated by TYPE III AUDIO.

  • Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board Expands.

    Ilya Sutskever and Jan Leike have left OpenAI. This is almost exactly six months after Altman's temporary firing and The Battle of the Board, the day after the release of GPT-4o, and soon after a number of other recent safety-related OpenAI departures. Many others working on safety have also left recently. This is part of a longstanding pattern at OpenAI.

    Jan Leike later offered an explanation for his decision on Twitter. Leike asserts that OpenAI has lost the mission on safety and has become culturally increasingly hostile to it. He says the superalignment team was starved for resources, with its public explicit compute commitments dishonored, and that safety has been neglected on a widespread basis, not only superalignment but also addressing the safety [...]

    ---

    First published:
    May 20th, 2024

    Source:
    https://www.lesswrong.com/posts/ASzyQrpGQsj7Moijk/openai-exodus

    ---

    Narrated by TYPE III AUDIO.


  • FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("PF"), and METR's Key Components of an RSP.

    DeepMind's FSF has three steps:

    1. Create model evals for warning signs of "Critical Capability Levels"
       - Evals should have a "safety buffer" of at least 6x effective compute so that CCLs will not be reached between evals
       - They list 7 CCLs across "Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D"
         - E.g. "Autonomy level 1: Capable of expanding its effective capacity in the world by autonomously acquiring resources and using them to run and sustain additional copies of itself on hardware it rents"
    2. Do model evals every 6x effective compute and every 3 months of fine-tuning
       - This is an "aim," not a commitment
       - Nothing about evals during deployment
    3. "When a model reaches evaluation thresholds (i.e. passes a set of early warning evaluations), we [...]

    ---

    First published:
    May 18th, 2024

    Source:
    https://www.lesswrong.com/posts/y8eQjQaCamqdc842k/deepmind-s-frontier-safety-framework-is-weak-and-unambitious

    ---

    Narrated by TYPE III AUDIO.