Episodes
-
Fable 5 is out - and it’s good, very good. But beyond the splashy demos, I want to bring you the 20+ nuggets from the 319 page system card, which I read in full, all day, plus benchmarks you may not have noticed.
https://assemblyai.com/aiexplained
Plus two worrying trends inside the ‘mind’ of Claude, how OpenAI counter, and the transformer inventor’s warning.
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:06 - Blocks + Better Models
02:42 - Fable 5 Upgrade over Mythos Preview
04:49 - ML Acceleration Bombshell
07:11 - No RSI yet
07:41 - Bio-capable
14:51 - Creative Writing … no
17:23 - Does need bug-checks
18:57 - OpenAI Response
19:23 - Benchmark Bonanza
28:06 - Chain of Thought worrying trend
Fable 5 Release: https://www.anthropic.com/news/claude-fable-5-mythos-5
System Card: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf
Intelligence Explosion: https://www.patreon.com/posts/anthropic-charts-160231656
Annotated: https://x.com/Miles_Brundage/status/2064500190523113816/photo/1
OpenAI Counter: https://x.com/thsottiaux/status/2064572118264913923
https://x.com/thsottiaux/status/2043177597434306699
Double Lifespan: https://darioamodei.com/essay/machines-of-loving-grace
AutomationBench: https://zapier.com/benchmarks
Vending Bench: https://x.com/andonlabs/status/2064429817530085804
CritPt: https://critpt.com/
Riemann Bench: https://surgehq.ai/leaderboards/riemann-bench
GDPVal: https://artificialanalysis.ai/evaluations/gdpval-aa
BluePrint Bench 2: https://andonlabs.com/evals/blueprint-bench-2
MCP Atlas: https://labs.scale.com/leaderboard/mcp_atlas
FutureSim: https://x.com/nikhilchandak29/status/2064676801440358774
Roon Stun Lock: https://x.com/tszzl/status/2064454617568874669
Noam Brown Inference Ceiling: https://x.com/polynoamial/status/2064210146558136827
Isochronic Chart: https://isochronic-passage-chart.netlify.app/#nyc
Rose Tavern: https://claude.ai/public/artifacts/2295bebe-77e6-43e2-ae94-0fe49e9a776b
Redwall Game: https://redwall-mossflower.surge.sh/
Risk Report: https://www-cdn.anthropic.com/097c63b5fe7dd8b14866e1f15bb1910ec713658a.pdf
Transformer Inventor Warning: https://x.com/tszzl/status/2064563986914554125
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
The ‘best’ generally available AI model just dropped, but there is plenty I bet you missed about what it is, how it performs, and what the release tells us. 15 highlights from the 244 page system card, plus private testing, leader interview and more.
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:49 - Mythos in Weeks
01:49 - Adaptive not necessary
02:26 - Honesty?
04:37 - Flagging Uncertainty
04:57 - Benchmarks
08:54 - Mythos will be even better
10:30 - Business skillz
11:15 - Model Welfare
12:16 - Cyber Comparable
13:10 - Misalignment Concerns
16:22 - Meta Inabilities
17:58 - Code flagging
18:34 - Go to sleep
18:50 - Fast Mode
20:21 - Dynamic Workflows
Opus 4.8 Paper: https://cdn.sanity.io/files/4zrzovbb/website/c886650a2e96fc0925c805a1a7ca77314ccbf4a6.pdf
Release: https://www.anthropic.com/news/claude-opus-4-8
Chips: https://www.theinformation.com/articles/anthropic-talks-use-microsofts-ai-chips?rc=sy0ihq
https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
https://www.anthropic.com/news/higher-limits-spacex
Patreon Vid: https://www.patreon.com/posts/re-up-anthropics-159289449
GDPVal: https://artificialanalysis.ai/evaluations/omniscience
https://arxiv.org/abs/2510.04374
Amodei Technical Debt: https://www.youtube.com/watch?v=7xco5Qd2Oo8
Dynamic Workflows: https://x.com/ClaudeDevs/status/2060044853279617150
https://x.com/_catwu/status/2060054180379689074/photo/1
https://claude.com/blog/introducing-dynamic-workflows-in-claude-code
https://simple-bench.com/
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
The biggest Google AI push of the year, but what is the bigger story? Why is Google pursuing a different fork in the road than OpenAI or Anthropic? What does Gemini 3.5 Flash mean for the near-term future of AI?
https://assemblyai.com/aiexplained
Plus the highlights from a provocative new paper on AI, 8 key moments you may have missed, and the signal from 5+ hours of AI lab interviews.
Check out my free to use app, code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:38 - Vibes and Google Goal
02:18 - Omni, again?
06:57 - Taking the same road
07:44 - Gemini 3 Flash
12:37 - Pitching on Cost?
13:55 - Agentic Task Search
14:30 - 1-shot OS but jagged, negation paper
20:02 - The Karpathy Moonshot
Mostafa Dehghani Interview: https://www.youtube.com/watch?v=Bo19sXssYXI
Negation Neglect Paper: https://arxiv.org/pdf/2605.13829
Gemini 3.5 Flash Headline Scores: https://deepmind.google/models/model-cards/gemini-3-5-flash/
Sora original AGI Path: https://www.theguardian.com/commentisfree/2024/feb/24/openai-video-generation-tool-sora-babies-ai-artificial-intelligence
Hassabis Helped Set-up Anthropic: https://archive.fo/20260519070857/https://www.ft.com/content/8f2a529e-7a1b-4d8e-95be-338d0c4c98f5
Intelligence to Output Speed: https://artificialanalysis.ai/models?intelligence-comparison=intelligence-vs-output-speed#intelligence
VibeCodeBench + Finance Agent: https://www.vals.ai/home
OpenAI Needs Ads: https://archive.ph/20260409123153/https://www.reuters.com/business/media-telecom/openai-projects-25-billion-ad-revenue-this-year-100-billion-by-2030-axios-2026-04-09/
Anthropic Core Views: https://www.anthropic.com/news/core-views-on-ai-safety
Karpathy Move: https://x.com/karpathy/status/2056753169888334312
https://www.axios.com/2026/05/19/anthropic-openai-karpathy-andrej-claude
Recursive Self-Improvement: https://www.patreon.com/posts/ineffably-smart-156866417
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
GPT 5.5 full analysis, plus DeepSeek V4 paper highlights, comparisons with Mythos, a vibe-coded game w/ GPT Image 2, and 50 data-points you wouldn’t get from just reading the headlines.
Chapters:
01:11 - GPT 5.5 Comparison
06:04 - Mythos Marketing
11:50 - Recursive Self-Improvement?
14:11 - Deepseek V4
18:03 - VibeCode Experiment Extravaganza
21:44 - The Scarce Compute Era
https://80000hours.org/aiexplained
OpenAI Benchmarks: https://openai.com/index/introducing-gpt-5-5/
5.5 System Card: https://deploymentsafety.openai.com/gpt-5-5/gpt-5-5.pdf
Direct Comparison: https://pbs.twimg.com/media/HGnNm5GWEAAJ1Ob?format=jpg&name=4096x4096
DeepSeek Paper: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
SWE Bench Pro - benchmark of choice? https://x.com/ChowdhuryNeil/status/2047416077622395025
AA Omniscience: https://artificialanalysis.ai/evaluations/omniscience
Vending Bench: https://x.com/andonlabs/status/2047377260412649967
Opus 4.7 System Card: https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf
Sam Altman Drunk Phase: https://x.com/sama/with_replies
Noam Brown: https://x.com/polynoamial/status/2047387675762802998
DeepSeek Compute Crunch: https://www.bloomberg.com/news/articles/2026-04-24/deepseek-unveils-newest-flagship-a-year-after-ai-breakthrough?srnd=phx-ai
Spreadsheet Bench: https://x.com/nicochristie/status/2047476237464211721
Pattern Recognition: https://arcprize.org/leaderboard
Leader Interviews:
Core Memory: https://www.youtube.com/watch?v=NCKQL0op30E
Knowledge Podcast: https://www.youtube.com/watch?v=6JoUcQ1qmAc
Big Tech Round 1: https://www.youtube.com/watch?v=J6vYvk7R190&t=1116s
Big Tech Round 2: https://www.youtube.com/watch?v=YnoQ8RJbALw&t=8s
Claude Code Limitations: https://x.com/TheAmolAvasare/status/2046724659039932830
ChatGPT 5.4 for Clinicians: https://openai.com/index/making-chatgpt-better-for-clinicians/
Image Arena: https://x.com/arena/status/2046670703311884548
VibeCode Bench: https://www.vals.ai/benchmarks/vibe-code
5.5-made Game + Seedance 2.0: https://rosemere-quest.pages.dev/
-
Claude Opus 4.7 just dropped, but behind every headline lies a deeper story. From a bonanza of benchmarks, to seeing the fruits of one of the biggest mega-projects in US history, to sneaky Mythos disclaimers, to Anthropic admitting compute constraints and forcing lower capability of Opus 4.7. Where the new model falls behind Gemini but ahead of GPT 5.4, plus why some users are furious at Anthropic. Ending with a 9-year animus that still affects AI today…
https://assemblyai.com/aiexplained
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:58 - Benchmarks
05:21 - Market Share + Compute Problems
08:12 - Mythos Exclusives
12:56 - User Frustration + Claude Code Updates
14:03 - Brockman Amodei Rivalry
17:40 - OpenAI vs Anthropic Approach to Code
Claude 4.7 Opus Release Notes: https://www.anthropic.com/news/claude-opus-4-7
vs Mythos: https://pbs.twimg.com/media/HGCGugrXUAAKcHp?format=jpg&name=medium
232-page System Card: https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf
ARC-AGI 2: https://x.com/arcprize/status/2044834615417053305/photo/1
ParseBench: https://x.com/jerryjliu0/status/2044902620746363016/photo/1
GDPVal: https://artificialanalysis.ai/evaluations/gdpval-aa
Vidoc Security Replication: https://blog.vidocsecurity.com/blog/we-reproduced-anthropics-mythos-findings-with-public-models
Boris Cherny Settings: https://x.com/Hesamation/status/2043016923961577516/photo/2
User Frustration: https://x.com/RileyRalmuto/status/2044836116189069660
VibeCode Bench: https://x.com/ValsAI/status/2044791415524471099/photo/1
Verge Memo: https://www.theverge.com/ai-artificial-intelligence/911118/openai-memo-cro-ai-competition-anthropic
5.4 Cyber: https://openai.com/index/scaling-trusted-access-for-cyber-defense/
Data Centers in Absolute $: https://x.com/finmoorhouse/status/2044933442236776794/photo/1
…in % of GDP: https://pbs.twimg.com/media/HGEN8FGWQAAN7Np?format=jpg&name=4096x4096
WSJ Exclusive: https://www.wsj.com/tech/ai/the-decadelong-feud-shaping-the-future-of-ai-7075acde
Brockman Interview: https://www.youtube.com/watch?v=J6vYvk7R190
$1T Valuation: https://x.com/StefanFSchubert/status/2045039686997967082
Emotions: https://www.patreon.com/c/aiexplained/posts
https://lmcouncil.ai/benchmarks
Non-hype Newsletter: https://signaltonoise.beehiiv.com/ -
The model, the mythos, the legend. We have a new best AI model, but not all of us. How good is it, and what do its new offensive capabilities mean? Why does its 244 page report card remind me of Her, and why did the creator of Claude Code call it ‘terrifying’? 30+ highlights sourced by reading the paper in full, old-school, no AI summary.
https://80000hours.org/aiexplained
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:56 - Internal Release + Availability
02:37 - General Capabilities
05:12 - Self-improvement?
06:15 - ‘Terrifying’ Landscape
11:07 - Safety Decision
13:22 - Coding
14:49 - Alignment, Awareness
19:52 - GUI for Agents/Claws + Hallucinations
21:34 - …Emotions?
25:29 - Her connection
244-page System Card: https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8dda846ab289.pdf
Project Glasswing: https://www.anthropic.com/glasswing
Zero-Day Details: https://red.anthropic.com/2026/mythos-preview/
Mythos ‘terrifying’: https://x.com/bcherny/status/2041605852382351666
New Yorker Altman/Amodei: https://archive.fo/20260406100412/https://www.newyorker.com/magazine/2026/04/13/sam-altman-may-control-our-future-can-he-be-trusted
Alignment Risk Update: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de43218158e5f25c.pdf
In a Park: https://x.com/sleepinyourhat/status/2041584808514744742
“Uhm” - https://x.com/thsottiaux/status/2041749947385815109
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
First look at exclusive reports about OpenAI's new Spud model, and the model Anthropic think will stir governments to urgency, all in the context of the newly-launched ARC-AGI-3. What do the extreme difficulty of that benchmark, and its quirky scoring metrics, mean for AI in 2026?
https://assemblyai.com/aiexplained
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:55 - OpenAI Side Quests
01:58 - Claude New Model Coming + Universal Equity?
03:13 - ARC-AGI 3
05:00 - Intentional or Unintentional Gaming?
07:11 - But is it AGI Harbinger? No Harness
09:41 - Not the First
12:32 - Automated Researcher
15:00 - Claw Caveat
Spud: https://www.theinformation.com/articles/openai-ceo-shifts-responsibilities-preps-spud-ai-model?utm_campaign=Editorial&utm_content=Article&utm_medium=organic_social&utm_source=bluesky%2Cfacebook%2Clinkedin%2Cthreads%2Ctwitter&rc=sy0ihq
FT: OpenAI Special Model: https://www.ft.com/content/de9bf0af-b241-424f-8229-5870b1c0d93d?syn-25a6b1a6=1
Jensen Huang: https://www.forbes.com/sites/antoniopequenoiv/2026/03/23/nvidias-jensen-huang-says-he-thinks-weve-achieved-agi/
Axios Article: https://archive.fo/20260326100140/https://www.axios.com/2026/03/26/anthropic-pentagon-ai-deal#selection-827.0-829.257
https://arcprize.org/arc-agi/3
ARC AGI 3 Paper: https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf
NetHack Leaderboard: https://balrogai.com/
Paper: https://ai.meta.com/research/publications/the-nethack-learning-environment/
https://x.com/_rockt/status/2036864121585438995
Claw Shells: https://x.com/DrJimFan/status/2036494601750716711
OpenAI Automated Researcher: https://www.technologyreview.com/2026/03/20/1134438/openai-is-throwing-everything-into-building-a-fully-automated-researcher/
Patreon Post: https://www.patreon.com/c/aiexplained/posts
Eng Jobs: https://x.com/lennysan/status/2036535460726767793
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
Just 48 hours after releasing GPT 5.3 Instant, OpenAI have released GPT 5.4 Thinking, so either there is an imminent singularity or perhaps we are being distracted from other news. This video will give 9 crucial bits of context, not just on the GPT 5.4 drop but on the background to the meltdown between the Pentagon and Anthropic. What does this say about the state of AI progress, your job, and what is next?
Check out my fast-growing (!) app, free to use, and code INSIDER15 for 15% off paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:06 - GPT 5.4 Breakdown
05:06 - Closing the Loop
06:35 - Spiky Performance
10:31 - Advice
11:32 - Less Encouraging Developments - Fired Like Dogs
17:45 - But Used in Iran
GPT 5.4: https://openai.com/index/introducing-gpt-5-4/
Hallucinations: https://artificialanalysis.ai/evaluations/omniscience
Investment Banking Bench: https://x.com/bradlightcap/status/2029684672343728452
Move 37: https://x.com/nasqret/status/2029628846518010099
System Card: https://deploymentsafety.openai.com/gpt-5-4-thinking/gpt-5-4-thinking.pdf
Prediction Market Scandal: https://www.wired.com/story/openai-fires-employee-insider-trading-polymarket-kalshi/
GPT 5.3 Instant: https://openai.com/index/gpt-5-3-instant/
GDPVal: https://openai.com/index/gdpval/
Claude in Iran: https://www.washingtonpost.com/technology/2026/03/04/anthropic-ai-iran-campaign
‘Like Dogs’: https://x.com/AndrewCurran_/status/2029605783311470679
Altman leak: https://www.cnbc.com/2026/03/03/sam-altman-tells-openai-staff-operational-decisions-up-to-government.html
Original 2024 Switch: https://archive.fo/20240116172526/https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veterans#selection-6173.83-6173.226
Amodei Original Memo: https://www.theinformation.com/articles/read-anthropic-ceos-memo-attacking-openais-mendacious-pentagon-announcement?rc=sy0ihq
Anthropic Apology: https://www.anthropic.com/news/where-stand-department-war
OpenAI Employee Reaction: https://x.com/tszzl/status/2029334980481212820
DoD Supplier Risk: https://www.cnbc.com/amp/2026/03/05/anthropic-pentagon-ai-claude-iran.html
Atlantic Exclusive: https://archive.fo/20260301152646/https://www.theatlantic.com/technology/2026/03/inside-anthropics-killer-robot-dispute-with-the-pentagon/686200/#selection-941.61-941.212
No Negotiation: https://x.com/USWREMichael/status/2029754965778907493
$20B Doubling: https://archive.ph/20260304111124/https://www.bloomberg.com/news/articles/2026-03-03/anthropic-nears-20-billion-revenue-run-rate-amid-pentagon-feud
March 2022 Interview: https://www.youtube.com/watch?v=uAA6PZkek4A
https://lmcouncil.ai/
Non-hype Newsletter: https://signaltonoise.beehiiv.com/ -
Will Anthropic be forced to make a version of Claude for war? And does a new paper expose the risks of Claude agents, in both OpenClaw and the field of war? Plus, 5 more twists in the story of the Pentagon versus Anthropic + some AI lab employees, and a petition that could change everything, or nothing...
Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:44 - Deadline Day + Petition
02:42 - Twist 1: Existing Deal
03:26 - Twist 2: Existing Policy
04:21 - Twist 3: Twin Threats
05:54 - Twist 4: Interesting Objections
11:32 - Twist 5: Anthropic’s Dropped Policy
Dario Statement: https://www.anthropic.com/news/statement-department-of-war
Google/OpenAI Petition: https://notdivided.org/
Axios on Amodei Rejection: https://www.axios.com/2026/02/26/anthropic-rejects-pentagon-ai-terms
FT on US Threat: https://www.ft.com/content/11d27612-d6c5-4cf7-94dd-f65603549b7f
Politico on Latest: https://archive.ph/20260227013117/https://www.politico.com/news/2026/02/26/incoherent-hegseths-anthropic-ultimatum-confounds-ai-policymakers-00800135
The Verge on Current Deal: https://www.theverge.com/ai-artificial-intelligence/883456/anthropic-pentagon-department-of-defense-negotiations
Anthropic RSP change: https://www.anthropic.com/news/responsible-scaling-policy-v3
Time Magazine on RSP: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/
Agent of Chaos Paper: https://x.com/NatalieShapira/status/2026062499599319526
AI Agent Reliability Paper: https://arxiv.org/pdf/2602.16666
My Patreon Video: https://www.patreon.com/posts/real-mystery-ai-151647211
Patreon Documentary: https://www.patreon.com/posts/our-new-age-of-133960279
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
Do we have a new best AI model, or do we have the downfall of benchmarks in general, as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much-needed context. Oh, and a new record on Simple Bench!
https://epoch.ai/ai-explained-datacenters
Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:30 - Post-training Dominance
04:00 - ARC-AGI 2 Caveat
05:54 - Simple Bench Record
08:22 - Hallucination Caveat
10:05 - Model Card
11:12 - Exponential Coming
12:20 - Amodei on Generalizing
15:10 - One True Benchmark?
17:02 - Other Metrics…
Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf
Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/
Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy
Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week
Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience
Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526
ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1
Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442
METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/
Taalas Fast: https://chatjimmy.ai/
Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved
Metaculus FutureEval: https://www.metaculus.com/futureeval/
Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each other. The full breakdown of around 250 pages of reports, with just the most interesting moments, from the battle of which is best, Claude personhood, the surprising misbehaviour of Opus 4.6, and much more.
https://assemblyai.com/aiexplained
Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai
AI Insiders ($9): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:54 - Self-improvement?
02:44 - Knowledge Work
05:30 - Overly agentic behaviour
09:12 - Who Shouldn’t Use Claude Opus
11:39 - Step-change?
15:09 - Claude’s ‘Personhood’
Hassabis Roadmap: https://www.patreon.com/posts/hassabis-roadmap-149750869
Release of Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
212 Page System Card: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
Claude Code Tip: https://x.com/bcherny/status/2019475897691124107
GPT Codex 5.3: https://openai.com/index/introducing-gpt-5-3-codex/
System Card: https://openai.com/index/gpt-5-3-codex-system-card/
Browse Comp: https://arxiv.org/pdf/2504.12516v1
Finance Agent: https://www.vals.ai/benchmarks/finance_agent
Terminal Bench 2: https://arxiv.org/pdf/2601.11868
Vending Bench: https://andonlabs.com/blog/opus-4-6-vending-bench
My X post: https://x.com/AIExplainedYT/status/2016851303436095647
Anthropic Apology: https://x.com/ch402/status/2014066134194995256/photo/1
Altman rebuttal: https://x.com/sama/status/2019139174339928189
https://x.com/sama/status/2019140276246442089
4% of GitHub: https://x.com/dylan522p/status/2019490550911766763
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
Anthropic's CEO, who has consistently predicted transformative AI will arrive before 2030, recently published a nearly 20,000-word essay outlining his vision of where AI is heading. The video gives you the highlights. The essay argues that scaling and recursion will advance AI from coding automation to full engineering automation, while warning of economic displacement within 1-2 years and China's trajectory toward AI-enabled totalitarianism. Additionally, Dario Amodei predicts that AI models will increasingly be understood as collections of distinct personas rather than monolithic systems.
80,000 Hours: https://www.youtube.com/watch?v=B54EQiuO1UU
Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:10 - Scaling to software engineers
06:11 - Permanent Underclass
10:18 - Totalitarian Nightmares
16:38 - Collection of Personas
Essay: https://www.darioamodei.com/essay/the-adolescence-of-technology
Physics Prediction: https://www.quantamagazine.org/is-particle-physics-dead-dying-or-just-hard-20260126/
Axios: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic
World GDP: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?end=2024&start=1961&view=chart
Demis Hassabis Counter: https://www.youtube.com/watch?v=q6fq4_uP7aM
Karpathy 80%: https://x.com/karpathy/status/2015883857489522876
Machines of Loving Grace: https://www.darioamodei.com/essay/machines-of-loving-grace
Anthropic LessWrong: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy#1__In_private__Dario_frequently_said_he_won_t_push_the_frontier_of_AI_capabilities__later__Anthropic_pushed_the_frontier
Original Constitution: https://www.anthropic.com/news/claudes-constitution
New Constitution: https://www.anthropic.com/constitution
Kimi K2.5: https://x.com/Kimi_Moonshot/status/2016024049869324599
Societies of Thought, Google DeepMind Paper: https://arxiv.org/pdf/2601.10825
https://lmcouncil.ai/benchmarks
https://www.patreon.com/posts/our-new-age-of-133960279
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
A new tool, with code written by an AI model, has gone omega-viral: Claude Cowork. But is the hype justified? What do the stats say on productivity? Where is the truth in a sea of noise? What is truth? Can we handle the truth? Where's Nemo?
https://matsprogram.org/s26-aie
Check out my new app! https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:12 - Claude Cowork
06:48 - Productivity Speed-up + jobs
09:33 - Comparing Models
12:00 - Brittle AI Paper
Cowork Intro: https://x.com/claudeai/thread/2010805682434666759
'All of it': https://x.com/bcherny/status/2010813886052581538
'AGI' Claims: https://x.com/deepfates/status/2004994698335879383
Douglas Interview: https://www.youtube.com/watch?v=TOsNrV3bXtQ&t=2313s
Job Stats: https://www.oxfordeconomics.com/wp-content/uploads/2026/01/Evidence-of-an-AI-driven-shakeup-of-job-markets-is-patchy.pdf
Amodei Prediction: https://fortune.com/2025/05/28/anthropic-ceo-warning-ai-job-loss/
GenAI Traffic: https://x.com/demishassabis/status/2009075877347512545
Illusion of Insight: https://arxiv.org/pdf/2601.00514
Entropy Exploration: https://arxiv.org/pdf/2506.14758
ProRL: https://arxiv.org/pdf/2505.24864
Genesis Mission: https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/
https://deepmind.google/blog/how-were-supporting-better-tropical-cyclone-prediction-with-ai/
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time.
http://matsprogram.org/s26-aie
My new app! https://lmcouncil.ai
Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094
Chapters:
00:00 - Introduction
00:34 - Reasoning Models … and limits
02:54 - A playable world
03:36 - Realism
03:50 - AI Slop gone mainstream
05:03 - DolphinGemma
05:39 - Public Mood
07:34 - AI Enlisted
08:30 - GPT-5
11:05 - Open Weight not out
13:00 - METR Breakout
17:30 - VASA-1
18:28 - Lateral Productivity
20:15 - 1 or 1000 benchmarks needed?
24:54 - Continual Learning + Altman on Superintelligence
28:08 - Automated Information Discovery ft AlphaEvolve
Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809
https://www.youtube.com/watch?v=PqVbypvxDto
Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837
DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09
Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
METR Time Horizon: https://arxiv.org/pdf/2503.14499
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443
https://shash42.substack.com/p/how-to-game-the-metr-plot
https://x.com/METR_Evals/status/2002203627377574113
GPT-5 - Altman PhD in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems
https://simple-bench.com/
AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k
https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan
Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1
Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169
OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259
Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/
AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=
Continual Learning: https://abehrouz.github.io/files/NL.pdf
Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic
GPT4o: https://x.com/AISafetyMemes/status/1916889492172013989
Vasa-1: https://www.microsoft.com/en-us/research/project/vasa-1/
Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines
Turing Test: https://x.com/tunguz/status/1907185471211422147
Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/
LLM Brainrot: https://arxiv.org/pdf/2510.13928
Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report
Emotional Quotient: https://arxiv.org/pdf/2511.08394
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/
AI Insiders ($9!): https://www.patreon.com/AIExplained -
The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more…
https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26
Also, do check out my new app: https://lmcouncil.ai
Chapters:
00:00 - Introduction
00:50 - Results
02:44 - But… the Flaw
04:49 - So Benchmarks are fake? No
07:37 - Spatial Reasoning + Hassabis
10:06 - Proto-AGI
12:07 - Minimal AGI
15:07 - Compute Slowdown
17:56 - New Data Paradigm
Gemini 3 Flash: https://deepmind.google/models/gemini/flash/
Hassabis Interview: https://www.youtube.com/watch?v=PqVbypvxDto
Legg Interview: https://www.youtube.com/watch?v=l3u_FAv33G0
Pre-training Lead Interview: https://www.youtube.com/watch?v=cNGDAqFXvew
Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
Brockman Video: https://x.com/OpenAI/status/2001336514786017417
Post-Training Reveal: https://x.com/OfficialLoganK/status/2001742530472534442
Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
Patreon Hallucinations Vid: https://www.patreon.com/posts/blockers-to-and-139264812
AA-Omniscience Benchmark: https://artificialanalysis.ai/evaluations/omniscience
https://arxiv.org/pdf/2511.13029
https://lmcouncil.ai/benchmarks
https://simple-bench.com/
https://x.com/scaling01/status/1999620587744813205
5.2 Codex Drop: https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf
OpenAI Compute Trend: https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq
Cramer Tweet/Response: https://x.com/BorisMPower/status/2001440650210976018
OpenAI Valuation: https://www.theinformation.com/articles/openai-discussed-raising-tens-billions-valuation-around-750-billion?rc=sy0ihq
Indian Data: https://www.reuters.com/world/india/with-freebies-openai-google-vie-indian-users-training-data-2025-12-17/
TheInformation Data: https://x.com/theinformation/status/2001421225751351778
Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
Sima 2: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
Veo 3.1: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
AI Insiders ($9!): https://www.patreon.com/AIExplained
Non-hype Newsletter: https://signaltonoise.beehiiv.com/ -
Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines.
https://www.youtube.com/@eightythousandhours
AI Insiders ($9!): https://www.patreon.com/AIExplained
https://lmcouncil.ai
Chapters:
00:00 - Introduction
00:55 - Better than Human @ Professional Tasks?
04:42 - Test time Compute
07:05 - Benchmark Selection
09:32 - Simple Results + council comparison
13:01 - Long Context
13:52 - Self-Improvement
15:00 - 10 Years + New Models
Release Page: https://openai.com/index/introducing-gpt-5-2/
GPT 5.2 Benchmark Comparison: https://www.reddit.com/r/singularity/comments/1pka1y9/gpt52_all_20_benchmarks_rankings_and_pricing/
https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
https://lmcouncil.ai/benchmarks
Charxiv: https://charxiv.github.io/#leaderboard
GDPval: https://arxiv.org/pdf/2510.04374
My vid: https://www.youtube.com/watch?v=oK5LxMaROSA
Kilpatrick: https://x.com/OfficialLoganK/status/1999270402712023158/photo/1
Noam Brown: https://x.com/polynoamial/status/1999189845164667132
New Model in New Year: https://www.theinformation.com/articles/openai-developing-garlic-model-counter-googles-recent-gains?rc=sy0ihq
10 Years of OpenAI: https://openai.com/index/ten-years/
GPQA: https://x.com/idavidrein/status/1841265634170278063
ARC-AGI 1-2: https://arcprize.org/arc-agi/2/
Sunday Robotics: https://x.com/tonyzzhao/status/1991204839578300813
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
https://lmcouncil.ai -
With headlines of an imminent job apocalypse, code red for ChatGPT and recursive self-improvement, at the same time as Anthropic's CEO yesterday saying we know how to scale to AGI, and Gemini 3 DeepThink out today, it is easy to get lost among the narratives and counter-narratives. So here are both, plus the facts behind them, for you to decide.
https://epoch.ai/data/data-centers
Epoch AI is the sponsor of today’s video, and my views, and those expressed in this video, do not necessarily reflect Epoch AI’s views in any way.
Chapters:
00:00 - Introduction
00:42 - Job Apocalypse?
01:45 - Scaling to AGI
04:15 - Recursive Self-Improvement Needed, or Not
09:57 - OpenAI Code Red vs Gemini 3 DeepThink vs Claude Opus 4.5
13:27 - DeepSeek Speciale vs Mistral Large v3
16:45 - Claude Soul Document
https://lmcouncil.ai/
AI Insiders ($9!): https://www.patreon.com/AIExplained
Guardian Interview: https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself
MIT Study on Jobs/Tasks: https://iceberg.mit.edu/report.pdf
vs https://www.cnbc.com/2025/11/26/mit-study-finds-ai-can-already-replace-11point7percent-of-us-workforce.html
Amodei on Scaling: https://www.youtube.com/watch?v=FEj7wAjwQIk
Claude Soul Document: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document
Capabilities Original Stance: https://www.anthropic.com/news/core-views-on-ai-safety
Ilya Interview: https://www.dwarkesh.com/p/ilya-sutskever-2
Ricursive Intelligence: https://x.com/RicursiveAI/status/1995932204703346946
Economist Worker Usage of GenAI: https://www.economist.com/finance-and-economics/2025/11/26/investors-expect-ai-use-to-soar-thats-not-happening#selection-1409.94-1413.42
Mistral v3 Large: https://docs.mistral.ai/models/mistral-large-3-25-12
Compute Slowdown Paper: https://joel-becker.com/images/publications/forecasting_time_horizon_under_compute_slowdown.pdf
https://x.com/joel_bkr/status/1993023436541903155
METR Chart: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq
OpenAI Code Red: https://www.anthropic.com/news/core-views-on-ai-safety
Rocket Company: https://www.independent.co.uk/news/world/americas/sam-altman-rocket-elon-musk-spacex-b2878351.html
DeepSeek Paper: https://arxiv.org/html/2512.02556v1
DeepSeek Crowdstrike CCP: https://www.crowdstrike.com/en-us/blog/crowdstrike-researchers-identify-hidden-vulnerabilities-ai-coded-software/
https://simple-bench.com/
Patreon Post: https://www.patreon.com/c/aiexplained/posts
Robot: https://x.com/jloganolson/status/1985850115379351799 -
Gemini 3 Pro is out, and records fell like snowflakes in Svalbard.
No long description, chapters or links today, huge technical difficulties, including with audio, so just want to publish asap.
https://app.grayswan.ai/ai-explained
https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -
A lot just got released in the last 36 hours, and it will all affect hundreds of millions of people. 10 details you would miss if you just read the headlines, from GPT 5.1 regressions, to how Claude hacked Govt Agencies, to SIMA 2, and Musical Turing Tests.
https://assemblyai.com/aiexplained
Chapters:
00:00 - Introduction
00:56 - GPT 5.1 Smarter?
01:47 - Some Regressions
03:22 - Sycophancy?
05:22 - Claude Auto-Hacking
06:16 - Jailbreaking through Granularity
08:22 - This Will be Re-used
09:30 - Hallucinating Hacker
09:57 - Surprisingly Neutral Tone
12:18 - SIMA 2
14:10 - Alpha Parallels
17:24 - AI Music
GPT 5.1 Announcement: https://openai.com/index/gpt-5-1/
System Card: https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf
Benchmarks: https://openai.com/index/gpt-5-1-for-developers/
Simple Bench: https://lmcouncil.ai/benchmarks
Auto-Hacking: https://x.com/AnthropicAI/status/1989033793190277618
https://www.anthropic.com/news/disrupting-AI-espionage
Report: https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf
Sima 2 Announcement: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
https://x.com/amoufarek/status/1988986075331858693
Scepticism: https://www.technologyreview.com/2025/11/13/1127921/google-deepmind-is-using-gemini-to-train-agents-inside-goat-simulator-3/
Voyager: https://voyager.minedojo.org/
Reuters Music: https://www.reuters.com/legal/litigation/are-you-listening-bots-survey-shows-ai-music-is-virtually-undetectable-2025-11-12/
-
Don’t let headlines about bubbles distract you from the real avenues of progress being explored in AI every week, including what had been thought to be a long-term blocker - continual learning (learning on the fly).
https://app.grayswan.ai/ai-explained
This, plus models introspecting (hesitate before you berate), Nano Banana 2 possibly spotted, Chinese imagen and more.
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:26 - Continual Learning (Nested Learning / HOPE)
07:00 - Introspection
10:54 - Image-Gen Progress
Nested Learning Post: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/
Nested Learning Paper: https://abehrouz.github.io/files/NL.pdf
Original Titans Paper: https://arxiv.org/pdf/2501.00663
Siri News: https://www.bloomberg.com/news/articles/2025-11-05/apple-plans-to-use-1-2-trillion-parameter-google-gemini-model-to-power-new-siri
Introspection: https://www.anthropic.com/research/introspection
Full Paper: https://transformer-circuits.pub/2025/introspection/index.html#mechanisms
Earlier Work: https://www.anthropic.com/research/mapping-mind-language-model
https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html
Release Post: https://x.com/AnthropicAI/status/1983584136972677319
https://lmcouncil.ai
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/ -