Episodes

  • Fable 5 is out - and it’s good, very good. But beyond the splashy demos, I want to bring you the 20+ nuggets from the 319 page system card, which I read in full, all day, plus benchmarks you may not have noticed.

    https://assemblyai.com/aiexplained

    Plus two worrying trends inside the ‘mind’ of Claude, how OpenAI counter, and the transformer inventor’s warning.



    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    01:06 - Blocks + Better Models
    02:42 - Fable 5 Upgrade over Mythos Preview
    04:49 - ML Acceleration Bombshell
    07:11 - No RSI yet
    07:41 - Bio-capable
    14:51 - Creative Writing … no
    17:23 - Does need bug-checks
    18:57 - OpenAI Response
    19:23 - Benchmark Bonanza
    28:06 - Chain of Thought worrying trend

    Fable 5 Release: https://www.anthropic.com/news/claude-fable-5-mythos-5

    System Card: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf

    Intelligence Explosion: https://www.patreon.com/posts/anthropic-charts-160231656

    Annotated: https://x.com/Miles_Brundage/status/2064500190523113816/photo/1

    OpenAI Counter: https://x.com/thsottiaux/status/2064572118264913923
    https://x.com/thsottiaux/status/2043177597434306699

    Double Lifespan: https://darioamodei.com/essay/machines-of-loving-grace

    AutomationBench: https://zapier.com/benchmarks
    Vending Bench: https://x.com/andonlabs/status/2064429817530085804
    CritPt: https://critpt.com/
    Riemann Bench: https://surgehq.ai/leaderboards/riemann-bench
    GDPVal: https://artificialanalysis.ai/evaluations/gdpval-aa
    BluePrint Bench 2: https://andonlabs.com/evals/blueprint-bench-2
    MCP Atlas: https://labs.scale.com/leaderboard/mcp_atlas
    FutureSim: https://x.com/nikhilchandak29/status/2064676801440358774

    Roon Stun Lock: https://x.com/tszzl/status/2064454617568874669

    Noam Brown Inference Ceiling: https://x.com/polynoamial/status/2064210146558136827

    Isochronic Chart: https://isochronic-passage-chart.netlify.app/#nyc
    Rose Tavern: https://claude.ai/public/artifacts/2295bebe-77e6-43e2-ae94-0fe49e9a776b
    Redwall Game: https://redwall-mossflower.surge.sh/

    Risk Report: https://www-cdn.anthropic.com/097c63b5fe7dd8b14866e1f15bb1910ec713658a.pdf

    Transformer Inventor Warning: https://x.com/tszzl/status/2064563986914554125



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • The ‘best’ generally available AI model just dropped, but there is plenty I bet you missed about what it is, how it performs, and what the release tells us. 15 highlights from the 244 page system card, plus private testing, leader interview and more.

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:49 - Mythos in Weeks
    01:49 - Adaptive not necessary
    02:26 - Honesty?
    04:37 - Flagging Uncertainty
    04:57 - Benchmarks
    08:54 - Mythos will be even better
    10:30 - Business skillz
    11:15 - Model Welfare
    12:16 - Cyber Comparable
    13:10 - Misalignment Concerns
    16:22 - Meta Inabilities
    17:58 - Code flagging
    18:34 - Go to sleep
    18:50 - Fast Mode
    20:21 - Dynamic Workflows


    Opus 4.8 Paper: https://cdn.sanity.io/files/4zrzovbb/website/c886650a2e96fc0925c805a1a7ca77314ccbf4a6.pdf

    Release: https://www.anthropic.com/news/claude-opus-4-8


    Chips: https://www.theinformation.com/articles/anthropic-talks-use-microsofts-ai-chips?rc=sy0ihq
    https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services
    https://www.anthropic.com/news/higher-limits-spacex

    Patreon Vid: https://www.patreon.com/posts/re-up-anthropics-159289449

    GDPVal: https://artificialanalysis.ai/evaluations/omniscience
    https://arxiv.org/abs/2510.04374

    Amodei Technical Debt: https://www.youtube.com/watch?v=7xco5Qd2Oo8

    Dynamic Workflows: https://x.com/ClaudeDevs/status/2060044853279617150
    https://x.com/_catwu/status/2060054180379689074/photo/1
    https://claude.com/blog/introducing-dynamic-workflows-in-claude-code

    https://simple-bench.com/


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • The biggest Google AI push of the year, but what is the bigger story? Why is Google pursuing a different fork in the road than OpenAI or Anthropic? What does Gemini 3.5 Flash mean for the near-term future of AI?

    https://assemblyai.com/aiexplained

    Plus the highlights from a provocative new paper on AI, 8 key moments you may have missed, and the signal from 5+ hours of AI lab interviews.



    Check out my free to use app, code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:38 - Vibes and Google Goal
    02:18 - Omni, again?
    06:57 - Taking the same road
    07:44 - Gemini 3 Flash
    12:37 - Pitching on Cost?
    13:55 - Agentic Task Search
    14:30 - 1-shot OS but jagged, negation paper
    20:02 - The Karpathy Moonshot

    Mostafa Dehghani Interview: https://www.youtube.com/watch?v=Bo19sXssYXI

    Negation Neglect Paper: https://arxiv.org/pdf/2605.13829

    Gemini 3.5 Flash Headline Scores: https://deepmind.google/models/model-cards/gemini-3-5-flash/

    Sora Original AGI Path: https://www.theguardian.com/commentisfree/2024/feb/24/openai-video-generation-tool-sora-babies-ai-artificial-intelligence

    Hassabis Helped Set Up Anthropic: https://archive.fo/20260519070857/https://www.ft.com/content/8f2a529e-7a1b-4d8e-95be-338d0c4c98f5

    Intelligence to Output Speed: https://artificialanalysis.ai/models?intelligence-comparison=intelligence-vs-output-speed#intelligence

    VibeCodeBench + Finance Agent: https://www.vals.ai/home

    OpenAI Needs Ads: https://archive.ph/20260409123153/https://www.reuters.com/business/media-telecom/openai-projects-25-billion-ad-revenue-this-year-100-billion-by-2030-axios-2026-04-09/

    Anthropic Core Views: https://www.anthropic.com/news/core-views-on-ai-safety

    Karpathy Move: https://x.com/karpathy/status/2056753169888334312
    https://www.axios.com/2026/05/19/anthropic-openai-karpathy-andrej-claude

    Recursive Self-Improvement: https://www.patreon.com/posts/ineffably-smart-156866417



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • GPT 5.5 full analysis, plus DeepSeek V4 paper highlights, comparisons with Mythos, a vibe-coded game w/ GPT Image 2, and 50 data-points you wouldn’t get from just reading the headlines.

    Chapters:

    01:11 - GPT 5.5 Comparison

    06:04 - Mythos Marketing

    11:50 - Recursive Self-Improvement?

    14:11 - DeepSeek V4

    18:03 - VibeCode Experiment Extravaganza

    21:44 - The Scarce Compute Era



    https://80000hours.org/aiexplained


    OpenAI Benchmarks: https://openai.com/index/introducing-gpt-5-5/

    5.5 System Card: https://deploymentsafety.openai.com/gpt-5-5/gpt-5-5.pdf

    Direct Comparison: https://pbs.twimg.com/media/HGnNm5GWEAAJ1Ob?format=jpg&name=4096x4096

    DeepSeek Paper: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

    SWE Bench Pro - benchmark of choice? https://x.com/ChowdhuryNeil/status/2047416077622395025


    AA Omniscience: https://artificialanalysis.ai/evaluations/omniscience

    Vending Bench: https://x.com/andonlabs/status/2047377260412649967

    Opus 4.7 System Card: https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf

    Sam Altman Drunk Phase: https://x.com/sama/with_replies

    Noam Brown: https://x.com/polynoamial/status/2047387675762802998

    DeepSeek Compute Crunch: https://www.bloomberg.com/news/articles/2026-04-24/deepseek-unveils-newest-flagship-a-year-after-ai-breakthrough?srnd=phx-ai

    Spreadsheet Bench: https://x.com/nicochristie/status/2047476237464211721

    Pattern Recognition: https://arcprize.org/leaderboard

    Leader Interviews:

    Core Memory: https://www.youtube.com/watch?v=NCKQL0op30E

    Knowledge Podcast: https://www.youtube.com/watch?v=6JoUcQ1qmAc
    Big Tech Round 1: https://www.youtube.com/watch?v=J6vYvk7R190&t=1116s

    Big Tech Round 2: https://www.youtube.com/watch?v=YnoQ8RJbALw&t=8s

    Claude Code Limitations: https://x.com/TheAmolAvasare/status/2046724659039932830

    ChatGPT 5.4 for Clinicians: https://openai.com/index/making-chatgpt-better-for-clinicians/

    Image Arena: https://x.com/arena/status/2046670703311884548

    VibeCode Bench: https://www.vals.ai/benchmarks/vibe-code

    5.5-made Game +Seedance 2.0: https://rosemere-quest.pages.dev/

  • Claude Opus 4.7 just dropped, but behind every headline lies a deeper story. From a bonanza of benchmarks, to seeing the fruits of one of the biggest mega-projects in US history, to sneaky Mythos disclaimers, to Anthropic admitting compute constraints and forcing lower capability on Opus 4.7. Where the new model falls behind Gemini but ahead of GPT 5.4, plus why some users are furious at Anthropic. Ending with a 9-year animus that still affects AI today…

    https://assemblyai.com/aiexplained



    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:58 - Benchmarks
    05:21 - Market Share + Compute Problems
    08:12 - Mythos Exclusives
    12:56 - User Frustration + Claude Code Updates
    14:03 - Brockman Amodei Rivalry
    17:40 - OpenAI vs Anthropic Approach to Code

    Claude 4.7 Opus Release Notes: https://www.anthropic.com/news/claude-opus-4-7
    vs Mythos: https://pbs.twimg.com/media/HGCGugrXUAAKcHp?format=jpg&name=medium

    232-page System Card: https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf

    ARC-AGI 2: https://x.com/arcprize/status/2044834615417053305/photo/1

    ParseBench: https://x.com/jerryjliu0/status/2044902620746363016/photo/1

    GDPVal: https://artificialanalysis.ai/evaluations/gdpval-aa

    Vidoc Security Replication: https://blog.vidocsecurity.com/blog/we-reproduced-anthropics-mythos-findings-with-public-models

    Boris Cherny Settings: https://x.com/Hesamation/status/2043016923961577516/photo/2

    User Frustration: https://x.com/RileyRalmuto/status/2044836116189069660

    VibeCode Bench: https://x.com/ValsAI/status/2044791415524471099/photo/1

    Verge Memo: https://www.theverge.com/ai-artificial-intelligence/911118/openai-memo-cro-ai-competition-anthropic

    5.4 Cyber: https://openai.com/index/scaling-trusted-access-for-cyber-defense/

    Data Centers in Absolute $: https://x.com/finmoorhouse/status/2044933442236776794/photo/1

    …in % of GDP: https://pbs.twimg.com/media/HGEN8FGWQAAN7Np?format=jpg&name=4096x4096

    WSJ Exclusive: https://www.wsj.com/tech/ai/the-decadelong-feud-shaping-the-future-of-ai-7075acde

    Brockman Interview: https://www.youtube.com/watch?v=J6vYvk7R190

    $1T Valuation: https://x.com/StefanFSchubert/status/2045039686997967082

    Emotions: https://www.patreon.com/c/aiexplained/posts

    https://lmcouncil.ai/benchmarks


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

  • The model, the mythos, the legend. We have a new best AI model, but not all of us do. How good is it, and what do its new offensive capabilities mean? Why does its 244-page report card remind me of Her, and why did the creator of Claude Code call it ‘terrifying’? 30+ highlights sourced by reading the paper in full, old-school, no AI summary.

    https://80000hours.org/aiexplained


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:56 - Internal Release + Availability
    02:37 - General Capabilities
    05:12 - Self-improvement?
    06:15 - ‘Terrifying’ Landscape
    11:07 - Safety Decision
    13:22 - Coding
    14:49 - Alignment, Awareness
    19:52 - GUI for Agents/Claws + Hallucinations
    21:34 - …Emotions?
    25:29 - Her connection

    244-page System Card: https://www-cdn.anthropic.com/8b8380204f74670be75e81c820ca8dda846ab289.pdf

    Project Glasswing: https://www.anthropic.com/glasswing
    Zero-Day Details: https://red.anthropic.com/2026/mythos-preview/

    Mythos ‘terrifying’: https://x.com/bcherny/status/2041605852382351666

    New Yorker Altman/Amodei: https://archive.fo/20260406100412/https://www.newyorker.com/magazine/2026/04/13/sam-altman-may-control-our-future-can-he-be-trusted

    Alignment Risk Update: https://www-cdn.anthropic.com/79c2d46d997783b9d2fb3241de43218158e5f25c.pdf

    In a Park: https://x.com/sleepinyourhat/status/2041584808514744742

    “Uhm” - https://x.com/thsottiaux/status/2041749947385815109


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • First look at exclusive reports about OpenAI's new Spud model, and the model Anthropic think will stir governments to urgency, all in the context of the newly-launched ARC-AGI-3. What does the extreme difficulty of that benchmark, and its quirky scoring metrics, mean for AI in 2026?

    https://assemblyai.com/aiexplained


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - Introduction
    00:55 - OpenAI Side Quests
    01:58 - Claude New Model Coming + Universal Equity?
    03:13 - ARC-AGI 3
    05:00 - Intentional or Unintentional Gaming?
    07:11 - But is it AGI Harbinger? No Harness
    09:41 - Not the First
    12:32 - Automated Researcher
    15:00 - Claw Caveat

    Spud: https://www.theinformation.com/articles/openai-ceo-shifts-responsibilities-preps-spud-ai-model?utm_campaign=Editorial&utm_content=Article&utm_medium=organic_social&utm_source=bluesky%2Cfacebook%2Clinkedin%2Cthreads%2Ctwitter&rc=sy0ihq

    FT: OpenAI Special Model: https://www.ft.com/content/de9bf0af-b241-424f-8229-5870b1c0d93d?syn-25a6b1a6=1

    Jensen Huang: https://www.forbes.com/sites/antoniopequenoiv/2026/03/23/nvidias-jensen-huang-says-he-thinks-weve-achieved-agi/

    Axios Article: https://archive.fo/20260326100140/https://www.axios.com/2026/03/26/anthropic-pentagon-ai-deal#selection-827.0-829.257

    https://arcprize.org/arc-agi/3

    ARC AGI 3 Paper: https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf

    NetHack Leaderboard: https://balrogai.com/
    Paper: https://ai.meta.com/research/publications/the-nethack-learning-environment/
    https://x.com/_rockt/status/2036864121585438995

    Claw Shells: https://x.com/DrJimFan/status/2036494601750716711

    OpenAI Automated Researcher: https://www.technologyreview.com/2026/03/20/1134438/openai-is-throwing-everything-into-building-a-fully-automated-researcher/

    Patreon Post: https://www.patreon.com/c/aiexplained/posts

    Eng Jobs: https://x.com/lennysan/status/2036535460726767793

    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • Just 48 hours after releasing GPT 5.3 Instant, OpenAI have released GPT 5.4 Thinking, so either there is an imminent singularity or perhaps we are being distracted from other news. This video will give 9 crucial bits of context, not just on the GPT 5.4 drop but on the background to the meltdown between the Pentagon and Anthropic. What does this say about the state of AI progress, your job, and what is next?


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for 15% off paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    01:06 - GPT 5.4 Breakdown
    05:06 - Closing the Loop
    06:35 - Spiky Performance
    10:31 - Advice
    11:32 - Less Encouraging Developments - Fired Like Dogs
    17:45 - But Used in Iran


    GPT 5.4: https://openai.com/index/introducing-gpt-5-4/

    Hallucinations: https://artificialanalysis.ai/evaluations/omniscience
    Investment Banking Bench: https://x.com/bradlightcap/status/2029684672343728452
    Move 37: https://x.com/nasqret/status/2029628846518010099
    System Card: https://deploymentsafety.openai.com/gpt-5-4-thinking/gpt-5-4-thinking.pdf

    Prediction Market Scandal: https://www.wired.com/story/openai-fires-employee-insider-trading-polymarket-kalshi/


    GPT 5.3 Instant: https://openai.com/index/gpt-5-3-instant/

    GDPVal: https://openai.com/index/gdpval/

    Claude in Iran: https://www.washingtonpost.com/technology/2026/03/04/anthropic-ai-iran-campaign

    ‘Like Dogs’: https://x.com/AndrewCurran_/status/2029605783311470679

    Altman leak: https://www.cnbc.com/2026/03/03/sam-altman-tells-openai-staff-operational-decisions-up-to-government.html

    Original 2024 Switch: https://archive.fo/20240116172526/https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veterans#selection-6173.83-6173.226

    Amodei Original Memo: https://www.theinformation.com/articles/read-anthropic-ceos-memo-attacking-openais-mendacious-pentagon-announcement?rc=sy0ihq
    Anthropic Apology: https://www.anthropic.com/news/where-stand-department-war
    OpenAI Employee Reaction: https://x.com/tszzl/status/2029334980481212820

    DoD Supplier Risk: https://www.cnbc.com/amp/2026/03/05/anthropic-pentagon-ai-claude-iran.html
    Atlantic Exclusive: https://archive.fo/20260301152646/https://www.theatlantic.com/technology/2026/03/inside-anthropics-killer-robot-dispute-with-the-pentagon/686200/#selection-941.61-941.212
    No Negotiation: https://x.com/USWREMichael/status/2029754965778907493

    $20B Doubling: https://archive.ph/20260304111124/https://www.bloomberg.com/news/articles/2026-03-03/anthropic-nears-20-billion-revenue-run-rate-amid-pentagon-feud

    March 2022 Interview: https://www.youtube.com/watch?v=uAA6PZkek4A

    https://lmcouncil.ai/



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

  • Will Anthropic be forced to make a version of Claude for war? And does a new paper expose the risks of Claude agents, in both OpenClaw and the field of war? Plus, 5 more twists in the story of the Pentagon versus Anthropic + some AI lab employees, and a petition that could change everything, or nothing...


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:44 - Deadline Day + Petition
    02:42 - Twist 1: Existing Deal
    03:26 - Twist 2: Existing Policy
    04:21 - Twist 3: Twin Threats
    05:54 - Twist 4: Interesting Objections
    11:32 - Twist 5: Anthropic’s Dropped Policy


    Dario Statement: https://www.anthropic.com/news/statement-department-of-war

    Google/OpenAI Petition: https://notdivided.org/

    Axios on Amodei Rejection: https://www.axios.com/2026/02/26/anthropic-rejects-pentagon-ai-terms

    FT on US Threat: https://www.ft.com/content/11d27612-d6c5-4cf7-94dd-f65603549b7f

    Politico on Latest: https://archive.ph/20260227013117/https://www.politico.com/news/2026/02/26/incoherent-hegseths-anthropic-ultimatum-confounds-ai-policymakers-00800135

    The Verge on Current Deal: https://www.theverge.com/ai-artificial-intelligence/883456/anthropic-pentagon-department-of-defense-negotiations

    Anthropic RSP change: https://www.anthropic.com/news/responsible-scaling-policy-v3

    Time Magazine on RSP: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/

    Agent of Chaos Paper: https://x.com/NatalieShapira/status/2026062499599319526

    AI Agent Reliability Paper: https://arxiv.org/pdf/2602.16666

    My Patreon Video: https://www.patreon.com/posts/real-mystery-ai-151647211

    Patreon Documentary: https://www.patreon.com/posts/our-new-age-of-133960279



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • Do we have a new best AI model, or do we have the downfall of benchmarks in general, as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much-needed context. Oh, and a new record on Simple Bench!

    https://epoch.ai/ai-explained-datacenters


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - Introduction
    00:30 - Post-training Dominance
    04:00 - ARC-AGI 2 Caveat
    05:54 - Simple Bench Record
    08:22 - Hallucination Caveat
    10:05 - Model Card
    11:12 - Exponential Coming
    12:20 - Amodei on Generalizing
    15:10 - One True Benchmark?
    17:02 - Other Metrics…

    Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf

    Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

    Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy

    Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week

    Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience

    Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526
    ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1

    Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442

    METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/

    Taalas Fast: https://chatjimmy.ai/

    Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved

    Metaculus FutureEval: https://www.metaculus.com/futureeval/

    Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each other. The full breakdown of around 250 pages of reports, with just the most interesting moments, from the battle of which is best, Claude personhood, the surprising misbehaviour of Opus 4.6, and much more.

    https://assemblyai.com/aiexplained

    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:54 - Self-improvement?
    02:44 - Knowledge Work
    05:30 - Overly agentic behaviour
    09:12 - Who Shouldn’t Use Claude Opus
    11:39 - Step-change?
    15:09 - Claude’s ‘Personhood’

    Hassabis Roadmap: https://www.patreon.com/posts/hassabis-roadmap-149750869

    Release of Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
    212 Page System Card: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
    Claude Code Tip: https://x.com/bcherny/status/2019475897691124107


    GPT 5.3 Codex: https://openai.com/index/introducing-gpt-5-3-codex/

    System Card: https://openai.com/index/gpt-5-3-codex-system-card/

    Browse Comp: https://arxiv.org/pdf/2504.12516v1
    Finance Agent: https://www.vals.ai/benchmarks/finance_agent
    Terminal Bench 2: https://arxiv.org/pdf/2601.11868
    Vending Bench: https://andonlabs.com/blog/opus-4-6-vending-bench

    My X post: https://x.com/AIExplainedYT/status/2016851303436095647

    Anthropic Apology: https://x.com/ch402/status/2014066134194995256/photo/1

    Altman rebuttal: https://x.com/sama/status/2019139174339928189
    https://x.com/sama/status/2019140276246442089

    4% of GitHub: https://x.com/dylan522p/status/2019490550911766763



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • Anthropic's CEO, who has consistently predicted transformative AI will arrive before 2030, recently published a nearly 20,000-word essay outlining his vision of where AI is heading. The video gives you the highlights. The essay argues that scaling and recursion will advance AI from coding automation to full engineering automation, while warning of economic displacement within 1-2 years and China's trajectory toward AI-enabled totalitarianism. Additionally, Dario Amodei predicts that AI models will increasingly be understood as collections of distinct personas rather than monolithic systems.

    80,000 Hours: https://www.youtube.com/watch?v=B54EQiuO1UU



    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - Introduction
    01:10 - Scaling to software engineers
    06:11 - Permanent Underclass
    10:18 - Totalitarian Nightmares
    16:38 - Collection of Personas

    Essay: https://www.darioamodei.com/essay/the-adolescence-of-technology

    Physics Prediction: https://www.quantamagazine.org/is-particle-physics-dead-dying-or-just-hard-20260126/

    Axios: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

    World GDP: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?end=2024&start=1961&view=chart

    Demis Hassabis Counter: https://www.youtube.com/watch?v=q6fq4_uP7aM

    Karpathy 80%: https://x.com/karpathy/status/2015883857489522876

    Machines of Loving Grace: https://www.darioamodei.com/essay/machines-of-loving-grace

    Anthropic LessWrong: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy#1__In_private__Dario_frequently_said_he_won_t_push_the_frontier_of_AI_capabilities__later__Anthropic_pushed_the_frontier

    Original Constitution: https://www.anthropic.com/news/claudes-constitution

    New Constitution: https://www.anthropic.com/constitution

    Kimi K2.5: https://x.com/Kimi_Moonshot/status/2016024049869324599

    Societies of Thought, Google DeepMind Paper: https://arxiv.org/pdf/2601.10825

    https://lmcouncil.ai/benchmarks

    https://www.patreon.com/posts/our-new-age-of-133960279



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • A new tool, with code written by an AI model, has gone omega-viral: Claude Cowork. But is the hype justified? What do the stats say on productivity? Where is the truth in a sea of noise? What is truth? Can we handle the truth? Where's Nemo?

    https://matsprogram.org/s26-aie


    Check out my new app! https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    01:12 - Claude Cowork
    06:48 - Productivity Speed-up + jobs
    09:33 - Comparing Models
    12:00 - Brittle AI Paper

    Cowork Intro: https://x.com/claudeai/thread/2010805682434666759

    'All of it': https://x.com/bcherny/status/2010813886052581538

    'AGI' Claims: https://x.com/deepfates/status/2004994698335879383

    Douglas Interview: https://www.youtube.com/watch?v=TOsNrV3bXtQ&t=2313s

    Job Stats: https://www.oxfordeconomics.com/wp-content/uploads/2026/01/Evidence-of-an-AI-driven-shakeup-of-job-markets-is-patchy.pdf
    Amodei Prediction: https://fortune.com/2025/05/28/anthropic-ceo-warning-ai-job-loss/

    GenAI Traffic: https://x.com/demishassabis/status/2009075877347512545

    Illusion of Insight: https://arxiv.org/pdf/2601.00514
    Entropy Exploration: https://arxiv.org/pdf/2506.14758
    ProRL: https://arxiv.org/pdf/2505.24864

    Genesis Mission: https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/
    https://deepmind.google/blog/how-were-supporting-better-tropical-cyclone-prediction-with-ai/


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time.

    http://matsprogram.org/s26-aie


    My new app! https://lmcouncil.ai


    Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094

    Chapters:
    00:00 - Introduction
    00:34 - Reasoning Models … and limits
    02:54 - A playable world
    03:36 - Realism
    03:50 - AI Slop gone mainstream
    05:03 - DolphinGemma
    05:39 - Public Mood
    07:34 - AI Enlisted
    08:30 - GPT-5
    11:05 - Open Weight not out
    13:00 - METR Breakout
    17:30 - VASA-1
    18:28 - Lateral Productivity
    20:15 - 1 or 1000 benchmarks needed?
    24:54 - Continual Learning + Altman on Superintelligence
    28:08 - Automated Information Discovery ft AlphaEvolve


    Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809
    https://www.youtube.com/watch?v=PqVbypvxDto

    Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
    Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837

    DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09

    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/

    METR Time Horizon: https://arxiv.org/pdf/2503.14499
    https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
    Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443
    https://shash42.substack.com/p/how-to-game-the-metr-plot
    https://x.com/METR_Evals/status/2002203627377574113

    GPT-5 - Altman PhD in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems

    https://simple-bench.com/

    AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k
    https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan

    Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1

    Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169

    OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ

    AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259

    Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/

    AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
    https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=
    Continual Learning: https://abehrouz.github.io/files/NL.pdf

    Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

    GPT4o: https://x.com/AISafetyMemes/status/1916889492172013989

    Vasa-1: https://www.microsoft.com/en-us/research/project/vasa-1/

    Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines
    Turing Test: https://x.com/tunguz/status/1907185471211422147

    Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/

    LLM Brainrot: https://arxiv.org/pdf/2510.13928

    Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report

    Emotional Quotient: https://arxiv.org/pdf/2511.08394

    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/


    AI Insiders ($9!): https://www.patreon.com/AIExplained

  • The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more…

    https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26

    Also, do check out my new app: https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:50 - Results
    02:44 - But… the Flaw
    04:49 - So Benchmarks are fake? No
    07:37 - Spatial Reasoning + Hassabis
    10:06 - Proto-AGI
    12:07 - Minimal AGI
    15:07 - Compute Slowdown
    17:56 - New Data Paradigm

    Gemini 3 Flash: https://deepmind.google/models/gemini/flash/

    Hassabis Interview: https://www.youtube.com/watch?v=PqVbypvxDto
    Legg Interview: https://www.youtube.com/watch?v=l3u_FAv33G0
    Pre-training Lead Interview: https://www.youtube.com/watch?v=cNGDAqFXvew
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
    Brockman Video: https://x.com/OpenAI/status/2001336514786017417
    Post-Training Reveal: https://x.com/OfficialLoganK/status/2001742530472534442

    Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
    Patreon Hallucinations Vid: https://www.patreon.com/posts/blockers-to-and-139264812
    AA-Omniscience Benchmark: https://artificialanalysis.ai/evaluations/omniscience
    https://arxiv.org/pdf/2511.13029


    https://lmcouncil.ai/benchmarks
    https://simple-bench.com/
    https://x.com/scaling01/status/1999620587744813205

    5.2 Codex Drop: https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf

    OpenAI Compute Trend: https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq

    Cramer Tweet/Response: https://x.com/BorisMPower/status/2001440650210976018

    OpenAI Valuation: https://www.theinformation.com/articles/openai-discussed-raising-tens-billions-valuation-around-750-billion?rc=sy0ihq

    Indian Data: https://www.reuters.com/world/india/with-freebies-openai-google-vie-indian-users-training-data-2025-12-17/

    TheInformation Data: https://x.com/theinformation/status/2001421225751351778

    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
    Sima 2: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
    Veo 3.1: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

    METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/


    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

  • Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines.

    https://www.youtube.com/@eightythousandhours



    AI Insiders ($9!): https://www.patreon.com/AIExplained
    https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:55 - Better than Human @ Professional Tasks?
    04:42 - Test time Compute
    07:05 - Benchmark Selection
    09:32 - Simple Results + council comparison
    13:01 - Long Context
    13:52 - Self-Improvement
    15:00 - 10 Years + New Models

    Release Page: https://openai.com/index/introducing-gpt-5-2/

    GPT 5.2 Benchmark Comparison: https://www.reddit.com/r/singularity/comments/1pka1y9/gpt52_all_20_benchmarks_rankings_and_pricing/
    https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
    https://lmcouncil.ai/benchmarks

    Charxiv: https://charxiv.github.io/#leaderboard

    GDPval: https://arxiv.org/pdf/2510.04374
    My vid: https://www.youtube.com/watch?v=oK5LxMaROSA

    Kilpatrick: https://x.com/OfficialLoganK/status/1999270402712023158/photo/1

    Noam Brown: https://x.com/polynoamial/status/1999189845164667132

    New Model in New Year: https://www.theinformation.com/articles/openai-developing-garlic-model-counter-googles-recent-gains?rc=sy0ihq

    10 Years of OpenAI: https://openai.com/index/ten-years/

    GPQA: https://x.com/idavidrein/status/1841265634170278063

    ARC-AGI 1-2: https://arcprize.org/arc-agi/2/

    Sunday Robotics: https://x.com/tonyzzhao/status/1991204839578300813


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/


    https://lmcouncil.ai

  • With headlines of an imminent job apocalypse, code red for ChatGPT and recursive self-improvement, at the same time as Anthropic's CEO yesterday saying we know how to scale to AGI, and Gemini 3 DeepThink out today, it is easy to get lost among the narratives and counter-narratives. So here are both, plus the facts behind them, for you to decide.


    https://epoch.ai/data/data-centers

    Epoch AI is the sponsor of today’s video, and my views, and those expressed in this video, do not necessarily reflect Epoch AI’s views in any way.


    Chapters:
    00:00 - Introduction
    00:42 - Job Apocalypse?
    01:45 - Scaling to AGI
    04:15 - Recursive Self-Improvement Needed, or Not
    09:57 - OpenAI Code Red vs Gemini 3 DeepThink vs Claude Opus 4.5
    13:27 - DeepSeek Speciale vs Mistral Large v3
    16:45 - Claude Soul Document

    https://lmcouncil.ai/

    AI Insiders ($9!): https://www.patreon.com/AIExplained



    Guardian Interview: https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself

    MIT Study on Jobs/Tasks: https://iceberg.mit.edu/report.pdf
    vs https://www.cnbc.com/2025/11/26/mit-study-finds-ai-can-already-replace-11point7percent-of-us-workforce.html

    Amodei on Scaling: https://www.youtube.com/watch?v=FEj7wAjwQIk
    Claude Soul Document: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document

    Capabilities Original Stance: https://www.anthropic.com/news/core-views-on-ai-safety

    Ilya Interview: https://www.dwarkesh.com/p/ilya-sutskever-2

    Ricursive Intelligence: https://x.com/RicursiveAI/status/1995932204703346946

    Economist Worker Usage of GenAI: https://www.economist.com/finance-and-economics/2025/11/26/investors-expect-ai-use-to-soar-thats-not-happening#selection-1409.94-1413.42

    Mistral v3 Large: https://docs.mistral.ai/models/mistral-large-3-25-12

    Compute Slowdown Paper: https://joel-becker.com/images/publications/forecasting_time_horizon_under_compute_slowdown.pdf
    https://x.com/joel_bkr/status/1993023436541903155

    METR Chart: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

    https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq

    OpenAI Code Red: https://www.anthropic.com/news/core-views-on-ai-safety
    Rocket Company: https://www.independent.co.uk/news/world/americas/sam-altman-rocket-elon-musk-spacex-b2878351.html

    DeepSeek Paper: https://arxiv.org/html/2512.02556v1

    DeepSeek Crowdstrike CCP: https://www.crowdstrike.com/en-us/blog/crowdstrike-researchers-identify-hidden-vulnerabilities-ai-coded-software/

    https://simple-bench.com/

    Patreon Post: https://www.patreon.com/c/aiexplained/posts

    Robot: https://x.com/jloganolson/status/1985850115379351799

  • Gemini 3 Pro is out, and records fell like snowflakes in Svalbard.

    No long description, chapters or links today: huge technical difficulties, including with audio, so I just want to publish asap.


    https://app.grayswan.ai/ai-explained


    https://lmcouncil.ai
    AI Insiders ($9!): https://www.patreon.com/AIExplained



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/
    Podcast: https://aiexplainedopodcast.buzzsprout.com/

  • A lot just got released in the last 36 hours, and it will all affect hundreds of millions of people. 10 details you would miss if you just read the headlines, from GPT 5.1 regressions, to how Claude hacked Govt Agencies, to SIMA 2, and Musical Turing Tests.

    https://assemblyai.com/aiexplained

    Chapters:
    00:00 - Introduction
    00:56 - GPT 5.1 Smarter?
    01:47 - Some Regressions
    03:22 - Sycophancy?
    05:22 - Claude Auto-Hacking
    06:16 - Jailbreaking through Granularity
    08:22 - This Will be Re-used
    09:30 - Hallucinating Hacker
    09:57 - Surprisingly Neutral Tone
    12:18 - SIMA 2
    14:10 - Alpha Parallels
    17:24 - AI Music



    GPT 5.1 Announcement: https://openai.com/index/gpt-5-1/

    System Card: https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf

    Benchmarks: https://openai.com/index/gpt-5-1-for-developers/

    Simple Bench: https://lmcouncil.ai/benchmarks

    Auto-Hacking: https://x.com/AnthropicAI/status/1989033793190277618

    https://www.anthropic.com/news/disrupting-AI-espionage

    Report: https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf



    Sima 2 Announcement: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

    https://x.com/amoufarek/status/1988986075331858693

    Scepticism: https://www.technologyreview.com/2025/11/13/1127921/google-deepmind-is-using-gemini-to-train-agents-inside-goat-simulator-3/

    Voyager: https://voyager.minedojo.org/

    Reuters Music: https://www.reuters.com/legal/litigation/are-you-listening-bots-survey-shows-ai-music-is-virtually-undetectable-2025-11-12/

  • Don’t let headlines about bubbles distract you from the real avenues of progress being explored in AI every week, including what had been thought to be a long-term blocker - continual learning (learning on the fly).

    https://app.grayswan.ai/ai-explained

    This, plus models introspecting (hesitate before you berate), Nano Banana 2 possibly spotted, Chinese imagen and more.

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    01:26 - Continual Learning (Nested Learning / HOPE)
    07:00 - Introspection
    10:54 - Image-Gen Progress

    Nested Learning Post: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/

    Nested Learning Paper: https://abehrouz.github.io/files/NL.pdf

    Original Titans Paper: https://arxiv.org/pdf/2501.00663

    Siri News: https://www.bloomberg.com/news/articles/2025-11-05/apple-plans-to-use-1-2-trillion-parameter-google-gemini-model-to-power-new-siri

    Introspection: https://www.anthropic.com/research/introspection

    Full Paper: https://transformer-circuits.pub/2025/introspection/index.html#mechanisms

    Earlier Work: https://www.anthropic.com/research/mapping-mind-language-model
    https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

    Release Post: https://x.com/AnthropicAI/status/1983584136972677319

    https://lmcouncil.ai



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/