The Building Blocks of Agentic Systems with Harrison Chase - #698 – The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) – Podcast

Episodes

Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723
17 Mar · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Jonas Geiping, research group leader at the ELLIS Institute and the Max Planck Institute for Intelligent Systems, to discuss his recent paper, “Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.” The paper proposes a novel language model architecture that uses recurrent depth to enable “thinking in latent space.” We dig into “internal reasoning” versus “verbalized reasoning,” analogous to non-verbalized and verbalized thinking in humans, and discuss how the model searches in latent space to predict the next token and dynamically allocates more compute based on token difficulty. We also explore how the recurrent depth architecture simplifies LLMs, its parallels to diffusion models, the model's performance on reasoning tasks, the challenges of comparing models with varying compute budgets, and architectural advantages such as zero-shot adaptive exits and natural speculative decoding.

The complete show notes for this episode can be found at https://twimlai.com/go/723.
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722
10 Mar · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Chengzu Li, PhD student at the University of Cambridge, to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig into the MVoT framework along with its various task environments: maze, mini-behavior, and frozen lake. We also discuss token discrepancy loss, a technique designed to align language and visual embeddings and ensure accurate, meaningful visual representations. Additionally, we cover the data collection and training process, reasoning over relative spatial relations between different entities, and dynamic spatial reasoning. Lastly, Chengzu shares insights from experiments with MVoT, focusing on the lessons learned and the potential for applying these models in real-world scenarios like robotics and architectural design.

The complete show notes for this episode can be found at https://twimlai.com/go/722.
Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721
3 Mar · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Niklas Muennighoff, a PhD student at Stanford University, to discuss his paper, “s1: Simple Test-Time Scaling.” We explore the motivations behind s1, as well as how it compares to OpenAI's o1 and DeepSeek's R1 models. We dig into the different approaches to test-time scaling, including parallel and sequential scaling, as well as s1’s data curation process, its training recipe, and its use of model distillation from Google Gemini and DeepSeek R1. We also explore the novel “budget forcing” technique developed in the paper, which lets the model think longer on harder problems and optimizes test-time compute for better performance. Additionally, we cover the evaluation benchmarks used, the comparison between supervised fine-tuning and reinforcement learning, and similar projects like the Hugging Face Open R1 project. Finally, we discuss the open-sourcing of s1 and its future directions.

The complete show notes for this episode can be found at https://twimlai.com/go/721.
Accelerating AI Training and Inference with AWS Trainium2 with Ron Diamant - #720
24 Feb · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Ron Diamant, chief architect for Trainium at Amazon Web Services, to discuss hardware acceleration for generative AI and the design and role of the recently released Trainium2 chip. We explore the architectural differences between Trainium and GPUs, highlighting Trainium's systolic array-based compute design and how it balances performance across key dimensions like compute, memory bandwidth, memory capacity, and network bandwidth. We also discuss the Trainium tooling ecosystem, including the Neuron SDK, Neuron Compiler, and Neuron Kernel Interface (NKI). We dig into the various ways Trainium2 is offered, including Trn2 instances, UltraServers, and UltraClusters, as well as access through managed services like Amazon Bedrock. Finally, we cover sparsity optimizations, customer adoption, performance benchmarks, support for Mixture of Experts (MoE) models, and what’s next for Trainium.

The complete show notes for this episode can be found at https://twimlai.com/go/720.
π0: A Foundation Model for Robotics with Sergey Levine - #719
18 Feb · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Sergey Levine, associate professor at UC Berkeley and co-founder of Physical Intelligence, to discuss π0 (pi-zero), a general-purpose robotic foundation model. We dig into the model architecture, which pairs a vision language model (VLM) with a diffusion-based action expert, and the model training "recipe," emphasizing the roles of pre-training and post-training with a diverse mixture of real-world data to ensure robust and intelligent robot learning. We review the data collection approach, which relies on human operators and teleoperation rigs, as well as the potential of synthetic data and reinforcement learning to enhance robotic capabilities, and much more. We also discuss the team’s new FAST tokenizer, which opens the door to a fully Transformer-based model and significant improvements in learning and generalization. Finally, we cover the open-sourcing of π0 and future directions for their research.

The complete show notes for this episode can be found at https://twimlai.com/go/719.
AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia - #718
10 Feb · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we’re joined by Victor Dibia, principal research software engineer at Microsoft Research, to explore the key trends and advancements in AI agents and multi-agent systems shaping 2025 and beyond. In this episode, we discuss the unique abilities that set AI agents apart from traditional software systems: reasoning, acting, communicating, and adapting. We also examine the rise of agentic foundation models, the emergence of interface agents like Claude with Computer Use and OpenAI Operator, the shift from simple task chains to complex workflows, and the growing range of enterprise use cases. Victor shares insights into emerging design patterns for autonomous multi-agent systems, including graph and message-driven architectures, the advantages of the “actor model” pattern as implemented in Microsoft’s AutoGen, and guidance on how users should approach the “build vs. buy” decision when working with AI agent frameworks. We also address the challenges of evaluating end-to-end agent performance, the complexities of benchmarking agentic systems, and the implications of our reliance on LLMs as judges. Finally, we look ahead to the future of AI agents in 2025 and beyond, discuss emerging HCI challenges and the agents' potential impact on the workforce, and consider how they are poised to reshape fields like software engineering.

The complete show notes for this episode can be found at https://twimlai.com/go/718.
Speculative Decoding and Efficient LLM Inference with Chris Lott - #717
4 Feb · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research, to discuss accelerating large language model inference. We explore the challenges presented by LLM encoding and decoding (aka generation) and how these interact with hardware constraints such as FLOPS, memory footprint, and memory bandwidth to limit key inference metrics such as time-to-first-token, tokens per second, and tokens per joule. We then dig into a variety of techniques for accelerating inference, such as KV compression, quantization, pruning, speculative decoding, and leveraging small language models (SLMs). We also discuss future directions for enabling on-device agentic experiences, such as parallel generation and software tools like Qualcomm AI Orchestrator.

The complete show notes for this episode can be found at https://twimlai.com/go/717.
Ensuring Privacy for Any LLM with Patricia Thaine - #716
28 Jan · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Patricia Thaine, co-founder and CEO of Private AI, to discuss techniques for ensuring privacy, data minimization, and compliance when using third-party large language models (LLMs) and other AI services. We explore the risks of data leakage from LLMs and embeddings, the complexities of identifying and redacting personal information across various data flows, and the approach Private AI has taken to mitigate these risks. We also dig into the challenges of entity recognition in multimodal systems, including OCR files, documents, images, and audio, and the importance of data quality and model accuracy. Additionally, Patricia shares insights on the limitations of data anonymization, the benefits of balancing real-world and synthetic data in model training and development, and the relationship between privacy and bias in AI. Finally, we touch on the evolving regulatory landscape, including GDPR, CPRA, and the EU AI Act, and the future of privacy in artificial intelligence.

The complete show notes for this episode can be found at https://twimlai.com/go/716.
AI Engineering Pitfalls with Chip Huyen - #715
21 Jan · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Chip Huyen, independent researcher and writer, to discuss her new book, “AI Engineering.” We dig into the definition of AI engineering, its key differences from traditional machine learning engineering, the common pitfalls encountered in engineering AI systems, and strategies to overcome them. We also explore how Chip defines AI agents, their current capabilities and limitations, and the critical role of effective planning and tool use in these systems. Additionally, Chip shares insights on the importance of evaluation in AI systems, highlighting the need for systematic processes, human oversight, and rigorous metrics and benchmarks. Finally, we touch on the impact of open-source models, the potential of synthetic data, and Chip’s predictions for the year ahead.

The complete show notes for this episode can be found at https://twimlai.com/go/715.
Evolving MLOps Platforms for Generative AI and Agents with Abhijit Bose - #714
13 Jan · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Abhijit Bose, head of enterprise AI and ML platforms at Capital One, to discuss the evolution of the company’s approach to generative AI and its platform best practices. In this episode, we dig into the company’s platform-centric approach to AI and how they’ve been evolving their existing MLOps and data platforms to support the new challenges and opportunities presented by generative AI workloads and AI agents. We explore their use of cloud-based infrastructure, in this case on AWS, to provide a foundation upon which they then layer open-source and proprietary services and tools. We cover their use of Llama 3 and other open-weight models, their approach to fine-tuning, their observability tooling for GenAI applications, their use of inference optimization techniques like quantization, and more. Finally, Abhijit shares his views on the future of agentic workflows in the enterprise, the application of OpenAI o1-style reasoning in models, and the new roles and skill sets required in the evolving GenAI landscape.

The complete show notes for this episode can be found at https://twimlai.com/go/714.
Why Agents Are Stupid & What We Can Do About It with Dan Jeffries - #713
16 Dec 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Dan Jeffries, founder and CEO of Kentauros AI, to discuss the challenges currently faced by those developing advanced AI agents. We dig into how Dan defines agents and distinguishes them from other, similar uses of LLMs, explore various use cases for them, and discuss ways to create smarter agentic systems. Dan shares his “big brain, little brain, tool brain” approach to tackling real-world challenges in agents, the trade-offs in leveraging general-purpose vs. task-specific models, and his take on LLM reasoning. We also cover how he thinks about model selection for agents, along with the need for new tools and platforms for deploying them. Finally, Dan emphasizes the importance of open source in advancing AI, shares the new products his team is working on, and explores future directions in the agentic era.

The complete show notes for this episode can be found at https://twimlai.com/go/713.
Automated Reasoning to Prevent LLM Hallucination with Byron Cook - #712
9 Dec 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Byron Cook, VP and distinguished scientist in the Automated Reasoning Group at AWS, to dig into the technology behind the newly announced Automated Reasoning Checks feature of Amazon Bedrock Guardrails. Automated Reasoning Checks uses mathematical proofs to help LLM users safeguard against hallucinations. We explore recent advancements in the field of automated reasoning, as well as the ways it is applied, both broadly and across AWS, where it is used to enhance security, cryptography, virtualization, and more. We discuss how the new feature helps users generate, refine, validate, and formalize policies, and how those policies can be deployed alongside LLM applications to ensure the accuracy of generated text. Finally, Byron shares the benchmarks they’ve applied, the use of techniques like ‘constrained coding’ and ‘backtracking,’ and the future co-evolution of automated reasoning and generative AI.

The complete show notes for this episode can be found at https://twimlai.com/go/712.
AI at the Edge: Qualcomm AI Research at NeurIPS 2024 with Arash Behboodi - #711
3 Dec 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Arash Behboodi, director of engineering at Qualcomm AI Research, to discuss the papers and workshops Qualcomm will be presenting at this year’s NeurIPS conference. We dig into the challenges and opportunities presented by differentiable simulation in wireless systems, the sciences, and beyond. We also explore recent work that ties conformal prediction to information theory, yielding a novel approach to incorporating uncertainty quantification directly into machine learning models. Finally, we review several papers enabling the efficient use of LoRA (Low-Rank Adaptation) on mobile devices (Hollowed Net, ShiRA, FouRA). Arash also previews the demos Qualcomm will be hosting at NeurIPS, including new video editing diffusion and 3D content generation models running on-device, Qualcomm's AI Hub, and more!

The complete show notes for this episode can be found at https://twimlai.com/go/711.
AI for Network Management with Shirley Wu - #710
19 Nov 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Shirley Wu, senior director of software engineering at Juniper Networks, to discuss how machine learning and artificial intelligence are transforming network management. We explore various use cases where AI and ML are applied to enhance the quality, performance, and efficiency of networks across Juniper’s customers, including diagnosing cable degradation, proactive monitoring for coverage gaps, and real-time fault detection. We also dig into the complexities of integrating data science into networking, the trade-offs between traditional methods and ML-based solutions, the role of feature engineering and data in networking, the applicability of large language models, and Juniper’s approach to using smaller, specialized ML models to optimize speed, latency, and cost. Finally, Shirley shares some future directions for Juniper Mist, such as proactive network testing and end-user self-service.

The complete show notes for this episode can be found at https://twimlai.com/go/710.
Why Your RAG System Is Broken, and How to Fix It with Jason Liu - #709
11 Nov 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Jason Liu, freelance AI consultant, advisor, and creator of the Instructor library, to discuss all things retrieval-augmented generation (RAG). We dig into the tactical and strategic challenges companies face with their RAG systems, the signs Jason looks for to identify looming problems, the issues he most commonly encounters, and the steps he takes to diagnose them. We also cover the significance of building out robust test datasets, data-driven experimentation, evaluation tools, and metrics for different use cases. We touch on fine-tuning strategies for RAG systems, the effectiveness of different chunking strategies, the use of collaboration tools like Braintrust, and how future models will change the game. Lastly, we cover Jason’s interest in teaching others how to capitalize on their own AI experience via his AI consulting course.

The complete show notes for this episode can be found at https://twimlai.com/go/709.
An Agentic Mixture of Experts for DevOps with Sunil Mallya - #708
4 Nov 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Sunil Mallya, CTO and co-founder of Flip AI. We discuss Flip’s incident debugging system for DevOps, which was built using a custom mixture of experts (MoE) large language model (LLM) trained on a novel “CoMELT” observability dataset, which combines traditional MELT data (metrics, events, logs, and traces) with code, to efficiently identify the root causes of failures in complex software systems. We discuss the challenges of integrating time-series data with LLMs and the multi-decoder architecture they designed for this purpose. Sunil describes their system's agent-based design, which focuses on clear roles and boundaries to ensure reliability. We examine their “chaos gym,” a reinforcement learning environment used for testing and improving the system's robustness. Finally, we discuss the practical considerations of deploying such a system at scale in diverse environments, and much more.

The complete show notes for this episode can be found at https://twimlai.com/go/708.
Building AI Voice Agents with Scott Stephenson - #707
28 Oct 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Scott Stephenson, co-founder and CEO of Deepgram, to discuss voice AI agents. We explore the importance of perception, understanding, and interaction, and how these key components work together in building intelligent AI voice agents. We discuss the role of multimodal LLMs as well as speech-to-text and text-to-speech models in building AI voice agents, and dig into the benefits and limitations of text-based approaches to voice interactions. We also cover what’s required to deliver real-time voice interactions and the promise of closed-loop, continuously improving, federated learning agents. Finally, Scott shares practical applications of AI voice agents at Deepgram and provides an overview of their newly released agent toolkit.

The complete show notes for this episode can be found at https://twimlai.com/go/707.
Is Artificial Superintelligence Imminent? with Tim Rocktäschel - #706
21 Oct 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Tim Rocktäschel, senior staff research scientist at Google DeepMind, professor of Artificial Intelligence at University College London, and author of the recently published popular science book, “Artificial Intelligence: 10 Things You Should Know.” We dig into the attainability of artificial superintelligence and the path to achieving generalized superhuman capabilities across multiple domains. We discuss the importance of open-endedness in developing autonomous and self-improving systems, as well as the role of evolutionary approaches and algorithms. Additionally, we cover Tim’s recent research projects such as “Promptbreeder,” “Debating with More Persuasive LLMs Leads to More Truthful Answers,” and more.

The complete show notes for this episode can be found at https://twimlai.com/go/706.
ML Models for Safety-Critical Systems with Lucas García - #705
14 Oct 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Lucas García, principal product manager for deep learning at MathWorks, to discuss incorporating ML models into safety-critical systems. We begin by exploring the critical role of verification and validation (V&V) in these applications. We review the popular V-model for engineering critical systems and then dig into the “W” adaptation that’s been proposed for incorporating ML models. Next, we discuss the complexities of applying deep neural networks in safety-critical applications, using the aviation industry as an example, and talk through the importance of factors such as data quality, model stability, robustness, interpretability, and accuracy. We also explore formal verification methods, abstract transformer layers, transformer-based architectures, and the application of various software testing techniques. Lucas also introduces the field of constrained deep learning and convex neural networks, along with their benefits and trade-offs.

The complete show notes for this episode can be found at https://twimlai.com/go/705.
AI Agents: Substance or Snake Oil with Arvind Narayanan - #704
7 Oct 2024 · The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Arvind Narayanan, professor of Computer Science at Princeton University, to discuss his recent works, “AI Agents That Matter” and AI Snake Oil. In “AI Agents That Matter,” we explore the range of agentic behaviors, the challenges in benchmarking agents, and the ‘capability and reliability gap’, which creates risks when deploying AI agents in real-world applications. We also discuss the importance of verifiers as a technique for safeguarding agent behavior. We then dig into the book AI Snake Oil, which uncovers examples of problematic and overhyped claims in AI. Arvind shares various examples of failed applications of AI, outlines a taxonomy of AI risks, and shares his insights on AI’s catastrophic risks. We also touch on different approaches to LLM-based reasoning, his views on tech policy and regulation, and his work on CORE-Bench, a benchmark designed to measure AI agents' accuracy on computational reproducibility tasks.

The complete show notes for this episode can be found at https://twimlai.com/go/704.