Episodes
Today we’re joined by Sherry Yang, senior research scientist at Google DeepMind and a PhD student at UC Berkeley. In this interview, we discuss her new paper, “Video as the New Language for Real-World Decision Making,” which explores how generative video models can play a role similar to that of language models in solving real-world tasks. Sherry draws an analogy between the two: just as natural language serves as a unified representation of information and text prediction as a common task interface, video serves as a unified medium and video generation as a common task. This formulation enables video generation models to play a variety of real-world roles as planners, agents, compute engines, and environment simulators. Finally, we explore UniSim, an interactive demo of Sherry’s work and a preview of her vision for interacting with AI-generated environments.
The complete show notes for this episode can be found at twimlai.com/go/676.
Today we’re joined by Sayash Kapoor, a PhD student in the Department of Computer Science at Princeton University. Sayash walks us through his paper, “On the Societal Impact of Open Foundation Models.” We dig into the controversy around AI safety, the risks and benefits of releasing open model weights, and how we can establish common ground for assessing the threats posed by AI. We discuss how the paper’s framework applies to specific risks, such as the biosecurity risks of open LLMs and the growing problem of non-consensual intimate imagery (NCII) generated with open diffusion models.
The complete show notes for this episode can be found at twimlai.com/go/675.
Today we’re joined by Akshita Bhagia, a senior research engineer at the Allen Institute for AI. Akshita joins us to discuss OLMo, a new open-source language model available in 7-billion- and 1-billion-parameter variants, with a key difference from similar models offered by Meta, Mistral, and others: AI2 has also published the dataset and key tools used to train the model. In our chat with Akshita, we dig into the OLMo models and the various projects under the OLMo umbrella, including Dolma, an open three-trillion-token corpus for language model pretraining, and Paloma, a benchmark and set of tools for evaluating language model performance across a variety of domains.
The complete show notes for this episode can be found at twimlai.com/go/674.
Today we’re joined by Ben Prystawski, a PhD student in the Department of Psychology at Stanford University working at the intersection of cognitive science and machine learning. Our conversation centers on Ben’s recent paper, “Why think step by step? Reasoning emerges from the locality of experience,” which he presented at NeurIPS 2023. We start by exploring basic questions about LLM reasoning, including whether it exists, how we can define it, and how techniques like chain-of-thought reasoning appear to strengthen it. We then dig into the details of Ben’s paper, which aims to understand why thinking step by step is effective and demonstrates that local structure in the training data is the key property that enables it.
The complete show notes for this episode can be found at twimlai.com/go/673.
Today we’re joined by Armineh Nourbakhsh of JP Morgan AI Research to discuss the development and capabilities of DocLLM, a layout-aware large language model for multimodal document understanding. Armineh provides a historical overview of the challenges of document AI and an introduction to the DocLLM model. She explains how the model, which differs from both traditional LLMs and document AI models, incorporates both textual semantics and spatial layout when processing enterprise documents such as reports and complex contracts. We dig into her team’s approach to training DocLLM, their choice of a generative model over an encoder-based approach, the datasets they used to build the model, their approach to incorporating layout information, and the various ways they evaluated the model’s performance.
The complete show notes for this episode can be found at twimlai.com/go/672.
Today we’re joined by Sanmi Koyejo, assistant professor at Stanford University, to continue our NeurIPS 2023 series. In our conversation, Sanmi discusses his two recent award-winning papers. First, we dive into his paper “Are Emergent Abilities of Large Language Models a Mirage?” We discuss the different ways LLMs are evaluated and the excitement surrounding their “emergent abilities,” such as the ability to perform arithmetic. Sanmi describes how evaluating model performance with nonlinear metrics can create the illusion that a model is rapidly gaining new capabilities, whereas linear metrics show smooth, expected improvement, casting doubt on the significance of emergence. We continue on to his next paper, “DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models,” discussing the methodology it describes for evaluating concerns such as the toxicity, privacy, fairness, and robustness of LLMs.
The complete show notes for this episode can be found at twimlai.com/go/671.
Today we’re joined by Kamyar Azizzadenesheli, a staff researcher at Nvidia, to continue our AI Trends 2024 series. In our conversation, Kamyar updates us on the latest developments in reinforcement learning (RL) and how the RL community is taking advantage of the abstract reasoning abilities of large language models (LLMs). Kamyar shares his insights on how LLMs are pushing RL performance forward in a variety of applications, such as ALOHA, a robot that can learn to fold clothes, and Voyager, an RL agent that uses GPT-4 to outperform prior systems at playing Minecraft. We also explore the progress being made in assessing and addressing the risks of RL-based decision-making in domains such as finance, healthcare, and agriculture. Finally, we discuss the future of deep reinforcement learning, Kamyar’s top predictions for the field, and how greater compute capabilities will be critical to achieving general intelligence.
The complete show notes for this episode can be found at twimlai.com/go/670.
Today we’re joined by Ram Sriharsha, VP of engineering at Pinecone. In our conversation, we dive into vector databases and retrieval-augmented generation (RAG). We explore the trade-offs between relying solely on LLMs for retrieval tasks and combining vector database retrieval with LLMs, the advantages and complexities of RAG with vector databases, and the key considerations for building and deploying real-world RAG-based applications. We also take an in-depth look at Pinecone’s new serverless offering. Currently in public preview, Pinecone Serverless is a vector database that enables on-demand data loading, flexible scaling, and cost-effective query processing. Ram discusses how the serverless paradigm impacts the vector database’s core architecture, key features, and other considerations. Lastly, Ram shares his perspective on the future of vector databases in helping enterprises deliver RAG systems.
The complete show notes for this episode can be found at twimlai.com/go/669.
Today we’re joined by Ben Zhao, Neubauer Professor of Computer Science at the University of Chicago. In our conversation, we explore his research at the intersection of security and generative AI. We focus on Ben’s recent Fawkes, Glaze, and Nightshade projects, which use “poisoning” approaches to give users security and protection against AI encroachment. The first tool we discuss, Fawkes, imperceptibly “cloaks” images so that models perceive them as highly distorted, effectively shielding individuals from facial recognition models. We then dig into Glaze, a tool that uses machine learning algorithms to compute subtle alterations that are indiscernible to the human eye but trick models into perceiving a significant shift in art style, giving artists a unique defense against style mimicry. Lastly, we cover Nightshade, a strategic defense tool for artists akin to a “poison pill,” which lets artists apply imperceptible changes to their images that effectively “break” generative AI models trained on them.
The complete show notes for this episode can be found at twimlai.com/go/668.
Today, we continue our NeurIPS series with Dan Friedman, a PhD student in the Princeton NLP group. In our conversation, we explore his research on mechanistic interpretability for transformer models, specifically his paper “Learning Transformer Programs” (LTP). The LTP paper proposes modifications to the transformer architecture that allow transformer models to be easily converted into human-readable programs, making them inherently interpretable. We compare this approach with prior methods for understanding these models and discuss those methods’ shortcomings. We also dig into the approach’s limitations and constraints in terms of functionality and scale.
The complete show notes for this episode can be found at twimlai.com/go/667.
Today we continue our AI Trends 2024 series with Thomas Dietterich, distinguished professor emeritus at Oregon State University. As you might expect, large language models figured prominently in our conversation, and we covered a vast array of papers and use cases exploring current research into topics such as monolithic vs. modular architectures, hallucinations, the application of uncertainty quantification (UQ), and using RAG as a sort of memory module for LLMs. Lastly, don’t miss Tom’s predictions for the year ahead, as well as his words of encouragement for those new to the field.
The complete show notes for this episode can be found at twimlai.com/go/666.
Today we kick off our AI Trends 2024 series with Naila Murray, director of AI research at Meta. In our conversation, we dig into the latest trends and developments in computer vision. We explore advancements in controllable generation, visual programming, 3D Gaussian splatting, and multimodal models, specifically vision plus LLMs. We discuss tools and open-source projects, including Segment Anything, a tool for versatile zero-shot image segmentation using simple text prompts, clicks, and bounding boxes; ControlNet, which adds conditional control to Stable Diffusion models; and DINOv2, a visual encoding model enabling object recognition, segmentation, and depth estimation, even in data-scarce scenarios. Finally, Naila shares her view on the most exciting opportunities in the field, as well as her predictions for the years ahead.
The complete show notes for this episode can be found at twimlai.com/go/665.
Today we’re joined by Ed Anuff, chief product officer at DataStax. In our conversation, we discuss Ed’s insights on RAG, vector databases, embedding models, and more. We dig into the indexing algorithms underpinning modern vector databases (such as HNSW and DiskANN) that allow them to efficiently handle massive, unstructured datasets, and discuss how they help users serve up relevant results for RAG, AI assistants, and other use cases. We also discuss embedding models and their role in vector comparisons and database retrieval, as well as the potential for GPUs to enhance vector database performance.
The complete show notes for this episode can be found at twimlai.com/go/664.
Today we’re joined by Markus Nagel, research scientist at Qualcomm AI Research, who helps us kick off our coverage of NeurIPS 2023. In our conversation, we cover Markus’s accepted papers at the conference, along with other work presented by Qualcomm AI Research scientists. Markus’s first paper, “Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing,” focuses on activation quantization issues introduced by the attention mechanism and how to solve them. We also discuss “Pruning vs Quantization: Which is Better?”, which compares the effectiveness of these two methods for compressing model weights. Additional papers discussed cover topics such as using scalarization in multitask and multidomain learning to improve training and inference, using diffusion models for sequences of states and actions, applying geometric algebra with equivariance to transformers, and deductive verification of chain-of-thought reasoning performed by LLMs.
The complete show notes for this episode can be found at twimlai.com/go/663.
Today we’re joined by Michael Kearns, professor in the Department of Computer and Information Science at the University of Pennsylvania and an Amazon Scholar. In our conversation, we discuss the new challenges to responsible AI brought about by the generative AI era. We explore Michael’s learnings and insights from the intersection of his real-world experience at AWS and his work in academia. We cover a diverse range of topics under this banner, including service card metrics, privacy, hallucinations, RLHF, and LLM evaluation benchmarks. We also touch on Clean Rooms ML, a secure environment that uses differential privacy techniques to balance access to private datasets against privacy protection, offering a new approach to secure data handling in machine learning.
The complete show notes for this episode can be found at twimlai.com/go/662.
Today we’re joined by Mike Miller, director of product at AWS, responsible for the company’s “edutainment” products. In our conversation, we explore AWS PartyRock, a no-code generative AI app builder that allows users to easily create fun and shareable AI applications by selecting a model, chaining prompts, and linking text, image, and chatbot widgets together. We also discuss some of the previous tools Mike’s team has delivered at the intersection of developer education and entertainment, including DeepLens, a computer vision hardware device; DeepRacer, a programmable vehicle that uses reinforcement learning to navigate a track; and DeepComposer, a generative AI model that transforms musical inputs and creates accompanying compositions.
The complete show notes for this episode can be found at twimlai.com/go/661.
Today we’re joined by Cody Coleman, co-founder and CEO of Coactive AI. In our conversation, we discuss how Coactive has leveraged modern data, systems, and machine learning techniques to deliver its multimodal asset platform and visual search tools. Cody shares his expertise in data-centric AI, and we dig into techniques like active learning and coreset selection, and how they can drive greater efficiency throughout the machine learning lifecycle. We explore the various ways Coactive uses multimodal embeddings to enable its core visual search experience, and we cover the infrastructure optimizations they’ve implemented to scale their systems. We conclude with Cody’s advice for entrepreneurs and engineers building companies around generative AI technologies.
The complete show notes for this episode can be found at twimlai.com/go/660.
Today we’re joined by Kyle Roche, founder and CEO of Griptape, to discuss patterns and middleware for LLM applications. We dive into emerging patterns for developing LLM applications, such as off-prompt data (which allows data retrieval without compromising the chain of thought within language models) and pipelines (sequences of tasks given to LLMs, which can use a different model for each task or step). We also explore Griptape, an open-source, Python-based middleware stack that aims to securely connect LLM applications to an organization’s internal and external data systems. We discuss the abstractions it offers, including drivers, memory management, rule sets, DAG-based workflows, and a prompt stack. Additionally, we touch on common customer concerns such as privacy, retraining, and sovereignty, and several use cases that leverage role-based retrieval methods to optimize human augmentation tasks.
The complete show notes for this episode can be found at twimlai.com/go/659.
Today we’re joined by Prem Natarajan, chief scientist and head of enterprise AI at Capital One. In our conversation, we discuss AI access and inclusivity as technical challenges and explore some of the multidisciplinary approaches Prem and his team use to tackle these complexities. We dive into the issues of bias, dealing with class imbalances, and integrating various research initiatives to achieve additive results. Prem also shares his team’s work on foundation models for financial data curation, highlighting the importance of data quality and the use of federated learning, and emphasizing the impact these factors have on model performance and reliability in critical applications like fraud detection. Lastly, Prem shares his overall approach to AI research in the context of a banking enterprise, including prioritizing mission-inspired research that aims to deliver tangible benefits to customers and the broader community, investing in diverse talent and the best infrastructure, and forging strategic partnerships with a variety of academic labs.
The complete show notes for this episode can be found at twimlai.com/go/658.
Today we’re joined by Jay Emery, director of technical sales & architecture at Microsoft Azure. In our conversation, we discuss the challenges organizations face when building LLM-based applications and explore some of the techniques they are using to overcome them. We dive into concerns around security, data privacy, cost management, and performance, as well as how effectively prompting achieves the desired results compared with fine-tuning, and when each approach should be applied. We cover methods such as prompt tuning, prompt chaining, prompt variance, fine-tuning, and RAG for enhancing LLM output, along with ways to speed up inference, such as choosing the right model, parallelization, and provisioned throughput units (PTUs). Jay also shares several intriguing use cases describing how businesses use tools like Azure Machine Learning prompt flow and Azure ML AI Studio to tailor LLMs to their unique needs and processes.
The complete show notes for this episode can be found at twimlai.com/go/657.