Episodes
-
In this podcast, the hosts discuss a research paper that explores how large language models (LLMs), like the ones used in chatbots, behave when placed in a simulated prison scenario. The researchers built a custom tool, zAImbardo, to simulate interactions between a guard and a prisoner, focusing on two key behaviors: persuasion, where the prisoner tries to convince the guard to grant extra privileges (like more yard time or an escape), and anti-social behavior, such as being toxic or violent. The study found that while some LLMs struggle to stay in character or hold meaningful conversations, others show distinct patterns of persuasion and anti-social behavior. It also reveals that the personality of the guard (another LLM) can greatly influence both whether the prisoner succeeds in persuading them and whether harmful behaviors emerge, pointing to the potential dangers of LLMs in power-based interactions without human oversight.
Original paper:
Campedelli, G. M., Penzo, N., Stefan, M., Dessì, R., Guerini, M., Lepri, B., & Staiano, J. (2024). I want to break free! Anti-social behavior and persuasion ability of LLMs in multi-agent settings with social hierarchy. arXiv. https://arxiv.org/abs/2410.07109
-
This podcast episode discusses a research paper focused on making large language models (LLMs) safer and better aligned with human values. The authors introduce a new technique called DATA ADVISOR, which helps LLMs create safer and more reliable data by following guiding principles. DATA ADVISOR works by continuously reviewing the data generated by the model, spotting gaps or issues, and suggesting improvements for the next round of data creation. The study shows that this method makes LLMs safer without reducing their overall effectiveness, and it performs better than other current approaches for generating safer data.
Original paper:
Wang, F., Mehrabi, N., Goyal, P., Gupta, R., Chang, K.-W., & Galstyan, A. (2024). Data Advisor: Dynamic data curation for safety alignment of large language models. arXiv. https://arxiv.org/abs/2410.05269
-
In this episode, we dive into the fascinating world of BodyShapeGPT, a breakthrough approach that turns simple text into realistic 3D human avatars. By fine-tuning LLaMA-3, a large language model, researchers have developed a system that can accurately shape a virtual body from a text description. We'll explore how a unique dataset and custom algorithms make it all possible, and why this could revolutionize everything from gaming to virtual reality. Tune in to discover how technology is bridging the gap between words and virtual worlds!
Original paper:
Árbol, B. R., & Casas, D. (2024). BodyShapeGPT: SMPL Body Shape Manipulation with LLMs. https://arxiv.org/abs/2410.03556
-
In this episode, we're diving into the world of languages and computers! Have you ever wondered how people who speak in different ways or dialects can still communicate with the help of technology? Today, we’ll learn about a cool new method scientists are using to help computers understand Finnish dialects and turn them into standard Finnish! We’ll explore how this helps computers do a better job at things like reading and talking to us. Join us as we talk about how this amazing technology works and how it can even help people who speak other languages!
Original paper:
Partanen, N., Hämäläinen, M., & Alnajjar, K. (2019). Dialect text normalization to normative standard Finnish. In Workshop on Noisy User-generated Text (pp. 141-146). The Association for Computational Linguistics. https://aclanthology.org/D19-5519/
-
Join us on an exciting journey into the world of Artificial Intelligence (AI)! In this episode, we explore how countries around the world are creating smart strategies to use AI in the best way possible. You'll learn about the EPIC framework, which focuses on four important areas (Education, Partnership, Infrastructure, and Community) to make sure AI helps everyone. Whether you're curious about robots, computers, or just how technology works, this fun and easy-to-understand episode will show you how we can use AI to make the world a better place for all!
Original paper:
Tjondronegoro, D. W. (2024). Strategic AI Governance: Insights from Leading Nations. https://arxiv.org/abs/2410.01819
-
In this episode, we dive into groundbreaking research on how large language models (LLMs) handle complex legal reasoning. We discuss the challenges LLMs face when distinguishing between similar legal charges and explore a new framework called MALR, which uses a multi-agent approach and non-parametric learning to enhance AI's understanding of legal concepts. Tune in to learn how this innovative approach improves AI's performance, even surpassing human capabilities in some legal reasoning tasks.
Original paper:
Yuan, W., Cao, J., Jiang, Z., Kang, Y., Lin, J., Song, K., Lin, T., Yan, P., Sun, C., & Liu, X. (2024). Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration. https://arxiv.org/abs/2410.02507
-
In this episode, we explore a cool new way scientists are teaching robots to learn and make decisions, just like humans! We talk about how robots can now use something called "Cognitive Belief-Driven Q-Learning" (CBDQ)—it’s like how you use your brain to guess what might happen next based on what you’ve learned before. We’ll explain how this helps robots avoid making mistakes and get better at things like driving cars or playing games. Tune in to find out how these smart robots are getting even smarter, all by thinking a bit more like you!
Original paper:
Gu, X., Qiao, G., Jiang, C., Xia, T., & Mao, H. (2024). Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning. https://arxiv.org/abs/2410.01739
-
In this episode, we dive into a groundbreaking method called One-Shot Style Adaptation (OSSA), designed to tackle a common challenge in deep learning: the drop in performance when models face environments different from those they were trained on. Unlike traditional approaches that need large amounts of data, OSSA requires only a single image to adapt the model, making it highly efficient. From weather changes to synthetic-to-real scenarios, OSSA shows promise in real-world applications with limited data. Join us as we explore this innovative and practical solution for object detection!
Original paper:
Gerster, R., Caesar, H., Rapp, M., Wolpert, A., & Teutsch, M. (2024). OSSA: Unsupervised One-Shot Style Adaptation. https://arxiv.org/abs/2410.00900
-
This podcast episode explores a new model of intonation in the Russian language and how it can be adapted to other languages. The model focuses on analyzing the rise and fall of pitch within words, making it useful for tasks like automatically annotating speech data or improving text-to-speech systems. Overall, it is a valuable tool both for studying intonation and for developing better text-to-speech technologies.
Original paper:
Tomilov, A., Gromova, A., & Svischev, A. (2024). Word-wise intonation model for cross-language TTS systems. https://arxiv.org/abs/2409.20374
-
In this podcast episode, we explore the LLMs4Synthesis framework, which aims to improve how Large Language Models (LLMs) generate scientific summaries and overviews. With research growing rapidly, it can be hard to keep up with all the new studies. This framework offers a solution by helping LLMs process scientific papers more effectively and create different types of summaries. The researchers propose new ways to evaluate the quality of these summaries, checking that they meet specific standards while still being informative. They also use reinforcement learning (where AI gets feedback to improve) to help the LLM produce high-quality scientific summaries.
Original paper:
Giglou, H. B., D’Souza, J., & Auer, S. (2024). LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis. https://arxiv.org/abs/2409.18812
-
Welcome to our podcast, where we delve into groundbreaking research at the intersection of artificial intelligence and creativity. In this episode, we explore a fascinating system designed to automatically generate Finnish poetry. Tune in to hear how the system tackles essential elements like rhyme, meter, imagery, and metaphor, and why creative framing is vital to the system’s poetic intent. As we discuss the limitations of AI in grasping the subtle nuances of metaphor and semantics, we'll also explore future research aiming to better align AI-generated poetry with human perception.
Original paper:
Hämäläinen, M., & Alnajjar, K. (2019). Let’s FACE it. Finnish Poetry Generation with Aesthetics and Framing. In Proceedings of the 12th International Conference on Natural Language Generation (pp. 290-300). https://aclanthology.org/W19-8637/
-
In this episode, we dive deep into the world of AI and how it can better understand and assist humans in everyday tasks. Our discussion focuses on the limitations of current AI systems when it comes to following natural language instructions, particularly in collaborative environments where human intentions often remain implicit. We look at FISER (Follow Instructions with Social and Embodied Reasoning), a groundbreaking framework introduced in the paper to bridge this gap by allowing AI to infer human goals and intentions through social reasoning. We explore its use of Transformer-based models to enhance collaborative AI systems and discuss the results of testing FISER on the HandMeThat benchmark, where it achieves state-of-the-art performance. Tune in to learn how this new approach could change the way AI interacts with the human world, moving beyond literal commands toward shared understanding.
Original paper:
Wan, Y., Wu, Y., Wang, Y., Mao, J., & Jaques, N. (2024). Infer Human’s Intentions Before Following Natural Language Instructions. https://arxiv.org/abs/2409.18073
-
In this episode, we explore ZALM3, a revolutionary method designed to improve vision-language alignment in multi-turn multimodal medical dialogues. Patients often share images of their conditions with doctors, but these images can be low quality, with distracting backgrounds or off-center focus. ZALM3 uses a large language model to extract keywords from the ongoing conversation and employs a visual grounding model to crop and refine the image accordingly. This method enhances the alignment between the text and the image, leading to more accurate interpretations. We’ll also discuss the results of experiments across clinical datasets and the new subjective assessment metric introduced to evaluate this technology. Join us as we delve into the future of AI-driven medical consultations!
Original paper:
Li, Z., Zou, C., Ma, S., Yang, Z., Du, C., Tang, Y., Cao, Z., Zhang, N., Lai, J.-H., Lin, R.-S., Ni, Y., Sun, X., Xiao, J., Zhang, K., & Han, M. (2024). ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue. https://arxiv.org/abs/2409.17610
-
In this episode, we dive into a groundbreaking framework called AXIS, designed to make human-agent-computer interaction (HACI) faster, easier, and more accurate. Traditional digital assistants often rely on step-by-step interactions, which can be slow and error-prone. AXIS changes the game by using APIs (application programming interfaces) to complete tasks more efficiently. The episode breaks down how AXIS works, including its design, skills, and testing process. Real-world experiments, such as using AXIS with Microsoft Word, show how it can improve productivity and reduce mental effort, offering a glimpse into the future of smarter, more intuitive digital assistants.
Original paper:
Lu, J., Zhang, Z., Yang, F., Zhang, J., Wang, L., Du, C., Lin, Q., Rajmohan, S., Zhang, D., & Zhang, Q. (2024). Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents. https://arxiv.org/abs/2409.17140
-
In this episode, we dive into the groundbreaking "Gaussian Déjà-vu" framework, a cutting-edge method for creating realistic 3D head avatars. We'll explore how this innovative approach outshines traditional mesh-based and neural rendering techniques by using 3D Gaussian splatting, offering more flexibility and faster rendering. Learn how the system reconstructs 3D models from 2D images and personalizes them with monocular video, all while drastically reducing training times. Join us to discover how Gaussian Déjà-vu is shaping the future of photorealistic avatar creation!
Original paper:
Yan, P., Ward, R., Tang, Q., & Du, S. (2024). Gaussian Déjà-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities. https://arxiv.org/abs/2409.16147
-
In this groundbreaking podcast, we delve into the fascinating intersection of healthcare and artificial intelligence, exploring how emerging large language models (LLMs) are transforming the way doctors and patients communicate. In our latest episode, we unpack a pivotal study that tests cutting-edge AI models, like GPT-4 and LLaMA2, to evaluate the quality of palliative care conversations. Traditional methods of assessing patient-provider communication—like surveys and self-assessments—are time-consuming and costly. But could AI be the solution? Tune in to hear how LLMs are not only capturing nuanced metrics like "empathy" and "understanding" but also providing real-time feedback to improve clinical interactions. We’ll discuss the potential for AI to enhance patient outcomes, improve quality of care, and reshape the healthcare landscape. If you're curious about the future of healthcare technology, this episode is a must-listen!
Original paper:
Wang, Z., Yuan, F., LeBaron, V., Flickinger, T., & Barnes, L. E. (2024). PALLM: Evaluating and Enhancing PALLiative Care Conversations with Large Language Models. https://arxiv.org/abs/2409.15188
-
In this episode, we dive into the science behind what makes us laugh, using one of the most iconic sitcoms of all time—Friends. Have you ever wondered why you laugh at certain moments in the show? It’s not just the jokes themselves, but also the prerecorded laughter that signals humor. We’ll explore a fascinating study that developed an AI model to automatically detect humor in Friends by analyzing both the dialog and the laughter that follows. This model can identify when a joke is made with 78% accuracy and even predict how long the audience’s laughter should last! Join us as we break down how AI can understand humor, the role of laugh tracks in comedy, and what this research reveals about how we experience entertainment. Whether you're a Friends fan or just curious about AI, this episode is full of insights and fun!
Original paper:
Alnajjar, K., Hämäläinen, M., Tiedemann, J., Laaksonen, J., & Kurimo, M. (2022). When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and Its Intensity. In Proceedings of the 29th International Conference on Computational Linguistics. https://aclanthology.org/2022.coling-1.598/
-
In this episode, we dive into a fascinating new approach to improving chatbot technology through the art of forgetting. We explore a groundbreaking paper introducing LUFY, an innovative method that leverages psychological insights to help chatbots selectively forget unimportant parts of conversations. Designed to improve user experience, LUFY focuses on emotionally arousing memories—those that humans are more likely to recall. The authors conducted one of the longest chatbot studies to date, revealing that LUFY outperforms existing models in both user satisfaction and retrieval accuracy. Tune in to learn how forgetting could be the key to more natural, long-term conversations with AI, and discover the implications of this research for the future of chatbot design.
Original paper:
Sumida, R., Inoue, K., & Kawahara, T. (2024). Should RAG Chatbots Forget Unimportant Conversations? Exploring Importance and Forgetting with Psychological Insights. https://arxiv.org/abs/2409.12524
-
Can Artificial Intelligence craft the next big taste sensation? In this groundbreaking episode, we explore how cutting-edge Large Language Models (LLMs) are transforming the food industry with a deep dive into a revolutionary new approach called FOODPUZZLE. Discover how a team of researchers built a Scientific Agent that uses AI to predict and enhance flavor profiles, outperforming traditional methods. With 978 food items and their flavor molecules as the secret ingredient, this episode unpacks the science behind the future of taste. Don't miss how AI could change what we eat—and how we experience flavor!
Original paper:
Huang, T., Lee, D., Sweeney, J., Shi, J., Steliotes, E., Lange, M., May, J., & Chen, M. (2024). FoodPuzzle: Developing Large Language Model Agents as Flavor Scientists. https://arxiv.org/abs/2409.12832
-
In this episode, we dive into the fascinating world of Vision Language Models (VLMs) and their potential to revolutionize how AI interacts with complex environments, like video games. Our focus is on a groundbreaking study that explores how an AI agent can learn to play the action role-playing game Black Myth: Wukong using only visual input. The researchers behind this work developed a new framework, VARP, which allows the AI to navigate the game's challenges by mimicking human actions and planning its own strategies. Amazingly, this AI was able to master 90% of the game's easy and medium-level combat tasks! We’ll discuss how the framework works, the importance of the human gameplay data they’ve shared, and what this means for the future of AI in gaming and beyond. Tune in to learn how this research could shape the next generation of intelligent agents in complex, visually rich environments!
Original paper:
Chen, P., Bu, P., Song, J., Gao, Y., & Zheng, B. (2024). Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case. https://arxiv.org/abs/2409.12889
Demo video: https://varp-agent.github.io/