Data Skeptic - Podcast

Episodes

News Recommendations
2 Jul · Data Skeptic
News recommendation algorithms influence far more than what stories we click—they can shape our understanding of the world. In this episode, Kyle Polich speaks with Andreea Iana about responsible AI, filter bubbles, multilingual news recommendation, and her open-source NewsRecLib framework for evaluating recommender systems. They explore why bigger models aren't always better and how future recommendation systems can balance personalization with diversity and societal impact.
Give Users the Wheel
23 Jun · Data Skeptic
What if you could simply tell a recommendation system what you want instead of relying on likes, dislikes, and watch history? Kyle Polich talks with Fuyuan Lyu about the DPR framework, which combines large language models and traditional recommender systems to give users direct control over recommendations through natural language. Together they explore how conversational interfaces could transform platforms like YouTube, TikTok, and news feeds while preserving the strengths of modern recommendation algorithms.
AutoLike
17 Jun · Data Skeptic
How can researchers audit recommendation systems when the algorithms are hidden from view? Hieu Le joins Kyle Polich to discuss Auto-Like, a reinforcement learning framework that systematically explores how platforms like TikTok personalize content feeds. The conversation covers recommendation transparency, black-box auditing, and the future of platform accountability.
Student Spotlight: Aaron Payne, Data Analyst
1 May · Data Skeptic
Aaron Payne, an MBA student at Georgia Tech studying business analytics and a Senior Insights Analyst at Chick-fil-A, joins Kyle Polich to talk about turning analytics into decisions that matter. They unpack a real-world forecasting project with Comfama in Colombia, including messy data realities, interpretability tradeoffs, and why "data science for good" starts with the people impacted.
The Future is Agentic in Recommender Systems
25 Apr · Data Skeptic
Kyle Polich sits down with Yashar Deldjoo, research scientist and Associate Professor at the Polytechnic University of Bari, to explore how recommender systems have evolved and why trustworthiness matters. They unpack key dimensions of responsible AI, including robustness to adversarial attacks, privacy, explainability, and fairness, and discuss how LLMs introduce new risks like hallucinations.

The episode closes with a look at "agentic" recommender systems, where tools and memory shift recommendations from ranked lists to end-to-end task completion.
Book Ratings and Recommendations
27 Mar · Data Skeptic
Goodreads star ratings can be misleading as measures of "book quality," and research from Hannes Rosenbusch suggests that for many professionally published books, differences between readers often matter more than differences between books. The episode also explores how to model reader preferences, why reviews often reveal more about the reviewer than the text, and how LLMs can aid computational literary research while still falling short of human editors in creative writing.
Disentanglement and Interpretability in Recommender Systems
10 Mar · Data Skeptic
Ervin Dervishaj, a PhD student at the University of Copenhagen, discusses his research on disentangled representation learning in recommender systems, finding that while disentanglement strongly correlates with interpretability, it doesn't consistently improve recommendation performance. The conversation explores how disentanglement acts as a regularizer that can enhance user trust and interpretability at the potential cost of some accuracy, and touches on the future of large language models in denoising user interaction data.
Collective Altruism in Recommender Systems
27 Feb · Data Skeptic
Ekaterina (Kat) Fedorova from MIT EECS joins us to discuss strategic learning in recommender systems—what happens when users collectively coordinate to game recommendation algorithms. Kat's research reveals surprising findings: algorithmic "protest movements" can paradoxically help platforms by providing clearer preference signals, and the challenge of distinguishing coordinated behavior from bot activity is more complex than it appears. This episode explores the intersection of machine learning and game theory, examining what happens when your training data actively responds to your algorithm.
Niche vs Mainstream
18 Feb · Data Skeptic
Anas Buhayh discusses multi-stakeholder fairness in recommender systems and the S'mores framework—a simulation allowing users to choose between mainstream and niche algorithms. His research shows specialized recommenders improve utility for niche users while raising questions about filter bubbles and data privacy.
Healthy Friction in Job Recommender Systems
2 Feb · Data Skeptic
In this episode, host Kyle Polich speaks with Roan Schellingerhout, a fourth-year PhD student at Maastricht University, about explainable multi-stakeholder recommender systems for job recruitment. Roan discusses his research on creating AI-powered job matching systems that balance the needs of multiple stakeholders—job seekers, recruiters, HR professionals, and companies. The conversation explores different types of explanations for job recommendations, including textual, bar chart, and graph-based formats, with findings showing that lay users strongly prefer simple textual explanations over more technical visualizations. Roan shares insights from his "healthy friction" study, which tested whether users could distinguish between real AI-generated explanations and randomly generated ones, revealing that participants often used explanations as information sources rather than decision-making tools.

The discussion delves into the technical architecture behind these systems, including the use of knowledge graphs built from tabular data, inference rules, and large language models to generate human-friendly explanations. Roan explains how his research aims to open the black box of recommender systems, making them more transparent and trustworthy for non-technical users. Looking forward, he discusses ongoing work on automated knowledge graph construction from resumes and job listings, research into fairness considerations around gender and location, and plans for real-world testing with actual job seekers. The episode concludes with Roan's vision for the future: AI systems that support rather than replace human recruiters, making the job search process less grueling while maintaining the essential human judgment that recruitment requires.
Fairness in PCA-Based Recommenders
26 Jan · Data Skeptic
In this episode, we explore the fascinating world of recommender systems and algorithmic fairness with David Liu, Assistant Research Professor at Cornell University's Center for Data Science for Enterprise and Society. David shares insights from his research on how machine learning models can inadvertently create unfairness, particularly for minority and niche user groups, even without any malicious intent. We dive deep into his groundbreaking work on Principal Component Analysis (PCA) and collaborative filtering, examining why these fundamental techniques sometimes fail to serve all users equally.

David introduces the concept of "power niche users": highly active users with specialized interests who generate valuable data that can benefit the entire platform. We discuss his paper "When Collaborative Filtering Is Not Collaborative," which reveals how PCA can over-specialize on popular content, neglecting niche items and even failing to properly recommend popular artists to new potential fans. David presents solutions through item-weighted PCA and thoughtful data upweighting strategies that can improve both fairness and performance simultaneously, challenging the common assumption that these goals must be in tension. The conversation spans from theoretical insights to practical applications at companies like Meta, offering a comprehensive look at the future of personalized recommendations.
Video Recommendations in Industry
26 Dec 2025 · Data Skeptic
In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial role of homepage curation, and how human curators help by contextualizing content, cleaning data, and identifying positive feedback loops that algorithms might miss.

The conversation covers practical challenges like measuring "surprise and delight," the content deluge created by democratized creation tools, and why trust in tech companies is essential for better personalization. Cory emphasizes that discovery is "a good type of friction" and explains how the CODE framework (Capture, Organize, Distill, Express, plus Analysis) guides professional curation work. Looking to the future, they discuss the need for systems thinking that creates narrative connections between content, the potential for conversational AI to help users articulate preferences, and why diverse perspectives beyond engineering are crucial for building effective discovery systems. Resources mentioned include the newsletter "Top Information Retrieval Papers of the Week" and NotebookLM for synthesizing research.
Eye Tracking in Recommender Systems
18 Dec 2025 · Data Skeptic
In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelen Institute and Brno University, Santiago explains the mechanics of eye tracking technology: how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix's. Through collaboration between psychologists and AI researchers, Santiago's work demonstrates how eye tracking can uncover insights about positional bias and user engagement that traditional click data misses.
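Turning raw gaze samples into fixations can be illustrated with the classic dispersion-threshold (I-DT) algorithm. The sketch below is a minimal version, assuming gaze samples arrive as (x, y) screen coordinates at a fixed sampling rate; the threshold values are hypothetical, and nothing in the episode says RecGaze was processed this way.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start: int  # index of the first gaze sample in the fixation
    end: int    # index of the last gaze sample (inclusive)
    x: float    # centroid x of the samples in the fixation
    y: float    # centroid y of the samples in the fixation


def dispersion(points: list[tuple[float, float]]) -> float:
    """I-DT dispersion: horizontal spread plus vertical spread of a window."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: list[tuple[float, float]],
                     max_dispersion: float,
                     min_samples: int) -> list[Fixation]:
    """Dispersion-threshold fixation detection (I-DT).

    `min_samples` encodes the minimum fixation duration at a fixed sampling rate.
    Samples not absorbed into a fixation are treated as saccade samples.
    """
    fixations: list[Fixation] = []
    i, n = 0, len(samples)
    while i + min_samples <= n:
        j = i + min_samples  # current window is samples[i:j]
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the points stay tightly clustered.
            while j < n and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            window = samples[i:j]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append(Fixation(i, j - 1, cx, cy))
            i = j  # continue after the fixation
        else:
            i += 1  # drop the first sample (part of a saccade) and slide on
    return fixations
```

With a 60 Hz tracker, for example, a 100 ms minimum fixation duration corresponds to about 6 samples; in practice the dispersion threshold is usually chosen in degrees of visual angle and converted to pixels for the specific screen setup.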

Beyond the technical aspects, Santiago addresses the ethical considerations surrounding eye tracking data, particularly concerning pupil data and privacy. He emphasizes the importance of questioning assumptions in recommender systems and shares practical advice for improving recommendation algorithms by understanding actual user behavior rather than relying solely on click patterns. Looking forward, Santiago discusses exciting future directions including simulating user behavior using eye tracking data, addressing the cold start problem, and translating these findings to e-commerce applications. This conversation challenges researchers and practitioners to think more deeply about de-biasing clicks and leveraging eye tracking as a powerful tool to enhance user experience in recommendation systems.
Cracking the Cold Start Problem
8 Dec 2025 · Data Skeptic
In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations.

Boya shares insights from her research on how recommender systems impact both consumers and content creators across e-commerce and social media platforms. We explore critical challenges like the cold start problem—how to make good recommendations for brand new users—and discuss how her approach uses demographic information to create informative priors that accelerate learning. The conversation also touches on algorithmic fairness, revealing how her method reduces bias between majority and minority (niche preference) users by incorporating active learning through bandit algorithms. Whether you're interested in the mathematics of recommendation engines or the broader implications for digital platforms, this episode offers a comprehensive look at the state-of-the-art in recommender system design.
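The bandit component and the idea of demographic "informative priors" can be sketched with Beta-Bernoulli Thompson sampling. This is a minimal illustration under stated assumptions: each recommendation is a binary click/no-click event, and a new user's prior for each item is seeded from a hypothetical click-through rate observed among users in the same demographic group. It is not Boya Xu's method, which also combines collaborative filtering and embeddings; `informative_prior`, `choose`, and the prior `strength` parameter are illustrative names.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaArm:
    """Beta posterior over one item's click probability for one user."""
    alpha: float
    beta: float

    def update(self, clicked: bool) -> None:
        # Bernoulli likelihood: a click adds to alpha, a skip adds to beta.
        if clicked:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def informative_prior(group_ctr: float, strength: float) -> BetaArm:
    """Hypothetical prior: convert a demographic group's click-through rate into
    pseudo-counts. A larger `strength` takes more user feedback to override."""
    if not 0.0 < group_ctr < 1.0 or strength <= 0:
        raise ValueError("group_ctr must be in (0, 1) and strength positive")
    return BetaArm(group_ctr * strength, (1.0 - group_ctr) * strength)


def choose(arms: dict[str, BetaArm], rng: random.Random) -> str:
    """Thompson sampling: draw one plausible CTR per item, recommend the best draw.
    Uncertain items sometimes win a draw (exploration); items with high posterior
    mean usually win (exploitation)."""
    draws = {item: rng.betavariate(arm.alpha, arm.beta) for item, arm in arms.items()}
    return max(draws, key=draws.get)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical group CTRs for a brand-new user in some demographic segment.
    arms = {"item_a": informative_prior(0.30, 20), "item_b": informative_prior(0.05, 20)}
    # Simulated ground truth, unknown to the learner: the prior is misleading here.
    true_ctr = {"item_a": 0.10, "item_b": 0.40}
    for _ in range(500):
        item = choose(arms, rng)
        arms[item].update(rng.random() < true_ctr[item])
    print({name: round(arm.mean(), 3) for name, arm in arms.items()})
```

The prior strength is the key design choice: a strong prior speeds up learning when the demographic signal is accurate, while a weak one lets individual feedback (for example, from a niche-preference user) override the group average sooner.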
Designing Recommender Systems for Digital Humanities
23 Nov 2025 · Data Skeptic
In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors.

Florian explains the technical challenges of building a recommender system for cultural heritage materials, including dealing with sparse user-item interaction matrices, the cold start problem, and the need for multi-modal similarity approaches that can handle text, images, metadata, and historical context. The platform leverages various embedding techniques and gives users control over weighting different modalities—whether they're searching based on text similarity, visual imagery, or diplomatic features like issuers and receivers. A key insight from Florian's research is the importance of balancing serendipity with utility, collection representation to prevent bias, and system explainability while maintaining effectiveness.
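User-controlled weighting of modalities can be sketched as a weighted average of per-modality similarities. The example below is a minimal illustration, assuming each charter already has one embedding per modality (produced by any model) and using hypothetical modality names such as "text" and "image"; it is not Monasterium.net's actual scoring function.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def combined_similarity(query: dict[str, list[float]],
                        doc: dict[str, list[float]],
                        weights: dict[str, float]) -> float:
    """Weighted average of per-modality cosine similarities.

    `query` and `doc` map a modality name to its embedding; `weights` maps a
    modality name to the weight the user chose for it.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one modality weight must be positive")
    return sum(w * cosine(query[m], doc[m]) for m, w in weights.items()) / total


def rank(query: dict[str, list[float]],
         docs: dict[str, dict[str, list[float]]],
         weights: dict[str, float]) -> list[str]:
    """Return document ids ordered from most to least similar to the query."""
    return sorted(docs,
                  key=lambda name: combined_similarity(query, docs[name], weights),
                  reverse=True)
```

Shifting all weight to "text" surfaces charters with similar wording, while shifting it to "image" surfaces visually similar ones, which is the kind of control the episode describes giving to users with different research needs.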

The discussion also touches on unique evaluation challenges in non-commercial recommendation contexts, including Florian's "research funnel" framework that considers discovery, interaction, integration, and impact stages. Looking ahead, Florian envisions recommendation systems becoming standard tools for exploration across digital archives and cultural heritage repositories throughout Europe, potentially transforming how researchers discover and engage with historical materials. The new version of Monasterium.net, set to launch with enhanced semantic search and recommendation features, represents an important step toward making cultural heritage more accessible and discoverable for everyone.
DataRec Library for Reproducibility in Recommender Systems
13 Nov 2025 · Data Skeptic
In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Maria Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews.
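Iterative k-core filtering is one widely used preprocessing step of the kind that, applied inconsistently, changes a dataset enough to shift results. The sketch below is a generic implementation, assuming interactions are (user, item) pairs; it is not DataRec's API, and the function name `k_core` is illustrative.

```python
from collections import Counter


def k_core(interactions: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Keep only (user, item) pairs in which every user and every item has at
    least k interactions.

    Filtering repeats until nothing changes, because dropping a sparse user can
    push an item below k (and vice versa). A single pass is a common source of
    silent differences between "the same" preprocessed dataset in two papers.
    """
    current = list(interactions)
    while True:
        user_counts = Counter(u for u, _ in current)
        item_counts = Counter(i for _, i in current)
        kept = [(u, i) for u, i in current
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(current):
            return kept
        current = kept
```

In the second test case below, a single pass would keep two interactions, while the iterated filter correctly removes everything, illustrating how a small change in a filtering rule can change the resulting dataset.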

The conversation covers Alberto's research journey through knowledge graphs, graph-based recommenders, privacy considerations, and recommendation novelty. He explains why small modifications in datasets can significantly impact research outcomes, the importance of offline evaluation, and DataRec's vision as a lightweight library that integrates with existing frameworks rather than replacing them. Whether you're benchmarking new algorithms or exploring recommendation techniques, this episode offers practical insights into one of the most critical yet overlooked aspects of reproducible ML research.
Shilling Attacks on Recommender Systems
5 Nov 2025 · Data Skeptic
In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a friend's ska band on Spotify to inflating product ratings on e-commerce platforms, shilling attacks represent a significant threat in an industry where approximately 4% of reviews are fake, translating to $800 billion in annual sales in the US alone.

The discussion delves deep into collaborative filtering, explaining both user-user and item-item approaches that create similarity matrices to predict user preferences. However, these systems face various shilling attacks of increasing sophistication: random attacks use minimal information with average ratings, while segmented attacks strategically target popular items (like Taylor Swift albums) to build credibility before promoting target items. Bandwagon attacks focus on highly popular items to connect with genuine users, and average attacks leverage item rating knowledge to appear authentic. User-user collaborative filtering proves particularly vulnerable, requiring as few as 500 fake profiles to impact recommendations, while item-item filtering demands significantly more resources.

Aditya addresses detection through machine learning techniques that analyze behavioral patterns, using methods like PCA to identify profiles with unusually high correlation and suspicious rating consistency. However, this remains an evolving challenge as attackers adapt their strategies, now using large language models to generate more authentic-seeming fake reviews. His research with the MovieLens dataset tested detection algorithms against synthetic attacks, highlighting how these concerns extend to modern e-commerce systems. While companies rarely share attack and detection data publicly to avoid giving attackers an advantage, academic research continues to advance both offensive and defensive strategies in recommender systems security.
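Why user-user collaborative filtering is easy to push can be seen on a toy example. The sketch below is a minimal illustration, assuming cosine similarity over co-rated items and a top-k similarity-weighted average; the users, items, and ratings are made up, and it does not reproduce Aditya's MovieLens experiments or the 500-profile figure.

```python
import math
from typing import Optional

Ratings = dict[str, dict[str, float]]  # user -> {item: rating}


def cosine_sim(r1: dict[str, float], r2: dict[str, float]) -> float:
    """Cosine similarity computed only over the items both users have rated."""
    common = r1.keys() & r2.keys()
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return 0.0 if n1 == 0 or n2 == 0 else dot / (n1 * n2)


def predict(ratings: Ratings, user: str, item: str, k: int = 3) -> Optional[float]:
    """User-user CF: similarity-weighted mean rating of the k most similar
    users who rated `item`. Returns None if no usable neighbors exist."""
    neighbors = [
        (cosine_sim(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    denom = sum(sim for sim, _ in top)
    if denom == 0:
        return None
    return sum(sim * rating for sim, rating in top) / denom


def bandwagon_profiles(n: int, popular: dict[str, float], target: str,
                       push: float = 5.0) -> Ratings:
    """Hypothetical bandwagon attack: fake profiles rate well-known items (so
    they look similar to genuine users) and give the target the top rating."""
    return {f"fake_{j}": {**popular, target: push} for j in range(n)}


if __name__ == "__main__":
    ratings: Ratings = {
        "alice": {"pop1": 5, "pop2": 4},
        "bob": {"pop1": 5, "pop2": 4, "target": 1},
        "carol": {"pop1": 4, "pop2": 5, "target": 2},
    }
    print("before:", predict(ratings, "alice", "target"))
    attacked = {**ratings, **bandwagon_profiles(3, {"pop1": 5, "pop2": 5}, "target")}
    print("after:", predict(attacked, "alice", "target"))
```

In this toy setup, three fake profiles are enough to move alice's predicted rating for the target item from about 1.5 to about 3.7, because the fakes become her nearest neighbors by rating only the popular items alongside the target.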
Music Playlist Recommendations
29 Oct 2025 · Data Skeptic
In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start problem.

A significant contribution of Rebecca's work is the Music Semantics dataset, created by scraping Reddit discussions to capture how people naturally describe music using atmospheric qualities, contextual comparisons, and situational associations rather than just technical features. This dataset, available on Hugging Face, enables more nuanced recommendation systems that better understand user preferences and support niche tastes. Her research utilizes industry datasets including Last.fm and Spotify's Million Playlist Dataset, and points toward exciting future applications in music generation and multimodal systems that combine audio, text, and video.
Bypassing the Popularity Bias
15 Oct 2025 · Data Skeptic
Sustainable Recommender Systems for Tourism
9 Oct 2025 · Data Skeptic
In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable tourism practices through innovative approaches to data acquisition and algorithm design. Key highlights include leveraging large language models for synthetic data generation, developing recommendation architectures that balance user satisfaction with environmental concerns, and creating frameworks that distribute tourism more equitably across destinations. Ashmi's insights offer valuable perspectives for both AI researchers and tourism industry professionals seeking to implement more responsible recommendation technologies.