Episodes
-
HuggingFace has upgraded the Open LLM Leaderboard to v2, adding new benchmarks and improving the evaluation suite for easier reproducibility.
Gemma 2, a new addition to the Gemma family of lightweight open models, delivers the best performance for its size and offers a competitive alternative to models that are 2-3× bigger.
SeaKR is a new adaptive retrieval-augmented generation method that re-ranks retrieved knowledge based on the LLM's self-aware uncertainty, outperforming existing adaptive RAG approaches at generating relevant and accurate text.
Step-DPO is a new method that enhances the robustness and factuality of LLMs by applying preference optimization to individual reasoning steps rather than whole answers, achieving impressive results in long-chain mathematical reasoning (a step-level preference-loss sketch follows below).
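For a rough sense of how a step-wise preference objective differs from whole-answer DPO, here is a minimal PyTorch sketch. It assumes per-step log-probabilities under the policy and a frozen reference model are already computed; the function name and hyperparameters are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def step_dpo_loss(policy_logp_chosen, policy_logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss over a single reasoning step rather than a full answer.

    Each tensor holds the summed token log-probability of the preferred
    (correct) or dispreferred (erroneous) step, shape (batch,).
    """
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    # Bradley-Terry preference objective on the step-level log-ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of two step-level preference pairs with made-up log-probabilities.
loss = step_dpo_loss(torch.tensor([-5.0, -6.2]), torch.tensor([-7.1, -6.9]),
                     torch.tensor([-5.5, -6.0]), torch.tensor([-6.8, -7.0]))
print(round(loss.item(), 3))
```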
Contact: [email protected]
Timestamps:
00:34 Introduction
01:21 HuggingFace Updates Open LLM Leaderboard
03:19 Gemma 2: Improving Open Language Models at a Practical Size
04:16 From bare metal to a 70B model: infrastructure set-up and scripts
05:21 Fake sponsor
07:11 SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
08:47 Simulating Classroom Education with LLM-Empowered Agents
10:16 Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
12:31 Outro
-
OpenAI has delayed the advanced Voice Mode for ChatGPT Plus users, saying the extra time is needed to ensure safety and reliability.
ESM3 is a language model that can simulate 500 million years of evolution, making biology programmable and opening up possibilities for medicine, biology research, and clean energy.
R2R is an open-source project on GitHub that offers a comprehensive and state-of-the-art retrieval-augmented generation system for developers, making it accessible to anyone who wants to try it out.
MG-LLaVA is a new multi-modal large language model that enhances visual processing capabilities by incorporating a multi-granularity vision flow with low-resolution, high-resolution, and object-centric features (a toy fusion sketch follows below).
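To make "multi-granularity vision flow" concrete, the sketch below projects three visual token streams into a shared space and concatenates them before they reach the language model. It is a generic illustration of that pattern, not MG-LLaVA's actual architecture; every module name and dimension here is an assumption.

```python
import torch
import torch.nn as nn

class MultiGranularityFusion(nn.Module):
    """Toy fusion of low-res, high-res, and object-centric visual features."""

    def __init__(self, d_low=768, d_high=1024, d_obj=256, d_model=4096):
        super().__init__()
        self.proj_low = nn.Linear(d_low, d_model)
        self.proj_high = nn.Linear(d_high, d_model)
        self.proj_obj = nn.Linear(d_obj, d_model)

    def forward(self, low_tokens, high_tokens, obj_tokens):
        # Each stream: (batch, num_tokens, feature_dim) -> (batch, n, d_model)
        fused = torch.cat([self.proj_low(low_tokens),
                           self.proj_high(high_tokens),
                           self.proj_obj(obj_tokens)], dim=1)
        return fused  # concatenated visual tokens handed to the LLM

fusion = MultiGranularityFusion()
out = fusion(torch.randn(1, 64, 768), torch.randn(1, 256, 1024), torch.randn(1, 8, 256))
print(out.shape)  # torch.Size([1, 328, 4096])
```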
Contact: [email protected]
Timestamps:
00:34 Introduction
01:36 OpenAI Delays ChatGPT Voice Mode
03:27 ESM3 Simulating 500 million years of evolution with a language model
04:38 Rag to Riches
06:00 Fake sponsor
08:11 MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
09:49 Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
11:13 Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
13:02 Outro
-
Apple and Meta's failed partnership due to privacy concerns
IBM's integration of AI technology into quantum computing
Record labels suing AI startups for training on copyrighted material
Research papers on improving multimodal understanding, reinforcement learning, and automated software engineering
Contact: [email protected]
Timestamps:
00:34 Introduction
02:07 Apple shelved the idea of integrating Meta's AI models over privacy concerns, report says
03:25 IBM Develops The AI-Quantum Link
05:25 Record Labels Sue Two Startups for Training AI Models on Their Songs
06:50 Fake sponsor
08:42 Long Context Transfer from Language to Vision
10:27 WARP: On the Benefits of Weight Averaged Rewarded Policies
12:11 BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
13:55 Outro
-
Safe Superintelligence Inc. has launched with the goal of building a safe superintelligence that won't turn on humanity.
Dell, Nvidia, and Super Micro Computer are partnering with xAI and Elon Musk to build a massive supercomputer that could use up to 100,000 Nvidia H100 GPUs, potentially making it 4x larger than the biggest existing AI clusters.
Anthropic has launched Claude 3.5 Sonnet, their latest model family, which outperforms competitor models and even their own Claude 3 Opus on a wide range of evaluations.
The papers discussed in this episode explore the decision boundaries of large language models, auto-optimized training hyperparameters for IR models, and thinking step-by-step across modalities using whiteboard-of-thought. These findings could have important implications for the future development of AI.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:40 Ilya Sutskever Launches Safe Superintelligence Inc.
03:04 Dell joins forces with Nvidia, Grok, xAI and Elon Musk
04:23 Anthropic Launches Claude 3.5 Sonnet
06:10 Fake sponsor
08:16 Probing the Decision Boundaries of In-context Learning in Large Language Models
09:47 Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
11:05 Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
12:54 Outro
-
Google DeepMind's new AI tool that generates video soundtracks by combining text prompts with visual content.
Challenges of building large training AI clusters, including power, network topology, and reliability.
How large language models acquire factual knowledge during pretraining and their probabilistic reasoning capabilities.
LLARVA's vision-action instruction tuning that enhances robot learning.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:47 Google DeepMind's new AI tool uses video pixels and text prompts to generate soundtracks
03:31 100,000 H100 Clusters: Power, Network Topology, Ethernet vs InfiniBand, Reliability, Failures, Checkpointing
05:22 Large language model data pipelines and Common Crawl (WARC/WAT/WET)
06:47 Fake sponsor
08:20 How Do Large Language Models Acquire Factual Knowledge During Pretraining?
10:01 What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
11:22 LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
13:06 Outro
-
TikTok is expanding its Symphony ad suite with AI-generated avatars of creators and paid actors, as well as a global translation tool for multi-language support.
NVIDIA has released an open synthetic data generation pipeline for training large language models, which could benefit industries that rely on natural language processing.
Cohere's latest generative models, Command R and R+, can automate and streamline complex business workflows, saving time and increasing efficiency.
XLand-100B is a large-scale dataset for in-context reinforcement learning, providing a challenging benchmark for researchers in the field. CountGen addresses the challenge of controlling the number of depicted objects in text-to-image generation, while MM-NIAH is the first benchmark specifically designed to test the comprehension abilities of existing multimodal large language models.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:23 TikTok ads may soon contain AI-generated avatars of your favorite creators
02:59 NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models
04:43 Automating Complex Business Workflows with Cohere: Multi-Step Tool Use in Action
06:17 Fake sponsor
08:22 XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
10:23 Make It Count: Text-to-Image Generation with an Accurate Number of Objects
11:58 Needle In A Multimodal Haystack
13:37 Outro
-
Meta has paused its plans to train AI models on EU users' Facebook and Instagram posts due to concerns about privacy violations and lack of transparency.
McDonald's is ending its AI drive-thru ordering partnership with IBM, but is confident that a voice-ordering solution for drive-thru will be part of its restaurants' future.
"Creativity Has Left the Chat: The Price of Debiasing Language Models" explores the trade-off between consistency and creativity when selecting the appropriate model for creative tasks such as copywriting and ad creation.
"VideoGUI: A Benchmark for GUI Automation from Instructional Videos" highlights the need for better models and benchmarks to advance GUI automation.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:35 Meta won't train AI on Euro posts after all, as watchdogs put their paws down
03:10 McDonald's will stop testing AI to take drive-thru orders, for now
04:52 An Interview with AMD CEO Lisa Su About Solving Hard Problems
05:53 Fake sponsor
07:52 Creativity Has Left the Chat: The Price of Debiasing Language Models
09:23 VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
11:19 VideoGUI: A Benchmark for GUI Automation from Instructional Videos
12:58 Outro
-
Samsung showcases new manufacturing roadmap and AI chipmaking platform to compete with TSMC.
OpenAI CTO addresses Elon Musk's criticism and reveals that their internal models aren't far ahead of what's available for free.
Meta's MLow low-bitrate audio codec improves audio quality for slow-speed connections.
Google DeepMind's TransNAR model combines Transformers with neural algorithmic reasoners for better algorithmic reasoning (a cross-attention sketch follows below).
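The TransNAR combination can be pictured as the Transformer's token states cross-attending to node embeddings produced by a pre-trained neural algorithmic reasoner. The sketch below shows only that fusion step; the class name, dimensions, and residual wiring are assumptions rather than DeepMind's published code.

```python
import torch
import torch.nn as nn

class CrossAttendToNAR(nn.Module):
    """Toy cross-attention from text-token states to NAR node embeddings."""

    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, text_states, nar_node_embeddings):
        # text_states: (batch, seq_len, d_model) from the Transformer
        # nar_node_embeddings: (batch, num_nodes, d_model) from the reasoner
        attended, _ = self.attn(query=text_states,
                                key=nar_node_embeddings,
                                value=nar_node_embeddings)
        return text_states + attended  # residual fusion of the two streams

layer = CrossAttendToNAR()
fused = layer(torch.randn(2, 16, 512), torch.randn(2, 10, 512))
print(fused.shape)  # torch.Size([2, 16, 512])
```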
Contact: [email protected]
Timestamps:
00:34 Introduction
01:42 Samsung Showcases AI-Era Vision and Latest Foundry Technologies at SFF 2024
02:59 OpenAI CTO Speaks About Elon Musk and Future Models
04:42 MLow: Meta's low bitrate audio codec
05:51 Fake sponsor
08:08 Depth Anything V2
09:46 Transformers meet Neural Algorithmic Reasoners
11:14 Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models
12:54 Outro
-
Microsoft has retired its Copilot GPT Builder feature, citing a shift in focus towards enterprise and commercial applications.
TextGrad is a framework that performs automatic "differentiation" via text, using natural language feedback from large language models to optimize variables in computation graphs; a sketch of that optimization loop follows the summaries below.
"What If We Recaption Billions of Web Images with LLaMA-3?" is a paper that recaptioned 1.3 billion images from a web-crawled dataset using LLaMA-3, resulting in enhanced zero-shot performance in cross-modal retrieval tasks and improved alignment with users' text instructions for generative models.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:38 Elon Musk withdraws lawsuit against OpenAI
02:47 Microsoft Kills Copilot GPT Builder
04:18 Uncensor any LLM with abliteration
05:58 Fake sponsor
07:52 TextGrad: Automatic "Differentiation" via Text
09:41 Simple and Effective Masked Diffusion Language Models
10:52 What If We Recaption Billions of Web Images with LLaMA-3?
12:31 Outro
-
Apple's unique approach to AI development, focusing only on personal devices and prioritizing user privacy.
The ARC Prize competition pushing the boundaries of AI development towards AGI, incentivizing open-source research.
"Improve Mathematical Reasoning in Language Models by Automated Process Supervision" paper proposing a novel approach to improving mathematical reasoning performance of large language models.
"The Prompt Report: A Systematic Survey of Prompting Techniques" paper establishing a structured understanding of prompts for GenAI systems.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:56 Apple execs explain why its AI is different from competitors
03:11 ANNOUNCING ARC PRIZE
04:56 Introducing Apple's On-Device and Server Foundation Models
06:17 Fake sponsor
08:30 Improve Mathematical Reasoning in Language Models by Automated Process Supervision
10:22 Simple and Effective Masked Diffusion Language Models
11:59 The Prompt Report: A Systematic Survey of Prompting Techniques
13:45 Outro
-
Perplexity, an AI startup, has been accused of plagiarism by news outlets like Forbes and CNBC, raising concerns about the erosion of trust in media and the impact of AI on journalism.
The article "TechScape: How cheap, outsourced labor in Africa is shaping AI English" from The Guardian highlights the impact of outsourcing AI training to anglophonic knowledge workers in parts of the global south, and raises questions about the impact on language, culture, and identity.
The paper "Show, Don't Tell: Aligning Language Models with Demonstrated Feedback" from Stanford University introduces a method called DITTO that uses a small number of demonstrations to customize language models, showing promising results in fine-grained style and task alignment.
"WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild" from the Allen Institute for AI and the University of Washington introduces an automated evaluation framework designed to benchmark large language models on challenging real-world user queries, providing a more reliable and interpretable evaluation of models' performance.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:36 AI startup Perplexity accused of "directly ripping off" news outlets like Forbes, CNBC without proper credit
03:32 TechScape: How cheap, outsourced labour in Africa is shaping AI English
04:34 Thread: an AI jupyter notebook
05:29 Fake sponsor
07:34 Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
08:56 WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
10:46 Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
12:28 Outro
-
The impending launch of the real Siri by Apple, with improvements in reliability and integration inside apps.
The Mixture-of-Agents approach, which leverages the collective strengths of multiple large language models by layering them and aggregating their answers, achieving state-of-the-art performance (a generic sketch follows the summaries below).
The Proofread feature in Google's Gboard, using a large language model to provide sentence-level and paragraph-level corrections with a single tap.
The Comprehensive RAG Benchmark, shedding light on the limitations of current question answering models and laying the groundwork for a KDD Cup 2024 challenge.
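For intuition, the Mixture-of-Agents idea can be sketched as layers of proposer models whose answers become reference material for the next layer, with an aggregator model writing the final response. The code below is a generic illustration around placeholder callables, not the paper's implementation.

```python
from typing import Callable, List

def mixture_of_agents(question: str,
                      proposers: List[Callable[[str], str]],
                      aggregator: Callable[[str], str],
                      num_layers: int = 2) -> str:
    """Layered MoA: each layer sees the previous layer's answers as context."""
    previous_answers: List[str] = []
    for _ in range(num_layers):
        context = "\n\n".join(f"Reference answer {i + 1}:\n{a}"
                              for i, a in enumerate(previous_answers))
        prompt = (f"{context}\n\nQuestion: {question}\nAnswer:"
                  if context else f"Question: {question}\nAnswer:")
        previous_answers = [propose(prompt) for propose in proposers]
    drafts = "\n\n".join(previous_answers)
    return aggregator(f"Synthesize the best answer from these drafts:\n"
                      f"{drafts}\n\nQuestion: {question}\nAnswer:")
```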
Contact: [email protected]
Timestamps:
00:34 Introduction
02:13 Is Apple about to finally launch the real Siri?
04:03 WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI
05:10 Claude's Character
06:47 Fake sponsor
08:45 Mixture-of-Agents Enhances Large Language Model Capabilities
10:18 Proofread: Fixes All Errors with One Tap
11:53 CRAG -- Comprehensive RAG Benchmark
14:03 Outro
-
AI used to predict potential new antibiotics in groundbreaking study.
Stable Audio Open: an open source model that allows users to create short audio samples and sound effects from text prompts.
The ethical responsibilities of AI researchers when it comes to warning about the dangers of advanced artificial intelligence.
Cutting-edge research on AI and robotics, including large-scale simulations, in-context learning, and skill composition in modular arithmetic tasks.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:21 AI used to predict potential new antibiotics in groundbreaking study
02:40 Introducing Stable Audio Open - An Open Source Model for Audio Samples and Sound Design
04:22 A Right to Warn about Advanced Artificial Intelligence
05:23 Fake sponsor
07:10 RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
08:56 Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
10:36 Guiding a Diffusion Model with a Bad Version of Itself
12:24 Outro
-
Amazon's new AI system to detect damaged or incorrect items before they ship.
Elon Musk's controversial decision to prioritize X and xAI over Tesla for AI chips.
"To Believe or Not to Believe Your LLM" paper on uncertainty quantification in Large Language Models.
"Guiding a Diffusion Model with a Bad Version of Itself" paper on improving image generation with diffusion models.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:47 Learn how Amazon uses AI to spot damaged products before they're shipped to customers
03:17 Elon Musk ordered Nvidia to ship thousands of AI chips reserved for Tesla to X and xAI
05:08 FineWeb: decanting the web for the finest text data at scale
06:16 Fake sponsor
08:05 To Believe or Not to Believe Your LLM
09:33 Guiding a Diffusion Model with a Bad Version of Itself
11:06 Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
12:54 Outro
-
Microsoft is investing $3.2 billion in Sweden for cloud and AI infrastructure, deploying 20,000 advanced graphics processing units and training 250,000 Swedes with AI skills over three years.
"Grokfast" is a new algorithm that accelerates generalization under the grokking phenomenon in machine learning by amplifying the slow-varying component of gradients, improving performance on tasks like image classification.
"Zipper" is a multi-tower decoder architecture that uses cross-attention to flexibly compose multimodal generative models from independently pre-trained unimodal decoders, showcasing superior performance in tasks like speech-to-text generation.
"MetRag" is a new framework for retrieval augmented generation that combines similarity and utility-oriented models, using an LLM as a task adaptive summarizer to generate knowledge-augmented text and outperforming existing models on knowledge-intensive tasks like finance and medicine.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:49 Microsoft to invest $3.2 bln in Swedish cloud, AI
03:42 State Space Duality (Mamba-2) Part I - The Model
04:47 Sam Altman, Lately
06:08 Fake sponsor
08:39 Grokfast: Accelerated Grokking by Amplifying Slow Gradients
10:11 Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
11:38 Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
13:52 Outro
-
Nvidia unveils plans to accelerate the advance of artificial intelligence, partnering with companies and countries to build AI factories and releasing Nvidia ACE generative AI.
Finnish startup Binit develops an AI gadget that tracks household waste to encourage recycling, with potential benefits in improving recycling efficiency.
"The Intellectual Obesity Crisis" article discusses how we've become addicted to useless information, just like we evolved to crave sugar because it was a scarce source of energy.
Three AI research papers are discussed: a method that compresses second-order optimizer states to lower bitwidths (a generic low-bit quantization sketch follows below), the first-ever full-spectrum, multi-modal evaluation benchmark of MLLMs in video analysis, and a theoretical connection between Transformers and state-space models that yields a faster, more efficient alternative to existing models.
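The first paper's core idea, storing second-order optimizer state at low precision, can be illustrated with plain block-wise linear quantization to the signed 4-bit range. This is a generic sketch, not the paper's quantizer (which adds its own tricks), and packing two 4-bit values per byte is omitted.

```python
import torch

def quantize_4bit(x: torch.Tensor, block: int = 64):
    """Block-wise linear quantization of a tensor to the signed 4-bit range."""
    flat = x.flatten()
    pad = (-flat.numel()) % block
    flat = torch.cat([flat, flat.new_zeros(pad)]).view(-1, block)
    scale = flat.abs().amax(dim=1, keepdim=True).clamp_min(1e-12) / 7.0
    q = torch.clamp(torch.round(flat / scale), -8, 7).to(torch.int8)  # fits in 4 bits
    return q, scale, x.shape, pad

def dequantize_4bit(q, scale, shape, pad):
    flat = (q.float() * scale).flatten()
    if pad:
        flat = flat[:-pad]
    return flat.view(shape)

state = torch.randn(5, 130)             # stand-in for an optimizer state tensor
q, s, shape, pad = quantize_4bit(state)
err = (dequantize_4bit(q, s, shape, pad) - state).abs().mean()
print(f"mean abs error: {err:.4f}")
```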
Contact: [email protected]
Timestamps:
00:34 Introduction
01:21 AI hardware firm Nvidia unveils next-gen products at Taiwan tech expo
02:48 Binit is bringing AI to trash
04:39 The Intellectual Obesity Crisis
06:22 Fake sponsor
08:18 4-bit Shampoo for Memory-Efficient Network Training
10:01 Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
11:51 Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
13:30 Outro
-
Google explains last week's AI Overviews missteps and the improvements it is making so the feature provides accurate and helpful information.
Nvidia's new embedding model, NV-Embed-v1, ranks number one on the Massive Text Embedding Benchmark.
Matryoshka Query Transformer (MQT) offers flexibility to Large Vision-Language Models (LVLMs) by encoding an image into a variable number of visual tokens during inference.
Contextual Position Encoding (CoPE) improves the position encoding method in Large Language Models (LLMs) and solves tasks where popular position embeddings fail (a simplified gating sketch follows below).
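CoPE's core mechanism is to make position context-dependent: a sigmoid gate between each query and key decides whether a token increments the position counter, so "position" can effectively count words or sentences instead of raw tokens. A simplified single-head sketch of the gated position computation (the final embedding interpolation is omitted):

```python
import torch

def cope_positions(q, k):
    """Contextual positions for one attention head (simplified sketch).

    q, k: (seq_len, d) query/key vectors.
    Returns p[i, j]: the gated count of tokens between key j and query i.
    """
    seq_len = q.shape[0]
    gates = torch.sigmoid(q @ k.T)                      # (seq, seq) gate values
    causal = torch.tril(torch.ones(seq_len, seq_len))   # attend only backwards
    gates = gates * causal
    # p[i, j] = sum of gates[i, t] for t in [j, i]  (cumulative sum from the right)
    p = gates.flip(-1).cumsum(-1).flip(-1)
    return p * causal

q = torch.randn(6, 16)
k = torch.randn(6, 16)
print(cope_positions(q, k))
```

The fractional positions p[i, j] would then be mapped to embeddings by interpolating between integer position embeddings before entering the attention computation.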
Contact: [email protected]
Timestamps:
00:34 Introduction
01:35 AI Overviews: About last week
03:58 Nvidia Releases Embedding Model NV-Embed-v1
04:53 Multi-camera YOLOv5 on Zynq UltraScale+ with Hailo-8 AI Acceleration
06:31 Fake sponsor
08:28 Matryoshka Query Transformer for Large Vision-Language Models
10:24 Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
11:51 Contextual Position Encoding: Learning to Count What's Important
13:30 Outro
-
OpenAI announces new content and product partnerships with Vox Media and The Atlantic, making their reporting and stories more discoverable to millions of OpenAI users.
Mistral AI releases Codestral, a 22B parameter, open-weight model that specializes in coding tasks, beating out its code-focused rivals across top benchmarks.
MAP-Neo is the first fully open-sourced bilingual LLM that provides all the details needed to reproduce the model, improving transparency in large language models.
Self-Exploring Language Models (SELM) is a promising approach to improving the alignment of LLMs to human intentions through online feedback collection.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:39 A content and product partnership with The Atlantic
02:59 Mistral Releases Codestral, a Code-focused Model
04:34 How Dell Is Beating Supermicro
05:50 Fake sponsor
08:06 MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
09:44 Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
11:16 Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
13:18 Outro
-
OpenAI has formed a new safety team to address concerns about AI safety and ethics, led by CEO Sam Altman and board members Adam D'Angelo and Nicole Seligman.
Jan Leike, a leading AI researcher, has left OpenAI and joined Anthropic's Superalignment team, which is focused on AI safety and security.
The latest version of Sentence Transformers, v3, has been released, allowing models to be finetuned for specific tasks like semantic search and paraphrase mining (a minimal finetuning sketch follows the summaries below).
Exciting new research papers have been published, including MoEUT, a shared-layer Transformer design that outperforms standard Transformers on language modeling tasks, and EM Distillation, a new method that efficiently distills diffusion models into one-step generators without sacrificing perceptual quality.
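A minimal finetuning sketch, assuming the v3 SentenceTransformerTrainer and losses API and a toy two-pair dataset purely for illustration; the model name and column names are examples, so check the release notes for the exact arguments.

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# Toy (anchor, positive) pairs; a real run needs far more data.
train_dataset = Dataset.from_dict({
    "anchor": ["How do I reset my password?", "What is the refund policy?"],
    "positive": ["Steps to reset a forgotten password.", "Details of our refund policy."],
})

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
model.save("models/all-MiniLM-L6-v2-finetuned")
```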
Contact: [email protected]
Timestamps:
00:34 Introduction
01:32 OpenAI has a new safety team – it's run by Sam Altman
03:18 Jan Leike (ex OpenAI) joins Anthropic's Superalignment Team
05:04 Sentence Transformers v3 Released
06:06 Fake sponsor
08:19 MoEUT: Mixture-of-Experts Universal Transformers
10:10 Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
11:48 EM Distillation for One-step Diffusion Models
13:42 Outro
-
xAI, founded by Elon Musk, raises $6 billion in funding to accelerate the research and development of future technologies in the AI race.
Google's new 'AI Overviews' search feature causes uproar with bizarre and inaccurate responses, potentially eroding trust in Google's search results.
"Transformers Can Do Arithmetic with the Right Embeddings" proposes a solution to transformers' struggles with arithmetic tasks, achieving up to 99% accuracy on 100 digit addition problems.
"SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering" introduces SWE-agent, an autonomous system that uses a language model to interact with a computer to solve software engineering tasks, with potential to revolutionize the field.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:27 Elon Musk's xAI raises $6 billion to fund its race against ChatGPT and all the rest
02:51 Google's A.I. Search Errors Cause a Furor Online
04:17 ir-measures Documentation
05:15 Fake sponsor
07:12 Transformers Can Do Arithmetic with the Right Embeddings
08:23 Matryoshka Multimodal Models
09:58 SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
11:47 Outro