Episodes
-
Daniel Davis is an expert on knowledge graphs. He has a background in risk assessment and complex systems—from aerospace to cybersecurity. Now he is working on “Temporal RAG” in TrustGraph.
Time is a critical—but often ignored—dimension in data. Whether it’s threat intelligence, legal contracts, or API documentation, every data point has a temporal context that affects its reliability and usefulness. To manage this, systems must track when data is created, updated, or deleted, and ideally, preserve versions over time.
Three Types of Data:
Observations:
- Definition: Measurable, verifiable recordings (e.g., "the hat reads 'Sunday Running Club'").
- Characteristics: Require supporting evidence and may be updated as new data becomes available.

Assertions:
- Definition: Subjective interpretations (e.g., "the hat is greenish").
- Characteristics: Involve human judgment and come with confidence levels; they may change over time.

Facts:
- Definition: Immutable, verified information that remains constant.
- Characteristics: Rare in dynamic environments because most data evolves; serve as the "bedrock" of trust.

By clearly categorizing data into these buckets, systems can monitor freshness, detect staleness, and better manage dependencies between components (like code and its documentation).
Integrating Temporal Data into Knowledge Graphs:
Challenge: Traditional knowledge graphs and schemas (e.g., schema.org) rarely integrate time beyond basic metadata. Long documents may only provide a single timestamp, leaving the context of internal details untracked.

Solution: Attach detailed temporal metadata (such as creation, update, and deletion timestamps) during data ingestion. Use versioning to maintain historical context. This allows systems to:
- Assess whether data is current or stale.
- Detect conflicts when updates occur.
- Employ Bayesian methods to adjust trust metrics as more information accumulates.
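To make the idea concrete, here is a minimal sketch (my illustration, not TrustGraph's actual data model) of an observation record that carries temporal metadata and a confidence score:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Observation:
    """A timestamped, versioned statement destined for a knowledge graph."""
    subject: str
    predicate: str
    value: str
    confidence: float        # 0..1, revised as new evidence arrives
    created_at: datetime
    updated_at: datetime
    version: int = 1

    def is_stale(self, max_age: timedelta) -> bool:
        """Flag data whose last update falls outside the freshness window."""
        return datetime.now(timezone.utc) - self.updated_at > max_age

obs = Observation(
    subject="hat", predicate="reads", value="Sunday Running Club",
    confidence=0.9,
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    updated_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(obs.is_stale(timedelta(days=90)))  # True once the record is 90 days old
```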
Key Takeaways:
- Focus on Specialization: Build tools that do one thing well. For example, design a simple yet extensible knowledge graph rather than relying on overly complex ontologies.
- Integrate Temporal Metadata: Always timestamp data operations and version records. This is key to understanding data freshness and evolution.
- Adopt Robust Infrastructure: Use scalable, proven technologies to connect specialized modules via APIs. This reduces maintenance overhead compared to systems overloaded with connectors and extra features.
- Leverage Bayesian Updates: Start with initial trust metrics based on observed data and refine them as new evidence arrives.
- Mind the Big Picture: Avoid working in isolated silos. Emphasize a holistic system design that maintains in situ context and promotes collaboration across teams.

Daniel Davis:
Cognitive Core | TrustGraph | YouTube | LinkedIn | Discord

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Temporal Dimensions in Data
00:53 Timestamping and Versioning Data
01:35 Introducing Daniel Davis and Temporal RAG
01:58 Three Buckets of Data: Observations, Assertions, and Facts
03:22 Dynamic Data and Data Freshness
05:14 Challenges in Integrating Time in Knowledge Graphs
09:41 Defining Observations, Assertions, and Facts
12:57 The Role of Time in Data Trustworthiness
46:58 Chasing White Whales in AI
47:58 The Problem with Feature Overload
48:43 Connector Maintenance Challenges
50:02 The Swiss Army Knife Analogy
51:16 API Meshes and Glue Code
54:14 The Importance of Software Infrastructure
01:00:10 The Need for Specialized Tools
01:13:25 Outro and Future Plans
-
Robert Caulk runs Emergent Methods, a research lab building news knowledge graphs. With a Ph.D. in computational mechanics, he spent 12 years creating open-source tools for machine learning and data analysis. His work on projects like Flowdapt (model serving) and FreqAI (adaptive modeling) has earned over 1,000 academic citations.
His team built AskNews, which he calls "the largest news knowledge graph in production." It's a system that doesn't just collect news - it understands how events, people, and places connect.
Current AI systems struggle to connect information across sources and domains. Simple vector search misses crucial relationships. But building knowledge graphs at scale brings major technical hurdles around entity extraction, relationship mapping, and query performance.
Emergent Methods built a hybrid system combining vector search and knowledge graphs:
- Vector DB (Qdrant) handles initial broad retrieval
- Custom knowledge graph processes relationships
- Translation pipeline normalizes multi-language content
- Entity extraction model identifies key elements
- Context engineering pipeline structures data for LLMs

Implementation Details:
Data Pipeline:
- All content normalized to English for consistent embeddings
- Entity names preserved in original language when untranslatable
- Custom GLiNER News model handles entity extraction
- Retrained every 6 months on fresh data
- Human review validates entity accuracy

Entity Management:
- Base extraction uses the BERT-based GLiNER architecture
- Trained on diverse data across topics/regions
- Disambiguation system merges duplicate entities
- Manual override options for analysts
- Metadata tracking preserves relationship context

Knowledge Graph:
- Selective graph construction from vector results
- On-demand relationship processing
- Graph queries via standard Cypher
- Built for specific use cases vs. general coverage
- Integration with S3 and other data stores

System Validation:
Custom "Context is King" benchmark suiteRAGAS metrics track retrieval accuracyTime-split validation prevents data leakageManual review of entity extractionProduction monitoring of query patternsEngineering Insights:
Key Technical Decisions:
- English normalization enables consistent embeddings
- Hybrid vector + graph approach balances speed/depth
- Selective graph construction keeps costs down
- Human-in-loop validation maintains quality

Dead Ends Hit:
- Full multi-language entity system too complex
- Real-time graph updates not feasible at scale
- Pure vector or pure graph approaches insufficient

Top Quotes:
"At its core, context engineering is about how we feed information to AI. We want clear, focused inputs for better outputs. Think of it like talking to a smart friend - you'd give them the key facts in a way they can use, not dump raw data on them." - Robert"Strong metadata paints a high-fidelity picture. If we're trying to understand what's happening in Ukraine, we need to know not just what was said, but who said it, when they said it, and what voice they used to say it. Each piece adds color to the picture." - Robert"Clean data beats clever models. You can throw noise at an LLM and get something that looks good, but if you want real accuracy, you need to strip away the clutter first. Every piece of noise pulls the model in a different direction." - Robert"Think about how the answer looks in the real world. If you're comparing apartments, you'd want a table. If you're tracking events, you'd want a timeline. Match your data structure to how humans naturally process that kind of information." - Nico"Building knowledge graphs isn't about collecting everything - it's about finding the relationships that matter. Most applications don't need a massive graph. They need the right connections for their specific problem." - Robert"The quality of your context sets the ceiling for what your AI can do. You can have the best model in the world, but if you feed it noisy, unclear data, you'll get noisy, unclear answers. Garbage in, garbage out still applies." - Robert"When handling multiple languages, it's better to normalize everything to one language than to try juggling many. Yes, you lose some nuance, but you gain consistency. And consistency is what makes these systems reliable." - Robert"The hard part isn't storing the data - it's making it useful. Anyone can build a database. The trick is structuring information so an AI can actually reason with it. That's where context engineering makes the difference." - Robert"Start simple, then add complexity only when you need it. Most teams jump straight to sophisticated solutions when they could get better results by just cleaning their data and thinking carefully about how they structure it." - Nico"Every token in your context window is precious. Don't waste them on HTML tags or formatting noise. Save that space for the actual signal - the facts, relationships, and context that help the AI understand what you're asking." - NicoRobert Caulk:
LinkedIn | Emergent Methods | AskNews

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Context Engineering
00:24 Curating Input Signals
01:01 Structuring Raw Data
03:05 Refinement and Iteration
04:08 Balancing Breadth and Precision
06:10 Interview Start
08:02 Challenges in Context Engineering
20:25 Optimizing Context for LLMs
45:44 Advanced Cypher Queries and Graphs
46:43 Enrichment Pipeline Flexibility
47:16 Combining Graph and Semantic Search
49:23 Handling Multilingual Entities
52:57 Disambiguation and Deduplication Challenges
55:37 Training Models for Diverse Domains
01:04:43 Dealing with AI-Generated Content
01:17:32 Future Developments and Final Thoughts
-
When you store vectors, each number takes up 32 bits.
With 1000 numbers per vector and millions of vectors, costs explode.
A simple chatbot can cost thousands per month just to store and search through vectors.
The Fix: Quantization
Think of it like image compression. JPEGs look almost as good as raw photos but take up far less space. Quantization does the same for vectors.
Today we are back continuing our series on search with Zain Hasan, a former ML engineer at Weaviate and now a Senior AI/ML Engineer at Together. We talk about the different types of quantization, when to use them, how to use them, and their tradeoffs.

Three Ways to Quantize:
1. Binary Quantization
- Turn each number into just 0 or 1
- Ask: "Is this dimension positive or negative?"
- Works great for 1000+ dimensions
- Cuts memory by 97%
- Best for normally distributed data

2. Product Quantization
- Split vector into chunks
- Group similar chunks
- Store cluster IDs instead of full numbers
- Good when binary quantization fails
- More complex but flexible

3. Scalar Quantization
- Use 8 bits instead of 32
- Simple middle ground
- Keeps more precision than binary
- Less savings than binary
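To make the binary case concrete, here is a minimal NumPy sketch (not from the episode) that quantizes vectors to sign bits and ranks by Hamming distance:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(100_000, 1024)).astype(np.float32)   # 32-bit float vectors
query = rng.normal(size=1024).astype(np.float32)

# Binary quantization: keep only the sign of each dimension (1 bit instead of 32).
docs_bits = np.packbits(docs > 0, axis=1)     # 1024 dims -> 128 bytes per vector
query_bits = np.packbits(query > 0)

# Hamming distance: count differing bits between the query and every document.
xor = np.bitwise_xor(docs_bits, query_bits)
dists = np.unpackbits(xor, axis=1).sum(axis=1)
top10 = np.argsort(dists)[:10]                # candidates to rescore
print(top10)
```

In practice the binary pass serves as a coarse filter, and the top candidates are rescored with the original float vectors.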
"Vector databases are pretty much the commercialization and the productization of representation learning.""I think quantization, it builds on the assumption that there is still noise in the embeddings. And if I'm looking, it's pretty similar as well to the thought of Matryoshka embeddings that I can reduce the dimensionality.""Going from text to multimedia in vector databases is really simple.""Vector databases allow you to take all the advances that are happening in machine learning and now just simply turn a switch and use them for your application."
Key Quotes:Zain Hasan:
LinkedIn | X (Twitter) | Weaviate | Together

Nicolay Gerold:
LinkedIn | X (Twitter)

Keywords: vector databases, quantization, hybrid search, multi-vector support, representation learning, cost reduction, memory optimization, multimodal recommender systems, brain-computer interfaces, weather prediction models, AI applications
-
Alex Garcia is a developer focused on making vector search accessible and practical. As he puts it: "I'm a SQLite guy. I use SQLite for a lot of projects... I want an easier vector search thing that I don't have to install 10,000 dependencies to use."
Core Mantra: "Simple, Local, Scalable"
Why SQLite Vec?
"I didn't go along thinking, 'Oh, I want to build vector search, let me find a database for it.' It was much more like: I use SQLite for a lot of projects, I want something lightweight that works in my current workflow."
sqlite-vec uses row-oriented storage with some key design choices:
- Vectors are stored in large chunks (megabytes) as blobs
- Data is split across 4KB SQLite pages, which affects analytical performance
- Currently uses brute-force linear search without ANN indexing
- Supports binary quantization for 32x size reduction
- Handles tens to hundreds of thousands of vectors efficiently

Practical limits:
- 500ms search time for 500K vectors (768 dimensions)
- Best performance under 100ms for user experience
- Binary quantization enables scaling to ~1M vectors
- Metadata filtering and partitioning coming soon

Key advantages:
- Fast writes for transactional workloads
- Simple single-file database
- Easy integration with existing SQLite applications
- Leverages SQLite's mature storage engine

Garcia's preferred tools for local AI:
- Sentence Transformers models converted to GGUF format
- llama.cpp for inference
- Small models (30MB) for basic embeddings
- Larger models like Arctic Embed (hundreds of MB) for recent topics
- The sqlite-lembed extension for text embeddings
- Transformers.js for browser-based implementations

1. Choose Your Storage
"There's two ways of storing vectors within SQLiteVec. One way is a manual way where you just store a JSON array... [second is] using a virtual table."
- Traditional row storage: simple, flexible, good for small vectors
- Virtual table storage: optimized chunks, better for large datasets
- Performance sweet spot: up to 500K vectors with 500ms search time
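Here is a minimal sketch of the virtual-table path, assuming the sqlite-vec Python bindings (check the project README for the exact API):

```python
import sqlite3
import sqlite_vec  # pip install sqlite-vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)              # load the sqlite-vec extension
db.enable_load_extension(False)

# Virtual table: vectors live in optimized chunks instead of ordinary rows.
db.execute("CREATE VIRTUAL TABLE docs USING vec0(embedding float[4])")

items = [(1, sqlite_vec.serialize_float32([0.1, 0.2, 0.3, 0.4])),
         (2, sqlite_vec.serialize_float32([0.4, 0.3, 0.2, 0.1]))]
db.executemany("INSERT INTO docs(rowid, embedding) VALUES (?, ?)", items)

# Brute-force KNN: MATCH runs a linear scan and returns nearest neighbors.
query = sqlite_vec.serialize_float32([0.1, 0.2, 0.3, 0.4])
rows = db.execute(
    "SELECT rowid, distance FROM docs WHERE embedding MATCH ? "
    "ORDER BY distance LIMIT 2",
    (query,),
).fetchall()
print(rows)  # [(1, 0.0), (2, ...)]
```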
"With binary quantization it's 1/32 of the space... and holds up at 95 percent quality"
- Binary quantization reduces storage 32x with 95% quality
- Default page size is 4KB - plan your vector storage accordingly
- Metadata filtering dramatically improves search speed

3. Integration Patterns
"It's a single file, right? So you can like copy and paste it if you want to make a backup."
- Two storage approaches: manual columns or virtual tables
- Easy backups: single-file database
- Cross-platform: desktop, mobile, IoT, browser (via WASM)

4. Real-World Tips
"I typically choose the really small model... it's 30 megabytes. It quantizes very easily... I like it because it's very small, quick and easy."
- Start with smaller, efficient models (30MB range)
- Use binary quantization before trying complex solutions
- Plan for partitioning when scaling beyond 100K vectors

Alex Garcia:
LinkedIn | X (Twitter) | GitHub | sqlite-vec | sqlite-vss | Website

Nicolay Gerold:
LinkedIn | X (Twitter)

-
Today, I (Nicolay Gerold) sit down with Trey Grainger, author of the book AI-Powered Search. We discuss the different techniques for search and recommendations and how to combine them.
While RAG (Retrieval-Augmented Generation) has become a buzzword in AI, Trey argues that the current understanding of "RAG" is overly simplified – it's actually a bidirectional process he calls "GARRAG," where retrieval and generation continuously enhance each other.
Trey uses a three-context framework for search architecture:
- Content Context: Traditional document understanding and retrieval
- User Context: Behavioral signals driving personalization and recommendations
- Domain Context: Knowledge graphs and semantic understanding

Trey shares insights on:
- Why collecting and properly using user behavior signals is crucial yet often overlooked
- How to implement "light touch" personalization without trapping users in filter bubbles
- The evolution from simple vector similarity to sophisticated late interaction models
- Why treating search as a non-linear pipeline with feedback loops leads to better results

For engineers building search systems, Trey offers practical advice on choosing the right tools and techniques, from traditional search engines like Solr and Elasticsearch to modern approaches like ColBERT.
He also covers how to layer different techniques to make search tunable and debuggable.
Quotes:
"I think of whether it's search or generative AI, I think of all of these systems as nonlinear pipelines.""The reason we use retrieval when we're working with generative AI is because A generative AI model these LLMs will take your query, your request, whatever you're asking for. They will then try to interpret them and without access to up to date information, without access to correct information, they will generate a response from their highly compressed understanding of the world. And so we use retrieval to augment them with information.""I think the misconception is that, oh, hey, for RAG I can just, plug in a vector database and a couple of libraries and, a day or two later everything's magically working and I'm off to solve the next problem. Because search and information retrieval is one of those problems that you never really solve. You get it, good enough and quit, or you find so much value in it, you just continue investing to constantly make it better.""To me, they're, search and recommendations are fundamentally the same problem. They're just using different contexts.""Anytime you're building a search system, whether it's traditional search, whether it's RAG for generative AI, you need to have all three of those contexts in order to effectively get the most relevant results to solve solve the problem.""There's no better way to make your users really angry with you than to stick them in a bucket and get them stuck in that bucket, which is not their actual intent."Trey Grainger:
LinkedIn | AI-Powered Search (Community) | AI-Powered Search (Book)

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Search Challenges
00:50 Layered Approach to Ranking
01:00 Personalization and Signal Boosting
02:25 Broader Principles in Software Engineering
02:51 Interview with Trey Grainger
03:32 Understanding RAG and Retrieval
04:35 Nonlinear Pipelines in Search
06:01 Generative AI and Retrieval
08:10 Search Renaissance and AI
10:27 Misconceptions in AI-Powered Search
18:12 Search vs. Recommendation Systems
22:26 Three Buckets of Relevance
38:19 Traditional Learning to Rank
39:11 Semantic Relevance and User Behavior
39:53 Layered Ranking Algorithms
41:40 Personalization in Search
43:44 Technological Setup for Query Understanding
48:21 Personalization and User Behavior Vectors
52:10 Choosing the Right Search Engine
56:35 Future of AI-Powered Search
01:00:48 Building Effective Search Applications
01:06:50 Three Critical Context Frameworks
01:12:08 Modern Search Systems and Contextual Understanding
01:13:37 Conclusion and Recommendations
-
Today we are back continuing our series on search. We are talking to Brandon Smith about his work for Chroma. He led one of the largest studies in the field on different chunking techniques. So today we will look at how we can unfuck our RAG systems from badly chosen chunking hyperparameters.
The biggest lie in RAG is that semantic search is simple. The reality is that it's easy to build, it's easy to get up and running, but it's really hard to get right. And if you don't have a good setup, it's near impossible to debug. One of the reasons it's really hard is actually chunking. And there are a lot of things you can get wrong.
And even OpenAI got it a bit wrong, in my opinion, using an 800-token length for the chunks. This might work for legal, where you have a lot of boilerplate that carries little semantic meaning, but often you have the opposite: very information-dense content. Imagine fitting an entire Wikipedia page into the size of a tweet. A lot of information is actually lost, and that's what happens with long chunks. The next issue is overlap. OpenAI uses a 400-token overlap, or used to. The idea is to bring the important context into the chunk, but in reality, we don't really know where the context is coming from.
It could be from a few pages prior, not just the 400 tokens before. It could also be from a definition that's not even in the document at all. There is a really interesting solution from Anthropic, called contextual retrieval, where you basically preprocess all the chunks to see whether there is any missing information and try to reintroduce it.
Brandon Smith:
LinkedIn | X (Twitter) | Website | Chunking Article

Nicolay Gerold:
LinkedIn | X (Twitter) | Website

00:00 The Biggest Lie in RAG: Semantic Search Simplified
00:43 Challenges in Chunking and Overlap
01:38 Introducing Brandon Smith and His Research
02:05 The Motivation and Mechanics of Chunking
04:40 Issues with Current Chunking Methods
07:04 Optimizing Chunking Strategies
23:04 Introduction to Chunk Overlap
24:23 Exploring LLM-Based Chunking
24:56 Challenges with Initial Approaches
28:17 Alternative Chunking Methods
36:13 Language-Specific Considerations
38:41 Future Directions and Best Practices
-
Most LLMs you use today already use synthetic data.
It’s not a thing of the future.
The large labs use a large model (e.g. gpt-4o) to generate training data for a smaller one (gpt-4o-mini).
This lets you build fast, cheap models that do one thing well.
This is “distillation”.
But the vision for synthetic data is much bigger.
Enable people to train specialized AI systems without having a lot of training data.
Today we are talking to Adrien Morisot, an ML engineer at Cohere.
We talk about how Cohere uses synthetic data to train their models, their learnings, and how you can use synthetic data in your training.
We are slightly diverging from our search focus, but I wanted to create a deeper dive into synthetic data after our episode with Saahil.
You could use it in a lot of places: generate hard negatives, generate training samples for classifiers and rerankers and much more.
Scaling Synthetic Data Creation: https://arxiv.org/abs/2406.20094
Adrien Morisot:
LinkedIn | X (Twitter) | Cohere

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Synthetic Data in LLMs
00:18 Distillation and Specialized AI Systems
00:39 Interview with Adrien Morisot
02:00 Early Challenges with Synthetic Data
02:36 Breakthroughs and Rediscovery
03:54 The Evolution of AI and Synthetic Data
07:51 Data Harvesting and Internet Scraping
09:28 Generating Diverse Synthetic Data
15:37 Manual Review and Quality Control
17:28 Automating Data Evaluation
18:54 Fine-Tuning Models with Synthetic Data
21:45 Avoiding Behavioral Cloning
23:47 Ensuring Model Accuracy with Verification
24:31 Adapting Models to Specific Domains
26:41 Challenges in Financial and Legal Domains
28:10 Improving Synthetic Data Sets
30:45 Evaluating Model Performance
32:21 Using LLMs as Judges
35:42 Practical Tips for AI Practitioners
41:26 Synthetic Data in Training Processes
43:51 Quality Control in Synthetic Data
45:41 Domain Adaptation Strategies
46:51 Future of Synthetic Data Generation
47:30 Conclusion and Next Steps
-
Modern RAG systems build on flexibility.
At their core, they match each query with the best tool for the job.
They know which tool fits each task. When you ask about sales numbers, they reach for SQL. When you need company policies, they use vector search or BM25. The key is building systems that can switch between these tools smoothly.
But all types of retrieval start with metadata. By tagging documents with key details during processing, we narrow the search space before diving in.
The best systems use a mix of approaches: they might keep full documents for context, summaries for quick scanning, and metadata for filtering. They cast a wide net at first, then use neural ranking to zero in on the most relevant results.
The quality of embeddings can make or break a system. General-purpose models often fall short in specialized fields. Testing different embedding models on your specific data pays off - what works for general text might fail for legal documents or technical manuals. Sometimes, fine-tuning a model for your domain is worth the effort.
When building search systems, think modular. Start with pieces that can be swapped out as needs change or better tools emerge. Add metadata processing early - it's harder to add later. Break the retrieval process into steps: first find possible matches quickly, then rank them carefully. For complex documents with tables or images, add tools that can handle different types of content.
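As a sketch of that two-step pattern (illustrative only; the models named here are common defaults, not a recommendation from the episode):

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

docs = ["Refund policy: customers may return items within 30 days.",
        "Q3 sales grew 12% quarter over quarter.",
        "Travel policy: book flights through the internal portal."]

# Step 1: cast a wide net quickly with a fast bi-encoder.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = embedder.encode(docs, convert_to_tensor=True)

query = "How long do I have to return a purchase?"
hits = util.semantic_search(embedder.encode(query, convert_to_tensor=True),
                            doc_emb, top_k=3)[0]

# Step 2: rank the candidates carefully with a slower cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, docs[h["corpus_id"]]) for h in hits])
best = max(zip(scores, hits), key=lambda p: p[0])[1]
print(docs[best["corpus_id"]])
```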
The best systems also check their work. They ask: "Did I actually answer the question?" If not, they try a different approach. But they also know when to stop - endless loops help no one. In the end, RAG isn't just about finding information. It's about finding the right information, in the right way, at the right time.
Stephen Batifol:
X (Twitter) | Zilliz | LinkedIn

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Agentic RAG
00:04 Understanding Control Flow in Agentic RAG
00:33 Decision Making with LLMs
01:11 Exploring Agentic RAG with Stephen Batifol
03:35 Comparing RAG and GAR
06:31 Implementing Agentic RAG Workflows
22:36 Filtering with Prefix, Suffix, and Midfix
24:15 Breaking Mechanisms in Workflows
28:00 Evaluating Agentic Workflows
30:31 Multimodal and VLLMs in Document Processing
33:51 Challenges and Innovations in Parsing
34:51 Overrated and Underrated Aspects in LLMs
39:52 Building Effective Search Applications
-
Many companies use Elastic or OpenSearch and use 10% of the capacity.
They have to build ETL pipelines.
Get data normalized.
Worry about race conditions.
All in all: at the moment, when you want to do search on top of your transactional data, you are forced to build a distributed system.
Not anymore.
ParadeDB is building an open-source PostgreSQL extension to enable search within your database.
Today, I am talking to Philippe Noël, the founder and CEO of ParadeDB.
We talk about how they built it, how they integrate into the Postgres query engine, and how you can build search on top of Postgres.
Key Insights:
Search is changing. We're moving from separate search clusters to search inside databases. Simpler architecture, stronger guarantees, lower costs up to a certain scale.
Most search engines force you to duplicate data. ParadeDB doesn't. You keep data normalized and join at query time. It hooks deep into Postgres's query planner. It doesn't just bolt on search - it lets Postgres optimize search queries alongside SQL ones.
Search indices can work with ACID. ParadeDB's BM25 index keeps Lucene-style components (term frequency, normalization) but adds Postgres metadata for transactions. Search + ACID is possible.
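For flavor, a sketch of what that looks like from Python. The SQL follows ParadeDB's documented BM25 index syntax as I understand it; treat the table names and details as placeholders and check the current docs:

```python
import psycopg  # pip install psycopg

# Assumes a Postgres instance with the ParadeDB pg_search extension installed.
with psycopg.connect("postgresql://localhost/mydb") as conn:
    # BM25 index with Lucene-style components plus Postgres transaction metadata.
    conn.execute("""
        CREATE INDEX items_search_idx ON items
        USING bm25 (id, description, rating)
        WITH (key_field = 'id');
    """)
    # Search runs inside the same transactional database, so it can be
    # joined against normalized tables at query time.
    rows = conn.execute("""
        SELECT i.id, i.description, paradedb.score(i.id) AS score
        FROM items i
        WHERE i.description @@@ 'running shoes'
        ORDER BY score DESC
        LIMIT 10;
    """).fetchall()
    print(rows)
```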
Two storage types matter: inverted indices for text, columnar "fast fields" for analytics. Pick the right one or queries get slow. Integers now default to columnar to prevent common mistakes.
Mixing query engines looks tempting but fails. The team tried using DuckDB and DataFusion inside Postgres. Both were fast but broke ACID compliance. They had to rebuild features natively.
Philippe Noël:
LinkedIn | Bluesky | ParadeDB

Nicolay Gerold:
LinkedIn | X (Twitter) | Bluesky

00:00 Introduction to ParadeDB
00:53 Building ParadeDB with Rust
01:43 Integrating Search in Postgres
03:04 ParadeDB vs. Elastic
05:48 Technical Deep Dive: Postgres Integration
07:27 Challenges and Solutions
09:35 Transactional Safety and Performance
11:06 Composable Data Systems
15:26 Columnar Storage and Analytics
20:54 Case Study: Alibaba Cloud
21:57 Data Warehouse Context
23:24 Custom Indexing with BM25
24:01 Postgres Indexing Overview
24:17 Fast Fields and Columnar Format
24:52 Lucene Inspiration and Data Storage
26:06 Setting Up and Managing Indexes
27:43 Query Building and Complex Searches
30:21 Scaling and Sharding Strategies
35:27 Query Optimization and Common Mistakes
38:39 Future Developments and Integrations
39:24 Building a Full-Fledged Search Application
42:53 Challenges and Advantages of Using ParadeDB
46:43 Final Thoughts and Recommendations
-
RAG isn't a magic fix for search problems. While it works well at first, most teams find it's not good enough for production out of the box. The key is to make it better step by step, using good testing and smart data creation.
Today, we are talking to Saahil Ognawala from Jina AI to start to understand RAG.
To build a good RAG system, you need three things: ways to test it, methods to create training data, and plans to make it better over time. Testing starts with a set of example searches that users might make. These should include common searches that happen often, medium-rare searches, and rare searches that only happen now and then. This mix helps you measure if changes make your system better or worse.
Creating synthetic data helps make the system stronger, especially in spotting wrong answers that look right. Think of someone searching for a "gluten-free chocolate cake." A "sugar-free chocolate cake" might look like a good answer because it shares many words, but it's wrong.
These tricky examples help the system learn the difference between similar but different things.
When creating synthetic data, you need rules. The best way is to show the AI a few real examples and give it a list of topics to work with. Most teams find that using half real data and half synthetic data works best. This gives you enough variety while keeping things real.
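A toy sketch of that recipe (illustrative only; the examples and prompt wording are placeholders, not the pipeline discussed in the episode):

```python
# Few-shot prompt for generating hard negatives: documents that share many
# words with the query but do NOT answer it. Examples and topics are made up.
real_examples = [
    ("gluten-free chocolate cake", "sugar-free chocolate cake"),
    ("waterproof hiking boots", "water-resistant hiking boots"),
]
topics = ["recipes", "outdoor gear", "electronics"]

prompt = ("Generate hard negatives: results that look relevant to the query "
          "but are actually wrong.\n\n")
for query, negative in real_examples:        # ground the model with real pairs
    prompt += f"Query: {query}\nHard negative: {negative}\n\n"
prompt += f"Now produce one hard negative per topic: {', '.join(topics)}."

# response = llm.generate(prompt)  # any LLM client; mix ~50/50 with real data
print(prompt)
```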
Getting user feedback is hard with RAG. In normal search, you can see if users click on results. But with RAG, the system creates an answer from many pieces. A good answer might come from both good and bad pieces, making it hard to know which parts helped. This means you need smart ways to track which pieces of information actually helped make good answers.
One key rule: don't make things harder than they need to be. If simple keyword search (called BM25) works well enough, adding fancy AI search might not be worth the extra work.
Success with RAG comes from good testing, careful data creation, and steady improvements based on real use. It's not about using the newest AI models. It's about building good systems and processes that work reliably.
"It isn’t a magic wand you can place on your catalog and expect results you didn’t get before."
"Most of our enterprise users who have seen the most success in their RAG systems are the ones that implemented a continuous feedback mechanism very early."
"If you can't tell in real-time usage whether an answer is a bad answer or a right answer, because the LLM just makes it look like the right answer, then you only have your retrieval dataset to blame."
Saahil Ognawala:
LinkedIn | Jina AI

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Retrieval Augmented Generation (RAG)
00:29 Interview with Saahil Ognawala
00:52 Synthetic Data in Language Generation
01:14 Understanding the E5 Mistral Instructor Embeddings Paper
03:15 Challenges and Evolution in Synthetic Data
05:03 User Intent and Retrieval Systems
11:26 Evaluating RAG Systems
14:46 Setting Up Evaluation Frameworks
20:37 Fine-Tuning and Embedding Models
22:25 Negative and Positive Examples in Retrieval
26:10 Synthetic Data for Hard Negatives
29:20 Case Study: Marine Biology Project
29:54 Addressing Errors in Marine Biology Queries
31:28 Ensuring Query Relevance with Human Intervention
31:47 Few Shot Prompting vs Zero Shot Prompting
35:09 Balancing Synthetic and Real World Data
37:17 Improving RAG Systems with User Feedback
39:15 Future Directions for Jina and Synthetic Data
40:44 Building and Evaluating Embedding Models
41:24 Getting Started with Jina and Open Source Tools
51:25 The Importance of Hard Negatives in Embedding Models
-
Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors - it's finding them.
Today we are talking to Max Buckley on how to find and fix these errors.
Max works at Google and has built a lot of interesting experiments with LLMs on using them to improve knowledge bases for generation.
We talk about identifying ambiguities, fixing errors, creating improvement loops in the documents and a lot more.
Some Insights:
- A single ambiguous sentence can systematically corrupt an entire knowledge base's responses. Fixing these "documentation poisons" often requires minimal changes, but identifying them is challenging.
- Large organizations develop their own linguistic ecosystems that evolve over time. This creates unique challenges for both embedding models and retrieval systems that need to bridge external and internal vocabularies.
- Multiple feedback loops are crucial - expert testing, user feedback, and system monitoring each catch different types of issues.

Max Buckley (all opinions are his own and not Google's):
LinkedIn

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Understanding LLM Hallucinations
00:02 Challenges with Temporal Inconsistencies
00:43 Issues with Document Structure and Terminology
01:05 Introduction to Retrieval Augmented Generation (RAG)
01:49 Interview with Max Buckley
02:27 Anthropic's Approach to Document Chunking
02:55 Contextualizing Chunks for Better Retrieval
06:29 Challenges in Chunking and Search
07:35 LLMs in Internal Knowledge Management
08:45 Identifying and Fixing Documentation Errors
10:58 Using LLMs for Error Detection
15:35 Improving Documentation with User Feedback
24:42 Running Processes on Retrieved Context
25:19 Challenges of Terminology Consistency
26:07 Handling Definitions and Glossaries
30:10 Addressing Context Misinterpretation
31:13 Improving Documentation Quality
36:00 Future of AI and Search Technologies
42:29 Ensuring Documentation Readiness for AI
-
Ever wondered why vector search isn't always the best path for information retrieval?
Join us as we dive deep into BM25 and its unmatched efficiency in our latest podcast episode with David Tippett from GitHub.
Discover how BM25 transforms search efficiency, even at GitHub's immense scale.
BM25, short for Best Match 25, uses term frequency (TF) and inverse document frequency (IDF) to score document-query matches. It addresses limitations of TF-IDF, such as term saturation and document length normalization.
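For reference, the standard BM25 scoring function (the textbook formulation, not transcribed from the episode):

```latex
\mathrm{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot
  \frac{f(t, D)\,(k_1 + 1)}{f(t, D) + k_1\!\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\!\left(\frac{N - n(t) + 0.5}{n(t) + 0.5} + 1\right)
```

Here f(t, D) is the frequency of term t in document D, |D| is the document length, avgdl the average document length, N the number of documents, and n(t) the number of documents containing t. The parameter k1 (typically around 1.2) caps term saturation, and b (typically 0.75) controls length normalization - exactly the two fixes over plain TF-IDF mentioned above.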
Search Is About User Expectations
Search isn't just about relevance but about aligning with what users expect. GitHub users, for example, have diverse use cases—finding security vulnerabilities, exploring codebases, or managing repositories. Each requires a different prioritization of fields, boosting strategies, and possibly even distinct search workflows.

Key Insight: Search is deeply contextual and use-case driven. Understanding your users' intent and tailoring search behavior to their expectations matters more than chasing state-of-the-art technology.

The Challenge of Vector Search at Scale
Vector search systems require in-memory storage of vectorized data, making them costly for datasets with billions of documents (e.g., GitHub's 100 billion documents). IVF and HNSW offer trade-offs:
- IVF: Reduces memory requirements by bucketing vectors but risks losing relevance due to bucket misclassification.
- HNSW: Offers high relevance but demands high memory, making it impractical for massive datasets.

Architectural Insight: When considering vector search, focus on niche applications or subdomains with manageable dataset sizes, or use hybrid approaches combining BM25 with sparse/dense vectors.

Vector Search vs. BM25: A Trade-off of Precision vs. Cost
Vector search is more precise and effective for semantic similarity, but its operational costs and memory requirements make it prohibitive for massive datasets like GitHub's corpus of over 100 billion documents. BM25's scaling challenges (e.g., reliance on disk IOPS) are manageable compared to the memory-bound nature of vector indices like HNSW and IVF.

Key Insight: BM25's scalability allows for broader adoption, while vector search is still a niche solution requiring high specialization and infrastructure.

David Tippett:
LinkedIn | Podcast (For the Sake of Search) | X (Twitter) | Tippybits.com | Bluesky

Nicolay Gerold:
LinkedIn | X (Twitter) | Bluesky

00:00 Introduction to RAG and Vector Search Challenges
00:28 Introducing BM25: The Efficient Search Solution
00:43 Guest Introduction: David Tippett
01:16 Comparing Search Engines: Vespa, Weaviate, and More
07:53 Understanding BM25 and Its Importance
09:10 Deep Dive into BM25 Mechanics
23:46 Field-Based Scoring and BM25F
25:49 Introduction to Zero Shot Retrieval
26:03 Vector Search vs BM25
26:22 Combining Search Techniques
26:56 Favorite BM25 Adaptations
27:38 Postgres Search and Term Proximity
31:49 Challenges in GitHub Search
33:59 BM25 in Large Scale Systems
40:00 Technical Deep Dive into BM25
45:30 Future of Search and Learning to Rank
47:18 Conclusion and Future Plans
-
Ever wondered why your vector search becomes painfully slow after scaling past a million vectors? You're not alone - even tech giants struggle with this.
Charles Xie, founder of Zilliz (company behind Milvus), shares how they solved vector database scaling challenges at 100B+ vector scale:
Key Insights:
Multi-tier storage strategy:
- GPU memory (1% of data, fastest)
- RAM (10% of data)
- Local SSD
- Object storage (slowest but cheapest)

Real-time search solution:
- New data goes to a buffer (searchable immediately)
- Index builds in the background when the buffer fills
- Combines buffer & main index results

Performance optimization:
- GPU acceleration for 10k-50k queries/second
- Customizable trade-offs between cost, latency, and search relevance

Future developments:
- Self-learning indices
- Hybrid search methods (dense + sparse)
- Graph embedding support
- ColBERT integration

Perfect for teams hitting scaling walls with their current vector search implementation or planning for future growth.
Worth watching if you're building production search systems or need to optimize costs vs performance.
Charles Xie:
LinkedIn | Zilliz | Milvus | Milvus Discord

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Search System Challenges
00:26 Introducing Milvus: The Open Source Vector Database
00:58 Interview with Charles: Founder of Zilliz
02:20 Scalability and Performance in Vector Databases
03:35 Challenges in Distributed Systems
05:46 Data Consistency and Real-Time Search
12:12 Hierarchical Storage and GPU Acceleration
18:34 Emerging Technologies in Vector Search
23:21 Self-Learning Indexes and Future Innovations
28:44 Key Takeaways and Conclusion
-
Modern search systems face a complex balancing act between performance, relevancy, and cost, requiring careful architectural decisions at each layer.
While vector search generates buzz, hybrid approaches combining traditional text search with vector capabilities yield better results.
The architecture typically splits into three core components:
- ingestion/indexing (requiring decisions between batch vs. streaming)
- query processing (balancing understanding vs. performance)
- analytics/feedback loops for continuous improvement

Critical but often overlooked aspects include query understanding depth, systematic relevancy testing (avoid anecdote-driven development), and data governance as search systems naturally evolve into organizational data hubs.
Performance optimization requires careful tradeoffs between index-time vs query-time computation, with even 1-2% improvements being significant in mature systems.
Success requires testing against production data (staging environments prove unreliable), implementing proper evaluation infrastructure (golden query sets, A/B testing, interleaving), and avoiding the local maxima trap where improving one query set unknowingly damages others.
The end goal is finding an acceptable balance between corpus size, latency requirements, and cost constraints while maintaining system manageability and relevance quality.
"It's quite easy to end up in local maxima, whereby you improve a query for one set and then you end up destroying it for another set."
"A good marker of a sophisticated system is one where you actually see it's getting worse... you might be discovering a maxima."
"There's no free lunch in all of this. Often it's a case that, to service billions of documents on a vector search, less than 10 millis, you can do those kinds of things. They're just incredibly expensive. It's really about trying to manage all of the overall system to find what is an acceptable balance."
Search Pioneers:
Website | GitHub

Stuart Cam:
LinkedIn

Russ Cam:
GitHub | LinkedIn | X (Twitter)

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Search Systems
00:13 Challenges in Search: Relevancy vs Latency
00:27 Insights from Industry Experts
01:00 Evolution of Search Technologies
03:16 Storage and Compute in Search Systems
06:22 Common Mistakes in Building Search Systems
09:10 Evaluating and Improving Search Systems
19:27 Architectural Components of Search Systems
29:17 Understanding Search Query Expectations
29:39 Balancing Speed, Cost, and Corpus Size
32:03 Trade-offs in Search System Design
32:53 Indexing vs Querying: Key Considerations
35:28 Re-ranking and Personalization Challenges
38:11 Evaluating Search System Performance
44:51 Overrated vs Underrated Search Techniques
48:31 Final Thoughts and Contact Information
-
Today we are talking to Michael Günther, a senior machine learning scientist at Jina AI, about his work on Jina CLIP.
Some key points:
- Uni-modal embeddings convert a single type of input (text, images, audio) into vectors
- Multimodal embeddings learn a joint embedding space that can handle multiple types of input, enabling cross-modal search (e.g., searching images with text)
- Multimodal models can potentially learn richer representations of the world, including concepts that are difficult or impossible to put into words

Types of Text-Image Models
CLIP-like Models
- Separate vision and text transformer models
- Each tower maps inputs to a shared vector space
- Optimized for efficient retrieval

Vision-Language Models
- Process image patches as tokens
- Use transformer architecture to combine image and text information
- Better suited for complex document matching

Hybrid Models
- Combine separate encoders with additional transformer components
- Allow for more complex interactions between modalities
- Example: Google's Magic Lens model

Training Insights from Jina CLIP
Key Learnings
- Freezing the text encoder during training can significantly hinder performance
- Short image captions limit the model's ability to learn rich text representations
- Large batch sizes are crucial for training embedding models effectively

Training Process
A three-stage training approach:
- Stage 1: Training on image captions and text pairs
- Stage 2: Adding longer image captions
- Stage 3: Including triplet data with hard negatives

Practical Considerations
Similarity Scales
- Different modalities can produce different similarity value scales
- Important to consider when combining multiple embedding types
- Can affect threshold-based filtering

Model Selection
- Evaluate models based on relevant benchmarks
- Consider the domain similarity between training data and the intended use case
- Assess computational requirements and efficiency needs

Future Directions
Areas for Development
- More comprehensive benchmarks for multimodal tasks
- Better support for semi-structured data
- Improved handling of non-photographic images

Upcoming Developments at Jina AI
- Multilingual support for Jina ColBERT
- New version of text embedding models
- Focus on complex multimodal search applications

Practical Applications
E-commerce
- Product search and recommendations
- Combined text-image embeddings for better results
- Synthetic data generation for fine-tuning

Fine-tuning Strategies
- Using click data and query logs
- Generative pseudo-labeling for creating training data
- Domain-specific adaptations

Key Takeaways for Engineers
- Be aware of similarity value scales and their implications
- Establish quantitative evaluation metrics before optimization
- Consider model limitations (e.g., image resolution, text length)
- Use performance optimizations like flash attention and activation checkpointing
- Universal embedding models might not be optimal for specific use cases

Michael Günther:
LinkedIn | X (Twitter) | Jina AI | New Multilingual Embedding Model

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Uni-modal and Multimodal Embeddings
00:16 Exploring Multimodal Embeddings and Their Applications
01:06 Training Multimodal Embedding Models
02:21 Challenges and Solutions in Embedding Models
07:29 Advanced Techniques and Future Directions
29:19 Understanding Model Interference in Search Specialization
30:17 Fine-Tuning Jina CLIP for E-Commerce
32:18 Synthetic Data Generation and Pseudo-Labeling
33:36 Challenges and Learnings in Embedding Models
40:52 Future Directions and Takeaways
-
Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning.
Machine learning and AI success depends on the speed at which you can iterate. LanceDB is here to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what went into the decision to use Rust as the main implementation language, the potential of AI on top of LanceDB, and more.
"LanceDB is the database for AI...to manage their data, to do a performant billion scale vector search."
“We're big believers in the composable data systems vision."
"You can insert data into LanceDB using Panda's data frames...to sort of really large 'embed the internet' kind of workflows."
"We wanted to create a new generation of data infrastructure that makes their [AI engineers] lives a lot easier."
"LanceDB offers up to 1,000 times faster performance than Parquet."
Chang She:
LinkedIn | X (Twitter)

LanceDB:
X (Twitter) | GitHub | Web | Discord | VectorDB Recipes

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Multimodal Embeddings
00:26 Challenges in Storage and Serving
02:51 LanceDB: The Solution for Multimodal Data
04:25 Interview with Chang She: Origins and Vision
10:37 Technical Deep Dive: LanceDB and Rust
18:11 Innovations in Data Storage Formats
19:00 Optimizing Performance in Lakehouse Ecosystems
21:22 Future Use Cases for LanceDB
26:04 Building Effective Recommendation Systems
32:10 Exciting Applications and Future Directions

-
Today's guest is Mór Kapronczay. Mór is the Head of ML at Superlinked. Superlinked is a compute framework for your information retrieval and feature engineering systems, where they turn anything into embeddings.
When most people think about embeddings, they think about ada from OpenAI.
You just take your text and throw it in there.
But that’s too crude.
OpenAI embeddings are trained on the internet.
But your data set (most likely) is not the internet.
You have different nuances.
And you have more than just text.
So why not use it?
Some highlights:
Text Embeddings Are Not a Magic Bullet
➡️ Pouring everything into a text embedding model won't yield magical results
➡️ Language is lossy - it's a poor compression method for complex information
Embedding Numerical Data
➡️ Direct number embeddings don't work well for vector search
➡️ Consider projecting number ranges onto a quarter circle (see the sketch after this list)
➡️ Apply logarithmic transforms for skewed distributions
Multi-Modal Embeddings
➡️ Create separate vector parts for different data aspects
➡️ Normalize individual parts
➡️ Weight vector parts based on importance
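A minimal sketch of the quarter-circle idea (my illustration, not Superlinked's implementation): map a number to an angle in [0, π/2] and embed it as a point on the unit circle, optionally log-transforming skewed values first.

```python
import numpy as np

def embed_number(x: float, lo: float, hi: float, log: bool = False) -> np.ndarray:
    """Project a number onto a quarter circle as a 2-d embedding."""
    if log:  # compress skewed distributions (e.g., prices, follower counts)
        x, lo, hi = np.log1p(x), np.log1p(lo), np.log1p(hi)
    t = np.clip((x - lo) / (hi - lo), 0.0, 1.0)   # normalize to [0, 1]
    theta = t * np.pi / 2                          # angle on the quarter circle
    return np.array([np.cos(theta), np.sin(theta)])

print(embed_number(10, 0, 100))   # near [1, 0]
print(embed_number(90, 0, 100))   # near [0, 1]
```

Because every value lands on the unit circle, cosine similarity between two numbers degrades smoothly with their distance, which is exactly what digit-tokenized text embeddings fail to do.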
A multi-vector approach can help you understand the contributions of each modality or embedding, and it gives you an easier way to fine-tune your retrieval system without fine-tuning your embedding models: by tuning your vector database like you would a search database (like Elastic).
Mór Kapronczay:
LinkedIn | Superlinked | X (Twitter)

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Embeddings
00:30 Beyond Text: Expanding Embedding Capabilities
02:09 Challenges and Innovations in Embedding Techniques
03:49 Unified Representations and Vector Computers
05:54 Embedding Complex Data Types
07:21 Recommender Systems and Interaction Data
08:59 Combining and Weighing Embeddings
14:58 Handling Numerical and Categorical Data
20:35 Optimizing Embedding Efficiency
22:46 Dynamic Weighting and Evaluation
24:35 Exploring AB Testing with Embeddings
25:08 Joint vs Separate Embedding Spaces
27:30 Understanding Embedding Dimensions
29:59 Libraries and Frameworks for Embeddings
32:08 Challenges in Embedding Models
33:03 Vector Database Connectors
34:09 Balancing Production and Updates
36:50 Future of Vector Search and Modalities
39:36 Building with Embeddings: Tips and Tricks
42:26 Concluding Thoughts and Next Steps
-
Today we have Jessica Talisman with us, who is working as an Information Architect at Adobe. She is (in my opinion) the expert on taxonomies and ontologies.
That’s what you will learn today in this episode of How AI Is Built. Taxonomies, ontologies, knowledge graphs.
Everyone is talking about them, but no one knows how to build them.
But before we look into that, what are they good for in search?
Imagine a large corpus of academic papers. When a user searches for "machine learning in healthcare", the system can:
Recognize "machine learning" as a subcategory of "artificial intelligence"Identify "healthcare" as a broad field with subfields like "diagnostics" and "patient care"We can use these to expand the query or narrow it down.We can return results that include papers on "neural networks for medical imaging" or "predictive analytics in patient outcomes", even if these exact phrases weren't in the search queryWe can also filter down and remove papers not tagged with AI that might just mention it in a side not.So we are building the plumbing, the necessary infrastructure for tagging, categorization, query expansion and relexation, filtering.
So how can we build them?
1️⃣ Start with Industry Standards • Leverage established taxonomies (e.g., Google, GS1, IAB) • Audit them for relevance to your project • Use as a foundation, not a final solution
2️⃣ Customize and Fill Gaps • Adapt industry taxonomies to your specific domain • Create a "coverage model" for your unique needs • Mine internal docs to identify domain-specific concepts
3️⃣ Follow Ontology Best Practices • Use clear, unique primary labels for each concept • Include definitions to avoid ambiguity • Provide context for each taxonomy node
Jessica Talisman:
LinkedIn

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction to Taxonomies and Knowledge Graphs
02:03 Building the Foundation: Metadata to Knowledge Graphs
04:35 Industry Taxonomies and Coverage Models
06:32 Clustering and Labeling Techniques
11:00 Evaluating and Maintaining Taxonomies
31:41 Exploring Taxonomy Granularity
32:18 Differentiating Taxonomies for Experts and Users
33:35 Mapping and Equivalency in Taxonomies
34:02 Best Practices and Examples of Taxonomies
40:50 Building Multilingual Taxonomies
44:33 Creative Applications of Taxonomies
48:54 Overrated and Underappreciated Technologies
53:00 The Importance of Human Involvement in AI
53:57 Connecting with the Speaker
55:05 Final Thoughts and Takeaways
-
ColPali makes us rethink how we approach document processing.
ColPali revolutionizes visual document search by combining late interaction scoring with visual language models. This approach eliminates the need for extensive text extraction and preprocessing, handling messy real-world data more effectively than traditional methods.
In this episode, Jo Bergum, chief scientist at Vespa, shares his insights on how ColPali is changing the way we approach complex document formats like PDFs and HTML pages.
Introduction to ColPali:
- Combines late interaction scoring from ColBERT with a vision-language model (PaliGemma)
- Represents screenshots of documents as multi-vector representations
- Enables searching across complex document formats (PDFs, HTML)
- Eliminates the need for extensive text extraction and preprocessing
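The late interaction scoring at ColPali's core is MaxSim: each query token takes its best match over all document patch embeddings, and the per-token maxima are summed. A minimal NumPy sketch (dimensions are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(12, 128))    # 12 query-token embeddings
doc_patches = rng.normal(size=(1030, 128))   # one page ~ 1030 patch embeddings

# Normalize so dot products are cosine similarities.
q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
d = doc_patches / np.linalg.norm(doc_patches, axis=1, keepdims=True)

# MaxSim: for each query token, take its best-matching patch, then sum.
sim = q @ d.T                    # (12, 1030) token-to-patch similarities
score = sim.max(axis=1).sum()    # late interaction score for this page
print(score)
```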
Advantages of ColPali:
- Handles messy, real-world data better than traditional methods
- Considers both textual and visual elements in documents
- Potential applications in various domains (finance, medical, legal)
- Scalable to large document collections with proper optimization

Jo Bergum:
LinkedIn | Vespa | X (Twitter) | PDF Retrieval with Vision Language Models | Scaling ColPali to billions of PDFs with Vespa

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Messy Data in AI
01:19 Challenges in Search Systems
03:41 Understanding Representational Approaches
08:18 Dense vs Sparse Representations
19:49 Advanced Retrieval Models and ColPali
30:59 Exploring Image-Based AI Progress
32:25 Challenges and Innovations in OCR
33:45 Understanding ColPali and MaxSim
38:13 Scaling and Practical Applications of ColPali
44:01 Future Directions and Use Cases
-
Today, we're talking to Aamir Shakir, the founder and baker at mixedbread.ai, where he's building some of the best embedding and re-ranking models out there. We go into the world of rerankers, looking at how they can classify, deduplicate documents, prioritize LLM outputs, and delve into models like ColBERT.
We discuss:
- The role of rerankers in retrieval pipelines
- Advantages of late interaction models like ColBERT for interpretability
- Training rerankers vs. embedding models and their impact on performance
- Incorporating metadata and context into rerankers for enhanced relevance
- Creative applications of rerankers beyond traditional search
- Challenges and future directions in the retrieval space

Still not sure whether to listen? Here are some teasers:
- Rerankers can significantly boost your retrieval system's performance without overhauling your existing setup.
- Late interaction models like ColBERT offer greater explainability by allowing token-level comparisons between queries and documents.
- Training a reranker often yields a higher impact on retrieval performance than training an embedding model.
- Incorporating metadata directly into rerankers enables nuanced search results based on factors like recency and pricing.
- Rerankers aren't just for search - they can be used for zero-shot classification, deduplication, and prioritizing outputs from large language models (a sketch of the classification trick follows this list).
- The future of retrieval may involve compound models capable of handling multiple modalities, offering a more unified approach to search.
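As an illustration of that zero-shot classification trick (a sketch using a generic cross-encoder from sentence-transformers, not mixedbread's own models):

```python
from sentence_transformers import CrossEncoder

# A reranker scores (query, document) relevance. Treat each label as the
# "document" and the text as the "query" to get zero-shot classification.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

text = "The delivery arrived two weeks late and the box was crushed."
labels = ["shipping complaint", "product quality praise", "billing question"]

scores = reranker.predict([(text, label) for label in labels])
print(labels[scores.argmax()])   # -> "shipping complaint"
```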
Aamir Shakir:
LinkedIn | X (Twitter) | Mixedbread.ai

Nicolay Gerold:
LinkedIn | X (Twitter)

00:00 Introduction and Overview
00:25 Understanding Rerankers
01:46 Maxsim and Token-Level Embeddings
02:40 Setting Thresholds and Similarity
03:19 Guest Introduction: Aamir Shakir
03:50 Training and Using Rerankers (Episode Start)
04:50 Challenges and Solutions in Reranking
08:03 Future of Retrieval and Recommendation
26:05 Multimodal Retrieval and Reranking
38:04 Conclusion and Takeaways