Folgen

  • Modern search systems face a complex balancing act between performance, relevancy, and cost, requiring careful architectural decisions at each layer.

    While vector search generates buzz, hybrid approaches combining traditional text search with vector capabilities yield better results.

    The architecture typically splits into three core components:

    ingestion/indexing (requiring decisions between batch vs streaming)query processing (balancing understanding vs performance)analytics/feedback loops for continuous improvement.

    Critical but often overlooked aspects include query understanding depth, systematic relevancy testing (avoid anecdote-driven development), and data governance as search systems naturally evolve into organizational data hubs.

    Performance optimization requires careful tradeoffs between index-time vs query-time computation, with even 1-2% improvements being significant in mature systems.

    Success requires testing against production data (staging environments prove unreliable), implementing proper evaluation infrastructure (golden query sets, A/B testing, interleaving), and avoiding the local maxima trap where improving one query set unknowingly damages others.

    The end goal is finding an acceptable balance between corpus size, latency requirements, and cost constraints while maintaining system manageability and relevance quality.

    "It's quite easy to end up in local maxima, whereby you improve a query for one set and then you end up destroying it for another set."

    "A good marker of a sophisticated system is one where you actually see it's getting worse... you might be discovering a maxima."

    "There's no free lunch in all of this. Often it's a case that, to service billions of documents on a vector search, less than 10 millis, you can do those kinds of things. They're just incredibly expensive. It's really about trying to manage all of the overall system to find what is an acceptable balance."

    Search Pioneers:

    WebsiteGitHub

    Stuart Cam:

    LinkedIn

    Russ Cam:

    GithubLinkedInX (Twitter)

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    00:00 Introduction to Search Systems 00:13 Challenges in Search: Relevancy vs Latency 00:27 Insights from Industry Experts 01:00 Evolution of Search Technologies 03:16 Storage and Compute in Search Systems 06:22 Common Mistakes in Building Search Systems 09:10 Evaluating and Improving Search Systems 19:27 Architectural Components of Search Systems 29:17 Understanding Search Query Expectations 29:39 Balancing Speed, Cost, and Corpus Size 32:03 Trade-offs in Search System Design 32:53 Indexing vs Querying: Key Considerations 35:28 Re-ranking and Personalization Challenges 38:11 Evaluating Search System Performance 44:51 Overrated vs Underrated Search Techniques 48:31 Final Thoughts and Contact Information

  • Today we are talking to Michael Günther, a senior machine learning scientist at Jina about his work on JINA Clip.

    Some key points:

    Uni-modal embeddings convert a single type of input (text, images, audio) into vectorsMultimodal embeddings learn a joint embedding space that can handle multiple types of input, enabling cross-modal search (e.g., searching images with text)Multimodal models can potentially learn richer representations of the world, including concepts that are difficult or impossible to put into words

    Types of Text-Image Models

    CLIP-like ModelsSeparate vision and text transformer modelsEach tower maps inputs to a shared vector spaceOptimized for efficient retrievalVision-Language ModelsProcess image patches as tokensUse transformer architecture to combine image and text informationBetter suited for complex document matchingHybrid ModelsCombine separate encoders with additional transformer componentsAllow for more complex interactions between modalitiesExample: Google's Magic Lens model

    Training Insights from Jina CLIP

    Key LearningsFreezing the text encoder during training can significantly hinder performanceShort image captions limit the model's ability to learn rich text representationsLarge batch sizes are crucial for training embedding models effectivelyTraining ProcessThree-stage training approach: Stage 1: Training on image captions and text pairsStage 2: Adding longer image captionsStage 3: Including triplet data with hard negatives

    Practical Considerations

    Similarity ScalesDifferent modalities can produce different similarity value scalesImportant to consider when combining multiple embedding typesCan affect threshold-based filteringModel SelectionEvaluate models based on relevant benchmarksConsider the domain similarity between training data and intended use caseAssessment of computational requirements and efficiency needs

    Future Directions

    Areas for DevelopmentMore comprehensive benchmarks for multimodal tasksBetter support for semi-structured dataImproved handling of non-photographic imagesUpcoming Developments at Jina AIMultilingual support for Jina ColBERTNew version of text embedding modelsFocus on complex multimodal search applications

    Practical Applications

    E-commerceProduct search and recommendationsCombined text-image embeddings for better resultsSynthetic data generation for fine-tuningFine-tuning StrategiesUsing click data and query logsGenerative pseudo-labeling for creating training dataDomain-specific adaptations

    Key Takeaways for Engineers

    Be aware of similarity value scales and their implicationsEstablish quantitative evaluation metrics before optimizationConsider model limitations (e.g., image resolution, text length)Use performance optimizations like flash attention and activation checkpointingUniversal embedding models might not be optimal for specific use cases

    Michael Guenther

    LinkedInX (Twitter)Jina AINew Multilingual Embedding Modal

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    00:00 Introduction to Uni-modal and Multimodal Embeddings 00:16 Exploring Multimodal Embeddings and Their Applications 01:06 Training Multimodal Embedding Models 02:21 Challenges and Solutions in Embedding Models 07:29 Advanced Techniques and Future Directions 29:19 Understanding Model Interference in Search Specialization 30:17 Fine-Tuning Jina CLIP for E-Commerce 32:18 Synthetic Data Generation and Pseudo-Labeling 33:36 Challenges and Learnings in Embedding Models 40:52 Future Directions and Takeaways

  • Fehlende Folgen?

    Hier klicken, um den Feed zu aktualisieren.

  • Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning.

    Machine learning and AI success depends on the speed you can iterate. LanceDB is here to to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what went into the decision to use Rust as the main implementation language, the potential of AI on top of LanceDB, and more.

    "LanceDB is the database for AI...to manage their data, to do a performant billion scale vector search."

    “We're big believers in the composable data systems vision."

    "You can insert data into LanceDB using Panda's data frames...to sort of really large 'embed the internet' kind of workflows."

    "We wanted to create a new generation of data infrastructure that makes their [AI engineers] lives a lot easier."

    "LanceDB offers up to 1,000 times faster performance than Parquet."


    Change She:

    LinkedInX (Twitter)

    LanceDB:

    X (Twitter)GitHubWebDiscordVectorDB Recipes

    Nicolay Gerold:

    LinkedInX (Twitter)

    00:00 Introduction to Multimodal Embeddings
    00:26 Challenges in Storage and Serving
    02:51 LanceDB: The Solution for Multimodal Data
    04:25 Interview with Chang She: Origins and Vision
    10:37 Technical Deep Dive: LanceDB and Rust
    18:11 Innovations in Data Storage Formats
    19:00 Optimizing Performance in Lakehouse Ecosystems
    21:22 Future Use Cases for LanceDB
    26:04 Building Effective Recommendation Systems
    32:10 Exciting Applications and Future Directions

  • Today’s guest is Mór Kapronczay. Mór is the Head of ML at superlinked. Superlinked is a compute framework for your information retrieval and feature engineering systems, where they turn anything into embeddings.

    When most people think about embeddings, they think about ada, openai.

    You just take your text and throw it in there.

    But that’s too crude.

    OpenAI embeddings are trained on the internet.

    But your data set (most likely) is not the internet.

    You have different nuances.

    And you have more than just text.

    So why not use it.

    Some highlights:

    Text Embeddings are Not a Magic Bullet

    ➡️ Pouring everything into a text embedding model won't yield magical results ➡️ Language is lossy - it's a poor compression method for complex information

    Embedding Numerical Data

    ➡️ Direct number embeddings don't work well for vector search ➡️ Consider projecting number ranges onto a quarter circle ➡️ Apply logarithmic transforms for skewed distributions

    Multi-Modal Embeddings

    ➡️ Create separate vector parts for different data aspects ➡️ Normalize individual parts ➡️ Weight vector parts based on importance

    A Multi-Vector approach can help you understand the contributions of each modality or embedding and give you an easier time to fine-tune your retrieval system without fine-tuning your embedding models by tuning your vector database like you would a search database (like Elastic).

    Mór Kapronczay

    LinkedInSuperlinkedX (Twitter)

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    00:00 Introduction to Embeddings 00:30 Beyond Text: Expanding Embedding Capabilities 02:09 Challenges and Innovations in Embedding Techniques 03:49 Unified Representations and Vector Computers 05:54 Embedding Complex Data Types 07:21 Recommender Systems and Interaction Data 08:59 Combining and Weighing Embeddings 14:58 Handling Numerical and Categorical Data 20:35 Optimizing Embedding Efficiency 22:46 Dynamic Weighting and Evaluation 24:35 Exploring AB Testing with Embeddings 25:08 Joint vs Separate Embedding Spaces 27:30 Understanding Embedding Dimensions 29:59 Libraries and Frameworks for Embeddings 32:08 Challenges in Embedding Models 33:03 Vector Database Connectors 34:09 Balancing Production and Updates 36:50 Future of Vector Search and Modalities 39:36 Building with Embeddings: Tips and Tricks 42:26 Concluding Thoughts and Next Steps

  • Today we have Jessica Talisman with us, who is working as an Information Architect at Adobe. She is (in my opinion) the expert on taxonomies and ontologies.

    That’s what you will learn today in this episode of How AI Is Built. Taxonomies, ontologies, knowledge graphs.

    Everyone is talking about them no-one knows how to build them.

    But before we look into that, what are they good for in search?

    Imagine a large corpus of academic papers. When a user searches for "machine learning in healthcare", the system can:

    Recognize "machine learning" as a subcategory of "artificial intelligence"Identify "healthcare" as a broad field with subfields like "diagnostics" and "patient care"We can use these to expand the query or narrow it down.We can return results that include papers on "neural networks for medical imaging" or "predictive analytics in patient outcomes", even if these exact phrases weren't in the search queryWe can also filter down and remove papers not tagged with AI that might just mention it in a side not.

    So we are building the plumbing, the necessary infrastructure for tagging, categorization, query expansion and relexation, filtering.

    So how can we build them?

    1️⃣ Start with Industry Standards • Leverage established taxonomies (e.g., Google, GS1, IAB) • Audit them for relevance to your project • Use as a foundation, not a final solution

    2️⃣ Customize and Fill Gaps • Adapt industry taxonomies to your specific domain • Create a "coverage model" for your unique needs • Mine internal docs to identify domain-specific concepts

    3️⃣ Follow Ontology Best Practices • Use clear, unique primary labels for each concept • Include definitions to avoid ambiguity • Provide context for each taxonomy node

    Jessica Talisman:

    LinkedIn

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    00:00 Introduction to Taxonomies and Knowledge Graphs 02:03 Building the Foundation: Metadata to Knowledge Graphs 04:35 Industry Taxonomies and Coverage Models 06:32 Clustering and Labeling Techniques 11:00 Evaluating and Maintaining Taxonomies 31:41 Exploring Taxonomy Granularity 32:18 Differentiating Taxonomies for Experts and Users 33:35 Mapping and Equivalency in Taxonomies 34:02 Best Practices and Examples of Taxonomies 40:50 Building Multilingual Taxonomies 44:33 Creative Applications of Taxonomies 48:54 Overrated and Underappreciated Technologies 53:00 The Importance of Human Involvement in AI 53:57 Connecting with the Speaker 55:05 Final Thoughts and Takeaways

  • ColPali makes us rethink how we approach document processing.

    ColPali revolutionizes visual document search by combining late interaction scoring with visual language models. This approach eliminates the need for extensive text extraction and preprocessing, handling messy real-world data more effectively than traditional methods.

    In this episode, Jo Bergum, chief scientist at Vespa, shares his insights on how ColPali is changing the way we approach complex document formats like PDFs and HTML pages.

    Introduction to ColPali:

    Combines late interaction scoring from Colbert with visual language model (PoliGemma)Represents screenshots of documents as multi-vector representationsEnables searching across complex document formats (PDFs, HTML)Eliminates need for extensive text extraction and preprocessing

    Advantages of ColPali:

    Handles messy, real-world data better than traditional methodsConsiders both textual and visual elements in documentsPotential applications in various domains (finance, medical, legal)Scalable to large document collections with proper optimization

    Jo Bergum:

    LinkedInVespaX (Twitter)PDF Retrieval with Vision Language ModelsScaling ColPali to billions of PDFs with Vespa

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    00:00 Messy Data in AI 01:19 Challenges in Search Systems 03:41 Understanding Representational Approaches 08:18 Dense vs Sparse Representations 19:49 Advanced Retrieval Models and ColPali 30:59 Exploring Image-Based AI Progress 32:25 Challenges and Innovations in OCR 33:45 Understanding ColPali and MaxSim 38:13 Scaling and Practical Applications of ColPali 44:01 Future Directions and Use Cases

  • Today, we're talking to Aamir Shakir, the founder and baker at mixedbread.ai, where he's building some of the best embedding and re-ranking models out there. We go into the world of rerankers, looking at how they can classify, deduplicate documents, prioritize LLM outputs, and delve into models like ColBERT.

    We discuss:

    The role of rerankers in retrieval pipelinesAdvantages of late interaction models like ColBERT for interpretabilityTraining rerankers vs. embedding models and their impact on performanceIncorporating metadata and context into rerankers for enhanced relevanceCreative applications of rerankers beyond traditional searchChallenges and future directions in the retrieval space

    Still not sure whether to listen? Here are some teasers:

    Rerankers can significantly boost your retrieval system's performance without overhauling your existing setup.Late interaction models like ColBERT offer greater explainability by allowing token-level comparisons between queries and documents.Training a reranker often yields a higher impact on retrieval performance than training an embedding model.Incorporating metadata directly into rerankers enables nuanced search results based on factors like recency and pricing.Rerankers aren't just for search—they can be used for zero-shot classification, deduplication, and prioritizing outputs from large language models.The future of retrieval may involve compound models capable of handling multiple modalities, offering a more unified approach to search.

    Aamir Shakir:

    LinkedInX (Twitter)Mixedbread.ai

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    00:00 Introduction and Overview 00:25 Understanding Rerankers 01:46 Maxsim and Token-Level Embeddings 02:40 Setting Thresholds and Similarity 03:19 Guest Introduction: Aamir Shakir 03:50 Training and Using Rerankers (Episode Start) 04:50 Challenges and Solutions in Reranking 08:03 Future of Retrieval and Recommendation 26:05 Multimodal Retrieval and Reranking 38:04 Conclusion and Takeaways

  • Text embeddings have limitations when it comes to handling long documents and out-of-domain data.

    Today, we are talking to Nils Reimers. He is one of the researchers who kickstarted the field of dense embeddings, developed sentence transformers, started HuggingFace’s Neural Search team and now leads the development of search foundational models at Cohere. Tbh, he has too many accolades to count off here.

    We talk about the main limitations of embeddings:

    Failing out of domainStruggling with long documentsVery hard to debugHard to find formalize what actually is similar

    Are you still not sure whether to listen? Here are some teasers:

    Interpreting embeddings can be challenging, and current models are not easily explainable.Fine-tuning is necessary to adapt embeddings to specific domains, but it requires careful consideration of the data and objectives.Re-ranking is an effective approach to handle long documents and incorporate additional factors like recency and trustworthiness.The future of embeddings lies in addressing scalability issues and exploring new research directions.

    Nils Reimers:

    LinkedInX (Twitter)WebsiteCohere

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    text embeddings, limitations, long documents, interpretation, fine-tuning, re-ranking, future research

    00:00 Introduction and Guest Introduction 00:43 Early Work with BERT and Argument Mining 02:24 Evolution and Innovations in Embeddings 03:39 Constructive Learning and Hard Negatives 05:17 Training and Fine-Tuning Embedding Models 12:48 Challenges and Limitations of Embeddings 18:16 Adapting Embeddings to New Domains 22:41 Handling Long Documents and Re-Ranking 31:08 Combining Embeddings with Traditional ML 45:16 Conclusion and Upcoming Episodes

  • Hey! Welcome back.

    Today we look at how we can get our RAG system ready for scale.

    We discuss common problems and their solutions, when you introduce more users and more requests to your system.

    For this we are joined by Nirant Kasliwal, the author of fastembed.

    Nirant shares practical insights on metadata extraction, evaluation strategies, and emerging technologies like Colipali. This episode is a must-listen for anyone looking to level up their RAG implementations.

    "Naive RAG has a lot of problems on the retrieval end and then there's a lot of problems on how LLMs look at these data points as well."

    "The first 30 to 50% of gains are relatively quick. The rest 50% takes forever."

    "You do not want to give the same answer about company's history to the co-founding CEO and the intern who has just joined."

    "Embedding similarity is the signal on which you want to build your entire search is just not quite complete."

    Key insights:

    Naive RAG often fails due to limitations of embeddings and LLMs' sensitivity to input ordering.Query profiling and expansion: Use clustering and tools like latent Scope to identify problematic query typesExpand queries offline and use parallel searches for better resultsMetadata extraction: Extract temporal, entity, and other relevant information from queriesUse LLMs for extraction, with checks against libraries like Stanford NLPUser personalization: Include user role, access privileges, and conversation historyAdapt responses based on user expertise and readability scoresEvaluation and improvement: Create synthetic datasets and use real user feedbackEmploy tools like DSPY for prompt engineeringAdvanced techniques: Query routing based on type and urgencyUse smaller models (1-3B parameters) for easier iteration and error spottingImplement error handling and cross-validation for extracted metadata

    Nirant Kasliwal:

    X (Twitter)LinkedInSearch in the LLM Era for AI Engineers (course)

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    query understanding, AI-powered search, Lambda Mart, e-commerce ranking, networking, experts, recommendation, search

  • In this episode of How AI is Built, Nicolay Gerold interviews Doug Turnbull, a search engineer at Reddit and author on “Relevant Search”. They discuss how methods and technologies, including large language models (LLMs) and semantic search, contribute to relevant search results.

    Key Highlights:

    Defining relevance is challenging and depends heavily on user intent and contextCombining multiple search techniques (keyword, semantic, etc.) in tiers can improve resultsLLMs are emerging as a powerful tool for augmenting traditional search approachesOperational concerns often drive architectural decisions in large-scale search systemsUnderappreciated techniques like LambdaMART may see a resurgence

    Key Quotes:

    "There's not like a perfect measure or definition of what a relevant search result is for a given application. There are a lot of really good proxies, and a lot of really good like things, but you can't just like blindly follow the one objective, if you want to build a good search product." - Doug Turnbull

    "I think 10 years ago, what people would do is they would just put everything in Solr, Elasticsearch or whatever, and they would make the query to Elasticsearch pretty complicated to rank what they wanted... What I see people doing more and more these days is that they'll use each retrieval source as like an independent piece of infrastructure." - Doug Turnbull on the evolution of search architecture

    "Honestly, I feel like that's a very practical and underappreciated thing. People talk about RAG and I talk, I call this GAR - generative AI augmented retrieval, so you're making search smarter with generative AI." - Doug Turnbull on using LLMs to enhance search

    "LambdaMART and gradient boosted decision trees are really powerful, especially for when you're expressing your re-ranking as some kind of structured learning problem... I feel like we'll see that and like you're seeing papers now where people are like finding new ways of making BM25 better." - Doug Turnbull on underappreciated techniques

    Doug Turnbull

    LinkedInX (Twitter)Web

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    Chapters

    00:00 Introduction and Guest Introduction 00:52 Understanding Relevant Search Results 01:18 Search Behavior on Social Media 02:14 Challenges in Defining Relevance 05:12 Query Understanding and Ranking Signals 10:57 Evolution of Search Technologies 15:15 Combining Search Techniques 21:49 Leveraging LLMs and Embeddings 25:49 Operational Considerations in Search Systems 39:09 Concluding Thoughts and Future Directions

  • In this episode, we talk data-driven search optimizations with Charlie Hull.

    Charlie is a search expert from Open Source Connections. He has built Flax, one of the leading open source search companies in the UK, has written “Searching the Enterprise”, and is one of the main voices on data-driven search.

    We discuss strategies to improve search systems quantitatively and much more.

    Key Points:

    Relevance in search is subjective and context-dependent, making it challenging to measure consistently.Common mistakes in assessing search systems include overemphasizing processing speed and relying solely on user complaints.Three main methods to measure search system performance: Human evaluationUser interaction data analysisAI-assisted judgment (with caution)Importance of balancing business objectives with user needs when optimizing search results.Technical components for assessing search systems: Query logs analysisSource data quality examinationTest queries and cases setup

    Resources mentioned:

    Quepid: Open-source tool for search quality testingHaystack conference: Upcoming event in Berlin (September 30 - October 1)Relevance Slack communityOpenSource Connections

    Charlie Hull:

    LinkedInX (Twitter)

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    search results, search systems, assessing, evaluation, improvement, data quality, user behavior, proactive, test dataset, search engine optimization, SEO, search quality, metadata, query classification, user intent, search results, metrics, business objectives, user objectives, experimentation, continuous improvement, data modeling, embeddings, machine learning, information retrieval

    00:00 Introduction
    01:35 Challenges in Measuring Search Relevance
    02:19 Common Mistakes in Search System Assessment
    03:22 Methods to Measure Search System Performance
    04:28 Human Evaluation in Search Systems
    05:18 Leveraging User Interaction Data
    06:04 Implementing AI for Search Evaluation
    09:14 Technical Components for Assessing Search Systems
    12:07 Improving Search Quality Through Data Analysis
    17:16 Proactive Search System Monitoring
    24:26 Balancing Business and User Objectives in Search
    25:08 Search Metrics and KPIs: A Contract Between Teams
    26:56 The Role of Recency and Popularity in Search Algorithms
    28:56 Experimentation: The Key to Optimizing Search
    30:57 Offline Search Labs and A/B Testing
    34:05 Simple Levers to Improve Search
    37:38 Data Modeling and Its Importance in Search
    43:29 Combining Keyword and Vector Search
    44:24 Bridging the Gap Between Machine Learning and Information Retrieval
    47:13 Closing Remarks and Contact Information

  • Welcome back to How AI Is Built.

    We have got a very special episode to kick off season two.

    Daniel Tunkelang is a search consultant currently working with Algolia. He is a leader in the field of information retrieval, recommender systems, and AI-powered search. He worked for Canva, Algolia, Cisco, Gartner, Handshake, to pick a few.

    His core focus is query understanding.

    **Query understanding is about focusing less on the results and more on the query.** The query of the user is the first-class citizen. It is about figuring out what the user wants and than finding, scoring, and ranking results based on it. So most of the work happens before you hit the database.

    **Key Takeaways:**

    - The "bag of documents" model for queries and "bag of queries" model for documents are useful approaches for representing queries and documents in search systems.
    - Query specificity is an important factor in query understanding. It can be measured using cosine similarity between query vectors and document vectors.
    - Query classification into broad categories (e.g., product taxonomy) is a high-leverage technique for improving search relevance and can act as a guardrail for query expansion and relaxation.
    - Large Language Models (LLMs) can be useful for search, but simpler techniques like query similarity using embeddings can often solve many problems without the complexity and cost of full LLM implementations.
    - Offline processing to enhance document representations (e.g., filling in missing metadata, inferring categories) can significantly improve search quality.

    **Daniel Tunkelang**

    - [LinkedIn](https://www.linkedin.com/in/dtunkelang/)
    - [Medium](https://queryunderstanding.com/)

    **Nicolay Gerold:**

    - [⁠LinkedIn⁠](https://www.linkedin.com/in/nicolay-gerold/)
    - [⁠X (Twitter)](https://twitter.com/nicolaygerold)
    - [Substack](https://nicolaygerold.substack.com/)

    Query understanding, search relevance, bag of documents, bag of queries, query specificity, query classification, named entity recognition, pre-retrieval processing, caching, large language models (LLMs), embeddings, offline processing, metadata enhancement, FastText, MiniLM, sentence transformers, visualization, precision, recall

    [00:00:00] 1. Introduction to Query Understanding

    Definition and importance in search systemsEvolution of query understanding techniques

    [00:05:30] 2. Query Representation Models

    The "bag of documents" model for queriesThe "bag of queries" model for documentsAdvantages of holistic query representation

    [00:12:00] 3. Query Specificity and Classification

    Measuring query specificity using cosine similarityImportance of query classification in search relevanceImplementing and leveraging query classifiers

    [00:19:30] 4. Named Entity Recognition in Query Understanding

    Role of NER in query processingChallenges with unique or tail entities

    [00:24:00] 5. Pre-Retrieval Query Processing

    Importance of early-stage query analysisBalancing computational resources and impact

    [00:28:30] 6. Performance Optimization Techniques

    Caching strategies for query understandingOffline processing for document enhancement

    [00:33:00] 7. Advanced Techniques: Embeddings and Language Models

    Using embeddings for query similarityRole of Large Language Models (LLMs) in searchWhen to use simpler techniques vs. complex models

    [00:39:00] 8. Practical Implementation Strategies

    Starting points for engineers new to query understandingTools and libraries for query understanding (FastText, MiniLM, etc.)Balancing precision and recall in search systems

    [00:44:00] 9. Visualization and Analysis of Query Spaces

    Discussion on t-SNE, UMAP, and other visualization techniquesLimitations and alternatives to embedding visualizations

    [00:47:00] 10. Future Directions and Closing Thoughts - Emerging trends in query understanding - Key takeaways for search system engineers

    [00:53:00] End of Episode

  • Today we are launching the season 2 of How AI Is Built.

    The last few weeks, we spoke to a lot of regular listeners and past guests and collected feedback. Analyzed our episode data. And we will be applying the learnings to season 2.

    This season will be all about search.

    We are trying to make it better, more actionable, and more in-depth. The goal is that at the end of this season, you have a full-fleshed course on search in podcast form, which mini-courses on specific elements like RAG.

    We will be talking to experts from information retrieval, information architecture, recommendation systems, and RAG; from academia and industry. Fields that do not really talk to each other.

    We will try to unify and transfer the knowledge and give you a full tour of search, so you can build your next search application or feature with confidence.

    We will be talking to Charlie Hull on how to systematically improve search systems, with Nils Reimers on the fundamental flaws of embeddings and how to fix them, with Daniel Tunkelang on how to actually understand the queries of the user, and many more.


    We will try to bridge the gaps. How to use decades of research and practice in iteratively improving traditional search and apply it to RAG. How to take new methods from recommendation systems and vector databases and bring it into traditional search systems. How to use all of the different methods as search signals and combine them to deliver the results your user actually wants.

    We will be using two types of episodes:

    Traditional deep dives, like we have done them so far. Each one will dive into one specific topic within search interviewing an expert on that topic.Supplementary episodes, which answer one additional question; often either complementary or precursory knowledge for the episode, which we did not get to in the deep dive.

    We will be starting with episodes next week, looking at the first, last, and overarching action in search: understanding user intent and understanding the queries with Daniel Tunkelang.

    I am really excited to kick this off.

    I would love to hear from you:

    What would you love to learn in this season?What guest should I have on?What topics should I make a deep dive on (try to be specific)?

    Yeah, let me know in the comments or just slide into my DMs on Twitter or LinkedIn.

    I am looking forward to hearing from you guys.

    I want to try to be more interactive. So anytime you encounter anything unclear or any question pops up in one of the episode, give me a shout and I will try to answer it to you and to everyone.

    Enough of me rambling. Let’s kick this off. I will see you next Thursday, when we start with query understanding.

    Shoot me a message and stay up to date:

    ⁠LinkedIn⁠⁠X (Twitter)
  • In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured data using AI, discussing challenging projects, the impact of ChatGPT, and the future of generative AI. From weather prediction to legal tech, Jonathan provides valuable insights into the practical applications of AI across various industries.

    Key Takeaways

    Generative AI projects often require less data cleaning due to the models' tolerance for "dirty" data, allowing for faster implementation in some cases.The success of AI projects post-delivery is ensured through monitoring, but automatic retraining of generative AI applications is not yet common due to evaluation challenges.Industries ripe for AI disruption include text-heavy fields like legal, education, software engineering, and marketing, as well as biotech and entertainment.The adoption of AI is expected to occur in waves, with 2024 likely focusing on internal use cases and 2025 potentially seeing more customer-facing applications as models improve.Synthetic data generation, using models like GPT-4, can be a valuable approach for training AI systems when real data is scarce or sensitive.Evaluation frameworks like RAGAS and custom metrics are essential for assessing the quality of synthetic data and AI model outputs.Jonathan’s ideal tech stack for generative AI projects includes tools like Instructor, Guardrails, Semantic Routing, DSPY, LangChain, and LlamaIndex, with a growing emphasis on evaluation stacks.

    Key Quotes

    "I think we're going to see another wave in 2024 and another one in 2025. And people are familiarized. That's kind of the wave of 2023. 2024 is probably still going to be a lot of internal use cases because it's a low risk environment and there was a lot of opportunity to be had."

    "To really get to production reliably, we have to have these tools evolve further and get more standardized so people can still use the old ways of doing production with the new technology."

    Jonathan Yarkoni

    LinkedInYouTubeX (Twitter)Reach Latent

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)

    Chapters

    00:00 Introduction: Extracting Value from Unstructured Data
    03:16 Flexible Tailoring Solutions to Client Needs
    05:39 Monitoring and Retraining Models in the Evolving AI Landscape
    09:15 Generative AI: Disrupting Industries and Unlocking New Possibilities
    17:47 Balancing Immediate Results and Cutting-Edge Solutions in AI Development
    28:29 Dream Tech Stack for Generative AI

    unstructured data, textual data, automation, weather prediction, data cleaning, chat GPT, AI disruption, legal, education, software engineering, marketing, biotech, immediate results, cutting-edge solutions, tech stack

  • This episode of "How AI Is Built" is all about data processing for AI. Abhishek Choudhary and Nicolay discuss Spark and alternatives to process data so it is AI-ready.

    Spark is a distributed system that allows for fast data processing by utilizing memory. It uses a dataframe representation "RDD" to simplify data processing.

    When should you use Spark to process your data for your AI Systems?

    → Use Spark when:

    Your data exceeds terabytes in volumeYou expect unpredictable data growthYour pipeline involves multiple complex operationsYou already have a Spark cluster (e.g., Databricks)Your team has strong Spark expertiseYou need distributed computing for performanceBudget allows for Spark infrastructure costs

    → Consider alternatives when:

    Dealing with datasets under 1TBIn early stages of AI developmentBudget constraints limit infrastructure spendingSimpler tools like Pandas or DuckDB suffice

    Spark isn't always necessary. Evaluate your specific needs and resources before committing to a Spark-based solution for AI data processing.

    In today’s episode of How AI Is Built, Abhishek and I discuss data processing:

    When to use Spark vs. alternatives for data processingKey components of Spark: RDDs, DataFrames, and SQLIntegrating AI into data pipelinesChallenges with LLM latency and consistencyData storage strategies for AI workloadsOrchestration tools for data pipelinesTips for making LLMs more reliable in production

    Abhishek Choudhary:

    LinkedInGitHubX (Twitter)

    Nicolay Gerold:

    ⁠LinkedIn⁠⁠X (Twitter)
  • In this episode, Nicolay talks with Rahul Parundekar, founder of AI Hero, about the current state and future of AI agents. Drawing from over a decade of experience working on agent technology at companies like Toyota, Rahul emphasizes the importance of focusing on realistic, bounded use cases rather than chasing full autonomy.

    They dive into the key challenges, like effectively capturing expert workflows and decision processes, delivering seamless user experiences that integrate into existing routines, and managing costs through techniques like guardrails and optimized model choices. The conversation also explores potential new paradigms for agent interactions beyond just chat.

    Key Takeaways:

    Agents need to focus on realistic use cases rather than trying to be fully autonomous. Enterprises are unlikely to allow agents full autonomy anytime soon. Capturing the logic and workflows in the user's head is the key challenge. Shadowing experts and having them demonstrate workflows is more effective than asking them to document processes. User experience is crucial - agents must integrate seamlessly into existing user workflows without major disruptions. Interfaces beyond just chat may be needed. Cost control is important - techniques like guardrails, context windowing, model choice optimization, and dev vs production modes can help manage costs. New paradigms beyond just chat could be powerful - e.g. workflow specification, state/declarative definition of desired end-state. Prompt engineering and dynamic prompt improvement based on feedback remain an open challenge.

    Key Quotes:

    "Empowering users to create their own workflows is essential for effective agent usage." "Capturing workflows accurately is a significant challenge in agent development." "Preferences, right? So a lot of the work becomes like, hey, can you do preference learning for this user so that the next time the user doesn't have to enter the same information again, things like that."

    Rahul Parundekar:

    AI Hero AI Hero Docs

    Nicolay Gerold:

    ⁠LinkedIn⁠ ⁠X (Twitter)

    00:00 Exploring the Potential of Autonomous Agents

    02:23 Challenges of Accuracy and Repeatability in Agents

    08:31 Capturing User Workflows and Improving Prompts

    13:37 Tech Stack for Implementing Agents in the Enterprise

    agent development, determinism, user experience, agent paradigms, private use, human-agent interaction, user workflows, agent deployment, human-in-the-loop, LLMs, declarative ways, scalability, AI Hero

  • In this conversation, Nicolay and Richmond Alake discuss various topics related to building AI agents and using MongoDB in the AI space. They cover the use of agents and multi-agents, the challenges of controlling agent behavior, and the importance of prompt compression.

    When you are building agents. Build them iteratively. Start with simple LLM calls before moving to multi-agent systems.

    Main Takeaways:

    Prompt Compression: Using techniques like prompt compression can significantly reduce the cost of running LLM-based applications by reducing the number of tokens sent to the model. This becomes crucial when scaling to production. Memory Management: Effective memory management is key for building reliable agents. Consider different memory components like long-term memory (knowledge base), short-term memory (conversation history), semantic cache, and operational data (system logs). Store each in separate collections for easy access and reference. Performance Optimization: Optimize performance across multiple dimensions - output quality (by tuning context and knowledge base), latency (using semantic caching), and scalability (using auto-scaling databases like MongoDB). Prompting Techniques: Leverage prompting techniques like ReAct (observe, plan, act) and structured prompts (JSON, pseudo-code) to improve agent predictability and output quality. Experimentation: Continuous experimentation is crucial in this rapidly evolving field. Try different frameworks (LangChain, Crew AI, Haystack), models (Claude, Anthropic, open-source), and techniques to find the best fit for your use case.

    Richmond Alake:

    LinkedIn Medium Find Richmond on MongoDB X (Twitter) YouTube GenAI Showcase MongoDB MongoDB AI Stack

    Nicolay Gerold:

    ⁠LinkedIn⁠ ⁠X (Twitter)

    00:00 Reducing the Scope of AI Agents

    01:55 Seamless Data Ingestion

    03:20 Challenges and Considerations in Implementing Multi-Agents

    06:05 Memory Modeling for Robust Agents with MongoDB

    15:05 Performance Optimization in AI Agents

    18:19 RAG Setup

    AI agents, multi-agents, prompt compression, MongoDB, data storage, data ingestion, performance optimization, tooling, generative AI

  • In this episode, Kirk Marple, CEO and founder of Graphlit, shares his expertise on building efficient data integrations.

    Kirk breaks down his approach using relatable concepts:

    The "Two-Sided Funnel": This model streamlines data flow by converting various data sources into a standard format before distributing it. Universal Data Streams: Kirk explains how he transforms diverse data into a single, manageable stream of information. Parallel Processing: Learn about the "competing consumer model" that allows for faster data handling. Building Blocks for Success: Discover the importance of well-defined interfaces and actor models in creating robust data systems. Tech Talk: Kirk discusses data normalization techniques and the potential shift towards a more streamlined "Kappa architecture." Reusable Patterns: Find out how Kirk's methods can speed up the integration of new data sources.

    Kirk Marple:

    LinkedIn X (Twitter) Graphlit Graphlit Docs

    Nicolay Gerold:

    ⁠LinkedIn⁠ ⁠X (Twitter)

    Chapters

    00:00 Building Integrations into Different Tools

    00:44 The Two-Sided Funnel Model for Data Flow

    04:07 Using Well-Defined Interfaces for Faster Integration

    04:36 Managing Feeds and State with Actor Models

    06:05 The Importance of Data Normalization

    10:54 Tech Stack for Data Flow

    11:52 Progression towards a Kappa Architecture

    13:45 Reusability of Patterns for Faster Integration

    data integration, data sources, data flow, two-sided funnel model, canonical format, stream of ingestible objects, competing consumer model, well-defined interfaces, actor model, data normalization, tech stack, Kappa architecture, reusability of patterns

  • In our latest episode, we sit down with Derek Tu, Founder and CEO of Carbon, a cutting-edge ETL tool designed specifically for large language models (LLMs).

    Carbon is streamlining AI development by providing a platform for integrating unstructured data from various sources, enabling businesses to build innovative AI applications more efficiently while addressing data privacy and ethical concerns.

    "I think people are trying to optimize around the chunking strategy... But for me, that seems a bit maybe not focusing on the right area of optimization. These embedding models themselves have gone just like, so much more advanced over the past five to 10 years that regardless of what representation you're passing in, they do a pretty good job of being able to understand that information semantically and returning the relevant chunks." - Derek Tu on the importance of embedding models over chunking strategies "If you are cost conscious and if you're worried about performance, I would definitely look at quantizing your embeddings. I think we've probably been able to, I don't have like the exact numbers here, but I think we might be saving at least half, right, in storage costs by quantizing everything." - Derek Tu on optimizing costs and performance with vector databases

    Derek Tu:

    LinkedIn Carbon

    Nicolay Gerold:

    ⁠LinkedIn⁠ ⁠X (Twitter)

    Key Takeaways:

    Understand your data sources: Before building your ETL pipeline, thoroughly assess the various data sources you'll be working with, such as Slack, Email, Google Docs, and more. Consider the unique characteristics of each source, including data format, structure, and metadata. Normalize and preprocess data: Develop strategies to normalize and preprocess the unstructured data from different sources. This may involve parsing, cleaning, and transforming the data into a standardized format that can be easily consumed by your AI models. Experiment with chunking strategies: While there's no one-size-fits-all approach to chunking, it's essential to experiment with different strategies to find what works best for your specific use case. Consider factors like data format, structure, and the desired granularity of the chunks. Leverage metadata and tagging: Metadata and tagging can play a crucial role in organizing and retrieving relevant data for your AI models. Implement mechanisms to capture and store important metadata, such as document types, topics, and timestamps, and consider using AI-powered tagging to automatically categorize your data. Choose the right embedding model: Embedding models have advanced significantly in recent years, so focus on selecting the right model for your needs rather than over-optimizing chunking strategies. Consider factors like model performance, dimensionality, and compatibility with your data types. Optimize vector database usage: When working with vector databases, consider techniques like quantization to reduce storage costs and improve performance. Experiment with different configurations and settings to find the optimal balance for your specific use case.

    00:00 Introduction and Optimizing Embedding Models

    03:00 The Evolution of Carbon and Focus on Unstructured Data

    06:19 Customer Progression and Target Group

    09:43 Interesting Use Cases and Handling Different Data Representations

    13:30 Chunking Strategies and Normalization

    20:14 Approach to Chunking and Choosing a Vector Database

    23:06 Tech Stack and Recommended Tools

    28:19 Future of Carbon: Multimodal Models and Building a Platform

    Carbon, LLMs, RAG, chunking, data processing, global customer base, GDPR compliance, AI founders, AI agents, enterprises

  • In this episode, Nicolay sits down with Hugo Lu, founder and CEO of Orchestra, a modern data orchestration platform. As data pipelines and analytics workflows become increasingly complex, spanning multiple teams, tools and cloud services, the need for unified orchestration and visibility has never been greater.

    Orchestra is a serverless data orchestration tool that aims to provide a unified control plane for managing data pipelines, infrastructure, and analytics across an organization's modern data stack.

    The core architecture involves users building pipelines as code which then run on Orchestra's serverless infrastructure. It can orchestrate tasks like data ingestion, transformation, AI calls, as well as monitoring and getting analytics on data products. All with end-to-end visibility, data lineage and governance even when organizations have a scattered, modular data architecture across teams and tools.

    Key Quotes:

    Find the right level of abstraction when building data orchestration tasks/workflows."I think the right level of abstraction is always good. I think like Prefect do this really well, right? Their big sell was, just put a decorator on a function and it becomes a task. That is a great idea. You know, just make tasks modular and have them do all the boilerplate stuff like error logging, monitoring of data, all of that stuff.” Modularize data pipeline components:"It's just around understanding what that dev workflow should look like. I think it should be a bit more modular."Having a modular architecture where different components like data ingestion, transformation, model training are decoupled allows better flexibility and scalability. Adopt a streaming/event-driven architecture for low-latency AI use cases:"If you've got an event-driven architecture, then, you know, that's not what you use an orchestration tool for...if you're having a conversation with a chatbot, like, you know, you're sending messages, you're sending events, you're getting a response back. That I would argue should be dealt with by microservices."

    Hugo Lu:

    LinkedIn Newsletter Orchestra Orchestra Docs

    Nicolay Gerold:

    ⁠LinkedIn⁠ ⁠X (Twitter)

    00:00 Introduction to Orchestra and its Focus on Data Products

    08:03 Unified Control Plane for Data Stack and End-to-End Control

    14:42 Use Cases and Unique Applications of Orchestra

    19:31 Retaining Existing Dev Workflows and Best Practices in Orchestra

    22:23 Event-Driven Architectures and Monitoring in Orchestra

    23:49 Putting Data Products First and Monitoring Health and Usage

    25:40 The Future of Data Orchestration: Stream-Based and Cost-Effective

    data orchestration, Orchestra, serverless architecture, versatility, use cases, maturity levels, challenges, AI workloads