Kafka: LinkedIn's Distributed Messaging System

Episodes

What Are Neural Networks and How Do They Work?
6 Jan· 10-Minute System Design
In this 10-minute episode, we’ll explore neural networks, the core component of many modern AI models, including LLMs like ChatGPT, Gemini, and Claude. Loosely modeled on how the human brain works, neural networks are often treated as a “black box”; we’ll demystify them and show how they actually work. It’s a must-listen for new learners curious about AI.
How Meta Trains AI Models at Scale
24 oct 2024· 10-Minute System Design
In this episode, we'll take a look at Meta’s ambitious approach to scaling large language models. We'll explore the shift from handling many smaller models for recommendation engines to building colossal generative AI models, and the immense challenges that come with it. From hardware and software optimizations to managing power and dealing with inevitable hardware failures, we'll break down the critical pieces that make Meta's infrastructure tick. What does it take to run systems this large without breaking? Tune in to learn how Meta did it.
How Netflix Streams High-Quality Video
23 oct 2024· 10-Minute System Design
In this episode, let's explore how Netflix revamped their video processing pipeline, moving from a monolithic system to a microservices architecture. What drove such a major shift? You'll hear how their original platform, Reloaded, couldn’t keep up with Netflix’s rapid pace of innovation, and why Cosmos, their new system, is now the backbone of everything from streaming to studio operations. But what challenges did they face along the way? And is Cosmos truly the future-proof solution it promises to be? Tune in and find out.
How Apple Stores Billions of Data in iCloud
22 oct 2024· 10-Minute System Design
In this episode, we'll explore the intricate system and architecture design behind Apple's iCloud. We'll break down how Apple seamlessly handles billions of users by combining Cassandra and FoundationDB to power iCloud's backbone. What prompted Apple to shift from Cassandra to FoundationDB, and how does this choice impact scalability and performance? Get a closer look at the architecture that makes iCloud tick, and discover how it enables such a smooth user experience. The surprising reason behind Apple’s tech pivot might just change the way you think about designing cloud storage systems.
How Uber Shows Nearby Drivers Quickly and Reliably
21 oct 2024· 10-Minute System Design
In this episode, we explore the system behind Uber's driver-matching functionality, capable of handling an incredible one million requests per second. We break down the key technologies that make it work, from H3, the hexagonal grid system for location indexing, to Ringpop, which scales services across servers. You'll hear about how GPS data is transformed into road segments, and how databases like Cassandra and Redis power this high-demand platform. Whether you're curious about large-scale systems or just fascinated by Uber's tech, this episode simplifies complex engineering into something anyone can understand.
How Instagram Scaled to 2.5 Billion Users
14 oct 2024· 10-Minute System Design
In this episode, we'll learn how Instagram scaled to 2.5 billion users. We'll discuss the major challenges Instagram faced, from resource constraints to data consistency and performance, and unpack the innovative strategies the team used to tackle them. From replacing Python with more performant languages to leveraging Cassandra for distributed data storage, we'll learn how Instagram managed to keep things running smoothly at such massive scale. Curious how they did it? Tune in to hear how a mix of clever optimizations and solid technology choices helped them manage internet-scale traffic.
How Facebook Scaled Memcached
13 oct 2024· 10-Minute System Design
In this episode, we explore how Facebook engineers scaled Memcached, the open-source caching system, to handle billions of requests and trillions of items. We’ll break down the challenges they faced and the smart solutions they developed — from reducing latency to optimizing memory usage. Join us as we uncover how they transitioned from a single cluster to a distributed system spread across the globe, tackling data replication, load balancing, and more. If you’re curious about the inner workings of high-performance caching at massive scale, this one’s for you.
Spanner: Google's Globally-Distributed Database
12 oct 2024· 10-Minute System Design
In this episode, we explore another important piece of technology from Google: Spanner, a globally distributed database that reshapes how massive datasets are managed. We’ll talk about its unique architecture, including the TrueTime API, which exposes bounded clock uncertainty so that Spanner can keep data consistent across data centers. We’ll also cover Spanner’s concurrency control, two-phase commit, and lock-free read-only transactions. Plus, discover how F1, the database behind Google’s ad platform, leverages Spanner to handle millions of transactions with impressive speed and reliability.
Kafka: LinkedIn's Distributed Messaging System
9 oct 2024· 10-Minute System Design
This episode focuses on Kafka, the distributed messaging system born at LinkedIn. Learn how Kafka was designed to tackle the massive streams of log data driving personalized recommendations, search algorithms, and real-time security. We'll explore how it outperforms traditional systems like ActiveMQ and RabbitMQ with its streamlined architecture, decentralized coordination, and focus on efficiency. Tune in to explore Kafka's unique design and how it’s becoming essential for modern data processing.
Redis Distributed Lock
9 oct 2024· 10-Minute System Design
Ever wondered how multiple processes can safely share resources without stepping on each other's toes? In this episode, we'll talk about Redis's distributed lock and discover how it ensures mutual exclusion for shared resources across a network of Redis servers, allowing only one process at a time to gain access. We’ll delve into its safety and liveness properties that guarantee reliable lock management, even amidst failures. Join us as we unpack potential challenges like network partitions and discuss solutions that improve the Redlock algorithm's resilience.
Hadoop: Yahoo's Distributed File System
9 oct 2024· 10-Minute System Design
In this episode, we take a closer look at the Hadoop Distributed File System (HDFS), a key part of the Hadoop framework that helps store and manage huge amounts of data. We’ll explore how HDFS spreads data across many affordable servers, making it both scalable and cost-effective. You’ll learn about its main components, like the NameNode and DataNodes, and how they work together. We’ll also discuss features that keep your data safe and ensure it moves efficiently. Join us as we touch on the challenges of managing large data clusters and what the future might hold for HDFS.
Chubby: Google's Distributed Lock Service
9 oct 2024· 10-Minute System Design
In this episode, our hosts delve into the legendary research paper detailing the creation and implementation of Chubby, Google's innovative distributed lock service. Designed for large-scale, loosely-coupled systems, Chubby offers a reliable mechanism for synchronization, such as electing primary servers among peers. The paper explores the critical design choices prioritizing availability over raw performance, revealing the system's architecture, implementation intricacies, and essential components like distributed consensus protocols and session management. Join us to uncover unexpected uses of Chubby, including its role as a name service, and the challenges of scaling and managing client behavior.
Bigtable: Google's Distributed Storage System
9 oct 2024· 10-Minute System Design
Imagine a storage system that can handle petabytes of structured data across thousands of ordinary servers. This is Bigtable, Google's distributed storage system for managing structured data at scale. Join us as we uncover how it achieves that scalability and flexibility, its real-world applications, from Google Analytics to Personalized Search, and the vital lessons learned in designing robust, large-scale systems.
Cassandra: A Decentralized Structured Storage System from Facebook
9 oct 2024· 10-Minute System Design
In this episode, our hosts delve into Cassandra, the distributed storage system developed at Facebook to tackle the immense challenges of managing structured data. Designed for high availability and scalability, Cassandra emerged from the need to support billions of daily writes for the Inbox Search feature. Join us as we explore this game-changing piece of tech that influences modern distributed systems today.
MapReduce: How Google Simplifies Large-Scale Data Processing
9 oct 2024· 10-Minute System Design
Join us in this episode as we dive into MapReduce. We’ll explore how it revolutionizes the way we process vast datasets on large clusters. With a focus on simplicity, the MapReduce framework abstracts complex tasks like data partitioning and fault tolerance, allowing users to easily define two essential functions: “Map” and “Reduce.”
We’ll discuss real-world applications that showcase its power—from distributed grep to web link analysis. If you’re curious about how to harness the potential of distributed systems without needing to be a parallel programming expert, this episode is for you!
Dynamo: Amazon’s Highly Available Key-Value Store
9 oct 2024· 10-Minute System Design
In this episode, our hosts take a closer look at a groundbreaking research paper on Dynamo, Amazon’s innovative distributed data storage system. With a focus on availability over consistency, Dynamo employs cutting-edge techniques like consistent hashing and gossip-based failure detection to deliver high performance. Join us as we unpack the paper’s insights into its design and implementation, its real-world applications within Amazon, and the fascinating trade-offs between performance and durability.
The Google File System
9 oct 2024· 10-Minute System Design
In this 10-minute episode, we explore the Google File System (GFS), a scalable, fault-tolerant distributed file system designed for Google’s vast data needs. Built on commodity hardware, GFS ensures high performance for many clients. We’ll cover key design principles like handling frequent component failures, large file operations, and atomic appends. We’ll also dive into its architecture—featuring a master server for metadata management and chunkservers for storage—along with data handling, fault tolerance, and real-world performance benchmarks.
