#073 Data Engineering At LinkedIn Case Study – Plumbers of Data Science – Podcast

Episodes

#26 Data Modeling is F***ing Easy!
23 Sep · Plumbers of Data Science
In this episode of the Plumbers of Data Science podcast, I’m sharing my thoughts on why data modeling isn’t as complicated as people make it out to be. You hear about courses and tutorials that stretch for hours—but is it really that hard?
I’ll break down the two main things you need to focus on when modeling data and explain why, once you’ve got those down, the rest falls into place.
#25 His Career Started With a Bootcamp & Now He Helps Others Succeed - Hero Talk w/ Mezue Obi-Eyis
20 Sep · Plumbers of Data Science
In this Hero Talk episode, I talk with Mezue, a seasoned Data Engineer with expertise in Azure Databricks Data Engineering. We cover his journey from Electrical Engineering to Data Engineering and discuss the key skills, like Python, SQL, and Spark, that are essential in the field.
Mezue also shares his experience running an Azure Databricks bootcamp and offers advice on how to break into Data Engineering, especially in Cloud environments. We also touch on the challenges of finding junior roles and how to stand out by working on practical projects.
#24 Dirty Data & Data Cleaning - Hero Talk with "The Classification Guru" Susan Walsh
16 Sep · Plumbers of Data Science
In this Hero Talk episode, I chat with Susan Walsh, the “Classification Guru,” known for her expertise in cleaning and classifying messy data.
We dive into her unexpected journey into the data world, starting with a spend analytics job, and how that led to her founding her own business focused on dirty data. Susan shares the unique challenges businesses face with poor data quality, explaining why 99.9% of data problems are actually people problems.
We also explore practical ways to deal with these issues, such as finding those "crappy" data cleaning jobs to gain experience, and the importance of consistent data maintenance to prevent future headaches. From addressing dirty CRM systems to battling fraud, Susan’s stories highlight how critical clean data is for business success.
#23 A Deep Dive Into APIs, IoT, and Data Storage - Hero Talk with Paolo Lulli
9 Sep · Plumbers of Data Science
In this Hero Talk episode, I sit down with Paolo Lulli, an experienced Data Engineer, to explore some of the core challenges and decisions in API development and data management. We dive deep into the debate between serverless infrastructure versus traditional servers, discussing the pros and cons of both approaches, particularly in the context of scalability, cost, and maintenance.
Paolo also shares his hands-on experience with time series databases, explaining their advantages in handling massive amounts of data from IoT devices. We delve into vendor lock-in issues, highlighting how relying too heavily on cloud providers like AWS or Azure can impact long-term flexibility.
#22 Why testing data pipelines can be so challenging - and how to tackle it
6 Sep · Plumbers of Data Science
In this episode of the Plumbers of Data Science podcast, I’m diving into why testing can be so challenging for data engineers. The inspiration for this topic actually came from one of my recent Coaching sessions, where the question of test-driven development (TDD) came up during a Q&A. It stuck with me, so I thought it would be a great topic to dive deeper into.
I’ll explain the key benefits of TDD, like improved code quality and easier refactoring, and why, despite its advantages, it’s not always widely adopted—especially in fast-paced environments where time constraints dominate. We’ll also talk about the specific challenges data engineers face with TDD, such as handling large, unpredictable data, integrating with external systems, and adapting to ever-changing data.
#21 Is This the Synthetic Data Revolution?! Hero Talk with Mario Scriminaci from Mostly AI
2 Sep · Plumbers of Data Science
In this Hero Talk episode, we dive deep into the fascinating world of synthetic data, a critical tool for development, testing, and training Machine Learning models. Joining me is Mario Scriminaci, Chief Product Officer at Mostly AI, who shares his expertise on how synthetic data can revolutionize the way we handle sensitive information, particularly in the context of privacy regulations like GDPR and CCPA.
We discuss the real-world applications of synthetic data, how it differs from traditional mock data, and its potential to drive innovation in AI and ML development. Mario also introduces Mostly AI's cutting-edge tools, highlighting how they make it easier than ever to generate realistic, privacy-safe datasets.
#20 Bootcamps vs Coaching
30 Aug · Plumbers of Data Science
In this episode of the Plumbers of Data Science podcast, I’m diving into the debate between bootcamps and coaching programs, especially for those looking to advance in Data Engineering.
I’ll break down the pros and cons of each approach, from the structured, intensive nature of bootcamps to the personalized, flexible support of coaching, and share insights to help you choose the right path for your career. I’ll also discuss the experiences of my current coaching students and what I’m focusing on to help them achieve their goals.
#19 Why your data and goals matter more than tools!
23 Aug · Plumbers of Data Science
In this episode of the Plumbers of Data Science podcast, I’m diving into what truly matters when building data platforms and pipelines.
As engineers, it’s easy to get caught up in the latest tools, but real success starts with understanding your data sources and defining clear goals. I’ll walk you through the key questions to ask, from data retention to processing speeds and user needs.
#18 Why Apache Spark Is Such An Essential Skill - Hero Talk with Philipp Brunenberg
19 Aug · Plumbers of Data Science
In this episode, we explore the essentials of learning and mastering Apache Spark. Joining me is Philipp, an experienced Spark developer and educator, who shares his expert roadmap for becoming proficient in Spark. We discuss why Spark is a crucial tool for data engineers, how to set it up effectively, and the best approaches to start your Spark journey.
Philipp also highlights the importance of understanding Spark's internals, deploying real-world applications, and optimizing performance. He walks us through his six-part roadmap, focusing on hands-on practice and building confidence through real-world projects. We also touch on key topics like the Scala vs. Python debate, Spark's role in machine learning, and how it stands against emerging tools like Beam.
#17 The Future of Data Observability - Hero Talk with Ryan Yackel
12 Aug · Plumbers of Data Science
In this Hero Talk episode, we explore the crucial topic of data observability, a field that has become essential for Data Engineers dealing with complex data pipelines. I am joined by my special guest Ryan Yackel from DataBand, who shares his insights and expertise on the subject.
Ryan delves into the concept of data observability and its significance for Data Engineers, addressing common challenges faced in monitoring and maintaining data pipelines. He explains how DataBand helps in monitoring and improving data reliability, ensuring that data flows smoothly from source to destination.
#16 Should You Move to Germany for a Data Engineering Career?
9 Aug · Plumbers of Data Science
In this episode of the Plumbers of Data Science podcast, I’m breaking down the real deal of working as a data engineer in Germany. Does it live up to the hype? Sure, we’ve got free education and solid health insurance, but what’s the actual cost of living here, and how much of your salary do you really take home after taxes?
I’ll walk you through the numbers—from what you can expect to earn, to the surprising deductions that quickly eat away at your paycheck. Plus, I’ll explain why companies in Germany struggle with high labor costs and how that impacts your wallet.
#15 Personal Branding in Data - Hero Talk with Kate Strachnyi
5 Aug · Plumbers of Data Science
In this Hero Talk episode, we delve into the fascinating world of personal branding in data with our special guest, Kate Strachnyi, founder of DataCated.
Join us as Kate shares her vast expertise in personal branding, drawing from her experience as an author and LinkedIn Learning instructor. We discuss the nuances of building a personal brand, both intentionally and organically, and explore practical strategies for leveraging social media to increase your visibility and credibility in the data space.
#14 The Secret Why Time Series Databases Are Awesome - Hero Talk with Jeff Tao
2 Aug · Plumbers of Data Science
In this Hero Talk episode, we explore the dynamic world of time series data and time series databases with a special guest, Jeff Tao, founder and CEO of TDengine.
Join us as Jeff shares his journey from designing smart devices to founding TDengine, a leading time series database. We dive deep into the benefits and unique features of time series databases, practical use cases, and how they handle massive amounts of data generated by IoT devices, smart meters, and more.
#13 From India to the U.S.: Becoming a Data Engineer at Toyota - Hero Talk with Ayan Tiwari
29 Jul · Plumbers of Data Science
In this Hero Talk episode, we dive into the inspiring journey of Ayan Tiwari, a Data Engineer at Toyota North America.
Join us as Ayan shares his remarkable transition from being an undergrad in civil engineering in India to working for a major company like Toyota in the U.S.
Ayan walks us through how he made this significant career switch, pursued his master’s degree in the U.S., and the fascinating projects he's currently working on.
#12 Data Tools & Platforms: Why you should always be skeptical
26 Jul · Plumbers of Data Science
In this episode of the Plumbers of Data Science podcast, we explore why you should be skeptical of data platforms and tools. Using a LEGO Grogu set from Star Wars as an analogy, I reveal the hidden issues behind flashy exteriors: chaotic scaffolding, empty spaces, and missing features.
I emphasize the importance of trying tools yourself, running benchmarks, and checking if they fit your use case. We also discuss the role of developer communities and frequent updates in improving these tools.
Have you encountered over-promised solutions? Share your experiences and thoughts in the comments ;)
#11 GenAI from a Data Engineer's perspective - Hero Talk with Vinoth Nageshwaran
22 Jul · Plumbers of Data Science
#10 Why Excel should be a go-to tool for data professionals
19 Jul · Plumbers of Data Science
#09 Real Talk on GenAI & Large Language Models - Hero Talk with Harpreet Sahota
15 Jul · Plumbers of Data Science
In this Hero Talk episode we dive into the exciting and evolving world of Generative AI and Large Language Models (LLMs) with a special guest, Harpreet Sahota.
Join us as Harpreet shares his extensive knowledge and experience as a seasoned data scientist. We explore the transformative potential of Generative AI, practical applications, and the challenges that come with integrating these advanced models into various industries.
#08 Are Job Guarantees a Scam?
12 Jul · Plumbers of Data Science
#07 Data Science Career AMA! - Hero Talk with Andrew Jones
5 Jul · Plumbers of Data Science
In this Hero Talk episode we dive into the world of Data Science careers with a special Ask Me Anything (AMA) session.
Join me as I welcome Andrew Jones, founder of the Data Science Infinity program. Andrew shares his journey from working at top tech companies such as PlayStation to founding his own academy. We discuss the current job market, the role of certifications, and how to build an effective resume and portfolio to stand out in a competitive field.