Episodes
-
Welcome back to the Airflow Podcast.
This week, we met up with Ben Wisegarver, a staff data scientist at Reddit who runs their data warehousing and data engineering functions.
Reddit users generate petabytes of data every day, all of which needs to be processed, stored, and analyzed by a wide range of backend services. Our conversation with Ben touches on everything from Airflow as a tool for career mobility across the data stack to scaling out a self-service data architecture across many teams.
For folks interested, our team at Astronomer is growing rapidly and we're on the hunt for new folks to join in a variety of different roles. If you're passionate about Airflow and interested in building the future of data engineering, please get in touch. You can check our current job postings at careers.astronomer.io, but we're constantly updating our listings to accommodate new hiring needs. Please feel free to email me directly at [email protected] if you're passionate about what we're doing and think you'd be a good addition to the team.
Mentioned Resources:
Careers: https://careers.astronomer.io
Guest Profile:
Ben Wisegarver: https://www.linkedin.com/in/ben-wisegarver-54566576 -
Welcome back to the Airflow Podcast.
This week, we met up with Albert Franzi and Carlos Escura from Typeform. Typeform is a tool that allows you to build beautiful interactive forms that you can use for a wide variety of use cases, including customer surveys, employee engagement, product feedback, and market research to name a few. In our conversation, we discussed Airflow as a tool for GDPR compliance, the concept of self-service data and how it allows your data operations team to function as a data platform team, and some of the more specialized infrastructure tooling that the Typeform team has built out to support their internal teams.
For folks interested, our team at Astronomer is growing rapidly and we're on the hunt for new folks to join in a variety of different roles. If you're passionate about Airflow and interested in building the future of data engineering, please get in touch. You can check our current job postings at careers.astronomer.io, but we're constantly updating our listings to accommodate new hiring needs. Please feel free to email me directly at [email protected] if you're passionate about what we're doing and think you'd be a good addition to the team.
Mentioned Resources:
Dag Factory: https://github.com/ajbosco/dag-factory
Astronomer Careers: https://careers.astronomer.io
Guest Profiles:
Albert Franzi: https://www.linkedin.com/in/albertfranzi/?originalSubdomain=es
Carlos Escura: https://www.linkedin.com/in/carlosescura/en-us/ -
After a bit of a break, we're back with the third official episode bundle of The Airflow Podcast. In this batch, we'll get a little bit deeper with current Airflow users and maintainers on core fundamental concepts in data engineering, architectures for operating modern data platforms at scale, and the process of maintaining and operating Airflow, specifically as we go through the release process of Airflow 2.0.
This week, we met up with Brian de la Motte and Florian Hines at Netlify. Netlify provides an extremely popular toolset for building and deploying JAMstack sites. They provide hosting services, CI, DNS, authentication, and managed backend tools that help users run and operate static sites at scale. The team over there recently adopted Airflow to help decouple orchestration logic from a complex collection of Spark jobs and is currently in the process of expanding its Airflow footprint to accommodate a broader group of interesting use cases.
Disclaimer: we get a bit of a surprise about halfway through the episode when Brian tells us that they had recently signed up for Astronomer. We promise that it wasn't a planted ad :).
Please contact [email protected] if you'd like to get in touch regarding future episodes. Hope you enjoy!
Guest Profiles:
Brian de la Motte: https://www.linkedin.com/in/brian-de-la-motte/
Florian Hines: https://www.linkedin.com/in/florianhines/ -
This week, we linked up with Airflow release manager, core committer, and Astronomer platform engineer Ash Berlin-Taylor to discuss the Airflow 2.0 roadmap [1]. There is some great stuff in the works around performance, autoscaling, and usability that we're excited about. In this episode, Ash lends his thoughts on the design, implementation, and value-add around all of the upcoming features, including:
- The Knative Executor
- A modern and real-time UI
- A production-grade API
- Improved scheduler and webserver performance
- An official production Docker image for Airflow
We hope you enjoy! Please email [email protected] if you have thoughts on topics you'd like to see covered in future episodes.
Separately, some good folks from the Airflow community are running a user survey that will help collect some useful information around the Airflow UX. If you have five minutes to spare, filling out the following form will help the core Airflow committers to shape the project roadmap: https://forms.gle/XAzR1pQBZiftvPQM7
[1] https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+2.0 -
This week, we had the pleasure of meeting up with Jarek Potiuk, Principal Software Engineer at Polidea and Apache Airflow committer, to discuss his most recent contribution to the community, Airflow Breeze. Jarek deeply values developer productivity and realized while building a team of Airflow committers that passing unit tests and waiting for the CI build before opening a PR on the project was a cumbersome process that could take up to a few hours. Breeze seeks to improve that experience for Airflow committers and lower the barrier to entry for folks who are new to the open-source community.
You can read more about Airflow Breeze here: https://www.polidea.com/blog/its-a-breeze-to-develop-apache-airflow/#the-apache-airflow-projects-setup -
This episode kicks off season 2 of The Airflow Podcast. In this next season, we'll focus on the future of Airflow and chat with leading members of the community to paint a picture of what's to come. We're pumped to be diving back into this project and look forward to the great conversations we have lined up.
This week, we chatted with James Malone, Product Manager of Google's Cloud Composer. James had some interesting things to say about open source at Google and where his team plans on contributing most to the project going forward.
As always, thanks for listening and please email [email protected] if you have any feedback or would like to be considered as a guest. -
This week, we met up with Ash Berlin-Taylor to discuss the recent 1.10 release, what it's like to be a release manager for an open source project, Airflow's bid to graduate from incubating status, and the next phase of Airflow project development.
As mentioned in our podcast intro, we at Astronomer are hiring Data Engineers who are passionate about contributing to open source and making Airflow great. Please shoot us an email at [email protected] if you're interested in hearing more about the fully-remote opportunity.
Check us out at www.astronomer.io if you're interested in seeing a demo of our platform. -
This time, we met up with WePay's Joy Gao to talk through her work on the RBAC components in the recent Airflow 1.10 release. We dove deep into what inspired her work and took some time to discuss what it's like to be a woman contributing to a predominantly male open-source community. Hope you enjoy!
If you'd like to get started using Airflow in your org, check out our recently-launched Spacecamp program here: https://www.astronomer.io/spacecamp
Feel free to email me at [email protected] with any feedback or if you'd like to be considered as a guest! -
In this episode, we dove into the relationship between Airflow and Kubernetes and interviewed Daniel Imberman, Senior Software Engineer at Bloomberg (1:30), and Greg Neiheisel, CTO here at Astronomer (37:31). Daniel has done most of the work on the Kubernetes executor for Airflow and Greg plans to take on a chunk of the development going forward, so it was really interesting to hear both of their perspectives on the project. Enjoy!
-
This week, we’ll examine conversations with both old guests and new to paint a comprehensive picture of Airflow’s pain points. While we still undoubtedly believe that Airflow is the future of ETL, it’s important to acknowledge that any incubating project will have issues, and bringing those issues to the forefront of the community’s attention will help shape the future of the project.
We’ll talk with Thomas La Piana (1:36), Data Engineer at OrderMyGear, Frank Hsu (14:20), Data Engineer at mines.io, and Alan Cruickshank (27:41), Business Insights and Data Manager at Tails.com.
Check out our open-source library of Airflow plugins at github.com/airflow-plugins, and feel free to contribute anything that you've been working on!
If you're interested in being on the podcast or have any feedback on how you think we could make it better, shoot me an email at [email protected] -
On this episode, we linked up with Erik Bernhardsson (@erikbern), creator of Luigi and CTO of Better Mortgage. We chatted about everything from the motivations behind Luigi's creation to his current thoughts on Airflow. We hope you enjoy!
Check out:
- Erik's blog at erikbern.com
- Our open-source library of Airflow plugins at github.com/airflow-plugins
All podcast feedback is hugely appreciated; feel free to email me at [email protected] if you have any thoughts. -
In this episode, we dive into Airflow Best Practices and include longer portions of interviews with Alan Cruickshank (1:30), Business Insights and Data Manager at Tails.com, Chris Riccomini (7:27), Principal Software Engineer at WePay, and Bolke de Bruin (31:45), Head of Advanced Analytics Technology at ING. Hope you enjoy!
We're still working to get better at podcasting, so please send over any feedback to [email protected]. We really appreciate hearing what the community has to say, and your feedback is hugely helpful in making us better.
If you're interested in Astronomer Spacecamp, a guided Airflow development course, you can find more info on that here:
https://www.astronomer.io/blog/announcing-astronomer-spacecamp/
We also launched our Managed Airflow on Product Hunt last week; you can check that out here:
https://www.producthunt.com/posts/apache-airflow-on-astronomer
Thanks so much for listening! -
Episode 2 of The Airflow Podcast is here to discuss six specific use cases that we’ve seen for Apache Airflow. Here’s the lineup:
Patrick Atwater (@patwater), Water Data Projects Manager at ARGO Labs: 2:03-5:35
Maksim Pecherskiy (@mrmaksimize), CDO of San Diego: 5:35-23:06
Scott Halgrim (@shalgrim), Data Engineer at Zapier: 23:06-27:27
Bolke de Bruin (@bolke2028), Head of Advanced Analytics at ING: 27:27-39:46
Chris Riccomini (@criccomini), Principal Software Engineer at WePay: 39:46-54:20
Ben Gregory (@benbeingbin), Data Engineer (and noted craft soda enthusiast) at Astronomer: 54:20-1:14:38
Contribute to our open-source library of Airflow plugins at github.com/airflow-plugins
Contact us at www.astronomer.io if you’re interested in Spacecamp: A guided development program to get your team up and running on Airflow. -
For the first episode of the Airflow Podcast, we met up with Maxime Beauchemin, creator of Airflow, to explore the motivations behind its creation and the problems it was designed to solve. We asked Maxime for his definition of Airflow, the design principles behind hook/operator use, and his vision for the project.
Speaker list:
Pete DeJoy - Product at Astronomer
Viraj Parekh - Data Engineer at Astronomer
Maxime Beauchemin - Software Engineer at Lyft, creator of Airflow
Talk mentioned at the end of the podcast, "Advanced Data Engineering Patterns with Apache Airflow": http://www.ustream.tv/recorded/109227704
Maxime's Blog: https://medium.com/@maximebeauchemin -
A sneak peek at our upcoming podcast about Apache Airflow.
Featured in this clip (in order of appearance):
Pete DeJoy - Product Specialist at Astronomer
Patrick Atwater - Water Data Projects Manager at ARGO Labs
Maksim Pecherskiy - Chief Data Officer of the City of San Diego
Bolke de Bruin - Head of Advanced Analytics at ING