Episodes
-
In Episode 80, Niall Murphy talked about the need for SREs to be better at articulating the value of our work. In this episode I'm joined by Artem Yakimenko, ex-Googler and Engineering Director (SRE) at Culture Amp, to talk about how we might achieve this.
We discuss both quantifiable and qualitative approaches including leveraging the untapped data in support tickets, customer sentiment and rankings, the relationship between finance and performance, the link between user design and performance, and so much more.
Books mentioned in the episode:
100 Things Every Designer Needs to Know About People
By Susan Weinschenk
https://www.amazon.com.au/Things-Every-Designer-Needs-People/dp/0321767535
You can find Artem on LinkedIn: https://www.linkedin.com/in/temikus/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
-
In the world of SRE we constantly talk about defining SLOs, but what about evolving them over time? This week I chat with SRE Tech Lead Dom Finn about just that. We cover the relationship between reliability and user analytics, latency classes as a way to speak SLOs with business stakeholders, the role of NFRs and how the thresholds differ from SLOs, and much more.
Books mentioned in the episode:
The Beginning of Infinity: Explanations That Transform the World
By David Deutsch
https://www.amazon.com.au/Beginning-Infinity-Explanations-Transform-World/dp/0143121359
Turn The Ship Around!
By L. David Marquet
https://davidmarquet.com/turn-the-ship-around-book/
You can find Dom on LinkedIn: https://www.linkedin.com/in/dom-finn/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
-
This week I talk about the impact of SaaS-first technology strategies on the work of an SRE. I pose questions about observability, ownership, on-call, and how much control we have over reliability.
You can find the Bleeding Tech blog on Medium: https://medium.com/@stownshend
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
-
This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms during incident response.
You can find Dan's blog at https://blog.danslimmon.com/ or connect with him on LinkedIn here: https://www.linkedin.com/in/danslimmon/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
-
This week I hear about all things Kubernetes from Komodor CTO and co-founder Itiel Shwartz. We chat about the promise that was made when Kubernetes first entered the industry, the challenge of getting developers engaged and capable of working in Kubernetes, my hate/hate relationship with Helm but its important contribution to the Kubernetes project, Kubernetes observability, and so much more.
You can find the Kubernetes for Humans podcast here:
https://komodor.com/blog/the-kubernetes-for-humans-podcast/
Or find out more about Komodor here:
https://komodor.com/
Or find Itiel on LinkedIn: https://www.linkedin.com/in/itiel-shwartz-18542853/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
-
This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observable, single piece flow, testing in production, and so much more.
You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.
-
"Environment issues are just incidents that happened to occur in a non-production environment"... so why do we treat them so differently?
In this first episode of the 2024 season I reflect on how we handle incidents in non-prod environments.
(Note: Had a few issues with noise suppression in OBS Studio cutting off the start of some words, will sort it for the next episode)
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
-
This week I speak with Niall Murphy, renowned speaker and co-author of the original SRE book and the SRE workbook.
We chat about the state of SRE in the current macro-economic climate and how we're not yet doing a very good job at articulating the value of SRE to leaders, the relationship that velocity and reliability have, the value of new features versus reliability improvements, and *much* more.
You can find Niall at:
LinkedIn: https://www.linkedin.com/in/niallm/
X: https://twitter.com/niallm
Website: https://relyabilit.ie/
(and his company Stanza: https://www.stanza.systems/)
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
Paige Cruz (from Chronosphere) is back. This week we discuss sampling. What is sampling? Why do it? What kinds of sampling are there?
You can check out Chronosphere's cloud native observability platform here: https://chronosphere.io/
You can find Paige on:
LinkedIn: https://www.linkedin.com/in/paigerduty/
X: https://twitter.com/paigerduty
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
This week Valeska Victoria returns to share some of her experiences working as an SRE at eBay.
We look at the cascading effect of production issues in complex integrated environments (how there's often no single root cause), developer literacy of how infrastructure works, the importance of ownership and accountability of reliability, and much more.
You can find Valeska on:
LinkedIn: https://www.linkedin.com/in/valeska-victoria/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
This week I chat with Ankit Jain from aviator.co about developer experience.
We define developer experience and developer productivity, and how this applies to SRE. We discuss the growing expectation on developers and how this leads to frustration and burnout. We also explore how to measure developer experience and how to start working to make improvements.
You can check out Aviator's developer experience platform here: https://www.aviator.co/
You can find Ankit on:
LinkedIn: https://www.linkedin.com/in/ankitjaindce/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
A brief mid-week update on my changing circumstances and the future of the podcast.
-
This week I had the privilege of interviewing Liz Fong-Jones from honeycomb.io about DevRel, Developer Advocacy, and how that applies to SRE.
We discuss the difference between Developer Relations (DevRel) and Developer Advocacy, how Liz got into advocacy, how DevRel helps companies and the community, and some tips on how to get traction with SRE practices in your organisation.
You can check out Honeycomb's observability platform here: https://www.honeycomb.io/
You can find Liz on:
LinkedIn: https://www.linkedin.com/in/efong/
Website: https://www.lizthegrey.com/ (all her social/links are here)
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
This week I had the honour of chatting with Steve McGhee (former Google SRE, current Google Reliability Advocate, and co-author of Enterprise Roadmap to SRE).
We discuss the evolution of SRE from where it began at Google and how it is being adopted by enterprises around the world now (and why this is happening). We talk about getting leadership support and how we get reliability taken seriously, the lies we tell ourselves to justify incidents and issues, leveraging transformation projects to bring SRE to life, how SLOs can act as the fulcrum between dev and ops, the fallacy of the pyramid model of reliability... and so much more.
You can find Steve on:
LinkedIn: https://www.linkedin.com/in/stevemcghee/
X: https://twitter.com/stevemcghee
You can find Steve's book "Enterprise Roadmap to SRE" here: https://sre.google/resources/practices-and-processes/enterprise-roadmap-to-sre/
Steve also mentions the book "A Seat at the Table": https://itrevolution.com/product/a-seat-at-the-table/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
This week on Slight Reliability Stephen discusses observability vendor lock-in. What is it? What does OpenTelemetry do to help? What areas are yet to be solved?
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre
-
This week we sit down and talk about SLOs with CPO and co-founder of Nobl9 Brian Singer.
We talk about the importance of reviewing operational effectiveness, getting buy-in from leadership, using SLOs to reduce noise, how to implement SLOs within different cultures and structures, the parallels between security and reliability... and much more.
You can check out Nobl9's reliability and SLO platform here: https://www.nobl9.com/
You can find Brian on LinkedIn: https://www.linkedin.com/in/briantsinger/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
This week Stephen chats with Valeska Victoria about her time working as an SRE at eBay.
Valeska shares her data-driven approach to SRE, having a voice as a less experienced engineer, handling incidents under high pressure, leveraging large language models to rapidly find the information you need during an incident, and much more.
You can check out PromptOps here: https://www.promptops.com/
You can find Valeska on LinkedIn: https://www.linkedin.com/in/valeska-victoria/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
This week Stephen chats with Dr. Vlad Ukis about his journey discovering and then implementing SRE practices at Siemens Healthineers (which led to him writing a book).
They discuss how the evolution of infrastructure necessitates a shift in how we operate, the power of selling SRE practices, the SRE infrastructure used to build SLOs and reliability capabilities, how he implemented SLOs, and much more.
You can find Vlad's book "Establishing SRE Foundations" here: https://www.amazon.com/Establishing-Foundations-Step-Step-Organizations/dp/0137424604
You can find Vlad on LinkedIn: https://www.linkedin.com/in/dr-vladyslav-ukis-5172ba32/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
Amin Astaneh (from Certo Modo) is back to discuss his experience working as a production engineer (SRE equivalent) at Meta.
Stephen and Amin discuss what it's like interviewing for big tech, "you build it, you own it", different SRE engagement models, SRE at different sizes of organisation, socialising your SRE success as a way to get traction, and so much more.
You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh
The books Amin mentions are...
The Practice of Cloud System Administration: https://www.oreilly.com/library/view/practice-of-cloud/9780133478549/
Leading Change:
https://www.kotterinc.com/bookshelf/leading-change/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
Instagram: https://www.instagram.com/slight_reliability/
-
This week Stephen talks to Praveen Kasam from Diconium Digital Solutions about how he led SRE transformations.
Praveen shares his experience transitioning from development to SRE and how leveraging automation and bringing application knowledge to the ops team provided quick wins. He also covers how he later applied SRE concepts to uplift the wider organisation. If you are out there looking for advice on how to implement SRE in your organisation, this is the episode for you.
You can find Praveen at:
LinkedIn: https://www.linkedin.com/in/kasampraveen/
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
X: https://twitter.com/the_kiwi_sre
YouTube: https://www.youtube.com/c/SlightReliability
Instagram: https://www.instagram.com/slight_reliability/
TikTok: https://www.tiktok.com/@the_kiwi_sre