Episodes
-
"It doesn't really matter to me how much resiliency you say you've built into your system, if you can't demonstrate the same rigor applied to how you think about humans and how they interact with the system."
In this episode, Matt Stine, author of Migrating to Cloud-Native Architectures (O'Reilly) thinks about how resiliency for long-lived, critical systems spans domains, includes humans and software, and how communication is key to getting distributed developers to embrace rigor.
Watch the interview: https://www.youtube.com/watch?v=aL-_pYrAvc0
Read Matt's book: https://www.oreilly.com/library/view/migrating-to-cloud-native/9781492047605/
Find show notes: https://www.pagerduty.com/resources/podcast/
-
Did you know there's a PagerDuty plug-in for Backstage? Learn more about what Backstage is, the PagerDuty plug-in, and what's new from Head of Product for Backstage at Spotify, Meg Watson, and PagerDuty Developer Advocate, Tiago Barbosa.
This interview was recorded just before BackstageCon and KubeCon in November 2023, but the next BackstageCon is coming up next month in Paris as a day zero event before KubeCon Europe. Stop by the PagerDuty booth to learn more!
PagerDuty plugin for Backstage documentation: https://pagerduty.github.io/backstage-plugin-docs/
Watch the interview: https://www.youtube.com/watch?v=Zr0ZMH8Z6P8
Register for the March 2024 BackstageCon at KubeCon Europe: https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/co-located-events/backstagecon/
Read the Spotify Marketplace for Backstage announcement: https://backstage.spotify.com/blog/introducing-spotify-marketplace-for-backstage/
Watch the October Backstage Roadmap webinar: https://www.youtube.com/watch?v=XgTMH18Q18U
-
"If you want to have a resilient system in the broader sense of the people and the technology working together, you also have to create an environment in which you can be resilient."
In this episode, Sam Newman touches on how psychological safety, planning versus practice, chaos engineering, and more all play into how organizations build resilience.
Resources:
Sam's website: https://samnewman.io/
Watch the interview here: https://www.youtube.com/watch?v=jQ9fQk5odyI
Find more show notes here: https://www.pagerduty.com/resources/podcast/
-
APIs are foundational to scaling operational efficiency. In this episode, you'll learn more from PagerDuty Product Manager, Nakul Bhagat, about the different types of APIs supported by PagerDuty, why they matter, and what's new!
In the episode, we refer to a few things, so here are some handy links:
- Developer API documentation: https://developer.pagerduty.com/
- An earlier livestream on API scopes: https://youtu.be/Nersu9KbvA4?feature=shared
- A survey on APIs in the PagerDuty community forums: https://community.pagerduty.com/challenges/339
You can watch the interview here: https://www.youtube.com/watch?v=sqxvW-xqL7k
-
What are the secrets to successfully adopting PagerDuty? Dormain sits down with Senior Manager of Customer Success Engineering, Matt Linebarger, to get his list of things he wishes every PagerDuty user knew.
Watch the interview: https://www.youtube.com/watch?v=N5BRQWsv2iU
Find show notes: https://www.pagerduty.com/resources/podcast/
-
On the heels of the public beta opening for AI-generated runbooks in Runbook Automation, we asked Jake Cohen from product management how this differs from generating code with something like ChatGPT or the various AI-powered code completion tools available. We get into prompt engineering, managing output quality, and privacy and security concerns. You can try out generating your first runbook in the new Runbook Automation trial here: https://www.pagerduty.com/sign-up/runbook-automation/
https://youtu.be/6-LMt-i5xq8
-
In honor of National Preparedness Month, Dormain talks to Jason Flint, Senior Manager of Workspaces and Crisis Response, who recently authored a Crisis Response Management Operations Guide (https://response.pagerduty.com/crisis/crisis_intro/). We discuss going beyond "checkbox" fire drills, the value of cross-functional planning, and engaging workforces to improve preparedness.
https://youtu.be/wAcs66NdPGQ
-
In this episode, Donnie breaks down where ITIL came from and where it's starting to go, and why that's useful for teams that are trying to adopt DevOps practices in ITIL-oriented organizations. Donnie gives some great examples of building empathy and bringing the ITIL teams along for automating changes and decentralizing Sev 2 incident management. He also lays out his core philosophies on Platform Engineering and how to justify the effort.
Platify Insights: https://platifyinsights.com/
Watch the interview: https://youtu.be/5oF42wItsSM
Find full show notes: https://www.pagerduty.com/resources/podcast/
-
In this episode, Mitra shares a bunch of valuable insights on how to successfully adopt generative AI, from selecting use cases that deliver value and having foundational data infrastructure in place to having design and privacy guidelines. Grab a paper and pen and take some notes! Dormain included a bunch of them in this article on "7 Habits of Successful Generative AI Adopters."
https://youtu.be/xEU0Ltxp99M
-
"All data is valuable when it's generated, but the question is how fast does the value of the data decay?"
In this episode, we hear from James Urquhart, author of "Flow Architectures" (O'Reilly), on how flow architectures follow patterns that move data through distributed systems from producers to consumers. By unblocking data so it can move faster, organizations have the opportunity to generate more value from that data. We also discuss the role of standards, how LLMs and AI will exacerbate the need for standards, and how being able to take action quickly increases the realized value of data. We also touch on how resiliency benefits from being able to gain quick feedback and act on data.
https://youtu.be/wqnTX16Xow0
-
In this episode, Martin Van Son provides a simplified definition of platforms in this context: a way for internal users to request anything from environments to deployments. Platform engineering comes in because someone needs to own stitching together and automating away all the complexity involved in completing that action. In the end, both the consumers and the creators save time. Furthermore, platform engineers have an opportunity to encode best practices and cost-saving measures that are often forgotten when users are left to their own devices.
https://youtu.be/90AEMOQSEC4
Find full show notes: https://www.pagerduty.com/resources/podcast/
-
In this episode, Heather Hinton describes how security teams can evolve away from spending cycles on "silly little jobs" and scouring multiple sources to identify the kinds of unplanned interrupt work that need to be dealt with urgently. Instead, they can complete projects faster and take on more because on-call rotations are spent getting work done (with the occasional interruption) instead of "seeking" the interrupt work. We also discuss how this fits in with encouraging employees more broadly to participate in security hygiene practices.
Watch the interview: https://youtu.be/l9mjgXD0564
Find full show notes of episodes: https://www.pagerduty.com/resources/podcast/
-
"AIOps" is a term some love to hate, but what makes it useful? In this episode, Heath Newburn breaks down the three things to look for in an AIOps solution: reduce noise, create context, and reduce toil. He also explains the challenges with domain-specific approaches versus domain-agnostic approaches to AIOps. But even within that approach, Heath warns of "gotchas" in rules "tech debt", data formats, and overall long implementation times.
https://youtu.be/mzaqO-lOIfs
Find the full show notes at: https://www.pagerduty.com/resources/podcast/
-
Long gone are the days when data was batch loaded into a data warehouse for business intelligence reports that were looked at periodically and, if something broke, a few internal people would have to wait. Today, data pipelines are "infinitely more complicated", with more sources, from cloud services to on-premises systems, and they support data applications that are critical parts of a business' ecosystem.
In this episode, Dormain Drewitz sits down with Manuraj Rajasekharan, Senior Director of Analytics and Data Engineering at PagerDuty, and James Zhao, Senior Product Manager at Snowflake to discuss how DataOps has evolved and how it will be essential to support large language models (LLMs) in production.
https://youtu.be/I5TzvstL8tM
https://medium.com/@ravikuma2003/unlocking-the-potential-of-snowflake-alerts-pagerduty-operations-cloud-enhancing-data-operations-66511a25b8af
-
Generative AI is a rapidly evolving ecosystem that is getting a lot of attention. In this episode, Dormain Drewitz asks Sriram Subramanian about the main challenges of responsibly implementing generative AI, including content that's harmful, inaccurate, or violates privacy or security standards. Sriram discusses Microsoft's 6 tenets of responsible generative AI, as well as the notion of shared responsibility between platform providers and foundational LLMs and the developers and data engineers building on top. Sriram also answers questions about where to get started safely with generative AI and shares his framework for identifying opportunities to add value.
-
A software engineer, a data scientist, and a product manager walk into a generative AI project… Using technology that didn't exist a year ago, they identify a customer pain point they might be able to solve, build on teammates' experience with building AI features, and test how to feed inputs and constrain outputs into something useful. Hear the full conversation here.
https://youtu.be/2mEVY6rmX3M
Read the article referred to in the episode here: https://thenewstack.io/llms-and-incident-response-it-starts-with-summarization/
-
In this episode, Hadijah Creary breaks down what Customer Service teams are versus Customer Success teams. What do they care about? How can they each get more proactive to improve the overall customer experience? And why is it PagerDuty Customer Service Operations and not Customer Success Operations?
https://youtu.be/ve1RT4a_Udo
-
In this, the inaugural episode of "The Unplanned Show", Dormain talks to Damon Edwards about the "capacity conundrum", where everyone is working so hard, but everything takes too long and costs too much. We talk about the "coordination overhead" costs of getting unplanned work done, how generative AI both adds complexity and offers to accelerate automating as much as you can, and four steps to creating capacity.
https://www.youtube.com/watch?v=UYjx49Sz1CI&t