A curated list of Site Reliability and Production Engineering resources.
A curated list of Chaos Engineering resources.
A collection of postmortem templates
A role-playing game for incident management training
Calculate how much downtime your Service Level Agreement (SLA) or Service Level Objective (SLO) permits
A collection of templates ported from the SRE Workbook
A list of common Disaster Recovery (DR) scenarios for software companies
Geo-Distributed Infrastructure Emulation using Traffic Shaping
A Go package with read-only operations for determining the Out-Of-Memory (OOM) status of a process on Linux
Deterministic Subsetting as defined in the SRE book
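
The downtime calculation mentioned in the list above comes down to simple arithmetic: the permitted downtime is the length of the measurement window multiplied by the fraction of time the target leaves uncovered. The following is a minimal sketch of that arithmetic, not the code of any listed tool; the function name `allowed_downtime_seconds` and the 30-day default window are assumptions made for illustration.

```python
def allowed_downtime_seconds(slo_percent: float, window_days: float = 30.0) -> float:
    """Return the downtime budget, in seconds, for an availability target.

    slo_percent: availability target as a percentage, e.g. 99.9
    window_days: length of the measurement window (assumed default: 30 days)
    """
    if not 0.0 <= slo_percent <= 100.0:
        raise ValueError("slo_percent must be between 0 and 100")
    if window_days <= 0:
        raise ValueError("window_days must be positive")

    window_seconds = window_days * 24 * 60 * 60
    # The budget is the share of the window NOT covered by the target.
    return window_seconds * (1.0 - slo_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_seconds(target)
        print(f"{target}% over 30 days: {budget / 60:.1f} minutes")
```

Under these assumptions, a 99.9% target over a 30-day window leaves a budget of roughly 2,592 seconds, or about 43 minutes.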
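
Deterministic subsetting, as described in the load-balancing chapter of the SRE book, assigns each client a fixed-size slice of the backend list so that load spreads evenly across backends. Clients are grouped into rounds; every client in a round shuffles the backend list with the same seed and then takes a different slice of it. The sketch below is a Python 3 adaptation of that idea and not the code of any listed project. The name `deterministic_subset` and the use of a private `random.Random` instance (rather than seeding the global generator) are choices made here for illustration.

```python
import random
from typing import List, Sequence


def deterministic_subset(
    backends: Sequence[str], client_id: int, subset_size: int
) -> List[str]:
    """Pick a fixed-size subset of backends for one client.

    Clients in the same round share a shuffle seed, so their slices of the
    shuffled list do not overlap.
    """
    if subset_size <= 0 or subset_size > len(backends):
        raise ValueError("subset_size must be between 1 and len(backends)")
    if client_id < 0:
        raise ValueError("client_id must be non-negative")

    # Number of non-overlapping subsets that fit in the backend list.
    subset_count = len(backends) // subset_size

    # Clients are grouped into rounds of subset_count clients each.
    round_id = client_id // subset_count

    # Shuffle a copy, seeded by the round, so the caller's list is untouched
    # and every client in the round sees the same ordering.
    shuffled = list(backends)
    random.Random(round_id).shuffle(shuffled)

    # The client's position within its round selects which slice it gets.
    subset_id = client_id % subset_count
    start = subset_id * subset_size
    return shuffled[start:start + subset_size]


if __name__ == "__main__":
    backends = [f"backend-{i}" for i in range(12)]
    for client in range(4):
        print(client, deterministic_subset(backends, client, 3))
```

With 12 backends and a subset size of 3, clients 0 through 3 form one round and together cover every backend exactly once. When the backend count is not a multiple of the subset size, the leftover backends at the end of the shuffled list are skipped for that round.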