---
description: Root Cause Analysis workflow for Verkada Access Control incidents. Parallel data gathering from Slack, Linear, GitHub, and Datadog, timeline construction, Five Whys analysis, and Notion-formatted RCA output.
category: verkada
---
flowchart TD
    _HEADER_["Writing RCA Reports<br/>Root Cause Analysis workflow for Verkada Access Control incidents. Parallel data gathering from Slack, Linear, GitHub, and Datadog, timeline construction, Five Whys analysis, and Notion-formatted RCA output."]:::headerStyle
    classDef headerStyle fill:none,stroke:none

    subgraph _MAIN_[" "]

    %% Phase 1: Trigger
    subgraph Trigger["Phase 1: Incident Assigned"]
        INCIDENT[Incident Assigned<br/>via Slack or Linear] --> CHECKLIST[Copy RCA Checklist<br/>Track Progress]
    end

    %% Phase 2: Parallel Data Gathering
    subgraph Gather["Phase 2: Parallel Data Gathering"]
        CHECKLIST --> SLACK[Slack Threads<br/>eng-crisis + team channels] & LINEAR[Linear Tickets<br/>Eng + Support] & GITHUB[GitHub PRs<br/>Causal + Revert + Fix] & DATADOG[Datadog<br/>Traces + Dashboards]
        SLACK & LINEAR & GITHUB & DATADOG --> RAW[Raw Data Collected]
    end

    %% Phase 3: Timeline Construction
    subgraph Timeline["Phase 3: Timeline Construction"]
        RAW --> MILESTONES[Identify Milestones<br/>7 Key Events]
        MILESTONES --> TZ_CONVERT[Convert All Timestamps<br/>to PST]
        TZ_CONVERT --> VERIFY_DATES{Dates Verified?}
        VERIFY_DATES -->|No| TZ_FIX[Fix Timestamps<br/>via Python]
        TZ_FIX --> TZ_CONVERT
        VERIFY_DATES -->|Yes| TIMELINE_TABLE[Build Timeline Table]
    end

    %% Phase 4: Root Cause Analysis
    subgraph Analysis["Phase 4: Root Cause Analysis"]
        TIMELINE_TABLE --> ROOT_CAUSE[Technical Root Cause]
        TIMELINE_TABLE --> CONTRIBUTING[Contributing Factors]
        ROOT_CAUSE & CONTRIBUTING --> FIVE_WHYS[Five Whys Analysis<br/>Min 3 Levels Deep]
        FIVE_WHYS --> BLAMELESS[Blameless Root Cause<br/>Statement]
    end

    %% Phase 5: Impact and Response
    subgraph Impact["Phase 5: Impact and Response Assessment"]
        BLAMELESS --> CUST_IMPACT[Document Customer Impact<br/>Scope + Affected Orgs]
        BLAMELESS --> WENT_WRONG[What Went Wrong?<br/>Process Failures]
        BLAMELESS --> WENT_RIGHT[What Went Right?<br/>Response Strengths]
        CUST_IMPACT & WENT_WRONG & WENT_RIGHT --> RESPONSE[Response Assessment<br/>Complete]
    end

    %% Phase 6: Action Items
    subgraph Actions["Phase 6: Action Items + Linear Setup"]
        RESPONSE --> IMMEDIATE[Immediate Actions<br/>7 day deadline]
        RESPONSE --> MIDTERM[Mid-term Actions<br/>30 day deadline]
        RESPONSE --> LONGTERM[Long-term Actions<br/>90 day deadline]
        IMMEDIATE & MIDTERM & LONGTERM --> CREATE_PROJECT[Create Linear Project<br/>Under RCA Initiative]
        CREATE_PROJECT --> CREATE_ISSUES[Create Linear Issues<br/>with Due Dates]
        CREATE_ISSUES --> LINK[Link Issues to<br/>RCA Initiative]
    end

    %% Phase 7: Document Assembly
    subgraph Document["Phase 7: Document Assembly"]
        LINK --> TEMPLATE[Load RCA Template<br/>14 Sections]
        TEMPLATE --> POPULATE[Populate All Sections<br/>Insert Raw Links]
        POPULATE --> FILL_META[Fill Notion DB Fields<br/>Severity + Times in UTC]
        FILL_META --> REVIEW{User Approves?}
        REVIEW -->|No| REVISE[Revise Draft]
        REVISE --> POPULATE
        REVIEW -->|Yes| POST_NOTION[Post to Notion<br/>RCA Database]
    end

    click INCIDENT "#" "**Incident Assigned**\nRCA writing begins when an incident is assigned, usually by Han Hong.\n- Check `#eng-crisis` for the assignment thread\n- Note the due date from the assignment\n- Identify the originating ticket ID"
    click CHECKLIST "#" "**Copy RCA Checklist**\nCopy the 8-step checklist and track progress:\n- Step 1: Gather data\n- Step 2: Build timeline\n- Step 3: Analyze root cause\n- Step 4: Customer impact\n- Step 5: Response quality\n- Step 6: Action items\n- Step 7: Populate document\n- Step 8: Verify dates"
    click SLACK "#" "**Slack Threads**\nCollect from Slack using `slackctl` skill:\n- `#eng-crisis` incident thread\n- Team channel discussions\n- Timestamps of key decisions\n- Participants involved\n- Raw message quotes for the RCA narrative\n\nSlack timestamps are Unix epoch (UTC).\nConvert: `TZ=America/Los_Angeles date -r EPOCH_SECONDS`"
    click LINEAR "#" "**Linear Tickets**\nGather from Linear using `linear-cli`:\n- Originating engineering ticket (full details)\n- Support ticket(s) with case counts\n- ALL comments on both tickets\n- The support ticket creator = Support PoC\n\nCommands:\n`linear-cli issue get ACBE-XXXX`\n`linear-cli issue comment list ACBE-XXXX`"
    click GITHUB "#" "**GitHub PRs**\nGather from GitHub using `gh`:\n- Causal PR (author, merge date, files, reviews)\n- Revert PR (timing, author)\n- Subsequent fix PRs\n- Review comments with context\n\nCommand:\n`gh pr view NNNNN --repo verkada/Verkada-Backend --json title,author,mergedAt,body,files,reviews`"
    click DATADOG "#" "**Datadog Traces + Dashboards**\nSearch Datadog using `datadog-cli` skill:\n- Error rate spikes around incident timeline\n- Service traces for affected endpoints\n- Dashboard links for the RCA document\n\nOnly needed if alerts or metrics are relevant."
    click RAW "#" "**Raw Data Collected**\nAll four parallel data streams complete.\nOrganize by source, ready for timeline construction."
    click MILESTONES "#" "**Identify Milestones**\nExtract the 7 key timeline events:\n1. Bug Introduced (PR merge time)\n2. Incident Started (first report)\n3. Time Detected (engineering engaged)\n4. Root Cause Identified\n5. Mitigation Started (rollback began)\n6. Issue Mitigated (service restored)\n7. Code Revert/Fix (permanent fix merged)"
    click TZ_CONVERT "#" "**Convert Timestamps to PST**\nAll RCA narrative times must be PST.\n\nSlack (Unix epoch):\n`TZ=America/Los_Angeles date -r EPOCH_SECONDS`\n\nLinear/GitHub (ISO UTC):\nUse Python; do NOT use `date -j -f`.\n`python3 -c 'from datetime import ...'`\n\nThe `Z` in ISO timestamps means UTC.\n`date -j -f` treats Z as a literal character, which gives the wrong result."
    click VERIFY_DATES "#" "**Dates Verified?**\nDouble-check every timestamp:\n- Is the day-of-week correct?\n- Does the PST time make sense?\n- Are events in chronological order?\n- Do Slack/GitHub/Linear timestamps agree?"
    click TZ_FIX "#" "**Fix Timestamps**\nUse Python for reliable UTC to Pacific time (PST/PDT) conversion, matching the `America/Los_Angeles` zone used for Slack:\n`python3 -c`\n`from datetime import datetime`\n`from zoneinfo import ZoneInfo`\n`pacific = ZoneInfo('America/Los_Angeles')`\n`utc = datetime.fromisoformat('...')`\n`print(utc.astimezone(pacific))`"
    click TIMELINE_TABLE "#" "**Build Timeline Table**\nFormat as a Notion-compatible table:\n- All times in PST with day-of-week\n- Each row links to its source\n- Include PR URLs, Slack message links,\n and Linear comment URLs"
    click ROOT_CAUSE "#" "**Technical Root Cause**\nWhat code/config change caused the issue?\nWhat assumption was violated?\n\nLink to the causal PR diff and\nany relevant code review comments."
    click CONTRIBUTING "#" "**Contributing Factors**\nProcess and systemic issues:\n- Test coverage gaps\n- Permission model misunderstandings\n- Insufficient staging testing\n- Timing (Friday deploys, pre-vacation)\n- Review process gaps"
    click FIVE_WHYS "#" "**Five Whys Analysis**\nMust go at least 3 levels deep.\n\n1. Why did the incident happen?\n2. Why did that cause happen?\n3. Why did that deeper cause exist?\n4. Why was it not caught?\n5. Why does that systemic gap exist?\n\nFocus on process failures, not individuals."
    click BLAMELESS "#" "**Blameless Root Cause**\nFinal root cause statement:\n- Focus on process and system failures\n- No individual blame\n- What needs to change to prevent recurrence"
    click CUST_IMPACT "#" "**Document Customer Impact**\n- Scope: which users, functionality, geography\n- Known affected customers from Support tickets\n- Salesforce case counts\n- Workarounds available during the incident"
    click WENT_WRONG "#" "**What Went Wrong?**\n- Why was the bug introduced?\n- Why was it not caught in review/testing?\n- Why was there a detection delay?\n- Was it detected by customers or internally?"
    click WENT_RIGHT "#" "**What Went Right?**\n- Mitigation speed\n- Cross-team collaboration\n- Rollback speed\n- Communication quality"
    click RESPONSE "#" "**Response Assessment Complete**\nImpact documented, response evaluated.\nReady to define action items."
    click IMMEDIATE "#" "**Immediate Actions (7 days)**\nExamples:\n- Rollback/revert the causal change\n- Add regression tests\n- Emergency hotfix\n\nEach item MUST link to a Linear ticket."
    click MIDTERM "#" "**Mid-term Actions (30 days)**\nExamples:\n- Documentation updates\n- Test coverage improvements\n- Proper fix replacing the revert\n\nEach item MUST link to a Linear ticket."
    click LONGTERM "#" "**Long-term Actions (90 days)**\nExamples:\n- Monitoring and alerting improvements\n- Broader audit of similar patterns\n- Process changes\n\nEach item MUST link to a Linear ticket."
    click CREATE_PROJECT "#" "**Create Linear Project**\nCreate a project under the RCA Action Items initiative:\n`linear-cli project create --name 'YYYY/MM/DD - incident' --team-ids TEAM_IDS --state started`\n\nAttach to initiative:\n`linear-cli gql 'mutation { initiativeToProjectCreate(...) }'`\nInitiative ID: `0e9fa292-cefd-4b79-8d74-61a2fba9fbb9`"
    click CREATE_ISSUES "#" "**Create Linear Issues**\nCreate follow-up issues with due dates:\n`linear-cli issue create --title '...' --team ACBE --priority 2 --assign-me --project PROJECT --due-date YYYY-MM-DD`\n\nDue dates by priority:\n- Immediate: +7 days from incident\n- Mid-term: +30 days\n- Long-term: +90 days\n\nSet to Backlog (default Triage is filtered out)."
    click LINK "#" "**Link Issues to Initiative**\nLink the RCA Notion page to:\n- The Linear project description\n- Each issue as an attachment\n\n`linear-cli issue attachment create ACBE-XXXX --url 'https://notion.so/...' --title 'RCA: ...'`"
    click TEMPLATE "#" "**Load RCA Template**\n14 sections from `templates/rca-template.md`:\n- Executive Summary\n- Background\n- Timeline of Events\n- Customer Impact\n- Root Cause\n- Analysis (Detection, Escalation, Remediation)\n- Five Whys\n- Blameless Root Cause\n- What went wrong/right?\n- Related Incidents\n- Prevention\n- Action Items\n- Follow-ups"
    click POPULATE "#" "**Populate All Sections**\nFill every section with gathered data.\nInsert raw links at specific locations:\n- Exec Summary: Linear ticket + causal PR\n- Timeline: source link per event\n- Root Cause: PR diff + review comments\n- Detection: Datadog dashboard or Slack thread\n- Escalation: #eng-crisis thread\n- Remediation: revert/fix PR\n- Action Items: Linear ticket per item\n- Follow-ups: Linear project link"
    click FILL_META "#" "**Fill Notion DB Fields**\nSet all required database properties:\n- Name: `YYYY/MM/DD - incident`\n- Severity: Sev 0-3\n- Environments, Services, Teams\n- Product Surface, Detection Method\n- Time fields in UTC (drives auto-calculated metrics)\n\nKey metrics auto-calculated:\n- TTD (Time to Detect)\n- TTA (Time to Acknowledge)\n- TTF (Time to Fix)\n- Time Broken"
    click REVIEW "#" "**User Approves?**\nRCAs require manual approval before posting.\nOutput the draft for the user to review.\nNever post directly to Notion without approval."
    click REVISE "#" "**Revise Draft**\nIncorporate user feedback.\nCommon revisions:\n- Adjust severity assessment\n- Clarify timeline events\n- Soften/strengthen language\n- Add missing context"
    click POST_NOTION "#" "**Post to Notion RCA Database**\nUse Notion MCP `create-pages` tool.\nData source ID: `3ecbc88b-5c13-4f1d-a06f-426441080c20`\n\nTwo API calls:\n1. Create page with content + basic properties\n2. Update time fields (UTC datetimes)\n\nNotes:\n- Use HTML `table` tags, not markdown tables\n- Do NOT wrap headers in bold asterisks\n- Person fields need user ID lookup first"

    classDef trigger fill:#d1ecf1,stroke:#7ec8d8
    classDef gather fill:#dfe6ff,stroke:#5b7bce
    classDef timeline fill:#fff3cd,stroke:#f0c040
    classDef analysis fill:#e8daef,stroke:#b07cc6
    classDef impact fill:#ffeaa7,stroke:#e0c040
    classDef actions fill:#d4edda,stroke:#5cb85c
    classDef document fill:#f0e6ff,stroke:#9b59b6
    classDef decision fill:#fff3cd,stroke:#f0c040
    classDef error fill:#f8d7da,stroke:#e06070

    class INCIDENT,CHECKLIST trigger
    class SLACK,LINEAR,GITHUB,DATADOG,RAW gather
    class MILESTONES,TZ_CONVERT,TIMELINE_TABLE timeline
    class TZ_FIX error
    class VERIFY_DATES decision
    class ROOT_CAUSE,CONTRIBUTING,FIVE_WHYS,BLAMELESS analysis
    class CUST_IMPACT,WENT_WRONG,WENT_RIGHT,RESPONSE impact
    class IMMEDIATE,MIDTERM,LONGTERM,CREATE_PROJECT,CREATE_ISSUES,LINK actions
    class TEMPLATE,POPULATE,FILL_META,POST_NOTION document
    class REVIEW decision
    class REVISE error

    end

    style _MAIN_ fill:none,stroke:none,padding:0
    _HEADER_ ~~~ _MAIN_