---
description: Systematic Access Control backend investigation workflow. From signal (Slack, Linear, Datadog alert) through tracing, entity lookup, DB queries, device inspection, payload analysis, to reproduction and root cause.
category: verkada
---
flowchart TD
    _HEADER_["ACBE Investigating<br/>Systematic Access Control backend investigation workflow. From signal (Slack, Linear, Datadog alert) through tracing, entity lookup, DB queries, device inspection, payload analysis, to reproduction and root cause."]:::headerStyle
    classDef headerStyle fill:none,stroke:none

    subgraph _MAIN_[" "]

    %% Phase 1: Parse the Signal
    subgraph Signal["Phase 1: Parse the Signal"]
        SRC_SLACK[Slack Thread] --> EXTRACT[Extract IDs<br/>+ Time Range]
        SRC_LINEAR[Linear Ticket] --> EXTRACT
        SRC_DATADOG[Datadog Alert] --> EXTRACT
        SRC_SUPPORT[Support Ticket] --> EXTRACT
    end

    %% Phase 2: Trace the Error
    subgraph Trace["Phase 2: Trace the Error"]
        EXTRACT --> SEARCH_TRACE[Search by trace_id<br/>via traces search]
        SEARCH_TRACE --> FIND_DISPATCH[Find dispatch_request<br/>Span with Stack Trace]
        FIND_DISPATCH --> EXTRACT_META[Extract Entity ID<br/>+ Request Metadata]
        EXTRACT_META --> AGGREGATE[Aggregate Errors<br/>Top Entities + IPs]
    end

    %% Phase 3: Identify the Entity
    subgraph Entity["Phase 3: Identify the Entity"]
        AGGREGATE --> DEV_LOOKUP[vtb device-lookup] & ORG_LOOKUP[vtb get-org-info] & USER_LOOKUP[vtb get-info<br/>by Email]
        DEV_LOOKUP & ORG_LOOKUP & USER_LOOKUP --> ENTITY_CTX[Entity Context Built]
    end

    %% Phase 4: Query the Data
    subgraph DBQuery["Phase 4: Query the Data"]
        ENTITY_CTX --> DB_SCHEDULES[Schedules<br/>+ Events] & DB_CREDS[Credentials<br/>+ NFC State] & DB_DOORS[Doors<br/>+ Access Policies] & DB_ORG[Org Settings<br/>+ Config]
        DB_SCHEDULES & DB_CREDS & DB_DOORS & DB_ORG --> DB_ANOMALIES{Anomalies<br/>Found?}
    end

    %% Phase 5: Inspect the Device
    subgraph Device["Phase 5: Inspect the Device"]
        DB_ANOMALIES -->|Yes| DEV_LOGS[Read vaccess.log<br/>on Device]
        DB_ANOMALIES -->|No| DEV_LOGS
        DEV_LOGS --> DEV_BUFFER[Check Offline<br/>Buffer Queue]
        DEV_BUFFER --> DEV_CONFIG[Read Device<br/>Config Files]
    end

    %% Phase 6: Analyze Payloads
    subgraph Payload["Phase 6: Analyze Payloads"]
        DEV_CONFIG --> DOWNLOAD[Download Payload<br/>to Local]
        DOWNLOAD --> PARSE_JSON[Parse JSON<br/>Strip ANSI]
        PARSE_JSON --> SIZE_BREAKDOWN[Size Breakdown<br/>by Field]
        SIZE_BREAKDOWN --> ROOT_FIELD{Oversized or<br/>Malformed Field?}
    end

    %% Phase 7: Reproduce and Test
    subgraph Reproduce["Phase 7: Reproduce and Test"]
        ROOT_FIELD -->|Yes| VERKURL_REPRO[verkurl: Replay<br/>Against Staging]
        ROOT_FIELD -->|No| CODE_READ[Read Source<br/>at Failing Line]
        CODE_READ --> VERKURL_REPRO
        VERKURL_REPRO --> TDD_TEST[Write TDD Test<br/>Confirm Root Cause]
        TDD_TEST --> ROOT_CAUSE([Root Cause<br/>Identified])
    end

    click SRC_SLACK "#" "**Slack Thread**\nRead a Slack thread for investigation context.\n`slackctl read '' -r`\n\nExtract:\n- Datadog trace URLs\n- Org/device IDs\n- Timestamps and error descriptions"
    click SRC_LINEAR "#" "**Linear Ticket**\nRead a Linear ticket for bug/support details.\n`linear issue get ACBE-1234`\n\nExtract:\n- Ticket description and comments\n- Linked PRs\n- Org ID, user ID, timestamps"
    click SRC_DATADOG "#" "**Datadog Alert**\nParse a Datadog monitor alert.\nUse `datadog-cli` to read the monitor query.\n\nExtract:\n- Affected services\n- Time range\n- Alert threshold and current value"
    click SRC_SUPPORT "#" "**Support Ticket**\nCustomer support ticket with auth code.\n`vtb auth-code-info` to decode.\n\nExtract:\n- Org info from support code\n- Customer-reported symptoms"
    click EXTRACT "#" "**Extract IDs + Time Range**\nConsolidate actionable identifiers:\n- `org_id` (UUID)\n- `device_id` or `entity_id`\n- `trace_id` (hex, from Datadog URL)\n- Time window (epoch ms or ISO-8601)\n- Environment/shard (prod1, prod2, prod-ap-syd)"
    click SEARCH_TRACE "#" "**Search by trace_id**\nUse `traces search`, NOT `traces get` (which takes span IDs).\n`datadog-cli traces search --query 'trace_id: env:' --from 24h --limit 20 --json`\n\nThe `--from` window must cover when the trace occurred."
    click FIND_DISPATCH "#" "**Find dispatch_request Span**\nError details live on `flask.dispatch_request`, NOT `flask.request`.\nThe request span shows status:500 but no stack trace.\n\nFilter:\n`operation_name:flask.dispatch_request`\n\nExtract: `error.type`, `error.message`, `error.stack`"
    click EXTRACT_META "#" "**Extract Entity ID + Request Metadata**\nFrom the `flask.request` span:\n- `entity_id` (device/ACU ID)\n- `http.status_code`\n- `http.client_ip`\n- `duration` (nanoseconds)\n- Request headers"
    click AGGREGATE "#" "**Aggregate Errors**\nBroaden the search to find patterns:\n`service: resource_name:'' status:error`\n\nCount by:\n- Top entity IDs (which devices hit most)\n- Top client IPs\n- Error frequency over time"
    click DEV_LOOKUP "#" "**vtb device-lookup**\n`vtb -Q -e prod1 -d device-lookup`\n\nReturns: org, firmware version, model, region.\n\nNote: 'Failed to find region' is a warning, not a failure -- device data still returns."
    click ORG_LOOKUP "#" "**vtb get-org-info**\n`vtb -Q -e prod1 -o get-org-info`\n\nReturns: org name, environment, shard, feature flags, license info."
    click USER_LOOKUP "#" "**vtb get-info by Email**\n`vtb -Q -e prod1 get-info --emails user@example.com`\n\nLooks up user details by email address.\nUseful when the signal contains a user email but no user ID."
    click ENTITY_CTX "#" "**Entity Context Built**\nNow have:\n- Device model, firmware, org\n- Org configuration and shard\n- User identity (if applicable)\n\nReady to query the database with full context."
    click DB_SCHEDULES "#" "**Schedules + Events**\nQuery schedules table for the org:\n- Active schedules and their types\n- Event counts per schedule\n- Check for `end_date` in the past\n- Verify `deleted = false` filter\n\nUse `verkada-db-querying` skill for connection strings."
    click DB_CREDS "#" "**Credentials + NFC State**\nQuery `mobile_nfc_org_settings` and `mobile_nfc_credential`:\n- Is mobile NFC allowed/enabled?\n- Apple Wallet TCI assignment\n- Active vs inactive credentials\n- Look for duplicate or orphaned records"
    click DB_DOORS "#" "**Doors + Access Policies**\nQuery door configurations:\n- `doors_to_schedules` mappings\n- Access level assignments\n- Door controller associations\n- Lockdown state"
    click DB_ORG "#" "**Org Settings + Config**\nQuery organization-level configuration:\n- `ORG_MOBILE_NFC_ENABLED` flag\n- Feature flags from vtb\n- Compare settings against known-good orgs\n- Check for sconfig inconsistencies"
    click DB_ANOMALIES "#" "**Anomalies Found?**\nDo any DB queries reveal unexpected state?\n- Stale schedules with past end dates\n- Missing or duplicate records\n- Inconsistent flags (allowed vs enabled)\n- Data that contradicts expected behavior\n\nEither way, proceed to device inspection."
    click DEV_LOGS "#" "**Read vaccess.log**\n`vtb -Q -e prod1 -d cat /mnt/log/vaccess.log`\n\nDevice-side access control firmware logs.\nLook for errors, timeouts, rejected credentials."
    click DEV_BUFFER "#" "**Check Offline Buffer Queue**\n`vtb -Q -e prod1 -d ls /mnt/log/vcerberus_offline_buffer/`\n\nFailed HTTP requests queued for retry.\nOversized payloads often pile up here."
    click DEV_CONFIG "#" "**Read Device Config Files**\n`vtb -Q -C -e prod1 -d cat /mnt/config/...`\n\nUseful paths:\n- `/mnt/config/device_id` - Device UUID\n- `/mnt/config/` - All configuration\n- `/mnt/log/behavior_events.log` - Behavior debug log\n\nUse `-C` flag to disable ANSI color codes."
    click DOWNLOAD "#" "**Download Payload**\n`vtb -Q -C -e prod1 -d cat > /tmp/payload.txt`\n\nSave a buffered request or log locally for analysis.\nAlways use `-C` to strip ANSI codes."
    click PARSE_JSON "#" "**Parse JSON + Strip ANSI**\nClean the downloaded payload:\n- Strip non-JSON content before the first `{`\n- Parse with `json.loads()`\n- vtb output often has ANSI escape sequences mixed in"
    click SIZE_BREAKDOWN "#" "**Size Breakdown by Field**\nRecursively measure each JSON field:\n- Total payload size in bytes\n- Per-key sizes, sorted largest first\n- Drill into fields > 500 bytes\n- Identify bloated arrays or strings\n\nUse the Python size_of() helper from the skill."
    click ROOT_FIELD "#" "**Oversized or Malformed Field?**\nDoes the analysis reveal:\n- A single field dominating payload size?\n- Garbage data (e.g., repeated characters)\n- Unexpected field types or lengths?\n\nIf yes, the payload itself is the problem.\nIf no, the issue is in the code path."
    click VERKURL_REPRO "#" "**verkurl: Replay Against Staging**\nReproduce the issue with the actual payload:\n`verkurl request vcerberus /access/v2/endpoint --profile my-staging -m POST -d @/tmp/payload.txt`\n\nCompare staging vs production behavior."
    click CODE_READ "#" "**Read Source at Failing Line**\nFrom the Datadog stack trace, open the exact file and line.\nCheck for:\n- Bare `except` clauses swallowing errors\n- Missing size/validation checks\n- Recent changes via `git blame`"
    click TDD_TEST "#" "**Write TDD Test**\nWrite a test that exercises the failure path:\n`bazel test //vcerberus/test/unit/... --config=remote`\n\nIf the test fails as expected, root cause is confirmed.\nNever use `--test_tag_filters`."
    click ROOT_CAUSE "#" "**Root Cause Identified**\nInvestigation complete. Deliverables:\n- Confirmed root cause with evidence\n- Reproducible test case\n- Recommended fix\n- Update Linear ticket with findings"

    classDef signal fill:#d1ecf1,stroke:#7ec8d8
    classDef trace fill:#dfe6ff,stroke:#5b7bce
    classDef entity fill:#e8daef,stroke:#b07cc6
    classDef database fill:#ffeaa7,stroke:#e0c040
    classDef device fill:#fff3cd,stroke:#f0c040
    classDef payload fill:#f8d7da,stroke:#e06070
    classDef reproduce fill:#d4edda,stroke:#5cb85c
    classDef decision fill:#fff3cd,stroke:#f0c040

    class SRC_SLACK,SRC_LINEAR,SRC_DATADOG,SRC_SUPPORT,EXTRACT signal
    class SEARCH_TRACE,FIND_DISPATCH,EXTRACT_META,AGGREGATE trace
    class DEV_LOOKUP,ORG_LOOKUP,USER_LOOKUP,ENTITY_CTX entity
    class DB_SCHEDULES,DB_CREDS,DB_DOORS,DB_ORG database
    class DB_ANOMALIES decision
    class DEV_LOGS,DEV_BUFFER,DEV_CONFIG device
    class DOWNLOAD,PARSE_JSON,SIZE_BREAKDOWN payload
    class ROOT_FIELD decision
    class VERKURL_REPRO,CODE_READ,TDD_TEST,ROOT_CAUSE reproduce

    end

    style _MAIN_ fill:none,stroke:none,padding:0
    _HEADER_ ~~~ _MAIN_
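
%% Illustrative sketch for the PARSE_JSON and SIZE_BREAKDOWN steps, kept as comments so the
%% diagram still parses. This is NOT the skill's size_of() helper, whose source is not shown
%% in this file. It is a minimal Python sketch, assuming the downloaded payload holds a single
%% JSON object with possible non-JSON text and ANSI color codes around it. The names ANSI_RE,
%% load_payload, encoded_size, and breakdown are hypothetical.
%%
%%     import json
%%     import re
%%
%%     # Matches common ANSI CSI sequences such as color codes (assumed sufficient for vtb output).
%%     ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
%%
%%     def load_payload(text):
%%         # Strip ANSI codes, skip anything before the first opening curly brace, parse one object.
%%         clean = ANSI_RE.sub("", text)
%%         start = clean.find(chr(123))  # chr(123) is the opening curly brace
%%         if start < 0:
%%             raise ValueError("no JSON object found in payload")
%%         obj, _end = json.JSONDecoder().raw_decode(clean, start)  # tolerates trailing text
%%         return obj
%%
%%     def encoded_size(value):
%%         # UTF-8 byte length of the compact JSON encoding of one value.
%%         return len(json.dumps(value, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))
%%
%%     def breakdown(obj):
%%         # (key, bytes) pairs for the top-level keys of a JSON object, largest first.
%%         sizes = ((key, encoded_size(val)) for key, val in obj.items())
%%         return sorted(sizes, key=lambda pair: pair[1], reverse=True)
%%
%% Possible usage: breakdown(load_payload(open("/tmp/payload.txt").read())), then call
%% breakdown() again on any nested object whose size exceeds the 500-byte drill-down threshold.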