Risk Assessment Report - 2026 (Annual)

1. Executive Summary

This is the first formal annual risk assessment conducted under Clerk.io's documented Risk Assessment & Treatment Methodology. The assessment was performed on 2026-04-30, covering the state of the Information Security Management System (ISMS) as of that date and informed by all available evidence from the three completed bi-monthly SRE compliance audits (2025-10, 2025-12, 2026-02) plus the Statement of Applicability.

This iteration covers the SRE / Technical risk domain. People, Supplier, Legal/DPO and Physical domain workshops are deferred to later iterations.

Headline:

Band	Definition	Risks
Unacceptable	Score >= 15 - treatment required	1
Manage	Score 6-14 - management decision on treatment vs acceptance	12
Acceptable	Score <= 5 - documented and monitored	3
Deferred	Carried over for next domain workshop	1

The single Unacceptable-band risk has compensating controls in place and treatment options identified. All risks in the Manage band have treatment options identified and named risk-owner roles. The choice of treatment, the specific actions, the timelines and the resource commitments for each risk are decided in the Risk Treatment Plan, which is owned and signed off at management level. The residual-risk picture is consistent with the documented risk appetite, contingent on the Risk Treatment Plan being decided and executed.

Next scheduled assessment: 2027-04-30, or sooner upon significant change.

2. Methodology

The assessment followed the approved Risk Assessment & Treatment Methodology, using the 1-5 likelihood and impact scales defined therein with risk score = L x I (max 25). Risks were sourced from:

The existing Risk Treatment Plan (carry-over of prior risks, refreshed)
Findings from the three bi-monthly SRE compliance audits completed since the last cycle
TODO-marked or aspirationally-worded controls in the Statement of Applicability
New risks identified during the workshop based on industry-baseline expectations

Each candidate risk was scored against the documented L/I scales using current operational evidence; treatment options were chosen from the methodology's four options (Mitigate, Transfer, Avoid, Accept). The full risk register, including scoring rationale and implementation detail, is maintained internally by the Information Security Manager and made available to auditors under non-disclosure.
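
The scoring rule and band thresholds used in this report can be sketched as follows. This is a minimal illustration, assuming only the L x I formula and the band boundaries stated in the headline table; the class and function names are hypothetical and are not taken from the internal register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredRisk:
    name: str        # illustrative label only, not a register entry
    likelihood: int  # 1-5 scale from the methodology
    impact: int      # 1-5 scale from the methodology

    def __post_init__(self) -> None:
        # Reject values outside the documented 1-5 scales.
        for field_name in ("likelihood", "impact"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be 1-5, got {value}")

    @property
    def score(self) -> int:
        # Risk score = L x I, so the maximum is 25.
        return self.likelihood * self.impact


def band(score: int) -> str:
    """Map a risk score to the band names used in this report."""
    if not 1 <= score <= 25:
        raise ValueError(f"score must be 1-25, got {score}")
    if score >= 15:
        return "Unacceptable"
    if score >= 6:
        return "Manage"
    return "Acceptable"
```

For example, a hypothetical risk scored L=3, I=5 yields 15 and falls in the Unacceptable band, while L=2, I=3 yields 6 and falls in the Manage band.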

3. Scope of this assessment

In scope (this iteration):

All technical and SRE-owned risks in the production platform: compute, network, storage, databases, CI/CD, source repositories, monitoring, backup and restore, dependency supply chain, secret management, change management.

Deferred to later workshops:

Domain	Status
Legal / DPO	Deferred for later
People / HR	Deferred for later
Supplier / Third-party	Deferred for later
Physical	Deferred for later

This deferral is a deliberate choice of the first iteration: rather than a shallow pass across all domains, this report represents a thorough pass across one domain. Subsequent iterations will broaden coverage. The deferred domains are tracked in the internal register so they cannot be lost.

4. Risk profile

4.1 Distribution by band

The 16 active SRE-domain risks distribute as follows:

Band	Count
Unacceptable (>= 15)	1
Manage (6-14)	12
Acceptable (<= 5)	3

In addition, one risk carried over from the prior cycle is deferred to an upcoming Legal/DPO workshop.

4.2 Themes observed

Four themes recurred across the workshop and are worth surfacing because they shape how this report should be read:

Several risks reflect deliberate management trade-offs. Where management has chosen velocity, cost-efficiency, or partner-integration value over a stricter security posture, the affected risks have been documented as such. The Information Security Manager has surfaced compensating controls for each; the underlying choice is owned by management. This is a healthy pattern: trade-offs are visible and traceable rather than hidden.
Several risks reflect documentation lagging operational reality. In multiple cases, the Statement of Applicability or operational KPIs describe a state stricter or weaker than current practice. These mismatches generated specific corrective actions to bring the documents in line with operational truth.
Several risks compound across the technical estate. A small number of architectural conditions (the data-isolation gap, the absence of database-layer access logging) interact with one another and amplify several other risks. The treatment plan reflects this by treating cross-cutting conditions explicitly rather than risk-by-risk.
Role-doubling between ISM and SRE Lead. The same individual currently holds both roles. This is acknowledged in the register and will be raised at the next ISMS management review for evaluation under the SoA's separation-of-duties control. The doubling makes independent verification both more important and more constrained.

5. Risks identified, with treatment posture

The summaries below describe each risk at posture level: the risk itself, the controls already in place, the treatment options identified during the workshop, and a proposed treatment direction. The report does not select treatments, define specific actions, or commit to timelines - those decisions are made in the Risk Treatment Plan, which is owned and signed off at management level. Specific exploit detail, ticket numbers, and scoring rationale are not in scope for this document; detailed scoring and implementation specifics are in the internal register, available to auditors under non-disclosure.

Unacceptable band

Credential stuffing on customer-facing APIs

Authenticated customer accounts are exposed to credential-stuffing attacks using credentials leaked from other services. Existing controls: application-layer rate limiting on all authentication endpoints, channel-based two-factor authentication for employee logins, and Google SSO as an opt-in MFA path for customers. Treatment options identified during the workshop: in-app bot/automation detection, breached-password checks at login, anomaly-based alerting on login-failure spikes, and increased visibility of the SSO option in customer onboarding. Treatment direction proposed: Mitigate. Risk owner role: SRE Lead.
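
One of the treatment options above, breached-password checks at login, can be sketched without sending the password or its full hash to a third party. The report does not name a breach-data source; the sketch assumes a k-anonymity range lookup of the kind offered by the public Pwned Passwords service, where only the first five hex characters of the SHA-1 digest are sent and the response lists `SUFFIX:COUNT` lines. The HTTP call is deliberately left out, and the function names are hypothetical.

```python
import hashlib


def split_sha1(password: str) -> tuple[str, str]:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest.

    Only the 5-character prefix would be sent to the range-lookup service;
    the 35-character suffix never leaves the application.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(suffix: str, range_body: str) -> int:
    """Find our suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix absent: not known to the breach corpus
```

A login handler would call `split_sha1`, fetch the range for the prefix, and treat a non-zero `breach_count` as a signal to require a password change or step-up authentication. What to do with that signal is a Risk Treatment Plan decision and is not implied here.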

Manage band

Software supply-chain compromise via third-party dependencies

Compromise of an upstream third-party dependency could introduce malicious code into Clerk.io builds. Existing controls: lockfile pinning across all ecosystems, Dependabot CVE notifications and update PRs across all repositories, ECR container-image scanning, and SAST/DAST in CI. Treatment options identified during the workshop: adopting a formal Software Composition Analysis tool in CI, generating Software Bill of Materials artefacts per build for post-incident impact assessment, and including dependency-update PRs in any future hot-path mandatory-review scope. Treatment direction proposed: Mitigate. Risk owner role: SRE Lead / Head of Product.

Secret / credential leakage via source code or build artefacts

Production secrets are managed via AWS Secrets Manager with IAM-role-based authentication for application services. Existing controls: AWS Secrets Manager as the canonical store; IAM-role-based authentication for non-DevOps; ongoing programme to delete and rotate historic hardcoded secrets; demonstrated 24-hour detection-to-rotation response on a recent slip. Active work in progress: DevOps long-lived AWS access keys are being phased out, reviewed each audit cycle. Treatment options identified during the workshop: enabling automated secret scanning at multiple layers (GitHub secret scanning on public repositories, pre-commit hooks, CI scanning), a one-time deep scan of code-bases for residual historic secrets, codifying the response runbook to lock in the demonstrated response time, and isolating Jenkins build containers from production secrets where stages do not require them. Treatment direction proposed: Mitigate. Risk owner role: SRE Lead.
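
As an illustration of the pre-commit and CI scanning layer mentioned above, the following sketch flags two well-known secret shapes in text. It is a minimal example under stated assumptions: the two patterns (AWS access key IDs beginning with AKIA or ASIA, and PEM private-key headers) are a small illustrative subset, the names are hypothetical, and a dedicated scanning tool would cover far more cases.

```python
import re

# Illustrative subset of patterns; not an exhaustive rule set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A pre-commit hook could run `scan_text` over staged file contents and reject the commit when the returned list is non-empty.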

Production / non-production data isolation gap (architectural)

Application architecture currently relies on direct production database access for non-trivial development. Some subsystems hardcode production-only paths, preventing the code from operating in an isolated environment. The result is that all developers have effective access to production data as a routine part of their work. Existing controls: developer access is gated by VPN authentication and SSH to a dev-server (private key or Kerberos); database credentials are read from AWS Secrets Manager once on the dev-server; bi-monthly access reviews for both Kerberos and VPN; zero-trust device stance limiting local data; no AWS Console / SSH-to-production-hosts / Infrastructure-as-Code / Configuration-as-Code access for non-DevOps engineers. These controls gate access to the dev-server but do not prevent broad production-data access once a developer is on it - which is the architectural condition this risk captures. Treatment options identified during the workshop: introducing environment-aware secrets paths and refactoring fringe subsystems for environment-portability (multi-quarter architectural work, dependent on engineering capacity), plus near-term compensating controls of a bulk-export approval workflow for engineering database queries and database-layer query logging. Treatment direction proposed: Mitigate (compensating controls near-term; architectural fix multi-quarter). Risk owner role: Head of Product / SRE Lead.

Single-region cloud outage of production platform

Production runs in a single AWS region. The Business Continuity Plan currently cites a 4-6 hour RTO that is realistic for partial failures but optimistic for total region loss; restoration to a different region is achievable through existing Infrastructure-as-Code and replicated backups but is multi-day. Existing controls: comprehensive Terraform coverage of the AWS account, Ansible playbooks for server configuration, multi-region backup of the CI/CD system, cross-region replication of database backups, and a documented Business Continuity Plan. Treatment options identified during the workshop: realigning the BCP RTO to separate partial-failure and full-region-loss scenarios with realistic times for each, adding an interim region-loss playbook, and validating cross-region restoration end-to-end once the replacement backup system is delivered. Multi-region active failover was considered and is currently not in scope; that trade-off is owned at management level. Treatment direction proposed: Mitigate. Risk owner role: Head of Product / SRE Lead.

Backup integrity / ransomware exposure

Backups are stored in segregated S3 buckets with versioning, replicated to a paired bucket with stronger access restrictions. The replication architecture provides a useful immutability property: deletion on the source bucket does not propagate to the replica. The remaining ransomware vector requires administrator-level account compromise. Existing controls: segregated buckets per backup type, source-bucket versioning, replica-bucket access restricted to administrators and root, delete markers not replicated to the replica, Amazon S3 managed KMS encryption (SSE-S3), replication lag audit-checked each cycle, and application-aware backup tooling supporting surgical restore. Treatment options identified during the workshop: enabling S3 Object Lock (Compliance mode) on the new mariadbbackup buckets at the replacement-system launch, evaluating retroactive enablement on existing buckets, and adding CloudTrail alerting on KMS key disable / delete events. Treatment direction proposed: Mitigate. Risk owner role: SRE Lead.
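
The KMS alerting option above could be expressed as an event pattern that matches CloudTrail-recorded KMS API calls. The report does not say how the alert would be wired; this sketch assumes an Amazon EventBridge rule and uses the KMS operation names `DisableKey` and `ScheduleKeyDeletion`. The function name is hypothetical, and the rule target (for example a notification topic) is left out.

```python
import json


def kms_tamper_event_pattern() -> dict:
    """Event pattern matching KMS key disable / scheduled-deletion calls."""
    return {
        "source": ["aws.kms"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["kms.amazonaws.com"],
            # DisableKey disables a key; ScheduleKeyDeletion starts the
            # waiting period that ends in key deletion.
            "eventName": ["DisableKey", "ScheduleKeyDeletion"],
        },
    }


if __name__ == "__main__":
    # The JSON form is what would be supplied as the rule's event pattern.
    print(json.dumps(kms_tamper_event_pattern(), indent=2))
```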

Limited detection of customer-data exfiltration

The platform's analytics products legitimately load entire customer datasets at high volume, and production database servers handle tens of thousands of queries per second. Together these make query-layer audit logging both operationally infeasible at scale (storage and SIEM-ingest cost is on the order of 30-70 TB/year per server, compressed) and ineffective for outlier detection (legitimate workloads already match the worst-case extraction patterns, so there is no meaningful "abnormal" baseline). Existing controls: developer access is gated by VPN authentication and SSH to a dev-server (private key or Kerberos); database credentials are read from AWS Secrets Manager once on the dev-server; bi-monthly access reviews for both VPN and Kerberos; VPN session bandwidth is logged per user, cumulative within a session. None of these provide visibility into individual queries once issued. Treatment options identified during the workshop: extending the VPN bandwidth logging to aggregate per-user egress across sessions (mitigating the trivial session-reset bypass on the existing per-session counter), defining alert thresholds on cross-session aggregate egress rather than absolute volume, leveraging the bulk-export approval workflow (cross-reference: the data-isolation-gap risk above) as a prevention layer, and documenting the deliberate non-treatment of query-layer logging with rationale. Full query-layer detection was evaluated and rejected on operational and cost grounds; the residual gap on low-and-slow exfiltration that stays beneath aggregate-egress thresholds is acknowledged. Treatment direction proposed: Mitigate at the network-egress layer plus prevention via the approval workflow; accept residual on per-query detection. Risk owner role: SRE Lead.
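
The cross-session aggregation described above can be sketched as a rolling per-user sum over VPN session records, so that resetting a session no longer resets the counter. This is a minimal illustration: the record shape, the 24-hour window and the threshold parameter are assumptions, and how the threshold is derived (for example from a per-user baseline) is left open.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SessionRecord:
    user: str
    ended_at: datetime
    bytes_out: int  # cumulative egress for this one session


def egress_by_user(records, now, window=timedelta(hours=24)):
    """Sum egress per user across all sessions ending inside the window."""
    totals = defaultdict(int)
    for record in records:
        if now - window <= record.ended_at <= now:
            totals[record.user] += record.bytes_out
    return dict(totals)


def users_over_threshold(records, now, threshold_bytes,
                         window=timedelta(hours=24)):
    """Users whose aggregate egress in the window exceeds the threshold."""
    totals = egress_by_user(records, now, window)
    return sorted(u for u, total in totals.items() if total > threshold_bytes)
```

With a hypothetical 10 GiB threshold, three 4 GiB sessions by the same user inside the window would be flagged, even though each session alone stays below it. Low-and-slow extraction that stays under the aggregate threshold is still not detected, consistent with the residual gap acknowledged above.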

Production change management - peer review on PRs is optional

On 2026-04-29, the change-management policy was revised: pull-request creation remains required, but peer review on PRs was made optional. The change supports a small, highly skilled engineering team augmented by LLM agents and was made by management to reduce review-bottleneck friction. Existing controls: PR creation is technically still required (record-of-change preserved); CI must pass (SAST, automated tests); GitHub audit log of all merges; cultural disposition within the engineering team toward seeking review on non-trivial work and scepticism of LLM-generated code. Treatment options identified during the workshop: defining a hot-path scope where review remains technically mandatory regardless of base policy (e.g. authentication, billing, infrastructure, customer-data-access code paths), strengthening automated CI guards, deploy-time guards (canary, automated rollback), and ISM-led post-merge sampling of merged PRs. Treatment direction proposed: Mitigate, within the accepted base policy. Risk owner role: Head of Product / SRE Lead.
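
The hot-path option above could be enforced by a CI step that compares the files changed in a PR against a list of protected paths. The sketch below is illustrative only: the glob patterns are hypothetical placeholders for the authentication, billing, infrastructure and customer-data-access areas named above, and the actual scope would be defined in the Risk Treatment Plan.

```python
from fnmatch import fnmatch

# Hypothetical path patterns; the real hot-path scope is not defined here.
HOT_PATH_GLOBS = (
    "auth/*",
    "billing/*",
    "infrastructure/*",
    "*/customer_data/*",
)


def hot_paths_touched(changed_files, globs=HOT_PATH_GLOBS):
    """Return the changed files that fall inside the hot-path scope."""
    return sorted(
        path for path in changed_files
        if any(fnmatch(path, glob) for glob in globs)
    )


def merge_blocked(changed_files, approvals, globs=HOT_PATH_GLOBS):
    """Block the merge only when a hot path is touched without any approval."""
    return bool(hot_paths_touched(changed_files, globs)) and approvals < 1
```

Under this sketch, a PR that touches only documentation merges without review as the base policy allows, while a PR touching a hot path is blocked until it has at least one approval.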

Vulnerability remediation - KPI vs risk-based practice mismatch

The organisation's vulnerability-remediation KPI requires 100% compliance with strict SLAs (72h critical, 14d high). Operational practice is risk-based: dependency updates are remediated promptly via Dependabot; OS and stack-level CVEs are patched immediately when non-disruptive, or otherwise scheduled by weighing exposure against disruption. The KPI wording does not formally accommodate exposure-aware deferrals, creating a structural mismatch between the documented metric and honest operational practice. Existing controls: Dependabot enabled on all repositories for both CVE-driven and routine dependency updates with a weekly email digest; exposure-aware risk assessment of OS- and stack-level CVEs. Active work in progress: an LLM-augmented Dependabot triage automation is in development to pre-screen simple updates. Treatment options identified during the workshop: documenting the risk-based patching policy explicitly so it survives personnel changes, raising KPI 3 realignment at the next ISMS management review to permit documented risk-based exceptions, and ensuring exposure-based deferral decisions are recorded. Treatment direction proposed: Mitigate. Risk owner role: SRE Lead / Head of Product.
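
The mismatch described above can be made concrete by classifying each finding against the stated SLAs while keeping recorded deferrals distinct from plain breaches. This is a sketch under stated assumptions: the 72-hour and 14-day limits come from the KPI as quoted above, while the record shape, status labels and the idea of a free-text deferral reason are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# SLA limits as quoted in the KPI: 72h for critical, 14d for high.
SLA = {"critical": timedelta(hours=72), "high": timedelta(days=14)}


@dataclass(frozen=True)
class Finding:
    finding_id: str                      # placeholder identifier
    severity: str                        # "critical" or "high"
    detected_at: datetime
    remediated_at: Optional[datetime] = None
    deferral_reason: Optional[str] = None  # recorded exposure-based rationale


def status(finding: Finding, now: datetime) -> str:
    """Classify a finding against its SLA deadline."""
    deadline = finding.detected_at + SLA[finding.severity]
    if finding.remediated_at is not None:
        if finding.remediated_at <= deadline:
            return "within_sla"
    elif now <= deadline:
        return "open_within_sla"
    # Past the deadline: a recorded rationale separates a documented
    # risk-based exception from an unexplained breach.
    return "documented_exception" if finding.deferral_reason else "breach"
```

Reporting `documented_exception` separately from `breach` would let the KPI reflect risk-based practice without hiding late remediation. Whether such exceptions count as compliant is a management-review decision.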

DR / failover capability vs documented RTO/RPO

The legacy database backup-and-restore architecture cannot meet the 4-6 hour RTO documented in the Business Continuity Plan for full-restore scenarios on the largest customer databases. Real-world full-restore scenarios are rare (one in six years across the platform). Existing controls: EBS volume persistence handles most failure modes without requiring restore from backup; single-schema and single-table restores are routine and trivially within RTO; cross-region replication of dumps; documented BCP. Active work in progress: a replacement physical-backup system based on mariadbbackup is in delivery; the design eliminates the index-rebuild step and is expected to bring restoration within data-transfer-bound time. Treatment options identified during the workshop: completing delivery of the replacement backup system, re-running a DR exercise once the new system is in place, and splitting the BCP RTO into separate partial-failure and full-restore numbers to align documentation with realistic capability. Treatment direction proposed: Mitigate. Risk owner role: SRE Lead.
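
If restoration becomes bound by data transfer, as the replacement design intends, the expected restore time reduces to database size divided by sustained throughput. The sketch below shows that arithmetic only; the function names are hypothetical, and any sizes or throughput figures passed to it are illustrative inputs, not measurements from the platform.

```python
def restore_hours(size_gib: float, throughput_mib_per_s: float) -> float:
    """Transfer-bound restore time in hours for a given size and throughput."""
    if throughput_mib_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_gib * 1024 / throughput_mib_per_s
    return seconds / 3600


def within_rto(size_gib: float, throughput_mib_per_s: float,
               rto_hours: float) -> bool:
    """Compare the transfer-bound estimate against an RTO in hours."""
    return restore_hours(size_gib, throughput_mib_per_s) <= rto_hours
```

The estimate ignores provisioning, verification and application warm-up, so a DR exercise on the new system remains the only way to confirm a realistic full-restore figure for the BCP.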

Insider exfiltration of customer data

Authorised personnel could remove customer data through routine database access. Existing controls: confidentiality / NDA clauses for all employees; bi-monthly access reviews; role-based access controls in the customer-facing tooling; employee zero-trust device policy with no local customer data; database access gated by VPN authentication and SSH to a dev-server (private key or Kerberos), with database credentials read from AWS Secrets Manager; SSO+MFA for collaboration cloud services (Google Workspace, SaaS tools); tightly-restricted billing-data access (limited to a small number of named individuals); professional norms within a small full-employee workforce (no contractors with customer-data access). None of these provide visibility into individual queries once issued. Treatment options identified during the workshop: network-egress-based exfiltration detection (overlaps with the exfiltration-detection risk above), a bulk-export approval workflow with secondary authorisation, and continuing the bi-monthly access review cadence. Final likelihood scoring on this risk is pending the People-domain workshop, where executive input on human-operations factors will be sought. Treatment direction proposed: Mitigate. Risk owner role: Information Security Manager.

Limited data-staging capability

There is currently no running data-staging environment. Code-staging exists for the dashboard and API services only. The original staging programme - including a planned data-anonymisation Lambda - was halted due to engineering capacity constraints. Existing controls: code-staging for the dashboard and API services; spot-restore methodology onto separate database servers (restored data not exposed to applications); preserved Infrastructure-as-Code and Lambda code so revival remains feasible; deploy-and-rollback discipline for changes that bypass staging. Treatment options identified during the workshop: documenting the spot-restore methodology and code-staging coverage explicitly as compensating controls, documenting a deploy-and-rollback runbook for data-layer changes in production, preserving the staging Infrastructure-as-Code so revival remains feasible, and re-evaluating staging revival at each ISMS management review when team capacity permits. Treatment direction proposed: Mitigate (compensating-controls focus); revival not currently in scope. Risk owner role: Head of Product / SRE Lead.

Acceptable band

Asset inventory verification gap

The asset inventory is reported complete and the public extract describes Linear as the source-of-truth, refreshed weekly via GitHub Action. The Information Security Manager does not currently have read-access to the Linear CMDB project for independent verification. Existing controls: Linear CMDB process operationally maintained, documented sync mechanism, bi-monthly audit check on the asset inventory. Treatment options identified during the workshop: provisioning ISM read-access to the Linear CMDB project (the CEO is the existing approver), verifying the sync mechanism and the staleness signal in the public extract, and adding an explicit ISM verification step to the bi-monthly SRE audit so future cycles do not depend on operational-team self-attestation. Treatment direction proposed: Mitigate (verification scaffolding). Risk owner role: CEO.

Configuration drift detection cadence

Production infrastructure has comprehensive Infrastructure-as-Code and Configuration-as-Code coverage. Drift detection is event-triggered (every Terraform apply) and audit-triggered (bi-monthly mandated plan run); it is not continuous. The acknowledged gap is small. Existing controls: 100% IaC and CaC coverage of production, drift detection implicit in every Terraform apply, audit-time mandated plan runs that catch slow-accumulating drift, and the latest audit confirming zero unmanaged drifts at audit end. Treatment options identified during the workshop: documenting the drift-detection cadence explicitly, adding a post-incident manual-change reconciliation step to the runbook, and (optional / future) evaluating continuous drift-detection tooling such as AWS Config or driftctl. Treatment direction proposed: Mitigate (documentation focus). Risk owner role: SRE Lead.
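
The audit-time plan run described above can be scripted using Terraform's `-detailed-exitcode` flag, under which `terraform plan` exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The following is a minimal sketch; the function names and status labels are hypothetical, and it assumes the `terraform` binary is on the PATH and the working directory is already initialised.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`.
PLAN_STATUS = {0: "no_changes", 1: "error", 2: "changes_present"}


def classify_plan_exit(code: int) -> str:
    """Translate a plan exit code into a status label."""
    return PLAN_STATUS.get(code, "error")


def run_drift_check(workdir: str) -> str:
    """Run a read-only plan in workdir and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode",
         "-input=false", "-lock=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,  # exit code 2 is an expected outcome, not a failure
    )
    return classify_plan_exit(result.returncode)
```

A `changes_present` result means the plan differs from the recorded state. That can be drift from a manual change or simply unapplied code, so an operator still needs to read the plan output before recording a drift finding.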

Legacy AWS network primitives

Some of the production VPC and subnet resources are AWS legacy resource types that the current API will not recreate. They function identically to current resource types and have no technical dependencies that would compound difficulty during recovery. Existing controls: Terraform IaC manages the resources as-is; legacy resources function identically to current resource types in console and runtime; the recovery delta is bounded (a single resource-type change in Terraform). Treatment options identified during the workshop: documenting the recovery step in the disaster-recovery runbook so any operator can recover without prior knowledge, monitoring AWS announcements for resource-type deprecation, and (optional) proactively migrating to current resource types during a planned maintenance window to remove the discrepancy entirely. Treatment direction proposed: Document and monitor. Risk owner role: SRE Lead.

6. Accepted residual risks

One risk is treated as Accept rather than Mitigate.

Iframe embedding policy on customer-facing applications

The customer dashboard intentionally permits embedding in third-party iframes to support partner integrations. Mitigation via per-origin allowlisting is not viable - partner integrations originate from an effectively unbounded set of customer storefront domains. Existing controls: multi-factor authentication for employee/admin endpoints, secure session cookies with signed, IP-bound JWT verification (cookie theft alone is not sufficient; an attack must occur in the legitimate user's own browser), and ongoing external monitoring through the public bug-bounty programme. The risk has been re-evaluated and re-affirmed across multiple bug-bounty cycles by the Head of Product as risk owner; a formal acceptance memo capturing the most recent re-affirmation date is pending. Treatment direction proposed: Accept (the decision is established in practice through repeated re-affirmation; the pending memo formalises it). Risk owner role: Head of Product.
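
The reason a stolen cookie alone is insufficient can be illustrated with a signed token that carries the client IP as a claim. The sketch below is not the platform's implementation: it assumes an HS256-signed JWT with a hypothetical `ip` claim, uses only the standard library, and omits expiry and header validation for brevity.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> str:
    mac = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256)
    return _b64url(mac.digest())


def issue_token(secret: bytes, subject: str, client_ip: str) -> str:
    """Issue a token whose payload binds the session to client_ip."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": subject, "ip": client_ip}).encode())
    return f"{header}.{payload}.{_sign(secret, header + '.' + payload)}"


def verify_token(secret: bytes, token: str, request_ip: str) -> bool:
    """Accept only a correctly signed token presented from the bound IP."""
    try:
        header, payload, signature = token.split(".")
        expected = _sign(secret, f"{header}.{payload}")
        if not hmac.compare_digest(expected.encode(), signature.encode()):
            return False
        claims = json.loads(_b64url_decode(payload))
    except ValueError:
        return False  # malformed token, bad base64, or bad JSON
    if not isinstance(claims, dict):
        return False
    bound_ip = str(claims.get("ip", ""))
    return hmac.compare_digest(bound_ip.encode(), request_ip.encode())
```

A token replayed from a different address fails the final comparison even though its signature is valid, which is the property the control relies on.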

7. Continuous improvement themes

This first formal assessment surfaced several cross-cutting items that go beyond any single risk and are being tracked for the next ISMS management review:

Statement of Applicability corrections. Several SoA controls describe a state stricter or weaker than current operational practice. Corrective wording for each affected control is in flight.
KPI realignment. Two organisational KPIs (vulnerability remediation, DR exercise targets) currently fail because they are written more strictly than realistic operational practice can meet. Realignment is owned at the management-review level.
Asset Inventory data-classification fix. The customer-behaviour dataset contains email addresses and is therefore not "anonymised" as the public extract currently implies; reclassification as personal data under GDPR is in flight, with a corresponding update to the Record of Processing Activities.
Role-doubling acknowledgment. The Information Security Manager and SRE Lead are currently the same individual. The doubling has been documented and will be evaluated at the next ISMS management review under the SoA's separation-of-duties control.
Repository-as-sign-off governance pattern. Changes to ISMS documents in this repository are merged by the CEO (with audit documents merged by the ISM to preserve the auditor's commit signature). This effectively makes pull-request merge a sign-off vehicle for ISMS document changes; the pattern is being documented in methodology / document control for explicit traceability.

These items are not risks in the formal register sense. They are governance and continuous-improvement work that the first assessment exposed and that the bi-monthly ISMS review cycle will progress.

8. Conclusion

Clerk.io's information-security risk exposure across the SRE / Technical domain is within the documented risk appetite, contingent on (a) management selecting and resourcing treatments for the Manage-band risks via the Risk Treatment Plan, (b) those treatments being executed, and (c) the deferred-domain workshops (Legal/DPO, People, Supplier, Physical) being conducted. The single Unacceptable-band risk has compensating controls already in place and treatment options identified for management decision.

The honest character of this report - documenting deliberate trade-offs, surfacing documentation-vs-reality gaps, acknowledging role-doubling, and deferring entire domains rather than producing a thin pass across them - is itself evidence that the ISMS is functioning as designed. Issues are visible, owned, and surfaced for treatment rather than hidden.

Progress on all open treatments is tracked through the bi-monthly SRE compliance audit cycle. Material changes to the risk picture will trigger an ad-hoc reassessment per the methodology's frequency rule.

9. Sign-off

Role	Means of sign-off	Date
Information Security Manager	Commit signature on this file	2026-05-04
Head of Product	Co-signature on PR or ISMS management review minutes	pending
Chief Executive Officer	PR merge to this repository (per documented governance)	pending

Prepared in accordance with ISO/IEC 27001:2022 §6.1.2 - 6.1.3.