Clerk.io

System Description

1. Overview

Clerk.io is a multi-tenant, cloud-native e-commerce personalisation platform that provides Search, Recommendations, Audience, Email and Chat services to merchants worldwide.

The System in scope for SOC 2 Type II covers all production services used to collect, process, store and transmit customer data, together with supporting infrastructure and personnel processes.

| Component | Technology | Location | Notes |
|---|---|---|---|
| Application Servers | Node.js (TypeScript) on Amazon EC2 | AWS – eu-central-1 | Stateless micro-services (ALB + ASG) |
| Databases | Self-managed MySQL on EC2 | eu-central-1 | Customer catalogue & behavioural events |
| Object Storage | Amazon S3 | eu-central-1 (replica eu-west-1) | Media, backups, model artefacts |
| CI/CD | GitHub Actions, Terraform Cloud | EU/US | IaC, image build & deploy |
| Monitoring | Prometheus, Grafana, Amazon CloudWatch | EU | Metrics & alerts |
| Monitoring | Datadog (metrics, traces, logs) | EU | Metrics & alerts forwarded to Better Stack |
| Logging | Amazon CloudWatch Logs, exported to S3 & SIEM | EU | Immutable, 365-day retention |
| Logging | Datadog Log Management with S3 cold archive | EU | Immutable, 365-day retention |
| Authentication | Clerk.dev SaaS | US/EU | MFA enforced for workforce |
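
The component inventory can also be kept as typed data so that simple consistency checks are repeatable. The following is a minimal sketch, not Clerk.io's actual tooling: the `ComponentRow` shape and both helper functions are hypothetical names introduced for illustration, and only a subset of the rows is transcribed.

```typescript
// Hypothetical record shape for one row of the component inventory.
interface ComponentRow {
  component: string;
  technology: string;
  location: string;
  notes: string;
}

// A subset of the inventory rows, transcribed as data.
const inventory: ComponentRow[] = [
  { component: "Databases", technology: "Self-managed MySQL on EC2", location: "eu-central-1", notes: "Customer catalogue & behavioural events" },
  { component: "CI/CD", technology: "GitHub Actions, Terraform Cloud", location: "EU/US", notes: "IaC, image build & deploy" },
  { component: "Monitoring", technology: "Prometheus, Grafana, Amazon CloudWatch", location: "EU", notes: "Metrics & alerts" },
  { component: "Monitoring", technology: "Datadog (metrics, traces, logs)", location: "EU", notes: "Metrics & alerts forwarded to Better Stack" },
  { component: "Authentication", technology: "Clerk.dev SaaS", location: "US/EU", notes: "MFA enforced for workforce" },
];

// Returns component names that appear on more than one row, so a reviewer
// can confirm that each repeated entry is intentional.
function duplicateComponents(rows: ComponentRow[]): string[] {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    counts[row.component] = (counts[row.component] || 0) + 1;
  }
  return Object.keys(counts).filter((name) => counts[name] > 1);
}

// Returns rows whose location column mentions the US, which may matter
// when reviewing data-residency statements.
function rowsWithUsLocation(rows: ComponentRow[]): ComponentRow[] {
  return rows.filter((row) => row.location.indexOf("US") !== -1);
}
```

Calling `duplicateComponents(inventory)` on this subset flags the Monitoring component, which is listed with two different tool sets.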

1.1 Architecture Diagram

graph TD
  Shopper("End-user Browser") -->|HTTPS 443| ALB["AWS Application Load Balancer"]
  ALB --> API["Public API (REST/GraphQL) on EC2 Auto-Scaling Group"]
  API --> Search["Search Service"]
  API --> Reco["Recommendation Service"]
  Search --> DB["MySQL on EC2"]
  Search --> MQ[(Kafka)]
  MQ --> Stream["Stream Processor on EC2"]
  Stream --> S3Analytics["S3 Analytical Data"]
  Admin["Merchant Dashboard"] -->|HTTPS| ALB
  S3Analytics --> ML["ML Training Jobs on EC2"]
  ML --> S3["Amazon S3 Models"]

2. In-Scope Services

  1. Public APIs – GraphQL & REST endpoints consumed by merchant storefronts.
  2. Search & Recommendation Engine – Real-time ranking pipeline.
  3. Admin Dashboard – Merchant management UI.
  4. Data Processing Jobs – Batch ETL and ML model training.

Out-of-scope: Corporate marketing websites, staging environments, and internal dev tooling not connected to production.

3. Boundaries & Interfaces

All external traffic terminates over TLS 1.3 at an AWS Application Load Balancer (ALB) protected by AWS WAF. Internal service-to-service traffic is encrypted with mTLS (Istio).

Data ingress occurs via:

  * HTTPS API requests (port 443)
  * SFTP ingest for catalogue files

Data egress occurs via:

  * HTTPS responses & Webhooks
  * Managed exports to Amazon Redshift or S3 (opt-in)
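
The ingress and egress interfaces listed here can be expressed as a closed set of channels, with the opt-in exports gated per merchant. This is a sketch under stated assumptions: the channel labels, the `MerchantSettings` shape, and the `managedExportsEnabled` flag are hypothetical, since the actual opt-in mechanism is not described in this document.

```typescript
// Channel names are hypothetical labels for the documented interfaces.
type IngressChannel = "https-api" | "sftp-catalogue";
type EgressChannel = "https-response" | "webhook" | "export-redshift" | "export-s3";

// Hypothetical per-merchant setting representing the export opt-in.
interface MerchantSettings {
  merchantId: string;
  managedExportsEnabled: boolean;
}

const INGRESS_CHANNELS: IngressChannel[] = ["https-api", "sftp-catalogue"];

// Egress channels that the description marks as opt-in.
const OPT_IN_EGRESS: EgressChannel[] = ["export-redshift", "export-s3"];

// Narrows an arbitrary string to a documented ingress channel.
function isIngressChannel(value: string): value is IngressChannel {
  return (INGRESS_CHANNELS as string[]).indexOf(value) !== -1;
}

// Returns true when the egress channel may be used for this merchant:
// opt-in channels require the flag, all others are always available.
function isEgressAllowed(channel: EgressChannel, settings: MerchantSettings): boolean {
  const needsOptIn = OPT_IN_EGRESS.indexOf(channel) !== -1;
  return needsOptIn ? settings.managedExportsEnabled : true;
}
```

Modelling the channels as union types means an undocumented interface cannot be referenced without first extending the type, which keeps the code and the boundary description in step.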

4. Data Flow Summary

  1. Shopper triggers search on merchant site.
  2. Request reaches the AWS Application Load Balancer (ALB), which forwards it through the Public API to the Search service.
  3. Search service returns the result set; anonymised click events are published to Kafka.
  4. Stream processor updates behavioural dataset in Amazon Redshift.
  5. Daily ML job retrains ranking model; artefacts stored in S3.
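
The five steps above can be sketched as a small in-memory simulation. Everything in it is an illustrative assumption: the type and function names are hypothetical, an array stands in for the Kafka topic, dropping the shopper identifier stands in for the anonymisation step (whose real method is not described here), and sorting by click count stands in for the retrained ranking model.

```typescript
// All shapes and names are hypothetical; they only mirror the steps above.
interface Product { id: string; title: string; }
interface RawClick { shopperId: string; query: string; productId: string; }
interface ClickEvent { query: string; productId: string; }

// In-memory stand-in for the Kafka topic.
class ClickQueue {
  private events: ClickEvent[] = [];
  publish(event: ClickEvent): void { this.events.push(event); }
  drain(): ClickEvent[] {
    const drained = this.events;
    this.events = [];
    return drained;
  }
}

// Steps 1-2: a case-insensitive title match stands in for the Search service.
function searchCatalogue(catalogue: Product[], query: string): Product[] {
  const needle = query.toLowerCase();
  return catalogue.filter((p) => p.title.toLowerCase().indexOf(needle) !== -1);
}

// Step 3: copy only non-identifying fields so shopperId never reaches the
// queue. Dropping the field is an assumed form of anonymisation.
function recordClick(queue: ClickQueue, click: RawClick): void {
  queue.publish({ query: click.query, productId: click.productId });
}

// Step 4: the stream processor folds events into per-product click counts.
function updateCounts(counts: Record<string, number>, events: ClickEvent[]): Record<string, number> {
  const next: Record<string, number> = { ...counts };
  for (const e of events) {
    next[e.productId] = (next[e.productId] || 0) + 1;
  }
  return next;
}

// Step 5: sorting by click count stands in for the retrained ranking model.
function rankByClicks(results: Product[], counts: Record<string, number>): Product[] {
  return results.slice().sort((a, b) => (counts[b.id] || 0) - (counts[a.id] || 0));
}
```

Keeping the identifying field out of `ClickEvent` at the type level means the downstream steps cannot read it even by accident, which is the property the data flow relies on.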

4.1 Sequence Diagram

sequenceDiagram
  participant Shopper
  participant ALB as AWS ALB
  participant API as Public API
  participant Search
  participant DB as MySQL on EC2
  Shopper->>ALB: Search request
  ALB->>API: Forward
  API->>Search: Query
  Search->>DB: SQL SELECT
  DB-->>Search: Results
  Search-->>API: JSON
  API-->>ALB: Response
  ALB-->>Shopper: Products
  Note over Search,Kafka: Click event emitted
  Search->>Kafka: Publish click
  Kafka->>StreamProc: Stream processing
  StreamProc->>Redshift: Store metrics

5. Supporting Processes

6. Changes Since Last Report


Version 1.0