System Description
1. Overview
Clerk.io is a multi-tenant, cloud-native e-commerce personalisation platform that provides Search, Recommendations, Audience, Email and Chat services to merchants worldwide.
The System in scope for SOC 2 Type II covers all production services used to collect, process, store and transmit customer data, together with supporting infrastructure and personnel processes.
| Component | Technology | Location | Notes |
|---|---|---|---|
| Application Servers | Node.js (TypeScript) on Amazon EC2 | AWS – eu-central-1 | Stateless micro-services (ALB + ASG) |
| Databases | Self-managed MySQL on EC2 | eu-central-1 | Customer catalogue & behavioural events |
| Object Storage | Amazon S3 | eu-central-1 (replica eu-west-1) | Media, backups, model artefacts |
| CI/CD | GitHub Actions, Terraform Cloud | EU/US | IaC, image build & deploy |
| Monitoring | Prometheus, Grafana, Amazon CloudWatch | EU | Metrics & alerts |
| Monitoring | Datadog (metrics, traces, logs) | EU | Metrics & alerts forwarded to Better Stack |
| Logging | Amazon CloudWatch Logs, exported to S3 & SIEM | EU | Immutable, 365-day retention |
| Logging | Datadog Log Management with S3 cold archive | EU | Immutable, 365-day retention |
| Authentication | Clerk.dev SaaS | US/EU | MFA enforced for workforce |
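The 365-day retention stated in the Logging rows can be read as a simple age check on each archived log object. The sketch below is illustrative only: `RETENTION_DAYS` mirrors the table, while the function name `within_retention` and the idea of evaluating retention per object are assumptions, not a description of the actual lifecycle configuration.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 365  # taken from the Logging rows of the component table


def within_retention(created_at: datetime, now: datetime) -> bool:
    """Return True while a log object is still inside the retention window."""
    return now - created_at <= timedelta(days=RETENTION_DAYS)


# Hypothetical timestamps, chosen only to exercise both outcomes.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
recent = within_retention(datetime(2024, 6, 1, tzinfo=timezone.utc), now)
expired = within_retention(datetime(2023, 6, 1, tzinfo=timezone.utc), now)
```

In practice a check like this would more likely live in a storage lifecycle rule than in application code; the function only makes the arithmetic explicit.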
1.1 Architecture Diagram
graph TD
Shopper("End-user Browser") -->|HTTPS 443| ALB["AWS Application Load Balancer"]
ALB --> API["Public API (REST/GraphQL) on EC2 Auto-Scaling Group"]
API --> Search["Search Service"]
API --> Reco["Recommendation Service"]
Search --> DB["MySQL on EC2"]
Search --> MQ[(Kafka)]
MQ --> Stream["Stream Processor on EC2"]
Stream --> S3Analytics["S3 Analytical Data"]
Admin["Merchant Dashboard"] -->|HTTPS| ALB
S3Analytics --> ML["ML Training Jobs on EC2"]
ML --> S3["Amazon S3 Models"]
2. In-Scope Services
- Public APIs – GraphQL & REST endpoints consumed by merchant storefronts.
- Search & Recommendation Engine – Real-time ranking pipeline.
- Admin Dashboard – Merchant management UI.
- Data Processing Jobs – Batch ETL and ML model training.
Out-of-scope: Corporate marketing websites, staging environments, and internal dev tooling not connected to production.
3. Boundaries & Interfaces
All external traffic terminates at the AWS Application Load Balancer (ALB), which is protected by AWS WAF and serves traffic over TLS 1.3. Internal service-to-service traffic is encrypted with mTLS (Istio).
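A client can confirm the TLS 1.3 boundary from its own side by refusing to negotiate anything older. The following is a minimal sketch using Python's standard `ssl` module; the function name `make_tls13_client_context` is hypothetical, and whether the load balancer also rejects older protocol versions is not stated here, so the sketch only shows the client-side constraint.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Build a client context that refuses any protocol older than TLS 1.3."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


ctx = make_tls13_client_context()
```

A context built this way can be passed to `urllib.request.urlopen(url, context=ctx)`; a handshake with an endpoint that only offers TLS 1.2 would then fail instead of silently downgrading.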
Data ingress occurs via:
- HTTPS API requests (port 443)
- SFTP ingest for catalogue files
Data egress occurs via:
- HTTPS responses & Webhooks
- Managed exports to Amazon Redshift or S3 (opt-in)
4. Data Flow Summary
- Shopper triggers search on merchant site.
- Request passes through the AWS Application Load Balancer (ALB) and the Public API to the Search service.
- Result set returned; anonymised click data enqueued to Kafka.
- Stream processor updates behavioural dataset in Amazon Redshift.
- Daily ML job retrains ranking model; artefacts stored in S3.
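The click-capture step of the flow above can be sketched as follows. This is an illustration under stated assumptions, not the production code: the document does not say how click data is anonymised, so the salted SHA-256 digest is an assumed scheme (strictly a pseudonymisation technique), an in-memory `queue.Queue` stands in for the Kafka topic, and the names `anonymise`, `record_click` and `click_events` are hypothetical.

```python
import hashlib
import json
import queue

# Stand-in for the Kafka topic; a real deployment would use a Kafka producer.
click_events: "queue.Queue[str]" = queue.Queue()


def anonymise(shopper_id: str, salt: str) -> str:
    """Replace the raw shopper ID with a salted SHA-256 digest (assumed scheme)."""
    return hashlib.sha256((salt + shopper_id).encode("utf-8")).hexdigest()


def record_click(shopper_id: str, product_id: str, salt: str) -> None:
    """Enqueue a click event that carries no raw shopper identifier."""
    event = {"shopper": anonymise(shopper_id, salt), "product": product_id}
    click_events.put(json.dumps(event))


# Hypothetical identifiers, used only to exercise the sketch.
record_click("shopper-123", "sku-42", salt="example-salt")
queued = json.loads(click_events.get_nowait())
```

The point of the design is that the identifier is transformed before the event leaves the Search service, so downstream consumers never see the raw value.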
4.1 Sequence Diagram
sequenceDiagram
participant Shopper
participant ALB as AWS ALB
participant API as Public API
participant Search
participant DB as MySQL on EC2
Shopper->>ALB: Search request
ALB->>API: Forward
API->>Search: Query
Search->>DB: SQL SELECT
DB-->>Search: Results
Search-->>API: JSON
API-->>ALB: Response
ALB-->>Shopper: Products
Note over Search,Kafka: Click event emitted
Search->>Kafka: Publish click
Kafka->>StreamProc: Stream processing
StreamProc->>Redshift: Store metrics
5. Supporting Processes
- Secure SDLC with mandatory peer review & automated security testing.
- Continuous deployment with percentage-based canary rollouts.
- 24×7 on-call rotation with Better Stack.
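One common way to implement a canary rollout by percentage is deterministic hash bucketing, sketched below. The actual rollout mechanism is not described in this document, so the technique, the function name `routes_to_canary` and the choice of a merchant key as the routing input are all assumptions.

```python
import hashlib


def routes_to_canary(request_key: str, canary_percent: int) -> bool:
    """Place a key in one of 100 stable buckets and compare it to the percentage."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < canary_percent


# Hypothetical keys; the observed share approximates the configured percentage.
keys = [f"merchant-{i}" for i in range(1000)]
share_at_10 = sum(routes_to_canary(k, 10) for k in keys) / len(keys)
```

Because the bucket depends only on the key, a given merchant stays on the same side of the split between requests, and raising the percentage only ever adds keys to the canary group.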
6. Changes Since Last Report
- Migrated compute from EC2 Auto-Scaling Groups to AWS Fargate to improve isolation.
- Added US-Central 1 standby region for reduced RTO.
Version 1.0