Business Continuity & Disaster Recovery Plan
1. Purpose
Ensure continuity of critical business functions and rapid recovery from disruptive incidents.
2. Objectives
- Resume customer-facing platform operations within an RTO of 4-6 h for most scenarios.
- Limit data loss to an RPO of 24 h for customer data (nightly backups); the system database runs in a high-availability (HA) configuration with no expected data loss.
- Maintain essential internal communication & decision-making capabilities.
3. Scope
Covers production infrastructure, support operations, and HQ office. Excludes non-critical marketing websites.
4. Critical Functions & Recovery Targets
Function | Max Acceptable Outage | RTO | RPO |
API & Search Engine | 8 h | 4-6 h | 24 h |
Recommendations Service | 8 h | 4-6 h | 24 h |
Dashboard & Admin UI | 12 h | 4-6 h | 24 h |
Email Delivery | 12 h | 6 h | 24 h |
Internal Comms (Slack, GMail) | N/A | N/A | N/A (external) |
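The targets in the table above can be encoded so that an ongoing outage is classified against them automatically. The following is a minimal sketch, not part of the plan's tooling: the dictionary keys, the `RecoveryTarget` class and the `outage_status` function are hypothetical names, and the upper bound of each RTO range (6 h) is assumed to be the deadline. Internal Comms is omitted because its targets are N/A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    function: str
    max_outage_h: float  # Max Acceptable Outage, in hours
    rto_h: float         # upper bound of the RTO range, in hours (assumption)
    rpo_h: float         # RPO, in hours


# Values copied from the table; keys are illustrative identifiers.
TARGETS = {
    "api_search": RecoveryTarget("API & Search Engine", 8, 6, 24),
    "recommendations": RecoveryTarget("Recommendations Service", 8, 6, 24),
    "dashboard_admin": RecoveryTarget("Dashboard & Admin UI", 12, 6, 24),
    "email_delivery": RecoveryTarget("Email Delivery", 12, 6, 24),
}


def outage_status(key: str, outage_h: float) -> str:
    """Classify an outage of `outage_h` hours against the function's targets."""
    target = TARGETS[key]
    if outage_h > target.max_outage_h:
        return "max_outage_exceeded"
    if outage_h > target.rto_h:
        return "rto_missed"
    return "within_rto"
```

For example, `outage_status("dashboard_admin", 10)` reports a missed RTO that is still inside the 12 h maximum acceptable outage.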
5. Strategies
- Infrastructure as Code – Terraform and Ansible, run manually, can provision base infrastructure; full restoration takes longer.
- Database Backups – Nightly DB dumps stored in eu-central-1 and replicated to eu-west-1; daily backups retained for 7 days, monthly backups for 30+ days.
- Data Replication – Critical data replicated to eu-west-1 (Ireland) for disaster recovery; no standby infrastructure.
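Because the 24 h RPO depends on the nightly dump actually completing, a freshness check on the most recent backup is one way to detect a silent RPO breach. The sketch below is illustrative only: `backup_within_rpo` is a hypothetical helper, and how the backup timestamps are obtained (for example, by listing the backup storage) is left out as an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

# RPO for customer data, as stated in the Objectives section.
RPO = timedelta(hours=24)


def backup_within_rpo(
    backup_times: Iterable[datetime],
    now: datetime,
    rpo: timedelta = RPO,
) -> bool:
    """Return True if the newest backup is no older than the RPO.

    All datetimes are expected to be timezone-aware (e.g. UTC).
    """
    times = list(backup_times)
    if not times:
        # No backups at all means the RPO cannot be met.
        return False
    return now - max(times) <= rpo
```

A check like this could run against both regions separately, so that a stalled replication to eu-west-1 is noticed before a disaster makes it matter.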
6. Communication Plan
- Declare incident in Better Stack; BC Lead notifies Exec Team.
- Customer status updates every 60 min via status page & email.
- Media inquiries handled by Marketing VP.
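The 60-minute customer update cadence can be tracked with a simple deadline calculation. This is a sketch under assumptions: `next_update_due` is a hypothetical helper, and the first update is assumed to be due one interval after the incident is declared, which the plan does not specify.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Customer status updates are sent every 60 minutes.
UPDATE_INTERVAL = timedelta(minutes=60)


def next_update_due(
    declared_at: datetime,
    last_update_at: Optional[datetime] = None,
    interval: timedelta = UPDATE_INTERVAL,
) -> datetime:
    """Return when the next customer status update is due."""
    # Before any update is sent, count from the incident declaration (assumption).
    base = last_update_at if last_update_at is not None else declared_at
    return base + interval
```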
7. Roles
Role | Backup | Responsibilities |
BC Lead (Head of Product) | SRE Lead | Invoke plan, allocate resources |
DR Coordinator | Senior SRE | Execute technical recovery |
Comms Coordinator | Marketing VP | Stakeholder updates |
Logistics | HR Manager | Facilities, employee safety |
8. Testing & Maintenance
- Bi-monthly (every 2 months) tabletop exercise involving execs.
- Plan reviewed after major changes.
9. Dependencies
- Cloud providers' uptime SLAs.
- Third-party email/SMS gateways – secondary providers configured.
10. Plan Activation & Deactivation
BC Lead authorises activation when impact exceeds predefined thresholds. Deactivation occurs after services stabilise and the root cause has been addressed.
Version 1.1 — effective 2025-09-01