Business Continuity & Disaster Recovery Plan
1. Purpose
Ensure continuity of critical business functions and rapid recovery from disruptive incidents.
2. Objectives
- Resume customer-facing platform operations within an RTO of 2 h.
- Limit data loss to an RPO of 15 min via continuous backups.
- Maintain essential internal communication & decision-making capabilities.
3. Scope
Covers production infrastructure, support operations, and HQ office. Excludes non-critical marketing websites.
4. Critical Functions & Recovery Targets
Function | Max Acceptable Outage | RTO | RPO |
API & Search Engine | 2 h | 2 h | 15 min |
Recommendations Service | 4 h | 3 h | 30 min |
Dashboard & Admin UI | 8 h | 4 h | 1 h |
Email Delivery | 12 h | 6 h | 1 h |
Internal Comms (Slack, Gmail) | 4 h | 2 h | 0 |
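The targets in this table can be encoded as data so that drill results are checked against them mechanically. The following is a minimal sketch, not part of the plan itself: the names RecoveryTarget, TARGETS, and check_recovery are hypothetical, and the measured downtime and data-loss figures are assumed to come from the drill or incident report.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    max_outage: timedelta
    rto: timedelta
    rpo: timedelta


# Values copied from the recovery targets table in section 4.
TARGETS = {
    "API & Search Engine": RecoveryTarget(
        timedelta(hours=2), timedelta(hours=2), timedelta(minutes=15)),
    "Recommendations Service": RecoveryTarget(
        timedelta(hours=4), timedelta(hours=3), timedelta(minutes=30)),
    "Dashboard & Admin UI": RecoveryTarget(
        timedelta(hours=8), timedelta(hours=4), timedelta(hours=1)),
    "Email Delivery": RecoveryTarget(
        timedelta(hours=12), timedelta(hours=6), timedelta(hours=1)),
    "Internal Comms (Slack, Gmail)": RecoveryTarget(
        timedelta(hours=4), timedelta(hours=2), timedelta(0)),
}


def check_recovery(function: str, downtime: timedelta, data_loss: timedelta) -> list[str]:
    """Return the names of the targets that were breached (empty if all were met)."""
    target = TARGETS[function]
    breaches = []
    if downtime > target.rto:
        breaches.append("RTO")
    if data_loss > target.rpo:
        breaches.append("RPO")
    if downtime > target.max_outage:
        breaches.append("Max Acceptable Outage")
    return breaches
```

For example, a drill that restores the Recommendations Service after 3 h 30 min with 45 min of lost data would report both the RTO and the RPO as breached, while staying inside the maximum acceptable outage.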
5. Strategies
- Multi-Region Cloud Architecture – Active/standby in eu-central-1 & eu-west-1.
- IaC & CI/CD – Terraform & GitHub Actions enable infra redeploy <30 min.
- Database Replication – Amazon RDS cross-region replicas to eu-west-1 (Ireland); WAL streaming.
- Snapshot Backups – Hourly DB snapshots replicated to eu-west-1; retained 30 days in isolated account.
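The snapshot policy above (hourly cadence, 30-day retention) could be monitored with a check along these lines. This is an illustrative sketch only: the function names and the 10-minute grace period are assumptions, and how the snapshot creation timestamps are obtained from the backup account is deliberately left out.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)          # retention period stated in the plan
SNAPSHOT_INTERVAL = timedelta(hours=1)  # snapshot cadence stated in the plan
GRACE = timedelta(minutes=10)           # assumed tolerance, not from the plan


def expired_snapshots(created_at, now, retention=RETENTION):
    """Return snapshot timestamps older than the retention period, oldest first."""
    return sorted(ts for ts in created_at if now - ts > retention)


def snapshot_is_overdue(created_at, now, interval=SNAPSHOT_INTERVAL, grace=GRACE):
    """Return True if no snapshot exists or the newest one is older than interval + grace."""
    if not created_at:
        return True
    return now - max(created_at) > interval + grace


# Example with fixed timestamps (UTC).
now = datetime(2025, 7, 1, 12, 0, tzinfo=timezone.utc)
snapshots = [
    now - timedelta(days=31),
    now - timedelta(days=29),
    now - timedelta(minutes=30),
]
to_prune = expired_snapshots(snapshots, now)
overdue = snapshot_is_overdue(snapshots, now)
```

In this example only the 31-day-old snapshot is past retention, and the newest snapshot (30 minutes old) is within the hourly cadence, so no alert would be raised.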
6. Communication Plan
- Declare incident in Better Stack; BC Lead notifies Exec Team.
- Customer status updates every 60 min via status page & email.
- Media inquiries handled by Marketing VP.
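The 60-minute customer update cadence can be turned into a concrete schedule once an incident is declared. The sketch below assumes the first update is due at the moment of declaration, which the plan does not state; update_schedule is a hypothetical helper, not an existing tool.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=60)  # cadence stated in the plan


def update_schedule(declared_at, until, interval=UPDATE_INTERVAL):
    """Return the times at which a customer status update is due.

    Assumes the first update is due at declaration time, then every
    `interval` until `until` (inclusive).
    """
    due = []
    t = declared_at
    while t <= until:
        due.append(t)
        t += interval
    return due


# Example: incident declared at 10:00, services restored at 12:30.
declared = datetime(2025, 7, 1, 10, 0)
restored = datetime(2025, 7, 1, 12, 30)
schedule = update_schedule(declared, restored)
```

For an incident running from 10:00 to 12:30, this yields updates due at 10:00, 11:00, and 12:00; the Comms Coordinator would still send a final resolution notice separately.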
7. Roles
Role | Backup | Responsibilities |
BC Lead (CTO) | SRE Lead | Invoke plan, allocate resources |
DR Coordinator | Senior SRE | Execute technical recovery |
Comms Coordinator | Marketing VP | Stakeholder updates |
Logistics | HR Manager | Facilities, employee safety |
8. Testing & Maintenance
- Semi-annual failover drills to standby region.
- Tabletop exercise involving execs once a year.
- Plan reviewed after each test or major change.
9. Dependencies
- Cloud providers' uptime SLAs.
- Third-party email/SMS gateways – secondary providers configured.
10. Plan Activation & Deactivation
BC Lead authorises activation when impact exceeds predefined thresholds. Deactivation occurs after services stabilise and the root cause is addressed.
Version 1.0 — effective 2025-07-01