Backup & Restore Policy
1. Purpose
Ensure critical data is backed up securely and can be restored within the defined recovery point objective (RPO) and recovery time objective (RTO).
2. Scope
Applies to production databases, configuration repositories and essential SaaS data.
3. Backup Schedule
| Resource | Method | Frequency | Retention |
| --- | --- | --- | --- |
| MariaDB instances (system and customer databases) | Table-level backup (bespoke tool based on mysqldump) | Nightly | 7 days (daily) + ≥30 days (monthly snapshot) |
| Object storage (Amazon S3) | Versioning and cross-region replication from eu-central-1 (Frankfurt) to eu-west-1 (Ireland) for critical buckets | Real-time | N/A |
4. Storage & Encryption
Backups are stored with IAM separation and encrypted at rest with AWS KMS: AWS managed keys for existing buckets, customer managed keys (CMKs) for new buckets.
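The encryption requirement can be spot-checked against a bucket's default-encryption settings. The sketch below is illustrative only: it assumes the input is a dictionary shaped like the response of the S3 GetBucketEncryption API, and the function name `uses_kms_encryption` is hypothetical. It does not distinguish AWS managed keys from CMKs, and fetching the configuration from AWS is deliberately left out.

```python
def uses_kms_encryption(encryption_response):
    """Return True if every default-encryption rule uses SSE-KMS.

    `encryption_response` is assumed to look like the S3
    GetBucketEncryption response; this helper is a hypothetical sketch.
    """
    config = encryption_response.get("ServerSideEncryptionConfiguration", {})
    rules = config.get("Rules", [])
    if not rules:
        # No default-encryption rule at all: treat as non-compliant.
        return False
    return all(
        rule.get("ApplyServerSideEncryptionByDefault", {}).get("SSEAlgorithm")
        == "aws:kms"
        for rule in rules
    )


# Example input with a placeholder key identifier (not a real key).
example = {
    "ServerSideEncryptionConfiguration": {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "example-key-id",
                }
            }
        ]
    }
}
print(uses_kms_encryption(example))
```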
5. Testing
- Bi-monthly (every 2 months) restore tests for a representative customer database.
- File-level object-store restore drills are also executed bi-monthly.
- Any issues found during a test are recorded as Linear tickets.
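The bi-monthly cadence above can be checked mechanically. The following is a minimal sketch, assuming "every 2 months" means two calendar months after the last completed test; the helper names `add_months` and `restore_test_overdue` are hypothetical and not part of any existing tooling.

```python
import calendar
from datetime import date

TEST_INTERVAL_MONTHS = 2  # "bi-monthly (every 2 months)" per this policy


def add_months(d, months):
    """Shift a date forward by whole months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def restore_test_overdue(last_test, today):
    """Return True if the next restore test should already have happened."""
    return today > add_months(last_test, TEST_INTERVAL_MONTHS)


print(restore_test_overdue(date(2025, 6, 15), date(2025, 8, 16)))
```

A check like this could run on a schedule and raise a ticket when it returns True, but how it is wired up is left open here.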
6. Roles
The SRE Lead owns the backup infrastructure. Any DevOps engineer can perform restores; on-call engineers handle emergency restores.
7. Incident Handling
Backup failures generate notifications in a dedicated Slack channel monitored by SRE, plus email alerts via Datadog. Failures are investigated at the beginning of the next work day.
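To make the "next work day" deadline concrete, the sketch below computes it for a given failure date. It assumes a Monday-to-Friday work week and an optional set of company holidays; both are assumptions, since the policy does not define working days, and `next_work_day` is a hypothetical helper name.

```python
from datetime import date, timedelta


def next_work_day(failure_day, holidays=frozenset()):
    """Return the first work day strictly after `failure_day`.

    Assumes Saturday and Sunday are non-working days; `holidays` is an
    optional collection of additional non-working dates.
    """
    day = failure_day + timedelta(days=1)
    while day.weekday() >= 5 or day in holidays:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


# A failure on Friday 2025-08-29 is due for investigation the following Monday.
print(next_work_day(date(2025, 8, 29)))  # 2025-09-01
```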
8. Disposal
Daily backups are automatically overwritten on a rolling weekly basis. Monthly backups are overwritten once they are at least 30 days old and have been replaced by a newer monthly snapshot. S3 lifecycle policies enforce single-version retention (no old versions are kept).
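The database disposal rules can be expressed as a small selection function. This is a sketch under stated assumptions, not the behaviour of the bespoke backup tool: it treats the earliest backup in each calendar month as the monthly snapshot, keeps the most recent monthly snapshot unconditionally, and uses 30 days as the minimum monthly retention. The name `backups_to_keep` is hypothetical.

```python
from datetime import date, timedelta

DAILY_RETENTION_DAYS = 7         # rolling weekly window for daily backups
MONTHLY_MIN_RETENTION_DAYS = 30  # policy says "30+ days"; 30 used as the minimum


def backups_to_keep(backup_dates, today):
    """Return the set of backup dates retained under the rules above."""
    dates = sorted(d for d in set(backup_dates) if d <= today)

    # Daily tier: everything taken within the last 7 days.
    keep = {d for d in dates if (today - d).days < DAILY_RETENTION_DAYS}

    # Monthly tier (assumption): the earliest backup of each calendar month.
    monthly = {}
    for d in dates:
        monthly.setdefault((d.year, d.month), d)
    snapshots = sorted(monthly.values())

    for i, snap in enumerate(snapshots):
        is_latest = i == len(snapshots) - 1
        # A snapshot is disposed of only when it is at least 30 days old
        # AND a newer monthly snapshot has replaced it.
        if is_latest or (today - snap).days < MONTHLY_MIN_RETENTION_DAYS:
            keep.add(snap)
    return keep


# Nightly backups from 2025-06-01 through 2025-08-29.
nightly = [date(2025, 6, 1) + timedelta(days=i) for i in range(90)]
kept = backups_to_keep(nightly, today=date(2025, 8, 29))
print(sorted(kept))
```

In this example the result is the seven most recent daily backups plus the 2025-08-01 monthly snapshot; the July snapshot is dropped because it is older than 30 days and has been replaced.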
Version 1.2 — effective 2025-08-29