The challenges of soft delete
3 months ago
- #postgresql
- #database
- #soft-delete
- Soft delete implementations, like using a `deleted` boolean or `archived_at` timestamp, help recover accidentally deleted data but introduce complexity.
- Most archived data is never accessed, leading to bloated databases with dead rows, especially if retention policies are not enforced.
- Restoring from backups becomes slower with large amounts of archived data, complicating disaster recovery.
- Queries, indexes, and migrations become more complex due to the need to filter out archived data.
- Restoring archived records isn't always straightforward and may require validation against current business rules.
- Alternatives to `archived_at` include application-level event archiving, trigger-based archiving, and WAL-based change data capture (CDC).
- Application-level archiving simplifies the primary database but introduces risks of data loss and requires additional infrastructure.
- Trigger-based archiving moves deleted data to a separate table, keeping live tables clean and simplifying queries and migrations.
- WAL-based CDC (e.g., Debezium) captures all changes without application modifications but adds significant operational complexity.
- A novel idea is maintaining a PostgreSQL replica that ignores DELETE queries, though this may introduce schema migration challenges and higher costs.
- The trigger-based approach is recommended for new projects due to its simplicity and minimal infrastructure requirements.