🏷️ Data Classification and Sensitive Data Lifecycle
Intro: Classification only matters if it changes engineering behavior. This page treats data classification as a design input for storage, logging, export, retention, and deletion.
What this page includes
- a practical classification model for product teams
- where sensitive data spreads beyond the primary datastore
- retention and deletion questions to force early
- how classification affects architecture reviews
Practical classes
A useful minimal model is:
- public or low sensitivity;
- internal operational data;
- customer confidential data;
- regulated or highly sensitive data.
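One way to make the model change engineering behavior is to encode it next to the field catalog. The sketch below is illustrative only: the `Handling` flags, the default values, and the field names are assumptions, not a policy this page prescribes.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataClass(IntEnum):
    """The four classes listed above, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CUSTOMER_CONFIDENTIAL = 2
    REGULATED = 3


@dataclass(frozen=True)
class Handling:
    """Hypothetical handling rules attached to a class."""
    may_log_raw: bool
    may_export_raw: bool
    encrypt_at_rest: bool


# Illustrative defaults only; real values belong to your own policy.
HANDLING = {
    DataClass.PUBLIC: Handling(True, True, False),
    DataClass.INTERNAL: Handling(True, False, False),
    DataClass.CUSTOMER_CONFIDENTIAL: Handling(False, False, True),
    DataClass.REGULATED: Handling(False, False, True),
}

# Hypothetical field catalog for one feature.
FIELD_CLASSES = {
    "plan_name": DataClass.PUBLIC,
    "request_id": DataClass.INTERNAL,
    "email": DataClass.CUSTOMER_CONFIDENTIAL,
    "national_id": DataClass.REGULATED,
}


def handling_for(field: str) -> Handling:
    """Look up handling rules; unclassified fields get the strictest class."""
    cls = FIELD_CLASSES.get(field, DataClass.REGULATED)
    return HANDLING[cls]
```

Treating unclassified fields as the strictest class is a design choice: a new field then has to be classified before it can be logged or exported raw.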
Propagation map
Sensitive data rarely stays where the feature owner thinks it stays. Review all of these:
- primary datastore;
- caches and search indexes;
- logs and traces;
- analytics pipelines;
- backups and snapshots;
- support exports and ad hoc reports;
- message queues and dead-letter stores.
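The review list above can be kept as a small checklist per field, so that an unreviewed store is visible rather than assumed clean. This is a minimal sketch; the `REVIEWED` record and its contents are hypothetical.

```python
from enum import Enum


class Store(Enum):
    """The places named in the propagation map above."""
    PRIMARY = "primary datastore"
    CACHE_SEARCH = "caches and search indexes"
    LOGS_TRACES = "logs and traces"
    ANALYTICS = "analytics pipelines"
    BACKUPS = "backups and snapshots"
    EXPORTS = "support exports and ad hoc reports"
    QUEUES = "message queues and dead-letter stores"


# Hypothetical review record: field -> {store: does a copy exist there?}
# A store missing from the inner dict has not been reviewed yet.
REVIEWED = {
    "email": {
        Store.PRIMARY: True,
        Store.CACHE_SEARCH: True,
        Store.LOGS_TRACES: False,
        Store.ANALYTICS: True,
        Store.BACKUPS: True,
    },
}


def unreviewed_stores(field: str) -> list:
    """Stores nobody has checked for this field, in definition order."""
    seen = REVIEWED.get(field, {})
    return [store for store in Store if store not in seen]


def stores_holding(field: str) -> list:
    """Stores confirmed to hold a copy, i.e. the later deletion targets."""
    seen = REVIEWED.get(field, {})
    return [store for store in Store if seen.get(store)]
```

For the sample record, `unreviewed_stores("email")` returns the exports and queues entries, which are the places a reviewer still has to check.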
Design questions
- Does the product truly need to store this field, or only verify it briefly?
- Which derived datasets must also be deleted later?
- Can support personnel see raw values, or only masked forms?
- What happens to the data in exports and incident evidence bundles?
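For the question about masked forms, a support-facing view can be built by masking selected fields before the record leaves the service. The following is a sketch under assumptions: the field names, the masking rules, and the `support_view` helper are hypothetical.

```python
def mask_email(value: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return "***"
    return f"{local[0]}***@{domain}"


def mask_tail(value: str, visible: int = 4) -> str:
    """Show only the last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Hypothetical mapping from field name to masking function.
MASKERS = {"email": mask_email, "card_number": mask_tail}


def support_view(record: dict) -> dict:
    """Return a copy of the record with sensitive fields masked."""
    return {
        key: MASKERS[key](value) if key in MASKERS else value
        for key, value in record.items()
    }
```

Masking at the point where the view is built, rather than in the support tool's UI, means exports generated from the same view inherit the masked values.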
Retention rules that help
- tie retention to a business purpose, not convenience;
- define deletion owners for primary and derived systems;
- document exceptions such as legal hold or fraud review;
- prefer irreversible transformations for fields that do not need raw retrieval.
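The rules above can be expressed as a retention record plus a deletion check, with a keyed hash as one possible irreversible transformation. This is a sketch, not a prescribed implementation: `RetentionRule`, the hold names, and the choice of HMAC-SHA256 are assumptions made for illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetentionRule:
    purpose: str          # the business purpose that justifies keeping the data
    keep_for: timedelta   # how long that purpose lasts
    deletion_owner: str   # team accountable for deleting it


def is_due_for_deletion(rule: RetentionRule, created_at: datetime,
                        now: datetime, holds: frozenset = frozenset()) -> bool:
    """True once the retention period has elapsed and no exception applies."""
    if holds:  # e.g. {"legal_hold"} or {"fraud_review"}; names are hypothetical
        return False
    return now - created_at >= rule.keep_for


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a raw value with a keyed hash that still supports equality checks."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

A keyed hash cannot be inverted directly, but anyone holding the key can still test guesses against it, so low-entropy fields need the key protected as carefully as the raw data would be.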
Common anti-patterns
- logs that retain more sensitive data, or keep it longer, than the product database;
- backups with no practical restore-scope controls;
- analytics copies surviving long after user deletion;
- "temporary" CSV exports that become shadow systems.
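The logging anti-pattern is often the cheapest to address in code. The sketch below uses a standard-library `logging.Filter` to redact structured fields before a record reaches any handler. The `SENSITIVE_KEYS` set is hypothetical, and the filter only covers values passed through `extra=`; it does not catch values already interpolated into the message text.

```python
import logging

# Hypothetical set of structured field names that must never be logged raw.
SENSITIVE_KEYS = {"email", "national_id"}


class RedactSensitiveFields(logging.Filter):
    """Overwrite sensitive attributes on a log record before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        for key in SENSITIVE_KEYS:
            if hasattr(record, key):
                setattr(record, key, "[REDACTED]")
        return True  # keep the record; only its sensitive fields change


logger = logging.getLogger("app.signup")
logger.addFilter(RedactSensitiveFields())
# A call such as logger.info("signup", extra={"email": "a@b.c"}) would then
# carry "[REDACTED]" in record.email for every handler.
```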
Author attribution: Ivan Piskunov, 2026 - Educational and defensive-engineering use.