Respectlytics

How Long Should You Store
Analytics Data?


Short answer: 13–25 months of granular events covers almost every product use case (year-over-year comparisons plus a buffer). Aggregated rollups can live longer because they carry less risk. Keeping raw events past two years usually means paying for cost and exposure you do not need.

⚖️ What Changes With Long Retention

Long retention is not free. Every additional month carries four costs:

| Cost | Why it matters |
| --- | --- |
| Storage | Linear with time. A high-volume mobile app can produce hundreds of GB per year of raw events. |
| Query latency | Larger tables mean slower dashboards. Indices help; deletion helps more. |
| Disclosure surface | More history means more data subject to a deletion request, subpoena, or breach. |
| Re-identification risk | Behavioral histories are themselves identifiers; longer windows make re-identification easier (why anonymization is not enough). |

🎯 Match the Window to the Use Case

Different questions need different windows. Pick the longest window your most-frequent question requires, not the longest window any question could ever require.

| Use case | Window needed |
| --- | --- |
| Active funnel debugging | 7–30 days of granular events |
| Quarterly product review | 90–120 days |
| Year-over-year comparison | 13 months minimum, 14–18 with buffer |
| Multi-cycle seasonality | 25 months |
| Long-horizon trend lines | 3–5 years (use aggregates, not raw events) |

🧱 A Two-Tier Retention Pattern

A pragmatic pattern for product teams: keep raw events short, keep aggregates long.

Tier 1: Raw events

  • Window: 13–25 months.
  • Used for: ad-hoc queries, funnel debugging, conversion analysis.
  • Deletion: daily job that removes rows older than the window.

Tier 2: Daily / weekly aggregates

  • Window: 3–5 years.
  • Used for: long-horizon dashboards, board reports, retention lines.
  • Risk profile: aggregates over hundreds of sessions per day are harder to re-identify and cheaper to store.
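The two tiers above can be combined into a single pass: aggregate whatever is about to fall out of the raw window, then delete it. A minimal sketch, using sqlite3 in memory as a stand-in for your warehouse and a hypothetical `events` / `daily_counts` schema:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: short-lived raw events plus a long-lived daily rollup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_name TEXT, session_id TEXT, ts TEXT)")
conn.execute("CREATE TABLE daily_counts (day TEXT, event_name TEXT, n INTEGER)")

def retention_pass(conn, raw_window_days=395):  # ~13 months
    """Roll expiring raw events into daily aggregates, then delete them."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=raw_window_days)).isoformat()
    # Tier 2: aggregate anything about to fall out of the raw window.
    conn.execute("""
        INSERT INTO daily_counts (day, event_name, n)
        SELECT substr(ts, 1, 10), event_name, COUNT(*)
          FROM events WHERE ts < ?
         GROUP BY substr(ts, 1, 10), event_name
    """, (cutoff,))
    # Tier 1: delete the raw rows past the window.
    cur = conn.execute("DELETE FROM events WHERE ts < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Aggregating in the same transaction-ish pass as the delete means a row is never dropped before it has been counted.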

🤖 Automate the Deletion

Manual deletion is a policy that does not exist. Schedule a job, log every run, alert when it fails:

# Pseudo-SQL daily retention job
DELETE FROM events
 WHERE timestamp < NOW() - INTERVAL '13 months';
-- Log: rows_deleted, run_started, run_finished
-- Alert if: rows_deleted == 0 for 7 consecutive days
--           OR run_started missing for 24h
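The two alert conditions in the comments can be checked by a small monitor. A sketch, assuming a hypothetical run log: a list of dicts with `run_started` and `rows_deleted`, ordered oldest to newest:

```python
from datetime import datetime, timedelta

def retention_alerts(runs, now):
    """Evaluate the two alert conditions against the job's run log."""
    alerts = []
    # Condition 1: zero rows deleted for 7 consecutive daily runs.
    if len(runs) >= 7 and all(r["rows_deleted"] == 0 for r in runs[-7:]):
        alerts.append("no rows deleted in the last 7 runs")
    # Condition 2: no run recorded in the last 24 hours.
    if not runs or now - runs[-1]["run_started"] > timedelta(hours=24):
        alerts.append("no run in the last 24h")
    return alerts
```

Seven zero-row runs is not necessarily a bug (a quiet dataset), which is why it is an alert to investigate rather than a hard failure.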

💡 Test the deletion job

Once a quarter, plant a sentinel event with a known timestamp older than the window. Confirm it disappears in the next run. If you cannot produce evidence the deletion runs, regulators and auditors will treat the policy as non-existent.
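The quarterly sentinel check can itself be scripted. A sketch against a hypothetical `events` table in sqlite3, standing in for your warehouse; `run_retention` is whatever job you schedule:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def sentinel_check(conn, run_retention, window_days=395):
    """Plant an event older than the window, run the job, confirm it is gone."""
    ts = (datetime.now(timezone.utc) - timedelta(days=window_days + 30)).isoformat()
    conn.execute(
        "INSERT INTO events (event_name, session_id, ts) VALUES (?, ?, ?)",
        ("_retention_sentinel", "qa", ts),
    )
    run_retention(conn)  # the deletion job under test
    remaining = conn.execute(
        "SELECT COUNT(*) FROM events WHERE event_name = '_retention_sentinel'"
    ).fetchone()[0]
    return remaining == 0  # True means the job deleted the sentinel
```

Log the result of each quarterly run; that log is the evidence an auditor asks for.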

📝 Documenting Your Policy

A documented retention policy is a one-page artifact. Most teams over-engineer this. Keep it boring:

Minimum sections

  1. Scope. Which datasets are covered (events, aggregates, logs).
  2. Window. The retention period for each tier and the rationale.
  3. Deletion mechanism. Where the job runs, who owns it, how it is monitored.
  4. Exceptions. Legal hold and incident-response carve-outs, with approver names.
  5. Review cadence. Annual policy review and the last review date.
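As an illustration, a filled-in one-pager (all names, owners, and dates hypothetical) fits in a few lines:

```
Retention Policy: Product Analytics
1. Scope: events table, daily rollups, ingestion logs.
2. Window: raw events 13 months; daily aggregates 5 years; logs 30 days.
   Rationale: year-over-year comparison plus one month of buffer.
3. Deletion: daily scheduled job on the warehouse, owned by the data team,
   monitored via run log and alerting.
4. Exceptions: legal hold (approver: General Counsel); incident response
   (approver: Security lead).
5. Review: annually; last reviewed <date>.
```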

⚠️ Common Mistakes

"Keep everything forever, just in case"

The just-in-case data ends up answering questions you did not need to answer, while paying ongoing cost and risk. If you cannot name a recurring question that needs the older data, delete it.

Only deleting from the primary table

If your pipeline writes to a warehouse, S3 lake, BI cache, and backup, the policy applies to all of them. The hardest one is usually backups — choose retention windows that align with your backup rotation.
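One way to keep the copies in sync is to drive every store from the same cutoff. A sketch, assuming hypothetical per-store deletion callables (your warehouse, lake, and cache clients would supply the real ones):

```python
from datetime import datetime, timedelta, timezone

def run_retention_everywhere(stores, window_days=395):
    """Apply one retention window across every copy of the data.

    `stores` maps a store name to a callable that deletes events older
    than the given cutoff and returns how many rows/objects it removed.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=window_days)
    results = {}
    for name, delete_before in stores.items():
        results[name] = delete_before(cutoff)  # warehouse, lake, BI cache, ...
    return results
```

Backups are the exception: most backup systems cannot delete individual rows, which is why the window should align with the backup rotation rather than a row-level job.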

Tying retention to user identity

If your retention strategy is "delete a user when they ask," you are still operating a tracking pipeline. A time-based window is a stronger guarantee than a per-user one. See why over-deletion is better.

Soft deletes that are not deletes

A deleted_at column is a flag, not a deletion. The data is still there. If your policy says "deleted," the row must be gone.

💡 How retention works in Respectlytics

Respectlytics stores five fields per event: event_name, session_id, timestamp, platform, country. None of them is a personal identifier.

Retention is automated server-side. The Data Deletion Guide covers the granular delete API for ad-hoc requests; the time-based job handles the rest.

Frequently Asked Questions

How long should you keep analytics data?

13–25 months of granular event data is enough for almost every product use case. Keep aggregated rollups longer if you need historical trend lines. Past two years of raw events, you are usually paying for risk you do not need.

What is the default retention period for mobile analytics?

There is no universal default. Common choices: 14 months (annual cycle + buffer), 25 months (two annual cycles), 36 months for seasonality concerns. Pick a window, document the rationale, automate the deletion.

Should I keep raw events or aggregates?

Both, with different windows. Raw events for 13–25 months. Pre-aggregated daily metrics for 3–5 years.

How do I delete analytics data automatically?

Schedule a daily job that deletes events older than the retention window. Make it idempotent, log row counts, alert if it does not run.

Does Respectlytics have a configurable retention period?

Yes. Older events are deleted automatically. Because only five non-identifying fields are stored per event, even maximum retention contains no personal identifiers — but a shorter window still reduces cost and exposure.

Legal Disclaimer: This information is provided for educational purposes and does not constitute legal advice. Retention requirements vary by jurisdiction, industry, and contract. Consult your legal team to determine the requirements that apply to your specific situation.
