Migration Guide for version 24.10


Countly 24.10 has been released. This migration guide outlines the important changes, their benefits, and the steps needed for a smooth upgrade to the latest version.

Introduction to the New Data Model

The new data model introduces two key collections to simplify data management:

  1. Aggregated Data Collection - countly.events_data

Stores aggregated data for all events across all apps.

  2. Detailed Event Data Collection - countly_drill.drill_events

Contains detailed, raw event data from all apps.

This consolidation aims to improve performance, simplify data management, and provide better scalability.

Benefits

  1. Simplified Collection Management - Reduces the number of collections, easing index management and sharding.
  2. Improved Performance - Faster writes and reduced overhead from fewer collections.
  3. Enhanced Data Export/ETL - Easier and more efficient data export from a single collection.
  4. Increased Scalability - Removal of event key limits and better support for large datasets.
  5. Cross-Event Querying - Facilitates more flexible querying across different events.
  6. Future Proofing - Better preparation for new databases and technologies with a consolidated data structure.

How the New Model is Different from the Old Model

Old Model

  1. Structure - Utilized multiple collections for each app and event. This resulted in a complex schema with numerous collections.
  2. Indexing and Sharding - Managing indexes and sharding across many collections was complex and prone to errors.
  3. Data Export - Exporting data required handling each collection individually, making the process inefficient.
  4. Performance - The old MongoDB engine used collection-level locks, which restricted throughput and impacted performance.

New Model

  1. Structure - Consolidates into two main collections: countly.events_data for aggregated data and countly_drill.drill_events for detailed event records.
  2. Indexing and Sharding - Simplified with fewer collections, improving management and performance.
  3. Data Export - Streamlined by querying a single collection, making the export process more efficient.
  4. Performance - Enhanced by reducing the number of collections and leveraging improvements in the new MongoDB engine.

Migrating to the New Data Model

You can follow the steps below to migrate to the new data model.

  1. Upgrade Countly - Update Countly to the latest version to start recording new data in both the countly.events_data and countly_drill.drill_events collections. As part of the upgrade, a script copies the current aggregated data to the countly.events_data collection.

Because the data is duplicated, the new collection will take up the same amount of space as all current aggregated event collections combined.

The script can also be run later (after the upgrade), but it will take 40 times longer for collections that already have newly recorded data. If this upgrade script is skipped, data from before the upgrade will not be visible in the aggregated data sections for events.

If the script exits with one of these messages:

Script failed. Exiting. PLEASE RERUN SCRIPT TO MIGRATE ALL DATA.
Script failed. Exiting

it is advised to share the output with the Countly team so that it can be checked for issues.

If the final output contains a line like:

"Failed to merge collections: (NUMBER)"

it means some of the collections were not fully moved, and the output has to be checked for errors. The script can be run multiple times.
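The checks described above can be automated when the upgrade runs unattended. The following is a minimal sketch, not part of Countly: the function name check_upgrade_output is hypothetical, and the exact form of the count after "Failed to merge collections:" (with or without parentheses) is an assumption, so the pattern accepts both.

```python
import re

# Both failure messages quoted in this guide begin with this text.
FAILURE_MARKER = "Script failed. Exiting"

# Assumption: the count may be printed bare or wrapped in parentheses.
FAILED_MERGE_RE = re.compile(r"Failed to merge collections:\s*\(?(\d+)\)?")


def check_upgrade_output(output: str) -> dict:
    """Summarize whether the captured script output signals a problem."""
    script_failed = FAILURE_MARKER in output
    match = FAILED_MERGE_RE.search(output)
    failed_merges = int(match.group(1)) if match else 0
    return {
        "script_failed": script_failed,
        "failed_merges": failed_merges,
        # The guide notes that the script can safely be run multiple times.
        "needs_rerun": script_failed or failed_merges > 0,
    }
```

If needs_rerun is true, keep the full output so it can be shared with the Countly team.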

After the upgrade script has run successfully, the old aggregated event collections can be cleared by running this script:

/bin/scripts/data-cleanup/remove_old_events_collections.js
  2. Steps to reduce data loss in case of rollback - Aggregated data recorded while the server was running the new version will be lost. Before upgrading again, clear the countly.events_data collection. Then, for all old events collections, run an update to unset 'merged': {"$unset":{"merged":""}}
  3. Data Migration Plan for Drill Collections:
  • Export Old Collections - Use mongoexport to export data from old collections. This process is estimated to take 28 hours for 2 billion data points (DP).
  • Add New Fields - Add new fields (a:app_id and key:event) to the exported data. This step is expected to take 34 hours for 2 billion data points.
  • Import Data - Import the modified data into the countly_drill.drill_events collection. This import process is estimated to take 32 hours for 2 billion data points.
  • Verify Imported Data - Perform data verification to ensure the integrity and accuracy of the imported data.
  • Drop Old Collections - Once data verification is complete, drop the old collections to free up resources.
  • Automated Repeat for All Collections - Repeat the migration process for all collections. This process will be automated by scripts, handling each collection one by one.
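The "Add New Fields" step above can be sketched as a small transformation over a mongoexport file, which by default contains one JSON document per line. This is an illustration under assumptions, not the migration script itself: the function name add_new_fields is hypothetical, and the field names "a" and "e" follow the property table later in this guide, so adjust them if your target schema differs.

```python
import json
from typing import Iterable, Iterator


def add_new_fields(lines: Iterable[str], app_id: str, event_key: str) -> Iterator[str]:
    """Yield exported JSON lines with the app ID and event key added."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export file
        doc = json.loads(line)
        doc["a"] = app_id      # application ID (assumed field name)
        doc["e"] = event_key   # event key (assumed field name)
        # Extended JSON values such as {"$date": ...} pass through unchanged.
        yield json.dumps(doc)
```

Because it processes one line at a time, the function can be applied to a file handle without loading the whole export into memory.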
  4. Start Recording Drill Data - After the upgrade, all new data will be recorded in the countly_drill.drill_events collection.
  5. Dual Data Querying - Query both old and new collections during the transition period until all old drill data expires. Because querying both slows queries down, it is advised to toggle off dual querying once the data has been moved. It is configured by this setting:

For aggregated collections, the upgrade script copies the data, and the old collections (countly.events) are deleted afterwards. For drill (granular) data, no collection is moved.

  6. Delete Old Collections - Remove outdated collections once all data has been successfully migrated and verified.
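For the rollback step in the list above (clear countly.events_data, then unset 'merged' in the old events collections before upgrading again), a minimal sketch could look like the following. It assumes db behaves like a pymongo Database object for the "countly" database, and that old aggregated collections are named "events" followed by 40 hexadecimal characters, as in the example names shown in this guide. The function name is hypothetical.

```python
import re

# Assumption: old collections look like "events" + 40 hex characters.
OLD_EVENTS_RE = re.compile(r"^events[0-9a-f]{40}$")


def prepare_for_reupgrade(db) -> list:
    """Clear events_data and unset 'merged' in every old events collection.

    Returns the names of the old collections that were updated.
    """
    # Aggregated data written by the new version is discarded.
    db["events_data"].delete_many({})

    touched = []
    for name in db.list_collection_names():
        if OLD_EVENTS_RE.match(name):
            # Same update document as quoted in the rollback step.
            db[name].update_many({}, {"$unset": {"merged": ""}})
            touched.append(name)
    return touched
```

Take a backup before running anything like this against a production database.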

Example

Old Data Model

In the old data model, each event key had two collections. Take the example of a website and an app, with a login as one of the events and Drill in use. The old model would look like the following:

App: My Website

  • Event: Clicked button
  • Event: Purchase made

App: My App

  • Event: Login

Results in

  • countly.events5acc585c82ec317a7d00d06a39b9453697a3e84b
  • countly_drill.drill_events5acc585c82ec317a7d00d06a39b9453697a3e84b
  • countly.events5d1c6e6925889e294cff2b135d7b65d66a741688
  • countly_drill.drill_events5d1c6e6925889e294cff2b135d7b65d66a741688
  • countly.events54c2f21f3f8b98e22fc6afe8a3511caed7fc8240
  • countly_drill.drill_events54c2f21f3f8b98e22fc6afe8a3511caed7fc8240

Queries against this data model have to touch all of the collections listed above, which increases latency. For this reason, a new data model has been introduced.

New Data Model

In the new data model, the same example results in just two collections for all events.

App: My Website

  • Event: Clicked button
  • Event: Purchase made

App: My App

  • Event: Login

Results in

  • countly.events_data
  • countly_drill.drill_events

As you can see above, only two collections have to be queried, which reduces latency and increases throughput.

Granular Document

Old Data Model New Data Model
{
  "_id": "04d20438e44d212643fc9d0dcb25643e7f050f241626189010_S_1626185010000_1",
  "uid": "S",
  "did": "98039e11-a829-1331-e0bd-15ac946b919d",
  "lsid": "04d20438e44d212643fc9d0dcb25643e7f050f241626189010_S_1626189010000",
  "ts": 1626185010000,
  "cd": { "$date": "2021-07-16T15:38:24.145Z" },
  "d": "2021:7:13",
  "w": "2021:w28",
  "m": "2021:m7",
  "h": "2021:7:13:h17",
  "s": 0,
  "dur": 0,
  "c": 1,
  "up": { ... },
  "custom": { ... },
  "sg": { ... }
}

Properties of the Data Model

The newly added properties are listed below. To learn more about the other properties, you can click here.

Property Name | Definition | Example Value
"a" | Defines the application ID. | "6423f48e7393a0ca3410f42d"
"e" | Defines the event. | "Login"

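Because every document in the consolidated collection carries the "a" and "e" properties, one filter can select several events of one app at once. The sketch below only builds a MongoDB filter document. The function name drill_filter is hypothetical, and the use of "ts" (the millisecond timestamp shown in the granular document above) for the time range is an assumption about how you would want to narrow the query.

```python
from typing import Iterable, Optional


def drill_filter(
    app_id: str,
    events: Iterable[str],
    ts_from: Optional[int] = None,
    ts_to: Optional[int] = None,
) -> dict:
    """Build a filter for countly_drill.drill_events across several events."""
    query = {"a": app_id, "e": {"$in": list(events)}}
    ts_range = {}
    if ts_from is not None:
        ts_range["$gte"] = ts_from  # inclusive lower bound, milliseconds
    if ts_to is not None:
        ts_range["$lt"] = ts_to     # exclusive upper bound, milliseconds
    if ts_range:
        query["ts"] = ts_range
    return query
```

The resulting dictionary can be passed to a find call on the countly_drill.drill_events collection by any MongoDB driver.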
 

FAQs

Can we read from old and new collections at the same time?

Yes, the code has been changed to allow reading from both old and new collections at the same time.
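To make the idea concrete, here is a purely illustrative sketch of combining results from an old per-event collection with results from the new consolidated collection. It is not Countly's implementation. The function name is hypothetical, and it assumes old documents lack the "a" and "e" properties and that a document copied to the new collection keeps its original "_id".

```python
from itertools import chain
from typing import Iterable, Iterator


def read_old_and_new(
    old_docs: Iterable[dict],
    new_docs: Iterable[dict],
    app_id: str,
    event_key: str,
) -> Iterator[dict]:
    """Yield documents from both sources, skipping duplicates by _id."""
    # Give old documents the shape of new ones (assumed field names).
    tagged_old = ({**doc, "a": app_id, "e": event_key} for doc in old_docs)
    seen_ids = set()
    # New documents come first, so they win over already-migrated copies.
    for doc in chain(new_docs, tagged_old):
        if doc["_id"] in seen_ids:
            continue
        seen_ids.add(doc["_id"])
        yield doc
```

Reading two sources costs more than reading one, which is why the guide advises switching dual querying off once the old data has been moved or has expired.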
