The latest version, Countly 24.10, has been released. This migration guide outlines the important changes, their benefits, and the steps needed to ensure a smooth transition to the new data model.
Introduction to the New Data Model
The new data model introduces two key collections to simplify data management:
- Aggregated Data Collection - `countly.events_data` stores aggregated data for all events across all apps.
- Detailed Event Data Collection - `countly_drill.drill_events` contains detailed, raw event data from all apps.
This consolidation aims to improve performance, simplify data management, and provide better scalability.
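Because both collections hold data for every app, reads are now scoped by document fields rather than by collection name. A minimal mongo shell sketch (the app ID and event key are placeholder values taken from the examples later in this guide):

```
// Aggregated event data for all apps and events lives in one collection.
db.getSiblingDB("countly").events_data.findOne();

// Raw event records for all apps live in one collection,
// scoped by app ID ("a") and event key ("e").
db.getSiblingDB("countly_drill").drill_events.find({
  a: "6423f48e7393a0ca3410f42d", // placeholder app ID
  e: "Login"                     // placeholder event key
}).limit(10);
```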
Benefits
- Simplified Collection Management - Reduces the number of collections, easing index management and sharding.
- Improved Performance - Faster writes and reduced overhead from fewer collections.
- Enhanced Data Export/ETL - Easier and more efficient data export from a single collection.
- Increased Scalability - Removal of event key limits and better support for large datasets.
- Cross-Event Querying - Facilitates more flexible querying across different events.
- Future Proofing - Better preparation for new databases and technologies with a consolidated data structure.
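As a concrete illustration of cross-event querying, a single aggregation over `countly_drill.drill_events` can now compare volumes across all events of an app, which previously required one query per collection. A hedged mongo shell sketch, using the `a`, `e`, and `c` fields shown in the document examples later in this guide:

```
// Total recorded count per event key for one app, in a single query.
db.getSiblingDB("countly_drill").drill_events.aggregate([
  { $match: { a: "6423f48e7393a0ca3410f42d" } },    // placeholder app ID
  { $group: { _id: "$e", total: { $sum: "$c" } } }, // "c" is the count field
  { $sort: { total: -1 } }
]);
```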
How the New Model is Different from the Old Model
Old Model
- Structure - Utilized multiple collections for each app and event. This resulted in a complex schema with numerous collections.
- Indexing and Sharding - Managing indexes and sharding across many collections was complex and prone to errors.
- Data Export - Exporting data required handling each collection individually, making the process inefficient.
- Performance - The old MongoDB engine used collection-level locks, which restricted throughput and impacted performance.
New Model
- Structure - Consolidates into two main collections: `countly.events_data` for aggregated data and `countly_drill.drill_events` for detailed event records.
- Indexing and Sharding - Simplified with fewer collections, improving management and performance.
- Data Export - Streamlined by querying a single collection, making the export process more efficient.
- Performance - Enhanced by reducing the number of collections and leveraging improvements in the new MongoDB engine.
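To illustrate the simpler index management: one index definition now covers every app and event, whereas in the old model an equivalent index had to be created and maintained on each per-app, per-event collection. The index below is a hypothetical example, not Countly's actual index set:

```
// One compound index serves app- and event-scoped queries for all apps.
db.getSiblingDB("countly_drill").drill_events.createIndex({ a: 1, e: 1, ts: -1 });
```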
Migrating to the New Data Model
You can follow the steps below to migrate to the new data model.
- Upgrade Countly - Update Countly to the latest version to start recording new data in both the `countly.events_data` and `countly_drill.drill_events` collections. As part of the upgrade, a script copies the current aggregated data to the `countly.events_data` collection.
Because the data is duplicated, the new collection will take up the same amount of space as all current aggregated event collections combined.
The script can also be run later (after the upgrade), but it will take about 40 times longer for collections that already contain newly recorded data. If this upgrade script is skipped, data recorded before the upgrade will not be visible in the aggregated data sections for events.
If the script exits with one of the following messages:
Script failed. Exiting. PLEASE RERUN SCRIPT TO MIGRATE ALL DATA.
Script failed. Exiting.
it is advised to share the output with the Countly team so it can be checked for issues.
If the final output contains a line like:
Failed to merge collections: (NUMBER)
it means some of the collections were not fully moved and the output has to be checked for errors. The script can be run multiple times.
Upon successfully running the upgrade script, the old aggregated event collections can be cleared by running this script:
/bin/scripts/data-cleanup/remove_old_events_collections.js
- Steps to reduce data loss in case of rollback - Aggregated data recorded while the server was running the new version will be lost. Before upgrading again, clear the `countly.events_data` collection and, for all events collections, run an update to unset the 'merged' flag: {"$unset":{"merged":""}} (a mongo shell sketch of this follows this list).
- Data Migration Plan for Drill Collections (a server-side sketch of these steps follows this list):
  - Export Old Collections - Use `mongoexport` to export data from the old collections. This process is estimated to take 28 hours for 2 billion data points (DP).
  - Add New Fields - Add the new fields (`a`: the app ID, and `e`: the event key) to the exported data. This step is expected to take 34 hours for 2 billion DP.
  - Import Data - Import the modified data into the `countly_drill.drill_events` collection. This import is estimated to take 32 hours for 2 billion DP.
  - Verify Imported Data - Perform data verification to ensure the integrity and accuracy of the imported data.
  - Drop Old Collections - Once data verification is complete, drop the old collections to free up resources.
  - Automated Repeat for All Collections - Repeat the migration process for all collections. This process will be automated by scripts, handling each collection one by one.
- Start Recording Drill Data - Upon upgrade, all new data will be recorded in the `countly_drill.drill_events` collection.
- Dual Data Querying - Query both the old and new collections during the transition period, until all old data in the drill collections expires. As querying both slows queries down, it is advised to toggle off dual querying once the data has been moved; this is controlled by a setting in the server configuration. Note that for aggregated collections the upgrade script copies the data and the old collections (`countly.events`) are deleted afterwards, while for drill (granular) data no collection is moved automatically.
- Delete Old Collections - Remove outdated collections once all data has been successfully migrated and verified.
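The rollback preparation step above boils down to two operations: emptying the new aggregated collection and clearing the 'merged' marker on the old events collections. A minimal mongo shell sketch, assuming the old aggregated collections are the ones in the countly database whose names start with "events" (this naming assumption is based on the example collection names later in this guide):

```
// 1. Clear the new aggregated data collection before re-running the upgrade.
db.getSiblingDB("countly").events_data.deleteMany({});

// 2. Unset the 'merged' marker on every old events collection so the
//    upgrade script copies their data again.
var countlyDb = db.getSiblingDB("countly");
countlyDb.getCollectionNames()
  .filter(function(name) { return name.indexOf("events") === 0 && name !== "events_data"; })
  .forEach(function(name) {
    countlyDb.getCollection(name).updateMany({}, { "$unset": { "merged": "" } });
  });
```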
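The drill migration plan above describes an export/transform/import pipeline based on `mongoexport`. As a compact server-side illustration of the same transformation, the hedged mongo shell sketch below copies one old drill collection into `countly_drill.drill_events`, adds the `a` and `e` fields, and then compares document counts as a basic verification step. The hashed collection name, app ID, and event key are placeholders that have to be filled in per collection; this is a sketch of the steps, not Countly's migration script:

```
var drillDb = db.getSiblingDB("countly_drill");

// Old per-app, per-event collection to migrate (placeholder hashed name).
var oldName = "drill_events5acc585c82ec317a7d00d06a39b9453697a3e84b";
var appId = "6423f48e7393a0ca3410f42d"; // placeholder app ID for this collection
var eventKey = "Login";                 // placeholder event key for this collection

// Add the new "a" and "e" fields and merge the documents into the
// consolidated collection, keeping their original _id values.
drillDb.getCollection(oldName).aggregate([
  { $addFields: { a: appId, e: eventKey } },
  { $merge: { into: { db: "countly_drill", coll: "drill_events" }, whenMatched: "keepExisting" } }
]);

// Basic verification: every old document should now exist in the new collection.
var migrated = drillDb.drill_events.countDocuments({ a: appId, e: eventKey });
var original = drillDb.getCollection(oldName).countDocuments({});
print("migrated " + migrated + " of " + original + " documents");

// Only after verification succeeds should the old collection be dropped:
// drillDb.getCollection(oldName).drop();
```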
Example
Old Data Model
In the old data model, each event key had two collections: one for aggregated data and one for drill (raw) event data. Let's take the example of a website where one of the events is a login. The old model would look like the following:
App: My Website
- Event: Clicked button
- Event: Purchase made
App: My App
- Event: Login
Results in
countly.events5acc585c82ec317a7d00d06a39b9453697a3e84b
countly_drill.drill_events5acc585c82ec317a7d00d06a39b9453697a3e84b
countly.events5d1c6e6925889e294cff2b135d7b65d66a741688
countly_drill.drill_events5d1c6e6925889e294cff2b135d7b65d66a741688
countly.events54c2f21f3f8b98e22fc6afe8a3511caed7fc8240
countly_drill.drill_events54c2f21f3f8b98e22fc6afe8a3511caed7fc8240
With this data model, queries have to touch all of the collections listed above, which increases latency. This is why a new data model has been introduced.
New Data Model
In the new data model, using the same example of a website and a login event, only two collections are created for all events.
App: My Website
- Event: Clicked button
- Event: Purchase made
App: My App
- Event: Login
Results in
countly.events_data
countly_drill.drill_events
Now, as you can see above, only two collections have to be queried, which reduces latency and increases throughput.
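To make the difference concrete, the mongo shell sketch below contrasts how the same raw event data is reached in both models, reusing a hashed collection name and the app ID from the examples in this guide (the pairing of hash and app is illustrative):

```
// Old model: raw event data is addressed by a hashed, per-app, per-event collection name.
db.getSiblingDB("countly_drill")
  .getCollection("drill_events5acc585c82ec317a7d00d06a39b9453697a3e84b")
  .countDocuments({});

// New model: one shared collection, filtered by app ID and event key.
db.getSiblingDB("countly_drill").drill_events.countDocuments({
  a: "6423f48e7393a0ca3410f42d", // placeholder app ID
  e: "Login"
});
```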
Granular Document
Old per-event drill document (no `a` or `e` field):
{
  "_id": "04d20438e44d212643fc9d0dcb25643e7f050f241626189010_S_1626185010000_1",
  "uid": "S",
  "did": "98039e11-a829-1331-e0bd-15ac946b919d",
  "lsid": "04d20438e44d212643fc9d0dcb25643e7f050f241626189010_S_1626189010000",
  "ts": 1626185010000,
  "cd": { "$date": "2021-07-16T15:38:24.145Z" },
  "d": "2021:7:13",
  "w": "2021:w28",
  "m": "2021:m7",
  "h": "2021:7:13:h17",
  "s": 0,
  "dur": 0,
  "c": 1,
  "up": { ... },
  "custom": { ... },
  "sg": { ... }
}
New consolidated document in `countly_drill.drill_events` (note the added `a` and `e` fields):
{
  "_id": "04d20438e44d212643fc9d0dcb25643e7f050f241626189010_S_1626185010000_1",
  "a": "6423f48e7393a0ca3410f42d",
  "e": "Login",
  "uid": "S",
  "did": "98039e11-a829-1331-e0bd-15ac946b919d",
  "lsid": "04d20438e44d212643fc9d0dcb25643e7f050f241626189010_S_1626189010000",
  "ts": 1626185010000,
  "cd": { "$date": "2021-07-16T15:38:24.145Z" },
  "d": "2021:7:13",
  "w": "2021:w28",
  "m": "2021:m7",
  "h": "2021:7:13:h17",
  "s": 0,
  "dur": 0,
  "c": 1,
  "up": { ... },
  "custom": { ... },
  "sg": { ... }
}
Properties of the Data Model
The new properties that have been added are listed below. The other properties are described in the existing Countly drill documentation.
| Property Name | Definition | Example Value |
| --- | --- | --- |
| "a" | The application ID. | "6423f48e7393a0ca3410f42d" |
| "e" | The event key. | "Login" |
FAQs
Can we read from old and new collections at the same time?
Yes, there is a change in the code that allows reading from both old and new collections at the same time.