Cleanup scripts for Postgres and MongoDB databases
Over time, Capacity Private Cloud accumulates session, interaction, and associated audio data across two backing stores: a PostgreSQL database used by the archive service, and a MongoDB database used by binary storage. This article provides the SQL and JavaScript routines required to safely remove aged records from both systems on a recurring schedule, keeping storage footprint and query performance under control.
The cleanup runs in three logical phases:
- Extract the MongoDB object references for the records you intend to remove from PostgreSQL into a working file (
mongoIds.txt). - Delete the aged records from PostgreSQL.
- Delete the corresponding binary entries from MongoDB using the extracted reference list.
All examples below use a 30-day retention window. Adjust the INTERVAL '30 days' clause to match your organization's data retention policy.
Prerequisites
Two command-line tools are used throughout this guide:
psql— the standard PostgreSQL client, used to query and modify the archive service database.mongosh— the official MongoDB Shell, a JavaScript and Node.js environment for interacting with a MongoDB deployment whether hosted locally, on Atlas, or on another remote host.
Depending on your environment, these tools can be installed on a workstation or executed from within the existing database containers. If you need to install them locally, follow the official setup guides for psql and mongosh.
Version requirement: The scripts in this article apply only to Capacity Private Cloud version 4.4 and later.
Throughout the examples, <postgres_connection_string> and <mongo_connection_string> are placeholders. A PostgreSQL connection string follows this form:
postgres://<username>:<password>@<host>:<port>/<db_name>
Removing Old Sessions, Interactions, and Audio Data
This is the primary cleanup workflow. It removes sessions whose end_date falls outside the retention window, along with the audio binaries they reference in MongoDB.
Step 1 — Extract MongoDB References for Aged Audio
Run the following query to collect the MongoDB storage references for audio records older than 30 days, writing them to mongoIds.txt:
psql <postgres_connection_string> -t -A -c "SELECT sa.storage_ref FROM storage_audio sa INNER JOIN session s ON s.id=sa.session_id WHERE s.end_date < NOW() - INTERVAL '30 days'" >> mongoIds.txt
Step 2 — Delete Aged Session Records from PostgreSQL
Remove the corresponding session entries from PostgreSQL:
psql <postgres_connection_string> -c "DELETE FROM session WHERE end_date < NOW() - INTERVAL '30 days'"
Removing Old Grammar Data (Optional)
Grammar caching means the volume of archived grammar records is typically far smaller than the volume of sessions and interactions. Cleaning grammars is therefore optional, but recommended over long retention periods.
Step 1 — Extract Grammar References
psql <postgres_connection_string> -t -A -c "SELECT storage_ref FROM storage_grammar WHERE created_date < NOW() - INTERVAL '30 days'" >> mongoIds.txt
Step 2 — Delete Aged Grammar Records from PostgreSQL
psql <postgres_connection_string> -c "DELETE FROM storage_grammar WHERE created_date < NOW() - INTERVAL '30 days'"
Removing Old SSML Data (Optional)
SSML records, like grammars, benefit from caching and grow more slowly than session data. This cleanup is optional.
Step 1 — Extract SSML References
psql <postgres_connection_string> -t -A -c "SELECT storage_ref FROM storage_ssml WHERE created_date < NOW() - INTERVAL '30 days'" >> mongoIds.txt
Step 2 — Delete Aged SSML Records from PostgreSQL
psql <postgres_connection_string> -c "DELETE FROM storage_ssml WHERE created_date < NOW() - INTERVAL '30 days'"
Removing Old Phrases Data (Optional)
Phrase records are removed directly from PostgreSQL — they do not require a corresponding MongoDB cleanup.
psql <postgres_connection_string> -c "DELETE FROM storage_phrases WHERE created_date < NOW() - INTERVAL '30 days'"
Removing the MongoDB Binary Entries
Once the PostgreSQL deletions are complete, mongoIds.txt contains the list of MongoDB references that need to be removed from the BinaryStorage collection. These references must first be converted to MongoDB's UUID binary representation before they can be used in a delete query.
Step 1 — Create the Cleanup Script
Create a file named cleanMongoDB.js with the following content:
// UUID conversion helper functions
// https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js
function CSUUID(uuid) {
var hex = uuid.replace(/[{}-]/g, ""); // remove extra characters
var a = hex.substr(6, 2) + hex.substr(4, 2) + hex.substr(2, 2) + hex.substr(0, 2);
var b = hex.substr(10, 2) + hex.substr(8, 2);
var c = hex.substr(14, 2) + hex.substr(12, 2);
var d = hex.substr(16, 16);
hex = a + b + c + d;
var base64 = HexToBase64(hex);
return new BinData(3, base64);
}
function HexToBase64(hex) {
var base64Digits = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
var base64 = "";
var group;
for (var i = 0; i < 30; i += 6) {
group = parseInt(hex.substr(i, 6), 16);
base64 += base64Digits[(group >> 18) & 0x3f];
base64 += base64Digits[(group >> 12) & 0x3f];
base64 += base64Digits[(group >> 6) & 0x3f];
base64 += base64Digits[group & 0x3f];
} group = parseInt(hex.substr(30, 2), 16);
base64 += base64Digits[(group >> 2) & 0x3f];
base64 += base64Digits[(group << 4) & 0x3f];
base64 += "==";
return base64;
}
function main() {
const idsFilePath = 'mongoIds.txt';
// split file content into array of string guids
const ids=fs.readFileSync(idsFilePath).toString().split('\n');
console.log(db.BinaryStorage.deleteMany({_id: { $in: ids.map(CSUUID)}}));
}
main();Step 2 — Execute the Script Against MongoDB
mongosh <mongo_connection_string> cleanMongoDB.js
On successful completion, the script prints a result document similar to the following, where deletedCount reflects the number of MongoDB entries that were removed:
{ acknowledged: true, deletedCount: 0 }
