Online Archive - Automatically defragment data file and release disk space after archiving
This is a request for Atlas to automatically defragment the data files and have the WiredTiger storage engine release the resulting empty space back to the operating system, for clusters with an active Online Archive.
After implementing Online Archive and seeing data archived to S3 according to our archiving rule, we expected the available cluster storage to increase, since Online Archive had effectively deleted that data from the cluster.
However, the MongoDB docs state: "When you archive data, Atlas first copies the data to the cloud object storage and then deletes the data from your Atlas cluster. WiredTiger does not release the storage blocks of the deleted data back to the OS for performance reasons. However, Atlas eventually automatically reuses these storage blocks for new data. This helps the Atlas cluster to avoid fragmentation." https://docs.atlas.mongodb.com/online-archive/manage-online-archive/
We were advised to use the compact command if we wanted to immediately release disk space, but the compact command doesn't always behave as expected and we prefer to avoid manual steps.
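For reference, the manual step we were advised to run looks roughly like the following. This is only a minimal sketch assuming a pymongo connection; the host, database, and collection names are placeholders, and compact has to be run against each node separately (typically the secondaries first).

```python
from pymongo import MongoClient

# Placeholders: open a direct connection to one node at a time,
# since compact works per node (typically secondaries first).
client = MongoClient(
    "mongodb://secondary-host.example.net:27017/?directConnection=true"
)
db = client["mydb"]

# compact rewrites the collection's data files on this node and can
# return unused blocks to the operating system.
result = db.command({"compact": "mycollection"})
print(result)
```

Having to repeat this per collection and per node is exactly the kind of manual overhead we would like Atlas to handle for us.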
Having Atlas automatically handle data file defragmentation and disk space release would help us achieve the expected benefits of Online Archive: controlling data growth, reducing the data size on the cluster, and potentially reducing costs.
-
Helmut commented
This is a big cost topic!
We pay money for nothing to cloud providers.
Basically, the current way of releasing storage in order to reduce costs is highly manual and therefore not efficient.

Current process:
1. Identify releasable storage (we are currently building something of our own for this; see the sketch after this list).
2. Run compact on the secondaries first and then finally switch over the primary.

Btw, the same happens with TTL indexes when they are first applied.
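For step 1, what we built ourselves is roughly the following sketch (pymongo, with placeholder connection details). It assumes the WiredTiger block-manager statistic "file bytes available for reuse" reported by collStats is a fair proxy for releasable storage.

```python
from pymongo import MongoClient

# Placeholder connection string for the cluster to inspect.
client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
db = client["mydb"]

total_reusable = 0
for name in db.list_collection_names(filter={"type": "collection"}):
    stats = db.command("collStats", name)
    # Space allocated to the data file that no longer holds live data;
    # this is what compact could hand back to the OS.
    reusable = stats["wiredTiger"]["block-manager"]["file bytes available for reuse"]
    total_reusable += reusable
    print(f"{name}: {reusable / 1024**2:.1f} MiB reusable "
          f"of {stats['storageSize'] / 1024**2:.1f} MiB on disk")

print(f"Total releasable in {db.name}: {total_reusable / 1024**2:.1f} MiB")
```

An Atlas-provided alert on exactly this number (as suggested below) would make our home-grown tooling unnecessary.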
Why not...
* ...provide alerts if releasable storage is above a certain threshold?
* ...provide functionality to compact?
* ...reduce the storage accordingly?

Partly it might be possible via App Services.
-
Yuri commented
Something similar happens when we shard a large collection: disk space is still reported as used even though the collection is now split across shards.