Yesterday, a developer asked us to unshard a sharded collection in one of our MongoDB databases.
Yes, it is possible to unshard a sharded collection, but it is not a pretty process 🙂 Since the data has been spread across 3 shards, we first need to bring all of it back to the primary shard.
In addition, this process requires downtime.
Before moving the data to the primary shard, we need to shut down all shards and all mongos instances except one.
After making the changes on that mongos instance, we need to restart it as well.
```javascript
database = 'db'
collection = database + '.fs.chunks'

sh.stopBalancer()
use config
primary = db.databases.findOne({_id: database}).primary

// move all chunks to the primary shard
db.chunks.find({ns: collection, shard: {$ne: primary}}).forEach(function(chunk) {
    print('moving chunk from', chunk.shard, 'to', primary, '::', tojson(chunk.min), '-->', tojson(chunk.max));
    sh.moveChunk(collection, chunk.min, primary);
});

// unshard
db.collections.remove({ _id: collection })
db.chunks.remove({ ns: collection })
```
Now we can start the shards and the other mongos instances.
```javascript
// flush the routing table cache on all mongos instances
use admin
db.runCommand({ flushRouterConfig: 1 })
```
Since this process has many manual steps, it is somewhat challenging.
The collection we wanted to unshard was not big, so we used another method.
First, we create a copy of the collection by inserting its documents into a new one.
```javascript
db.Collection.find({}).forEach(
    function(myDoc) {
        db.CollectionCopy.insert(myDoc);
    }
);
```
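Before dropping the original, it is worth verifying that the copy is complete. A minimal sanity check in the mongo shell (requires a live deployment; the collection names follow the example above):

```javascript
// both counts should match before the original is dropped
db.Collection.count()
db.CollectionCopy.count()
```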
After the insert process finished, we dropped the collection named 'Collection'.
When dropping a sharded collection, it is important to verify that the drop succeeded on all shards and in the config database.
Dropping a Collection in a Sharded Cluster
1- Drop the collection using a mongos
```javascript
use db
db.Collection.drop()
```
2- Connect to each shard's primary and verify the namespace has been dropped. If it has not, drop it there as well.
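A quick way to check this, connected directly to a shard's primary (not through a mongos); this is a sketch assuming the database and collection names from the example above:

```javascript
use db
db.getCollectionNames()   // "Collection" should not appear in the output
// if it still exists on this shard, drop it directly:
db.Collection.drop()
```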
3- Connect to a mongos, switch to the config database, and remove any reference to the dropped namespace from the collections, chunks, and locks collections:
```javascript
use config
db.collections.remove( { _id: "db.Collection" } )
db.chunks.remove( { ns: "db.Collection" } )
db.locks.remove( { _id: "db.Collection" } )
```
4- Connect to each mongos and run flushRouterConfig.
After the drop succeeds, we can rename the copy collection:
```javascript
db.CollectionCopy.renameCollection("Collection")
```
As a note: renameCollection is not supported on sharded collections.
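Because of that restriction, it is worth confirming the copy is unsharded before attempting the rename. One way to check, as a sketch against the example collection:

```javascript
// "sharded" should be false (or absent) for an unsharded collection
db.CollectionCopy.stats().sharded
```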
After the rename, we can create new indexes on this unsharded collection.
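For example, recreating a secondary index looks like this; "uploadDate" is a hypothetical field name, not one from the original collection:

```javascript
// example only: rebuild an ascending index on a (hypothetical) field
db.Collection.createIndex({ uploadDate: 1 })
```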
As you might expect, this process also requires a little downtime, because no new data must arrive in the existing collection, and during the drop and rename operations the application cannot find the collection. So we stopped the application while these operations ran.
It is a challenging operation if your collection is big, so it is always important to decide on your sharding strategy and the right shard key before your data grows.
What is the reason for unsharding here?
Can you give a rough estimate of the downtime needed for a collection of, say, 100 GB?
I am not a MongoDB expert, but I wonder: does MongoDB, like MS-SQL, allow multiple databases on a server?
Is there a design issue if a view references multiple databases and the shard keys are not set properly because of that multi-database design?