MongoDB: Resyncing a stale replica set member

 

If one of the members of a replica set is in the RECOVERING state and its log file reports that the member is too stale to catch up, the oplog was not large enough to cover the time that the member was unreachable. A network problem, a full disk or a host that was shut down can cause this kind of problem.

The oplog must be large enough to hold every operation written while the problem is being fixed. If the problem isn’t solved within that window, the member goes into the RECOVERING state and stays there.
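To judge whether the oplog is large enough, you can compare the oplog window (the time between its oldest and newest entry) with the longest outage you expect. The following is a minimal sketch, not a definitive procedure. It assumes `mongosh` is installed (on older installations the legacy `mongo` shell accepts the same call) and uses a hypothetical member address, `mongo1.example.net:27017`. `rs.printReplicationInfo()` is a real shell helper; `oplog_info` and `window_covers_outage` are names made up for this example.

```shell
#!/bin/sh
# Hypothetical address of the member to inspect; replace with your own.
MONGO_HOST="${MONGO_HOST:-mongo1.example.net:27017}"

# Ask the member for its oplog size and the time range the oplog covers.
oplog_info() {
    mongosh "mongodb://$MONGO_HOST" --quiet --eval 'rs.printReplicationInfo()'
}

# Succeed only if the oplog window (in whole hours) is longer than the
# outage you want to survive (in whole hours).
window_covers_outage() {
    window_hours=$1
    outage_hours=$2
    [ "$window_hours" -gt "$outage_hours" ]
}
```

For example, run `oplog_info`, read the log length from its output, and then check it with something like `window_covers_outage 24 6 && echo "oplog is large enough"`.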

The only thing you can do here is resync the replica member from the primary.

 

There are two ways:

1- Initial sync (with no initial data)

2- Sync by copying the data files from another member

 

The steps for the first method are:

1- Shut down the stale mongod instance.

2- Remove the old data directory.

3- Create an empty data directory with the correct ownership and permissions.

4- Start the mongod process.

The state of the member changes from RECOVERING to STARTUP2 while the initial sync runs.
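The four steps above can be sketched as a small shell function. This is only an illustration under assumptions that are not in the original post: the data directory is `/data/db`, mongod runs as a systemd service named `mongod`, and the files belong to a user called `mongod`. Because the steps are destructive, the sketch only prints the commands by default; set `RUN=""` to execute them for real.

```shell
#!/bin/sh
# Hypothetical values; adjust them to your installation.
DBPATH="${DBPATH:-/data/db}"
MONGO_USER="${MONGO_USER:-mongod}"
SERVICE="${SERVICE:-mongod}"
# Dry run by default: every command is printed instead of executed.
# Export RUN="" to really run the commands.
RUN="${RUN-echo}"

resync_from_scratch() {
    # 1- shut down the stale member
    $RUN systemctl stop "$SERVICE" &&
    # 2- remove the old data directory
    $RUN rm -rf "$DBPATH" &&
    # 3- create an empty data directory with the right owner
    $RUN mkdir -p "$DBPATH" &&
    $RUN chown -R "$MONGO_USER:$MONGO_USER" "$DBPATH" &&
    # 4- start mongod; it begins an initial sync from the replica set
    $RUN systemctl start "$SERVICE"
}
```

Calling `resync_from_scratch` in the default dry-run mode lists the five commands so you can review them first. The commands are chained with `&&` so that nothing is deleted if the shutdown fails.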

 

Second method:

During an initial sync, the primary may fail to answer the stale secondary, and this restarts the synchronization process.

If the primary is serving other requests at the same time and the sync takes a long time, these interruptions may occur repeatedly, so the synchronization never finishes.

In the secondary's log:

 

In this situation the second method is more effective.

You need to copy the data files from a healthy member. Before copying, lock the healthy secondary so that its data files do not change during the copy.

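1- On the healthy secondary, lock writes with db.fsyncLock().

2- Shut down the unhealthy mongod instance.

3- Copy the data directory (/data in this example) from the healthy secondary to the unhealthy server.

4- Start the mongod instance on the unhealthy server.

5- Unlock the healthy secondary with db.fsyncUnlock().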

After the data files are copied, the replica member catches up with the primary in a short time.
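The copy-based procedure can be sketched as follows. Several things here are assumptions and not part of the original post: the script runs on the healthy secondary, the stale member is reachable over ssh as the hypothetical host `mongo3.example.net`, both servers use the same data directory and systemd service name, and `rsync` does the copy. `db.fsyncLock()` and `db.fsyncUnlock()` are the shell helpers that block and release writes. As a precaution the sketch only prints the commands unless `RUN=""` is set.

```shell
#!/bin/sh
# Hypothetical values; adjust them to your installation.
STALE="${STALE:-mongo3.example.net}"   # the unhealthy member
DBPATH="${DBPATH:-/data/db}"           # same path assumed on both servers
SERVICE="${SERVICE:-mongod}"
# Dry run by default: every command is printed instead of executed.
# Export RUN="" to really run the commands.
RUN="${RUN-echo}"

# Meant to be run on the healthy secondary.
resync_by_copy() {
    # Flush pending writes and block new ones on the healthy secondary.
    $RUN mongosh --quiet --eval 'db.fsyncLock()' || return 1

    # Stop the stale member, copy the data files over, start it again.
    $RUN ssh "$STALE" "systemctl stop $SERVICE" &&
    $RUN rsync -a --delete "$DBPATH/" "$STALE:$DBPATH/" &&
    $RUN ssh "$STALE" "systemctl start $SERVICE"
    status=$?

    # Always unlock the healthy secondary, even if a step above failed.
    $RUN mongosh --quiet --eval 'db.fsyncUnlock()' || status=$?
    return "$status"
}
```

The unlock is deliberately kept outside the `&&` chain: a failed copy should not leave the healthy secondary locked. The file ownership on the stale server may still need to be fixed after the copy, depending on which user runs `rsync`.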
