In a replica set, if one of the members is stuck in the RECOVERING state and you see the following in that member's log file:
2018-02-14T16:02:50.410+0300 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2018-02-14T16:02:50.410+0300 I REPL [rsBackgroundSync] our last optime : (term: 32, timestamp: Jan 12 22:42:48:1c51)
2018-02-14T16:02:50.410+0300 I REPL [rsBackgroundSync] oldest available is (term: 38, timestamp: Jan 29 13:53:34:6f)
This means that the oplog was not large enough to cover the period during which the member was unreachable. A network problem, lack of disk space, or a host that was shut down can cause this kind of issue.
The oplog must be large enough to cover the whole time it takes to resolve such a problem. If the member stays unreachable for longer than the oplog window, it falls too far behind and goes into the RECOVERING state.
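To see how much time the oplog currently covers, you can check the replication info on a data-bearing member; if a member can be offline for longer than this window, it is at risk of becoming too stale. A minimal check from the command line (the host name is only an example):

# Prints "configured oplog size" and "log length start to end" (the oplog window)
mongo --host primaryserver --eval 'rs.printReplicationInfo()'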
The only thing you can do at this point is resync the replica member.
There are two ways:
1- Initial sync (starting with an empty data directory)
2- Sync by copying the data files from another member
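Whichever method you choose, it is worth confirming the state of every member first. A small mongo shell snippet run from the command line (the host name is an example) prints each member's state; the stale member will report RECOVERING:

mongo --host primaryserver --eval 'rs.status().members.forEach(function (m) { print(m.name + " : " + m.stateStr); })'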
The steps for the first method are:
1- Shut down the server:
use admin
db.shutdownServer()
2- Remove the old data directory.
3- Create an empty data directory with the correct ownership and permissions.
4- Start the mongod process.
The member's status will change from RECOVERING to STARTUP2 and the initial sync will begin.
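Put together, the first method looks roughly like the following on the stale member. The /data path, the mongod user, and the config file location are assumptions; adjust them to your installation:

# 1. Shut down the stale member
mongo --eval 'db.getSiblingDB("admin").shutdownServer()'

# 2-3. Move the old data directory aside and create an empty one
#      with the same ownership (path and user are assumptions)
mv /data /data.old
mkdir /data
chown mongod:mongod /data

# 4. Start mongod again (or use your init system); the member will show
#    STARTUP2 and begin an initial sync
mongod --config /etc/mongod.conf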
Second method:
During the initial sync, the primary may fail to answer the stale secondary in time, which causes the synchronization to restart from the beginning.
If the primary is busy serving other requests at the same time and the sync takes a long time, these interruptions can occur repeatedly, so the synchronization never finishes.
In the secondary's log:
2018-08-28T05:17:05.817+0300 I INDEX [rsSync] build index done. scanned 138338099 total records. 5629 secs
2018-08-28T05:17:05.827+0300 I STORAGE [rsSync] copying indexes for: { name: "Announcement", options: {} }
2018-08-28T05:17:05.827+0300 I NETWORK [rsSync] Socket say send() errno:32 Broken pipe xxxxxxxxxxxxxxxxxxxx
2018-08-28T05:17:05.889+0300 E REPL [rsSync] 9001 socket exception [SEND_ERROR] server xxxxxxxxxxxxxxxxxxxxxxx
2018-08-28T05:17:05.889+0300 E REPL [rsSync] initial sync attempt failed, 9 attempts remaining
2018-08-28T05:17:10.889+0300 I REPL [rsSync] initial sync pending
2018-08-28T05:17:12.709+0300 I REPL [ReplicationExecutor] syncing from: primaryserver
2018-08-28T05:17:12.747+0300 I REPL [rsSync] initial sync drop all databases
2018-08-28T05:17:12.747+0300 I STORAGE [rsSync] dropAllDatabasesExceptLocal 4
2018-08-28T05:17:16.312+0300 I REPL [rsSync] initial sync clone all databases
2018-08-28T05:17:16.364+0300 I REPL [rsSync] fetching and creating collections for admin
2018-08-28T05:17:16.374+0300 I REPL [rsSync] fetching and creating collections for db1
2018-08-28T05:17:16.519+0300 I REPL [rsSync] fetching and creating collections for db2
2018-08-28T05:17:16.532+0300 I REPL [rsSync] initial sync cloning db: admin
In this situation, the second method is more effective.
You need to copy the data files from a healthy member. Before copying, lock the healthy secondary to prevent the data files from changing.
1- On the healthy secondary, run:
db.fsyncLock()
2- Shut down the unhealthy mongod instance:
use admin
db.shutdownServer()
3- Copy the data files (the /data directory) from the healthy secondary to the unhealthy server.
4- Start the mongod instance on the unhealthy server.
5- Unlock the healthy secondary:
db.fsyncUnlock()
After the data files are copied, the member will synchronize itself from the primary in a short time.
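An end-to-end sketch of the second method is below. The host names, the /data path, and the use of rsync over ssh are assumptions, and the target data directory is emptied first so stale files are not mixed with the copied ones:

# On the healthy secondary: flush pending writes and lock the data files
mongo --host healthysecondary --eval 'db.fsyncLock()'

# On the unhealthy member: shut mongod down and clear the stale data files
mongo --eval 'db.getSiblingDB("admin").shutdownServer()'
rm -rf /data/*

# On the unhealthy member: pull the data files from the healthy secondary
rsync -av healthysecondary:/data/ /data/

# On the unhealthy member: start mongod again (or use your init system)
mongod --config /etc/mongod.conf

# On the healthy secondary: release the lock
mongo --host healthysecondary --eval 'db.fsyncUnlock()'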