- It is a good idea to save the database server’s log output somewhere, rather than just routing it to /dev/null. The log output is invaluable when it comes time to diagnose problems. However, the log output tends to be voluminous (especially at higher debug levels) and you won’t want to save it indefinitely. You need to “rotate” the log files so that new log files are started and old ones removed after a reasonable period of time.
- If you simply direct the stderr of postgres into a file, you will have log output, but the only way to truncate the log file is to stop and restart the server. This might be OK if you are using PostgreSQL in a development environment, but few production servers would find this behavior acceptable.
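The "old ones removed after a reasonable period of time" half of rotation can be sketched in a few lines. This is only an illustration, assuming rotated logs sit in one directory and that file modification time is a fair proxy for age; the function name `prune_old_logs`, the `*.log` pattern, and the seven-day default are hypothetical choices for the example, not PostgreSQL settings.

```python
import time
from pathlib import Path


def prune_old_logs(log_dir, max_age_days=7, pattern="*.log", now=None):
    """Delete files in log_dir matching pattern that are older than max_age_days.

    Age is judged by modification time. Returns the sorted names of removed files.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in Path(log_dir).glob(pattern):
        # Only regular files older than the cutoff are removed.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A scheduled job could call `prune_old_logs("/path/to/logs", max_age_days=14)`; the path here is a placeholder. Starting a new log file still has to be handled by whatever writes the log.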
DB Corruption
- Causes
- Underlying storage failure.
- Bad disk, bad controller.
- Garbage writes during power loss.
- Battery backup that didn’t.
- Bad RAM.
- Hardware failures.
- PostgreSQL bugs.
- 9.x had a series of unfortunate replication bugs.
- Used to be extremely rare.
- With luck, will become extremely rare again.
- Operator error.
- Backups that do not include critical files.
- Backups that do not follow protocol.
- Backups that forget external tablespaces.
- Bungled attempts at problem recovery.
- Deleting the wrong files to free space.
- Prevention
- Buy good hardware, demand your cloud provider do so, or have multi-tier redundancy.
- Make backups, and test them.
- Stay up on PostgreSQL releases, and read the release notes.
- Have DR/HA in place with a cluster setup.
- Have a proper backup strategy in place.
- Set up proactive alerting, possibly with third-party tools, to catch problems before they become incidents.
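Part of testing a backup is checking that it contains the pieces named under operator error: critical files and external tablespaces. The sketch below is a rough first-pass check, not a substitute for an actual restore test. It assumes a plain file-level backup that mirrors the data directory layout, including one entry under `pg_tblspc` per tablespace; the function name `missing_backup_items` and that layout are assumptions made for the example.

```python
from pathlib import Path

# Files present in every PostgreSQL data directory; a backup lacking them is unusable.
REQUIRED = ("PG_VERSION", "global/pg_control")


def missing_backup_items(data_dir, backup_dir):
    """Return relative paths expected in the backup but not found there.

    Expected items are the REQUIRED files plus one pg_tblspc entry for each
    tablespace link found in the live data directory.
    """
    data, backup = Path(data_dir), Path(backup_dir)
    expected = list(REQUIRED)
    tblspc = data / "pg_tblspc"
    if tblspc.is_dir():
        # Each entry in pg_tblspc is a link to an external tablespace location.
        expected += sorted(f"pg_tblspc/{entry.name}" for entry in tblspc.iterdir())
    return [rel for rel in expected if not (backup / rel).exists()]
```

An empty list means only that nothing on this short checklist is missing. A backup is proven good only by restoring it and starting a server on the result.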