PG Tuning, Statistics and Indexes

Vaibhav Krishna August 14, 2018 PostgreSQL

Database Tuning

Fundamentally, database performance tuning is done for two basic reasons, to reduce response time and to reduce resource usage, both of which can apply for any given situation. Julian Stuhler looks at database performance tuning, and why it remains one of the most important topics for any DBA, developer or systems administrator.

Why Bother with Database Performance?

So, what’s the big deal with database performance? Fundamentally, database performance tuning is done for two basic reasons, and either or both can apply for any given situation:

Reduce response time. With today’s internet-enabled, customer facing applications, providing sub-second response times for OLTP applications has never been more critical. A few years ago, a skilled telephone customer support operative could hide longer response times for account queries by chatting with the
caller, but now that the customer has direct access to those same systems, any delays become much more obvious. Many studies have demonstrated the link between response time and customer satisfaction for online B2C applications, and when your nearest competitor is only a click away, keeping customers happy is
paramount. Although database performance is just one component of the overall end-to-end response time experienced by the customer, it is often the most critical and variable one.

Reduce resource usage. Running any sort of query against a database generates a load on the database server, and the less efficiently that query runs the more CPU and I/O it will require. These are finite resources for a given server configuration, so the transaction throughput can be limited at peak times
(potentially leading to a response time issue as described above). Tuning a system to reduce the CPU and/or I/O resources required can have a significant impact on the transaction throughput for a given server, potentially allowing expensive upgrades to be avoided or deferred. Reducing resource usage can be
even more important in a mainframe environment, where most customers are on some form of Monthly Licence Charge (MLC) model. With MLC, the amount a System z customer pays depends on the amount of processor resource consumed during the previous period, measured in MSUs (Million Service Units). So, the
monthly cost paid by most customers for their IBM software is directly related to the amount of work their CPs process: the higher the workload, the higher the monthly cost. Tuning the application to reduce CPU costs can generate immediate savings in software licence fee charges.

Database Statistics

The pg_stats_statements module is a great place to start. It simply tracks execution statistics of SQL statements and can be an easy way to find poor performing queries.Once you have this module installed, a system view named pg_stat_statements will be available with all sorts of goodness. Once it has had a chance to collect a good amount of data, look for queries that have relatively high total_time value. Focus on these statements first.

SELECT * FROM pg_stat_statements ORDER BY total_time DESC;

1	SELECT * FROM pg_stat_statements ORDER BY total_time DESC;

Auto_explain

The auto_explain module is also helpful for finding slow queries but has 2 distinct advantages: it logs the actual execution plan and supports logging nested
statements using the log_nested_statements option. Nested statements are those statements that are executed inside a function. If your application uses many functions, auto_explain is invaluable for getting detailed execution plans. The log_min_duration option controls which query execution plans are logged, based on how long they perform. For example, if you set this to 1000, all statements that run longer than 1 second will be logged.

Optimized Index

Index Tuning/Optimization

The Postgres Statistics Collector is a first class subsystem that collects all sorts of performance statistics that are useful.

Turning this collector on gives you tons of pg_stat_… views which contain all the goodness. In particular, I have found it to be particularly useful for finding
missing and unused indexes.

Missing Indexes

Missing indexes can be one of the easiest solutions to increasing query performance. However, they are not a silver bullet and should be used properly (more on that later). If you have The Statistics Collector turned on, you can run the following query.

SELECT relname, seq_scan - idx_scan AS too_much_seq, CASE WHEN  seq_scan - idx_scan > 0
THEN 'Missing Index?'
ELSE
'OK'
END,
pg_relation_size(relname::regclass) AS rel_size, seq_scan, idx_scan FROM pg_stat_all_tables
WHERE schemaname = 'public' AND pg_relation_size(relname::regclass) > 80000
ORDER BY too_much_seq DESC;

SELECT relname, seq_scan - idx_scan AS too_much_seq, CASE WHEN seq_scan - idx_scan > 0

THEN 'Missing Index?'

ELSE

'OK'

END,

pg_relation_size(relname::regclass) AS rel_size, seq_scan, idx_scan FROM pg_stat_all_tables

WHERE schemaname = 'public' AND pg_relation_size(relname::regclass) > 80000

ORDER BY too_much_seq DESC;

This finds tables that have had more Sequential Scans than Index Scans, a telltale sign that an index will usually help. This isn’t going to tell you which columns to create the index on so that will require a bit more work. However, knowing which table(s) need them is a good first step.

Unused Indexes

Index all the things right? Did you know having unused indexes can negatively affect write performance? The reason is, when you create an index, Postgres is burdened with the task of keeping this index updated after write (INSERT / UPDATE / DELETE) operations. So, adding an index is a balancing act because they can speed up reading of data (if created properly) but will slow down write operations. To find unused indexes you can run the following command.

SELECT indexrelid::regclass as index, relid::regclass as table, 'DROP INDEX ' || indexrelid::regclass || ';' as drop_statement
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0 AND indisunique is false;

SELECT indexrelid::regclass as index, relid::regclass as table, 'DROP INDEX ' || indexrelid::regclass || ';' as drop_statement

FROM pg_stat_user_indexes

JOIN pg_index USING (indexrelid)

WHERE idx_scan = 0 AND indisunique is false;

Database Tutorials MSSQL, Oracle, PostgreSQL, MySQL, MariaDB, DB2, Sybase, Teradata, Big Data, NOSQL, MongoDB, Couchbase, Cassandra, Windows, Linux

PG Tuning, Statistics and Indexes

About Vaibhav Krishna

Leave a Reply Cancel reply

Categories