PostgreSQL and Databases in general: Performance issues in PostgreSQL

PostgreSQL has supported hash indexes for a long time, but they are not much used in production, mainly because they are not durable (changes to them are not WAL-logged). With the next version of PostgreSQL, they will be durable. The immediate question is how they perform compared to Btree indexes, and a lot of work has been done in the coming version to make them faster. There are multiple ways to compare the performance of hash and Btree indexes, such as the time taken to create the index, to search it, or to insert into it. This blog will mainly focus on the search operation. In theory, a hash index lookup is O(1) and a Btree lookup is O(log n); however, with duplicates that is not exactly true, because all duplicates of a key land in the same bucket and the whole bucket, including its overflow pages, has to be scanned.

To start with, let us see the impact of the work being done to improve the performance of hash indexes. Below is the performance data of the pgbench read-only workload, comparing hash indexes in 9.6 and HEAD on an IBM POWER-8 machine with 24 cores, 192 hardware threads and 492GB RAM.

The workload is such that all the data fits in shared buffers (the scale factor is 300, about 4.5GB, and shared_buffers is 8GB). As the above graph shows, performance increased at all client counts, by 7% to 81%, and the impact is more pronounced at higher client counts. The main work that led to this improvement is 6d46f478 (Improve hash index bucket split behavior.) and 293e24e5 (Cache hash index's metapage in rel->rd_amcache.).
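A quick way to check that a data set fits in shared buffers, as it does in the setup above, is to compare the database size with the shared_buffers setting. This is a minimal sketch, assuming it is run while connected to the pgbench database on PostgreSQL 9.6 or later (pg_size_bytes() was added in 9.6); the database size is only a rough proxy for the buffers the workload actually touches.

```sql
-- Compare the size of the current database with the configured buffer pool.
SELECT current_setting('shared_buffers') AS shared_buffers,
       pg_size_pretty(pg_database_size(current_database())) AS database_size,
       pg_database_size(current_database())
         < pg_size_bytes(current_setting('shared_buffers')) AS fits_in_shared_buffers;
```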

The first commit, 6d46f478, replaced the heavyweight locks (locks used on logical database objects to ensure the database's ACID properties) with lightweight locks (locks that protect shared data structures) for scanning the bucket pages. In general, acquiring a heavyweight lock is costlier than acquiring a lightweight lock. In addition to reducing the locking cost, this change also avoids locking out scans and inserts for the lifetime of a bucket split.

The second commit, 293e24e5, avoids a significant amount of contention on the metapage. Each search operation needs to access the metapage to find the bucket that contains the tuple being searched for, which leads to high contention around the metapage, and each access to the metapage also has to go through the buffer manager. This work avoids that contention by caching the metapage information in a backend-local cache, which lets searches bypass all of the buffer-manager work and hence the major source of contention in accessing the metapage.
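To see the metapage information that a hash index search consults, one can read the metapage with the pageinspect extension. This is a sketch that assumes PostgreSQL 10 or later (where pageinspect gained hash_metapage_info()), superuser access, and a hypothetical demo table hash_demo. The maxbucket, highmask and lowmask fields are what a search uses to map a hash value to a bucket.

```sql
-- Hypothetical demo table with a hash index on it.
CREATE EXTENSION IF NOT EXISTS pageinspect;
CREATE TABLE hash_demo (id int);
INSERT INTO hash_demo SELECT g FROM generate_series(1, 10000) AS g;
CREATE INDEX hash_demo_id_idx ON hash_demo USING hash (id);

-- Block 0 of a hash index is its metapage; maxbucket, highmask and lowmask
-- determine which bucket a given hash value maps to.
SELECT maxbucket, highmask, lowmask, ntuples
FROM hash_metapage_info(get_raw_page('hash_demo_id_idx', 0));
```

Roughly, the bucket is the hash value ANDed with highmask, falling back to the value ANDed with lowmask if the result exceeds maxbucket. Since these fields change only when buckets are split, a backend can keep a local copy and refresh it when it detects that the copy is stale, which is the idea behind caching the metapage.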

The next graph shows how the hash index performs compared to the btree index. For this run, we replaced the hash index with a btree index in the pgbench read-only test.
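For readers who want to try a similar comparison, one possible way to make the pgbench read-only test (pgbench -S) use a hash index is to replace the btree primary key on pgbench_accounts.aid. This is a sketch that assumes the default tables and constraint names created by pgbench -i, with no foreign keys referencing the primary key; the index names are hypothetical, and it is not necessarily how the runs above were set up. Hash indexes cannot enforce uniqueness, so the primary key constraint is dropped rather than converted.

```sql
-- Replace the btree primary key index on pgbench_accounts with a hash index.
BEGIN;
ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;
CREATE INDEX pgbench_accounts_aid_hash ON pgbench_accounts USING hash (aid);
COMMIT;

-- To go back to a btree index for the comparison run:
-- DROP INDEX pgbench_accounts_aid_hash;
-- CREATE INDEX pgbench_accounts_aid_btree ON pgbench_accounts USING btree (aid);
```

On 9.6 and earlier, creating a hash index emits a warning that hash indexes are not WAL-logged.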

We can see here that the hash index performs better than the btree index, with a difference in the range of 10% to 22%. In some other workloads, such as hash indexes on varchar columns, we have seen even better results, and in the community it has been reported that there is a performance improvement in the range of 40-60% when hash indexes are used for unique index columns.

The important thing to note about the above data is that it covers only some specific workloads, and mainly SELECTs, as that is the main area where performance improvement work has been done for PostgreSQL 10. Other interesting aspects to compare, such as the size of the index and updates to the index, need more study and experiments.

In the end, I would like to thank my colleagues who were directly involved in this work and my employer, EnterpriseDB, who has supported this work. Firstly, I would like to thank Robert Haas, who envisioned all this work and is the committer of this work, and Mithun C Y, who was the author of commit 293e24e5. Also, I would like to extend sincere thanks to all the community members who are involved in this work, especially Jeff Janes and Jesper Pedersen, who have reviewed and tested this work.

Currently, when a PostgreSQL database becomes slow, especially on systems with high load, it is difficult to find the exact reasons. One can use tools like perf, strace, dynamic tracing (http://www.postgresql.org/docs/devel/static/dynamic-trace.html), etc. to find the reasons for a slowdown, but most of the time they are quite inconvenient to use, which led to the development of a new feature that displays wait event information in the pg_stat_activity view. Wait events capture what a server process is blocked on or waiting for: for example, waiting for another backend process to release a heavyweight or lightweight lock, waiting to access a data buffer when no other process can be examining that buffer, or waiting to read or write data to disk. The initial feature covers some of the common wait event types that cause waits in the system, but it is designed so that it can be extended to capture other types of wait events as well.

I will briefly explain the wait event types covered by this feature and then show with examples how one can use it to find stalls or waits in the system. The first wait event type is lightweight lock (LWLockNamed), which is used to protect a particular data structure in shared memory. The second is lightweight lock tranche (LWLockTranche), which indicates that the server process is waiting for one of a group of related lightweight locks. The third is heavyweight lock (Lock), which is used primarily to protect SQL-visible objects such as tables. The fourth is BufferPin, where the server process waits to access a data buffer during a period when no other process can be examining that buffer. For a detailed explanation, refer to the PostgreSQL documentation at http://www.postgresql.org/docs/devel/static/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW
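Under load, a single look at pg_stat_activity is only a snapshot, so it can help to summarize waits by type and repeat the query several times. This is a minimal sketch using only the columns discussed here; in psql, running \watch 1 right after it re-executes it every second. Later releases report more kinds of waits (for example, idle sessions waiting for client input), so the filter may need adjusting there.

```sql
-- Summarize current waits by type and event, most frequent first.
SELECT wait_event_type, wait_event, count(*) AS waiting_backends
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY waiting_backends DESC;
```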

Now, let us use some simple examples to see how to find waits in the system with this powerful tool.

Create a table and insert the data used in the examples below:

postgres=# create table wait_event_tbl(c1 int);
CREATE TABLE
postgres=# insert into wait_event_tbl values(1);
INSERT 0 1

wait event type - Lock (Heavyweight locks)
-------------------------------------------------

Scenario - 1

Let us examine the waits for a scenario where one session has acquired an Access Exclusive lock on a table and another session wants to acquire an Access Share lock on the same table, so it waits for the first session to complete its transaction.

Session-1

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           6088
(1 row)

postgres=# begin;
BEGIN
postgres=# Lock wait_event_tbl in Access Exclusive Mode;
LOCK TABLE

Session-2

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           1152
(1 row)

postgres=# begin;
BEGIN
postgres=# Lock wait_event_tbl in Access Share Mode;

Session-3

postgres=# select pid, wait_event_type, wait_event from pg_stat_activity where wait_event is NOT NULL;
 pid  | wait_event_type | wait_event
------+-----------------+------------
 1152 | Lock            | relation
(1 row)

The output shows that session-2 (pid 1152) is waiting for a heavyweight lock on a relation. To find out more, such as which statement is waiting, one can add the "query" column to the above statement.
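As a sketch of that, the first query below adds the query column, and the second asks pg_locks which relation each not-yet-granted relation lock request is for. In Scenario 1, one would expect the second query to report pid 1152 requesting an AccessShareLock on wait_event_tbl; the exact output depends on the sessions involved.

```sql
-- Waiting sessions together with the statement each one is running.
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;

-- Relation-level lock requests that have not been granted yet.
SELECT pid, mode, relation::regclass AS relation
FROM pg_locks
WHERE locktype = 'relation' AND NOT granted;
```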

Scenario - 2

Three sessions try to update the same row; the first one succeeds and the other two wait.

Session-1

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           6088
(1 row)

postgres=# begin;
BEGIN
postgres=# update wait_event_tbl set c1 = 2 where c1=1;
UPDATE 1

Session-2

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           1152
(1 row)

postgres=# begin;
BEGIN
postgres=# update wait_event_tbl set c1 = 3 where c1 = 1;

Session-3

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           5404
(1 row)

postgres=# begin;
BEGIN
postgres=# update wait_event_tbl set c1 = 4 where c1 = 1;

Session-4

postgres=# select pid, wait_event_type, wait_event from pg_stat_activity where wait_event is NOT NULL;
 pid  | wait_event_type |  wait_event
------+-----------------+---------------
 1152 | Lock            | transactionid
 5404 | Lock            | tuple
(2 rows)

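The output indicates that session-2 (pid 1152) and session-3 (pid 5404) are waiting: session-2 is waiting for session-1's transaction to finish (transactionid), and session-3 is waiting for a lock on the row being updated (tuple).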

To find detailed information about the locks, you can join this information with pg_locks as described at https://wiki.postgresql.org/wiki/Lock_Monitoring, or in some other similar way.
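PostgreSQL 9.6 also adds the function pg_blocking_pids(), which returns the process IDs of the sessions blocking a given backend's lock request and can be a simpler alternative to writing the pg_locks join by hand. This is a minimal sketch; in Scenario 2, session-2 (pid 1152) would be expected to list session-1's pid 6088 as its blocker.

```sql
-- Sessions blocked on a heavyweight lock, and the PIDs blocking them.
SELECT pid,
       wait_event_type,
       wait_event,
       pg_blocking_pids(pid) AS blocked_by,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```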

wait event type - LWLockNamed (Lightweight Locks)
----------------------------------------------------

A session executing an update statement and another session executing a select statement can block each other for a short time.

Session-1

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           1152
(1 row)

postgres=# update wait_event_tbl set c1 = 2;

Session-2

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
           6088
(1 row)

postgres=# select * from wait_event_tbl;

Session-3

postgres=# select pid, wait_event_type, wait_event from pg_stat_activity where wait_event is NOT NULL;
 pid  | wait_event_type |  wait_event
------+-----------------+---------------
 1152 | LWLockNamed     | ProcArrayLock
(1 row)

I created this scenario with the help of a debugger, but it is quite possible to see such wait events when the system is under high load.

One point to note for users who use the "waiting" column of pg_stat_activity to find blocking statements is that they need to change their queries for the next version of PostgreSQL (presumably 9.6), as the waiting column has been removed from pg_stat_activity. This was an intentional decision by the PostgreSQL community, for ease of use and understanding of this feature, especially in future versions.
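For monitoring scripts that used the old boolean waiting column, a rough equivalent is sketched below. It assumes that waiting reflected heavyweight lock waits only, which now appear with wait_event_type 'Lock'; coalesce() keeps the result true or false rather than NULL for sessions that are not waiting.

```sql
-- Approximate replacement for the removed pg_stat_activity.waiting column.
SELECT pid,
       coalesce(wait_event_type = 'Lock', false) AS waiting,
       wait_event_type,
       wait_event,
       query
FROM pg_stat_activity;
```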

This feature has been committed to the PostgreSQL code. For details, refer to commit 53be0b1add7064ca5db3cd884302dfc3268d884e. It took us approximately 9 months to complete this feature. Thanks to all the PostgreSQL community members who gave valuable feedback throughout the development of this feature, and special thanks to Robert Haas and Ildus Kurbangaliev for their tremendous support, both through reviews and by helping write parts of the code. Also, thanks to Alexander Korotkov for his review and inputs for this feature, and last but not least, thanks to Thom Brown for his inputs on the documentation of this feature.

Friday 17 March 2017

Hash indexes are faster than Btree indexes?

Friday 11 March 2016

Troubleshooting waits in PostgreSQL