At least in my experience, what typically happens is that some query which a user believes should have used parallel query does not do so. The user then tries to fix the problem by setting force_parallel_mode=on. In reality, this does not solve the problem at all. It only makes things worse. However, it sometimes gives users the impression that they have solved the problem, which may be why people keep doing it (or maybe we need better documentation -- suggestions welcome). Let's taken an example.
rhaas=# explain select * from pgbench_accounts;
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on pgbench_accounts (cost=0.00..105574.00 rows=4000000 width=97)
(1 row)
rhaas=# set force_parallel_mode = on;
SET
rhaas=# explain select * from pgbench_accounts;
QUERY PLAN
----------------------------------------------------------------------------------
Gather (cost=1000.00..506574.00 rows=4000000 width=97)
Workers Planned: 1
Single Copy: true
-> Seq Scan on pgbench_accounts (cost=0.00..105574.00 rows=4000000 width=97)
(4 rows)
There's a very good reason why the query planner doesn't try to use parallel query here: it won't make the query run faster. Parallel query speeds up queries where the amount of work that must be performed is large compared to the number of output tuples. Here, because there are no joins, no filter conditions, and no aggregates, there is very little work to do per output tuple: the system just needs to fetch each one and send it to the user. Adding parallel workers won't make that any faster. A single process is already very capable of sending tuples to the user as fast as the user can read them. And, even if it isn't, adding more workers won't help. When you do, instead of the main process needing to fetch each tuple from the table, it will need to fetch each tuple from a worker. It's not really doing any less work. And the workers are now using up memory and CPU time, possibly competing with the leader. So adding parallel workers to this plan will actually make it slower.
This is a pretty clear example of a case where you just shouldn't use (or want to use) parallel query, but there are other examples that are less clear-cut. If you encounter one of those cases, you might very reasonably want to convince the planner to try out a parallel plan. After all, the planner can be wrong. Maybe the plan which the planner thinks is cheapest does not use parallel query, but the plan that actually runs the fastest does use parallel query. It can be very valuable to adjust planner settings to try out alternative plans in the situation, but setting force_parallel_mode is not the right way to do it, for a couple of reasons.
First, setting force_parallel_mode will only ever give you 1 parallel worker. Second, setting force_parallel_mode creates a plan where the leader just sits around reading tuples from the worker while the worker does all of the real work. In a regular parallel query plan, all of the processes cooperates, so the leader may help to run the parallel portion of the plan if it's not too busy reading tuples, and the workers also run the parallel portion of the plan. In a plan created by setting force_parallel_mode, this doesn't happen. There's actually a clue right in the plan itself. It says "Single Copy: true". What that means is that only a single copy of the plan beneath the Gather is permitted to be executed. Therefore, the single worker is going to run the plan, and the leader is going to sit around, wait for tuples to show up, and then relay them to the client. This is slower than just having the leader do all the work itself, which is what would have happened if you had not specified force_parallel_mode.
In short, setting force_parallel_mode is useless for the purpose for which most people seem to be trying to use it. It will not make your queries run faster; it will slow them down. The correct way to test out plans that use parallelism more aggressively is to reduce the planner's estimate of how much parallelism costs. By default, parallel_setup_cost=1000 and parallel_tuple_cost=0.1. If you want more of your query plans to run in parallel, reduce these values. Let's see what happens if we try that:
rhaas=# set parallel_setup_cost = 10;
SET
rhaas=# set parallel_tuple_cost = 0.001;
SET
rhaas=# explain select * from pgbench_accounts;
QUERY PLAN
------------------------------------------------------------------------------------------
Gather (cost=10.00..86250.67 rows=4000000 width=97)
Workers Planned: 2
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..82240.67 rows=1666667 width=97)
(3 rows)
In fact, because this query is unsuitable for parallel query, neither approach speeds it up. On my laptop, the first plan (non-parallel) takes 3.26 seconds, the second plan (force_parallel_mode) takes 3.49 seconds, and the third plan (real parallelism) takes 3.55 seconds. Therefore the best option here happens to be what the planner chose originally: no parallelism. Of course, results may vary in other cases, but the thing to remember is that real parallelism is what you want, and force_parallel_mode will not give it to you. If you cannot get real parallelism even after reducing parallel_setup_cost and parallel_tuple_cost, you can also try adjusting min_parallel_table_scan_size and min_parallel_index_scan_size (on 9.6, there is only one setting, and it is called min_parallel_relation_size). If you still can't get parallelism, it might be disabled, or it might not be supported for your query type. See When Can Parallel Query Be Used? in the PostgreSQL documentation for more details.
So, if force_parallel_mode doesn't make queries run faster, what does it do, and when would anyone ever want to use it? I designed force_parallel_mode not as a tool for speeding up queries, but as a tool for finding cases when parallel query doesn't give the right answers. For example, suppose you do this:
rhaas=# create or replace function give10(int) returns void as $$update pgbench_accounts set abalance = abalance + 10 where aid = $1$$ language sql parallel safe;
CREATE FUNCTION
rhaas=# select give10(aid) from pgbench_accounts where aid = 10;
give10
--------
(1 row)
This situation can be a sort of ticking time bomb. It may seem to work for a while, but the planner might later switch to a different plan, and that new plan may use parallel query where the old one did not, and now things will start to fail. In the worst case, this could result in a production outage. If I test with force_parallel_mode, I will find the problem right away:
rhaas=# set force_parallel_mode = on;
SET
rhaas=# select give10(aid) from pgbench_accounts where aid = 10;
ERROR: cannot execute UPDATE during a parallel operation
CONTEXT: SQL function "give10" during startup
parallel worker
Setting force_parallel_mode caused the query planner to place a Single-Copy Gather node on top of the plan it would otherwise have used, which caused give10(int) to execute in the worker, rather than the leader. Then it failed. Now the problem is easy to fix:
rhaas=# alter function give10(int) parallel unsafe;
ALTER FUNCTION
rhaas=# set force_parallel_mode = on;
SET
rhaas=# select give10(aid) from pgbench_accounts where aid = 10;
give10
--------
(1 row)
No comments:
Post a Comment