Thursday, September 23, 2010

Jolly times with the PostgreSQL optimizer and EXPLAIN ANALYZE

Yesterday one of my REX functions (REX is our REST/SQL mapper tool) suddenly fell off a cliff, performance-wise. With a hint from Nick (who will forevermore be known as "the database whisperer") I was able to narrow the problem down to this particular query in the function REX is calling, and specifically to the family(network)=4 element of the WHERE clause:

SELECT DISTINCT network, sector.sector::text
FROM nfa_k20_network
LEFT JOIN info_site ON info_site.siteid = nfa_k20_network.site_id
LEFT JOIN dev_location ON dev_location.site_id = info_site.id
LEFT JOIN dev_classification ON dev_classification.device_id = dev_location.device_id
LEFT JOIN sector ON sector.id = dev_classification.sector
WHERE sector.sector IS NOT NULL AND family(network) = 4;
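(For context, family() is the built-in PostgreSQL function on inet/cidr values that reports the address family, 4 for IPv4 or 6 for IPv6:)

ripit_2=# SELECT family('192.168.1.1'::inet), family('::1'::inet);
 family | family
--------+--------
      4 |      6
(1 row)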

But I was unable to get the optimizer to generate the same plan on netdbeval/allyourbase as I got on netdbdev. I'm assuming this means the data size in one of the tables tipped the query optimizer over into selecting a less ideal strategy, although I'm not clear on how or why.

The good plan uses a hash join; the bad one uses a nested loop with a join filter.
Here's where things start to run amok:

Good (netdbdev) http://explain.depesz.com/s/ukE

Hash Join  (cost=3185.37..4996.42 rows=1 width=28) (actual time=953.056..1227.372 rows=7379 loops=1)
  Hash Cond: (public.dev_state.device_id = interface.device_id)

Bad (netdbeval) http://explain.depesz.com/s/TI6

Nested Loop  (cost=1851.83..8488.08 rows=1 width=34) (actual time=247.812..404814.965 rows=1465 loops=1)
  Join Filter: (interface.device_id = public.dev_state.device_id)

Wowsers! Check out how the actual time (404814.965 ms, nearly seven minutes) compares to the estimated cost!

Explicitly disabling nested loops in psql before running the query brought it from several minutes back down to about 3 seconds, so we clearly want to get Postgres to use the first strategy.
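(For reference, that was just the session-level planner toggle, which is a handy diagnostic hammer but not something you'd want baked into production:)

ripit_2=# SET enable_nestloop = off;  -- discourage nested-loop joins for this session
ripit_2=# -- re-run the query here, then restore the default:
ripit_2=# RESET enable_nestloop;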

The choice seems to be driven by the estimated row count that ifc_ip will return:

ripit_2=# explain analyze select count(*) from ifc_ip where family(ip)=4;
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=268.85..268.87 rows=1 width=0) (actual time=15.433..15.434 rows=1 loops=1)
   ->  Seq Scan on ifc_ip  (cost=0.00..268.70 rows=61 width=0) (actual time=0.048..9.489 rows=8498 loops=1)
         Filter: (family(ip) = 4)
 Total runtime: 15.481 ms
(4 rows)

Note that the estimated row count is 61, while the actual row count returned is 8498.

I tinkered with a number of settings (work_mem, effective_cache_size) to see if I could drive the query planner back to making better choices, and I also tried analyzing and reindexing the table. The basic problem seems to be how much filtering the query planner expects the "family(ip)=4" clause to do, and I don't see how the pg_statistic table could capture that without adding an index (heck, I don't even see where that mysterious 61 number is coming from).
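(The per-column statistics the planner does have are visible in the pg_stats view; there's an entry for the raw ip column, but nothing describing the family(ip) expression:)

ripit_2=# SELECT n_distinct, most_common_vals FROM pg_stats
ripit_2-# WHERE tablename = 'ifc_ip' AND attname = 'ip';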

A simple and reasonable solution, however, seems to be to rewrite the query as "family(ip) != 6" (the goal of the query is to keep IPv6 addresses out of the result set).
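Presumably the same rewrite carries over to the REX function's query:

-- before
WHERE sector.sector IS NOT NULL AND family(network) = 4;
-- after
WHERE sector.sector IS NOT NULL AND family(network) != 6;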

Doing this produces an estimate much closer to the actual size of the ifc_ip result set (erring on the high side!) and has the desired effect on the overall query.

ripit_2=# explain analyze select count(*) from ifc_ip where family(ip)!=6;
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=299.00..299.01 rows=1 width=0) (actual time=15.380..15.381 rows=1 loops=1)
   ->  Seq Scan on ifc_ip  (cost=0.00..268.70 rows=12119 width=0) (actual time=0.049..9.353 rows=8498 loops=1)
         Filter: (family(ip) <> 6)
 Total runtime: 15.432 ms
(4 rows)
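(A hedged guess at where both estimates come from: when the planner has no statistics for an expression like family(ip), it falls back on hard-coded default selectivities, 0.005 for equality and 1 - 0.005 for inequality. If ifc_ip holds roughly 12,180 rows, a figure I'm inferring from the estimates themselves rather than from pg_class.reltuples, the arithmetic lines up suspiciously well:)

12180 * 0.005       = 60.9    -- rounds to the 61 estimated for family(ip) = 4
12180 * (1 - 0.005) = 12119.1 -- rounds to the 12119 estimated for family(ip) != 6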

The downside, of course, is that I've no idea what might cause the query planner to change its mind in the future about this table, or perhaps about another table lurking in the query. So my sense is that the more robust solution might be to add an index, but I'd like to understand better how the returned-rows estimate is generated before I go adding random indexes (which, after all, aren't free) to tables . . .
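(If I do go that route, the appeal of an expression index is that ANALYZE then gathers statistics on the indexed expression itself, so the planner would have real numbers for family(ip) no matter how the predicate is phrased. A minimal sketch, with an index name of my own invention:)

CREATE INDEX ifc_ip_family_idx ON ifc_ip (family(ip));
ANALYZE ifc_ip;  -- expression indexes get their own statistics at ANALYZE time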
