🤾 📷 👨🏻‍💼 クエリの最適化について少し 👂 👶🏾 🛌🏿

簡単な例で、一見非常に単純なクエリを大幅に最適化できる場合があります。たとえば、PostgreSQL 9.3でこのようなコードを取り上げますが、この原則は、ハッシュ結合が存在するすべてのサブドに適用されます。

タスクは単純です-2つのテーブルを結合する-1つは非常に大きく、もう1つは小さい-しかし、結合は単純ではなく、ORで~~ゴールデン~~です。（実際の場合-借方と貸方の場合、勘定科目に2つのフィールドがあるため、勘定科目の転記テーブルを勘定科目自体に結合します。）

テストデータを準備します。

create table public.test1 ( id bigint, id2 bigint, value varchar(100) ); create table public.test2 ( id bigint, value varchar(100) ) ; insert into public.test1 select generate_series(1,2000000), 1, 'abcdef'; insert into public.test2 select generate_series(1,100), 'abcdefssdf'; create index ix_test2_id on public.test2 (id); /*         ,       ,        . */

そして、ここにorの条件での元の形式でのリクエストがあります。

 select * from public.test1 t1 inner join public.test2 t2 on t1.id = t2.id or t1.id2 = t2.id;

この条件での結合は、保証されたネストされたループを受け取ります。計画は次のとおりです。

 "Nested Loop (cost=0.00..3532741.25 rows=2000099 width=42) (actual time=0.043..61627.189 rows=2000099 loops=1)" " Join Filter: ((t1.id = t2.id) OR (t1.id2 = t2.id))" " Rows Removed by Join Filter: 197999901" " -> Seq Scan on test1 t1 (cost=0.00..32739.00 rows=2000000 width=23) (actual time=0.010..385.658 rows=2000000 loops=1)" " -> Materialize (cost=0.00..2.50 rows=100 width=19) (actual time=0.000..0.007 rows=100 loops=2000000)" " -> Seq Scan on test2 t2 (cost=0.00..2.00 rows=100 width=19) (actual time=0.005..0.018 rows=100 loops=1)" "Total runtime: 61717.751 ms"

小さなテーブルを200万回通過することに注意してください。これは、ネストされたループのメインのマイナスです。インデックスで200万件の検索があったとしても、それはまだ悪いです。

どうする？

結合が私たちを助けてくれたら-メモリに収まる小さなテーブルのハッシュ+大きなテーブルを1回通過-完璧。しかし、ORを使用する場合、条件は取得しないことです。抜け道があります！

異なるフィールドで2つの結合を作成し、フィルターにORを入れてみましょう。

 select * from public.test1 t1 left join public.test2 t2 on t1.id = t2.id left join public.test2 t3 on t1.id2 = t3.id where t2.id is not null or t3.id is not null;

計画はずっと速くなりました。 5kbのハッシュテーブルが必要です。2つの結合があることを考慮しても、10回勝ちます！

 "Hash Left Join (cost=6.50..67746.50 rows=2000000 width=61) (actual time=0.124..2230.636 rows=2000000 loops=1)" " Hash Cond: (t1.id2 = t3.id)" " Filter: ((t2.id IS NOT NULL) OR (t3.id IS NOT NULL))" " -> Hash Left Join (cost=3.25..40243.25 rows=2000000 width=42) (actual time=0.073..1065.822 rows=2000000 loops=1)" " Hash Cond: (t1.id = t2.id)" " -> Seq Scan on test1 t1 (cost=0.00..32739.00 rows=2000000 width=23) (actual time=0.012..338.759 rows=2000000 loops=1)" " -> Hash (cost=2.00..2.00 rows=100 width=19) (actual time=0.041..0.041 rows=100 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 5kB" " -> Seq Scan on test2 t2 (cost=0.00..2.00 rows=100 width=19) (actual time=0.004..0.015 rows=100 loops=1)" " -> Hash (cost=2.00..2.00 rows=100 width=19) (actual time=0.039..0.039 rows=100 loops=1)" " Buckets: 1024 Batches: 1 Memory Usage: 5kB" " -> Seq Scan on test2 t3 (cost=0.00..2.00 rows=100 width=19) (actual time=0.004..0.016 rows=100 loops=1)" "Total runtime: 2318.009 ms"

ご清聴ありがとうございました。

クエリの最適化について少し

More articles: