Using Union instead of OR

Sometimes slow queries can be fixed by slightly modifying the query. One such example can be illustrated when multiple values ​​are compared in a WHERE clause using the OR or IN operator. Often an OR can cause a scan of an index or table, which may not be the preferred execution plan in terms of I / O consumption or overall query speed.



Many variables come into play when the query optimizer creates an execution plan. These variables include many hardware characteristics, instance settings, database settings, statistics (table, index, auto-generated), as well as a way to write a query. Here we change the way we write the request. No matter how unexpected this may seem, even if two different queries can return the same results, the path they follow can be completely different depending on the format of the query.



UNION vs OR



For most of my experience with SQL Server, OR is usually less efficient than UNION. What usually happens with OR is that it often causes a scan. This may sometimes be the best way for some cases, and I will leave it as a separate article, but in general, I have found that when a large number of entries are affected, this is the main reason for the slowness. So let's start our comparison.



Here is our OR statement:



SELECT SalesOrderID, * FROM sales.SalesOrderDetail WHERE ProductID = 750 OR ProductID = 953
      
      









From this execution plan, we see that we are scanning 121,000 rows. (You cannot see the number of rows, but it is).



Now we execute the same request, but written using UNION instead of OR:



 SELECT [SalesOrderID], * FROM sales.SalesOrderDetail WHERE ProductID = 750 UNION SELECT [SalesOrderID], * FROM sales.SalesOrderDetail WHERE ProductID = 953
      
      









Here we see two branches of operations. One branch affects 358 lines and the other 346 lines. Both branches are found to perform a concatenation operation that combines both sets of results. We have two separate searches, but we also have a key search to get the required SELECT list. This was not necessary for the scan operation, because we still affected all the lines in the scan operation, so the data was obtained during the scan, not after. This is due to the index and the rows we need, not to UNION or OR. However, I will say that select is also a factor in choosing a search vs scan, but we will ignore this in this article.



Explanation



Why UNION causes more searches instead of scans, because each operation must satisfy a certain selectivity requirement in order to qualify for a search. (Selectivity is the uniqueness of a particular filtered column). OR occurs in a single operation, so when the selectivity for each column is combined and it exceeds a certain percentage, then scanning is considered more efficient.



Since UNION performs a separate operation for each operator by default, the selectivity of each column is not combined, giving it a better chance of performing a search. Now, since UNION performs two operations, they must match their result sets using the concatenation operation described above. This is usually not an expensive operation.



It should also be noted that the OR clause works in the same way as the IN statement.



Hope this tip helps. I believe that this is very valuable when working with systems that require high concurrency.



All Articles