Why Multiple JOINs are bad for Query or Do Not Get in the Way of Optimizer

Total: 1 Average: 5

Recently, I came across an application that generated DB queries. I understand that there is nothing new about that, but when application began running slow and I had to find out the reason of the slowdown, I was amazed to find these queries. Here is what SQL Server sometimes has to deal with:

SELECT COUNT(DISTINCT "pr"."id") FROM  ((((((((((((((((("SomeTable" "pr"
LEFT OUTER JOIN "SomeTable1698" "uf_pr_id_698" ON "uf_pr_id_698"."request" = "pr"."id") 
LEFT OUTER JOIN "SomeTable1700" "ufref3737_i2" ON "ufref3737_i2"."request" = "pr"."id") 
LEFT OUTER JOIN "SomeTable1666" "x0" ON "x0"."request" = "ufref3737_i2"."f6_callerper")
LEFT OUTER JOIN "SomeTable1666" "uf_ufref4646_i3_f58__666" ON "uf_ufref4646_i3_f58__666"."request" = "ufref3737_i2"."f58_")
LEFT OUTER JOIN "SomeTable1694" "x1" ON "x1"."request" = "ufref3737_i2"."f38_servicep")
LEFT OUTER JOIN "SomeTable3754" "ufref3754_i12" ON "pr"."id" = "ufref3754_i12"."request")
LEFT OUTER JOIN "SomeTable1698" "uf_ufref3754_i12_reference_698" ON "uf_ufref3754_i12_reference_698"."request" = "ufref3754_i12"."reference")
LEFT OUTER JOIN "SomeTable1698" "x2" ON "x2"."request" = "ufref3737_i2"."f34_parentse")
LEFT OUTER JOIN "SomeTable4128" "ufref3779_4128_i14" ON "ufref3737_i2"."f34_parentse" = "ufref3779_4128_i14"."request")
LEFT OUTER JOIN "SomeTable1859" "x3" ON "x3"."request" = "ufref3779_4128_i14"."reference")
LEFT OUTER JOIN "SomeTable3758" "ufref3758_i15" ON "pr"."id" = "ufref3758_i15"."request")
LEFT OUTER JOIN "SomeTable1698" "uf_ufref3758_i15_reference_698" ON "uf_ufref3758_i15_reference_698"."request" = "ufref3758_i15"."reference")
LEFT OUTER JOIN "SomeTable3758" "ufref3758_i16" ON "pr"."id" = "ufref3758_i16"."request")
LEFT OUTER JOIN "SomeTable4128" "ufref3758_4128_i16" ON "ufref3758_i16"."reference" = "ufref3758_4128_i16"."request")
LEFT OUTER JOIN "SomeTable1859" "x4" ON "x4"."request" = "ufref3758_4128_i16"."reference")
LEFT OUTER JOIN "SomeTable4128" "ufref4128_i17" ON "pr"."id" = "ufref4128_i17"."request")
LEFT OUTER JOIN "SomeTable1859" "uf_ufref4128_i17_reference_859" ON "uf_ufref4128_i17_reference_859"."request" = "ufref4128_i17"."reference")
LEFT OUTER JOIN "SomeTable1666" "uf_ufref4667_i25_f69__666" ON "uf_ufref4667_i25_f69__666"."request" = "uf_pr_id_698"."f69_"
WHERE ("uf_pr_id_698"."f1_applicant" IN (248,169,180,201,203,205,209,215,223,357,371,379,3502,3503,3506,3514,3517,3531,3740,3741)
OR "x0"."f24_useracco" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136)
OR "uf_ufref4646_i3_f58__666"."f24_useracco" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136)
OR "uf_ufref4667_i25_f69__666"."f24_useracco" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136)
OR ("uf_pr_id_698"."f10_status" Is Null OR "uf_pr_id_698"."f10_status" <> 111)  AND "ufref3737_i2"."f96_" = 0   AND (("ufref3737_i2"."f17_source"  Is Null OR "ufref3737_i2"."f17_source"  <> 566425)
AND ("ufref3737_i2"."f17_source"  Is Null OR "ufref3737_i2"."f17_source"  <> 566424)  OR ("uf_pr_id_698"."f10_status" Is Null OR "uf_pr_id_698"."f10_status" <> 56) )
AND ("uf_pr_id_698"."f12_responsi" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136)
OR "x1"."f19_restrict" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136)
OR "uf_ufref3754_i12_reference_698"."f12_responsi" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136)
OR "x2"."f12_responsi" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136)
OR "x3"."f5_responsib" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) 
OR "uf_ufref3758_i15_reference_698"."f12_responsi" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) 
OR "x4"."f5_responsib" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136)
OR "uf_ufref4128_i17_reference_859"."f5_responsib" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136))
AND ("uf_pr_id_698"."f12_responsi"  Is Null OR "uf_pr_id_698"."f12_responsi"  <> 579420)  ) AND "pr"."area" IN (700) AND "pr"."area" IN (700) AND "pr"."deleted_by_user"=0 AND "pr"."temporary" = 0

Names of the objects have been changed.

The most striking thing was that the same table was used multiple times, and the number of parenthesis simply drew me crazy. I was not the only one who didn’t like this code, SQL Server didn’t doesn’t appreciate it as well and spent much resources for building a plan for it. The query may run for 50-150 ms, and plan creation may take up to 2.5 ms. Today, I will not consider the ways to fix the issue, but I will tell one thing – in my case, it was impossible to fix query generation in the application.

Instead, I would like to analyze the reasons why SQL Server builds the query plan for so long. In any DBMS, including SQL Sever, the main optimization issue is the method of joining tables with each other. Besides the join method, the sequence of table joins is very important.

Let’s talk about the sequence of table joins in detail. It is very important to understand that the possible number of table joins grows exponentially, not linearly. Fox example, there are only 2 possible methods to join 2 tables, and the number can reach 12 methods for 3 tables. Different join sequences can have different query cost, and SQL Server optimizer must select the most optimal method. But when the number of tables is high, it becomes a resource-intensive task. If SQL Server begins going over all possible variants, such query may never be executed. That is why, SQL Server never does it and always looks for a quite good plan, not the best plan. SQL Server always tries to reach compromise between execution time and plan quality.

Here is an example of the exponential growth of join methods. SQL Server can select various join methods (left-deep, right-deep, bushy trees). Visually, it looks in the following way:

Exponential growth of join methods

The table below shows the possible join methods when the number of tables increases:

You can get these values on your own:

For left-deep: 5! = 5 x 4 x 3 x 2 x 1 = 120

For bushy tree: (2n–2)!/(n–1)!

Conclusion: Pay special attention to the number of JOINs and do not get in the way with optimizer. If you fail to get the desired result in the query containing multiple JOINs, break it in several small queries and you will be surprised with the result.

P.S. Of course, we must understand that besides defining a sequence of table joins, the query optimizer must also select the join type, the data access method (Scan, Seek) etc.


Useful products:

SQL Completewrite, beautify, refactor your code easily and boost your productivity.

Dmitry Zaytsev

Dmitry Zaytsev

Dmitry started working with SQL Server back in 2010 and how has large experience in various aspects of working with SQL Server, including administration, development, migration, monitoring, performance, etc. Dmitry is the founder of sqlcom.ru