Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISTINCT in subquery prevents Db2 from selecting optimal execution plan #1327

Closed
punktilious opened this issue Jul 9, 2020 · 4 comments · Fixed by #1330
Closed

DISTINCT in subquery prevents Db2 from selecting optimal execution plan #1327

punktilious opened this issue Jul 9, 2020 · 4 comments · Fixed by #1330
Assignees
Labels
bug Something isn't working performance performance
Milestone

Comments

@punktilious
Copy link
Collaborator

Describe the bug
Search request including tokens generates the following SQL:

SELECT R.RESOURCE_ID, R.LOGICAL_RESOURCE_ID, R.VERSION_ID, R.LAST_UPDATED, R.IS_DELETED,
       R.DATA, LR.LOGICAL_ID
  FROM Observation_LOGICAL_RESOURCES LR 
  JOIN Observation_RESOURCES R 
    ON R.LOGICAL_RESOURCE_ID=LR.LOGICAL_RESOURCE_ID
   AND R.RESOURCE_ID = LR.CURRENT_RESOURCE_ID
   AND R.IS_DELETED <> 'Y'
  JOIN (SELECT DISTINCT LOGICAL_RESOURCE_ID
          FROM Observation_TOKEN_VALUES  
         WHERE (PARAMETER_NAME_ID=1191 AND ((TOKEN_VALUE = ?)))
       ) Observation0
    ON Observation0.LOGICAL_RESOURCE_ID = R.LOGICAL_RESOURCE_ID
  JOIN (SELECT DISTINCT LOGICAL_RESOURCE_ID
          FROM Observation_STR_VALUES  
         WHERE (PARAMETER_NAME_ID=1396 AND (STR_VALUE = ?))
       ) Observation1 
    ON Observation1.LOGICAL_RESOURCE_ID = R.LOGICAL_RESOURCE_ID
  JOIN (SELECT DISTINCT LOGICAL_RESOURCE_ID
          FROM Observation_TOKEN_VALUES 
         WHERE (PARAMETER_NAME_ID=1008 AND ((TOKEN_VALUE = ? AND CODE_SYSTEM_ID = ?)))
       ) Observation2 
    ON Observation2.LOGICAL_RESOURCE_ID = R.LOGICAL_RESOURCE_ID
ORDER BY RESOURCE_ID ASC  LIMIT 10 OFFSET 0

Several of the sub-selects use SELECT DISTINCT. This prevents the optimizer from pushing down join predicates resulting in a less-than-optimal execution plan.

To Reproduce
Performance test with search requests.

Expected behavior
Rewrite the query to permit a more efficient execution plan.

Additional context
N/A

@punktilious punktilious added the bug Something isn't working label Jul 9, 2020
@punktilious punktilious self-assigned this Jul 9, 2020
@punktilious punktilious added the performance performance label Jul 9, 2020
@albertwang-ibm
Copy link
Contributor

Can not remember if the distinct was added to avoid the duplication issue ...

@prb112
Copy link
Contributor

prb112 commented Jul 9, 2020

Can not remember if the distinct was added to avoid the duplication issue ...

It was, Robin has an alternative.

@punktilious
Copy link
Collaborator Author

The new EXISTS queries follow a similar form to this:

SELECT R.RESOURCE_ID,MIN(S1.QUANTITY_VALUE)  
  FROM Observation_LOGICAL_RESOURCES LR 
  JOIN Observation_RESOURCES R 
    ON R.LOGICAL_RESOURCE_ID=LR.LOGICAL_RESOURCE_ID 
   AND R.RESOURCE_ID = LR.CURRENT_RESOURCE_ID 
   AND R.IS_DELETED <> 'Y'  
  AND  EXISTS (SELECT 1 FROM Observation_TOKEN_VALUES  AS V  WHERE (PARAMETER_NAME_ID=1392 AND ((TOKEN_VALUE = 'corrected'))) AND V.LOGICAL_RESOURCE_ID = R.LOGICAL_RESOURCE_ID)  
  AND  EXISTS (SELECT 1 FROM Observation_TOKEN_VALUES  AS V  WHERE (PARAMETER_NAME_ID=1073 AND ((TOKEN_VALUE = '1fEdfCQ1W8'))) AND V.LOGICAL_RESOURCE_ID = R.LOGICAL_RESOURCE_ID)  
  LEFT OUTER JOIN Observation_QUANTITY_VALUES S1 
               ON (S1.PARAMETER_NAME_ID=1093 AND S1.LOGICAL_RESOURCE_ID = R.LOGICAL_RESOURCE_ID)  
  GROUP BY R.RESOURCE_ID  ORDER BY MIN(S1.QUANTITY_VALUE) ASC NULLS LAST OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY;

  • Using fake test data to replace the parameter markers.

The EXISTS sub-queries are correlated with the main join using the predicate AND V.LOGICAL_RESOURCE_ID = R.LOGICAL_RESOURCE_ID. The optimizer can produce an execution plan minimizing the number of logical IOs because it can use the higher selectivity filters first, then use that to access parameters with lower selectivity (e.g. a _tag property) using an index that includes the resource_id.

punktilious added a commit that referenced this issue Jul 16, 2020
…execution plans

Signed-off-by: Robin Arnold <robin.arnold23@ibm.com>
punktilious added a commit that referenced this issue Jul 16, 2020
…ter order

Signed-off-by: Robin Arnold <robin.arnold23@ibm.com>
punktilious added a commit that referenced this issue Jul 16, 2020
Signed-off-by: Robin Arnold <robin.arnold23@ibm.com>
punktilious added a commit that referenced this issue Jul 16, 2020
Issue #1327 Improve execution plans for search
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working performance performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants