Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query does not match criteria, skipping PFOCR figure enrichment #837

Closed
AlexanderPico opened this issue Jul 24, 2024 · 9 comments
Closed
Assignees
Labels
bug Something isn't working On Test -> Prod

Comments

@AlexanderPico
Copy link
Collaborator

Getting this log message for queries that I would expect to see results: Query does not match criteria, skipping PFOCR figure enrichment.

Jackson noted:

I just double checked the code, and it doesn't look like it's checking for equivalent curies, just the primary curie..

Here are a few example queries:

See this gsheet for more details and examples: https://docs.google.com/spreadsheets/d/1-7pbdfiULj5PZ-eZcZDVW8ljK0KV1utyiXGW1Fi6pwc/edit?gid=0#gid=0

@AlexanderPico AlexanderPico added the bug Something isn't working label Jul 24, 2024
@tokebe
Copy link
Member

tokebe commented Jul 24, 2024

To fix this, we need to rework how BTE checks if a result is eligible for enrichment. Previously, we checked on the basis that at least 2 QNodes are bound NCBIGenes by some result.

Instead, we want to go through each result and retrieve all nodes used to support it, whether bound to the QGraph or as part of a support graph (which means we'll have to recursively traverse results for QNodes), and then check if at least 2 of those nodes are NCBIGenes. This implementation should be CURIE-prefix agnostic, meaning that we'll want to first check node primary ID for a match, and then the equivalent IDs if the primary doesn't match.

Tracking of curieCombinations and trapiResultToCurieSet in the current implementation means that we don't need to fundamentally change much, just how curieCombinations is generated (we can also skip generating & checking matchableQNodeIDs as it'll be computationally simpler to just check that curieCombinations has some size > 0). Some additional simplification/optimization on the enrichment code might be worth doing, though.

This enrichment already states its matches as curies rather than QNodes, so no structural changes to the data added to results should be necessary.

@tokebe
Copy link
Member

tokebe commented Jul 25, 2024

Also using this issue to track #838 (comment) Part 1, we should provide both figureUrl and pfocrUrl.

@colleenXu
Copy link
Collaborator

FYI: the "aux-graph" problem was noted last year as a potential issue in #613 (comment)

@AlexanderPico
Copy link
Collaborator Author

Actually, what we want here is to gather all the result nodes in a TRAPI response and perform the enrichment across all of them as a single test (not each result separately).

This is a dramatic change to the current system, so perhaps should not be pursued, especially given the other parallel efforts to perform enrichment at the ARS level.

Ok to close this ticket?

@tokebe
Copy link
Member

tokebe commented Aug 2, 2024

Note from internal discussion: The above examples are not actually addressed by the changes being tracked by this issue (those being nested support graph behavior and pfocrUrl addition), this is fine.

A new issue will be made to track adding new ID types to enrichment behavior.

@colleenXu
Copy link
Collaborator

I found a case where PFOCR result augmentation is consistently not happening, because the PFOCR queries are failing (error status code 500, connection timeout??).
A side note: I'm running with INSTANCE_ENV = CI, but it's using the "prod" url https://biothings.ncats.io/pfocr/query.

Console logs: permethrin_increases_consoleLogs.txt
Response: permethrin_increases_gene_forConsoleLogs.json.zip

Examples of what's in logs

Response: [Template-2]: Error getting PFOCR figures, results will not be enriched.

Console:

bte:biothings-explorer-trapi:pfocr Error in scrolling request AxiosError: Request failed with status code 500

    data: {
      code: 500,
      success: false,
      error: 'Internal Server Error',
      details: "'ConnectionTimeout' object has no attribute 'error'"
    }

Query I ran

from an example in #842 (comment)

{
    "message": {
        "query_graph": {
            "nodes": {
                "start": {
                    "categories": ["biolink:ChemicalEntity"],
                    "ids": ["CHEBI:34911"]
                },
                "end": {
                    "categories": ["biolink:Gene"]
                }
            },
            "edges": {
                "t_edge": {
                    "subject": "start",
                    "object": "end",
                    "predicates": ["biolink:affects"],
                    "knowledge_type": "inferred",
                    "qualifier_constraints": [
                        {
                            "qualifier_set": [
                                {
                                    "qualifier_type_id": "biolink:qualified_predicate",
                                    "qualifier_value": "biolink:causes"
                                },
                                {
                                    "qualifier_type_id": "biolink:object_aspect_qualifier",
                                    "qualifier_value": "activity_or_abundance"
                                },
                                {
                                    "qualifier_type_id": "biolink:object_direction_qualifier",
                                    "qualifier_value": "increased"
                                }
                            ]
                        }
                    ]
                }
            }
        }
    }
}


Note: this commit addresses bug where these PFOCR query failures were causing BTE to drop all results for the associated template. biothings/bte_trapi_query_graph_handler@9fad32d

@colleenXu
Copy link
Collaborator

colleenXu commented Aug 11, 2024

Notes:

  • With biothings/bte_trapi_query_graph_handler@2d754fe:
    • BTE should be using instance-specific server urls - @tokebe I can't see this from the console logs though.
    • BTE is now handling these errors with better logs and sending details to Sentry
  • Now it looks like this when PFOCR queries time out/throw errors, which makes it clearer that the issue is that queries aren't returning in 15 s
  bte:biothings-explorer-trapi:pfocr Getting PFOCR figure data +0ms
  bte:biothings-explorer-trapi:pfocr Batch window 0-1000: 2190 hits retrieved for PFOCR figure data +14s
  bte:biothings-explorer-trapi:pfocr Error in scrolling request window 0-1000, error is timeout of 15000ms exceeded +917ms
Sentry Logger [log]: [Tracing] Finishing '< unknown op >' span on transaction 'EXEC /v1/query' (a8b463c87126455a).
  bte:biothings-explorer-trapi:pfocr Error in scrolling request window 0-1000, error is timeout of 15000ms exceeded +1ms
  bte:biothings-explorer-trapi:pfocr Error in scrolling request window 0-1000, error is timeout of 15000ms exceeded +0ms

See #847 for ideas on changing result augmentation.

@tokebe
Copy link
Member

tokebe commented Aug 12, 2024

BTE should be using instance-specific server urls - @tokebe I can't see this from the console logs though.

There never was a log that declared what server enrichment was using, and it's kinda unnecessary -- updated server selection works on the instance_env environment variable same as previous updates to nodenorm, etc., which is well-tested behavior.

@tokebe tokebe added On CI -> Test On Test Related changes are deployed to Test server and removed On CI Related changes are deployed to CI server On CI -> Test labels Aug 22, 2024
@colleenXu colleenXu added On Test -> Prod and removed On Test Related changes are deployed to Test server labels Aug 28, 2024
@tokebe
Copy link
Member

tokebe commented Sep 3, 2024

Relevant changes deployed to Prod. Tracking additional work in #847

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working On Test -> Prod
Projects
None yet
Development

No branches or pull requests

3 participants