[Joins] Join Query DSL #15450

Open
Tracked by #15451
harshavamsi opened this issue Aug 27, 2024 · 2 comments
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Search:Query Capabilities v2.19.0 Issues and PRs related to version 2.19.0

Comments

@harshavamsi
Contributor

Is your feature request related to a problem? Please describe

Following up on #15185, we want to introduce the join DSL format that will be used to construct join queries. It will use the existing QueryBuilders in OpenSearch to parse the left and right queries. We will add new logic to SearchSourceBuilder to support the new join field in the query DSL.

Describe the solution you'd like

The join field will be parsed by a new JoinBuilder in OpenSearch that takes the following:

  • left query (just "query" from the builder's perspective) - read from StreamInput and parsed into a Lucene query
  • fields - left query fields to broadcast back
  • join - XContent object containing
    • right_query
      • index - right index to perform the join against
      • query - right Lucene query
      • fields - right query fields to join on
    • type - type of join to perform (inner, outer, cross, left_join, right_join)
    • algorithm - join algorithm to use (hash_join, nested_join); we might support only one to start with
    • condition - the join condition to evaluate while joining
      • left_field - left index field to use while evaluating
      • right_field - right index field to use while evaluating
      • comparator - the operator to use for the condition (<, <=, >, >=, =). Open question: should this be just text?
    • fields - right query fields to broadcast back
    • aggs - aggregations to perform while joining. Open question: should this be outside the join clause?
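
To make the shape of the join clause concrete, here is a minimal Python sketch of how it could be modeled and validated. It is not OpenSearch code. The names JoinSpec, RightQuery, JoinCondition, and parse_join are hypothetical, and the sketch assumes the layout used in the full query DSL (type, algorithm, condition, fields, and aggs sit next to right_query inside join). The allowed values are copied from the list above; treating condition as always required is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Allowed values copied from the proposal's bullet list.
JOIN_TYPES = {"inner", "outer", "cross", "left_join", "right_join"}
ALGORITHMS = {"hash_join", "nested_join"}
COMPARATORS = {"<", "<=", ">", ">=", "="}


@dataclass
class JoinCondition:
    left_field: str
    right_field: str
    comparator: str = "="


@dataclass
class RightQuery:
    index: str
    query: dict[str, Any]
    fields: list[str] = field(default_factory=list)


@dataclass
class JoinSpec:
    right_query: RightQuery
    type: str
    condition: JoinCondition  # assumed mandatory in this sketch
    algorithm: Optional[str] = None  # marked optional in the proposal
    fields: list[str] = field(default_factory=list)
    aggs: dict[str, Any] = field(default_factory=dict)


def parse_join(body: dict[str, Any]) -> JoinSpec:
    """Hypothetical validator for the "join" object; illustrative only."""
    spec = JoinSpec(
        right_query=RightQuery(**body["right_query"]),
        type=body["type"],
        condition=JoinCondition(**body["condition"]),
        algorithm=body.get("algorithm"),
        fields=body.get("fields", []),
        aggs=body.get("aggs", {}),
    )
    if spec.type not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {spec.type!r}")
    if spec.algorithm is not None and spec.algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported join algorithm: {spec.algorithm!r}")
    if spec.condition.comparator not in COMPARATORS:
        raise ValueError(f"unsupported comparator: {spec.condition.comparator!r}")
    return spec
```

Validating the enumerated values up front is one way to settle the "should this be just text?" question: the comparator stays a plain string on the wire but is rejected early if it is not one of the listed operators.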

Full query DSL

{  
  "query": {  
    "bool": {  
      "filter": [  
        {  
          "range": {  
            "@timestamp": {  
              "gte": "now-1h"  
            }  
          }  
        },  
        {  
          "match": {  
            "message": "error"  
          }  
        }  
      ]  
    }  
  },  
  "fields": ["instance_id", "status_code"],  
  "join": {  
    "right_query": {
      "index": "instance_details",
      "query": {
        "range": {
          "created_at": {
            "gte": "now-1y"
          }
        }
      },
      "fields": ["instance_id", "region"]
    },
    "type": "inner",
    "algorithm": "hash_join", // optional
    "condition": {
      "left_field": "instance_id",
      "right_field": "instance_id",
      "comparator": "="
    },
    "fields": ["region", "status_code"],
    "aggs": {  
      "by_region": {  
        "terms": {  
          "field": "region"  
        },  
        "aggs": {  
          "by_status_code": {  
            "terms": {  
              "field": "status_code"  
            },  
            "aggs": {  
              "status_code_count": {  
                "value_count": {  
                  "field": "status_code"  
                }  
              }  
            }  
          }  
        }  
      }  
    }  
  }  
}
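
The issue does not spell out execution semantics, so the following is only a sketch of what an inner hash_join with the "=" comparator could compute for the query above, written in plain Python over in-memory rows. The hash_join function and the sample documents are invented for illustration; the field names come from the DSL example. The final Counter stands in for the nested terms and value_count aggregations.

```python
from collections import Counter, defaultdict


def hash_join(left_rows, right_rows, left_field, right_field, fields):
    """Inner equi-join: build a hash table on the right side, probe with the left."""
    table = defaultdict(list)
    for row in right_rows:
        if right_field in row:
            table[row[right_field]].append(row)

    joined = []
    for left in left_rows:
        if left_field not in left:
            continue  # rows without the join key cannot match
        for right in table.get(left[left_field], []):
            merged = {**right, **left}  # left values win on name clashes
            joined.append({name: merged[name] for name in fields if name in merged})
    return joined


# Made-up hits standing in for the left and right query results.
left_hits = [
    {"instance_id": "i-1", "status_code": 500},
    {"instance_id": "i-2", "status_code": 503},
    {"instance_id": "i-1", "status_code": 500},
    {"instance_id": "i-9", "status_code": 500},  # no match on the right side
]
right_hits = [
    {"instance_id": "i-1", "region": "us-east-1"},
    {"instance_id": "i-2", "region": "us-west-2"},
]

rows = hash_join(left_hits, right_hits, "instance_id", "instance_id",
                 ["region", "status_code"])

# Rough equivalent of by_region -> by_status_code -> status_code_count.
counts = Counter((row["region"], row["status_code"]) for row in rows)
```

Building the hash table on the right side assumes the right result set is the smaller one; a real implementation would presumably pick the build side based on size estimates.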

Related component

Search:Query Capabilities

Describe alternatives you've considered

No response

Additional context

No response

@harshavamsi harshavamsi added enhancement Enhancement or improvement to existing feature or request untriaged labels Aug 27, 2024
@harshavamsi harshavamsi added the v2.18.0 Issues and PRs related to version 2.18.0 label Aug 27, 2024
@harshavamsi harshavamsi self-assigned this Aug 27, 2024
@harshavamsi harshavamsi changed the title [RFC] [DRAFT] Join Query DSL [Feature] Join Query DSL Aug 27, 2024
@harshavamsi harshavamsi changed the title [Feature] Join Query DSL [Joins] Join Query DSL Aug 27, 2024
@smacrakis

Small comment: why do we speak of the left and right queries? In SQL, the left and right objects are normally called "tables". The result sets to join may be defined by table names or by subqueries. The usual equivalent of "table" in OpenSearch is "index", but that term is so overloaded that it's best avoided. Wouldn't it be clearest for people who are familiar with SQL to use the standard SQL terminology, namely tables?

@harshavamsi harshavamsi moved this from Todo to Now (This Quarter) in Performance Roadmap Sep 9, 2024
@harshavamsi harshavamsi moved this from Now (This Quarter) to In Progress in Performance Roadmap Sep 9, 2024
@bowenlan-amzn
Member

I am working on this now as part of the join request response workflow.

@bowenlan-amzn bowenlan-amzn self-assigned this Oct 8, 2024
@sandeshkr419 sandeshkr419 added v2.19.0 Issues and PRs related to version 2.19.0 and removed v2.18.0 Issues and PRs related to version 2.18.0 labels Nov 6, 2024
@harshavamsi harshavamsi moved this from In Progress to Todo in Performance Roadmap Nov 18, 2024
@bowenlan-amzn bowenlan-amzn moved this from Todo to Untriaged in Performance Roadmap Jan 6, 2025
Projects
Status: Untriaged
Status: 🆕 New
Development

No branches or pull requests

4 participants