Skip to content
This repository has been archived by the owner on Feb 24, 2020. It is now read-only.

Define notation for complex, serialized queries #54

Closed
totten opened this issue May 14, 2017 · 6 comments
Closed

Define notation for complex, serialized queries #54

totten opened this issue May 14, 2017 · 6 comments

Comments

@totten
Copy link
Member

totten commented May 14, 2017

Example queries that we want to be able to write:

  1. Fetch all activities for housing support cases
  2. Fetch all activities with a blue tag; and return all tags on the activities
  3. Fetch contacts named 'Bob' and all of their blue activities
  4. Get all contacts in a zipcode and return their Home or Work email addresses
  5. Fetch all activities where Bob is the assignee or source
  6. Get all contacts which (a) have an address in zipcode 94117 or 94118 or in city "San Francisco","LA" and (b) are not deceased and (c) have a custom-field "most_important_issue=Environment".
  7. Get participants who attended CiviCon 2012 but not CiviCon 2013. Return their name and email.

We can relate/learn from some examples:

  • APIv3 chaining can do some of this, but it's decidedly non-performant, and some of the joining semantics are invisible/magic.
  • GraphQL can do some of this, but it doesn't get into filtering notations.
mickadoo added a commit that referenced this issue Jul 5, 2017
mickadoo added a commit that referenced this issue Jul 5, 2017
mickadoo added a commit that referenced this issue Jul 6, 2017
@mickadoo
Copy link
Collaborator

@totten @colemanw

I'm making my way through each case outlined in the description here. I have a question, about the expected result.

Given a database with two activities:

  1. Has tags "red" and "blue"
  2. Has tag "red"

The request

    $results = Activity::get()
      ->setCheckPermissions(FALSE)
      ->addSelect('tags.name')
      ->addWhere('tags.name', '=', 'blue')
      ->execute();

should return the activity with the blue tag, which is working now.

However would you expect the red tag to appear in the results for activity 1? My gut feeling is yes, but right now I'm recycling the main query in post processing to select related entities. This means that the result only has the blue tag because of the WHERE clause in the query.

API result:

[
  {
    "id": "1",
    "subject": "Test Housing Support Activity",
    "tags": [
      {
        "name": "blue",
        "id": "1"
      }
    ],
    "activity_type": {
      "label": "Housing Support",
      "id": "4"
    }
  }
]

I guess it should show all tags for this Activity in the result unless you think it's a valid use case of "show me all activities with the blue tag and only show me the blue tag" or "show me all contact with a work email and show me only the work email in the results"

@totten
Copy link
Member Author

totten commented Jul 12, 2017

However would you expect the red tag to appear in the results for activity 1? My gut feeling is yes, but right now I'm recycling the main query in post processing to select related entities.

I agree - all tags (including red) should be part of the returned dataset for the Activity. The addSelect() and addWhere() are conceptually independent.

re: recycling the first/main/filtering query for use in the second/data-loading/post-processing query. I think this issue goes away if you do not recycle the query verbatim. Instead, take the list of matching Activity IDs and feed that into the second query (WHERE entity_id In (...list of matching activity IDs...)).

One might feel some hesitation about passing through the list of activity IDs. I think that's misplaced for a couple reasons:

  • The recycled query is more complex (including things like joins, multiple wheres, limits, offsets). The recycled query is liable to be less-optimized (i.e. no viable indices), and it's more liable to have accidental conflicts (e.g. unintended consequences of diff table-aliases/column-names/joins).
  • The ID query is simpler and more likely to rely on indexed data. The pagination is already accounted for (i.e. the main query was paginated and only returned $X records, so we only have a limited# of activity IDs).

@mickadoo
Copy link
Collaborator

mickadoo commented Jul 12, 2017 via email

@mickadoo
Copy link
Collaborator

@totten I think one of the tests you listed conflicts with what we just talked about:

Fetch contacts named 'Bob' and all of their blue activities

I've just got it working so the WHERE and SELECT are independent, but this test case seems like it would be a request like

    $result = Contact::get()
      ->setCheckPermissions(FALSE)
      ->addSelect('first_name')
      ->addSelect('last_name')
      ->addSelect('source_activities.subject')
      ->addWhere('first_name', '=', 'Bob')
      ->addWhere('source_activities.tags.name', '=', 'blue')
      ->execute();

Given a database with 3 contacts, 3 activities, 2 of which Bob Created, of which only one has a blue tag this is what is returned:

[
  {
    "id": "3",
    "first_name": "Bob",
    "last_name": "Bobberson",
    "source_activities": [
      {
        "subject": "Test Housing Support Activity (by Bob)",
        "id": "1",
        "tags": [
          {
            "name": "blue",
            "id": "1"
          },
          {
            "name": "red",
            "id": "2"
          }
        ]
      },
      {
        "subject": "Another Bob's Activity",
        "id": "2",
        "tags": [
          {
            "name": "red",
            "id": "2"
          },
          {
            "name": "green",
            "id": "3"
          }
        ]
      }
    ]
  }
]

As you can see both of Bob's activities are returned, which is what we you said would be expected in the previous comment. Which behavior do you think is valid?

@mickadoo
Copy link
Collaborator

@totten another test from the description:

Get all contacts in a zipcode and return their Home or Work email addresses

Works so far to fetch contacts in a zipcode, but again this looks like a WHERE clause on a related entity which I thought we were trying to avoid.

Suggest changing it to

Get all contacts in a zipcode and return their email addresses.

@mickadoo
Copy link
Collaborator

The final three tests are related to OR/NOT operators

  • Fetch all activities where Bob is the assignee or source
  • Get all contacts which (a) have an address in zipcode 94117 or 94118 or in city "San Francisco","LA" and (b) are not deceased and (c) have a custom-field "most_important_issue=Environment".
  • Get participants who attended CiviCon 2012 but not CiviCon 2013. Return their name and email.

Whether we want to allow these kinds of queries is under discussion in #41 and #43. I'd like to be sure this is the road we want to go before working on this. I've got 2/3 of the tests working, but by using two requests (example) instead of a single more powerful request.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants