Define notation for complex, serialized queries #54

totten · 2017-05-14T17:21:54Z

Example queries that we want to be able to write:

Fetch all activities for housing support cases
Fetch all activities with a blue tag; and return all tags on the activities
Fetch contacts named 'Bob' and all of their blue activities
Get all contacts in a zipcode and return their Home or Work email addresses
Fetch all activities where Bob is the assignee or source
Get all contacts which (a) have an address in zipcode 94117 or 94118 or in city "San Francisco","LA" and (b) are not deceased and (c) have a custom-field "most_important_issue=Environment".
Get participants who attended CiviCon 2012 but not CiviCon 2013. Return their name and email.

We can relate/learn from some examples:

APIv3 chaining can do some of this, but it's decidedly non-performant, and some of the joining semantics are invisible/magic.
GraphQL can do some of this, but it doesn't get into filtering notations.

mickadoo · 2017-07-12T16:19:04Z

@totten @colemanw

I'm making my way through each case outlined in the description here. I have a question about the expected result.

Given a database with two activities:

1. Has tags "red" and "blue"
2. Has tag "red"

The request

    $results = Activity::get()
      ->setCheckPermissions(FALSE)
      ->addSelect('tags.name')
      ->addWhere('tags.name', '=', 'blue')
      ->execute();

should return the activity with the blue tag, which is working now.

However, would you expect the red tag to appear in the results for activity 1? My gut feeling is yes, but right now I'm recycling the main query in post-processing to select related entities. This means that the result only has the blue tag, because of the WHERE clause in the query.

API result:

    [
      {
        "id": "1",
        "subject": "Test Housing Support Activity",
        "tags": [
          {
            "name": "blue",
            "id": "1"
          }
        ],
        "activity_type": {
          "label": "Housing Support",
          "id": "4"
        }
      }
    ]

I guess it should show all tags for this Activity in the result, unless you think there is a valid use case for "show me all activities with the blue tag and only show me the blue tag" or "show me all contacts with a work email and show me only the work email in the results".

totten · 2017-07-12T17:51:51Z

> However would you expect the red tag to appear in the results for activity 1? My gut feeling is yes, but right now I'm recycling the main query in post processing to select related entities.

I agree - all tags (including red) should be part of the returned dataset for the Activity. The addSelect() and addWhere() are conceptually independent.
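To illustrate what "conceptually independent" could mean here, the following is a minimal in-memory sketch, not the API's implementation. The data array and the `activitiesWithTag()` helper are hypothetical; they only mirror the two-activity example from the earlier comment.

```php
<?php
// Hypothetical in-memory data: activity ID => list of tag names.
$activityTags = [
  1 => ['red', 'blue'],
  2 => ['red'],
];

/**
 * Return the activities that carry $tagName, each with its full tag list.
 */
function activitiesWithTag(array $activityTags, string $tagName): array {
  $results = [];
  foreach ($activityTags as $id => $tags) {
    // Filtering step: decides only whether the activity is included.
    if (in_array($tagName, $tags, TRUE)) {
      // Selection step: returns every tag, not just the one that matched.
      $results[] = ['id' => $id, 'tags' => $tags];
    }
  }
  return $results;
}

$results = activitiesWithTag($activityTags, 'blue');
echo json_encode($results), "\n";
```

Under these assumptions only activity 1 is returned, and its `tags` list still contains both "red" and "blue".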

re: recycling the first/main/filtering query for use in the second/data-loading/post-processing query. I think this issue goes away if you do not recycle the query verbatim. Instead, take the list of matching Activity IDs and feed that into the second query (WHERE entity_id IN (...list of matching activity IDs...)).

One might feel some hesitation about passing through the list of activity IDs. I think that's misplaced for a couple reasons:

- The recycled query is more complex (it includes things like joins, multiple wheres, limits, offsets). It is liable to be less optimized (i.e. no viable indices), and it's more liable to have accidental conflicts (e.g. unintended consequences of different table aliases, column names, or joins).
- The ID query is simpler and more likely to rely on indexed data. The pagination is already accounted for (i.e. the main query was paginated and only returned $X records, so we only have a limited number of activity IDs).
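A rough sketch of how the second, ID-driven query might be assembled is shown below. It only builds a parameterized SQL string and does not touch a database. The `buildTagQuery()` helper is hypothetical, and the table and column names (`civicrm_entity_tag`, `civicrm_tag`, `entity_table`, `entity_id`, `tag_id`) are assumptions about the schema made for illustration.

```php
<?php
/**
 * Build a parameterized query that loads tags for a known set of activity IDs.
 * Table and column names are assumptions for illustration only.
 *
 * @return array [string|null $sql, int[] $params]
 */
function buildTagQuery(array $activityIds): array {
  if ($activityIds === []) {
    // An empty IN () list is invalid SQL, so signal "nothing to load".
    return [NULL, []];
  }
  // Normalize to a plain list of integers before binding.
  $ids = array_values(array_map('intval', $activityIds));
  // One placeholder per ID keeps the values out of the SQL string.
  $placeholders = implode(', ', array_fill(0, count($ids), '?'));
  $sql = "SELECT et.entity_id AS activity_id, t.id, t.name"
    . " FROM civicrm_entity_tag et"
    . " INNER JOIN civicrm_tag t ON t.id = et.tag_id"
    . " WHERE et.entity_table = 'civicrm_activity'"
    . " AND et.entity_id IN ($placeholders)";
  return [$sql, $ids];
}

// IDs as they might come back from the (already paginated) main query.
[$sql, $params] = buildTagQuery(['1', '7', '12']);
echo $sql, "\n";
```

Because the ID list comes from the paginated main query, the `IN (...)` list stays bounded by the page size, and none of the original query's joins, limits, or offsets carry over.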

mickadoo · 2017-07-12T21:38:09Z

The reason I'm recycling the query is to avoid the expensive automatic join lookups that are required for a new query. It's also handy to have a query that uses the same aliases as the original. I've tested replacing the WHERE part with what you suggested (the IDs returned from the original query), and it seems to work.

It seems a bit dirty to be doing it how I am, by doing a string replace on the SELECT part of the query. I might consider altering the original query, but all the properties are private. Maybe I can do this by creating a new query and copying the required values.

Regarding pagination, I think it could be very confusing if we impose a limit on the nested results. For example, if I do a query for contacts and also select `emails.email`, then I would expect _all_ emails in the result, even if the base result was limited to 25. If we only showed the same number, I would have situations like:

- I want to fetch only the first 5 contacts with all their related emails and phones, but it's impossible
- I'm not sure if the returned contact has 25 emails or it's because of the limit on the contacts result

I know it's bad for performance, but people should know that by doing joins they're going to suffer a performance hit. I'll clean it up tomorrow and hopefully have a PR ready for this soon.
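The pagination behavior argued for in this comment (limit the base entity, never the nested results) could look roughly like the sketch below. It works on hypothetical in-memory arrays, and `getContactsWithEmails()` is an illustrative helper, not part of the API.

```php
<?php
// Hypothetical data: contacts and their emails.
$contacts = [
  ['id' => 1, 'name' => 'Ann'],
  ['id' => 2, 'name' => 'Bob'],
  ['id' => 3, 'name' => 'Cy'],
];
$emails = [
  ['contact_id' => 1, 'email' => 'ann@home.example'],
  ['contact_id' => 1, 'email' => 'ann@work.example'],
  ['contact_id' => 1, 'email' => 'ann@other.example'],
  ['contact_id' => 2, 'email' => 'bob@home.example'],
  ['contact_id' => 3, 'email' => 'cy@home.example'],
];

function getContactsWithEmails(array $contacts, array $emails, int $limit, int $offset = 0): array {
  // The limit and offset apply to the base entity only.
  $page = array_slice($contacts, $offset, $limit);
  $pageIds = array_column($page, 'id');

  // Collect every email belonging to a contact on this page; no limit here.
  $byContact = [];
  foreach ($emails as $row) {
    if (in_array($row['contact_id'], $pageIds, TRUE)) {
      $byContact[$row['contact_id']][] = $row['email'];
    }
  }

  // Attach the full email list to each contact on the page.
  foreach ($page as &$contact) {
    $contact['emails'] = $byContact[$contact['id']] ?? [];
  }
  unset($contact);
  return $page;
}

// Limit of 2 contacts; Ann still gets all three of her emails.
$page = getContactsWithEmails($contacts, $emails, 2);
echo json_encode($page), "\n";
```

With a limit of 2, the page holds two contacts, yet the first contact's email list has three entries, which is the unambiguous behavior described above.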

mickadoo · 2017-07-13T15:47:54Z

@totten I think one of the tests you listed conflicts with what we just talked about:

> Fetch contacts named 'Bob' and all of their blue activities

I've just got it working so the WHERE and SELECT are independent, but this test case seems like it would be a request like

    $result = Contact::get()
      ->setCheckPermissions(FALSE)
      ->addSelect('first_name')
      ->addSelect('last_name')
      ->addSelect('source_activities.subject')
      ->addWhere('first_name', '=', 'Bob')
      ->addWhere('source_activities.tags.name', '=', 'blue')
      ->execute();

Given a database with 3 contacts and 3 activities, 2 of which Bob created and only one of which has a blue tag, this is what is returned:

    [
      {
        "id": "3",
        "first_name": "Bob",
        "last_name": "Bobberson",
        "source_activities": [
          {
            "subject": "Test Housing Support Activity (by Bob)",
            "id": "1",
            "tags": [
              {
                "name": "blue",
                "id": "1"
              },
              {
                "name": "red",
                "id": "2"
              }
            ]
          },
          {
            "subject": "Another Bob's Activity",
            "id": "2",
            "tags": [
              {
                "name": "red",
                "id": "2"
              },
              {
                "name": "green",
                "id": "3"
              }
            ]
          }
        ]
      }
    ]

As you can see, both of Bob's activities are returned, which is what you said would be expected in the previous comment. Which behavior do you think is valid?
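The two candidate behaviors can be contrasted with a small in-memory sketch. The `readingA()` and `readingB()` helpers are hypothetical names used only to label the two interpretations; the data is Bob's record from the result above, with tag IDs omitted for brevity.

```php
<?php
// Bob's record, based on the result above (tag IDs omitted for brevity).
$bob = [
  'id' => 3,
  'first_name' => 'Bob',
  'source_activities' => [
    ['id' => 1, 'subject' => 'Test Housing Support Activity (by Bob)', 'tags' => ['blue', 'red']],
    ['id' => 2, 'subject' => "Another Bob's Activity", 'tags' => ['red', 'green']],
  ],
];

function hasTag(array $activity, string $tag): bool {
  return in_array($tag, $activity['tags'], TRUE);
}

// Reading A: the WHERE only decides whether the contact matches;
// all of the contact's activities are returned.
function readingA(array $contact, string $tag): ?array {
  foreach ($contact['source_activities'] as $activity) {
    if (hasTag($activity, $tag)) {
      return $contact;
    }
  }
  return NULL;
}

// Reading B: the WHERE also narrows the nested activities to the matching ones.
function readingB(array $contact, string $tag): ?array {
  $matching = array_values(array_filter(
    $contact['source_activities'],
    function (array $activity) use ($tag): bool {
      return hasTag($activity, $tag);
    }
  ));
  if ($matching === []) {
    return NULL;
  }
  $contact['source_activities'] = $matching;
  return $contact;
}

$a = readingA($bob, 'blue');
$b = readingB($bob, 'blue');
echo count($a['source_activities']), ' vs ', count($b['source_activities']), "\n";
```

Reading A corresponds to the current behavior (two activities returned); reading B corresponds to the literal wording of the test case (only the blue activity).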

mickadoo · 2017-07-17T09:33:59Z

@totten another test from the description:

> Get all contacts in a zipcode and return their Home or Work email addresses

Works so far to fetch contacts in a zipcode, but again this looks like a WHERE clause on a related entity which I thought we were trying to avoid.

Suggest changing it to

Get all contacts in a zipcode and return their email addresses.

mickadoo · 2017-07-17T13:55:43Z

The final three tests are related to OR/NOT operators

Fetch all activities where Bob is the assignee or source

Get all contacts which (a) have an address in zipcode 94117 or 94118 or in city "San Francisco","LA" and (b) are not deceased and (c) have a custom-field "most_important_issue=Environment".

Get participants who attended CiviCon 2012 but not CiviCon 2013. Return their name and email.

Whether we want to allow these kinds of queries is under discussion in #41 and #43. I'd like to be sure this is the road we want to go before working on this. I've got 2/3 of the tests working, but by using two requests (example) instead of a single more powerful request.
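As a sketch of the two-request approach for the CiviCon case: each request would return the contact IDs of one event's participants, and the exclusion is computed client-side. The ID lists below are made-up placeholders, not real results.

```php
<?php
// Hypothetical results of two separate requests:
// contact IDs of participants for each event.
$attended2012 = [101, 102, 103, 104];
$attended2013 = [102, 104, 105];

// "Attended 2012 but not 2013" is a set difference computed client-side.
// array_values() re-indexes the result, since array_diff() preserves keys.
$only2012 = array_values(array_diff($attended2012, $attended2013));

echo json_encode($only2012), "\n";
```

A follow-up request would still be needed to fetch the name and email for the remaining contact IDs, which is part of the trade-off against a single, more powerful request.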

mickadoo added a commit that referenced this issue Jul 5, 2017

#54: Update tests to match new result format

12ead28

mickadoo added a commit that referenced this issue Jul 5, 2017

#54: Change returned result format, nesting custom values and related results

afa9070

mickadoo added a commit that referenced this issue Jul 5, 2017

#54: Add link to option values if option group is set for custom field

c627351

mickadoo added a commit that referenced this issue Jul 5, 2017

#54: Cleanup api4 query class, add insertion service and tests

1d3ef31

mickadoo added a commit that referenced this issue Jul 5, 2017

#54: Update tests

b7c8733

mickadoo added a commit that referenced this issue Jul 5, 2017

#54: Remove custom field stuff from spec gatherer test

506de33

mickadoo added a commit that referenced this issue Jul 5, 2017

#54: Set join type for activity contacts shortcut

05e3be7

mickadoo added a commit that referenced this issue Jul 5, 2017

#54 Add test with multiple contacts and different custom data

5ad308c

mickadoo added a commit that referenced this issue Jul 5, 2017

#54 Refactoring

813527a

mickadoo added a commit that referenced this issue Jul 5, 2017

#54 Ignore parent ID filter if parent ID not set

14dcaa8

mickadoo added a commit that referenced this issue Jul 6, 2017

#54 Add test for selecting orphaned values

3d8ede3

mickadoo added a commit that referenced this issue Jul 6, 2017

#54 Cleanup select query

28ff1ec

mickadoo added a commit that referenced this issue Jul 6, 2017

#54 Add cache and "canJoin" method to joiner

0720af1

colemanw closed this as completed Jul 22, 2018
