Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) 'Contains' operation for searching multi-select custom fields with the API. #6612

Closed

Conversation

johanv
Copy link
Contributor

@johanv johanv commented Aug 27, 2015

I had to rebase this on my PR for CRM-17101 (#6611).

I implemented a 'contains' operator for searching values in multi-select custom fields.
I'm not sure whether this is a useful enhancement for the API, I'll leave that to you. It covers my use case, so I might as well upload it :-)

(Update: This PR is related to CRM-17096, which might be unclear because I mentioned another issue as well.)



----------------------------------------
* CRM-17101: civicrm_api3_basic_get does not work for SQL operators on custom fields.
  https://issues.civicrm.org/jira/browse/CRM-17101
----------------------------------------
* CRM-17101: civicrm_api3_basic_get does not work for SQL operators on custom fields.
  https://issues.civicrm.org/jira/browse/CRM-17101
This is what I would like to do.

----------------------------------------
* CRM-17096: A 'contains' operation for the api to search multiselect custom values.
  https://issues.civicrm.org/jira/browse/CRM-17096
----------------------------------------
* CRM-17096: A 'contains' operation for the api to search multiselect custom values.
  https://issues.civicrm.org/jira/browse/CRM-17096
@colemanw
Copy link
Member

One question I have about this is what happens if you try to use the CONTAINS operator on a non-multivalued field?
I'm guessing it won't work. Also, from your description, the LIKE operator won't work on a multivalued field.
So just a thought, but instead of inventing a new sql verb, maybe we could just get the LIKE operator to work correctly for multivalued fields using this code when appropriate?

@johanv
Copy link
Contributor Author

johanv commented Aug 28, 2015

what happens if you try to use the CONTAINS operator on a non-multivalued field?
I'm guessing it won't work.

It won't work. Just like when you use a = operator on a multivalued field, that won't work either. Maybe we should throw an exception in both cases.

So just a thought, but instead of inventing a new sql verb, maybe we could just get the LIKE operator to work correctly for multivalued fields using this code when appropriate?

When I am interested in entities that have some value selected in a multi-select custom field, I prefer to write

'custom_15' => array('CONTAINS' => 'FI')

instead of

'custom_15' => array('LIKE' => '%' . CRM_Core_DAO::VALUE_SEPARATOR . "FI" . CRM_Core_DAO::VALUE_SEPARATOR . '%')

Besides, the consumer of the API should not have to care about how multi-select custom fields are implemented. IMO, you should be able to use the API without needing CRM_Core_DAO::VALUE_SEPARATOR.

@colemanw
Copy link
Member

IMO, you should be able to use the API without needing CRM_Core_DAO::VALUE_SEPARATOR.

Absolutely agree. My suggestion was actually to make

'custom_15' => array('LIKE' => 'FI')

work correctly using your code, which would imo make the api a bit more transparent, since you would be able to use the same verb on either single or multi-valued fields, and not have to care about how the api works internally.

@totten
Copy link
Member

totten commented Aug 28, 2015

Agree that it's really good to hide VALUE_SEPARATOR / implementation-details from the API consumer.

@totten
Copy link
Member

totten commented Aug 28, 2015

The semantics of "LIKE=>FI" are different from "CONTAINS=>FI". You would expect that "LIKE=>FI" matches a record with value "FIDO", but "CONTAINS=>FI" should not match "FIDO".

@johanv
Copy link
Contributor Author

johanv commented Aug 28, 2015

Using

'custom_15' => array('LIKE' => 'FI')

can cause problems when one option value is a substring of another one. Which seems bad practice, I agree, but it is not impossible to define it like that.

'sequential' => 1,
));

// Clean up first, then assert.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this deals with the edge-case where the test fails -- you still need to cleanup even if there's a failure.

As a nitpick for future reference (not worth a special project on its own, but something for any readers to note for the future)... there's still a risk that other parts will fail (eg callApiSuccess() in L478 could throw an exception). This pattern should be a bit safer:

function testFoo() {
  $this->custom['foo'] =  $this->callApiSuccess('CustomField', 'Create', ...);
}
function tearDown() {
   foreach (array_reverse($this->custom) as $field) {
    $this->callApi('CustomField', 'Delete', ...);
   }
}

@totten
Copy link
Member

totten commented Aug 28, 2015

The PR adds a feature -- and includes unit-tests! What bliss!

@colemanw
Copy link
Member

Ok, not trying to hold up this excellent contribution, just think it through a bit. To clarify a minor misconception above, the api does not automatically append wildcards, so if you have a single-valued field custom_1 with options ['BE', 'BEE', 'BEER'] you could do:

'custom_1' => array('LIKE' => 'BE')

and it would only return fields with value "BE" (not BEE or BEER). Whereas if you entered

'custom_1' => array('LIKE' => 'BE%')

you'd get results with any of the three options. Well and good.

Now for a multivalued field. Say custom_2 has options ['one', 'two', 'three']. Records might contain any combination of these options, so how to search for them? Some ideas:

  1. To find empty records:

    'custom_2' => ''
    
  2. To find records that contain only 'two' (note this doesn't involve wildcards so it doesn't matter if there's another option named 'twofold'):

    'custom_2' => array("LIKE" => 'two')
    
  3. To find records that contain the option 'one' (among others perhaps)

    'custom_2' => array('CONTAINS' => 'one')
    
  4. To find records that contain one and two (perhaps three as well)

    'custom_2' => array('CONTAINS' => array('one', 'two'))
    
  5. To find records that are exactly one and two (not three)

    'custom_2' => array("LIKE" => array('one', 'two'))
    
  6. To find records that contain one or two:

    ???
    

So I think this PR fits into that overall picture, adding the 3rd syntax. The others, afaik, are still missing. Note: I've backtracked on using "LIKE" and "CONTAINS" interchangeably because it doesn't carry exactly the same meaning. Thoughts?

@totten
Copy link
Member

totten commented Aug 28, 2015

I like the idea of differentiating (a) exact matches (2+5) and (b) partial matches (3+4+6).

For exact matches, = or IS might be more intuitive.

Or, if we wanted nomenclature that's got more symmetry to it, maybe verbs CONTAINS-ALL, CONTAINS-ANY, CONTAINS-EXACTLY?

@johanv
Copy link
Contributor Author

johanv commented Aug 29, 2015

The PR should also implement (4), but I did not test this yet. I did it this way, because usually when you combine filters in the API, they are combined with 'AND'.

No matter what syntax we'll end up using, I think the cases @colemanw listed out are a good basis for the documentation of all this.

p.s.: I always forget to add wildcards in my sql like-expressions as well ;-)

@monishdeb
Copy link
Member

As per the 6 cases listed by @colemanw we can resolve the same treating value(s) as pattern for RLIKE eventually, as we are already doing in case of n-nary operators https://github.com/civicrm/civicrm-core/blob/4.6/CRM/Core/BAO/CustomQuery.php#L404, and in this way it will satisfy all the 6 cases as follows :-

  1. 'custom_2' => ''
    Already we have a EMPTY pseudo-operator for this 'custom_2' => array(EMPTY => 1)
  2. 'custom_2' => array("LIKE" => 'two')
    get build into 'custom_2' RLIKE '[[:<:]]two[[:>:]]'
  3. 'custom_2' => array('CONTAINS' => 'one')
    build into 'custom_2' RLIKE '[[:ctrl:]]*two[[:ctrl:]]*'
  4. 'custom_2' => array('CONTAINS' => array('one', 'two'))
    build into 'custom_2' RLIKE '[[:ctrl:]]*one[[:ctrl:]]*two[[:ctrl:]]*'

5.'custom_2' => array("LIKE" => array('one', 'two'))
build into 'custom_2' RLIKE '[[:<:]]one[[:ctrl:]]two[[:>:]]'
6. contain one or two:'custom_2' =>array('IN' => array('one', 'two'), 'wildcard' => 1)
as per https://github.com/civicrm/civicrm-core/blob/4.6/CRM/Core/BAO/CustomQuery.php#L406
it will build into 'custom_2' RLIKE '[[:cntrl:]]*one[[:cntrl:]]*|[[:cntrl:]]*two[[:cntrl:]]*'

So as per the changes we don't need to use to wildcard '%' (as value may itself contain '%' though seems peculiar) but the api param should be formatted in way that it will receive '$wildcard = 1' as in https://github.com/civicrm/civicrm-core/blob/4.6/CRM/Core/BAO/CustomQuery.php#L339
that it will support 'wildcard as key' option along with 'operator as key' like mentioned in 6 . On otherhand I think such special multi-value custom field that uses special character as n spectator should be handled via regex pattern so handle special scenario like in (1+5).

@johanv
Copy link
Contributor Author

johanv commented Aug 31, 2015

I am not familiar with rlike, but suppose that custom_2 would have options ['one', 'one-b', 'two'] selected, would this still work?

'custom_2' => array('CONTAINS' => array('one', 'two'))
build into 'custom_2' RLIKE '[[:ctrl:]]*one[[:ctrl:]]two[[:ctrl:]]*'

And what about

'custom_2' => array('CONTAINS' => array('two', 'one'))

?

@monishdeb
Copy link
Member

Yes it does.
Lets for example these are the multi-value records in custom table
total-values
As per your query 'custom_2' => array('CONTAINS' => array('one')) it will return
tregex

SELECT * FROMcivicrm_value_constituent_information_1WHEREtest_2_7REGEXP '[[:cntrl:]]*one[[:cntrl:]]*'

@monishdeb
Copy link
Member

@johanv I have updated case 4 'custom_2' => array('CONTAINS' => array('one', 'two')) build clause, as it was wrong

@JoeMurray
Copy link
Contributor

@digitlimit would you be interested in reviewing this patch?

Copy link
Member

@colemanw colemanw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this PR needs additional work before merging.

@litespeedmarc litespeedmarc changed the title 'Contains' operation for searching multi-select custom fields with the API. (WIP) 'Contains' operation for searching multi-select custom fields with the API. Sep 27, 2016
@litespeedmarc litespeedmarc added the needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state label Sep 27, 2016
@jitendrapurohit
Copy link
Contributor

jitendrapurohit commented Apr 25, 2017

@eileenmcnaughton Can we close this as it hasn't progressed from more than a year? Also, one can always re-open this when anyone feels to work on it.

@eileenmcnaughton
Copy link
Contributor

Yes - I agree - this might be part of apiv4 ....I'll make sure there IS a link on the JIRA so all this good discussion / work is still accessible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants