Create Basic Query entry point #231

Closed
mmccool opened this issue Oct 18, 2021 · 10 comments
Comments

@mmccool
Contributor

mmccool commented Oct 18, 2021

As discussed in the F2F, we have a problem with JSONPath not being a standard in time for our expected date of REC transition. Therefore we are planning the following:

  1. Make the current JSONPath query interface non-normative. It can be implemented on an experimental basis, but implementers should be made aware it will be subject to changes in the future to bring it into alignment with the IETF standard.
  2. We can't make JSONPath mandatory if it is non-normative.
  3. It would be useful to have a mandatory query language, with a focus on simple and narrow use cases, suitable for first-time or occasional users who don't want to learn a complex system like SPARQL or XPath, but which is still easy for TDD implementers to create (and map to what they really do internally, e.g. SPARQL or XPath).
  4. Proposal:
    • simple version for narrow use case: keyword searches
    • Purpose is to spare simple use cases from having to use XPath or SPARQL and/or from having to read the entire directory first; example is Node-RED UI.
    • Need to identify users: Home Owner; Retail Business Owner (e.g. conv store); Smart City citizen user.
    • new entry point to avoid conflict with other query types, including JSONPath
    • Limit to keyword searches over type, description and title at the top level
    • query would be something like "/search/keyword/?query={keyword}"
    • Besides keywords, may also want time limits, e.g. an additional parameter: "?since={iso-datetime}"

Please comment below on modifications or improvements to this proposal.
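
To make the proposal concrete, here is a minimal sketch of the matching logic such an entry point might apply. It is only an illustration, not a proposed normative algorithm. The function names, the case-insensitive substring matching, and the use of the top-level `modified` field for the "since" comparison are all assumptions made for this example.

```python
from datetime import datetime

# Top-level fields searched, as listed in the proposal.
SEARCHED_FIELDS = ("@type", "title", "description")


def _as_text(value):
    """Flatten a string or a list of strings into one lowercase string."""
    if isinstance(value, str):
        return value.lower()
    if isinstance(value, list):
        return " ".join(str(v) for v in value).lower()
    return ""


def keyword_search(tds, keyword, since=None):
    """Return TDs whose top-level @type, title or description contain keyword.

    `since` is an optional ISO 8601 date-time string. Assumption: each TD
    carries a comparable top-level "modified" value; TDs without one are
    skipped whenever `since` is given. Offsets are written as "+00:00"
    because older Python versions do not parse a trailing "Z".
    """
    needle = keyword.lower()
    cutoff = datetime.fromisoformat(since) if since else None
    matches = []
    for td in tds:
        if cutoff is not None:
            modified = td.get("modified")
            if modified is None or datetime.fromisoformat(modified) < cutoff:
                continue
        if any(needle in _as_text(td.get(field)) for field in SEARCHED_FIELDS):
            matches.append(td)
    return matches
```

A request such as "/search/keyword/?query=light&since=2021-10-10T00:00:00+00:00" would then map to `keyword_search(tds, "light", since="2021-10-10T00:00:00+00:00")`.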

@mmccool
Contributor Author

mmccool commented Oct 18, 2021

Use case scenarios:

  • Home user wants to look at just the Thing they just added in order to set up orchestration rules using Node-RED. In this case a "since" query would find the Thing they care about.
  • Deaf home user wants to set up a rule to flash all lights in the house when the doorbell is pushed. Using Node-RED, they search for all "lights" using a keyword search for "light", which should match words in descriptions, titles, and types (e.g. saref:Light).

@k-toumura
Contributor

In the Node-RED and directory search integration we created last year, we filtered out Things that did not have HTTP binding in their form. This is because the nodes generated by Node Generator do not support protocols other than HTTP.

I'm not sure whether this is a basic query or not, but it is one of the use cases.
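
As a rough illustration of this kind of filter (not the actual code used in the Node-RED integration), the sketch below keeps only TDs that have at least one form whose `href` resolves to an `http` or `https` URL. The helper names are hypothetical, and resolving relative `href` values against the TD's `base` is an assumption about how one would want to handle them.

```python
from urllib.parse import urljoin, urlparse

AFFORDANCE_GROUPS = ("properties", "actions", "events")


def iter_forms(td):
    """Yield the top-level forms and the forms of every interaction affordance."""
    yield from td.get("forms", [])
    for group in AFFORDANCE_GROUPS:
        for affordance in td.get(group, {}).values():
            yield from affordance.get("forms", [])


def has_http_form(td):
    """True if at least one form resolves to an http:// or https:// URL."""
    base = td.get("base", "")
    for form in iter_forms(td):
        # Relative hrefs are resolved against "base" (assumed behaviour).
        href = urljoin(base, form.get("href", ""))
        if urlparse(href).scheme in ("http", "https"):
            return True
    return False


def filter_http_things(tds):
    """Drop TDs that expose no HTTP(S) form at all."""
    return [td for td in tds if has_http_form(td)]
```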

@relu91
Member

relu91 commented Nov 4, 2021

I've just found that there was already this issue trying to define a good set of requirements for this basic query language. Please read my considerations about JSONPath reported in #232:

In the last F2F, we have discussed how to narrow the scope of TDDs to allow simpler implementations (see #208). This issue wants to explore how to reduce the feature set of JSONPath to support minimal query support for Read-only TDDs as asked in #156 (comment).

Requirements

We didn't really define a good set of requirements for this minimal query language. Here's a list of what I recall:

  • It should not require an excessive amount of computation
  • Fairly simple to implement
  • Support tag based searches (i.e. text search)

I am aware that those requirements are a little bit fuzzy, maybe we should work on this in this issue.

Limit JSONPath

A possible approach to defining this minimal query language is to restrict JSONPath features so that it is less computationally heavy and better suited to our use case. I would start by limiting the root syntax to this rule:

json-path = root-selector *( dot-selector / index-selector / filter-selector)

This basically allows users to select subsets of TDs (similar to a json pointer implementation) plus filtering. We can limit this even more:

json-path = root-selector *( filter-selector)

This means that users are only allowed to filter the TDD collection of TDs. In principle, this would simplify the implementation a little, because it would not need to support the selection and manipulation of the query result set. However, this is my conjecture and it still needs to be tested and proven with concrete data.

About supported features, we could support free text searches with JSONPath through RegEx support. However, RegEx syntax is sadly marked as TBD (it is marked as a possible future consensus in the feature matrix). Without it we can only have exact text matches in the filter selector, which might still be OK for searches like: "look for TDs with @type equal to Sensor".

Another decision point is if we want to limit also the filter-selector, but it depends on what we want to achieve:

  • Support only simple root "@type", "title", "description", exact tag filtering:
    • We need to fix the allowed path in the filter-selector
    • O(1) complexity -> a simple map (@type -> TD) can be used to resolve the query
  • Support nested "@type", "title", "description", exact tag filtering
    • Similar to the previous one, but more memory is required
  • Support RegEx text (if ever defined in the spec) searches on a set of predefined properties :
    • It's at least an O(n) operation, with n equal to the number of TDs.
  • Anything else?
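
To illustrate the difference in cost between the first and the third option, here is a small sketch, assuming the TDs are held in memory as plain dictionaries. `build_type_index` and `regex_scan` are hypothetical helper names, and Python's `re` syntax stands in for whatever RegEx dialect the spec might eventually define.

```python
import re
from collections import defaultdict


def build_type_index(tds):
    """Map each top-level @type value to the TDs that declare it.

    Built once; afterwards an exact @type lookup is a single dictionary
    access instead of a scan over all TDs.
    """
    index = defaultdict(list)
    for td in tds:
        types = td.get("@type", [])
        if isinstance(types, str):
            types = [types]
        for type_name in types:
            index[type_name].append(td)
    return index


def regex_scan(tds, pattern, fields=("title", "description")):
    """Scan every TD and test the given top-level fields: O(n) in the number of TDs."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [td for td in tds
            if any(rx.search(str(td.get(field, ""))) for field in fields)]
```

With the index in place, "look for TDs with @type equal to Sensor" becomes `index.get("Sensor", [])`, while any RegEx query still has to visit every TD.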

Reference: https://www.ietf.org/archive/id/draft-ietf-jsonpath-base-02.html

@benfrancis
Member

Although search is obviously useful, I'd like to propose that none of the search APIs should be mandatory. For some use cases (like a smart home hub) where there's usually a small number of devices (e.g. 10) it may not actually be necessary.

Anecdotally, I'm also mindful that retrofitting any of these search APIs to a simple SQL database storing whole JSON resources (as is the case with the local SQLite database used by WebThings Gateway) could be quite challenging and it would be a shame for this to block us from being compliant with the rest of the Directory Service API.

@relu91
Member

relu91 commented Nov 8, 2021

Even though I presented the above results about JSON Path, I have to agree with @benfrancis to not have any search API mandatory. For two reasons:

  • It creates complexity in our spec document: we have to define the requirements of this new basic query language and maintain it. Even if we restrict JSONPath functionality, we'd have to regularly check the JSONPath spec to see if we are up to date.
  • I'm not 100% convinced about the use cases. Are we sure that those filter capabilities are really required in home automation? Can't a client filter 20 to 50 JSON objects by itself? 🤔

@farshidtz
Member

I agree. IMO, a basic but mandatory query endpoint is only appropriate if it solves the necessary requirements of all use cases. The use cases need to be formally defined and reviewed as every other WoT use case.

I insist on what was proposed above:

Make the current JSONPath query interface non-normative. It can be implemented on an experimental basis, but implementers should be made aware it will be subject to changes in the future to bring it into alignment with the IETF standard.

And to remove XPath, since it was added as a fallback in case JSONPath doesn't make it into a standard. In fact it hasn't, but a no-syntactic search fallback seems more plausible. Once JSONPath does become a standard, we can make it an optional search feature along with the existing optional SPARQL.

@mmccool
Contributor Author

mmccool commented Nov 15, 2021

Discussion (Nov 15):

  1. The IETF RFC for JSONPath is not ready. In particular, the draft does not include regexes, which are needed for the substring search use cases noted above (searching for keywords in descriptions).
  2. Implementers are pushing back on full regex support
  3. Queries can in general just return ids... if what I want to do is get the full TDs. Are there use cases that collect other information from TDs, e.g. extracting just the endpoints for lightbulbs? Do we care about these use cases for a "basic" query mechanism?

Options are:

  1. Drop JSONPath, replace it with nothing. XPath 3.0 and SPARQL remain optional, so there is no mandatory query mechanism. In this case, a consumer needs to fetch all the TDs and filter them itself in general. Downside: could be large TDDs with no query mechanism. Mitigation: establish requirements for use of a query language. For instance, we could say TDDs that support more than 100 TDs SHOULD support a query language. The spec should include examples of use of XPath and SPARQL for common use cases (e.g. substring query).
  2. Support a very limited subset of JSON Path, e.g. exact string matching. We still need a set of use cases to drive what the subset should be. Give a list of "selectors". But the "filter selector" class is quite large, and includes (the currently unsupported) regex selector.
  3. Design a custom set of API entry points for query functionality for specific use cases, e.g. searching for TDs with certain types, TDs in a certain date range, and with certain keywords in descriptions and titles.

Notes:

  1. JSONPath does not seem to have a date-time comparison operator, which would be useful.
  2. Regex compare is complex, but simple substring compares could be implemented as an extension of the "in" operation.

Consensus:
Let's at least do 1 for now; however, let's keep JSON Path but make it non-normative (and, of course, non-mandatory). Examples can be added in a separate PR. Later on we can attempt 2 or 3.

@mmccool
Contributor Author

mmccool commented Nov 22, 2021

Upon re-reading the comments above, I noticed one other suggestion made by @benfrancis - dropping XPath. It's optional, and we agreed to make all query mechanisms optional, so I'm not sure what dropping it would accomplish other than making the spec simpler (and reducing testing). IMO only having SPARQL would be annoying for simple use cases; XPath 3.0 is basically equivalent to JSONPath but is an actual standard (and has substring search). But let's discuss and come to a consensus on this... we only have one implementation, so at minimum we need to mark it as "at risk". We also need to look into some details around available implementations (see for instance this discussion) and whether we should specify XPath 3.1, etc.

@benfrancis
Member

@mmccool wrote:

Upon re-reading the comments above, I noticed one other suggestion made by @benfrancis - dropping XPath.

For the record I didn't suggest that, I suggested making all search mechanisms optional which seems to be what has been agreed.

You may be referring to the comment by @farshidtz.

@mmccool
Contributor Author

mmccool commented Nov 22, 2021

Propose closing: Consensus is NOT to create a separate "basic" query language, which was the original point of this issue.

Let's spin off other points, e.g. the XPath discussion, into their own issues.

5 participants