Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure Search Implementation #9775

Merged
merged 48 commits into from
Mar 7, 2020
Merged

Azure Search Implementation #9775

merged 48 commits into from
Mar 7, 2020

Conversation

bryevdv
Copy link
Contributor

@bryevdv bryevdv commented Feb 7, 2020

Very early WIP, current status: sync implementation sans paging (Or tests, or samples, I have been iterating in a notebook)

cc @johanste @lmazuel would appreciate a quick look for any early problems before I move on to add paging, tests, and async.

@bryevdv
Copy link
Contributor Author

bryevdv commented Feb 7, 2020

One immediate question: Just calling as_dict on the swagger search result model inserts fields for score and highlights:

>>> query = SearchQuery(search_text="Arcadia", highlight_fields="HotelName")
>>> query.select("HotelName", "Category", "Rating")
>>> query.order_by("Rating desc")
>>> client.get_search_results(query)

[{'Category': 'Suite',
  'HotelName': 'Arcadia Resort & Restaurant',
  'Rating': 3.5,
  'score': 0.4719418,
  'highlights': {'HotelName': ['<em>Arcadia</em> Resort & Restaurant']}}]

I thought this was weird and @brjohnstmsft also confirms:

Note that we don't prevent customers from naming fields "score" or "highlights", so if those annotations stay in the resulting dict, it could cause problems down the road.

How do we want to handle this in the Python SDK? Are there field names that are safe to use that we could adapt these to? Return a list of actual item objects instead of dicts? Sidecar somehow?

@brjohnstmsft
Copy link
Member

@bryevdv While I can't answer from the Python perspective, I can share prior art from Track 1 .NET and Track 2 Java. In those cases, what's returned as the results for each page is not a collection of documents, it is a collection of results, each of which has properties specifically for score, highlight, and the document itself. Accessing it might look like this in Track 1 C#:

var page = await searchIndexClient.Documents.SearchAsync<Hotel>("hello");
var results = page.Results;  // page also has facets, coverage, count, etc.
foreach (var result in results)
{
    Console.WriteLine($"Relevance score: {result.Score}");
    Console.WriteLine($"Result has {result.Highlights.Count} highlighted phrases.");
    Console.WriteLine($"Matching document: {result.Document}"); // Assuming Hotel has an intelligent implementation of ToString().
}

@bryevdv
Copy link
Contributor Author

bryevdv commented Feb 8, 2020

@brjohnstmsft That's definitely possibility. FWIW Cosmos takes the "add dict keys that can't be valid field names" approach. Would like to know what @johanste thinks about this situation, though.

@brjohnstmsft
Copy link
Member

@bryevdv You could keep everything in the dict, if you're able to leave the "@search." qualifier on the special keys. I'm not sure whether that's a better developer experience than the alternatives though.

@johanste
Copy link
Member

johanste commented Feb 8, 2020

FWIW Cosmos takes the "add dict keys that can't be valid field names" approach. Would like to know what @johanste thinks about this situation, though.

How prominent is the application usage of these fields? Making it significantly easier to get the values make sense if they are almost always used, but if it is in the less-than-25% case (picking a random number here), I think it is ok to keep them in the dict with the "@search" prefix. After all, that is how we expect people to get to the data that we are almost certain that they will use (e.g. the actual document data).

@brjohnstmsft
Copy link
Member

@johanste Although we don’t have telemetry to measure the usage of those properties, I can speculate based on what they are used for.

The document score is probably rarely used, except for trying to debug why the ranking of results isn’t what was expected. The value of the score is only meaningful in the context of a single set of results, and is a relative value, not an absolute value. So there is no meaningful calculation you can do with it.

On the other hand, hit highlights are very useful for rendering search results in an application. We get enough customer questions and feedback about highlighting that I would say it is a common feature, which in my mind would justify making it a dedicated property.

We will also be adding a third annotation to this part of the response very soon. It will contain extracted features that will be useful for machine learning based ranking models. As this is a new and experimental feature, we don’t really know how much traction it’s going to get at this point. That said, making it a property of its own would make it more discoverable.

@bryevdv
Copy link
Contributor Author

bryevdv commented Feb 14, 2020

Per discussion with @brjohnstmsft and others, re-organized this to be a single "azure-search" package.

@adxsdk6
Copy link

adxsdk6 commented Feb 27, 2020

Can one of the admins verify this patch?

@bryevdv bryevdv marked this pull request as ready for review March 5, 2020 17:37
@bryevdv
Copy link
Contributor Author

bryevdv commented Mar 5, 2020

There is more work to do before code freeze for preview 1 but I'd prefer to merge this sooner and build smaller PRs off of this starting point.

@bryevdv bryevdv requested a review from johanste March 5, 2020 17:37
kristapratico
kristapratico previously approved these changes Mar 5, 2020
__doc__ = AutocompleteRequest.__doc__


class SearchQuery(_QueryBase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have a constructor that takes optional select, order_by and a list of filter tuples?

E.g.

q  = SearchQuery(select=['a', 'b'], order_by=['b desc'])

?

Or are we just adding too many options on how to create this puppy...

:dedent: 4
:caption: Get a auto-completions.
"""
if not isinstance(query, AutocompleteQuery):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should accept str like objects as well, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two arguments are mandatory, the search text and a suggester. I'd personally prefer to avoid the complication of dealing with combinations that don't make sense.

:dedent: 4
:caption: Get search suggestions.
"""
if not isinstance(query, SuggestQuery):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should accept str like objects as well, no?

brjohnstmsft
brjohnstmsft previously approved these changes Mar 6, 2020
sdk/search/azure-search/README.md Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants