Azure Search Implementation #9775

bryevdv · 2020-02-07T23:13:48Z

Very early WIP. Current status: sync implementation sans paging (or tests, or samples; I have been iterating in a notebook).

cc @johanste @lmazuel would appreciate a quick look for any early problems before I move on to add paging, tests, and async.

bryevdv · 2020-02-07T23:20:16Z

One immediate question: Just calling as_dict on the swagger search result model inserts fields for score and highlights:

>>> query = SearchQuery(search_text="Arcadia", highlight_fields="HotelName")
>>> query.select("HotelName", "Category", "Rating")
>>> query.order_by("Rating desc")
>>> client.get_search_results(query)

[{'Category': 'Suite',
  'HotelName': 'Arcadia Resort & Restaurant',
  'Rating': 3.5,
  'score': 0.4719418,
  'highlights': {'HotelName': ['<em>Arcadia</em> Resort & Restaurant']}}]

I thought this was weird and @brjohnstmsft also confirms:

Note that we don't prevent customers from naming fields "score" or "highlights", so if those annotations stay in the resulting dict, it could cause problems down the road.

How do we want to handle this in the Python SDK? Are there field names that are safe to use that we could adapt these to? Return a list of actual item objects instead of dicts? Sidecar somehow?

brjohnstmsft · 2020-02-07T23:48:07Z

@bryevdv While I can't answer from the Python perspective, I can share prior art from Track 1 .NET and Track 2 Java. In those cases, what's returned as the results for each page is not a collection of documents, it is a collection of results, each of which has properties specifically for score, highlight, and the document itself. Accessing it might look like this in Track 1 C#:

var page = await searchIndexClient.Documents.SearchAsync<Hotel>("hello");
var results = page.Results;  // page also has facets, coverage, count, etc.
foreach (var result in results)
{
    Console.WriteLine($"Relevance score: {result.Score}");
    Console.WriteLine($"Result has {result.Highlights.Count} highlighted phrases.");
    Console.WriteLine($"Matching document: {result.Document}"); // Assuming Hotel has an intelligent implementation of ToString().
}
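For comparison, a rough Python analogue of that result-wrapper shape. This is only a sketch and not code from this PR: `SearchResult` and `wrap_result` are hypothetical names, and the raw payload keys `@search.score` and `@search.highlights` are assumed from the service's `@search.` annotation prefix.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SearchResult:
    """Hypothetical wrapper: one hit, with annotations kept apart from the document."""

    document: Dict[str, Any]
    score: float
    highlights: Dict[str, List[str]] = field(default_factory=dict)


def wrap_result(raw: Dict[str, Any]) -> SearchResult:
    """Split a raw hit into document fields and '@search.' annotations (assumed keys)."""
    document = {k: v for k, v in raw.items() if not k.startswith("@search.")}
    return SearchResult(
        document=document,
        score=raw["@search.score"],
        highlights=raw.get("@search.highlights") or {},
    )


raw_hit = {
    "@search.score": 0.4719418,
    "@search.highlights": {"HotelName": ["<em>Arcadia</em> Resort & Restaurant"]},
    "HotelName": "Arcadia Resort & Restaurant",
    "score": "user-defined field",  # a customer field that happens to be named "score"
}
result = wrap_result(raw_hit)
print(result.score)              # relevance score from the service
print(result.document["score"])  # the customer's own field, untouched
```

With this shape a customer field named "score" or "highlights" lives under `document` and cannot collide with the annotations.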

bryevdv · 2020-02-08T00:16:21Z

@brjohnstmsft That's definitely a possibility. FWIW Cosmos takes the "add dict keys that can't be valid field names" approach. Would like to know what @johanste thinks about this situation, though.

brjohnstmsft · 2020-02-08T00:30:56Z

@bryevdv You could keep everything in the dict, if you're able to leave the "@search." qualifier on the special keys. I'm not sure whether that's a better developer experience than the alternatives though.
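To make the trade-off concrete, here is a small sketch (not code from this PR) of how bare annotation keys can overwrite a document field while qualified keys cannot. The sample data and the `qualify` helper are made up for illustration.

```python
from typing import Any, Dict

# Hypothetical hit: the customer's index has its own field named "score".
document_fields = {"HotelName": "Arcadia Resort & Restaurant", "score": 9}
annotations = {
    "score": 0.4719418,
    "highlights": {"HotelName": ["<em>Arcadia</em> Resort & Restaurant"]},
}

# Bare keys: merging silently replaces the customer's "score" field.
flattened = {**document_fields, **annotations}


def qualify(values: Dict[str, Any]) -> Dict[str, Any]:
    """Prefix each annotation key with '@search.' (hypothetical helper)."""
    return {"@search." + name: value for name, value in values.items()}


# Qualified keys: both the document field and the annotation survive.
qualified = {**document_fields, **qualify(annotations)}

print(flattened["score"])
print(qualified["score"], qualified["@search.score"])
```

The cost is that callers have to spell out `result["@search.highlights"]` instead of using an attribute.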

johanste · 2020-02-08T03:13:13Z

> FWIW Cosmos takes the "add dict keys that can't be valid field names" approach. Would like to know what @johanste thinks about this situation, though.

How prominent is the application usage of these fields? Making it significantly easier to get the values makes sense if they are almost always used, but if it is in the less-than-25% case (picking a random number here), I think it is ok to keep them in the dict with the "@search" prefix. After all, that is how we expect people to get to the data that we are almost certain that they will use (e.g. the actual document data).

brjohnstmsft · 2020-02-08T03:40:59Z

@johanste Although we don’t have telemetry to measure the usage of those properties, I can speculate based on what they are used for.

The document score is probably rarely used, except for trying to debug why the ranking of results isn’t what was expected. The value of the score is only meaningful in the context of a single set of results, and is a relative value, not an absolute value. So there is no meaningful calculation you can do with it.

On the other hand, hit highlights are very useful for rendering search results in an application. We get enough customer questions and feedback about highlighting that I would say it is a common feature, which in my mind would justify making it a dedicated property.

We will also be adding a third annotation to this part of the response very soon. It will contain extracted features that will be useful for machine learning based ranking models. As this is a new and experimental feature, we don’t really know how much traction it’s going to get at this point. That said, making it a property of its own would make it more discoverable.

sdk/search/azure-search-index/azure/search/index/_index_operation.py

sdk/search/azure-search-index/HISTORY.md

sdk/search/azure-search-index/azure/search/index/_credential.py

sdk/search/azure-search-index/azure/search/index/_generated/_version.py

sdk/search/azure-search-index/azure/search/index/_generated/models/_models.py

sdk/search/azure-search-index/azure/search/index/_generated/aio/_configuration_async.py

sdk/search/azure-search/setup.py

bryevdv · 2020-02-14T00:38:16Z

Per discussion with @brjohnstmsft and others, re-organized this to be a single "azure-search" package.

adxsdk6 · 2020-02-27T23:14:48Z

Can one of the admins verify this patch?

bryevdv · 2020-03-05T17:37:50Z

There is more work to do before code freeze for preview 1 but I'd prefer to merge this sooner and build smaller PRs off of this starting point.

sdk/search/azure-search/README.md

sdk/search/azure-search/azure/search/__init__.py

sdk/search/azure-search/azure/search/index/_credential.py

sdk/search/azure-search/samples/async_samples/sample_authentication_async.py

sdk/search/azure-search/samples/sample_crud_operations.py

sdk/search/azure-search/samples/sample_autocomplete.py

sdk/search/azure-search/samples/async_samples/sample_simple_query_async.py

sdk/search/azure-search/samples/async_samples/sample_get_document_async.py

sdk/search/azure-search/samples/async_samples/sample_filter_query_async.py

sdk/search/azure-search/samples/async_samples/sample_crud_operations_async.py

sdk/search/azure-search/samples/async_samples/sample_autocomplete_async.py

sdk/search/azure-search/azure/search/_index/_credential.py

johanste · 2020-03-06T06:28:55Z

sdk/search/azure-search/azure/search/_index/_queries.py

+    __doc__ = AutocompleteRequest.__doc__
+
+
+class SearchQuery(_QueryBase):


Does it make sense to have a constructor that takes optional select, order_by and a list of filter tuples?

E.g.

q = SearchQuery(select=['a', 'b'], order_by=['b desc'])

?

Or are we just adding too many options on how to create this puppy...
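A minimal sketch of what such a constructor could look like next to the existing method style. This is not the class in this PR: `SearchQuerySketch` and the `as_request` key names are hypothetical, and the filter-tuple option is left out because its shape is not spelled out here.

```python
from typing import Any, Dict, List, Optional, Sequence


class SearchQuerySketch:
    """Hypothetical stand-in for SearchQuery, for illustration only."""

    def __init__(
        self,
        search_text: Optional[str] = None,
        select: Optional[Sequence[str]] = None,
        order_by: Optional[Sequence[str]] = None,
    ) -> None:
        self.search_text = search_text
        self._select: List[str] = list(select or [])
        self._order_by: List[str] = list(order_by or [])

    def select(self, *fields: str) -> None:
        """Method style: replace the selected fields."""
        self._select = list(fields)

    def order_by(self, *clauses: str) -> None:
        """Method style: replace the ordering clauses."""
        self._order_by = list(clauses)

    def as_request(self) -> Dict[str, Any]:
        """Flatten to a request-like dict (key names are assumptions)."""
        return {
            "search": self.search_text,
            "select": ",".join(self._select),
            "orderby": ",".join(self._order_by),
        }


# Constructor style
a = SearchQuerySketch(search_text="Arcadia", select=["HotelName", "Rating"], order_by=["Rating desc"])

# Method style
b = SearchQuerySketch(search_text="Arcadia")
b.select("HotelName", "Rating")
b.order_by("Rating desc")

print(a.as_request() == b.as_request())
```

Both styles end up in the same state, so the question is mostly whether two ways to build a query are worth the extra surface area.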

johanste · 2020-03-06T06:36:27Z

sdk/search/azure-search/azure/search/_index/_search_index_client.py

+                :dedent: 4
+                :caption: Get a auto-completions.
+        """
+        if not isinstance(query, AutocompleteQuery):


We should accept str like objects as well, no?

Two arguments are mandatory, the search text and a suggester. I'd personally prefer to avoid the complication of dealing with combinations that don't make sense.
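A bare string can only carry the search text, so a str overload would still need the suggester from somewhere. The sketch below shows the stricter option, where only a full query object is accepted. `AutocompleteQuerySketch`, its parameter names, and the returned dict are hypothetical stand-ins, not the API in this PR.

```python
class AutocompleteQuerySketch:
    """Hypothetical stand-in; both arguments are required."""

    def __init__(self, search_text: str, suggester_name: str) -> None:
        self.search_text = search_text
        self.suggester_name = suggester_name


def get_autocompletions(query):
    """Accept only a full query object; reject bare strings and anything else."""
    if not isinstance(query, AutocompleteQuerySketch):
        raise TypeError(
            "Expected an AutocompleteQuerySketch with search_text and suggester_name"
        )
    # A real client would call the service here; this sketch just echoes the inputs.
    return {"search": query.search_text, "suggesterName": query.suggester_name}


print(get_autocompletions(AutocompleteQuerySketch("arc", "sg")))

try:
    get_autocompletions("arc")  # a bare string has no suggester
except TypeError as exc:
    print("rejected:", exc)
```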

johanste · 2020-03-06T06:37:01Z

sdk/search/azure-search/azure/search/_index/_search_index_client.py

+                :dedent: 4
+                :caption: Get search suggestions.
+        """
+        if not isinstance(query, SuggestQuery):


We should accept str like objects as well, no?

sdk/search/azure-search/README.md

johanste reviewed Feb 11, 2020

sdk/search/azure-search-index/azure/search/index/_index_operation.py (outdated)