[RFC] Search User Behavior Logging and Data Reuse for Relevance #4619

Closed
macohen opened this issue Sep 28, 2022 · 11 comments
Assignees: macohen
Labels: feature (New feature or request), Search:Relevance, Search (Search query, autocomplete ...etc)

Comments

@macohen (Contributor) commented Sep 28, 2022

What/Why

What are you proposing?

Currently, there is no way for users of OpenSearch to get a full picture of how search is being used without building their own logging and metrics collection system. This is a request for comments to the community to discuss needs for a standardized logging schema & collection mechanism. We want to work with the community to understand where we can make the most impactful improvements to help the most users in understanding how search is used in their applications and how they can tune results most effectively.

We believe that application builders using OpenSearch for e-commerce, product, and document-based search have a common set of needs in how they collect and expose data for analytics and reuse. Regarding analytics, we believe builders, business users, and relevance engineers want to see metrics out of the box for any search application like top queries, top queries resulting in a high-value action (HVA - like a purchase, stream, download, or whatever the builder defines), top queries with zero results, and top abandoned queries, as well as more advanced analytics like similar queries in the long tail that may be helped by synonyms, query rewrites/expansion, or other relevance tuning techniques. This same data can also be reused to feed manual judgement and automated learning to improve relevance in the index.
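
To make the out-of-the-box metrics idea concrete, here is a minimal sketch of how "top queries with zero results" could be computed once behavioral events land in an index. The index name `search-behavior-logs` and the fields `query_text` and `result_count` are illustrative assumptions, not a proposed schema:

```json
POST /search-behavior-logs/_search
{
  "size": 0,
  "query": {
    "term": { "result_count": 0 }
  },
  "aggs": {
    "top_zero_result_queries": {
      "terms": { "field": "query_text", "size": 25 }
    }
  }
}
```

The same pattern, a filter plus a terms aggregation on the logged query text, would also cover top queries overall and top queries leading to an HVA, which is part of why a standardized schema makes these reports cheap to provide out of the box.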

What users have asked for this feature?

Highlight any research, proposals, requests or anecdotes that signal this is the right thing to build. Include links to GitHub Issues, Forums, Stack Overflow, Twitter, Etc

What problems are you trying to solve?

Template: When <a situation arises>, a <type of user> wants to <do something>, so they can <expected outcome>. (Example: When searching by postal code, a buyer wants to be required to enter a valid code so they don’t waste time searching for a clearly invalid postal code.)

  • When any search results are returned, search application builders want to report on the top requested queries so that they can learn about what their users intend to find.
  • When users search for content, a search relevance engineer wants to feed behavioral data back into the search system for automatic reranking.
  • When users search for content, a search relevance engineer wants to feed behavioral data back into the search system for manual tuning of search results.

What is the developer experience going to be?

Does this have a REST API? If so, please describe the API and any impact it may have on existing APIs. In a brief summary (not a spec), highlight what new REST APIs or changes to REST APIs are planned, as well as any other API, CLI, or configuration changes that are planned as part of this feature.

  • Allow the user to submit an optional field containing the original, user-typed query. Track that original query through all steps of querying the index: 1) user-typed query -> 2) rewritten query -> 3) results from OpenSearch -> 4) reranked results outside of OpenSearch -> 5) actions taken by the end user (query again, abandon search, some other high-value action).
  • Initially, we are focused on adoption, so even if we started from the inside out with steps 2) and 3) above, it would be helpful. The API change would be providing a place in the query DSL to optionally submit the original query. We could build that in as well, but only include it in logging and analysis if it is present; a sketch of what that might look like follows this list.
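
As a purely illustrative sketch (the `ext.user_behavior` key and every field inside it are assumptions, not an agreed-upon API), the optional original query could ride along in the existing `ext` section of the search request body, which is already an extension point for plugins:

```json
POST /products/_search
{
  "query": {
    "match": { "title": "laptop sleave" }
  },
  "ext": {
    "user_behavior": {
      "original_query": "laptop sleave",
      "query_id": "q-2022-09-28-0001",
      "session_id": "s-7d2b4c11"
    }
  }
}
```

If the block is omitted, nothing changes; if present, the original query and identifiers would be carried into the behavior log alongside the executed DSL so that rewrites, reranked results, and later user actions can all be joined back to the user-typed text.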

Are there any security considerations?

Describe if the feature has any security considerations or impact. What is the security model of the new APIs? Features should be integrated into the OpenSearch security suite, so if they are not, we should highlight the reasons here.

  • New data (user-typed queries and behavioral events) will be logged inside OpenSearch, so injection attacks through that logged content are a possible concern.

Are there any breaking changes to the API?

If this feature will require breaking changes to any APIs, outline what those are and why they are needed. What is the path to minimizing impact? (For example, add a new API and deprecate the old one.)

What is the user experience going to be?

Describe the feature requirements and/or user stories. You may include low-fidelity sketches, wireframes, API stubs, or other examples of how a user would use the feature via CLI, OpenSearch Dashboards, REST API, etc. Using a bulleted list or simple diagrams to outline features is okay. If this is net new functionality, call this out as well.

Are there breaking changes to the User Experience?

Will this change the existing user experience? Will this be a breaking change from a user flow or user experience perspective?

  • No breaking changes

Why should it be built? Any reason not to?

Describe the value that this feature will bring to the OpenSearch community, as well as what impact it has if it isn't built, or new risks if it is. Highlight opportunities for additional research.

  • Building this feature will standardize a set of reporting and data collection needs that are common across search applications and allow software engineers and relevance engineers to focus out of the box on higher-level concerns like tuning queries, query rewriting, synonyms, and results reranking.
  • If it isn't built, users will either have no insight into search results and how to tune them, or they will keep building analytics and data collection applications without gaining an understanding of what is happening inside OpenSearch.
  • If it is built, one technical concern is the trade-off between adding latency to OpenSearch and adding complexity to the platform. Logging every request and each step like rewrites, results returned from the index, reranking, and HVAs could have an impact on an OpenSearch cluster if we decide to do all of this in OpenSearch. On the other hand, adding a whole new set of infrastructure to handle this level of data collection, even a separate OpenSearch cluster, adds complexity to the architecture.

What will it take to execute?

Describe what it will take to build this feature. Are there any assumptions you may be making that could limit scope or add limitations? Are there performance, cost, or technical constraints that may impact the user experience? Does this feature depend on other feature work? What additional risks are there?

Any remaining open questions?

What are known enhancements to this feature? Any enhancements that may be out of scope but that we will want to track long term? List any other open questions that may need to be answered before proceeding with an implementation.

Questions for the Community

  • Do you have first-party (homegrown) or third-party analytics tools like Google Analytics, Adobe, or others? Would it make sense for us to connect the logging and metrics we propose to deliver inside OpenSearch with the clickstream/application metrics you have in those other systems?

Review & Validate this Proposal for tracking data through OpenSearch: opensearch-project/search-processor#12

@macohen self-assigned this on Sep 28, 2022
@saratvemulapalli added the untriaged, Indexing & Search, enhancement, and feature labels and removed the enhancement label on Sep 29, 2022
@macohen changed the title from "[Feature Proposal] Search Logging Metrics/Monitoring" to "[RFC] Search Logging Metrics/Monitoring" on Dec 6, 2022
@macohen changed the title from "[RFC] Search Logging Metrics/Monitoring" to "[RFC] Standard Search Logging Metrics and Data Reuse for Relevance" on Dec 6, 2022
@macrakis commented:

I would split this into three parts:

  • Collecting and storing the basic data.
    • Some users will just use existing tools to parse and analyze the data.
    • This is useful even if the remaining steps aren't complete.
  • Making it available in an easily queryable form.
    • If it were easily queryable (using SQL or DSL or whatever), that would be fantastic for all kinds of analysis and reporting (see the sketch after this list).
  • Building some standard reports.
    • Once the data is in an easily queryable form, it should be easy enough to develop standard reports using OpenSearch's standard reporting tools.
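
For the "easily queryable form" point, here is a minimal sketch of what ad-hoc reporting could look like, assuming the behavior log lands in an index named `search-behavior-logs` with a `query_text` field (both names are assumptions) and the SQL plugin is available:

```json
POST /_plugins/_sql
{
  "query": "SELECT query_text, COUNT(*) AS times_searched FROM `search-behavior-logs` GROUP BY query_text ORDER BY times_searched DESC LIMIT 20"
}
```

The same data would be equally reachable through the query DSL or PPL; the point is that once the schema is standardized, existing OpenSearch query and reporting tools apply to it with no extra plumbing.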

@macrakis commented Jan 20, 2023

Re "Track that original query through all steps of querying the index: user typed 1) query -> 2) rewritten query -> 3) results from OpenSearch -> 4) reranked results outside of OpenSearch -> 5) actions taken by the end users (query again, abandon search, some other high value action)."
For relevance evaluation, it is generally not all that useful to track the intermediate steps, and as you say, it potentially has very high overhead. In any case, if the configuration of the search pipeline is well-defined, then the intermediate stages should be recoverable by re-running the query.
Recording the intermediate stages is no doubt helpful for debugging, but it is not central to logging for relevance tuning. Making basic logging efficient is very important, because you'd like to always run with it on, but logging of intermediate stages presumably only gets turned on for debugging, and so doesn't need to be very efficient.

@macohen (Contributor, Author) commented Feb 3, 2023

I think I get what you mean, and it helps to refine the ideas. I agree that recording intermediate stages is not the highest priority to release immediately alongside the logging done outside OpenSearch. There is a trade-off we need to consider for logging the debugging information, and we should explore it more: either we go down the path of making sure the query as it ran is recoverable, which means making sure we have the right versions of plugins, analyzers, indices, OpenSearch itself, the query DSL, rerankers, etc. (once we're talking about external rerankers, there are more variables that we don't control), or we create a scalable system for optionally logging everything so we know what happens at every step. The latter gives more information but is certainly harder to scale. Looking for more feedback and options here as well.

@sathishbaskar commented:

  1. When running an analytics app on time-series data, the app owner wants to look at query counts by shard and time window to plan how to break up the index and replicate further for additional throughput (a sketch of such a time-window breakdown follows this list).
  2. When running an analytics app on time-series data, the app owner wants to look at unique query patterns (e.g. filters excluding values) and their resource usage - memory used, CPU time, I/O usage, number of segments/shards/indices hit, etc. - to plan which query patterns can be split off onto a replica cluster.
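
As one hypothetical shape for the first request (the index name and the `timestamp` and `target_index` fields are assumptions, and true shard-level resource metrics would need additional instrumentation not shown here), query volume per time window could be a plain date histogram over a query activity log:

```json
POST /query-activity-logs/_search
{
  "size": 0,
  "aggs": {
    "per_hour": {
      "date_histogram": { "field": "timestamp", "fixed_interval": "1h" },
      "aggs": {
        "by_index": { "terms": { "field": "target_index" } }
      }
    }
  }
}
```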

@macrakis commented Feb 6, 2023

For now, we're focusing on collecting the data for relevance tuning. In our first stage, we're looking at end-to-end behavior (user query to user actions). In our second stage, we'll be looking at the search pipeline. In both those cases, we'll be collecting data for high-level performance statistics (latency end-to-end, latency per pipeline stage), but a future enhancement could collect shard-level data.

@macrakis commented Feb 7, 2023

To elaborate on "high value action (HVA - like a purchase, stream, download, or whatever the builder defines)", here are some actions/events that a user might want to track either within a session or beyond it:

  • Hover-over (presumably showing some additional information)
  • Unhide details (maybe more than one type/level of this), UI might be “More…” or “…” (e.g. display abstract of document)
  • Clickthrough to detail page (metadata, helps to determine whether the content is useful)
  • Clickthrough to content/display/play/download page (the thing itself -- the user is consuming the content)
  • Play content (hit the PLAY button)
  • Buy content
  • Add to cart from SRP (search results page)
  • Add to cart from detail page
  • Add as contact/friend
  • Communicate (send message)
  • Remove from cart
  • Bookmark / add to wish list
  • Purchase from SRP (one-click)
  • Purchase/checkout from cart
  • Click on "related" product
  • Rating (upvote/downvote)
  • Write review
  • Apply to job / submit proposal to buyer
  • Be hired for job / buyer accepts proposal

The user should be able to define their own event types as well.

Search systems exist in many domains, with different object types (product, document, person, company, ...) and different actions (buy, read, contact, apply for job, ...). Should we try to unify actions, e.g., "add as friend" = "purchase"?

Should "add to cart from SRP" vs "from detail page" be different event types or the same event type, distinguished by page type (where is that recorded?).

Should we try to align with others' definitions of actions, e.g., Google Analytics recommended events (only some of which are relevant to search)? Is there an industry standard or convention we should be following?

Can events have additional information like "dollar value of action" -- that's mostly a generic analytics issue, but even for search analytics, there may be differences in user behavior around high-priced and low-priced items.

Do we need to provide explicit support for multi-dimensional events (action=buy, pagetype=detail) or a hierarchy of events (buy is a supercategory of buy-on-srp and buy-on-detail-page)? Or should we leave this to the user?
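
To make the multi-dimensional question concrete, one option is to model each event as a flat document in which the action and the page type are separate fields rather than a combined event type. Everything below (index name, field names, values) is illustrative only, not a proposed schema:

```json
PUT /search-behavior-events/_doc/evt-0001
{
  "timestamp": "2023-02-07T16:02:11Z",
  "session_id": "s-7d2b4c11",
  "query_id": "q-2023-02-07-0001",
  "action": "add_to_cart",
  "page_type": "detail",
  "item_id": "sku-8841",
  "position": 3,
  "value": { "amount": 129.00, "currency": "USD" }
}
```

Keeping `action` and `page_type` as separate dimensions keeps the action vocabulary small, lets builders add their own action names freely, and pushes hierarchy questions (is buy-on-SRP a kind of buy?) to query time instead of baking them into the event type.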

@macohen changed the title from "[RFC] Standard Search Logging Metrics and Data Reuse for Relevance" to "[RFC] Search User Behavior Logging and Data Reuse for Relevance" on Feb 24, 2023
@macohen added the Search label and removed the Indexing & Search label on Mar 27, 2023
@reta (Collaborator) commented Jan 24, 2024

May be somewhat related to #72

@smacrakis commented:

@reta Thanks for the comment. I think #72 is more about changing the results, while this issue is about measuring the results and user interaction with them, both for analytics and as input to machine learning.

@ansjcy (Member) commented Jan 24, 2024

Great proposal! I believe those are very valid user stories, and adding the support described in the RFC will definitely improve visibility and the analytics experience. Also, the proposal overlaps with several query insights features we are building now. At a high level, in query insights we want to build a generic data collection, processing, and exporting framework, add support for query-level recommendations, and build query insights dashboards to help users have better visibility into search performance.

> we believe builders, business users, and relevance engineers want to see metrics out of the box for any search application like top queries, top queries resulting in a high-value action (HVA - like a purchase, stream, download, or whatever the builder defines), top queries with zero results, and top abandoned queries

We are trying to cover those use cases with the Top N Queries feature! In 2.12 we are releasing the latency-based top queries feature, but we will add more dimensions (like CPU and JVM usage) in future releases. Also, "top queries with zero results" and "top abandoned queries" are great use cases we can consider building into the feature as well :).
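
For anyone who wants to try that piece, the latency-based top queries in 2.12 can be enabled with a dynamic cluster setting and read back from the Query Insights endpoint, roughly as follows (treat the exact setting and endpoint names here as approximate rather than a spec):

```json
PUT /_cluster/settings
{
  "persistent": {
    "search.insights.top_queries.latency.enabled": true
  }
}

GET /_insights/top_queries
```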

> more advanced analytics like similar queries in the long tail that may be helped by synonyms, query rewrites/expansion, or other relevance tuning techniques

Good point! I believe finding "similar" queries and clustering them will be super useful. It would be valuable information for the user; furthermore, we could build query cost estimation if we have a robust query clustering method. It would facilitate a bunch of other features like query rewriting, query sandboxing, and tiered caching as well, since knowing "how expensive the query would be" can be a super important metric for those features.

> Building this feature will standardize a set of reporting and data collection needs that are common across search applications and allow software engineers and relevance engineers to focus out of the box on higher-level concerns like tuning queries, query rewriting, synonyms, and results reranking.

These components are actually being built in the query insights framework; it would be great if we could reuse some of them.
#11429

> Logging every request and each step like rewrites, results returned from the index, reranking, and HVAs could have an impact on an OpenSearch cluster if we decide to do all of this in OpenSearch

Agreed! We should be careful about this and do thorough evaluations of factors like feature availability, recommendation SLA, and cost when determining which component to choose for a certain feature.

@jzonthemtn commented:

Related to #12084

@macohen (Contributor, Author) commented Jan 30, 2024

Let's use this RFC as a point of historical reference for #12084

@macohen closed this as not planned on Jan 30, 2024