Spillover from search ticket #3106
Comments
Note: we are currently using the deprecated We should fix this |
The domains filter does not return null domains.
Example query on the demo instance:
{
  searchAcrossEntities(
    input: {
      types: [DATASET],
      query: "*",
      start: 0,
      count: 10,
      orFilters: [
        {and: [{field: "domains", values: ["urn:li:domain:7d64d0fa-66c3-445c-83db-3a324723daf8"]}]}
      ]
    }) {
    start
    count
    total
    searchResults {
      entity {
        urn
        ... on Dataset {
          domain {
            associatedUrn
          }
        }
      }
    }
  }
}
Will need to debug this further on our instance. |
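When debugging this on another instance, it may help to check the response for entries that came back without any domain. The sketch below assumes a result shape that mirrors only the fields selected in the query above; `DomainSearchResult` and `resultsWithoutDomain` are hypothetical names, not part of any DataHub client.

```typescript
// Hypothetical minimal shape of one search result, limited to the fields
// selected in the example query (urn and domain.associatedUrn).
interface DomainSearchResult {
  entity: {
    urn: string;
    domain?: { associatedUrn: string } | null;
  };
}

// Returns the URNs of results that have no domain association at all.
// A non-empty return value for a domain-filtered search would suggest the
// filter is letting through entities with a null domain.
function resultsWithoutDomain(results: DomainSearchResult[]): string[] {
  return results
    .filter((r) => r.entity.domain == null)
    .map((r) => r.entity.urn);
}
```

Running this over `searchResults` from a domain-filtered query gives a quick list of offending URNs to inspect by hand.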
How the backend of search works
What fields are indexed in Elasticsearch?
public enum FieldType {
  KEYWORD,
  TEXT,
  TEXT_PARTIAL,
  BROWSE_PATH,
  URN,
  URN_PARTIAL,
  BOOLEAN,
  COUNT,
  DATETIME,
  OBJECT,
  BROWSE_PATH_V2,
  WORD_GRAM,
  DOUBLE
}
How is the Elasticsearch query built?
Observations
private static QueryBuilder buildEqualsConditionFromCriterionWithValues(
    @Nonnull final String fieldName,
    @Nonnull final Criterion criterion,
    final boolean isTimeseries) {
  if (BOOLEAN_FIELDS.contains(fieldName) && criterion.getValues().size() == 1) {
    // Handle special-cased Boolean fields.
    // here we special case boolean fields we recognize the names of and hard-cast
    // the first provided value to a boolean to do the comparison.
    // Ideally, we should detect the type of the field from the entity-registry in order
    // to determine how to cast.
    return QueryBuilders.termQuery(fieldName, Boolean.parseBoolean(criterion.getValues().get(0)))
        .queryName(fieldName);
  }
  return QueryBuilders.termsQuery(
          toKeywordField(criterion.getField(), isTimeseries), criterion.getValues())
      .queryName(fieldName);
}
|
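To make the effect of the method above more concrete, here is a rough TypeScript sketch of the Elasticsearch query JSON it would plausibly produce. Several pieces are assumptions rather than the real implementation: `BOOLEAN_FIELDS_SKETCH` contains a made-up example field, and `toKeywordFieldSketch` simply appends `.keyword` for non-timeseries fields, which may be simpler than what the real `toKeywordField` does.

```typescript
// Assumption: an illustrative list only; the real BOOLEAN_FIELDS may differ.
const BOOLEAN_FIELDS_SKETCH: string[] = ["removed"];

// Assumption: simplified stand-in for toKeywordField.
function toKeywordFieldSketch(field: string, isTimeseries: boolean): string {
  return isTimeseries ? field : field + ".keyword";
}

type EsQuerySketch =
  | { term: Record<string, { value: boolean; _name: string }> }
  | { terms: Record<string, string[] | string> };

// Mirrors the branching of the Java method: a single-valued known boolean
// field becomes a `term` query, everything else a `terms` query on the
// keyword field. `_name` corresponds to queryName(fieldName).
function buildEqualsConditionSketch(
  fieldName: string,
  criterion: { field: string; values: string[] },
  isTimeseries: boolean
): EsQuerySketch {
  if (BOOLEAN_FIELDS_SKETCH.indexOf(fieldName) !== -1 && criterion.values.length === 1) {
    // Same rule as Boolean.parseBoolean: true only for a case-insensitive "true".
    const value = criterion.values[0].toLowerCase() === "true";
    return { term: { [fieldName]: { value: value, _name: fieldName } } };
  }
  const keywordField = toKeywordFieldSketch(criterion.field, isTimeseries);
  return { terms: { [keywordField]: criterion.values, _name: fieldName } };
}
```

Under these assumptions, the `domains` filter from the example query would become a `terms` query against `domains.keyword`, which only matches documents whose indexed value is one of the supplied URNs.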
If we need to tweak the ranking, we can customise the elasticsearch query like this |
Search highlighting in the React app uses react-highlight. The server only indicates whether there is a matched field or not.
const appConfig = useAppConfig();
const enableNameHighlight = appConfig.config.visualConfig.searchResult?.enableNameHighlight;
const matchedFields = useMatchedFieldsByGroup(field);
const hasMatchedField = !!matchedFields?.length;
const normalizedSearchQuery = useSearchQuery()?.trim().toLowerCase();
const normalizedText = text.trim().toLowerCase();
const hasSubstring = hasMatchedField && !!normalizedSearchQuery && normalizedText.includes(normalizedSearchQuery);
const pattern = enableFullHighlight ? HIGHLIGHT_ALL_PATTERN : undefined;
return (
    <>
        {enableNameHighlight && hasMatchedField ? (
            <StyledHighlight search={hasSubstring ? normalizedSearchQuery : pattern}>{text}</StyledHighlight>
        ) : (
            text
        )}
    </>
);
|
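The decision logic in that component can be pulled out into a plain function, which is roughly what a client would need to reproduce to get the same highlighting behaviour. This is only a sketch: `resolveHighlight`, `HighlightInput` and `HIGHLIGHT_ALL_SKETCH` are hypothetical names, and the regular expression is a stand-in because the value of `HIGHLIGHT_ALL_PATTERN` is not shown in the snippet.

```typescript
// Assumption: stand-in for HIGHLIGHT_ALL_PATTERN, whose real value is unknown here.
const HIGHLIGHT_ALL_SKETCH = /.*/;

interface HighlightInput {
  text: string;
  query?: string;
  matchedFieldCount: number; // number of matched fields reported by the server
  enableNameHighlight: boolean;
  enableFullHighlight: boolean;
}

interface HighlightDecision {
  wrap: boolean; // whether to wrap the text in a highlighter at all
  search?: string | RegExp | undefined; // what the highlighter should look for
}

function resolveHighlight(input: HighlightInput): HighlightDecision {
  const hasMatchedField = input.matchedFieldCount > 0;
  if (!input.enableNameHighlight || !hasMatchedField) {
    return { wrap: false }; // render the plain text
  }
  const normalizedQuery = (input.query || "").trim().toLowerCase();
  const normalizedText = input.text.trim().toLowerCase();
  const hasSubstring =
    normalizedQuery.length > 0 && normalizedText.indexOf(normalizedQuery) !== -1;
  if (hasSubstring) {
    return { wrap: true, search: normalizedQuery };
  }
  // No literal substring match: highlight everything or nothing, depending on config.
  return { wrap: true, search: input.enableFullHighlight ? HIGHLIGHT_ALL_SKETCH : undefined };
}
```

The point to note is that the server's contribution is limited to the matched-field signal; the substring comparison itself happens entirely in the browser.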
This issue is being marked as stale because it has been open for 60 days with no activity. Remove stale label or comment to keep the issue open. |
This issue is being closed because it has been open for a further 7 days with no activity. If this is still a valid issue, please reopen it, Thank you! |
Things that would be useful to add to the search method:
Return numAssets
Investigate why domain filters are returning things that don't have the domain set - can we control this? We can pass an additional filter for hasXXX
Add a method to get facets separately from the searchAcrossEntities call. We need to know the domain URNs in order to pass the filters to the search call. If the URL query parameters contain only the labels then we will need to perform a lookup by label. This also gives us the option to show all possible options even when a filter is applied that excludes some of them.
The response object should make it easier to look up facet labels by value or vice versa
There is no search highlighting yet - how do we pass this back? The default app does this client side, using react-highlight, so we would also need to do this on the client side.
We need to be able to filter by ID. (Fixed by passing urn as the filter name)
When searching for a data product by name, all the datasets within it rank higher than the actual data product. We would expect the data product to be the top-ranked result. My best guess is that datasets have more fields, and each match with the search terms increases the score of a result. We could avoid this by separating the data product and dataset queries into two queries, and mixing them on the frontend so that data products go at the top. Alternatively, we could try to customise the scoring to boost results based on _entityType = 'DataProduct'.
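Two of the items above lend themselves to small client-side helpers: a two-way facet lookup (label by value and value by label) and a re-ordering step that puts data products ahead of other results. The sketch below is illustrative only; `FacetOption`, `RankedResult`, `buildFacetLookup` and `pinTypeFirst` are hypothetical names, and the `entityType` strings are whatever the caller's response actually contains.

```typescript
// Hypothetical shape for one facet option, e.g. a domain URN and its display name.
interface FacetOption {
  value: string;
  label: string;
}

// Builds lookups in both directions. Labels are not guaranteed to be unique,
// so if two values share a label the later one wins in valueByLabel.
function buildFacetLookup(options: FacetOption[]): {
  labelByValue: Record<string, string>;
  valueByLabel: Record<string, string>;
} {
  const labelByValue: Record<string, string> = {};
  const valueByLabel: Record<string, string> = {};
  for (const option of options) {
    labelByValue[option.value] = option.label;
    valueByLabel[option.label] = option.value;
  }
  return { labelByValue, valueByLabel };
}

// Hypothetical minimal search result carrying its entity type.
interface RankedResult {
  urn: string;
  entityType: string;
}

// Moves results of the pinned type to the front while keeping the relative
// (score-based) order within each group unchanged.
function pinTypeFirst(results: RankedResult[], pinnedType: string): RankedResult[] {
  const pinned = results.filter((r) => r.entityType === pinnedType);
  const rest = results.filter((r) => r.entityType !== pinnedType);
  return pinned.concat(rest);
}
```

Re-ordering on the client is the simpler of the two ranking options, but it only affects the page of results already fetched; a data product that falls outside the requested `count` would still be missing, which is where a server-side boost would be needed.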