Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(browse): Fix browse pagination and multi-browse path issue #2984

Merged
merged 11 commits into from
Jul 30, 2021

Conversation

dexter-mh-lee
Copy link
Contributor

Fixed browse pagination bug, where pagination just doesn't work.
Also fixed the multi-browse path issue where a group entry is shown when it is actually an entity.

Also added two end points, getNumEntities and batchGetNumEntities to efficiently get number of entities per entity type.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)


return new BrowseResult().setMetadata(browseResultMetadata)
.setEntities(new BrowseResultEntityArray(browseResultEntityList))
.setNumEntities(numGroups + numEntities)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be numElements or total or some other name that indicates it is a sum of entities + groups?

I would expect numEntities to be numEntities + group.map(group -> group.count).sum()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the naming here is confusing... I'd agree I'd think num entities would be either

a. The number of concrete entities matching the exact path OR
b. The total number of entities somewhere under this path

In fact, it may be useful to include both of these in the response separately.

Can we nip this in the bud with this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately BrowseResult PDL is in datahub-gms right now. We can make the above changes after moving them over here.

Comment on lines 88 to 90
final SearchResponse entitiesResponse =
client.search(constructEntitiesSearchRequest(indexName, path, requestMap, from, size),
client.search(constructEntitiesSearchRequest(indexName, path, requestMap, entityFrom, entitySize),
RequestOptions.DEFAULT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small optimization available here- if entitySize is 0 we don't have to do this query

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or do we need to for counts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I was going to do that until you mentioned we need the whole counts

@@ -210,4 +202,17 @@
return null;
});
}

@Action(name = "getNumEntities")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to disrupt this PR too much - but what do you think about naming this getTotalEntityCount? getNumEntities is a bit too vague IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

*
* @param entityName name of the entity
*/
long docCount(@Nonnull String entityName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this hold up given Timeseries aspects? Can we call this getSnapshotDocCount or something of the type to illustrate this divergence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeseries service doesn't implement SearchService. This is specific to search backend (which is only used by versioned aspects)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay got it

Comment on lines 396 to 402
.type("BrowseResults", typeWiring -> typeWiring
.dataFetcher("entities", new AuthenticatedResolver<>(
new EntityTypeBatchResolver(
ENTITY_TYPES.stream().collect(Collectors.toList()),
(env) -> ((BrowseResults) env.getSource()).getEntities()))
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note- we should really put these two in a configureSearchAndBrowseResolvers...

Copy link
Contributor

@gabe-lyons gabe-lyons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!!

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shirshanka shirshanka merged commit 15c0c4d into datahub-project:master Jul 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants