Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(search): refactor NUM_RETRIES in esindexbuilder to be configurable #3870

Conversation

senni0418
Copy link
Contributor

@senni0418 senni0418 commented Jan 11, 2022

Checklist

@senni0418
Copy link
Contributor Author

This pull request is to make the NUM_RETRIES constant in the ESIndexBuilder configurable as an environment variable. The original number is set to 3 which causes the reindex failure when our team build the project. This is because our datahub has too many metadata to reindex and the NUM_RETRIES=3 is not enough for the project to be reindexed. In this case, we think other users in the community may face the same issue as well. So we decide to make this configurable so that the user can change the number as needed.

Here attach the log of reindex failure from our team for your reference.
elastic-reindex-err-gms.log

Copy link
Contributor

@dexter-mh-lee dexter-mh-lee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. So for env variables, we never change the dockerfiles directly because the variables are not easy to find. Instead we use factories that get variables from application.yaml.

Check out this factory https://github.com/linkedin/datahub/blob/master/metadata-service/factories/src/main/java/com/linkedin/gms/factory/search/ElasticSearchIndexBuilderFactory.java that creates ESIndexBuilders for all our search services. You just need to expose the num retries as an input variable to the constructor and add a new injected variable in the factory similar to this https://github.com/linkedin/datahub/blob/master/metadata-service/factories/src/main/java/com/linkedin/gms/factory/search/ElasticSearchIndexBuilderFactory.java

Then, you can change the application.yaml file that the factory is picking up https://github.com/linkedin/datahub/blob/master/metadata-service/factories/src/main/resources/application.yml#L100 You can see that env variables with defaults are pushed into the application.yml file (and thus into the factory)

@github-actions
Copy link

github-actions bot commented Jan 12, 2022

Unit Test Results (build & test)

  45 files  ±0    45 suites  ±0   11m 41s ⏱️ + 2m 20s
502 tests ±0  450 ✔️ ±0  52 💤 ±0  0 ±0 

Results for commit d2d993d. ± Comparison against base commit 6f7c212.

♻️ This comment has been updated with latest results.

@senni0418
Copy link
Contributor Author

Hi @dexter-mh-lee I have changed the way of refactor in the way you describe. feel free to add more comments if you have other concerns

@@ -100,6 +100,7 @@ elasticsearch:
prefix: ${INDEX_PREFIX:}
numShards: ${ELASTICSEARCH_NUM_SHARDS_PER_INDEX:1}
numReplicas: ${ELASTICSEARCH_NUM_REPLICAS_PER_INDEX:1}
numRetries: ${ELASTICSEARCH_NUM_RETRIES_PER_INDEX:3}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we rename this to ELASTICSEARCH_INDEX_BUILDER_NUM_RETRIES ?

Copy link
Contributor

@dexter-mh-lee dexter-mh-lee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Minor comment to make sure the env variable is easy to understand!

@senni0418
Copy link
Contributor Author

@dexter-mh-lee Thanks for your comment, the variable is renamed.

Copy link
Contributor

@dexter-mh-lee dexter-mh-lee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! cc @shirshanka

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shirshanka shirshanka merged commit 80c46f2 into datahub-project:master Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants