Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(config): configurable bootstrap policies file #8812

Conversation

sgomezvillamor
Copy link
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the devops PR or Issue related to DataHub backend & deployment label Sep 8, 2023
@david-leifker david-leifker self-assigned this Sep 8, 2023
@sgomezvillamor
Copy link
Contributor Author

hi @david-leifker! just noted you self-assigned this PR. Some questions before opening for the review:

Do you think some tests should be included for this?
Tested locally with defaut values as well as with env var BOOTSTRAP_POLICIES_FILE=file:///test/my-policies.json, all fine! Just not sure how to cover this second scenario in a test 🤔

Also, should this be documented somewhere?

@david-leifker
Copy link
Collaborator

david-leifker commented Sep 9, 2023

I self-assigned because I think that the default policy is loaded once and then never loaded again from the jar. If I recall correctly the state of whether the bootstrap step has run is stored in the SQL database. Overriding this file after the very first run isn't going to work as expected. I wanted to validate that it will behave like that and to ask whether that is your intent (to override this policy once for a new instance)?

@sgomezvillamor
Copy link
Contributor Author

default policy is loaded once and then never loaded again from the jar.

for (final JsonNode policyObj : policiesObj) {
final Urn urn = Urn.createFromString(policyObj.get("urn").asText());
// If the info is not there, it means that the policy was there before, but must now be removed
if (!policyObj.has("info")) {
_entityService.deleteUrn(urn);
continue;
}
final DataHubPolicyInfo info =
RecordUtils.toRecordTemplate(DataHubPolicyInfo.class, policyObj.get("info").toString());
if (!info.isEditable()) {
// If the Policy is not editable, always re-ingest.
log.info(String.format("Ingesting default policy with urn %s", urn));
ingestPolicy(urn, info);
} else {
// If the Policy is editable (ie. an example policy), only ingest on a clean boot up.
if (!hasPolicy(urn)) {
log.info(String.format("Ingesting default policy with urn %s", urn));
ingestPolicy(urn, info);
} else {
log.info(String.format("Skipping ingestion of editable policy with urn %s", urn));
}
}
}

For every policy in the bootstrap file, the following criteria determines whether policy is loaded again or not:

  • if is not editable --> reload
  • else:
    • if policy not exists in backend --> reload
    • else --> skip

I recall correctly the state of whether the bootstrap step has run is stored in the SQL database.

Haven't found evidence of this. From what I see in the code and logs, IngestionPoliciesStep is processed in every boot and the criteria above is followed to determine reload or not.

Overriding this file after the very first run isn't going to work as expected.

I think it will. If we make the IngestionPoliciesStep to process a different file, as soon as we keep stable ids, the criteria above will still be valid.


The goal is to use a different file from the one in the jar while keeping all other functionality. There are some policies there we want to disable (even removing) while keeping all policies managed as code. So using a custom bootstrap policies file looks perfect match here.

Please, let me know your thoughts here

@sgomezvillamor sgomezvillamor marked this pull request as ready for review September 12, 2023 07:08
@@ -276,6 +276,8 @@ bootstrap:
enabled: ${UPGRADE_DEFAULT_BROWSE_PATHS_ENABLED:false} # enable to run the upgrade to migrate legacy default browse paths to new ones
backfillBrowsePathsV2:
enabled: ${BACKFILL_BROWSE_PATHS_V2:false} # Enables running the backfill of browsePathsV2 upgrade step. There are concerns about the load of this step so hiding it behind a flag. Deprecating in favor of running through SystemUpdate
policies:
file: ${BOOTSTRAP_POLICIES_FILE:classpath:boot/policies.json}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've validated that this still works, right? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I tested with default values (so loading from classpath) and with ENV BOOTSTRAP_POLICIES_FILE=file:///datahub/datahub-gms/resources/custom-policies.json

It works fine in both cases

@@ -66,10 +70,10 @@ public void execute() throws IOException, URISyntaxException {
.maxStringLength(maxSize).build());

// 0. Execute preflight check to see whether we need to ingest policies
log.info("Ingesting default access policies...");
log.info("Ingesting default access policies from: {}...", _policiesResource);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this correctly print the path of the resource?

or the contents?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path

This is the output with default values

2023-09-22 10:05:22 2023-09-22 08:05:22,700 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:73 - Ingesting default access policies from: class path resource [boot/policies.json]...
2023-09-22 10:05:22 2023-09-22 08:05:22,710 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:98 - Ingesting default policy with urn urn:li:dataHubPolicy:0
2023-09-22 10:05:22 2023-09-22 08:05:22,725 [main] INFO  c.l.m.entity.EntityServiceImpl:1835 - Ingesting aspect with name dataHubPolicyKey, urn urn:li:dataHubPolicy:0
2023-09-22 10:05:22 2023-09-22 08:05:22,727 [main] INFO  c.l.m.entity.EntityServiceImpl:1835 - Ingesting aspect with name dataHubPolicyInfo, urn urn:li:dataHubPolicy:0
2023-09-22 10:05:22 2023-09-22 08:05:22,746 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:98 - Ingesting default policy with urn urn:li:dataHubPolicy:1
2023-09-22 10:05:22 2023-09-22 08:05:22,762 [main] INFO  c.l.m.entity.EntityServiceImpl:1835 - Ingesting aspect with name dataHubPolicyKey, urn urn:li:dataHubPolicy:1
2023-09-22 10:05:22 2023-09-22 08:05:22,762 [main] INFO  c.l.m.entity.EntityServiceImpl:1835 - Ingesting aspect with name dataHubPolicyInfo, urn urn:li:dataHubPolicy:1
2023-09-22 10:05:22 2023-09-22 08:05:22,810 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:106 - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:7
2023-09-22 10:05:22 2023-09-22 08:05:22,831 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:106 - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:view-entity-page-all
2023-09-22 10:05:22 2023-09-22 08:05:22,841 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:106 - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:view-dataset-sensitive
2023-09-22 10:05:22 2023-09-22 08:05:22,847 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:98 - Ingesting default policy with urn urn:li:dataHubPolicy:admin-platform-policy
2023-09-22 10:05:22 2023-09-22 08:05:22,863 [main] INFO  c.l.m.entity.EntityServiceImpl:1835 - Ingesting aspect with name dataHubPolicyKey, urn urn:li:dataHubPolicy:admin-platform-policy
...

And this is the output with ENV BOOTSTRAP_POLICIES_FILE=file:///datahub/datahub-gms/resources/custom-policies.json


2023-09-22 10:45:27 2023-09-22 08:45:27,027 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:73 - Ingesting default access policies from: URL [file:/datahub/datahub-gms/resources/custom-policies.json]...
2023-09-22 10:45:27 2023-09-22 08:45:27,032 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:98 - Ingesting default policy with urn urn:li:dataHubPolicy:XXX
2023-09-22 10:45:27 2023-09-22 08:45:27,132 [main] WARN  org.apache.velocity.deprecation:62 - configuration key 'resource.loader' has been deprecated in favor of 'resource.loaders'
2023-09-22 10:45:27 2023-09-22 08:45:27,133 [main] WARN  org.apache.velocity.deprecation:62 - configuration key 'class.resource.loader.class' has been deprecated in favor of 'resource.loader.class.class'
2023-09-22 10:45:27 2023-09-22 08:45:27,133 [main] WARN  org.apache.velocity.deprecation:62 - configuration key 'file.resource.loader.class' has been deprecated in favor of 'resource.loader.file.class'
2023-09-22 10:45:27 2023-09-22 08:45:27,133 [main] WARN  org.apache.velocity.deprecation:62 - configuration key 'file.resource.loader.path' has been deprecated in favor of 'resource.loader.file.path'
2023-09-22 10:45:27 2023-09-22 08:45:27,133 [main] WARN  org.apache.velocity.deprecation:62 - configuration key 'runtime.references.strict' has been deprecated in favor of 'runtime.strict_mode.enable'
2023-09-22 10:45:27 2023-09-22 08:45:27,627 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:115 - Successfully ingested default access policies.


// 1. Read from the file into JSON.
final JsonNode policiesObj = mapper.readTree(new ClassPathResource("./boot/policies.json").getFile());
final JsonNode policiesObj = mapper.readTree(_policiesResource.getFile());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely need to ensure that we've confirmed that this works still

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjoyce0510
Copy link
Collaborator

Overall I think this looks good. I have not used the dynamic Resource injection via spring but looks like all should work as expected!

If you provide a file, do you need to prefix the file path similar to how is done for classpath?

e.g. file:path/to/file?

If yes, I'd recommend adding a comment in application.yml to exemplify the correct format!

@sgomezvillamor
Copy link
Contributor Author

sgomezvillamor commented Sep 22, 2023

Overall I think this looks good. I have not used the dynamic Resource injection via spring but looks like all should work as expected!

I also learnt that in this contrib 😅 https://www.baeldung.com/spring-load-resource-as-string#using-spel

Overall I think this looks good. I have not used the dynamic Resource injection via spring but looks like all should work as expected!

If you provide a file, do you need to prefix the file path similar to how is done for classpath?

e.g. file:path/to/file?

If yes, I'd recommend adding a comment in application.yml to exemplify the correct format!

With file:... definitely works. I tested with ENV BOOTSTRAP_POLICIES_FILE=file:///datahub/datahub-gms/resources/custom-policies.json in my local deployment. Not sure if there are other variants that could work also.
I'm adding the comment/example in application.yml 👍 9cfae00

@vercel
Copy link

vercel bot commented Oct 3, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
docs-website ✅ Ready (Inspect) Visit Preview 💬 Add feedback Oct 3, 2023 0:33am

@jjoyce0510
Copy link
Collaborator

LGTM- Once we are green we merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devops PR or Issue related to DataHub backend & deployment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants