fix lookup tables #291
Conversation
…zer, better decouple REST/bulk cases in Java, extend AWS to deal with multiple files
@@ -325,40 +323,38 @@ module "psoxy-bulk-to-worklytics" {
  }, try(each.value.settings_to_provide, {}))
}

module "psoxy_lookup_tables_builders" {
# BEGIN lookup builders
module "lookup_output" {
instead of a lambda for each lookup table, just an output bucket (with associated IAM policies, settings, etc).
  inputs_to_build_lookups_for = toset(distinct([ for k, v in var.lookup_table_builders : v.input_connector_id ]))
}

resource "aws_ssm_parameter" "additional_transforms" {
setting this SSM parameter for each "bulk" lambda for which we want to build an extra lookup table - e.g., by default a lambda doesn't have any ADDITIONAL_TRANSFORMS; but if you provide this SSM param, then you change the behavior of the existing lambda
@@ -21,7 +21,7 @@
 @EqualsAndHashCode
 @JsonPropertyOrder({"allowAllEndpoints", "endpoints", "defaultScopeIdForSource"})
 @JsonInclude(JsonInclude.Include.NON_NULL) //NOTE: despite name, also affects YAML encoding
-public class Rules2 implements RuleSet, Serializable {
+public class Rules2 implements RESTRules {
this is clearer; lets people bind to the RESTRules interface instead of the Rules2 implementation
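For illustration, a minimal hedged sketch of the point (everything here except the RESTRules/Rules2 names is hypothetical):

    // sketch only, not the project's code: call sites declare the interface type,
    // so the concrete rules class can be renamed or swapped without touching them
    interface RESTRules { }
    class Rules2 implements RESTRules { }
    class Caller {
        RESTRules rules = new Rules2();  // bound to the interface, not the implementation
    }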
Rules2 --> PsoxyRules
Few suggestions, no blockers. Liked the split of rules + pseudonymization 👍
  for_each = var.lookup_table_builders

  source = "../../modules/aws-psoxy-bulk-existing"
  # source = "git::https://github.com/worklytics/psoxy//infra/modules/aws-psoxy-bulk-existing?ref=v0.4.13"
  source = "../../modules/aws-psoxy-output-bucket"
shouldn't this point to source when deployed?
resource "aws_s3_bucket" "output" {
  # note: this ends up with a long UTC time-stamp + random number appended to it to form the bucket name
  bucket_prefix = "psoxy-${var.instance_id}-"
could we use the customer id too (if available)? from our point of view all these buckets look similar.
In any case, with the new export connectors to come, this should not be an issue anymore
So this is the S3 bucket that we import the customer's sanitized data from. We shouldn't be touching it directly; they should copy-paste the value into the "Add Connection" flow.
Security on this is that, within our platform, only the specific customer's tenant SA can access it. So even if we did for some reason configure the connection on behalf of the customer, and mixed up buckets across customers, it should be a 403 from AWS IAM when the tenant SA tries to assume a role which it can't assume (based on the role's config in the customer's AWS).
    @Override
    public PseudonymizedIdentity pseudonymize(@NonNull String value) {
        return pseudonymize((Object) value);
    }

    @Override
    public PseudonymizedIdentity pseudonymize(@NonNull Number value) {
        return pseudonymize((Object) value);
    }
really needed in the interface? they just call the Object one. That could deal internally with specialized versions per object class.
yeah, maybe can be eliminated. The Object one was previously private
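A rough sketch of that alternative, with the per-type handling folded into the Object-typed method (illustrative names and a stubbed PseudonymizedIdentity, not the actual codebase):

    // hypothetical sketch; the hashing here is a stand-in, not the real pseudonymization
    class PseudonymizerSketch {
        record PseudonymizedIdentity(String hash) {}

        PseudonymizedIdentity pseudonymize(Object value) {
            // per-class canonicalization handled internally, so the String/Number
            // overloads need not appear in the Pseudonymizer interface
            String canonical = (value instanceof Number n)
                    ? new java.math.BigDecimal(n.toString()).stripTrailingZeros().toPlainString()
                    : value.toString();
            return new PseudonymizedIdentity(Integer.toHexString(canonical.hashCode()));
        }
    }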
            throw new RuntimeException("Failed to parse ADDITIONAL_TRANSFORMS from config", e);
        }
    } else {
        return new ArrayList<>();
Collections.emptyList() - immutable, guess you don't need it mutable
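i.e., assuming nothing downstream mutates the returned list, something like:

    return java.util.Collections.emptyList();  // immutable shared instance, no per-call allocation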
                         BulkDataRules bulkDataRules,
                         Pseudonymizer pseudonymizer) throws IOException;
I kind of liked the Sanitizer interface being (rules + pseudonymizer), but ok
System.out.println("Writing to: " + storageEventResponse.getDestinationBucketName() + "/" + storageEventResponse.getDestinationObjectPath()); | ||
|
||
storage.createFrom(BlobInfo.newBuilder(BlobId.of(storageEventResponse.getDestinationBucketName(), storageEventResponse.getDestinationObjectPath())) | ||
.setContentType(blobInfo.getContentType()) | ||
.build(), processedStream); | ||
|
||
System.out.println("Successfully pseudonymized " + importBucket + "/" | ||
+ sourceName + " and uploaded to " + storageEventResponse.getDestinationBucketName() + "/" + storageEventResponse.getDestinationObjectPath()); |
@Log calls instead of sys out?
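A hedged sketch of what that might look like with Lombok (class and method names here are illustrative, not the actual handler):

    import lombok.extern.java.Log;

    // @Log generates a java.util.logging Logger field named "log"
    @Log
    class StorageHandlerSketch {
        void reportWrite(String destinationBucket, String destinationPath) {
            // replaces System.out.println("Writing to: " + ...)
            log.info(() -> "Writing to: " + destinationBucket + "/" + destinationPath);
        }
    }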
        process(importBucket, sourceKey, transform.getDestinationBucketName(), transform.getRules());
    }

    return "Processed!";
"200-OK", not sure where this is read back (probably nowhere) but if so, easier to process
apologies for the massive refactor 😬; but generally we were using Sanitizer in bulk cases only to get its pseudonymize() methods. So split out a Pseudonymizer interface/implementation, and made it much clearer that the Sanitizer is really for REST APIs.

NOTE: take your time, don't think we need to get this into v0.4.14, which I want out by EoW.

Fixes
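To make the split concrete, a rough sketch of the resulting shape (signatures are illustrative, not copied from the code):

    // hedged sketch; PseudonymizedIdentity stubbed so the snippet stands alone
    class RefactorSketch {
        record PseudonymizedIdentity(String hash) {}

        // pseudonymization stands on its own, usable from both REST and bulk paths
        interface Pseudonymizer {
            PseudonymizedIdentity pseudonymize(Object value);
        }

        // the "Sanitizer" concept is now explicitly about REST API responses;
        // bulk/storage handling instead receives rules as a method parameter
        interface RESTApiSanitizer {
            String sanitize(String httpMethod, java.net.URL url, String jsonResponse);
        }
    }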
Other ideas:
- StorageHandler is really a StorageObjectSanitizer, to be clearer - as it implements redaction, pseudonymization, etc., but its arch differs from RESTApiSanitizer; rules are a method parameter to StorageHandler atm, but are a property of RESTApiSanitizer instances. Which is the better approach?
- refactor aws-psoxy-bulk to use the aws-psoxy-output-bucket module, as it currently repeats a lot of that (at risk of having hierarchical TF modules) - changed my mind on this, bc it will necessitate a lot of moved stuff in the modules for little value

Change implications