Introduce a Hashing Processor #31087
Pinging @elastic/es-core-infra
I think replicating the Logstash functionality would be a good first step. I see a lot of value in giving users a consistent experience. Having similar functionality out of the gate, and then evolving both processors, would be a big win. My current understanding is that the Logstash functionality is acceptable for GDPR requirements.
@talevy I assume that in the first iteration the salt/HMAC key will just be held as part of the ingest processor definition itself?
@gingerwizard Any private keys will probably need to be stored in the keystore.
It makes sense to introduce new Security ingest processors (example: elastic#31087), and this change would give them a good place to be written.
It is useful to have a processor similar to https://www.elastic.co/guide/en/logstash/6.0/plugins-filters-fingerprint.html in Elasticsearch: a processor that uses a variety of hashing algorithms to create cryptographically secure one-way hashes of values in documents.
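To illustrate the idea, the sketch below computes a keyed one-way hash (HMAC-SHA256) of a single field value with the standard JCA `javax.crypto.Mac` API and Base64-encodes the result. The class name `FieldHasher`, the choice of HMAC-SHA256, and the Base64 output format are assumptions made for illustration. This is not the processor's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper (not the PR's HashProcessor): keyed one-way hash of a field value. */
public final class FieldHasher {
    private static final String ALGORITHM = "HmacSHA256";
    private final Mac mac; // Mac instances are not thread-safe; use one FieldHasher per thread.

    public FieldHasher(byte[] key) {
        Mac m;
        try {
            m = Mac.getInstance(ALGORITHM);
            m.init(new SecretKeySpec(key, ALGORITHM));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(ALGORITHM + " is not available", e);
        }
        this.mac = m;
    }

    /** Returns the Base64-encoded HMAC-SHA256 of the UTF-8 bytes of {@code value}. */
    public String hash(String value) {
        // doFinal also resets the Mac, so the same instance can hash the next value.
        byte[] digest = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }
}
```

Because the hash is keyed, someone who sees only the stored output cannot recompute it for guessed inputs without the key. That is why the location of the key matters.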
From the ingest side it looks good. I left two minor comments. If docs and an integration test are added, then LGTM.
* A processor that hashes the contents of a field (or fields) using various hashing algorithms
*/
public final class HashProcessor extends AbstractProcessor {
enum Method {
nit: maybe move this enum below the factory class?
public void execute(IngestDocument document) {
Map<String, String> hashedFieldValues = fields.stream().map(f -> {
try {
String value = document.getFieldValue(f, String.class);
Should there be missing field support?
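A common answer in other Elasticsearch ingest processors is an `ignore_missing`-style flag. When the flag is set, absent fields are skipped. Otherwise, the document fails. The sketch below assumes a plain `Map` as the document instead of the real `IngestDocument`, and it uses unkeyed SHA-256 in place of the processor's hashing. `MissingFieldSketch`, `hashFields`, and the `ignoreMissing` parameter are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of missing-field handling; not the PR's HashProcessor. */
public final class MissingFieldSketch {
    /** Hashes each listed field into a map stored under targetField. */
    public static void hashFields(Map<String, Object> doc, List<String> fields,
                                  String targetField, boolean ignoreMissing) {
        Map<String, String> hashed = new LinkedHashMap<>();
        for (String field : fields) {
            Object value = doc.get(field);
            if (value == null) { // field absent, or explicitly null
                if (ignoreMissing) {
                    continue; // skip this field and keep processing the document
                }
                throw new IllegalArgumentException("field [" + field + "] not present as part of document");
            }
            hashed.put(field, sha256Hex(value.toString()));
        }
        doc.put(targetField, hashed);
    }

    // Stand-in hash for the sketch: hex-encoded SHA-256 of the UTF-8 bytes.
    private static String sha256Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```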
Thanks for the review @martijnvg, I'll follow up soon!
Hi @martijnvg, thank you for the review. I've added the … I could not find a good place in the security documentation to add docs for this, and I don't want to block the PR on it (I am going on vacation next week). For this reason, I will open an issue to add the documentation when I come back. What do you think?
LGTM, assuming the PR build is green. Currently it fails because of a checkstyle violation.
> I could not find a good place in the security documentation to add docs for this, and I don't want to block the PR on it (I am going on vacation next week). For this reason, I will open an issue to add the documentation when I come back. What do you think?
I'm ok with this.
@@ -277,7 +277,7 @@ public static Hasher resolve(String name) {

public abstract boolean verify(SecureString data, char[] hash);

- static final class SaltProvider {
+ public static final class SaltProvider {
We removed SaltProvider in the soon-to-be-merged #31234 and opted to generate a random byte array from SecureRandom and Base64-encode it to a string, so you probably need to do something similar here too. https://github.com/elastic/elasticsearch/pull/31234/files#diff-ebc23bc2cb194fa926b2cdafaedef9d4R565
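The approach described here, random bytes from `SecureRandom` encoded as Base64, can be sketched as follows. The helper name `SaltSketch.generateSalt` and the use of a caller-supplied byte count are assumptions. The actual code in #31234 may differ.

```java
import java.security.SecureRandom;
import java.util.Base64;

/** Hypothetical replacement for SaltProvider: random salt as a Base64 string. */
public final class SaltSketch {
    // SecureRandom is thread-safe, so a single shared instance is fine.
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns numBytes random bytes, Base64-encoded (16 bytes encode to 24 characters). */
    public static String generateSalt(int numBytes) {
        byte[] salt = new byte[numBytes];
        RANDOM.nextBytes(salt);
        return Base64.getEncoder().encodeToString(salt);
    }
}
```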
Ah, it is also private! I see, thanks for the heads-up. How soon will that be merged? One of us will have to change their code before pushing 😄
It is useful to have a processor similar to logstash-filter-fingerprint in Elasticsearch: a processor that uses a variety of hashing algorithms to create cryptographically secure one-way hashes of values in documents. This processor introduces a pbkdf2hmac hashing scheme for fields in documents at index time.
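PBKDF2 with an HMAC pseudorandom function derives a fixed-length output from a value, a salt, and an iteration count. The JDK exposes it as the `PBKDF2WithHmacSHA256` `SecretKeyFactory` algorithm. A minimal sketch of hashing one field value could therefore look like the following. The iteration count, the 256-bit output, and the Base64 encoding are illustrative assumptions, not the parameters this PR used, and `Pbkdf2Sketch` is a hypothetical name.

```java
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical sketch of a pbkdf2hmac-style field hash; not the PR's implementation. */
public final class Pbkdf2Sketch {
    /** Derives keyLengthBits of PBKDF2-HMAC-SHA256 output from value and salt, Base64-encoded. */
    public static String hash(String value, byte[] salt, int iterations, int keyLengthBits) {
        PBEKeySpec spec = new PBEKeySpec(value.toCharArray(), salt, iterations, keyLengthBits);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            byte[] derived = factory.generateSecret(spec).getEncoded();
            return Base64.getEncoder().encodeToString(derived);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA256 is not available", e);
        } finally {
            spec.clearPassword(); // wipe the copied value from the spec
        }
    }
}
```

The same value produces the same output only when the salt, iteration count, and output length all match. Every node that runs the pipeline must therefore use identical parameters.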
* master: Mute 'Test typed keys parameter for suggesters' as we await a fix. Build test: Thread linger Fix gradle4.8 deprecation warnings (#31654) Mute FileRealmTests#testAuthenticateCaching with an @AwaitsFix. Mute TransportChangePasswordActionTests#testIncorrectPasswordHashingAlgorithm with an @AwaitsFix. Build: Fix naming conventions task (#31681) Introduce a Hashing Processor (#31087)
* elastic/ccr: (30 commits) Enable setting client path prefix to / (elastic#30119) [DOCS] Secure settings specified per node (elastic#31621) has_parent builder: exception message/param fix (elastic#31182) TEST: Randomize soft-deletes settings (elastic#31585) Mute 'Test typed keys parameter for suggesters' as we await a fix. Build test: Thread linger Fix gradle4.8 deprecation warnings (elastic#31654) Mute FileRealmTests#testAuthenticateCaching with an @AwaitsFix. Mute TransportChangePasswordActionTests#testIncorrectPasswordHashingAlgorithm with an @AwaitsFix. Build: Fix naming conventions task (elastic#31681) Introduce a Hashing Processor (elastic#31087) Do not check for object existence when deleting repository index files (elastic#31680) Remove extra check for object existence in repository-gcs read object (elastic#31661) Support multiple system store types (elastic#31650) [Test] Clean up some repository-s3 tests (elastic#31601) [Docs] Use capital letters in section headings (elastic#31678) muted tests that will be replaced by the shard follow task refactoring: elastic#31581 [DOCS] Add PQL language Plugin (elastic#31237) Merge AzureStorageService and AzureStorageServiceImpl and clean up tests (elastic#31607) TEST: Fix test task invocation (elastic#31657) ...
This reverts commit 8c78fe7.
This reverts commit b296936.
There are concerns that this implementation, as merged, can easily lead to incorrect usage without clear feedback. For example, the secret keys may differ between the Elasticsearch keystores of different nodes, which would produce inconsistent hashes depending on which node processed a given document. For this reason, among others, this change is being reverted.
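This failure mode is easy to reproduce in isolation. A keyed hash of the same value under two different keys produces two unrelated outputs. Documents processed on nodes with different keystore secrets would therefore no longer match on the hashed field. In this sketch, hard-coded strings stand in for the per-node keystore secrets. `KeyConsistencySketch` and `hmacHex` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demo: a divergent per-node secret breaks hash consistency. */
public final class KeyConsistencySketch {
    /** Hex-encoded HMAC-SHA256 of value under key. */
    public static String hmacHex(String key, String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            StringBuilder hex = new StringBuilder();
            for (byte b : mac.doFinal(value.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = "alice@example.com";
        // Hypothetical secrets that were meant to be identical but diverged between nodes.
        String onNodeA = hmacHex("keystore-secret-node-a", value);
        String onNodeB = hmacHex("keystore-secret-node-b", value);
        System.out.println("same key, same hash: " + onNodeA.equals(hmacHex("keystore-secret-node-a", value)));
        System.out.println("different keys, same hash: " + onNodeA.equals(onNodeB));
    }
}
```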
* 6.x: Fix rollup on date fields that don't support epoch_millis (#31890) Revert "Introduce a Hashing Processor (#31087)" (#32179) [test] use randomized runner in packaging tests (#32109) Painless: Fix caching bug and clean up addPainlessClass. (#32142) Fix BwC Tests looking for UUID Pre 6.4 (#32158) (#32169) Call setReferences() on custom referring tokenfilters in _analyze (#32157) Add more contexts to painless execute api (#30511) Add EC2 credential test for repository-s3 (#31918) Fix CP for namingConventions when gradle home has spaces (#31914) Convert Version to Java - clusterformation part1 (#32009) Fix Java 11 javadoc compile problem Improve docs for search preferences (#32098) Configurable password hashing algorithm/cost(#31234) (#32092) [DOCS] Update TLS on Docker for 6.3 ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are Switch distribution to new style Requests (#30595) Build: Skip jar tests if jar disabled Build: Move shadow customizations into common code (#32014) Painless: Add PainlessClassBuilder (#32141) Fix accidental duplication of bwc test for script behavior Handle missing values in painless (#30975) (#31903) Build: Make additional test deps of check (#32015) Painless: Fix Bug with Duplicate PainlessClasses (#32110) Adjust translog after versionType removed in 7.0 (#32020) Disable C2 from using AVX-512 on JDK 10 (#32138) [Rollup] Add new capabilities endpoint for concrete rollup indices (#32111) Mute :qa:mixed-cluster indices.stats/10_index/Index - all’ [ML] Wait for aliases in multi-node tests (#32086) Ensure to release translog snapshot in primary-replica resync (#32045) Docs: Fix missing example script quote (#32010) Add Index UUID to `/_stats` Response (#31871) (#32113) [ML] Move analyzer dependencies out of categorization config (#32123) [ML][DOCS] Add missing 6.3.0 release notes (#32099) Updates the build to gradle 4.9 (#32087) Update monitoring template version to 6040099 (#32088) Fix put mappings java API documentation (#31955) Add exclusion option to `keep_types` token filter (#32012)
* master: Painless: Simplify Naming in Lookup Package (#32177) Handle missing values in painless (#32207) add support for write index resolution when creating/updating documents (#31520) ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864) Remove indication of future multi-homing support (#32187) Rest test - allow for snapshots to take 0 milliseconds Make x-pack-core generate a pom file Rest HL client: Add put watch action (#32026) Build: Remove pom generation for plugin zip files (#32180) Fix comments causing errors with Java 11 Fix rollup on date fields that don't support epoch_millis (#31890) Detect and prevent configuration that triggers a Gradle bug (#31912) [test] port linux package packaging tests (#31943) Revert "Introduce a Hashing Processor (#31087)" (#32178) Remove empty @return from JavaDoc Adjust SSLDriver behavior for JDK11 changes (#32145) [test] use randomized runner in packaging tests (#32109) Add support for field aliases. (#32172) Painless: Fix caching bug and clean up addPainlessClass. (#32142) Call setReferences() on custom referring tokenfilters in _analyze (#32157) Fix BwC Tests looking for UUID Pre 6.4 (#32158) Improve docs for search preferences (#32159) use before instead of onOrBefore Add more contexts to painless execute api (#30511) Add EC2 credential test for repository-s3 (#31918) A replica can be promoted and started in one cluster state update (#32042) Fix Java 11 javadoc compile problem Fix CP for namingConventions when gradle home has spaces (#31914) Fix `range` queries on `_type` field for singe type indices (#31756) [DOCS] Update TLS on Docker for 6.3 (#32114) ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are Remove versionType from translog (#31945) Switch distribution to new style Requests (#30595) Build: Skip jar tests if jar disabled Painless: Add PainlessClassBuilder (#32141) Build: Make additional test deps of check (#32015) Disable C2 from using AVX-512 on JDK 10 (#32138) Build: Move shadow customizations into common code (#32014) Painless: Fix Bug with Duplicate PainlessClasses (#32110) Remove empty @param from Javadoc Re-disable packaging tests on suse boxes Docs: Fix missing example script quote (#32010) [ML] Wait for aliases in multi-node tests (#32086) [ML] Move analyzer dependencies out of categorization config (#32123) Ensure to release translog snapshot in primary-replica resync (#32045) Handle TokenizerFactory TODOs (#32063) Relax TermVectors API to work with textual fields other than TextFieldType (#31915) Updates the build to gradle 4.9 (#32087) Mute :qa:mixed-cluster indices.stats/10_index/Index - all’ Check that client methods match API defined in the REST spec (#31825) Enable testing in FIPS140 JVM (#31666) Fix put mappings java API documentation (#31955) Add exclusion option to `keep_types` token filter (#32012) [Test] Modify assert statement for ssl handshake (#32072)
A hashing function that, like the Logstash fingerprint filter, supports MURMUR3, MD5, SHA1, and SHA256 would be very useful for building pipelines that avoid duplicate documents without requiring Logstash in the mix. For this use case, the secrecy of the key is probably less important than when hashing for security purposes.
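For deduplication, an unkeyed digest over selected fields is enough, because the goal is a stable identifier rather than secrecy. The JDK provides MD5, SHA-1, and SHA-256 through `java.security.MessageDigest`, but it does not include MURMUR3, so this sketch uses SHA-256. Length-prefixing each string, so that `("ab", "c")` and `("a", "bc")` hash differently, is a design choice of this sketch and not necessarily what the Logstash filter does. `FingerprintSketch` and `fingerprint` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;

/** Hypothetical content fingerprint for deduplication; not an Elasticsearch API. */
public final class FingerprintSketch {
    /** Hex SHA-256 over the listed fields in list order; a missing field hashes as the string "null". */
    public static String fingerprint(Map<String, ?> doc, List<String> fields) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (String field : fields) {
            updateWithLength(md, field);
            updateWithLength(md, String.valueOf(doc.get(field)));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Prefix each string with its 4-byte big-endian UTF-8 length so field boundaries are unambiguous.
    private static void updateWithLength(MessageDigest md, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        md.update(bytes);
    }
}
```

If the result is used as the document `_id`, re-ingesting a duplicate overwrites the existing document instead of creating a second copy.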
Is there any progress on this topic? Incorrect usage is quite a broad concern, and configuration issues can arise anywhere. I don't think it should block progress on this one. As long as the functional part works, it should be fine.
Note that a processor suitable for content fingerprinting was added in #68415, though it is not designed for the content anonymization use case.
Supersedes #30790.
TODO:
follow-up PR: add documentation #31694