Indexer ES document id #1028
Signed-off-by: Mathieu Bret <mikwiss00@gmail.com>
I am wondering if it would be enough to provide

```java
protected String getDocumentID(Metadata metadata, String normalisedUrl) {
    // Return current default
}
```

so users can override it as needed without introducing a new config option?
It's enough indeed! Maybe I'm the only one who wants to keep the data :)
Yes, that would work as well. (Reply by email to Richard Zowalla's comment of Thu, 5 Jan 2023, 09:25.)
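The following is a minimal, self-contained sketch of the override hook proposed above. It does not use StormCrawler. `Metadata` is replaced by a plain `Map<String, String[]>`, and the class names `IndexerSketch`, `RandomIdIndexerSketch` and `DocumentIdDemo` are hypothetical. The default ID is assumed here to be a SHA-256 hex digest of the normalised URL. The subclass returns a random UUID, so each indexing call creates a new document instead of overwriting the previous one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for an indexer bolt; Metadata is modelled as a plain Map.
class IndexerSketch {

    // Default document ID: SHA-256 hex digest of the normalised URL (an assumption
    // for this sketch). Re-indexing the same URL yields the same ID, so the
    // previous document gets overwritten.
    protected String getDocumentID(Map<String, String[]> metadata, String normalisedUrl) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(normalisedUrl.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}

// Overrides the hook so every call produces a fresh ID, keeping all versions.
class RandomIdIndexerSketch extends IndexerSketch {
    @Override
    protected String getDocumentID(Map<String, String[]> metadata, String normalisedUrl) {
        return UUID.randomUUID().toString();
    }
}

public class DocumentIdDemo {
    public static void main(String[] args) {
        Map<String, String[]> md = Map.of();
        String url = "https://example.com/";
        IndexerSketch byUrl = new IndexerSketch();
        IndexerSketch random = new RandomIdIndexerSketch();
        // Same URL, same ID with the default strategy.
        System.out.println(byUrl.getDocumentID(md, url).equals(byUrl.getDocumentID(md, url)));
        // Same URL, different IDs with the random strategy.
        System.out.println(random.getDocumentID(md, url).equals(random.getDocumentID(md, url)));
    }
}
```

Because the ID logic lives in a single protected method, a user who wants to keep every fetch only needs a small subclass, which is why no new config option is strictly required.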
Signed-off-by: Mathieu Bret <mikwiss00@gmail.com>
268d99f to f9d23c5
We'll need this for the other modules (SOLR, OpenSearch, etc.), so I'd rather we did it for all of them in one go.
+1
Signed-off-by: Mathieu Bret <mikwiss00@gmail.com>
Hi guys, I made the change, but I'm not sure about Solr and SQL (although the changes there don't alter the behavior).
external/solr/src/main/java/com/digitalpebble/stormcrawler/solr/bolt/IndexerBolt.java (outdated, resolved)
external/sql/src/main/java/com/digitalpebble/stormcrawler/sql/IndexerBolt.java (outdated, resolved)
Signed-off-by: Mathieu Bret <mikwiss00@gmail.com>
Thanks @Mikwiss!
Hi @jnioche,
I saw issue #1025 and your answer related to #671. That was exactly the purpose of this PR!
In our case, we use a random UUID as the document ID so that all the content is kept: existing documents are not overwritten when a URL is indexed again. We added a new boolean parameter to handle this: `es.indexer.id.random`. The `getDocumentId()` method can also be overridden to build any ID from the metadata.
Signed-off-by: Mathieu Bret <mikwiss00@gmail.com>
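The PR description mentions a boolean parameter, `es.indexer.id.random`, that switches to random UUIDs. Below is a hedged sketch of how such a flag could select the ID strategy; it is not the PR's actual code. The configuration is modelled as a plain `Map<String, Object>`, the class names `ConfigurableIdIndexer` and `ConfigurableIdDemo` are hypothetical, and the URL-based default is again assumed to be a SHA-256 hex digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.UUID;

// Hypothetical indexer that picks its document-ID strategy from configuration.
class ConfigurableIdIndexer {

    // Parameter name taken from the PR description; how it is read here is assumed.
    static final String RANDOM_ID_PARAM = "es.indexer.id.random";

    private final boolean randomId;

    ConfigurableIdIndexer(Map<String, Object> conf) {
        // Falls back to false (URL-based IDs) when the parameter is absent;
        // accepts either a Boolean or a "true"/"false" string.
        Object value = conf.getOrDefault(RANDOM_ID_PARAM, Boolean.FALSE);
        this.randomId = Boolean.parseBoolean(String.valueOf(value));
    }

    protected String getDocumentID(Map<String, String[]> metadata, String normalisedUrl) {
        if (randomId) {
            // New UUID per call: every fetch of a URL is stored as a separate document.
            return UUID.randomUUID().toString();
        }
        return sha256Hex(normalisedUrl);
    }

    private static String sha256Hex(String input) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

public class ConfigurableIdDemo {
    public static void main(String[] args) {
        ConfigurableIdIndexer byUrl = new ConfigurableIdIndexer(Map.of());
        ConfigurableIdIndexer random =
                new ConfigurableIdIndexer(Map.of("es.indexer.id.random", true));
        String url = "https://example.com/";
        System.out.println(byUrl.getDocumentID(Map.of(), url).length());  // 64
        System.out.println(random.getDocumentID(Map.of(), url).length()); // 36
    }
}
```

Keeping `getDocumentID` overridable means the flag is only a convenience: a subclass can still derive an ID from any metadata field without touching the configuration.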
Thanks for contributing to StormCrawler; your efforts are appreciated!
Developer Certificate of Origin
By contributing to StormCrawler, you accept and agree to the following terms and conditions (the Developer Certificate of Origin) for your present and future contributions submitted to StormCrawler.
Please refer to the Developer Certificate of Origin section in CONTRIBUTING.md for details.
Before opening a PR, please check that:
Thanks!