Add option to set fetcherName for Tika >= 2.0.0 #33

Merged: vaites merged 1 commit into vaites:master on Aug 24, 2023

Conversation

relthyg
Contributor

@relthyg relthyg commented Aug 14, 2023

In Tika >= 2.0.0, the server fetches remote files using so-called [fetchers](https://cwiki.apache.org/confluence/display/TIKA/tika-pipes). If you are running a Tika Server that is configured to use an HTTP fetcher, the client has to tell the server which fetcher to use by adding the HTTP header `fetcherName` to the request. Furthermore, the URL of the remote file to be fetched must be passed in a `fetchKey` header instead of `fileUrl` as in Tika 1.x.x.

This adds a public API method to set the fetcher name, and replaces the `fileUrl` header with `fetcherName` and `fetchKey` if a fetcher name is set. If no fetcher name is set, the `fileUrl` header is still added to the request as before, to keep Tika 1.x.x compatibility.
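
The header selection described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the function name `build_fetch_headers` and the fetcher name `"http"` are assumptions made for the example, and the fetcher name has to match whatever is configured on your Tika Server.

```python
from typing import Dict, Optional


def build_fetch_headers(file_url: str, fetcher_name: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers that tell Tika Server which remote file to fetch.

    Hypothetical helper for illustration only; it is not part of the library's API.
    """
    if fetcher_name:
        # Fetcher name set (Tika >= 2.0.0 with a configured fetcher):
        # name the fetcher and pass the remote URL as the fetch key.
        return {"fetcherName": fetcher_name, "fetchKey": file_url}
    # No fetcher name set: keep the Tika 1.x.x behaviour.
    return {"fileUrl": file_url}


if __name__ == "__main__":
    url = "https://example.com/sample.pdf"  # placeholder URL
    print(build_fetch_headers(url))          # Tika 1.x.x style headers
    print(build_fetch_headers(url, "http"))  # "http" is an assumed fetcher name
```

Treating an empty fetcher name the same as an unset one keeps the old `fileUrl` path as the default, so existing callers would not need to change anything.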
@vaites
Owner

vaites commented Aug 14, 2023

Thanks @relthyg, please give me a few days to take a look at this Tika feature and your changes. It looks OK, but I'm working on the 2.0 version of this library and want to see how to integrate it there too...

@relthyg
Contributor Author

relthyg commented Aug 15, 2023

Thanks for reaching out, @vaites. Let me know if I can do anything to help or improve the PR.

@mpdude
Contributor

mpdude commented Aug 23, 2023

Hey David, if there's anything that would help you – let me know.

@vaites
Owner

vaites commented Aug 23, 2023

Sorry for the delay, @relthyg and @mpdude. I want to understand the functionality well enough to add some tests (and merge it into the upcoming 2.x version), and it is taking me longer than expected. I hope to have it ready this week...

@vaites vaites merged commit 18bd11a into vaites:master Aug 24, 2023
@vaites
Owner

vaites commented Aug 24, 2023

The PR is merged and version v1.3.0 is published. Thanks for your contribution 👍

@relthyg
Contributor Author

relthyg commented Aug 25, 2023

Thank you for accepting the PR!

@relthyg relthyg deleted the set_fetcher_name branch August 25, 2023 05:19