Merge pull request #33 from relthyg/set_fetcher_name
Add option to set `fetcherName` for Tika >= 2.0.0
vaites authored Aug 24, 2023
2 parents d0db71f + d760b8d commit 18bd11a
Showing 2 changed files with 34 additions and 1 deletion.
11 changes: 11 additions & 0 deletions README.md
@@ -122,6 +122,11 @@ You can use an URL instead of a file path and the library will download the file
**no need** to add `-enableUnsecureFeatures -enableFileUrl` to command line when starting the server, as described
[here](https://wiki.apache.org/tika/TikaJAXRS#Specifying_a_URL_Instead_of_Putting_Bytes).

If you use Apache Tika >= 2.0.0, you *can* [define an HttpFetcher](https://cwiki.apache.org/confluence/display/TIKA/tika-pipes)
and use the option `-enableUnsecureFeatures -enableFileUrl` when starting the server to make the server download remote
files when passing a URL instead of a filename to `$client->getText()`. To do so, you must set the name of
the HttpFetcher using `$client->setFetcherName('yourFetcherName')`.

### Methods

Here is the full list of available methods
@@ -254,6 +259,12 @@ $client->setOCRLanguages($languages);
$client->getOCRLanguages();
```

Set HTTP fetcher name (for Tika >= 2.0.0 only, see https://cwiki.apache.org/confluence/display/TIKA/tika-pipes)

```php
$client->setFetcherName($fetcherName);
```

### Breaking changes

Since 1.0 version there are some breaking changes:
24 changes: 23 additions & 1 deletion src/Clients/WebClient.php
@@ -51,6 +51,13 @@ class WebClient extends Client
*/
protected $retries = 3;

/**
* Name of the fetcher to be used (for Tika >= 2.0.0 only)
*
* @var string|null
*/
protected $fetcherName = null;

/**
* Default cURL options
*
@@ -208,6 +215,16 @@ public function setRetries(int $retries): self
return $this;
}

/**
* Set the name of the fetcher to be used (for Tika >= 2.0.0 only)
*/
public function setFetcherName(string $fetcherName): self
{
$this->fetcherName = $fetcherName;

return $this;
}

/**
* Get all the options
*/
@@ -626,7 +643,12 @@ protected function getParameters(string $type, string $file = null): array

 if(!empty($file) && preg_match('/^http/', $file))
 {
-    $headers[] = "fileUrl:$file";
+    if($this->fetcherName) {
+        $headers[] = "fetcherName:$this->fetcherName";
+        $headers[] = "fetchKey:$file";
+    } else {
+        $headers[] = "fileUrl:$file";
+    }
 }

switch($type)
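The header selection changed in `getParameters()` can be tried in isolation. The following is a minimal sketch, not code from the library: `buildRemoteFileHeaders()` is a hypothetical standalone function that copies the branch from the diff, and the URL and fetcher name are placeholders.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library) that mirrors the branch
 * added to WebClient::getParameters() in this commit.
 */
function buildRemoteFileHeaders(?string $fetcherName, ?string $file): array
{
    $headers = [];

    // Same check as in the diff: only values starting with "http" are treated as URLs
    if (!empty($file) && preg_match('/^http/', $file)) {
        if ($fetcherName) {
            // A fetcher name is set: send the fetcherName/fetchKey header pair
            $headers[] = "fetcherName:$fetcherName";
            $headers[] = "fetchKey:$file";
        } else {
            // No fetcher name: keep the previous behaviour and send fileUrl
            $headers[] = "fileUrl:$file";
        }
    }

    return $headers;
}

// Placeholder inputs for illustration only
print_r(buildRemoteFileHeaders('yourFetcherName', 'https://example.com/sample.pdf'));
print_r(buildRemoteFileHeaders(null, 'https://example.com/sample.pdf'));
print_r(buildRemoteFileHeaders('yourFetcherName', '/tmp/sample.pdf'));
```

With a fetcher name and a URL, the sketch returns the `fetcherName` and `fetchKey` headers. With no fetcher name it falls back to the single `fileUrl` header, so clients that never call `setFetcherName()` behave as before. A local path yields no extra headers in either case.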
