Cannot download gzipped objects larger than ReadChannel chunk size #982

Closed
clementdenis opened this issue May 4, 2016 · 2 comments · Fixed by #1301
Labels
api: storage Issues related to the Cloud Storage API.

clementdenis commented May 4, 2016

I uploaded some big files with gsutil using gzip encoding:

    gsutil cp -Z bigfile gs://<bucket_name>/

This code is OK when downloading the file:

    Blob blob = storage.get(bucket, name);
    try (ReadChannel reader = blob.reader()) {
        int fileSize = blob.size().intValue();
        reader.chunkSize(fileSize); //any bigger value is also OK
        ByteStreams.copy(reader, Channels.newChannel(ByteStreams.nullOutputStream()));
    }

This code fails (the chunk size is smaller than the content size):

    Blob blob = storage.get(bucket, name);
    try (ReadChannel reader = blob.reader()) {
        int fileSize = blob.size().intValue();
        reader.chunkSize(fileSize - 1);
        ByteStreams.copy(reader, Channels.newChannel(ByteStreams.nullOutputStream()));
    }

with this exception:

com.google.cloud.storage.StorageException
    at com.google.cloud.storage.spi.DefaultStorageRpc.translate(DefaultStorageRpc.java:98)
    at com.google.cloud.storage.spi.DefaultStorageRpc.read(DefaultStorageRpc.java:474)
    at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:127)
    at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:124)
    at com.google.cloud.RetryHelper.doRetry(RetryHelper.java:181)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:247)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:237)
    at com.google.cloud.storage.BlobReadChannel.read(BlobReadChannel.java:124)
    at com.google.common.io.ByteStreams.copy(ByteStreams.java:148)
    at ...
Caused by: java.io.EOFException
    at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:264)
    at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:255)
    at java.util.zip.GZIPInputStream.readUInt(GZIPInputStream.java:247)
    at java.util.zip.GZIPInputStream.readTrailer(GZIPInputStream.java:218)
    at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at com.google.api.client.util.ByteStreams.copy(ByteStreams.java:51)
    at com.google.api.client.util.IOUtils.copy(IOUtils.java:94)
    at com.google.api.client.util.IOUtils.copy(IOUtils.java:63)
    at com.google.api.client.http.HttpResponse.download(HttpResponse.java:421)
    at com.google.cloud.storage.spi.DefaultStorageRpc.read(DefaultStorageRpc.java:470)
    ... 31 more

It is of course not practical to increase the chunk size indefinitely, as the whole chunk is uncompressed in memory and might create OutOfMemoryErrors.

BTW, the generated Java API client fails with the same error (I suppose it uses the same http stack for downloads).


mziccard commented May 5, 2016

Hi @clementdenis, thanks for your report.

Yes, the problem here is that we are reading a GZIPInputStream that contains only part (a chunk) of the compressed object. When we try to read uncompressed bytes from this "partial" stream, we get the error. Unfortunately, this issue cannot be easily overcome while still reading the uncompressed object in chunks (i.e. using ReadChannel).
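The failure mode can be reproduced without Cloud Storage at all. The following is a minimal sketch, assuming only the JDK: it gzips a small payload in memory, drops the last byte to stand in for a chunk that ends before the gzip trailer, and tries to decode it as a complete stream. The class and method names (`TruncatedGzipDemo`, `gzip`, `gunzip`, `failsWithEof`) are made up for illustration and are not part of the library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TruncatedGzipDemo {

    /** Gzip-compresses the given bytes in memory. */
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    /** Decodes the given bytes as if they were one complete gzip stream. */
    public static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    /** Returns true if decoding the bytes ends in an EOFException. */
    public static boolean failsWithEof(byte[] compressed) {
        try {
            gunzip(compressed);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] full = gzip("some object content".getBytes(StandardCharsets.UTF_8));
        // Simulate a chunk that is one byte shorter than the compressed object.
        byte[] partial = Arrays.copyOf(full, full.length - 1);

        System.out.println("complete stream fails with EOF: " + failsWithEof(full));
        System.out.println("partial stream fails with EOF:  " + failsWithEof(partial));
    }
}
```

This mirrors the `fileSize - 1` case from the report: the compressed data itself is intact, but the stream ends inside the trailer, so the decoder runs out of input while reading it.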

> It is of course not practical to increase the chunk size indefinitely, as the whole chunk is uncompressed in memory and might create OutOfMemoryErrors.

I agree. I will think about a solution to this and keep you posted. A possible solution would be reading each chunk without using a GZIPInputStream (and thus not trying to uncompress bytes). However, with such a solution reader.read() would still return the compressed chunk.
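If the chunks come back still compressed, the caller can decompress them by feeding all chunks, in order, into a single decoder instead of one decoder per chunk. Below is a rough sketch of that idea using only the JDK. The byte arrays stand in for compressed chunks returned by the channel, and `CompressedChunksDemo`, `split` and `gunzipChunks` are hypothetical names, not library API. For brevity the chunks are all held in memory here; a real reader would presumably fetch them lazily.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedChunksDemo {

    /** Gzip-compresses the given bytes in memory. */
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    /** Splits data into chunks of at most chunkSize bytes (stand-in for chunked reads). */
    public static List<byte[]> split(byte[] data, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            chunks.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + chunkSize)));
        }
        return chunks;
    }

    /** Decompresses the chunks by presenting them to ONE GZIPInputStream as a continuous stream. */
    public static byte[] gunzipChunks(List<byte[]> chunks) throws IOException {
        List<InputStream> parts = new ArrayList<>();
        for (byte[] chunk : chunks) {
            parts.add(new ByteArrayInputStream(chunk));
        }
        InputStream joined = new SequenceInputStream(Collections.enumeration(parts));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(joined)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[10_000];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) (i % 251);
        }
        // Chunk boundaries fall at arbitrary points inside the compressed data.
        List<byte[]> chunks = split(gzip(raw), 7);
        byte[] restored = gunzipChunks(chunks);
        System.out.println("chunks: " + chunks.size()
                + ", round trip ok: " + Arrays.equals(raw, restored));
    }
}
```

Because the decoder sees one uninterrupted stream, chunk boundaries can fall anywhere in the compressed data, and only the decoder's working buffers need to be uncompressed in memory at a time.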

@mziccard

@clementdenis while we wait for googleapis/google-api-java-client#1009 to get fixed, I drafted a workaround in #1301 that allows reading a file in compressed chunks.

Feel free to give it a try and let us know.
