Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gatsby-source-contentful] local image processing can break if the same image is requested multiple times #28903

Closed
ariadne-github opened this issue Jan 7, 2021 · 3 comments · Fixed by #28926
Labels
topic: source-contentful Related to Gatsby's integration with Contentful type: bug An issue or pull request relating to a bug in Gatsby

Comments

@ariadne-github
Copy link
Contributor

Description

While processing Contentful images using the tracedSVG query, the build breaks with this kind of errors:

VipsJpeg: Premature end of JPEG file
VipsJpeg: Premature end of JPEG file
VipsJpeg: out of order read at line 0
VipsJpeg: out of order read at line 0

Steps to reproduce

It's a little difficult to reproduce because it's not deterministic, but in theory you should simply use the same image in Contentful and query it in multiple places. For example use it in different entries, like in different pages, then use tracedSVG inside a query.

Expected result

The build should complete without errors.

Actual result

The build breaks with this errors:

VipsJpeg: Premature end of JPEG file
VipsJpeg: Premature end of JPEG file
VipsJpeg: out of order read at line 0
VipsJpeg: out of order read at line 0

Environment

System:
OS: Windows 10 10.0.18363
CPU: (8) x64 Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
Binaries:
Node: 12.16.1 - C:\Program Files\nodejs\node.EXE
Yarn: 1.22.4 - ~\AppData\Roaming\npm\yarn.CMD
npm: 6.13.4 - C:\Program Files\nodejs\npm.CMD
Languages:
Python: 2.7.8
Browsers:
Chrome: 87.0.4280.88
Edge: Spartan (44.18362.449.0)
npmPackages:
gatsby: ^2.29.3 => 2.29.3
gatsby-image: ^2.8.0 => 2.8.0
gatsby-plugin-google-tagmanager: ^2.8.0 => 2.8.0
gatsby-plugin-gtag: ^1.0.13 => 1.0.13
gatsby-plugin-image: ^0.4.1 => 0.4.1
gatsby-plugin-manifest: ^2.9.1 => 2.9.1
gatsby-plugin-postcss: ^2.3.13 => 2.3.13
gatsby-plugin-preact: ^4.4.0 => 4.4.0
gatsby-plugin-react-helmet: ^3.7.0 => 3.7.0
gatsby-plugin-react-helmet-canonical-urls: ^1.4.0 => 1.4.0
gatsby-plugin-react-svg: ^3.0.0 => 3.0.0
gatsby-plugin-robots-txt: ^1.5.5 => 1.5.5
gatsby-plugin-sass: ^2.8.0 => 2.8.0
gatsby-plugin-schema-snapshot: ^1.4.0 => 1.4.0
gatsby-plugin-sharp: ^2.11.2 => 2.11.2
gatsby-plugin-sitemap: ^2.9.0 => 2.9.0
gatsby-plugin-webpack-bundle-analyser-v2: ^1.1.18 => 1.1.18
gatsby-source-contentful: ^4.3.1 => 4.3.1
gatsby-source-filesystem: ^2.8.1 => 2.8.1
gatsby-transformer-json: ^2.8.0 => 2.8.0
gatsby-transformer-remark: ^2.13.1 => 2.13.1
gatsby-transformer-sharp: ^2.9.0 => 2.9.0
gatsby-transformer-yaml: ^2.8.0 => 2.8.0
npmGlobalPackages:
gatsby-cli: 2.12.115

Notes on troubleshooting

I've already investigated the problem. First of all, the jpeg images I used are valid.

Based on my search, the root of the error is in gatsby-source-contentful, specifically in the cache-image.js file: if the file already exists, returns readily the path for sharp to process it.

const alreadyExists = await pathExists(absolutePath)

The problem is, if the same image is requested more than once in parallel, the second request tries to process the image while it's still downloading it, resulting in invalid image errors.
I think it could be fixed by adding a promise cache, so that every subsequent request just awaits the same promise until the file is completely downloaded.
Something like this:

const flyingRequests = {};
....
const alreadyExists = await pathExists(absolutePath);

if (!alreadyExists && !flyingRequests[absolutePath]) {
    flyingRequests[absolutePath] = new Promise(async (resolve, reject) => {
      const previewUrl = `http:${url}?${params.join(`&`)}`;
      const response = await axios({
        method: `get`,
        url: previewUrl,
        responseType: `stream`
      });
      const file = createWriteStream(absolutePath);
      response.data.pipe(file);
      file.on(`finish`, resolve);
      file.on(`error`, reject);
    });
  }
  if (flyingRequests[absolutePath]) {
    await flyingRequests[absolutePath];
    delete flyingRequests[absolutePath];
  }

  return absolutePath;

I've tried this on my machine and it seems to prevent the bug.
If you agree, I can open a pull request with a fix.

@ariadne-github ariadne-github added the type: bug An issue or pull request relating to a bug in Gatsby label Jan 7, 2021
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jan 7, 2021
@pieh pieh added topic: source-contentful Related to Gatsby's integration with Contentful and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels Jan 7, 2021
@pieh
Copy link
Contributor

pieh commented Jan 7, 2021

I wonder if #28438 which was supposed to prevent multiple concurrent fetches has some problem (the version of contentful plugin listed in details of the issue should contain this). The promise map/lookup is one level above cache-image there, but it might not catch everything?

/cc @pvdz

@ariadne-github
Copy link
Contributor Author

I see that the alghoritm in the base64 resolver is the same I suggested (so it should not suffer of this problem), but the tracedSVG resolver is not covered, it follows another path, which calls cacheImage.

const absolutePath = await cacheImage(store, image, options)

@pieh
Copy link
Contributor

pieh commented Jan 7, 2021

Oh, ok - that makes sense - then yes, please open PR with your suggested change - only thing I would change there is to use Map and not object for cache (it has better perf characteristics for lookups) and if possible use similar approach we used in PR I linked so something along those lines (it might now work as-is, that's just pseudo code-esque to fit into ways we memoise inflight promises across our code base):

const flyingRequests = new Map();

const existingFlyingRequest = flytingRequests.get(absolutePath)
if (existingFlyingRequest) {
  await existingFlyingRequest
  return absolutePath
}

const downloadPromise = new Promise(async (resolve, reject) => {
      const previewUrl = `http:${url}?${params.join(`&`)}`;
      const response = await axios({
        method: `get`,
        url: previewUrl,
        responseType: `stream`
      });
      const file = createWriteStream(absolutePath);
      response.data.pipe(file);
      file.on(`finish`, resolve);
      file.on(`error`, reject);
    });

flytingRequests.set(absolutePath, downloadPromise)
await downloadPromise
flytingRequests.delete(absolutePath)
return absolutePath

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: source-contentful Related to Gatsby's integration with Contentful type: bug An issue or pull request relating to a bug in Gatsby
Projects
None yet
2 participants