Update to pdfjs v4
Requires ES modules and drops support for nodejs v16.
k-yle committed Jun 8, 2024
1 parent 98ce302 commit 84109cf
Showing 11 changed files with 87 additions and 61 deletions.
1 change: 1 addition & 0 deletions .eslintrc.js → .eslintrc.cjs
@@ -5,6 +5,7 @@ module.exports = {
extends: ["kyle"],
rules: {
quotes: "off",
+"import/extensions": "off",
},
settings: {
jest: { version: 29 },
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -10,7 +10,7 @@ jobs:

strategy:
matrix:
-node-version: [16.17, 18.x, 20.x, 21.x]
+node-version: [18.x, 20.x, 22.x]

steps:
- name: ⏬ Checkout code
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

+## 4.0.0 (2024-06-08)
+
+- 💥 BREAKING CHANGE: Drop support for node v16. The minimum version is now v18
+- Updated pdfjs to v4
+
## 3.0.0 (2024-05-24)

- 💥 BREAKING CHANGE: Drop support for node v14 and v16. The minimum version is now v16.17
29 changes: 26 additions & 3 deletions README.md
@@ -10,17 +10,22 @@

Useful for unit tests of PDFs

-Supports nodejs v16.17+, and comes with a CLI.
+Supports nodejs v18+, and comes with a CLI.

## Install

```sh
npm install -S pdf-to-img
```

+> [!IMPORTANT]
+> You should use v4 by default. v4 requires nodejs v18 or later, and ESM modules.
+>
+> If you can't upgrade to v4 yet, you can still use v3. If you use v3, you can safely ignore `npm audit`'s [warning about pdfjs-dist](https://github.com/advisories/GHSA-wgrm-67xf-hhpq), since this library [disables `eval` by default](https://github.com/k-yle/pdf-to-img/commit/bdac3a1dcc2004c3f1fe7380bbb860086ec2746f).
+
## Example

-NodeJS:
+NodeJS (using ESM Modules):

```js
const { promises: fs } = require("node:fs");
@@ -37,10 +42,28 @@ async function main() {
main();
```

+If your app does not support ESM modules, just change the import:
+
+```diff
+const { promises: fs } = require("node:fs");
+- const { pdf } = require("pdf-to-img");
+
+async function main() {
++ const { pdf } = await import("pdf-to-img");
+let counter = 1;
+const document = await pdf("example.pdf", { scale: 3 });
+for await (const image of document) {
+await fs.writeFile(`page${counter}.png`, image);
+counter++;
+}
+}
+main();
+```
+
Using jest (or vitest) with [jest-image-snapshot](https://npm.im/jest-image-snapshot):

```js
-const { pdf } = require("pdf-to-img");
+import { pdf } from "pdf-to-img";

it("generates a PDF", async () => {
for await (const page of await pdf("example.pdf")) {
34 changes: 18 additions & 16 deletions package-lock.json


8 changes: 5 additions & 3 deletions package.json
@@ -3,8 +3,9 @@
"version": "3.0.0",
"author": "Kyle Hensel",
"description": "📃📸 Converts PDFs to images in nodejs",
-"main": "dist",
+"exports": "./dist/index.js",
"types": "dist/index.d.ts",
+"type": "module",
"license": "MIT",
"files": [
"dist"
@@ -28,11 +29,12 @@
"pdf2img": "./bin/cli.mjs"
},
"engines": {
-"node": ">=16.17"
+"node": ">=18"
},
+"engineStrict": true,
"dependencies": {
"canvas": "2.11.2",
-"pdfjs-dist": "3.2.146"
+"pdfjs-dist": "4.2.67"
},
"devDependencies": {
"@rushstack/eslint-patch": "^1.5.1",
20 changes: 9 additions & 11 deletions src/index.ts
@@ -1,13 +1,11 @@
import "./polyfill"; // do this before pdfjs

import path from "node:path";
-// 🛑 inspite of esModuleInterop being on, you still need to use `import *`, and there are no typedefs
-import * as _pdfjs from "pdfjs-dist/legacy/build/pdf";
-import type { DocumentInitParameters } from "pdfjs-dist/types/src/display/api";
-import { NodeCanvasFactory } from "./canvasFactory";
-import { parseInput } from "./parseInput";
+import * as pdfjs from "pdfjs-dist/legacy/build/pdf.mjs";
+import type { DocumentInitParameters } from "pdfjs-dist/types/src/display/api.js";
+import { NodeCanvasFactory } from "./canvasFactory.js";
+import { parseInput } from "./parseInput.js";

-const pdfjs: typeof import("pdfjs-dist") = _pdfjs;
const pdfjsPath = path.dirname(require.resolve("pdfjs-dist/package.json"));

/** required since k-yle/pdf-to-img#58, the objects from pdfjs are weirdly structured */
@@ -46,7 +44,7 @@ export type Options = {
/**
* Converts a PDF to a series of images. This returns a `Symbol.asyncIterator`
*
-* @param input Either (a) the path to a pdf file, or (b) a data url, or (c) a buffer, or (d) a ReadableStream.
+* @param input Either (a) the path to a pdf file, or (b) a data url, or (c) a Uint8Array, or (d) a buffer, or (e) a ReadableStream.
*
* @example
* ```js
@@ -68,7 +66,7 @@ export type Options = {
* ```
*/
export async function pdf(
-input: string | Buffer | NodeJS.ReadableStream,
+input: string | Uint8Array | Buffer | NodeJS.ReadableStream,
options: Options = {}
): Promise<{
length: number;
@@ -77,13 +75,15 @@ export async function pdf(
}> {
const data = await parseInput(input);

+const canvasFactory = new NodeCanvasFactory();
const pdfDocument = await pdfjs.getDocument({
password: options.password, // retain for backward compatibility, but ensure settings from docInitParams overrides this and others, if given.
standardFontDataUrl: path.join(pdfjsPath, `standard_fonts${path.sep}`),
cMapUrl: path.join(pdfjsPath, `cmaps${path.sep}`),
cMapPacked: true,
-isEvalSupported: false,
...options.docInitParams,
+isEvalSupported: false,
+canvasFactory,
data,
}).promise;

@@ -102,7 +102,6 @@ export async function pdf(

const viewport = page.getViewport({ scale: options.scale ?? 1 });

-const canvasFactory = new NodeCanvasFactory();
const { canvas, context } = canvasFactory.create(
viewport.width,
viewport.height
@@ -111,7 +110,6 @@ export async function pdf(
await page.render({
canvasContext: context,
viewport,
-canvasFactory,
}).promise;

return { done: false, value: canvas.toBuffer() };
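
The reordering inside the `getDocument` call matters because later keys in an object literal override earlier ones. With `isEvalSupported: false` placed after `...options.docInitParams`, a value passed through `docInitParams` no longer takes effect for that key, while defaults listed before the spread can still be overridden. A minimal sketch of that precedence rule, assuming a hypothetical `buildParams` helper that is not part of pdf-to-img:

```js
// Hypothetical helper that mirrors the option order in the hunk above.
// `buildParams` is illustrative only and does not exist in pdf-to-img.
function buildParams(docInitParams = {}) {
  return {
    cMapPacked: true, // default listed before the spread: callers can override it
    ...docInitParams, // caller-supplied options
    isEvalSupported: false, // listed after the spread: always wins
  };
}

const params = buildParams({ cMapPacked: false, isEvalSupported: true });
console.log(params); // { cMapPacked: false, isEvalSupported: false }
```

Whether a given option belongs before or after the spread is a design decision: anything the library needs to guarantee goes after it, and anything that is only a default goes before it.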
37 changes: 16 additions & 21 deletions src/parseInput.ts
@@ -1,36 +1,31 @@
-import { promises as fs } from "node:fs";
+import { readFileSync } from "node:fs";
+import { arrayBuffer } from "node:stream/consumers";

const PREFIX = "data:application/pdf;base64,";

-async function streamToBuffer(
-readableStream: NodeJS.ReadableStream
-): Promise<Buffer> {
-const chunks: Buffer[] = [];
-for await (const chunk of readableStream) {
-chunks.push(chunk as Buffer);
-}
-return Buffer.concat(chunks);
-}
-
export async function parseInput(
-input: string | Buffer | NodeJS.ReadableStream
-): Promise<Buffer | Uint8Array> {
+input: string | Uint8Array | Buffer | NodeJS.ReadableStream
+): Promise<Uint8Array> {
+// Buffer is a subclass of Uint8Array, but it's not actually
+// compatible: https://github.com/sindresorhus/uint8array-extras/issues/4
+if (Buffer.isBuffer(input)) return Uint8Array.from(input);
+
+if (input instanceof Uint8Array) return input;
+
// provided with a data url or a path to a file on disk
if (typeof input === "string") {
-return input.startsWith(PREFIX)
-? Buffer.from(input.slice(PREFIX.length), "base64")
-: new Uint8Array(await fs.readFile(input));
+if (input.startsWith(PREFIX)) {
+return Uint8Array.from(Buffer.from(input.slice(PREFIX.length), "base64"));
+}
+return new Uint8Array(readFileSync(input));
}

-// provided a buffer
-if (Buffer.isBuffer(input)) return input;
-
// provided a ReadableStream (or any object with an asyncIterator that yields buffer chunks)
if (typeof input === "object" && input && Symbol.asyncIterator in input) {
-return streamToBuffer(input);
+return new Uint8Array(await arrayBuffer(input));
}

throw new Error(
-"pdf-to-img received an unexpected input. Provide a path to file, a data URL, a Buffer, or a ReadableStream."
+"pdf-to-img received an unexpected input. Provide a path to file, a data URL, a Uint8Array, a Buffer, or a ReadableStream."
);
}
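
The Buffer-first ordering in `parseInput` can be illustrated in isolation. The sketch below is a hypothetical, synchronous subset of that normalisation: `toPlainBytes` is an illustrative name, and file paths and streams are deliberately left out. It shows that `Uint8Array.from` applied to a Buffer yields a plain `Uint8Array` copy rather than another Buffer, and why the `Buffer.isBuffer` check has to come before the `instanceof Uint8Array` check.

```js
// Hypothetical subset of the normalisation above; not the library's code.
const PREFIX = "data:application/pdf;base64,";

function toPlainBytes(input) {
  // Check Buffer first: a Buffer also passes `instanceof Uint8Array`,
  // so the order of these two checks matters.
  if (Buffer.isBuffer(input)) return Uint8Array.from(input);
  if (input instanceof Uint8Array) return input;
  if (typeof input === "string" && input.startsWith(PREFIX)) {
    return Uint8Array.from(Buffer.from(input.slice(PREFIX.length), "base64"));
  }
  throw new TypeError("unsupported input in this sketch");
}

const fromBuffer = toPlainBytes(Buffer.from("%PDF-"));
const dataUrl = PREFIX + Buffer.from("%PDF-").toString("base64");
const fromDataUrl = toPlainBytes(dataUrl);

console.log(Buffer.isBuffer(fromBuffer)); // false
console.log(fromBuffer instanceof Uint8Array); // true
console.log(Buffer.from(fromDataUrl).toString("latin1")); // %PDF-
```

Note that `Uint8Array.from` copies the bytes, so the result does not share memory with the original Buffer.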
4 changes: 2 additions & 2 deletions tests/jsdom.test.ts
@@ -1,7 +1,7 @@
// @vitest-environment jsdom
import { promises as fs, createReadStream } from "node:fs";
import { describe, expect, it } from "vitest";
-import { pdf } from "../src";
+import { pdf } from "../src/index.js";

describe("example.pdf", () => {
it("correctly generates a single png for the one page", async () => {
@@ -139,7 +139,7 @@ describe("invalid", () => {
async () => pdf(1)
).rejects.toThrow(
new Error(
-"pdf-to-img received an unexpected input. Provide a path to file, a data URL, a Buffer, or a ReadableStream."
+"pdf-to-img received an unexpected input. Provide a path to file, a data URL, a Uint8Array, a Buffer, or a ReadableStream."
)
);
});
2 changes: 1 addition & 1 deletion tests/node.test.ts
@@ -1,7 +1,7 @@
// @vitest-environment node
import { createReadStream, promises as fs } from "node:fs";
import { describe, expect, it } from "vitest";
-import { pdf } from "../src";
+import { pdf } from "../src/index.js";

describe("example.pdf in node", () => {
it("correctly generates a single png for the one page in nodejs environment", async () => {
6 changes: 3 additions & 3 deletions tsconfig.json
@@ -4,9 +4,9 @@
"esModuleInterop": true,
"lib": ["es2021"],
"skipLibCheck": true,
-"module": "commonjs",
-"target": "es2015",
-"moduleResolution": "node",
+"module": "node16",
+"target": "es2020",
+"moduleResolution": "node16",
"downlevelIteration": true,
"declaration": true,
"outDir": "temp"
