PDF copy/paste text from Acrobat Reader stores Unicode IDs in clipboard #332

Closed
Nohbo opened this issue Jan 18, 2025 · 8 comments

Comments

@Nohbo

Nohbo commented Jan 18, 2025

Hi,
Thank you for creating this package!

When I copy text in Acrobat Reader on Windows (Firefox and XpdfReader work fine), the pasted text is unreadable and shows up as unmapped Unicode characters: 􀀁􀀂􀀃􀀄􀀅􀀃􀀄􀀅􀀁􀀂􀀃􀀁
This probably has something to do with the CMap, but that's outside my field of knowledge…

I use canvas.RichText to create text.

File: doc.pdf

Thank you !

@tdewolff
Owner

tdewolff commented Jan 20, 2025

Are you sure you used canvas and used the latest version to generate this PDF? The unique tag at the start of the file is different than what canvas generates.

EDIT: I did find a possible bug that trips up some readers, maybe that fixes it for you?

@Nohbo
Author

Nohbo commented Jan 21, 2025

The PDF file sent may have been altered by an optimisation programme.
Here is the PDF directly from Canvas: file.pdf
The build is indeed the latest: 19559a9.

I've tested the Barlow and DejaVu fonts, with and without the subset option, on two Windows machines, and the problem persists.

@tdewolff
Owner

That is still not a PDF from Canvas!

However, there may be a bug in the CIDToGID array, which is invalid for embedded CFF fonts. The DejaVu fonts are TrueType fonts if I remember correctly, though, so this may not be it. Furthermore, I'm getting "Syntax Warning: Mismatch between font type and embedded font file" when using Linux tools, so that may also be an indication.

@tdewolff
Owner

I've added a change that removes the warnings for a mismatch of type and font file. Maybe this works for you now? Can you try with both a TTF and a CFF font? In this repository there is an example CFF font in resources/EBGaramond12-Regular.otf (.ttf is always TrueType, but .otf may be either).
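
For anyone unsure which kind of outlines a given .otf file contains, the first four bytes of the file tell you. The sketch below is not part of canvas; `outlineKind` is a hypothetical helper that only inspects the sfnt version tag, assuming the file is a plain sfnt font rather than WOFF or another wrapper.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// outlineKind is a hypothetical helper (not a canvas API). It reports the
// glyph outline format of an sfnt font from its 4-byte version tag.
func outlineKind(header []byte) string {
	if len(header) < 4 {
		return "unknown"
	}
	switch binary.BigEndian.Uint32(header[:4]) {
	case 0x4F54544F: // "OTTO": OpenType with CFF outlines
		return "CFF"
	case 0x00010000, 0x74727565: // version 1.0 or "true": TrueType outlines
		return "TrueType"
	case 0x74746366: // "ttcf": font collection, inspect each member separately
		return "collection"
	default:
		return "unknown"
	}
}

func main() {
	// Usage: go run . resources/EBGaramond12-Regular.otf other.ttf
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: %s\n", path, outlineKind(data))
	}
}
```

Running it on the font files used in a test makes it easy to confirm that both a TrueType and a CFF font were actually covered.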

@Nohbo
Author

Nohbo commented Jan 21, 2025

Problem not solved. I tested both types of font, TTF and OTF.
Version: c15dfee
PDF: file.pdf
Code:

package main

import (
	"log"
	"os"
	"path/filepath"

	"github.com/tdewolff/canvas/renderers/pdf"

	"github.com/tdewolff/canvas"
)

func main() {

	dir, err := os.Getwd()
	if err != nil {
		// Without a working directory the paths below cannot be resolved.
		log.Fatal(err)
	}

	path := filepath.Join(dir, "file.pdf")

	println(path)

	f, err := os.Create(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	doc := pdf.New(f, 210, 297, &pdf.Options{
		Compress:    false,
		SubsetFonts: true,
	})

	ctx := canvas.NewContext(doc)
	ctx.SetCoordSystem(canvas.CartesianIV)
	ctx.SetFillColor(canvas.Black)

	barlow := canvas.NewFontFamily("Barlow")
	garamond := canvas.NewFontFamily("EBGaramond12")

	barlowRegularPath := filepath.Join(dir, "assets/Barlow-Regular.ttf")
	if err := barlow.LoadFontFile(barlowRegularPath, canvas.FontRegular); err != nil {
		panic(err)
	}

	barlowBoldPath := filepath.Join(dir, "assets/Barlow-Bold.ttf")
	if err := barlow.LoadFontFile(barlowBoldPath, canvas.FontBold); err != nil {
		panic(err)
	}

	garamondBPath := filepath.Join(dir, "assets/EBGaramond12-Regular.otf")
	// Load the regular file under the regular style, matching the Face call below.
	if err := garamond.LoadFontFile(garamondBPath, canvas.FontRegular); err != nil {
		panic(err)
	}

	barlowRegular := barlow.Face(9.0, canvas.FontRegular, canvas.FontNormal, canvas.RGBA(0, 0, 0, 1.0))
	barlowBold := barlow.Face(9.0, canvas.FontBold, canvas.FontNormal, canvas.RGBA(0, 0, 0, 1.0))
	garamondRegular := garamond.Face(9.0, canvas.FontRegular, canvas.FontNormal, canvas.RGBA(0, 0, 0, 1.0))

	richTextTTF := canvas.NewRichText(barlowRegular)
	richTextTTF.WriteString("This is ")
	richTextTTF.WriteFace(barlowBold, "TTF")

	richTextOTF := canvas.NewRichText(garamondRegular)
	richTextOTF.WriteString("This is OTF")

	ctx.DrawText(25, 25, richTextTTF.ToText(160, 25, canvas.Left, canvas.Top, 0, 0))
	ctx.DrawText(25, 50, richTextOTF.ToText(160, 25, canvas.Left, canvas.Top, 0, 0))
	if err := doc.Close(); err != nil {
		panic(err)
	}
}

I don't know if this helps, but there was a similar problem in a JavaScript package: Hopding/pdf-lib#245

tdewolff added a commit that referenced this issue Jan 21, 2025
@tdewolff
Owner

There was a bug in CMap generation, thanks for the link! Can you please try again? I can't reproduce locally, even Chrome always copied text fine.
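
For readers who land here with the same symptom: the ToUnicode CMap is the stream a viewer uses to turn the character codes in a PDF back into Unicode when copying text. The sketch below shows roughly what a minimal one looks like for 2-byte codes. It is not the canvas implementation and it does not reproduce the bug that was fixed; `toUnicodeCMap` and `utf16Hex` are hypothetical names, and the glyph IDs in `main` are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode/utf16"
)

// utf16Hex encodes a rune as UTF-16BE hex digits, the form a ToUnicode CMap
// uses for its destination strings (surrogate pairs above U+FFFF).
func utf16Hex(r rune) string {
	var sb strings.Builder
	for _, u := range utf16.Encode([]rune{r}) {
		fmt.Fprintf(&sb, "%04X", u)
	}
	return sb.String()
}

// toUnicodeCMap is a hypothetical helper (not a canvas API). It builds a
// ToUnicode CMap mapping 2-byte character codes to Unicode runes.
func toUnicodeCMap(mapping map[uint16]rune) string {
	codes := make([]int, 0, len(mapping))
	for code := range mapping {
		codes = append(codes, int(code))
	}
	sort.Ints(codes)

	var sb strings.Builder
	sb.WriteString("/CIDInit /ProcSet findresource begin\n12 dict begin\nbegincmap\n")
	sb.WriteString("/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def\n")
	sb.WriteString("/CMapName /Adobe-Identity-UCS def\n/CMapType 2 def\n")
	sb.WriteString("1 begincodespacerange\n<0000> <FFFF>\nendcodespacerange\n")

	// A single bfchar section may hold at most 100 entries, so emit in chunks.
	for start := 0; start < len(codes); start += 100 {
		end := start + 100
		if end > len(codes) {
			end = len(codes)
		}
		fmt.Fprintf(&sb, "%d beginbfchar\n", end-start)
		for _, code := range codes[start:end] {
			fmt.Fprintf(&sb, "<%04X> <%s>\n", code, utf16Hex(mapping[uint16(code)]))
		}
		sb.WriteString("endbfchar\n")
	}

	sb.WriteString("endcmap\nCMapName currentdict /CMap defineresource pop\nend\nend\n")
	return sb.String()
}

func main() {
	// Made-up glyph IDs for the letters of "This"; a real subset assigns its own.
	fmt.Print(toUnicodeCMap(map[uint16]rune{1: 'T', 2: 'h', 3: 'i', 4: 's'}))
}
```

If a viewer finds no usable mapping for a code, it has nothing to put on the clipboard but placeholder characters, which would be consistent with the output in the original report.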

@Nohbo
Author

Nohbo commented Jan 21, 2025

It works great! Thank you very much for your nice work!
file.pdf

Nohbo closed this as completed Jan 21, 2025
@tdewolff
Owner

Thank you for raising this issue and helping debug this on Windows!
