Metadata for author, title, subject, generator / creator / producer #2000
This requires a change in the upstream Qt PDF generation code.
OK, closed, but what's the solution to this problem? I take it there is no solution, is that right?
Correct, there is no solution unless changes are made upstream in Qt.
@ashkulz, do you know where I could file a ticket/feature request for this issue?
On the Qt issue tracker.
OK, thanks. There is already a ticket for QPdfWriter (https://bugreports.qt.io/browse/QTBUG-44451), but I am not sure whether this is the right class/module. Which component is the "upstream Qt PDF" code part of?
That is the correct one. You might want to watch the issue in Qt to get notified when it gets fixed.
Vote for the issue upstream: https://bugreports.qt.io/browse/QTBUG-44451
In case anyone else wants metadata support without waiting for an upstream fix: I wrote this little script. Yes, it's messy, but it works, at least for me.

```bash
#!/bin/bash
# Usage: bash html2pdf.sh input.html output.pdf
# Needs wkhtmltopdf, php, exiftool and qpdf. "md5 -q -s" is the macOS/BSD md5
# command; on Linux it has to be replaced (e.g. with md5sum).
if [[ "$#" -eq 2 ]]; then
  # Intermediate files live in a private temp dir; the output file is only
  # written once every step has succeeded.
  tmpdir="$(mktemp -d)"
  tmp="${tmpdir}/stage1.pdf"
  tmp2="${tmpdir}/stage2.pdf"

  # Pull the metadata out of the HTML <head> with sed, then decode entities with php.
  title="$(php -r 'echo html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<title>(.*)<\/title>.*$/\1/p' "${1}")")"
  author="$(php -r 'echo html_entity_decode(rawurldecode($argv[1]), ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<link rel="author" href="mailto:([^"]*)".*/\1/p' "${1}")")"
  if [ -z "${author}" ]; then
    author="$(php -r 'echo html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<meta name="author" content="([^"]*)".*/\1/p' "${1}")")"
  fi
  subject="$(php -r 'echo html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<meta name="description" lang="[^"]*" content="([^"]*)".*/\1/p' "${1}")")"
  keywords="$(php -r 'echo implode("|[SEPARATOR]|", preg_split("/\s*,[\s,]*/", html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8"), -1, PREG_SPLIT_NO_EMPTY))."\n";' "$(sed -En -e 's/^.*<meta name="keywords" lang="[^"]*" content="([^"]*)".*/\1/p' "${1}")")"
  generator="$(php -r 'echo html_entity_decode($argv[1], ENT_QUOTES|ENT_HTML5, "UTF-8")."\n";' "$(sed -En -e 's/^.*<meta name="generator" lang="[^"]*" content="([^"]*)".*/\1/p' "${1}")")"
  if [[ -z "${generator}" ]]; then
    generator='-'
  fi

  # Render with wkhtmltopdf, write the metadata with exiftool, then linearize and
  # encrypt with qpdf (empty user password, random owner password).
  wkhtmltopdf \
    --load-error-handling 'abort' --load-media-error-handling 'abort' \
    --print-media-type --minimum-font-size 1 \
    -B 10mm -L 10mm -R 10mm -T 10mm -O Landscape -s A4 \
    --no-stop-slow-scripts \
    --run-script 'window.setTimeout(function(){window.status = "FOOBAR";}, 1000);' --window-status 'FOOBAR' \
    --title "${title}" "${1}" "${tmp}" \
  && exiftool \
    -z -P -sep "|[SEPARATOR]|" \
    -XMP:Format="application/pdf" \
    -Title="${title}" \
    -PDF:Subject="${subject}" -XMP:Description="${subject}" \
    -PDF:Author="${author}" -XMP:Creator="${author}" \
    -XMP:Keywords="${keywords//|\[SEPARATOR\]|/, }" -PDF:Keywords="${keywords//|\[SEPARATOR\]|/, }" \
    -XMP:Subject="${keywords//"\""/}" -AppleKeywords="${keywords//|\[SEPARATOR\]|/, }" \
    -XMP:Marked=True \
    -XMP:DocumentID="$([ -f "${2}" ] && exiftool -q -z -P -s3 -XMP:DocumentID "${2}" || exiftool -q -z -P -p 'uuid:$ExifTool:newguid' "${tmp}")" \
    -XMP:InstanceID="uuid:$(exiftool -q -z -P -s3 -ExifTool:newguid "${tmp}")" \
    -PDF:Creator="${generator}" -XMP:CreatorTool="${generator}" \
    -Producer="$(exiftool -q -z -P -p '$PDF:Creator / $PDF:Producer' "${tmp}")" \
    -CreateDate="$([ -f "${2}" ] && exiftool -q -z -P -s3 -PDF:CreateDate "${2}" || exiftool -q -z -P -s3 -PDF:CreateDate "${tmp}")" '-ModifyDate<PDF:CreateDate' '-XMP:MetadataDate<PDF:CreateDate' \
    -overwrite_original_in_place "${tmp}" \
  && qpdf \
    --suppress-recovery \
    --linearize --stream-data=compress \
    --encrypt "" "$(md5 -q -s "${RANDOM}$(( x=RANDOM, y=RANDOM, x>=y?x-y:y-x ))$(( $(date +%s) % RANDOM ))$(( x=RANDOM, y=RANDOM, x>=y?x-y:y-x ))${RANDOM}")" 128 \
    --accessibility=y --extract=y --print=full --modify=none -- \
    "${tmp}" "${tmp2}"
  if [[ $? -eq 0 ]]; then
    cp -f "${tmp2}" "${2}"
  else
    echo 'Some error occurred' 1>&2
  fi
  rm -rf "${tmpdir}"
else
  echo "2 parameters expected, got $#" 1>&2
fi
```

Given an input file like the following, it writes the `<title>`, author, description, keywords and generator into the PDF's metadata:

```html
<!DOCTYPE html>
<html lang="de">
<head>
<meta charset="utf-8">
<title>Some Page</title>
<meta name="author" content="Foo Bar">
<link rel="author" href="mailto:Foo%20Bar%20%3cfoo%40bar.com%3e">
<meta name="description" lang="en" content="Some really nice page">
<meta name="keywords" lang="en" content="foo,bar,page,pdf,wkhtmltopdf,exiftool,qpdf">
<meta name="generator" lang="en" content="Brackets">
</head>
<body style="font-family:sans-serif; font-size:200%;">
<h1>Foobar!</h1>
<p>Lorem… <em>you know the drill.</em></p>
</body>
</html>
```
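The script's sed patterns only match tags that sit on a single line with their attributes in exactly the order shown (for example `name`, then `lang`, then `content`). If that turns out to be too brittle, the same fields can be pulled out with Python's standard-library `html.parser`, which decodes entities by itself. This is a minimal sketch, assuming the whole HTML file is already loaded as a string; `HeadMetadataParser`, `extract_pdf_metadata` and the dictionary keys are hypothetical names, not part of any tool in this thread.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote


class HeadMetadataParser(HTMLParser):
    """Collects <title>, <meta name="..." content="..."> and rel="author" mailto: links."""

    def __init__(self):
        # convert_charrefs=True decodes &amp;, &eacute; etc. in text content;
        # HTMLParser always unescapes attribute values.
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.meta = {}
        self.author_link = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {name: value or "" for name, value in attrs}
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in a and "content" in a:
            # Attribute order does not matter here; keep the first occurrence of each name.
            self.meta.setdefault(a["name"].lower(), a["content"])
        elif tag == "link" and "author" in a.get("rel", "").lower().split():
            href = a.get("href", "")
            if href.lower().startswith("mailto:") and self.author_link is None:
                # Percent-decoding mirrors the script's rawurldecode().
                self.author_link = unquote(href[len("mailto:"):])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_pdf_metadata(html_text):
    """Return the fields the shell script writes with exiftool (hypothetical helper)."""
    parser = HeadMetadataParser()
    parser.feed(html_text)
    parser.close()
    # Same splitting rule as the script's preg_split("/\s*,[\s,]*/"), plus trimming.
    raw = parser.meta.get("keywords", "")
    keywords = [k.strip() for k in re.split(r"\s*,[\s,]*", raw) if k.strip()]
    return {
        "Title": "".join(parser.title_parts).strip(),
        # Prefer the mailto: author link, fall back to <meta name="author">.
        "Author": parser.author_link or parser.meta.get("author", ""),
        "Subject": parser.meta.get("description", ""),
        "Keywords": keywords,
        # The script writes "-" when there is no generator meta tag.
        "Creator": parser.meta.get("generator") or "-",
    }
```

For the sample page, this should yield the title "Some Page", the author "Foo Bar <foo@bar.com>" taken from the `mailto:` link, and the seven keywords as a list. Call it as `extract_pdf_metadata(open("input.html", encoding="utf-8").read())`.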
Can you tell me how to run this on Windows 7?
Hello Chris, can you tell me whether it also works for other web pages, for example `wkhtmltopdf.exe http://google.com/ google.pdf`? Please answer, thanks.
@jeacksmcione commented:
I'm afraid I can't. But the programs I mentioned also run on Windows, so you should at least be able to do each of these steps manually.
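One way to follow that advice without bash, sed, php or md5 is to drive the same three programs from Python, which also runs on Windows. This is a minimal sketch, assuming `wkhtmltopdf`, `exiftool` and `qpdf` are on the `PATH` and the metadata values have already been collected into a dict; the function names are hypothetical, only a subset of the script's options is reproduced, and the random owner password comes from Python's `secrets` module instead of the script's `md5`/`$RANDOM` expression.

```python
import secrets
import shutil
import subprocess
import tempfile
from pathlib import Path


def wkhtmltopdf_cmd(src, dst, title):
    """Page setup copied from the script (print media, 10 mm margins, landscape A4)."""
    return ["wkhtmltopdf", "--print-media-type",
            "-B", "10mm", "-L", "10mm", "-R", "10mm", "-T", "10mm",
            "-O", "Landscape", "-s", "A4",
            "--title", title, str(src), str(dst)]


def exiftool_cmd(pdf, meta):
    """meta: dict with Title, Subject, Author, Creator (str) and Keywords (list of str)."""
    keywords = ", ".join(meta["Keywords"])
    return ["exiftool", "-z", "-P",
            f"-Title={meta['Title']}",
            f"-PDF:Subject={meta['Subject']}", f"-XMP:Description={meta['Subject']}",
            f"-PDF:Author={meta['Author']}", f"-XMP:Creator={meta['Author']}",
            f"-PDF:Keywords={keywords}", f"-XMP:Keywords={keywords}",
            f"-PDF:Creator={meta['Creator']}", f"-XMP:CreatorTool={meta['Creator']}",
            "-overwrite_original_in_place", str(pdf)]


def qpdf_cmd(src, dst, owner_password=None):
    """Empty user password, random owner password, 128-bit, same permissions as the script."""
    owner = owner_password or secrets.token_hex(16)  # 32 hex chars instead of md5/$RANDOM
    return ["qpdf", "--suppress-recovery", "--linearize", "--stream-data=compress",
            "--encrypt", "", owner, "128",
            "--accessibility=y", "--extract=y", "--print=full", "--modify=none", "--",
            str(src), str(dst)]


def html_to_pdf(src, dst, meta):
    """Run the three steps on temporary files; dst is only written if all of them succeed."""
    with tempfile.TemporaryDirectory() as tmpdir:
        stage1 = Path(tmpdir) / "stage1.pdf"
        stage2 = Path(tmpdir) / "stage2.pdf"
        for cmd in (wkhtmltopdf_cmd(src, stage1, meta["Title"]),
                    exiftool_cmd(stage1, meta),
                    qpdf_cmd(stage1, stage2)):
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        shutil.copyfile(stage2, dst)
```

Passing argument lists (rather than one shell string) to `subprocess.run` sidesteps quoting differences between cmd.exe and bash, which matters for titles containing spaces or quotes.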
@mikeponco commented:
No, it does not. You could try to use a tool like
Oh, thanks Chris. I thought wkhtmltopdf could create metadata just like mPDF, but it's a great tool anyway :D
To avoid having to fixup half a dozen places where we're creating PDF writers, and possibly ending up with new ill-configured writers in the future, patch PyPDF2's own writer with a subclass setting /Creator and /Producer. Note that this will not affect non-post-processed PDFs generated by wkhtmltopdf. wkhtmltopdf does not allow setting these properties[0][1], so to fix this issue we'd have to alter _run_wkhtmltopdf to pass the result through PyPDF2 in order to alter its metadata. [0] wkhtmltopdf/wkhtmltopdf#2000 [1] https://bugreports.qt.io/browse/QTBUG-44451
@chris-scheurle, could you turn this into a GitHub project and also add a license and license URL?
At least the /Application field (which is filled with "wkhtmltopdf 0.12.5") can be changed inside wkhtmltopdf :)
To avoid having to fixup half a dozen places where we're creating PDF writers, and possibly ending up with new ill-configured writers in the future, patch PyPDF2's own writer with a subclass setting /Creator and /Producer. Note that this will not affect non-post-processed PDFs generated by wkhtmltopdf. wkhtmltopdf does not allow setting these properties[0][1], so to fix this issue we'd have to alter _run_wkhtmltopdf to pass the result through PyPDF2 in order to alter its metadata. [0] wkhtmltopdf/wkhtmltopdf#2000 [1] https://bugreports.qt.io/browse/QTBUG-44451 closes #29460 Signed-off-by: Xavier Morel (xmo) <xmo@odoo.com>
See https://code.google.com/p/wkhtmltopdf/issues/detail?id=1095