'java.lang.OutOfMemoryError' when using a Base64 encoded, embedded JPEG image #51
Comments
Hi @skjardenCode, the PDFBox code in question is at the link below. The only thing I can see is that dispose may not be called on all the readers returned by the iterator. It is also surprising that they always decompress the entire image even though I believe they just need the metadata. I'll comment again here when I have debugged further. |
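To make the two observations above concrete, here is a minimal sketch that uses only the JDK's ImageIO classes. It is not the PDFBox code; the class name `ImageDimensionProbe` and the method `readDimensions` are hypothetical. It asks a reader for width and height only (which, for the standard readers, should normally need just the header rather than a full decode) and disposes every reader it pulls from the iterator.

```java
import java.awt.Dimension;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical helper, not part of PDFBox or OpenHtmlToPDF.
final class ImageDimensionProbe {

    /** Returns the image size, or null if no installed reader can handle the bytes. */
    static Dimension readDimensions(byte[] imageBytes) throws IOException {
        ImageInputStream iis =
                new MemoryCacheImageInputStream(new ByteArrayInputStream(imageBytes));
        try {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            while (readers.hasNext()) {
                ImageReader reader = readers.next();
                try {
                    iis.seek(0);                        // start from the beginning for each attempt
                    reader.setInput(iis, false, true);  // allow seeking back, ignore metadata
                    return new Dimension(reader.getWidth(0), reader.getHeight(0));
                } catch (IOException e) {
                    // This reader could not parse the data; try the next one.
                } finally {
                    reader.dispose();                   // runs for every reader we obtained
                }
            }
            return null;
        } finally {
            iis.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny PNG in memory so the example needs no external file.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(new BufferedImage(3, 2, BufferedImage.TYPE_INT_RGB), "png", out);
        Dimension d = readDimensions(out.toByteArray());
        System.out.println(d.width + "x" + d.height);
    }
}
```

The try/finally form is used instead of try-with-resources so that the sketch stays compatible with older Java versions.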
Unfortunately, I can't replicate this on mac (even with
import java.io.ByteArrayInputStream;
import java.util.Base64;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;

public class TestUsage {
    public static void main(String...args) throws Exception {
        String jpeg =
"/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIs" +
"IxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIy" +
"MjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAABAAEDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAA" +
"AAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAk" +
"M2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKT" +
"lJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QA" +
"HwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdh" +
"cRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hp" +
"anN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk" +
"5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iiigD//2Q==";
        byte[] jpegBytes = Base64.getDecoder().decode(jpeg);
        for (int i = 0; i < 10000000; i++) {
            PDDocument doc = new PDDocument();
            try {
                JPEGFactory.createFromStream(doc, new ByteArrayInputStream(jpegBytes));
            } finally {
                doc.close();
            }
        }
    }
}
Thanks, |
Hey Daniel, thanks a lot for your reply. I'm currently on a trip, I'll test your code as soon as I get back home in a few hours.
You speak about the method
    // ....
    finally
    {
        if (iis != null)
        {
            iis.close();
        }
        reader.dispose();
    }
closing the used reader. Or do you see another location where a reader does not get closed properly? ~ Timo |
Hey Daniel, I'm sorry for the late answer, I had a lot of work to finish first. I did the following 3 things with the given results:
Test-code:
import java.io.ByteArrayOutputStream;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class OpenHtmlToPdfOutOfMemoryTest2
{
    public static void main( String... args ) throws Exception
    {
        String jpeg =
"/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIs" +
"IxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIy" +
"MjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAABAAEDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAA" +
"AAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAk" +
"M2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKT" +
"lJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QA" +
"HwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdh" +
"cRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hp" +
"anN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk" +
"5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iiigD//2Q==";
        String html =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"de\" lang=\"de\">" +
            "<body>" +
            "<img src=\"data:image/jpeg;base64," + jpeg + "\" />" +
            "</body>" +
            "</html>";
        for ( int i = 0; i < 10000000; i++ )
        {
            System.out.println( i );
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.withHtmlContent( html, null );
            builder.toStream( new ByteArrayOutputStream() );
            builder.run();
        }
    }
}
(In the code above it is still your JPEG data, due to the size.)
It seems that this is a system/OS-dependent problem, maybe with ImageIO native code of some sort. Unfortunately, I'm no expert on the Java Memory Model. The heap during the test is OK, but the used memory shown in the Windows Task Manager is just "exploding". I know that those memory values are "virtual memory usage" and not directly related to JVM heap usage. But the memory usage accumulation and the
As a side note, maybe important: I'm still using Java 6 ("1.6.0_45", SUN JDK) due to project restrictions at the moment. Please let me know if I can be of any help, running some more tests etc. |
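For anyone who wants to watch the "heap is fine, process memory grows" pattern from inside the JVM, the following is a minimal sketch using the standard `java.lang.management` API. The class name `MemoryProbe` and the method `snapshot` are made up for illustration. Note the limitation: the non-heap figure covers JVM-managed pools only, so memory allocated directly by native code would not be expected to show up in either number, and an OS tool such as the Task Manager is still needed for that.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.Locale;

// Hypothetical helper for logging JVM memory figures during a stress loop.
final class MemoryProbe {

    /** Formats the currently used heap and non-heap memory in megabytes. */
    static String snapshot() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        return String.format(Locale.ROOT, "heap used=%d MB, non-heap used=%d MB",
                heap.getUsed() >> 20, nonHeap.getUsed() >> 20);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            // The work under test (for example one PDF conversion) would go here.
            System.out.println(i + ": " + snapshot());
        }
    }
}
```

If both figures stay flat while the process size reported by the operating system keeps climbing, that points towards native allocations rather than a Java heap leak.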
Embarrassingly, it turns out I wasn't calling dispose! I've added it and done a release |
Sorry to butt into this thread but shouldn't reader.dispose() be inside a finally clause? |
Yes, I think this should be the case. Just declare
Thanks a lot, I'll try it tomorrow and report back. |
Hey Daniel,
I tested it again and it works - no more memory leak, I can do thousands of iterations, and the heap and Windows Task Manager both stay below ~ 60 MB memory usage. I've also done a re-check by commenting out the line
As @MartyMcMartface mentioned, you should do the disposal inside of a
Thanks for your support! |
Hello,
I recently experienced an OutOfMemory error while using the OpenHtmlToPDF framework. Our requirements are rather normal, that is, generating a PDF file out of a simple HTML file which contains only basic CSS 2.0 and XHTML - mainly tables, text and up to three images.
We ran a stress test because the framework should be integrated in our server component, which needs to convert HTML to PDF for our clients. I used a fairly simple for-loop to iterate over HTML files and for each HTML-content, we used OpenHtmlToPDF to generate a PDF file. After about 6000 iterations, the test stopped and a "java.lang.OutOfMemoryError" was shown.
I then took apart all the components, stripped away code step by step to reproduce the OutOfMemoryError with minimal test-code and the result was this simple test case:
The file html-with-embedded-jpg.html is a simple HTML file with an img tag containing an embedded JPEG image (Base64 encoded). You can display that HTML file, including the image, in any browser.
Running the above test, one can see in the Windows Task Manager how the occupied memory grows rapidly (interestingly, the Java heap space is doing "ok"). At iteration 5000, it was at ~1.6 GB.
After about iteration 6000, the "java.lang.OutOfMemoryError" occurs, with the following stack trace:
I dug into the code and ended up in PdfBoxOutputDevice.realizeImage(PdfBoxImage), where I found the following lines:
So there is a condition where JPEGFactory.createFromStream is used if the image is a JPEG; otherwise ImageIO.read is used.
So I changed my test HTML file to embed a PNG instead of a JPEG image, and the OutOfMemoryError was gone. The Java heap is doing fine, and the Windows Task Manager shows only ~80 MB memory usage for the Java process no matter how many iterations I run.
Doing a simple search, I came across this:
Maybe there is a problem in PDFBox or in the way the PDFBox API is used to integrate a JPEG image into a PDDocument; I'm not sure.
So the workaround for me is to avoid the JPEG image format when embedding an image into the HTML code and to use PNG instead.
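As an illustration of that workaround, here is a minimal sketch that re-encodes image bytes as PNG and builds a Base64 data URI for the img tag. It uses only JDK classes (`ImageIO` and `java.util.Base64`, the latter requiring Java 8 or newer); the class name `JpegToPngDataUri` and its method are hypothetical and not part of OpenHtmlToPDF or PDFBox.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper illustrating the "embed PNG instead of JPEG" workaround.
final class JpegToPngDataUri {

    /** Decodes the given image bytes and returns them as a PNG data URI. */
    static String toPngDataUri(byte[] imageBytes) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(imageBytes));
        if (image == null) {
            throw new IOException("No ImageIO reader could decode the input");
        }
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", png)) {
            throw new IOException("No PNG writer available");
        }
        return "data:image/png;base64,"
                + Base64.getEncoder().encodeToString(png.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // Create a small JPEG in memory, then convert it.
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        ImageIO.write(new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB), "jpg", jpeg);
        String uri = toPngDataUri(jpeg.toByteArray());
        System.out.println("<img src=\"" + uri + "\" />");
    }
}
```

PNG is lossless, so photographic images will usually become larger than their JPEG originals; whether that trade-off is acceptable depends on the documents being generated.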
I wanted to post this issue here first. I'm sure you know how to debug the code better than me, but I hope I could help a bit with the above information.
Hope to hear from you and that there is an easy fix for it. Or maybe this turns out to be a bug in PDFBox after all.
Thanks a lot!