Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Executable JAR application startup is slower after 3.2.0 when Hibernate scanner is not disabled #40125

Closed
deathy opened this issue Mar 27, 2024 · 6 comments
Labels
type: regression A regression from a previous release
Milestone

Comments

@deathy
Copy link

deathy commented Mar 27, 2024

I investigated some slow application startup behavior and it seems Executable JAR can be extremely slower than alternatives.

The docs mention "slight" differences but it seems there are some cases where this can be a big difference

Unpacking the Executable JAR
This is actually slightly faster on startup (depending on the size of the jar)

Times observed:

  • executable jar java -jar myapp.jar
    • started in 53 seconds
  • unpacked jar with jar -xf myapp.jar
  • JarLauncher java org.springframework.boot.loader.launch.JarLauncher
    • started in 12.8 seconds
  • natural main method java -cp "BOOT-INF/classes:BOOT-INF/lib/*" com.example.MyApplication
    • started in 12.0 seconds

So 4-5X difference between executable jar and JarLauncher, minimal difference unpacked with/without JarLauncher

I dug into my logs and saw that there was a 3X jump in November last year that coincided with upgrading to Spring Boot 3.2.0
That lead me to possible cause from rewriting of nested jar support in #37668

Another big jump I observed afterwards was after adding more static resources in /static/ (CSS/JS/source maps, fonts .ttf/.woff2).
The application does have a lot of static resources inside, especially in development/test (source maps, unminified JS/CSS).

Profiled using async-profiler launched as agent on startup. Lots of C2/other tasks but biggest part of spring init was this:

image

So the initial trigger is EntityManager lookup and Hibernate scanning classpath for entities.
But most of the time is spent in org.springframework.book.loader.zip. DataBlockInputStream and VirtualDataBlock which leads again to #37668 where those were created.

Actual disk reads are minimal and hibernate parsing is a thin slice of it. Hibernate is only parsing .class files not static resources but it seems the boot loader fully reads all static resources and also creates a lot of virtual blocks.

The difference is quite large so I think it merits some further investigation.
It's also quite counter-intuitive, adding static resources can drastically increase time for EntityManager initialization.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Mar 27, 2024
@philwebb philwebb added type: regression A regression from a previous release and removed status: waiting-for-triage An issue we've not yet triaged labels Mar 28, 2024
@philwebb philwebb added this to the 3.2.x milestone Mar 28, 2024
@philwebb
Copy link
Member

Thanks for reporting this @deathy. Do you by any chance have a sample application that replicates the performance issues you're seeing?

@deathy
Copy link
Author

deathy commented Mar 28, 2024

Yes I'll try to create a proper sample in the next days. Might take some time, have to see with different numbers/sizes of files and need to see if it's the same with an in-memory database.

Meanwhile I have some numbers from debugging my app startup:

  • 1823 files and 236 directories inside executable jar
  • in org.springframework.boot.loader.zip.VirtualZipDataBlock#VirtualZipDataBlock constructor I have:
    • centralRecords.length = 1754
    • parts.size = 14033
      • ByteArrayDataBlock: 5263 items
      • VirtualZipDataBlock$DataPart: 8770 items
    • centralParts.size = 5262
    • sizeOfCentralDirectory = 176218

Grouping the 14033 parts by DataBlock::size, top 5:

  • size 0: 3508 blocks
  • size 46: 1804 blocks
  • size 30: 1776 blocks
  • size 16: 1756 blocks
  • size 2: 215 blocks

Also ran async-profiler with event=alloc:

image

@deathy
Copy link
Author

deathy commented Mar 28, 2024

Observed during debugging:

JDK JarInputStream/ZipInputStream called from hibernate loader read sequentially.

Reading sequentially next 512 or how ever many bytes (saw some 3 byte reads for iterating entries) re-incur the whole traversal that was already done since for (DataBlock part : this.parts) inside org.springframework.boot.loader.zip.VirtualDataBlock#read is a linear lookup?

It can't skip ahead even though pos is passed in as param.
That matches with java/util/ImmutableCollections$ListItr.next appearing in profile.

image

@philwebb
Copy link
Member

philwebb commented Mar 28, 2024

@deathy just to double check, are you using the latest 3.2.x release?

@deathy
Copy link
Author

deathy commented Mar 28, 2024

@philwebb yes I'm on 3.2.4 with original application.

I finally managed to make a sample: https://github.com/deathy/spring-startup-test
Has a bunch of randomly generated files and basic config to trigger visible difference (although much lower than original app since it's simpler). Measured startup times are in README there.
And through it discovered another condition of this happening: actually having the hibernate scanner active.

In automatic configuration since #15565 and #15321 the Hibernate Scanner is disabled by default.
But it is not disabled if you're actually using EntityManagerFactoryBuilder manually unless you pass in specific property.

In my original app I added .properties(Map.of("hibernate.archive.scanner","org.hibernate.boot.archive.scan.internal.DisabledScanner")) and that completely eliminated time from Hibernate Scanner.

There I also had two persistence contexts (separate DBs) so the scan was being performed twice.
I still see a significant difference compared to running unpacked but 2-3X not 4-5X and profiling so far hasn't shown any big obvious block like before, everything is more spread out.

The sequential iteration through parts in VirtualDataBlock#read still feels wrong but I have a workaround for scanner and not sure anything else uses it that heavily.

Hibernate Scanner questions:

  • should JpaBaseConfiguration#entityManagerFactory and EntityManagerFactoryBuilder have the same default meaning disabling Hibernate Scanner?
    • if not, maybe potential footgun should be documented somewhere?

@philwebb philwebb changed the title Executable JAR application startup is much slower after 3.2.0 Executable JAR application startup is slower after 3.2.0 when Hibernate scanner is not disabled Apr 17, 2024
@philwebb
Copy link
Member

@deathy I've added a few optimization but unfortunately I don't think there is much that we can do to solve this one. It looks like StandardArchiveDescriptorFactory is creating a JarInputStreamBasedArchiveDescriptor which results in a ZipInputStream being used to read the jar entries.

If the Hibernate Scanner could be changed, I think it could be a lot more optimal with our nested jars. Specifically, if they could do url.openConnection() then if the connection is an instance of JarURLConnection call connection.getJarFile() we wouldn't need to create the virtual file structure at all.

This might be something that could be prototyped, but hopefully a typical Boot application won't hit the problem.

For your application, you might want to consider using moving to the exploded archive format. We've been doing some work in Spring Boot 3.3 that should make exploded archives easier to create. See https://docs.spring.io/spring-boot/reference/deployment/efficient.html#deployment.efficient.unpacking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: regression A regression from a previous release
Projects
None yet
Development

No branches or pull requests

3 participants