
Adds -H:+GenerateBuildArtifactsFile, copies .so from remote container #41201

Merged
merged 3 commits into from
Jul 10, 2024

Conversation

Karm
Member

@Karm Karm commented Jun 14, 2024

fixes #41020

This PR adds the option -H:+GenerateBuildArtifactsFile to all native builds. With it, the native-image build process produces a small JSON file that lists the built artifacts.

We use that file as a manifest of what was built and what we need to copy from the remote container where the remote build took place. If no such file exists or if there are no extra artifacts to copy, we silently do nothing.

What the new file looks like

~/workspaceRH/reproducers/poi$ cat ./build/poi-1.0.0-SNAPSHOT-native-image-source-jar/build-artifacts.json |jq
{
  "build_info": [
    "poi-1.0.0-SNAPSHOT-runner-build-output-stats.json"
  ],
  "executables": [
    "poi-1.0.0-SNAPSHOT-runner"
  ],
  "jdk_libraries": [
    "libawt.so",
    "libawt_headless.so",
    "libawt_xawt.so",
    "libfontmanager.so",
    "libfreetype.so",
    "libjavajpeg.so",
    "liblcms.so",
    "libmlib_image.so",
    "libjava.so",
    "libjvm.so"
  ]
}
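
For illustration, the manifest above could be consumed roughly like this. It is a minimal sketch and not the code in this PR: the class `BuildArtifactsManifest` and the method `entries` are hypothetical names, and it uses a regex only to stay within the JDK (the PR itself ended up using a proper JSON reader). It assumes the file names contain no escaped characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not the PR's code: lists the entries of one array in build-artifacts.json. */
final class BuildArtifactsManifest {

    // Matches a quoted string without escapes; enough for plain file names such as "libawt.so".
    private static final Pattern QUOTED = Pattern.compile("\"([^\"\\\\]*)\"");

    /** Returns the string entries of the array named key, or an empty list if the key is absent. */
    static List<String> entries(String json, String key) {
        Matcher array = Pattern
                .compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*\\[([^\\]]*)\\]")
                .matcher(json);
        List<String> result = new ArrayList<>();
        if (!array.find()) {
            return result; // no such section, so there is nothing extra to copy
        }
        Matcher item = QUOTED.matcher(array.group(1));
        while (item.find()) {
            result.add(item.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String json = "{ \"executables\": [\"poi-1.0.0-SNAPSHOT-runner\"],"
                + " \"jdk_libraries\": [\"libawt.so\", \"libjvm.so\"] }";
        System.out.println(entries(json, "jdk_libraries"));
    }
}
```

An empty result corresponds to the "silently do nothing" case described above.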

Why don't you podman cp?

podman cp does not support globs, i.e. cp /project/*.so won't work. I found that surprising, so I noted it in the code comments to save time for anyone tempted to refactor this later.
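
Since globs are out, one way around it is to issue one `cp` per file named in the manifest. A hedged sketch of building those command lines follows. `ContainerCopyCommands`, its parameters, and the container id are made up for illustration and are not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one "cp" invocation per file, because "podman cp" does not expand globs. */
final class ContainerCopyCommands {

    /** Builds one command line per file name; nothing is executed here. */
    static List<List<String>> build(String runtime, String containerId, String containerDir,
            List<String> fileNames, String hostDir) {
        List<List<String>> commands = new ArrayList<>();
        for (String name : fileNames) {
            // <runtime> cp <container>:<path in container> <path on host>
            commands.add(List.of(runtime, "cp", containerId + ":" + containerDir + "/" + name, hostDir));
        }
        return commands;
    }

    public static void main(String[] args) {
        List<String> libs = List.of("libawt.so", "libjvm.so");
        for (List<String> cmd : build("podman", "abc123", "/project", libs, "/tmp/out")) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Each inner list could be handed to a `ProcessBuilder` as is. An empty list of file names yields no commands at all.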

@Karm Karm requested review from galderz and geoand June 14, 2024 09:50
@Karm Karm self-assigned this Jun 14, 2024
@Karm Karm requested a review from gsmet June 14, 2024 09:50
@Karm
Member Author

Karm commented Jun 14, 2024

Open for discussion, I think I could add a test, something like

integration-tests/awt-packaging/aws-lambda
integration-tests/awt-packaging/remote-container

to test that this PR keeps working and that #35718 doesn't regress. It would add two more builds, though, and take more time. Perhaps it is better suited to the quickstarts. Not sure.

@geoand
Contributor

geoand commented Jun 14, 2024

Do we need both those new integration tests? Can't we just have one?

  }
  try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
-     return reader.readLine();
+     return reader.lines().toList();
Contributor

What's the motivation behind this change? I understand that it's more generic this way but since we only use the method in a single place where this is not needed it seems to unnecessarily (for the time being) complicate things.

Member Author

@Karm Karm Jun 14, 2024

@zakkak I used that to read the output of an exec of find that listed the .so files in the started container. Then I abandoned that approach in favor of -H:+GenerateBuildArtifactsFile. So, strictly speaking, we can keep the version that returns only the first line of output, however confusing that might be. I would rename the method then, though, to ...readOneLineOutput...
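
To make the difference between the two variants concrete, here is a small standalone sketch. `OutputReading`, `firstLine` and `allLines` are hypothetical names, and a `StringReader` stands in for the process output stream. It is not the method from the PR.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch contrasting "first line only" with "all lines" when reading command output. */
final class OutputReading {

    /** Returns only the first line of the output, or null if there is none. */
    static String firstLine(Reader output) {
        try (BufferedReader reader = new BufferedReader(output)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns every line of the output, e.g. one path per line from a find command. */
    static List<String> allLines(Reader output) {
        try (BufferedReader reader = new BufferedReader(output)) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String output = "libawt.so\nlibjvm.so\n";
        System.out.println(firstLine(new StringReader(output)));
        System.out.println(allLines(new StringReader(output)));
    }
}
```

With a single caller that needs one line, the first variant is enough. The second only pays off for multi-line output such as a file listing.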

Contributor

@zakkak zakkak left a comment

Thanks @Karm!

LGTM but it feels like touching more things than needed. I added some comments for your consideration.

Comment on lines -61 to +104
- copyFromContainerVolume(outputDir, "sources", "Failed to copy sources from container volume back to the host.");
- String symbols = String.format("%s.debug", nativeImageName);
- copyFromContainerVolume(outputDir, symbols, "Failed to copy debug symbols from container volume back to the host.");
+ copyFromContainerVolume(outputDir, "sources",
+         "Failed to copy sources from container volume back to the host.");
+ final String symbols = String.format("%s.debug", nativeImageName);
+ copyFromContainerVolume(outputDir, symbols,
+         "Failed to copy debug symbols from container volume back to the host.");
Contributor

Are these changes necessary?

Member Author

These changes are not necessary to fix the issue.

@Karm
Member Author

Karm commented Jun 14, 2024

@geoand

Do we need both those new integration tests? Can't we just have one?

I wonder how to jam both into a single run. Perhaps we can. Like a project that has quarkus.native.remote-container-build=true in application.properties but whose final artifact is a .zip file "function" for AWS Lambda, right?

Could be one 👍

Member

@gsmet gsmet left a comment

I will let you adjust with Foivos's comments but it's really cool that we can fix this.

This won't solve the case when you build your container using the Dockerfile though, right?

Test Lambda and remote container packaging
@Karm
Member Author

Karm commented Jun 14, 2024

@geoand

Commit 59b63bb adds a single new integration test that has in its application.properties:

quarkus.lambda.handler=test
quarkus.native.remote-container-build=true

It tests both that #41020 is fixed and that #35718 has not regressed.

Tested as

$ ./mvnw clean verify -f integration-tests -pl awt-packaging -Dnative -Dnative.surefire.skip=false

It passes, although it produces the Lambda-server-related warnings and errors shown below. I guess it could have something to do with the fact that LambdaClient is deprecated...?

I tried to ask about it on Zulip: https://quarkusio.zulipchat.com/#narrow/stream/187038-dev/topic/Testing.20AWS.20Lambda.20locally/near/444730456

[INFO] --- failsafe:3.2.5:integration-test (default) @ quarkus-integration-test-awt-packaging ---
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running io.quarkus.it.jaxb.AwtJaxbTestIT
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
getting verticle
Executing "/home/karm/workspaceRH/quarkus/integration-tests/awt-packaging/target/quarkus-integration-test-awt-packaging-999-SNAPSHOT-runner -Dquarkus.http.port=8081 -Dquarkus.http.ssl-port=8444 -Dtest.url=http://localhost:8081 -Dquarkus.log.file.path=/home/karm/workspaceRH/quarkus/integration-tests/awt-packaging/target/quarkus.log -Dquarkus.log.file.enable=true -Dquarkus.log.category."io.quarkus".level=INFO -Dquarkus-internal.aws-lambda.test-api=localhost:5387"
__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
2024-06-14 18:42:08,760 INFO  [io.quarkus] (main) quarkus-integration-test-awt-packaging 999-SNAPSHOT native (powered by Quarkus 999-SNAPSHOT) started in 0.015s. Listening on: http://0.0.0.0:8081
2024-06-14 18:42:08,760 INFO  [io.quarkus] (main) Profile prod activated. 
2024-06-14 18:42:08,760 INFO  [io.quarkus] (main) Installed features: [amazon-lambda, awt, cdi, resteasy, resteasy-jaxb, smallrye-context-propagation, vertx]
2024-06-14 18:42:08,760 INFO  [io.qua.ama.lam.run.AbstractLambdaPollLoop] (Lambda Thread (NORMAL)) Listening on: http://localhost:5387/2018-06-01/runtime/invocation/next
2024-06-14 18:42:09,032 WARN  [io.qua.ama.lam.tes.LambdaClient] (main) LambdaClient has been deprecated and will be removed in future Quarkus versions.  You can now invoke using a built in test http server.  See docs for more details
2024-06-14 18:42:09,664 INFO  [io.qua.it.jax.Resource] (executor-thread-1) Received book: io.quarkus.it.jaxb.Book@2a3173ef
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.712 s -- in io.quarkus.it.jaxb.AwtJaxbTestIT
2024-06-14 18:42:12,661 ERROR [io.qua.ama.lam.run.AbstractLambdaPollLoop] (Lambda Thread (NORMAL)) Error running lambda (NORMAL): java.net.SocketException: Connection reset
        at java.base@21.0.3/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:318)
        at java.base@21.0.3/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346)
        at java.base@21.0.3/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796)
        at java.base@21.0.3/java.net.Socket$SocketInputStream.read(Socket.java:1099)
        at java.base@21.0.3/java.io.BufferedInputStream.fill(BufferedInputStream.java:291)
        at java.base@21.0.3/java.io.BufferedInputStream.read1(BufferedInputStream.java:347)
        at java.base@21.0.3/java.io.BufferedInputStream.implRead(BufferedInputStream.java:420)
        at java.base@21.0.3/java.io.BufferedInputStream.read(BufferedInputStream.java:399)
        at java.base@21.0.3/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:827)
        at java.base@21.0.3/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:759)
        at java.base@21.0.3/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:952)
        at java.base@21.0.3/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:759)
        at java.base@21.0.3/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1690)
        at java.base@21.0.3/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1599)
        at java.base@21.0.3/sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:3235)
        at io.quarkus.amazon.lambda.runtime.AbstractLambdaPollLoop$1.run(AbstractLambdaPollLoop.java:95)
        at java.base@21.0.3/java.lang.Thread.runWith(Thread.java:1596)
        at java.base@21.0.3/java.lang.Thread.run(Thread.java:1583)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:896)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:872)

2024-06-14 18:42:12,661 ERROR [io.qua.run.StartupContext] (Lambda Thread (NORMAL)) Running a shutdown task failed [Error Occurred After Shutdown]: java.lang.RuntimeException: java.lang.InterruptedException
        at io.quarkus.vertx.http.runtime.VertxHttpRecorder$13.run(VertxHttpRecorder.java:888)
        at io.quarkus.runtime.StartupContext.runAllAndClear(StartupContext.java:87)
        at io.quarkus.runtime.StartupContext.close(StartupContext.java:79)
        at io.quarkus.runner.ApplicationImpl.doStop(Unknown Source)
        at io.quarkus.runtime.Application.stop(Application.java:208)
        at io.quarkus.runtime.Application.stop(Application.java:155)
        at io.quarkus.amazon.lambda.runtime.AbstractLambdaPollLoop$1.run(AbstractLambdaPollLoop.java:165)
        at java.base@21.0.3/java.lang.Thread.runWith(Thread.java:1596)
        at java.base@21.0.3/java.lang.Thread.run(Thread.java:1583)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:896)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:872)
Caused by: java.lang.InterruptedException
        at java.base@21.0.3/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1100)
        at java.base@21.0.3/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
        at io.quarkus.vertx.http.runtime.VertxHttpRecorder$13.run(VertxHttpRecorder.java:886)
        ... 10 more

2024-06-14 18:42:12,662 INFO  [io.quarkus] (Lambda Thread (NORMAL)) quarkus-integration-test-awt-packaging stopped in 0.001s
2024-06-14 18:42:12,664 WARN  [io.net.boo.ServerBootstrap] (vert.x-acceptor-thread-0) Failed to register an accepted channel: [id: 0x94896071, L:/127.0.0.1:5387 ! R:/127.0.0.1:48170]: java.lang.IllegalStateException
        at io.vertx.core.net.impl.VertxEventLoopGroup.next(VertxEventLoopGroup.java:37)
        at io.vertx.core.net.impl.VertxEventLoopGroup.register(VertxEventLoopGroup.java:53)
        at io.netty.bootstrap.ServerBootstrap$ServerBootstrapAcceptor.channelRead(ServerBootstrap.java:241)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:97)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:553)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:840)

[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] 

@Karm
Member Author

Karm commented Jun 14, 2024

@zakkak
I adjusted the changes, namely:

  • switching from regex to a full-blown JSON reader

I left in the changes to reading output, both the error/debug ones and runCommandAndReadOutput.
If you find it important that those are not touched, renaming them would be an alternative.

@Karm
Member Author

Karm commented Jun 14, 2024

@gsmet

I will let you adjust with Foivos's comments but it's really cool that we can fix this.

Thx. I originally thought this would be a one-liner :-D

 containerRuntime.getExecutableName(), "cp",
                containerId + ":/project/*.so",
                outputDir.toAbsolutePath().toString()

but then

Further note that podman cp does not support globbing (e.g., cp dir/*.txt). To copy multiple 
files from the host to the container use xargs(1) or find(1) (or similar tools for chaining commands) 
in conjunction with podman cp. To copy multiple files from the container to the host, use podman 
mount CONTAINER and operate on the returned mount point instead (see ALTERNATIVES below).

Source: https://docs.podman.io/en/latest/markdown/podman-cp.1.html#description

This won't solve the case when you build your container using the Dockerfile though, right?

This PR deals with the scenario where you run your build "remotely" in a container, rather than just calling native-image from a container. The OP had a problem with Gradle, but it's not Gradle-specific.

If you are building your runtime image using a Dockerfile, you still need to have a line in the Dockerfile that copies those files, e.g. https://github.com/quarkusio/quarkus/blob/main/integration-tests/awt/src/main/docker/Dockerfile.native#L10

Is there some kind of "Default" Dockerfile I can edit?

It used to be a problem because not everybody was using GraalVM/Mandrel 23.0+ and older versions did not produce additional .so files. Docker's COPY of non-existing files then triggered errors. That required "hacks" like copying an unnecessary but always present file, such as some .properties file, so that the COPY always copies something:
https://github.com/quarkusio/quarkus-quickstarts/blob/main/awt-graphics-rest-quickstart/src/main/docker/Dockerfile.native#L31

WDYT?

@quarkus-bot quarkus-bot bot added the area/infra-automation label ("anything related to CI, bots, etc. that are used to automated our infrastructure") Jun 14, 2024
@gsmet
Member

gsmet commented Jun 14, 2024

It used to be a problem because not everybody was using GraalVM/Mandrel 23.0+ and older versions did not produce additional .so files. Docker's COPY of non-existing files then triggered errors. That required "hacks" like copying an unnecessary but always present file, such as some .properties file, so that the COPY always copies something

Yeah, I think we should go with it if we are sure we always have at least one properties file. Be aware that we might not have an application.properties, for instance. What was the file you counted on?

The Dockerfile templates for newly generated projects are in tools/base-codestarts/src/main/resources/codestarts/quarkus/tooling/dockerfiles.

@Karm
Member Author

Karm commented Jun 14, 2024

@gsmet

It used to be a problem because not everybody was using GraalVM/Mandrel 23.0+ and older versions did not produce additional .so files. Docker's COPY of non-existing files then triggered errors. That required "hacks" like copying an unnecessary but always present file, such as some .properties file, so that the COPY always copies something

Yeah, I think we should go with it if we are sure we always have at least one properties file. Be aware that we might not have an application.properties, for instance. What was the file you counted on?

It must have been one of the artifacts in target/, so I think quarkus-artifact.properties.
Is there a scenario where it wouldn't be present?

The Dockerfile templates for newly generated projects are in tools/base-codestarts/src/main/resources/codestarts/quarkus/tooling/dockerfiles.

Gonna take a look.

@Karm
Member Author

Karm commented Jun 14, 2024

Re "getting verticle": it prints one message per core of my workstation. Why is it printing there at all? It's gonna be 256 messages on a dual-socket Altra Max :)

[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running io.quarkus.it.jaxb.AwtJaxbTestIT
getting verticle
...
getting verticle

...and indeed, I've just tried on an 80-core machine and there are 80 "getting verticle" messages.

@quarkus-bot

quarkus-bot bot commented Jun 15, 2024

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 3150791.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.


Flaky tests - Develocity

⚙️ JVM Tests - JDK 17

📦 extensions/smallrye-reactive-messaging/deployment

io.quarkus.smallrye.reactivemessaging.hotreload.ConnectorChangeTest.testUpdatingConnector - History

  • Expecting actual: ["-6","-8","-9","-10","-11","-12","-13","-14"] to start with: ["-6", "-7", "-8", "-9"] - java.lang.AssertionError
java.lang.AssertionError: 

Expecting actual:
  ["-6","-8","-9","-10","-11","-12","-13","-14"]
to start with:
  ["-6", "-7", "-8", "-9"]

	at io.quarkus.smallrye.reactivemessaging.hotreload.ConnectorChangeTest.testUpdatingConnector(ConnectorChangeTest.java:41)

📦 integration-tests/reactive-messaging-kafka

io.quarkus.it.kafka.KafkaConnectorTest.testFruits - History

  • Assertion condition defined as a Lambda expression in io.quarkus.it.kafka.KafkaConnectorTest expected: <6> but was: <5> within 10 seconds. - org.awaitility.core.ConditionTimeoutException
org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a Lambda expression in io.quarkus.it.kafka.KafkaConnectorTest expected: <6> but was: <5> within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:1006)
	at org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:790)
	at io.quarkus.it.kafka.KafkaConnectorTest.testFruits(KafkaConnectorTest.java:63)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)

⚙️ JVM Tests - JDK 21

📦 extensions/smallrye-reactive-messaging-kafka/deployment

io.quarkus.smallrye.reactivemessaging.kafka.deployment.dev.KafkaDevServicesDevModeTestCase.sseStream - History

  • Assertion condition defined as a Lambda expression in io.quarkus.smallrye.reactivemessaging.kafka.deployment.dev.KafkaDevServicesDevModeTestCase Expecting size of: [] to be greater than or equal to 2 but was 0 within 10 seconds. - org.awaitility.core.ConditionTimeoutException
org.awaitility.core.ConditionTimeoutException: 
Assertion condition defined as a Lambda expression in io.quarkus.smallrye.reactivemessaging.kafka.deployment.dev.KafkaDevServicesDevModeTestCase 
Expecting size of:
  []
to be greater than or equal to 2 but was 0 within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
	at org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)

@geoand
Contributor

geoand commented Jun 17, 2024

@Karm I guess you overcame the native image test issue you were having :)

Contributor

@zakkak zakkak left a comment

LGTM

@Karm
Member Author

Karm commented Jun 17, 2024

@Karm I guess you overcame the native image test issue you were having :)

Well, I wish I had. I used the deprecated API LambdaClient, which feels suboptimal for a new test...

@geoand
Contributor

geoand commented Jun 17, 2024

Sure yeah, but we can figure it out later

@Karm
Member Author

Karm commented Jul 8, 2024

Hello, are there any action items for me?

@zakkak zakkak merged commit 57e1bd6 into quarkusio:main Jul 10, 2024
52 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.13 - main milestone Jul 10, 2024
Labels
area/core, area/infra-automation, kind/bugfix, triage/flaky-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remote Container build does not copy all build artifacts
4 participants