Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: FileSystems have a high risk of leaking resources due to the repeated invocation of FileSystems.setDefaultPipelineOptions #26321

Closed
1 of 15 tasks
mosche opened this issue Apr 18, 2023 · 0 comments · Fixed by #26694

Comments

@mosche
Copy link
Member

mosche commented Apr 18, 2023

What happened?

The underlying issue here is the repeated initialization of filesystems triggered in SerializablePipelineOptions (#18430).
Every call to FileSystems.setDefaultPipelineOptions() will silently replace any existing previous file systems without cleaning up resources.

In the case of S3FileSystem this can leak threads if the file system was already in use.
The simple test below shows the every increasing number of threads in the system.

public class S3FileSystemTest {
  

  @Test
  public void testSetDefaultPipelineOptions() throws Exception {
    S3Options s3Options = s3Options();
    s3Options.setAwsCredentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials.create("key", "secret")));

    for (int i = 0; i < 1000; i++) {
      FileSystems.setDefaultPipelineOptions(s3Options);
      FileSystems.match(ImmutableList.of("s3://bucket/key*"));
      System.out.println("Number of threads " + Thread.activeCount());
    }
  }
}  

This is particularly bad with some runners, e.g. see #17645:

A normal Go Flink ValidatesRunner execution would contain the logging line from this class nearly 80,000 times, making it much more difficult to read the logs.

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant