
ls: cannot open directory '...': Transport endpoint is not connected #630

Closed · tchaton opened this issue Nov 24, 2023 · 13 comments
Labels: bug (Something isn't working)

tchaton commented Nov 24, 2023

Mountpoint for Amazon S3 version

1.1.1 with caching

AWS Region

us-east-1

Describe the running environment

Running on Amazon EC2

What happened?

This is happening quite frequently (roughly 7 out of 10 runs) in our filesystem tests.

ls: cannot open directory '....': Transport endpoint is not connected

Relevant log output

The only relevant log lines I can see are the following.

2023-11-24T13:46:44.170754913Z 2023-11-24T13:46:44.170582Z  WARN lookup{req=44 ino=1 name="Uploads"}: mountpoint_s3::fuse: lookup failed: inode error: file does not exist
2023-11-24T13:46:44.458244689Z 2023-11-24T13:46:44.458094Z  WARN lookup{req=46 ino=2 name="01hg0s363ta4kkvwyhcgvk83zc"}: mountpoint_s3::fuse: lookup failed: inode error: file does not exist
2023-11-24T13:46:51.310283712Z 2023-11-24T13:46:51.310113Z  WARN readdirplus{req=52 ino=1 fh=2 offset=1}: mountpoint_s3::fuse: readdirplus failed: out-of-order readdir, expected=4, actual=1

cc @dannycjones @passaro

tchaton added the bug label Nov 24, 2023
tchaton (Author) commented Nov 24, 2023

Additionally, we are observing a CPU spike every minute with --enable-metadata-caching --metadata-cache-ttl 60. I was hoping the listing would be lazy, e.g. if users don't list or interact with the mount, no listing is done.

passaro (Contributor) commented Nov 27, 2023

Hi @tchaton, thanks for raising the issue. I see you were using a custom build of 1.1.1 with caching. Have you since upgraded to 1.2.0? Note that the flags to configure caching are different from the pre-release version. Once you upgrade, could you report if you are still observing the issue on 1.2.0?

Are you able to share more details on the workload you ran before seeing the error on ls? Do you get similar errors when running other commands? Or just ls? Is the mount-s3 process still running when the error occurs?

EDIT: for help with the new configuration flags, see this section in the docs.
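
For reference, a minimal sketch of what the 1.2.0 caching configuration looks like (bucket name, mount point, and cache directory are placeholders; the flag names assume the released 1.2.0 interface described in the docs linked above):

# Enable local data caching and a 60-second metadata TTL (1.2.0+ flag names)
mount-s3 my-bucket /mnt/my-bucket --cache /tmp/mountpoint-cache --metadata-ttl 60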

passaro (Contributor) commented Nov 27, 2023

About the CPU spikes: Mountpoint does not proactively refresh metadata when it expires. So it should behave just as you were expecting. I suspect that the activity you are observing is due to applications accessing the filesystem and the kernel in turn requesting updated metadata from Mountpoint.

tchaton (Author) commented Nov 30, 2023

Hey @passaro, let me update and give you more feedback.

tchaton (Author) commented Nov 30, 2023

@passaro But if you want to see some failures, you can do something like this.

Create a bucket with 1M files of random sizes ranging from 100 KB to 10 GB.

Then copy all the files from the mount to another bucket while pushing the machine's CPU usage to 100% (I am using a machine with 32 or 64 CPU cores).

docker run --rm -v ~/.aws:/root/.aws -v /{mount_to_bucket_1}/:/data/ peakcom/s5cmd --numworkers {2 * cpu_cores} cp /data/ s3://bucket_2

This always fails for me. However, other open source solutions are more reliable under that same stress.
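
For concreteness, on a 64-core machine the command above would instantiate to something like the following (bucket names and the mount path are placeholders; --numworkers is 2 × cores):

# 64 cores -> 128 workers; /mnt/bucket-1 is where bucket_1 is mounted
docker run --rm -v ~/.aws:/root/.aws -v /mnt/bucket-1/:/data/ peakcom/s5cmd --numworkers 128 cp /data/ s3://bucket-2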

passaro (Contributor) commented Dec 5, 2023

@tchaton, unfortunately, I was not able to reproduce the issue with the command you suggested. It may depend on specific factors like the content of your bucket or the load on your instance.

However, my (unconfirmed) suspicion is that you are seeing the result of an out of memory issue, similar to that reported in #502.
Would you be able to verify whether your syslog contains lines similar to these (once you reproduce the "Transport endpoint is not connected" error):

kernel: Out of memory: Killed process 2684 (mount-s3)
systemd[1]: session-1.scope: A process of this unit has been killed by the OOM killer. 
systemd[1]: session-1.scope: Killing process 3172 (docker) with signal SIGKILL.

tchaton (Author) commented Dec 13, 2023

Hey @passaro, I will try again. For the syslog, what do you mean exactly? How can I check it?

passaro (Contributor) commented Dec 13, 2023

You can probably use journalctl. For example, the lines I copied above were extracted from the output of this command:

journalctl -t systemd -t kernel

journalctl should be available on most modern Linux distributions, including Amazon Linux. On other systems, syslog entries are likely written to a file such as /var/log/syslog.
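
If journalctl is unavailable, the same check can be run directly against the syslog file; a sketch, assuming a Debian/Ubuntu-style /var/log/syslog path:

# Look for OOM-killer activity around the time of the failure
grep -iE 'out of memory|oom' /var/log/syslog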

nguyenminhdungpg commented
I also encountered this error when using s3fs and now mountpoint-s3.

I am applying a solution that I described in this comment: s3fs-fuse/s3fs-fuse#2356 (comment)

unexge (Contributor) commented Oct 15, 2024

Mountpoint v1.10.0 has been released with some prefetcher improvements that might reduce memory usage. Could you please try upgrading and let us know whether it helps?

jmccl commented Oct 27, 2024

I'm getting the same issue and just tried v1.10.0. It doesn't appear to have helped.

(I can reproduce it by copying a file with cp from a mounted S3 bucket to the local filesystem, where the file is about the same size as the free RAM on the system. Part way through, the copy fails and "Transport endpoint is not connected" is displayed as the error.)
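
A minimal sketch of that reproduction, with placeholder names (my-bucket, /mnt/bucket, big.bin) and assuming big.bin is roughly the size of the free RAM reported by free -h:

free -h                               # note the available RAM
mount-s3 my-bucket /mnt/bucket        # mount the bucket
cp /mnt/bucket/big.bin /tmp/big.bin   # copy fails part way through
ls /mnt/bucket                        # then reports: Transport endpoint is not connected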

The 'temporary workaround' in #1021 does address the issue.

vladem (Contributor) commented Oct 28, 2024

Hey, @jmccl! Thanks for taking the time to report the memory-usage issue you're facing on version 1.10.0; we're particularly interested in this. It seems you've found a workaround that suits your use case, but please note that this approach is not stable, so extra caution should be exercised when updating Mountpoint on workloads that rely on it.

If your problem isn't solved, or if you'd like to help us improve the memory limiting in Mountpoint, consider opening a new bug report. It would be helpful if you could describe the environment where you're facing the problem in more detail (a sketch of how to collect some of this information follows the list):

  1. Is Mountpoint running in a container or not?
     - If in a container, does it have a memory limit configured?
  2. Are there any signs of Mountpoint getting killed by the OOM killer?
     - The kernel message buffer may contain relevant information; it can be checked with dmesg -T | egrep -i 'Out of memory'.
  3. Does the "Transport endpoint is not connected" error occur while the file is being read, or on some other action, e.g. listing a directory or writing to a file?
  4. Is it possible for other workloads on your host to use more than 5% of the host's installed RAM?
  5. Relevant metrics would be useful (they may be obtained with the --debug --log-directory <dir> CLI flags):
     - process.memory_usage
     - prefetch.bytes_in_queue
     - prefetch.bytes_reserved
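
A sketch of how some of that information could be collected (mount path, bucket name, and log directory are placeholders; the cgroup v2 path is an assumption):

# (2) Check for OOM-killer activity in the kernel message buffer
dmesg -T | egrep -i 'Out of memory'

# (1) If running in a container, check the configured memory limit (cgroup v2)
cat /sys/fs/cgroup/memory.max

# (5) Remount with debug logging, then pull the metrics from the logs
mount-s3 my-bucket /mnt/bucket --debug --log-directory /tmp/mp-logs
grep -E 'process.memory_usage|prefetch.bytes_in_queue|prefetch.bytes_reserved' /tmp/mp-logs/*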

vladem (Contributor) commented Oct 28, 2024

Closing this issue since there has been no activity on the original problem from 2023. We suspect that the crash was occurring because Mountpoint was getting killed by the OOM killer.

Starting from version 1.10.0, Mountpoint targets using no more than 95% of the memory installed on the host, which may solve the problem in some cases.
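
For hosts shared with other workloads, the memory target can also be set explicitly; a sketch, assuming the --max-memory-target flag (value in MiB) documented for recent Mountpoint releases:

# Cap Mountpoint's memory target at 2 GiB instead of 95% of installed RAM
mount-s3 my-bucket /mnt/bucket --max-memory-target 2048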
