Teleport consuming all disk space with multipart files in /tmp directory #3182
@keitharogers Could you please share your Teleport config file (with any tokens etc. redacted)?
Sure @webvictim, here is my current running config:
In an effort to fix the issue, I added the S3 / DynamoDB config. My previous running config (which had exactly the same issue as described) was:
Ummm, maybe this is a bug that occurs before the data is sent to DynamoDB / S3.
@benarent: This behaviour (as mentioned) was occurring both before and after switching to DynamoDB / S3. I was trying to fix the bug by storing the data in AWS instead of locally, but it made no difference. Any ideas on what I can do to fix this?
I've not seen this problem before; Teleport never even writes any files to Any thoughts, @klizhentas?
Yep, it was definitely happening before switching to S3, @webvictim.
I'm actually seeing the exact same problem on the auth server:
Environment:
Upon deleting all of the I am not seeing this on the proxy server, and thankfully not on the target boxes.
And it's definitely Teleport doing it. Using a watch command on
Something interesting I found, at least in my setup: the auth server was being sent screen recordings multiple times, even within the same minute. For massive recordings (hours long) this is extremely wasteful of disk space, CPU cycles and network bandwidth. I have added S3 as a storage service and am waiting for the auth server to redirect everything there, in the hope that this reduces the traffic to the auth server. On top of that, after a fair amount of investigation I believe the files will clean themselves up once the upload process completes, but since hours of screen recordings can run to GBs, this can take some time to resolve itself. Continuing to dig, I found that the proxy server was throwing an error while trying to upload the file:
For now I'm working on the assumption that the proxy server's upload was actually successful even though it was "timing out", and that the proxy then kept blasting the auth server with more upload requests, effectively a self-inflicted DoS on the auth server. The assumption was correct: I tripled most of the default timeouts and was able to get the file across the network before the timeout occurred. The proxy reported it as sent and deleted the file, and the auth server is no longer being filled with the same files 👍. It might be worth adding a limit on the number of attempts to upload a file to the auth server to prevent this from happening.
This is helpful. Adding backoff and a better warning message is something we can add in the context of this issue. |
Hey there, so is there a solution to this problem? Is it safe to delete these files?
I would also like to know if there is a solution to this; I'm still sat with no available space after 2 months and this thread seems to have died.
I notice that we fixed a similar issue back with the release of Teleport 2.7.5, as tracked in #2250, so I wonder if this could be something similar again. The problem here is that several of us internally have tried to reproduce this issue without success. If anyone on the thread can get into a situation where this issue is guaranteed to occur and can provide detailed repro steps so that we can make it happen too, we'll be able to get to it more quickly.
I just ended up adding a cronjob to auto-delete any multipart file older than six hours. So far I haven't experienced any side effects but I'm not sure yet if this might cause any logs to be dropped.
So I just looked at my Dynamo metrics and noticed that I was going over my read capacity, and a bit over my write capacity as well. I'm not sure whether this contributes to the multipart file situation, but I have increased the capacities in the meantime. Maybe you also ran into this issue.
This issue, as I originally raised it, was present before I was even using DynamoDB, so it's not related to Dynamo, or at least not only to it. It is very annoying, though. And FWIW, if you delete those files, they simply come back later.
With all that said, I have deleted the files again and restarted Teleport, and the problem doesn't seem to be recurring. I can only imagine that it was trying to do something based on old information that has since been purged from the SQLite DB or something. I honestly don't know...
I can confirm that the issue still persists in Teleport 4.2.2; I've lost 90GB worth of space because of this. Eventually Teleport can just crash your machine, and if Teleport is the only way into your infrastructure, this becomes frustrating (considering that we have Teleport Enterprise and it costs a shitload of money).
We are refactoring session upload right now; the change will be released in 4.3.
This was too big a change to get into 4.3, so it will be coming out with 4.4. |
What happened:
Teleport keeps populating the /tmp directory with large 'multipart-' files. If I delete these, Teleport restarts on a loop every 15-20 seconds and recreates the files one-by-one. Eventually it stays running after having recreated all these files.
What you expected to happen:
I don't expect these files to consume all available space, and I expect them to stay deleted once removed.
How to reproduce it (as minimally and precisely as possible):
Environment:
teleport version: Teleport v4.1.4 git:v4.1.4-0-gc487a75c go1.13.2
tsh version: Teleport v4.1.4 git:v4.1.4-0-gc487a75c go1.13.2
Browser environment:
Relevant Debug Logs If Applicable
N/A