deleting a file during post-processing messes up the containing folder #7909
Comments
Maybe the main problem is the "context canceled" error that shows up in the logs. Assuming the context is attached to the ocdav request, I'm not sure what the consequence of canceling that context is. The request itself probably keeps going (unless there is a thread checking the "done" channel of the context, but I haven't seen any), but further requests from the ocdav service to a different service using the same context might be blocked because the context is canceled. This will likely cause problems because some operations among the services won't take place.

As for why the context is canceled, my guess is that the connection gets cut. Assuming the ocdav service makes some requests (using that context) in a different thread, the request itself might finish while the secondary thread making the asynchronous request is still running. The client would cut the connection after getting the server response, which could cancel the context, so the async request in the secondary thread might get into trouble due to the canceled context.

The other thing I've noticed is that the postprocessing service is single-threaded (at least the processing part). This means the service processes the events one by one, so if one event takes a long time to process for whatever reason, all the events after it could be severely delayed.
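To illustrate the guess about the canceled context, here is a minimal sketch, not the actual ocdav code. `callService` is a hypothetical stand-in for a request to another service, and `context.WithoutCancel` (Go 1.21+) is just one possible way to detach background work from the request; whether that is appropriate in reva is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// callService stands in for a request to another service (hypothetical).
// Like most context-aware clients, it gives up once the context is canceled.
func callService(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(10 * time.Millisecond):
		return nil
	}
}

// demo returns the error seen with the request context and with a detached one.
func demo() (withRequestCtx, withDetachedCtx error) {
	reqCtx, cancel := context.WithCancel(context.Background())

	// Detach before the request ends: values are kept, cancellation is not.
	// A separate timeout keeps the background work bounded.
	detached, stop := context.WithTimeout(context.WithoutCancel(reqCtx), time.Second)
	defer stop()

	cancel() // simulates the client closing the connection after the response

	return callService(reqCtx), callService(detached)
}

func main() {
	a, b := demo()
	fmt.Println("request ctx canceled:", errors.Is(a, context.Canceled))
	fmt.Println("detached ctx error:", b)
}
```

The call made with the request context fails as soon as the context is canceled, while the detached call still completes. That would match the theory that async work started during the request breaks once the client disconnects.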
From another perspective, the errors seem to indicate that the [...] The other, more plausible explanation is that the target node doesn't exist and we fall back to the space root.
I could observe a similar thing when trying to debug https://drone.owncloud.com/owncloud/ocis/30185/27/7. It seems I have a dangling symlink:
ls -l ~/.ocis/storage/users/spaces/so/me-admin-user-id-0000-000000000000/nodes/so/me/-a/dm/in-user-id-0000-000000000000*
-rw------- 1 jfd jfd 0 24. Nov 16:32 /home/jfd/.ocis/storage/users/spaces/so/me-admin-user-id-0000-000000000000/nodes/so/me/-a/dm/in-user-id-0000-000000000000.mlock
-rw------- 1 jfd jfd 376 18. Dez 15:59 /home/jfd/.ocis/storage/users/spaces/so/me-admin-user-id-0000-000000000000/nodes/so/me/-a/dm/in-user-id-0000-000000000000.mpk
/home/jfd/.ocis/storage/users/spaces/so/me-admin-user-id-0000-000000000000/nodes/so/me/-a/dm/in-user-id-0000-000000000000:
insgesamt 0
lrwxrwxrwx 1 jfd jfd 55 6. Dez 20:27 'Gregorz Rutkowski' -> ../../../../../66/46/ef/54/-420a-4481-ac07-42e3506d950e
lrwxrwxrwx 1 jfd jfd 55 18. Dez 15:58 'New file (1).txt' -> ../../../../../96/8e/bf/ad/-3a2e-4a47-b46a-ca7f2cd44dc9
lrwxrwxrwx 1 jfd jfd 55 18. Dez 15:48 'New file.txt' -> ../../../../../38/4d/8d/9f/-80e6-4e5c-b8d0-0eab4682dcac
produced by the decomposedfs node.go with

    attrs, err := n.Xattrs(ctx)
    switch {
    case metadata.IsNotExist(err):
        return n, nil // swallow not found, the node defaults to exists = false
    case err != nil:
        return nil, err
    }
    n.Exists = true
    n.Name = attrs.String(prefixes.NameAttr)
    n.ParentID = attrs.String(prefixes.ParentidAttr)
    if n.ParentID == "" {
        d, _ := os.ReadFile(lu.MetadataBackend().MetadataPath(n.InternalPath()))
        if _, ok := lu.MetadataBackend().(metadata.MessagePackBackend); ok {
            appctx.GetLogger(ctx).Error().Str("path", n.InternalPath()).Str("nodeid", n.ID).Interface("attrs", attrs).Bytes("messagepack", d).Msg("missing parent id")
        }
        return nil, errtypes.InternalError("Missing parent ID on node")
    }

It was trying to read the metadata for a non-existing node in a directory ... I'm still debugging other postprocessing stuff. Feels related. I will keep an eye on this.
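For anyone who wants to check a node directory for dangling symlinks without eyeballing `ls -l`, here is a small sketch using only the Go standard library. `danglingSymlinks` and `demo` are hypothetical helpers, not part of decomposedfs, and the file names in `demo` are only borrowed from the listing above for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// danglingSymlinks returns the names of symlinks in dir whose target is missing.
func danglingSymlinks(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var dangling []string
	for _, e := range entries {
		if e.Type()&os.ModeSymlink == 0 {
			continue // not a symlink
		}
		// os.Stat follows the link, so it fails when the target is gone.
		if _, err := os.Stat(filepath.Join(dir, e.Name())); os.IsNotExist(err) {
			dangling = append(dangling, e.Name())
		}
	}
	return dangling, nil
}

// demo builds a throwaway directory with one valid and one dangling link.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "nodes")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, "target-node"), nil, 0o600); err != nil {
		return nil, err
	}
	// Relative targets are resolved against the directory containing the link.
	if err := os.Symlink("target-node", filepath.Join(dir, "New file.txt")); err != nil {
		return nil, err
	}
	if err := os.Symlink("missing-node", filepath.Join(dir, "New file (1).txt")); err != nil {
		return nil, err
	}
	return danglingSymlinks(dir)
}

func main() {
	names, err := demo()
	fmt.Println(names, err)
}
```

Pointing `danglingSymlinks` at a real node directory under `~/.ocis/storage/users/spaces/...` should list the child entries whose target node no longer exists.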
https://github.com/cs3org/reva/blob/edge/pkg/storage/utils/decomposedfs/upload/processing.go#L327 seems suspicious. As far as I know, the "acquireLock" parameter should always be true; otherwise there could be race conditions when multiple threads / processes compete to write extended attributes to the same file.
I haven't found conclusive evidence on whether the extended attributes are set atomically or not. If the operations aren't atomic, race conditions might happen when we set the extended attributes from multiple sides, which could cause problems with what is written.
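To make the locking concern concrete, here is a sketch of a locked read-modify-write on a metadata file. It is not the reva implementation: the JSON encoding, the `.lock` file name, the attribute keys and the `setAttr` helper are all assumptions for illustration, and `syscall.Flock` is only available on Unix-like systems.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"syscall"
)

// setAttr updates one key in a JSON attribute file while holding an
// exclusive advisory lock on a sidecar lock file (hypothetical layout).
func setAttr(metaPath, key, value string) error {
	lock, err := os.OpenFile(metaPath+".lock", os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer lock.Close()
	// Blocks until no other open file description holds the lock.
	if err := syscall.Flock(int(lock.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(lock.Fd()), syscall.LOCK_UN)

	attrs := map[string]string{}
	data, err := os.ReadFile(metaPath)
	switch {
	case err == nil:
		if err := json.Unmarshal(data, &attrs); err != nil {
			return err
		}
	case !os.IsNotExist(err):
		return err
	}
	attrs[key] = value
	out, err := json.Marshal(attrs)
	if err != nil {
		return err
	}
	return os.WriteFile(metaPath, out, 0o600)
}

// demo lets 20 writers set distinct keys and returns how many keys survived.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "attrs")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	metaPath := filepath.Join(dir, "node.json")

	const writers = 20
	var wg sync.WaitGroup
	errs := make(chan error, writers)
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			errs <- setAttr(metaPath, fmt.Sprintf("user.example.key%d", i), "v")
		}(i)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return 0, err
		}
	}

	data, err := os.ReadFile(metaPath)
	if err != nil {
		return 0, err
	}
	attrs := map[string]string{}
	if err := json.Unmarshal(data, &attrs); err != nil {
		return 0, err
	}
	return len(attrs), nil
}

func main() {
	n, err := demo()
	fmt.Println("attributes written:", n, "error:", err)
}
```

With the lock held around the whole read-modify-write, every writer's key ends up in the file. Skipping the lock lets two writers read the same old state and overwrite each other, which is the kind of lost update a call with "acquireLock" set to false could allow.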
@individual-it Because the delay is so long in the test case The error
So is this an unrealistic test-case?
It sounds unrealistic to me. But we still need to fix the error that appears if we upload the file and delete it immediately. The postprocessing has to handle that properly.
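One way to read "handle it properly" is that a file which vanished during post-processing should be skipped rather than block the events queued behind it. The sketch below shows that idea with a plain sequential loop; `finalize` and `processAll` are hypothetical names, the file names are borrowed from this report for illustration, and the real postprocessing service works on events, not paths.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// finalize stands in for the last post-processing step of one upload (hypothetical).
func finalize(path string) error {
	_, err := os.Stat(path)
	return err
}

// processAll handles uploads one by one, as a single-threaded consumer would.
// A file that no longer exists is recorded as skipped instead of stopping the loop.
func processAll(paths []string) (done, skipped []string, err error) {
	for _, p := range paths {
		switch err := finalize(p); {
		case err == nil:
			done = append(done, p)
		case errors.Is(err, fs.ErrNotExist):
			skipped = append(skipped, p) // deleted during post-processing
		default:
			return done, skipped, err
		}
	}
	return done, skipped, nil
}

// demo uploads three files, deletes the first, and returns the counts of
// finished and skipped uploads.
func demo() (int, int, error) {
	dir, err := os.MkdirTemp("", "uploads")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	var paths []string
	for _, name := range []string{"ff1.txt", "ff2.txt", "ff3.txt"} {
		p := filepath.Join(dir, name)
		if err := os.WriteFile(p, []byte("x"), 0o600); err != nil {
			return 0, 0, err
		}
		paths = append(paths, p)
	}
	if err := os.Remove(paths[0]); err != nil {
		return 0, 0, err
	}

	done, skipped, err := processAll(paths)
	return len(done), len(skipped), err
}

func main() {
	d, s, err := demo()
	fmt.Println("done:", d, "skipped:", s, "error:", err)
}
```

The point of the design is that a not-found error is treated as a terminal state for that one upload, so the remaining uploads in the same folder still finish.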
Here is a draft PR. @butonic @kobergj
Describe the bug
When a file is deleted during post-processing, the whole containing folder can become unusable.
Steps to reproduce
Expected behavior
ff2.txt and ff3.txt should eventually finish post-processing
Actual behavior
ff2.txt and ff3.txt never finish post-processing
Setup
Please describe how you started the server and provide a list of relevant environment variables or configuration files.
POSTPROCESSING_DELAY=30s PROXY_ENABLE_BASIC_AUTH=true OCIS_INSECURE=true PROXY_HTTP_ADDR=0.0.0.0:9200 OCIS_URL=https://host.docker.internal:9200 PROXY_TRANSPORT_TLS_KEY=./ocis.pem PROXY_TRANSPORT_TLS_CERT=./ocis.crt ./ocis/bin/ocis server
Additional context
logs: