From d8224d5eda3365de736905e4085957d8c9b10696 Mon Sep 17 00:00:00 2001
From: Benedikt Kulmann
Date: Wed, 22 Jul 2020 13:29:20 +0200
Subject: [PATCH 1/3] Fix typos in storage doc

---
 docs/storages.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/storages.md b/docs/storages.md
index e7821b8..aecb60d 100644
--- a/docs/storages.md
+++ b/docs/storages.md
@@ -13,7 +13,7 @@ geekdocFilePath: storages.md
 
 ## Storage providers
 
-To manage the file tree ocis uses reva *storage providers* that are accessing the underlying storage using a *storage driver*. The driver can be used to change the implementation of a storage aspect to better reflect the actual underlying storage capabilities. As an example a move operation on a POSIX filesystem ([theoretically](https://danluu.com/deconstruct-files/)) is an atomic operation. When trying to implement a file tree on top S3 there is no native move operation that can be used. A naive implementation might fall bak on a COPY and DELETE. Some S3 implementations provide a COPY operation that uses an existing key as the source, so the file at least does not need to be reuploaded. In the worst case scenario, the rename of a folder with hundreds of thousands of objects, a reupload for every file has to be made. Instead of hiding this complexity a better choice might be to disable renaming of files or at least folders on S3. There are however implementations of filesystems on top of S3 that store the tree metadata in dedicated objects or use a completely different persistance mechanism like a distributed key value store to implement the file tree aspect of a storage.
+To manage the file tree ocis uses reva *storage providers* that are accessing the underlying storage using a *storage driver*. The driver can be used to change the implementation of a storage aspect to better reflect the actual underlying storage capabilities. As an example a move operation on a POSIX filesystem ([theoretically](https://danluu.com/deconstruct-files/)) is an atomic operation. When trying to implement a file tree on top of S3 there is no native move operation that can be used. A naive implementation might fall back on a COPY and DELETE. Some S3 implementations provide a COPY operation that uses an existing key as the source, so the file at least does not need to be reuploaded. In the worst case scenario, to rename of a folder with hundreds of thousands of objects, a reupload for every file has to be made. Instead of hiding this complexity a better choice might be to disable renaming of files or at least folders on S3. There are however implementations of filesystems on top of S3 that store the tree metadata in dedicated objects or use a completely different persistence mechanism like a distributed key value store to implement the file tree aspect of a storage.
 
 {{< hint info >}}
@@ -45,7 +45,7 @@ The tree can keep track of how many bytes are stored in a folder. Similar to ETa
 
 {{< hint info >}}
 **ETag and Size propagation**
-When propagating the ETag (mtime) and size changes up the tree the question is where to stop. If all changes need to be propagated to the root of a storage then the root or busy folders will become a hotspot. There are two things to keep in mind: 1. propagation only happens until the root of a single space (a user private drive or a single group drive), 2. no cross storage propagation. The latter was used in oc10 to let clients detect when a file in a received shared folder changed. This functionality is moving to the storage registry which caches the ETag for every root so clients can discover if and which storage changed.
+When propagating the ETag (mtime) and size changes up the tree the question is where to stop. If all changes need to be propagated to the root of a storage then the root or busy folders will become a hotspot. There are two things to keep in mind: 1. propagation only happens up to the root of a single space (a user private drive or a single group drive), 2. no cross storage propagation. The latter was used in oc10 to let clients detect when a file in a received shared folder changed. This functionality is moving to the storage registry which caches the ETag for every root so clients can discover if and which storage changed.
 {{< /hint >}}
 
 #### Rename
@@ -60,7 +60,7 @@ Technically, [S3 has no rename operation at all](https://docs.aws.amazon.com/sdk
 In addition to well known metadata like name size and mtime, users might be able to add arbitrary metadata like tags, comments or [dublin core](https://en.wikipedia.org/wiki/Dublin_Core). In POSIX filesystems this maps to extended attributes.
 
 ### Grant persistence
-The CS3 API uses grants to describe access permissions. Storage systems have a wide range of permissions granularity and not all grants may be supported by every storage driver. POSIX ACLs for example have no expiry. If the storage system does not support certain grant properties, eg. expiry, then the storage driver may choose to implement them in a different way. Expiries could be persisted in a different way and checked periodically to remove the grants. Again: every decision is a tradeoff.
+The CS3 API uses grants to describe access permissions. Storage systems have a wide range of permissions granularity and not all grants may be supported by every storage driver. POSIX ACLs for example have no expiry. If the storage system does not support certain grant properties, e.g. expiry, then the storage driver may choose to implement them in a different way. Expiries could be persisted in a different way and checked periodically to remove the grants. Again: every decision is a tradeoff.
 
 ### Trash persistence
 After deleting a node the storage allows listing the deleted nodes and has an undo mechanism for them.
@@ -70,7 +70,7 @@ A user can restore a previous version of a file.
 
 {{< hint info >}}
 **Snapshots are not versions**
-Modern POSIX filesystems support snapshotting of volumes. This is different from keeping track of versions to a file or folder, but might be another implementation strategy for a storage driver allow users to restore content.
+Modern POSIX filesystems support snapshotting of volumes. This is different from keeping track of versions to a file or folder, but might be another implementation strategy for a storage driver to allow users to restore content.
 {{< /hint >}}
 
 ### Activity History
@@ -90,7 +90,7 @@ The *minimal* storage driver for a POSIX based filesystem. It literally supports
 - no native ETag propagation, five options are available:
   - built in propagation (changes bypassing ocis are not picked up until a rescan)
   - built in inotify (requires 48 bytes of RAM per file, needs to keep track of every file and folder)
-  - external inotify (same RAM requirement, but could be triggered by external tools, eg. a workflow engine)
+  - external inotify (same RAM requirement, but could be triggered by external tools, e.g. a workflow engine)
   - kernel audit log (use the linux kernel audit to capture file events on the storage and offload them to a queue)
   - fuse filesystem overlay
 - no subtree accounting, same options as for ETag propagation
@@ -188,4 +188,4 @@ We are planning to further separate the concerns and use a local storage provide
 It would allow us to extend the local storage driver with missing storage aspects while keeping a tree like filesystem that end users are used to see when sshing into the machine.
 
 ### Upload to Quarantine area
-Antivirus scanning of random files uploaded from untrusted sources and executing metadata extraction or thumbnail generation should happen in a sandboxed system to prevent malicious users from gaining any information about the system. By spawning a new container with access to only the uploaded data we can further limit the attack surface.
\ No newline at end of file
+Antivirus scanning of random files uploaded from untrusted sources and executing metadata extraction or thumbnail generation should happen in a sandboxed system to prevent malicious users from gaining any information about the system. By spawning a new container with access to only the uploaded data we can further limit the attack surface.

From 71647fdf1e67394d50cc4a098f33838e3fea0a9a Mon Sep 17 00:00:00 2001
From: Benedikt Kulmann
Date: Wed, 22 Jul 2020 13:32:46 +0200
Subject: [PATCH 2/3] Add changelog

---
 changelog/unreleased/update-docs.md | 6 ++++++
 1 file changed, 6 insertions(+)
 create mode 100644 changelog/unreleased/update-docs.md

diff --git a/changelog/unreleased/update-docs.md b/changelog/unreleased/update-docs.md
new file mode 100644
index 0000000..9e01c40
--- /dev/null
+++ b/changelog/unreleased/update-docs.md
@@ -0,0 +1,6 @@
+Enhancement: Update storage documentation
+
+We added details to the documentation about storage requirements known from ownCloud 10, the local storage driver and the ownCloud storage driver.
+
+https://github.com/owncloud/ocis-reva/pull/384
+https://github.com/owncloud/ocis-reva/pull/390

From 79068fe85f82e02581141fc70750b518c2c1d6c7 Mon Sep 17 00:00:00 2001
From: Benedikt Kulmann
Date: Thu, 23 Jul 2020 08:27:00 +0200
Subject: [PATCH 3/3] Fix one more wording issue

---
 docs/storages.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/storages.md b/docs/storages.md
index aecb60d..bc465e4 100644
--- a/docs/storages.md
+++ b/docs/storages.md
@@ -13,7 +13,7 @@ geekdocFilePath: storages.md
 
 ## Storage providers
 
-To manage the file tree ocis uses reva *storage providers* that are accessing the underlying storage using a *storage driver*. The driver can be used to change the implementation of a storage aspect to better reflect the actual underlying storage capabilities. As an example a move operation on a POSIX filesystem ([theoretically](https://danluu.com/deconstruct-files/)) is an atomic operation. When trying to implement a file tree on top of S3 there is no native move operation that can be used. A naive implementation might fall back on a COPY and DELETE. Some S3 implementations provide a COPY operation that uses an existing key as the source, so the file at least does not need to be reuploaded. In the worst case scenario, to rename of a folder with hundreds of thousands of objects, a reupload for every file has to be made. Instead of hiding this complexity a better choice might be to disable renaming of files or at least folders on S3. There are however implementations of filesystems on top of S3 that store the tree metadata in dedicated objects or use a completely different persistence mechanism like a distributed key value store to implement the file tree aspect of a storage.
+To manage the file tree ocis uses reva *storage providers* that are accessing the underlying storage using a *storage driver*. The driver can be used to change the implementation of a storage aspect to better reflect the actual underlying storage capabilities. As an example a move operation on a POSIX filesystem ([theoretically](https://danluu.com/deconstruct-files/)) is an atomic operation. When trying to implement a file tree on top of S3 there is no native move operation that can be used. A naive implementation might fall back on a COPY and DELETE. Some S3 implementations provide a COPY operation that uses an existing key as the source, so the file at least does not need to be reuploaded. In the worst case scenario, which is renaming a folder with hundreds of thousands of objects, a reupload for every file has to be made. Instead of hiding this complexity a better choice might be to disable renaming of files or at least folders on S3. There are however implementations of filesystems on top of S3 that store the tree metadata in dedicated objects or use a completely different persistence mechanism like a distributed key value store to implement the file tree aspect of a storage.
 
 {{< hint info >}}
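The naive COPY and DELETE fallback described in the patched "Storage providers" paragraph can be sketched as follows. This is purely illustrative and not part of ocis or reva: `ObjectStore` is a hypothetical in-memory stand-in for an S3 bucket (a flat key-to-bytes map), and `naive_rename_folder` shows why the cost of a folder rename grows with the number of objects under the prefix.

```python
class ObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket: a flat key -> bytes map."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src_key, dst_key):
        # Models a server-side COPY with an existing key as the source:
        # the data does not need to be reuploaded by the client.
        self.objects[dst_key] = self.objects[src_key]

    def delete(self, key):
        del self.objects[key]

    def list_prefix(self, prefix):
        return [k for k in self.objects if k.startswith(prefix)]


def naive_rename_folder(store, src_prefix, dst_prefix):
    """'Rename' a folder on an object store that has no native move operation.

    One COPY + DELETE per object under the prefix, so the cost is linear in
    the number of objects -- with hundreds of thousands of objects this is
    why a storage driver may simply disable folder renames instead.
    """
    for key in store.list_prefix(src_prefix):
        store.copy(key, dst_prefix + key[len(src_prefix):])
        store.delete(key)
```

Without a server-side COPY, each iteration would additionally have to download and reupload the object body, which is the worst case the paragraph describes.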
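The propagation rule in the "ETag and Size propagation" hint touched by patch 1 — walk up from the changed node but stop at the root of the space, never crossing into another storage — can be sketched like this. The helper is hypothetical (not the reva implementation) and assumes a simple per-folder change counter keyed by path standing in for the real ETag.

```python
import posixpath


def propagate_change(path, space_root, etags):
    """Bump a change counter on every ancestor folder of `path`, stopping at
    the root of the space (a user private drive or a single group drive).

    Nothing above `space_root` is touched, so busy folders in other spaces or
    other storages never become a propagation hotspot.
    """
    folder = posixpath.dirname(path)
    while True:
        # Any change in a folder invalidates that folder's ETag.
        etags[folder] = etags.get(folder, 0) + 1
        if folder == space_root or folder == "/":
            break  # stop at the space root (the "/" check guards against bad input)
        folder = posixpath.dirname(folder)
    return etags
```

Clients watching a received share would instead ask the storage registry, which caches the ETag of every root, whether anything below that root changed.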