This repository has been archived by the owner on Jan 18, 2021. It is now read-only.

Fix doc typos #390

Merged
merged 3 commits on Jul 24, 2020

6 changes: 6 additions & 0 deletions changelog/unreleased/update-docs.md
@@ -0,0 +1,6 @@
Enhancement: Update storage documentation

We added details to the documentation about storage requirements known from ownCloud 10, the local storage driver and the ownCloud storage driver.

https://github.com/owncloud/ocis-reva/pull/384
https://github.com/owncloud/ocis-reva/pull/390
12 changes: 6 additions & 6 deletions docs/storages.md
@@ -13,7 +13,7 @@ geekdocFilePath: storages.md

## Storage providers

To manage the file tree ocis uses reva *storage providers* that are accessing the underlying storage using a *storage driver*. The driver can be used to change the implementation of a storage aspect to better reflect the actual underlying storage capabilities. As an example a move operation on a POSIX filesystem ([theoretically](https://danluu.com/deconstruct-files/)) is an atomic operation. When trying to implement a file tree on top S3 there is no native move operation that can be used. A naive implementation might fall bak on a COPY and DELETE. Some S3 implementations provide a COPY operation that uses an existing key as the source, so the file at least does not need to be reuploaded. In the worst case scenario, the rename of a folder with hundreds of thousands of objects, a reupload for every file has to be made. Instead of hiding this complexity a better choice might be to disable renaming of files or at least folders on S3. There are however implementations of filesystems on top of S3 that store the tree metadata in dedicated objects or use a completely different persistance mechanism like a distributed key value store to implement the file tree aspect of a storage.
To manage the file tree ocis uses reva *storage providers* that are accessing the underlying storage using a *storage driver*. The driver can be used to change the implementation of a storage aspect to better reflect the actual underlying storage capabilities. As an example a move operation on a POSIX filesystem ([theoretically](https://danluu.com/deconstruct-files/)) is an atomic operation. When trying to implement a file tree on top of S3 there is no native move operation that can be used. A naive implementation might fall back on a COPY and DELETE. Some S3 implementations provide a COPY operation that uses an existing key as the source, so the file at least does not need to be reuploaded. In the worst case scenario, which is renaming a folder with hundreds of thousands of objects, a reupload for every file has to be made. Instead of hiding this complexity a better choice might be to disable renaming of files or at least folders on S3. There are however implementations of filesystems on top of S3 that store the tree metadata in dedicated objects or use a completely different persistence mechanism like a distributed key value store to implement the file tree aspect of a storage.
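As an illustration of the copy-and-delete fallback described above, here is a minimal Go sketch that emulates renaming a "folder" on S3 using the MinIO client. The endpoint, credentials, bucket and prefix names are made up; every object below the prefix has to be copied and deleted individually, which is exactly the worst case mentioned in the paragraph.

```go
// Naive "rename" of a folder on S3: server-side COPY followed by DELETE for
// every object below the old prefix. Purely illustrative.
package main

import (
	"context"
	"log"
	"strings"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func renamePrefix(ctx context.Context, c *minio.Client, bucket, oldPrefix, newPrefix string) error {
	for obj := range c.ListObjects(ctx, bucket, minio.ListObjectsOptions{Prefix: oldPrefix, Recursive: true}) {
		if obj.Err != nil {
			return obj.Err
		}
		dstKey := newPrefix + strings.TrimPrefix(obj.Key, oldPrefix)
		// server-side copy, so the content is at least not reuploaded ...
		if _, err := c.CopyObject(ctx,
			minio.CopyDestOptions{Bucket: bucket, Object: dstKey},
			minio.CopySrcOptions{Bucket: bucket, Object: obj.Key},
		); err != nil {
			return err
		}
		// ... but the old key still has to be deleted, object by object
		if err := c.RemoveObject(ctx, bucket, obj.Key, minio.RemoveObjectOptions{}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c, err := minio.New("s3.example.com", &minio.Options{ // illustrative endpoint
		Creds: credentials.NewStaticV4("ACCESS", "SECRET", ""),
	})
	if err != nil {
		log.Fatal(err)
	}
	if err := renamePrefix(context.Background(), c, "data", "old-folder/", "new-folder/"); err != nil {
		log.Fatal(err)
	}
}
```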


{{< hint info >}}
@@ -45,7 +45,7 @@ The tree can keep track of how many bytes are stored in a folder. Similar to ETa

{{< hint info >}}
**ETag and Size propagation**
When propagating the ETag (mtime) and size changes up the tree the question is where to stop. If all changes need to be propagated to the root of a storage then the root or busy folders will become a hotspot. There are two things to keep in mind: 1. propagation only happens until the root of a single space (a user private drive or a single group drive), 2. no cross storage propagation. The latter was used in oc10 to let clients detect when a file in a received shared folder changed. This functionality is moving to the storage registry which caches the ETag for every root so clients can discover if and which storage changed.
When propagating the ETag (mtime) and size changes up the tree the question is where to stop. If all changes need to be propagated to the root of a storage then the root or busy folders will become a hotspot. There are two things to keep in mind: 1. propagation only happens up to the root of a single space (a user private drive or a single group drive), 2. no cross storage propagation. The latter was used in oc10 to let clients detect when a file in a received shared folder changed. This functionality is moving to the storage registry which caches the ETag for every root so clients can discover if and which storage changed.
{{< /hint >}}
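
A minimal sketch of such bounded propagation, assuming an mtime-based ETag on a plain POSIX tree; the paths and the space root argument are illustrative only:

```go
// propagateMtime touches every ancestor of a changed node up to, but not
// beyond, the root of its space, so the root ETag changes without any
// cross-storage propagation.
package main

import (
	"log"
	"os"
	"path/filepath"
	"time"
)

func propagateMtime(node, spaceRoot string, now time.Time) error {
	for p := node; ; p = filepath.Dir(p) {
		if err := os.Chtimes(p, now, now); err != nil {
			return err
		}
		// stop at the space root (or the filesystem root as a safety net)
		if p == spaceRoot || p == filepath.Dir(p) {
			return nil
		}
	}
}

func main() {
	// illustrative paths
	err := propagateMtime("/data/spaces/alice/photos/2020/img.jpg", "/data/spaces/alice", time.Now())
	if err != nil {
		log.Fatal(err)
	}
}
```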

#### Rename
@@ -60,7 +60,7 @@ Technically, [S3 has no rename operation at all](https://docs.aws.amazon.com/sdk
In addition to well known metadata like name size and mtime, users might be able to add arbitrary metadata like tags, comments or [dublin core](https://en.wikipedia.org/wiki/Dublin_Core). In POSIX filesystems this maps to extended attributes.
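
On such a filesystem a tag could, for instance, be persisted as a user-namespace extended attribute. A short sketch using golang.org/x/sys/unix; the attribute name and file path are made up:

```go
// Store and read back arbitrary metadata as an extended attribute.
package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	path := "/var/tmp/example.txt" // illustrative path

	// write a tag list into a user-namespace xattr (hypothetical attribute name)
	if err := unix.Setxattr(path, "user.ocis.tags", []byte("invoice,2020"), 0); err != nil {
		log.Fatal(err)
	}

	// read it back
	buf := make([]byte, 256)
	n, err := unix.Getxattr(path, "user.ocis.tags", buf)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("tags: %s\n", buf[:n])
}
```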

### Grant persistence
The CS3 API uses grants to describe access permissions. Storage systems have a wide range of permissions granularity and not all grants may be supported by every storage driver. POSIX ACLs for example have no expiry. If the storage system does not support certain grant properties, eg. expiry, then the storage driver may choose to implement them in a different way. Expiries could be persisted in a different way and checked periodically to remove the grants. Again: every decision is a tradeoff.
The CS3 API uses grants to describe access permissions. Storage systems have a wide range of permissions granularity and not all grants may be supported by every storage driver. POSIX ACLs for example have no expiry. If the storage system does not support certain grant properties, e.g. expiry, then the storage driver may choose to implement them in a different way. Expiries could be persisted in a different way and checked periodically to remove the grants. Again: every decision is a tradeoff.
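
One way such a workaround could look: persist the expiry alongside the grant and remove expired grants in a periodic sweep. The Grant type and the removeGrant callback below are hypothetical:

```go
// Grant expiry emulated outside the storage system, e.g. for POSIX ACLs
// which have no native expiry.
package main

import (
	"log"
	"time"
)

// Grant is a simplified, hypothetical grant record kept by the driver.
type Grant struct {
	Grantee string
	Expiry  time.Time // zero value means "never expires"
}

// sweepExpired removes every grant whose expiry has passed, calling the
// driver-specific removeGrant (e.g. "drop the ACL entry") for each one.
func sweepExpired(grants map[string]Grant, now time.Time, removeGrant func(Grant) error) error {
	for id, g := range grants {
		if !g.Expiry.IsZero() && now.After(g.Expiry) {
			if err := removeGrant(g); err != nil {
				return err
			}
			delete(grants, id)
		}
	}
	return nil
}

func main() {
	grants := map[string]Grant{
		"alice": {Grantee: "alice", Expiry: time.Now().Add(-time.Hour)}, // already expired
		"bob":   {Grantee: "bob"},                                       // no expiry
	}
	err := sweepExpired(grants, time.Now(), func(g Grant) error {
		log.Printf("removing expired grant for %s", g.Grantee)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```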

### Trash persistence
After deleting a node the storage allows listing the deleted nodes and has an undo mechanism for them.
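
A very small sketch of such a mechanism on a POSIX tree, assuming a per-space `.trash` folder; the layout is made up and real drivers keep more metadata:

```go
// "Deleting" a node moves it into a trash folder together with a timestamp,
// so it can be listed and restored later.
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"time"
)

// trashNode moves a node to <spaceRoot>/.trash/<unix-ts>-<name> instead of
// removing it and returns the trash location needed for a later restore.
func trashNode(spaceRoot, node string) (string, error) {
	trashDir := filepath.Join(spaceRoot, ".trash")
	if err := os.MkdirAll(trashDir, 0700); err != nil {
		return "", err
	}
	target := filepath.Join(trashDir, fmt.Sprintf("%d-%s", time.Now().Unix(), filepath.Base(node)))
	return target, os.Rename(node, target)
}

func main() {
	// illustrative paths
	target, err := trashNode("/data/spaces/alice", "/data/spaces/alice/old-report.txt")
	if err != nil {
		log.Fatal(err)
	}
	log.Println("moved to trash at", target) // restore would os.Rename it back
}
```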
@@ -70,7 +70,7 @@ A user can restore a previous version of a file.

{{< hint info >}}
**Snapshots are not versions**
Modern POSIX filesystems support snapshotting of volumes. This is different from keeping track of versions to a file or folder, but might be another implementation strategy for a storage driver allow users to restore content.
Modern POSIX filesystems support snapshotting of volumes. This is different from keeping track of versions to a file or folder, but might be another implementation strategy for a storage driver to allow users to restore content.
{{< /hint >}}

### Activity History
@@ -90,7 +90,7 @@ The *minimal* storage driver for a POSIX based filesystem. It literally supports
- no native ETag propagation, five options are available:
- built in propagation (changes bypassing ocis are not picked up until a rescan)
- built in inotify (requires 48 bytes of RAM per file, needs to keep track of every file and folder)
- external inotify (same RAM requirement, but could be triggered by external tools, eg. a workflow engine)
- external inotify (same RAM requirement, but could be triggered by external tools, e.g. a workflow engine)
- kernel audit log (use the linux kernel audit to capture file events on the storage and offload them to a queue)
- fuse filesystem overlay
- no subtree accounting, same options as for ETag propagation
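
As one possible shape of the inotify-based options above, a watcher such as fsnotify (inotify on Linux) could trigger the propagation whenever a file changes. The watched path is made up and propagateETag is only a placeholder:

```go
// Watch a directory and re-propagate the ETag on every change.
package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func propagateETag(path string) {
	// placeholder: would update mtime/ETag of all ancestors up to the space root
	log.Printf("would propagate ETag for %s", path)
}

func main() {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	// inotify watches are per directory; every folder in the tree needs its
	// own watch, which is where the RAM cost per file/folder comes from
	if err := w.Add("/var/lib/ocis/data"); err != nil { // illustrative path
		log.Fatal(err)
	}

	for {
		select {
		case ev := <-w.Events:
			if ev.Op&(fsnotify.Write|fsnotify.Create|fsnotify.Remove|fsnotify.Rename) != 0 {
				propagateETag(ev.Name)
			}
		case err := <-w.Errors:
			log.Println("watch error:", err)
		}
	}
}
```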
@@ -188,4 +188,4 @@ We are planning to further separate the concerns and use a local storage provide
It would allow us to extend the local storage driver with missing storage aspects while keeping a tree like filesystem that end users are used to see when sshing into the machine.

### Upload to Quarantine area
Antivirus scanning of random files uploaded from untrusted sources and executing metadata extraction or thumbnail generation should happen in a sandboxed system to prevent malicious users from gaining any information about the system. By spawning a new container with access to only the uploaded data we can further limit the attack surface.
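
A hedged sketch of spawning such a throwaway sandbox for a single upload, here via the docker CLI; the image name, mount point and flags are illustrative, and any mechanism with an equally narrow view of the data would do:

```go
// Scan an upload in a container that can only see the uploaded file.
package main

import (
	"log"
	"os/exec"
)

func scanInSandbox(uploadPath string) error {
	cmd := exec.Command("docker", "run", "--rm",
		"--network", "none", // no network access from inside the sandbox
		"-v", uploadPath+":/scan/upload:ro", // only the uploaded file, read-only
		"clamav/clamav", "clamscan", "/scan/upload")
	out, err := cmd.CombinedOutput()
	log.Printf("scanner output:\n%s", out)
	return err // a non-zero exit status also signals a flagged file
}

func main() {
	if err := scanInSandbox("/var/tmp/upload-123.bin"); err != nil { // illustrative path
		log.Println("scan failed or file flagged:", err)
	}
}
```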