From 4a59c8c0961790a8b34637d1434f269cca9e9193 Mon Sep 17 00:00:00 2001
From: Michael Barz
Date: Fri, 4 Feb 2022 15:55:39 +0100
Subject: [PATCH 1/4] add first draft of metadata storage options

---
 docs/ocis/adr/0016-files-metadata.md | 72 ++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 docs/ocis/adr/0016-files-metadata.md

diff --git a/docs/ocis/adr/0016-files-metadata.md b/docs/ocis/adr/0016-files-metadata.md
new file mode 100644
index 00000000000..a5085840c49
--- /dev/null
+++ b/docs/ocis/adr/0016-files-metadata.md
@@ -0,0 +1,72 @@
+---
+title: "16. Storage for Files Metadata"
+---
+
+* Status: proposed
+* Deciders: @butonic, @dragotin, @micbar, @C0rby
+* Date: 2022-02-04
+
+## Context and Problem Statement
+
+In addition to the file content we need to store metadata which is attached to a file. Metadata describes additional properties of a file. These properties need to be stored as close as possible to the file content to avoid inconsistencies. Metadata are key to workflows and search. We consider them as an additional value which enhances the file content.
+
+## Decision Drivers
+
+* Metadata will become more important in the future
+* Metadata are key to automated data processing
+* Metadata storage should be as close as possible to the file content
+* Metadata should be always in sync with the file content
+
+## Considered Options
+
+* Database
+* Extended file attributes
+* Metadata file next to the file content
+
+## Decision Outcome
+
+Chosen option: "Extended File Attributes", because we guarantee the consistency of data and have arbitrary simple storage mechanism.
+
+### Positive Consequences
+
+* Metadata is always attached to the file itself
+* We can store arbitrary key/values
+* No external dependencies are needed
+
+### Negative consequences
+
+* The storage inside extended file attributes has limits
+* Changes to extended attributes are not atomic
+
+## Pros and Cons of the Options
+
+### Database or Key-Value Store
+
+Use a Database or an external key/value store to persist metadata.
+
+* Good, because it scales well
+* Good, because databases provide efficient lookup mechanisms
+* Bad, because the file content and the metadata could run out of sync
+
+### Extended File Attributes
+
+Extended File Attributes allow to store arbitrary properties. There are 4 namespaces `user`, `system`, `trusted` and `security`. We can safely use the `user` namespace. An example attribute name would be `user.ocis.owner.id`. The linux kernel has length limits on attribute names and values.
+
+From Wikipedia on [Extended file attributes](https://en.wikipedia.org/wiki/Extended_file_attributes#Linux):
+
+> The Linux kernel allows extended attribute to have names of up to 255 bytes and values of up to 64 KiB,[14] as do XFS and ReiserFS, but ext2/3/4 and btrfs impose much smaller limits, requiring all the attributes (names and values) of one file to fit in one “filesystem block” (usually 4 KiB). Per POSIX.1e,[citation needed] the names are required to start with one of security, system, trusted, and user plus a period. This defines the four namespaces of extended attributes.
+
+* Good, because metadata is stored in the filesystem
+* Good, because consistency is easy to maintain
+* Good, because the data is attached to the file and survives file operations like copy and move
+* Bad, because we could hit the filesystem limit
+* Bad, because changes to extended attributes are not atomic
+
+### Metadata File
+
+We could store metadata in a metadata file next to the file content which has a structured content format like .json, .yaml or .toml. That would give us more space to store bigger amounts of metadata.
+
+* Good, because there are no size limits
+* Good, because there is more freedom to the content format
+* Bad, because it doubles the amount of read / write operations
+* Bad, because it needs additional measures against concurrent overwriting changes

From a6299697e33ece18b0e43a71b6aaba23ed485bf8 Mon Sep 17 00:00:00 2001
From: Michael Barz
Date: Mon, 7 Feb 2022 09:10:30 +0100
Subject: [PATCH 2/4] Add 4th option
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Jörn Friedrich Dreyer
---
 docs/ocis/adr/0016-files-metadata.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/ocis/adr/0016-files-metadata.md b/docs/ocis/adr/0016-files-metadata.md
index a5085840c49..56dc2eec1a6 100644
--- a/docs/ocis/adr/0016-files-metadata.md
+++ b/docs/ocis/adr/0016-files-metadata.md
@@ -36,7 +36,7 @@ Chosen option: "Extended File Attributes", because we guarantee the consistency
 ### Negative consequences
 
 * The storage inside extended file attributes has limits
-* Changes to extended attributes are not atomic
+* Changes to extended attributes are not atomic and need file locks
 
 ## Pros and Cons of the Options
 
@@ -70,3 +70,11 @@ We could store metadata in a metadata file next to the file content which has a
 * Good, because there is more freedom to the content format
 * Bad, because it doubles the amount of read / write operations
 * Bad, because it needs additional measures against concurrent overwriting changes
+
+### Link metadata with an id in the extended attributes
+
+To link metadata to file content a single extended attribute with a file id (unique per storage space) is sufficient. This would also allow putting metadata in better suited storage systems like SQLite or a key value store.
+
+* Good, because it avoids extended attribute limits
+* Good, because the same mechanism could be used to look up files by id, when the underlying filesystem is an existing POSIX filesystem.
+* Bad, because backup needs to cover the metadata as well. Could be mitigated by sharing metadata per space and doing space wide snapshots.

From 8fad5f173b580b438370eb97fc74846202a7e87f Mon Sep 17 00:00:00 2001
From: Michael Barz
Date: Mon, 7 Feb 2022 09:15:56 +0100
Subject: [PATCH 3/4] add backup bullet points

---
 docs/ocis/adr/0016-files-metadata.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/ocis/adr/0016-files-metadata.md b/docs/ocis/adr/0016-files-metadata.md
index 56dc2eec1a6..87c28c15cc2 100644
--- a/docs/ocis/adr/0016-files-metadata.md
+++ b/docs/ocis/adr/0016-files-metadata.md
@@ -10,7 +10,7 @@ title: "16. Storage for Files Metadata"
 
 In addition to the file content we need to store metadata which is attached to a file. Metadata describes additional properties of a file. These properties need to be stored as close as possible to the file content to avoid inconsistencies. Metadata are key to workflows and search. We consider them as an additional value which enhances the file content.
 
-## Decision Drivers
+## Decision Drivers
 
 * Metadata will become more important in the future
 * Metadata are key to automated data processing
@@ -22,6 +22,7 @@ In addition to the file content we need to store metadata which is attached to a
 * Database
 * Extended file attributes
 * Metadata file next to the file content
+* Linked metadata in separate file
 
 ## Decision Outcome
 
@@ -47,6 +48,7 @@ Use a Database or an external key/value store to persist metadata.
 * Good, because it scales well
 * Good, because databases provide efficient lookup mechanisms
 * Bad, because the file content and the metadata could run out of sync
+* Bad, because a storage backup doesn't cover the file metadata
 
 ### Extended File Attributes
 
@@ -59,6 +61,7 @@ From Wikipedia on [Extended file attributes](https://en.wikipedia.org/wiki/Exten
 * Good, because metadata is stored in the filesystem
 * Good, because consistency is easy to maintain
 * Good, because the data is attached to the file and survives file operations like copy and move
+* Good, because a storage backup also covers the file metadata
 * Bad, because we could hit the filesystem limit
 * Bad, because changes to extended attributes are not atomic
 
@@ -68,6 +71,7 @@ We could store metadata in a metadata file next to the file content which has a
 
 * Good, because there are no size limits
 * Good, because there is more freedom to the content format
+* Good, because a storage backup also covers the file metadata
 * Bad, because it doubles the amount of read / write operations
 * Bad, because it needs additional measures against concurrent overwriting changes
 

From 395ed935d0840eb791f99d62fea335e44f039b79 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=B6rn=20Friedrich=20Dreyer?=
Date: Mon, 7 Feb 2022 10:48:48 +0100
Subject: [PATCH 4/4] Update docs/ocis/adr/0016-files-metadata.md

Co-authored-by: Klaas Freitag
---
 docs/ocis/adr/0016-files-metadata.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/ocis/adr/0016-files-metadata.md b/docs/ocis/adr/0016-files-metadata.md
index 87c28c15cc2..b9a3089aa48 100644
--- a/docs/ocis/adr/0016-files-metadata.md
+++ b/docs/ocis/adr/0016-files-metadata.md
@@ -82,3 +82,4 @@ To link metadata to file content a single extended attribute with a file id (uni
 * Good, because it avoids extended attribute limits
 * Good, because the same mechanism could be used to look up files by id, when the underlying filesystem is an existing POSIX filesystem.
 * Bad, because backup needs to cover the metadata as well. Could be mitigated by sharing metadata per space and doing space wide snapshots.
+* Bad, because it is a bit more effort to access it to read or index it.
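
Note on the "Extended File Attributes" option in the ADR above: the limits it quotes can be turned into a simple pre-flight check. The following Go sketch is only an illustration under stated assumptions, not part of the patch series and not the oCIS implementation. It checks one attribute against the four namespace prefixes, the 255-byte name limit and the 64 KiB value limit, and gives an optimistic estimate of whether a set of attributes fits into one 4 KiB filesystem block. The helper names `ValidateXattr` and `FitsInBlock`, the constants, and the sample value are hypothetical. The block estimate ignores any per-entry overhead of the filesystem.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Limits as stated in the Wikipedia passage quoted in the ADR.
const (
	maxNameLen  = 255       // bytes, kernel limit for an attribute name
	maxValueLen = 64 * 1024 // bytes, kernel limit for an attribute value
	blockSize   = 4 * 1024  // bytes, usual block size on ext2/3/4 and btrfs
)

// The four extended attribute namespaces, each followed by a period.
var namespaces = []string{"security.", "system.", "trusted.", "user."}

// ValidateXattr is a hypothetical helper. It reports whether a single
// attribute respects the namespace rule and the kernel size limits.
func ValidateXattr(name string, value []byte) error {
	hasNamespace := false
	for _, ns := range namespaces {
		// Require at least one character after the namespace prefix.
		if strings.HasPrefix(name, ns) && len(name) > len(ns) {
			hasNamespace = true
			break
		}
	}
	if !hasNamespace {
		return errors.New("name must start with security., system., trusted. or user.")
	}
	if len(name) > maxNameLen {
		return fmt.Errorf("name is %d bytes, limit is %d", len(name), maxNameLen)
	}
	if len(value) > maxValueLen {
		return fmt.Errorf("value is %d bytes, limit is %d", len(value), maxValueLen)
	}
	return nil
}

// FitsInBlock is a hypothetical helper. It sums the name and value sizes
// of all attributes and compares the total with one filesystem block.
// Real filesystems add per-entry overhead, so this is an optimistic bound.
func FitsInBlock(attrs map[string][]byte) bool {
	total := 0
	for name, value := range attrs {
		total += len(name) + len(value)
	}
	return total <= blockSize
}

func main() {
	// The attribute name comes from the ADR, the value is made up.
	attrs := map[string][]byte{
		"user.ocis.owner.id": []byte("hypothetical-owner-id"),
	}
	for name, value := range attrs {
		if err := ValidateXattr(name, value); err != nil {
			fmt.Println("invalid attribute:", name, err)
			return
		}
	}
	fmt.Println("fits in one block:", FitsInBlock(attrs))
}
```

A check like this only helps to decide early whether metadata should stay in extended attributes or move to one of the other options. It does not address the non-atomic updates that the ADR lists as a drawback.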