draft.md
Outdated
- CBOR Map
- Key: CBOR Byte or Text String: File Name
- Value: CBOR Array of:
I am really concerned about attributes being represented as an array:
- it hampers extensibility - we are certainly forgetting something today, which means almost certainly yet another optional map in the future, and then which one is which?
- It makes parsing difficult when represented as JSON
Why not just a map? With all keys being pre-agreed upon multicodec-style: i.e. it must exist in one of the centralized spec tables in order to be recognized by anyone
We can still declare some of the keys as mandatory, and it is at the discretion of gateways/nodes/etc to decide what to do with "obviously malformed" blocks. We already have this with protobuf/unixfs: if one uploads a link-block with only "type 2" fields, and no "data" - everything rejects it.
CBOR defines sorting logic for canonicalizing maps, but not for arrays, and a canonical representation for unixfs directory should be a must.
> Why not just a map?
I covered the reasons below in the notes section.
@ehmry I don't see how your comment applies.
I am not tied to this idea. The amount of space saving is something that can be calculated once we determine what the keys will be if we use a map.
@kevina I assumed you meant an array of [key, value] tuples. CBOR is supposed to be schema-less and ordered values seem like a schema.
Even if we go with a map for directory entries, certain keys will be required in order for the directory entry to be well defined, so that is also a schema in a way.
Well, assigning integer keys to the spec attributes would order them in the same way and just be one byte of overhead for each attribute in a map.
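To put rough numbers on the key-overhead argument, here is a minimal sketch that hand-encodes only the CBOR headers needed for unsigned integers and text strings (shortest form, per RFC 7049) and compares the cost of text keys with integer keys. The key names and the integer assignment in `INT_KEYS` are hypothetical; the thread had not settled on either.

```python
import struct


def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument in the shortest form."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 2**8:
        return bytes([(major << 5) | 24, value])
    if value < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", value)


def cbor_uint(n: int) -> bytes:
    """Major type 0: unsigned integer."""
    return cbor_head(0, n)


def cbor_text(s: str) -> bytes:
    """Major type 3: UTF-8 text string (length header followed by the bytes)."""
    raw = s.encode("utf-8")
    return cbor_head(3, len(raw)) + raw


# Hypothetical key assignment, for illustration only.
INT_KEYS = {"type": 1, "data": 2, "size": 3}

text_cost = sum(len(cbor_text(name)) for name in INT_KEYS)
int_cost = sum(len(cbor_uint(number)) for number in INT_KEYS.values())
```

Under these assumptions each integer key below 24 costs one byte, while each four-letter text key costs five (one header byte plus four characters).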
draft.md
Outdated
* The key type can either be a byte or text string as POSIX makes no
requirements that file names be utf-8 and it is important that any
file name can be faithfully represented, if the string is utf-8
then the type will be Text.
This makes an implicit decision regarding the question I posed at the end of ipfs/kubo#4292 (comment): we shift the onus of "check that the name is safe to use/display" to the consumers. Are we ready to do that?
I think it is important to represent any valid file in a POSIX system. I am against restricting the range in the spec. Yes, I think it is the consumer's job to make sure that the filename is safe to display.
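The rule in the draft excerpt (Text when the name is valid UTF-8, Byte String otherwise) could be sketched as below. `name_to_key` and `is_posix_legal` are hypothetical helper names; the POSIX check only rejects empty names, NUL and `/`, and deliberately leaves display safety to the consumer, as discussed above.

```python
from typing import Union


def name_to_key(raw_name: bytes) -> Union[str, bytes]:
    """Return a text key if the name is valid UTF-8, else the raw bytes."""
    try:
        return raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return raw_name


def is_posix_legal(raw_name: bytes) -> bool:
    """A POSIX file name is any non-empty byte sequence without NUL or '/'."""
    return len(raw_name) > 0 and b"\x00" not in raw_name and b"/" not in raw_name
```

A Latin-1 encoded name such as `b"caf\xe9"` is a legal POSIX name but not valid UTF-8, so it would stay a byte string under this rule.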
draft.md
Outdated
### Notes
* Rather than have a special attribute for an executable bit it is more compact if we just make this a different type
* It is very useful to be able to determine if a link is a directory or an ordinary file so I made it as separate type, also there can be multiple ways to define a file size for a directory so it is best to just leave it out as it is of limited usefulness
> there can be multiple ways to define a file size for a directory
Actually there are only 2 ways - either you only count the logical bytes ( what the Windows `properties` UI does ), or you take into account the allocation overhead of the filesystem - the blocks taken by both the directories and the files themselves rounded up ( what the unixish `du` does ).
Given that in the context of IPFS the DAG is completely decoupled from the storage ( it may be files, it may be badger, etc ), the only sensible way to define a file size for a directory is to count the logical bytes, which I've done in my prototypes.
I would be sad if I can't express these cumulative values as part of every link within an FS tree.
Unix has its own (basically useless) way of defining the size of a directory.
I am not totally against including the cumulative size of a directory if we can agree on how to define it.
I didn't know there was a review button.
draft.md
Outdated
* `l`, `symlink`: symbolic link. The second field is the contents of the link
* `o`, `other`: link to other ipld object, links followed for GC and related operations
* `u`, `unknown`: link to unknown objects, links not followed
If an integer enumeration was used rather than ascii characters, the canonical CBOR representation would be packed to one byte rather than two. Given that the CID will be in raw representation, I don't think clarity would suffer by an enumeration.
I am not against this.
draft.md
Outdated
### Notes
* Rather than have a special attribute for an executable bit it is more compact if we just make this a different type
If file types are enumerated then the high bit in a one-byte packed CBOR integer (0b10000) could be an informative bit that would make regular files (type 0) into executable files (type 16).
That could work.
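A minimal sketch of the enumeration idea, assuming a hypothetical numbering in which only "regular file = 0" and the `0b10000` flag come from the discussion. One detail worth noting: CBOR packs unsigned integers 0 through 23 into a single byte, so a flagged value stays one byte only while the base type is 7 or lower.

```python
from typing import Tuple

EXEC_FLAG = 0b10000  # 16, the informative bit suggested in the thread

# Hypothetical base types; only TYPE_FILE = 0 is taken from the discussion.
TYPE_FILE = 0
TYPE_DIR = 1
TYPE_SYMLINK = 2


def pack_type(base_type: int, executable: bool = False) -> int:
    """Combine a base type (0..15) with the executable flag."""
    if not 0 <= base_type < EXEC_FLAG:
        raise ValueError("base type must fit below the flag bit")
    return base_type | (EXEC_FLAG if executable else 0)


def unpack_type(packed: int) -> Tuple[int, bool]:
    """Split a packed value back into (base type, executable)."""
    return packed & (EXEC_FLAG - 1), bool(packed & EXEC_FLAG)


def fits_in_one_cbor_byte(value: int) -> bool:
    """CBOR encodes unsigned integers 0..23 directly in the initial byte."""
    return 0 <= value <= 23
```

With this layout an executable regular file is type 16, which still encodes as one byte.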
Everyone, I just rewrote the draft now that I have a better idea of what people are looking for.
The value of the map is another CBOR map with the following standard fields:
- `type`
- `exe`: CBOR boolean: executable bit
I think instead of just having executable here, we should do a full rwxrwxrwx unix permissions set (a uint32)
The problem is the full unix set of permissions does not have a lot of meaning on other operating systems. Even within unix systems it has limited meaning when stored in an archive.
Others may have stronger opinions on this than me. In particular see #1 (comment) by @ehmry.
> I think instead of just having executable here, we should do a full rwxrwxrwx unix permissions set (a uint32)
@whyrusleeping the full `st_mode` ( entry type + permissions ) is in fact only 16 bits ( within a typically-32-bit-aligned struct member ). If we reuse it as-is we gain some extra bit of interop with everything that understands `st_mode` ( `git` took this path: https://stackoverflow.com/questions/737673/how-to-read-the-mode-field-of-git-ls-trees-output/8347325#8347325 ).
Of course this makes direct-query of type a bit harder, but then again every libc provides `S_IF...` macros
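For illustration, Python's standard `stat` module exposes the same `S_IF...` style helpers, so querying the entry type out of a combined `st_mode` value can be sketched as follows. The three octal constants are the mode values git uses in tree listings; `describe_mode` is a hypothetical helper name.

```python
import stat

# Mode values as they appear in git tree listings.
GIT_EXECUTABLE_FILE = 0o100755
GIT_TREE = 0o040000
GIT_SYMLINK = 0o120000


def describe_mode(mode: int) -> dict:
    """Split a combined st_mode into entry type and permission bits."""
    return {
        "is_file": stat.S_ISREG(mode),
        "is_dir": stat.S_ISDIR(mode),
        "is_symlink": stat.S_ISLNK(mode),
        "permissions": stat.S_IMODE(mode),
        "user_executable": bool(mode & stat.S_IXUSR),
    }
```

All three example values fit in 16 bits, consistent with the point made above.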
I think there is no harm in having this additional data stored. Systems for which those bits have no meaning will skip them; systems for which they do will use them, allowing for preservation.
It is very similar with uid and gid. They have no meaning on some systems, and they may have no meaning on a different machine with the same system (different uid/gid mappings), but they are crucial if I wanted to, for example, in the future use IPFS for `/home` storage in a managed multi-user multi-workstation environment.
I think that we should support the full range, but maybe not change the default behaviour?
Or maybe we only record user executable by default.
Thinking about it a bit more, the 'readable' flag really doesn't make a lot of sense in this context. I can read anything that's in ipfs...
Storing the full `st_mode` by default just feels wrong to me and I will go as far as to say it could create additional complications down the road (what those complications are I am not sure).
One possible complication is how to handle the writable bit in `st_mode`: should it always be set or not by default? If it is not set then should the `st_mode` be honored when extracting files? If it is then that could be annoying as all files will end up readonly. Or maybe only the executable bit should be honored by default, in which case it just seems better to store that single bit.
For this version of the standard I feel rather strongly we should stick to just the executable bit as it was stated in the requirements (#1), or nothing at all (as @lgierth suggested we don't add additional metadata). The full `st_mode` can be included in a later version of the standard.
- `data`: normally a CBOR link, but can be other types depending on the value of the `type` field
- `size`: cumulative size of `data`
- `fsize`: (file size) cumulative size of the payload of `data`
- `fname`: CBOR byte string: original filename if it differs from the key
This seems unnecessary (though I admit I've missed a lot of the conversation from over in the other issue). Why would this differ from the map key?
Yes the discussion on #3 is rather long. It may differ because unix filenames are not required to be UTF-8.
@whyrusleeping two things are at play / in conflict:
- Desire for full POSIX compatibility mandates names to allow any sequence of bytes excluding `0x00` and `0x2f`
- The current proto-IPLD spec mandates keys ( i.e. names ) to be unicode: the mandate flows from the spec declaring a strict superset of RFC 7049
Hrm, I don't have a lot of strong opinions here. I will defer to @lgierth @diasdavid and @Stebalien
Is this field going to include which character set should be used to interpret it somehow?
No. Just the raw byte string if it differs from the key (which is the filename) that is in utf-8. For display the key should be used.
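One way to read the `fname` rule: the map key is always UTF-8 text used for display, and `fname` carries the raw bytes only when they differ from the key. The sketch below assumes a lossy `errors="replace"` decoding to derive the key for non-UTF-8 names; that derivation is an assumption made for illustration, since the thread does not specify it, and `make_entry` is a hypothetical helper.

```python
from typing import Tuple


def make_entry(raw_name: bytes, link: str) -> Tuple[str, dict]:
    """Build (key, entry) for a directory map from a raw POSIX file name."""
    try:
        key = raw_name.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: derive a displayable key by replacing invalid bytes.
        key = raw_name.decode("utf-8", errors="replace")
    entry = {"data": link}
    if key.encode("utf-8") != raw_name:
        # Keep the original bytes so the name can be restored faithfully.
        entry["fname"] = raw_name
    return key, entry
```

For a name that is already valid UTF-8 the entry carries no `fname` at all, so the common case pays nothing for the extra field.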
* _omitted_: regular file
* `dir`: directory entry
* `special`: special file type (fifo, device, etc).
The `data` field is a CBOR Map with at least one field to describe the type.
I don't think we should overload 'data', especially avoid making it have different types based on the value of a key in the parent level. That sort of parsing is hard to do efficiently
I like the simplicity of having the contents of the file entry always be in the same field: for most types it is an IPLD link, for symbolic links it is the target, and for special file types it is a CBOR map with the details of the special file. My thinking was the type would just be an interface and then cast to the correct type once it is known.
I can instead have the following fields:
- `link`: CBOR link when applicable
- `target`: symbolic link target
- `data`: a CBOR map that contains additional data for the directory entry that is not a `link` or `target`
I would rather not provide special fields to describe the content of all the different types of `special` files.
Thoughts?
We're going to have to inspect fields to determine what to do with things anyway. Overloading things doesn't really save us much in my opinion.
@whyrusleeping I am having a hard time interpreting that comment. Are you okay with my proposal (`link`, `target`, `data` fields), or are you saying we should create special fields for each and every special file type?
More to the point, I don't want to enumerate the required fields in the first version of the spec. I like the `data` field abstraction because we can defer that part for later. It also allows us to define a standard set of tags so that an implementation can error out if it finds something it doesn't understand.
- `data`: link
- `size`: cumulative size of `data`
- `fsize`: (file size) cumulative size of the payload of `data`
@kevina this seems backwards. The `size` of the protocol-wrapped objects is currently optional for all intents and purposes. On the other hand the actual `fsize` is mandatory if what you are expressing is a "top node of a file DAG".
To be clear, `fsize` is the logical bytes while `size` is the size that includes the overhead of interior nodes: `fsize <= size`.
@kevina precisely. The logical bytes ( `fsize` ) are interesting because they are needed to calculate buffers / HTTP header / etc
The node overhead is only useful for estimating how much local storage would be necessary to grab the entire DAG locally, but given the overhead is never more than ~10 bytes per node, it becomes ( from my PoV at least ) needless cruft.
I've done a lot of tests over the last year with DAGs specifically excluding mention of what you refer to as `size`. Everything works fine. I strongly believe the overhead-including-size should not be a mandatory part of any future spec ( and thus should be the last item in the CBOR array, not an intermediate one )
> I strongly believe the overhead-including-size should not be a mandatory part of any future spec
Note: This is not a CBOR array but a map; the structure is `{type: "file", data: [{/*link*/}, {}, {} ...]}`.
I don't have a strong opinion on which sizes to include. @whyrusleeping thoughts?
I think `size` and `fsize` are both useful - download progress would depend on `size`, for example, as you still need to fetch the wrapper bytes.
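The distinction between the two fields can be sketched with a toy tree. The `Node` class and the byte counts are hypothetical; the only property taken from the thread is that `fsize` counts logical payload bytes, `size` additionally counts the wrapper bytes of every node, and therefore `fsize <= size`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    payload: int = 0   # logical file bytes held directly by this node
    overhead: int = 0  # wrapper bytes (links, framing) of this node
    children: List["Node"] = field(default_factory=list)


def fsize(node: Node) -> int:
    """Cumulative logical bytes: what a reader of the file would see."""
    return node.payload + sum(fsize(child) for child in node.children)


def size(node: Node) -> int:
    """Cumulative bytes to fetch, including the overhead of every node."""
    return node.payload + node.overhead + sum(size(child) for child in node.children)


# Toy example: one interior node with 90 bytes of links over two raw leaves.
root = Node(overhead=90, children=[Node(payload=1000), Node(payload=500)])
```

Here a consumer would use `fsize(root)` for something like a Content-Length header and `size(root)` for download progress.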
@@ -0,0 +1,101 @@
# Draft IPLD Unixfs Spec
Let's stop calling this IPLD Unixfs, the current Unixfs is already IPLD. This proposal is to:
- Move away from dag-pb to dag-cbor
- Leverage the flexibility of dag-cbor to add more Metadata
- Improve Unixfs and remove some of the limitations we found while using unixfs with dag-pb.
One of the design goals of the new Unixfs is that it should be 100% interoperable with the old (a directory of Unixfs2 should be able to have a file of Unixfs1)
The name "IPLD Unixfs" was @whyrusleeping's idea and I just went along with it. We could call it "Unixfs V2", although I am not sure how much we want to stick with the unix filesystem structure as a model (I personally think we should move away from it and focus on the components that are important to a generic archive structure).
I'm fine not calling it ipld unixfs. @diasdavid is right
> a directory of Unixfs2 should be able to have a file of Unixfs1
It's probably worth mentioning this in the spec.
Also, what about Unixfs1 directories?
Anyone following up on this spec?
I am working on a utility for manipulating Unixfs structures independent of IPFS, but as a client of the IPFS API. But it's not actually an IPFS client right now, and it doesn't conform to this spec yet. https://github.com/ehmry/nim-ipld/
@diasdavid it was stalled, partly because I still don't have a clear picture of what others are looking for and partly because I was waiting for feedback. Some more recent issues that have come to my mind:
@whyrusleeping, others: thoughts?
For (1) and (2) are you saying we should just not worry about how to traverse a raw cbor dag and something that needs to can just traverse the cbor structure looking for the special link type? For (3) I honestly don't understand the hamt structure, but I think we should avoid the issue of having to flip-flop between a shared and non-shared directory. If a directory becomes too big, we split it and create a special entry to point to the shared parts; if it shrinks we inline (so to say) the directories back into the root. (I hope this is making sense; if not I can spell it out with examples, because I am struggling to explain what seems like a simple concept for lack of the right terminology.) [Edited to remove my comments on (4) as it was not well thought out.]
I think I'd like to narrow down the scope of this. Metadata is a pretty I propose for now we look only into migrating unixfs from dag-pb to dag-cbor, Moving unixfs to dag-cbor is a pressing issue, and additional metadata is not. We should concentrate on making resolution and traversal work nicely.
@lgierth I am trying to minimize the fields included in this standard. I am not sure sticking with what we currently have is the best action, as it does not meet the requirements stated in #1, in particular the executable bit and being able to distinguish directories from files. My previous comments (1) (2) (3) concern the data structure, not the metadata.
A Unixfs is either a file or a directory.
The top level IPLD object is a CBOR map with at least two fields: `type` and `data`
and maybe a few other such as a version string or a set of flags.
The `type` field is either `file` or `dir`.
I say better to define a CBOR tag for files and a tag for directories, and define a file as a tagged array and a dir as a tagged map. That makes it clear from the first atomic in the CBOR that you are parsing UnixFS.
@ehmry we can do that, but do we then need to register the tags?
Not really, I was thinking just picking two random uint64 tags.
What I'm trying to say is: let's narrow the requirements for the first stage. We can add metadata at the second stage. A CBOR-based unixfs is much more approachable if we don't mix it with extensions of the contained metadata right away. Trying to do everything at once will make this harder and longer to pull off. Moving to #1
An easy thing to do would be to define three basic integer keys for metadata: type:1, CID:2, and size:3. A file DAG is basically a list of maps with keys CID and size, leaving type reserved for future use. A directory DAG is a map of text keys to maps with keys type, CID, and size. Type is a field signifying a directory or a file, CID a raw block or CBOR dag, and size a logical file size or number of subdirectories. Additional metadata keys and file types can be added in later specs.
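The three-key proposal above could look roughly like this as plain data structures. The key numbers come from the comment; the `TYPE_FILE`/`TYPE_DIR` values and the helper names are hypothetical, and CIDs are shown as placeholder strings rather than real links.

```python
# Integer keys as proposed in the comment.
KEY_TYPE, KEY_CID, KEY_SIZE = 1, 2, 3

# Hypothetical type values; the comment does not assign them.
TYPE_FILE, TYPE_DIR = 0, 1


def file_dag(chunks):
    """A file DAG: a list of maps with CID and size (type left reserved)."""
    return [{KEY_CID: cid, KEY_SIZE: length} for cid, length in chunks]


def dir_entry(entry_type, cid, size):
    """One value in a directory DAG, which maps text names to such entries."""
    return {KEY_TYPE: entry_type, KEY_CID: cid, KEY_SIZE: size}


def file_size(dag):
    """Logical file size: the sum of the chunk sizes."""
    return sum(part[KEY_SIZE] for part in dag)


parts = file_dag([("cid-chunk-0", 262144), ("cid-chunk-1", 1000)])
directory = {
    "readme.txt": dir_entry(TYPE_FILE, "cid-file", file_size(parts)),
    "src": dir_entry(TYPE_DIR, "cid-dir", 2),
}
```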
If an IPLD file is a leaf its CID type is `raw` (0x55) and has no structure.
Otherwise its CID type is `dag-cbor` (0x71).
The `type` field is set to `file` and the `data` field is an CBOR array.
With the current `dag-cbor` and `bitswap` implementation it's quite limiting to require that this be an array of links. There's a block size limit of 2megs across the board. An array of links when serialized to CBOR can't be more than ~2,500 links before the node itself is larger than 2megs.
Unless we bake in a way to shard this we'll be limited to files that are smaller than ~5GB.
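The ~5GB figure follows from multiplying the two estimates in the comment. The sketch below takes both numbers at face value (they are the commenter's estimates, not measured limits, and decimal megabytes are assumed) and adds a hypothetical `depth` parameter to show how the bound would grow if link nodes could themselves point at further link nodes.

```python
MAX_BLOCK_BYTES = 2 * 10**6  # the "2megs" block limit, decimal MB assumed
MAX_LINKS_PER_NODE = 2_500   # the comment's estimate of links per node


def max_file_bytes(depth: int = 1,
                   links: int = MAX_LINKS_PER_NODE,
                   block: int = MAX_BLOCK_BYTES) -> int:
    """Upper bound on file size for a link tree of the given depth."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return links**depth * block


flat_limit = max_file_bytes(depth=1)       # single flat array of links
two_level_limit = max_file_bytes(depth=2)  # hypothetical nested layout
```

With a single flat array the bound is 2,500 x 2 MB = 5 GB, which matches the figure quoted above.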
Also, I'm a lot less concerned about files under 5GB than I am concerned with not being able to develop a smart chunker for fear that if I don't use the max block size the node will be too large.
I'm starting to think about developing a chunker for javascript bundles that uses the sourcemap from the bundler to chunk it into blocks built from each file. This should greatly reduce the new blocks that need to be pushed when new bundles are created, but the number of chunks will easily be greater than 2,500.
The plan is to use a CHAMP instead of an array of links, like we do today for sharded directories. I have an implementation of a CHAMP (HAMT) in ipld here: https://github.com/ipfs/go-hamt-ipld
The JS implementation for unixfs sharding can be found at https://github.com/ipfs/js-ipfs-unixfs-engine/tree/master/src/hamt. It was built by @pgte a while ago.
Maybe I'm missing something, but I've only seen hash map trees used for named key values, not for an ordered array. If we're using this as a replacement for a `CBOR Array`, are there key semantics that need to be defined here in the spec for that? I'm only seeing the current JS hamt implementation used for sharded directories, not sharded file parts.
> the short answer is "don't include that many file parts in a single node"
This isn't sufficient for at least 3 use cases I can think of.
- Bundles: Tarballs, webpack and browserify output, etc. The ideal way to chunk these files is to chunk around the boundaries of the originating files so that changes to the bundle translate into a relatively small number of part changes. Tarballs that pack many small files, and pretty much any modern front-end bundle, are created from more than 2,500 original files so this puts it easily above the limit.
- Compressed Files: Gzipped content, but especially streaming media files: ogg, mpeg, etc. You want the chunker to create blocks that respect the compression windows. This results in faster performance throughout the stack but is especially important when seeking within that content, as the codec will always ask for the whole window that the seekpoint is in, and if it's in the middle of the block this translates into a delayed seek while the content buffers. These compression windows are configurable and sometimes the windows are very small, so a relatively small video file could be more than 2,500K parts.
- Files Larger than 5GB: As discussed here there's currently a 2MB bitswap block limit. There's talk of supporting larger blocks but the larger the block the less efficient the transport will be.
It's fine if we just want to say that these use cases are out of scope for `unixfsv2`. Pushing them out of scope just means that we have some time to see how people solve these issues outside of `unixfsv2` and it may help us get things shipped faster. But we're probably signing up for a `unixfsv3` at some point in the future if we don't have any other way to work around this limitation.
@mikeal look at how the importers work and structure the graph of file parts. The answer is still 'don't include that many file parts in a single node'. Ipfs chunks and structures things into a recursive tree, not just a single level with a flat array of links.
Someone in Berlin mentioned that it was effectively an "array of arrays."
One thing to consider with this design: range requests don't work without loading every part of the file from the beginning to the start of the range. There's no information about the size of the individual parts so the only way to know how to seek is to load them all in serial. Really not ideal, especially for media use cases because it makes seeking quite slow.
The current design has range information, seeking is efficient and only has to load the required nodes for that graph traversal. I assume we would do exactly the same in V2
I'd like to see our fix to this involve moving from `data: [chunks]` to `data: {parts: [chunks], partLengths: []}` because if we make the data attribute an object we can add other relevant information, like the type of chunking algorithm used, which would allow us to implement more efficient syncing between clients.
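With a `partLengths` list available, a range request can find its starting part without loading any earlier chunks. A minimal sketch, assuming `part_lengths` holds the logical byte length of each part in order; `locate` is a hypothetical helper name.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate(part_lengths: List[int], offset: int) -> Tuple[int, int]:
    """Map a byte offset in the file to (part index, offset inside that part)."""
    ends = list(accumulate(part_lengths))  # exclusive end offset of each part
    if not ends or not 0 <= offset < ends[-1]:
        raise IndexError("offset is outside the file")
    # The first part whose end lies strictly beyond the offset contains it.
    index = bisect_right(ends, offset)
    start = ends[index - 1] if index else 0
    return index, offset - start
```

For parts of 100, 50 and 200 bytes, byte 149 falls in part 1 at offset 49, so only that part and the ones after it need to be fetched to serve a range starting there.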
The `type` field is set to `file` and the `data` field is an CBOR array.
Each element of the array is CBOR map with the following fields:
- `data`: link
Any reason why this field isn't called `link` given that it's always a link?
An IPLD `dir` represents a directory.
Its CID type is `dag-cbor` (0x71).
The `type` field set to `dir` and the data field is an CBOR map.
Small typo, 'an CBOR map'
If an IPLD file is a leaf its CID type is `raw` (0x55) and has no structure.
Otherwise its CID type is `dag-cbor` (0x71).
The `type` field is set to `file` and the `data` field is an CBOR array.
Small typo, 'an CBOR array'
When extracting implications SHOULD use the IPLD name and not `fname` unless a special flag is given.
* To save space fields of a directory may be assigned integer values.
Integers have the added benefit of conveying additional meaning based on there values;
Small typo: 'there' vs 'their'.
I had some spare cycles today while I'm waiting for a graph to build so I wrote a quick implementation of the draft spec in JavaScript.
Stripping this field MUST not change the meaning of the directory entry.
These attributes SHOULD be passed along but do not have to be understood.
Possible entries:
Would "Extended Attributes" be a good place to optionally store explicit media type for problematic data types, as noted in #11 ?
At DWeb we discussed the fact that this PR is impossible to follow at this point. @Stebalien suggested something the Rust community does, which is to close the PR and immediately re-open with a list of the unresolved issues in the body of the PR.
I'm going to try and move things along and finalize a draft spec by the end of Q4. This thread is a bit too large to continue. There are many conversations about many changes without a clear owner. My plan is to:
Hey @mikeal, all! We're nearly a year later now, when's the spec moving forward? We want decent file support in IPFS!
Our priorities have been pulled in a lot of directions. The good news is, the IPLD folks (including me) are getting ready to take a look at this again starting next week, with all the new learnings and tools we have from working on IPLD schemas and IPLD selectors over the last couple of months. That will hopefully make a big difference in how tractable the overall thing is, and how quickly, tersely, and confidently we can settle a new language-agnostic spec. I think the other already-merged PRs that referenced this one recently have also already carried the ball a bit further, and I think @mikeal has some other PoC code as well. This issue will probably be closed soon, per the reasoning two comments back, but we're planning a housecleaning and close-a-thon for lots of overgrown issues tomorrow, so I'll leave this to get one more review and be done in that batch.
@dokterbob We’ve got a working implementation in JavaScript https://github.com/ipld/js-unixfsv2 and expect one in Go fairly soon. We’re also working on standardizing schemas/collections/hamt, which is a prerequisite to shipping this new version (although we can start doing the integration work before this is finished). Another thing to note about these new implementations is that they are being designed to be used outside of IPFS as well, so if you want to start using them before they are fully integrated you can.
Here is a draft spec so we have something to discuss.
I included justifications for many of my choices in the spec itself, but right now they are just my opinion and represent no authority on the matter.
I do not intend to do any forced updates on this draft so we have a clean record of all decisions made.