This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

SIP - Changes in manifest handling (the trailing slash problem and other issues) #178

Open
cobordism opened this issue Dec 20, 2017 · 8 comments

Comments

@cobordism

cobordism commented Dec 20, 2017

Manifest traversal, from trailing slashes to over- and undermatched paths, has been a headache for a while. We have gotten bugs, we have gotten unexpected behaviour, we have gotten confused.

This issue is created as a placeholder for the following discussion:

"Manifests should treat / as a special character and should always break on / and not on any substring."

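To make the proposal concrete, here is a minimal sketch of a manifest tree that branches only on `/`. It is not Swarm's implementation; the `Node`, `insert` and `lookup` names and the placeholder hashes are assumptions made for illustration.

```python
from typing import Dict, Optional


class Node:
    """One manifest level; children are keyed by a whole path segment."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.content: Optional[str] = None  # content hash if an entry ends here


def insert(root: Node, path: str, content_hash: str) -> None:
    """Store content_hash under path, creating one node per '/'-separated segment."""
    node = root
    for segment in path.split("/"):
        node = node.children.setdefault(segment, Node())
    node.content = content_hash


def lookup(root: Node, path: str) -> Optional[str]:
    """Return the hash stored at exactly this path, or None."""
    node = root
    for segment in path.split("/"):
        node = node.children.get(segment)
        if node is None:
            return None
    return node.content


root = Node()
insert(root, "img/thumbs/a.png", "hash-a")  # hypothetical entries
insert(root, "img/thumbs/b.png", "hash-b")
```

Because a segment either matches a child key completely or not at all, `img/thumbs/a.pn` and `img/th` cannot resolve to anything in this model.
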
@cobordism
Author

Examples of unexpected behaviour currently:


compare with/without trailing /
http://swarm-gateways.net/bzz:/b7395fd6a3165b0166cb27c7cb3f5b15be90158bc742fb5d36d420d295d8465d/img
to
http://swarm-gateways.net/bzz:/b7395fd6a3165b0166cb27c7cb3f5b15be90158bc742fb5d36d420d295d8465d/img/
The first gives error 500; the second gives 300 (correctly, but the links are wrong).


over/underspecified paths
http://swarm-gateways.net/bzz:/fc3f49ff5fa3ce86e97f081c9ac74751b48be3a4cccb54a7aed4c5feb2574d69/img/thumbs/THUMB_Alexey_1.png (image as expected)
http://swarm-gateways.net/bzz:/fc3f49ff5fa3ce86e97f081c9ac74751b48be3a4cccb54a7aed4c5feb2574d69/img/thumbs/THUMB_Alexey_1.pn (works)
http://swarm-gateways.net/bzz:/fc3f49ff5fa3ce86e97f081c9ac74751b48be3a4cccb54a7aed4c5feb2574d69/img/thumbs/THUMB_Alexey_1.pngg (error 404)

I bring this up because previously we had the opposite problem, where the third version worked but the second did not (hint: neither should work).
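
The three cases can be named explicitly. The sketch below is only an illustration of the terminology, not the gateway's code; `classify` and `strict_status` are hypothetical helpers, and the strict policy shown is the "neither should work" behaviour argued for here.

```python
from typing import Iterable

ENTRY = "img/thumbs/THUMB_Alexey_1.png"


def classify(entries: Iterable[str], request: str) -> str:
    """Return 'exact', 'undermatch', 'overmatch' or 'none' for a request path."""
    keys = list(entries)
    if request in keys:
        return "exact"
    if any(key.startswith(request) for key in keys):
        return "undermatch"  # request is shorter than a stored key
    if any(request.startswith(key) for key in keys):
        return "overmatch"  # request is longer than a stored key
    return "none"


def strict_status(entries: Iterable[str], request: str) -> int:
    """Strict policy: only an exact match is served, everything else is a 404."""
    return 200 if classify(entries, request) == "exact" else 404


for path in (ENTRY, ENTRY[:-1], ENTRY + "g"):
    print(classify([ENTRY], path), strict_status([ENTRY], path))
```
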


missing manifest entries
When we try to request a missing entry such as: http://swarm-gateways.net/bzz:/fc3f49ff5fa3ce86e97f081c9ac74751b48be3a4cccb54a7aed4c5feb2574d69/img/thumbs/missing.png we get a manifest... why?

@cobordism cobordism changed the title SIP - manifest entries should break on '/' and not arbitrary common prefixes SIP - Changes in manifest handling (entries should break on '/' and not arbitrary common prefixes) Dec 20, 2017
@cobordism
Author

For reference - earlier discussion on handling manifests

@cobordism
Author

cobordism commented Dec 26, 2017

From Gitter:

Lewis Marshall @lmars 14:33

@zelig
In order to support RESTFUL APIs via client side js we need to support fallback to longest existing prefix ... This has been a conscious feature from day 1. @lmars why does this shake the rock solid foundation you have in mind?

So it isn't specifically the feature which shakes the foundation, I am trying to convince you guys that the current implementation is giving us constant headaches and head-scratching moments because we don't know what the features are, so we keep breaking those features as they are untested and we don't know when we break them until someone comes along and says they were a feature since day 1.
My focus has been trying to come up with a solution which is simpler (both as a model, Unix filesystem, and in code, my example above), and is easier for us to reason about so that we can easily spot when the code gets broken, whereas currently the code seems very fragile.
I admit my example falls down for large directories, but this is a well researched area (e.g. ext4 supports large directories, so does IPFS which already serves Wikipedia).
Whether it's an arbitrary prefix-trie or branching on / like a filesystem, let's document all the features it should support, write some more exhaustive tests and let's stop breaking it.
I'll start by listing some of the features which have been mentioned:

  • have a catch-all function where the same content is served with an arbitrary suffix added to the path (e.g. a client-side REST API can be deployed at / and then paths like /user/1 will return the app, leaving the app to further process the path and act accordingly)
  • efficiently serve sites with large directories of files like Wikipedia
  • mount a manifest like a filesystem using FUSE (either just hide paths that end in /, don't allow any paths in the manifest to contain /, add options to control the behaviour)
  • match on path?query from the URI rather than just path so that different content can be served using the query string (to support map tile APIs as mentioned by @nagydani)

EXT4: https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Hash_Tree_Directories
IPFS: ipfs/notes#76
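
The catch-all feature in the first bullet can coexist with branching on `/`. The following is a rough sketch under that assumption: the fallback walks up one whole segment at a time until it finds an existing entry. `resolve_with_fallback` and the entry values are hypothetical, not part of any existing API.

```python
from typing import Dict, Optional


def resolve_with_fallback(entries: Dict[str, str], path: str) -> Optional[str]:
    """Return the entry for path, else for its longest ancestor on a '/' boundary."""
    segments = [s for s in path.split("/") if s]
    while True:
        key = "/".join(segments)
        if key in entries:
            return entries[key]
        if not segments:
            return None  # not even a root (empty path) entry exists
        segments.pop()  # drop the last segment and retry one level up


# Hypothetical manifest: a client-side app at the root plus one static file.
entries = {"": "app-hash", "static/logo.png": "logo-hash"}
print(resolve_with_fallback(entries, "user/1"))           # app-hash
print(resolve_with_fallback(entries, "static/logo.png"))  # logo-hash
```

With this shape, `/user/1` returns the app and leaves the rest of the path to client-side code, while a manifest without a root entry still yields no match for unknown paths.
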

@cobordism
Author

From the EXT4 link:

A linear array of directory entries isn't great for performance, so a new feature was added to ext3 to provide a faster (but peculiar) balanced tree keyed off a hash of the directory entry name. If the EXT4_INDEX_FL (0x1000) flag is set in the inode, this directory uses a hashed btree (htree) to organize and find directory entries.

So that would correspond to two different types of manifest, yes?

@cobordism
Author

cobordism commented Dec 26, 2017

What should the behaviour be?

  1. a manifest contains a hash of an html file entry for the empty path and also contains an entry for ?x=y. A user requests hash-of-manifest?x=y
  2. a manifest contains a hash of an html file entry for the empty path and also contains an entry for ?x=y. A user requests hash-of-manifest/?x=y
  3. a manifest contains entries for i and images and the i manifest contains an entry for mages. A user requests images.
  4. a manifest contains a hash of an html file as default (empty string) entry and the hash of something else at /. The manifest hash is saved as name.eth. A user calls bzz:/name.eth/
  5. a manifest contains a single entry for file, a user requests fil
  6. a manifest contains a single entry for file, a user requests fileXXX
  7. a manifest contains entries for abc, abd and abe. A user requests a.
  8. a manifest contains a default (empty string) entry as well as entries for abc, abd and abe. A user requests a.
  9. a manifest contains as default entry the hash of a manifest with default entry the hash of a manifest with default entry the hash of a manifest with default entry the hash of a manifest with entry index.html. A user mounts the original manifest via FUSE.
  10. a manifest contains only a default entry - hash of a file. A user mounts the manifest with FUSE.
  11. A manifest with hash H1 contains an entry for .eth with hash H2. The domain H1.eth is registered and hash H3 is added as content. A user opens bzz://H1.eth
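
Case 3 is worth spelling out, because an arbitrary prefix trie lets two different entries spell the same full path. The sketch below models nested manifests as plain dictionaries (an assumption for illustration only) and lists the paths that end up defined more than once; `flatten` and `ambiguous_paths` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List, Tuple


def flatten(manifest: Dict[str, object], prefix: str = "") -> List[Tuple[str, str]]:
    """Expand nested manifests into (full path, hash) pairs by concatenating keys."""
    pairs: List[Tuple[str, str]] = []
    for key, value in manifest.items():
        if isinstance(value, dict):
            pairs.extend(flatten(value, prefix + key))  # value is a sub-manifest
        else:
            pairs.append((prefix + key, str(value)))
    return pairs


def ambiguous_paths(manifest: Dict[str, object]) -> List[str]:
    """Return the full paths that more than one entry resolves to."""
    counts = Counter(path for path, _ in flatten(manifest))
    return sorted(path for path, n in counts.items() if n > 1)


# Case 3: entries for "i" and "images", where the "i" manifest contains "mages".
manifest = {"images": "hash-1", "i": {"mages": "hash-2"}}
print(ambiguous_paths(manifest))  # ['images']
```

If keys could only end on a `/` boundary, `i` followed by `mages` would not be a legal way to spell `images`, and this particular ambiguity would not arise.
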

@holisticode
Contributor

holisticode commented Dec 27, 2017

  1. A manifest contains an entry for a (dynamic js) single-page app, thus needs to handle everything behind a #, e.g. <host>/bzz:/<hash>#page1, <host>/bzz:/<hash>#page2, <host>/bzz:/<hash>#page3

@cobordism
Author

@cobordism cobordism changed the title SIP - Changes in manifest handling (entries should break on '/' and not arbitrary common prefixes) SIP - Changes in manifest handling (the trailing slash problem and other issues) Dec 30, 2017
@cobordism cobordism added this to the 0.3 milestone Jan 2, 2018
@cobordism
Author

Notes from yesterday's discussion:

  • We need to make sure we handle the ? correctly
  • manifests should declare explicitly what to do with overmatching (requesting fileX when manifest contains file); whether to serve the content or a 404. [Default 404 unless overmatch begins with a ? maybe?]
  • The / character that appears directly after bzz:/<hash> or bzz:/name.eth is special and we should probably automatically add it in with a redirect. This means that it is not possible to load name.eth, only name.eth/, but it has the benefit that HTML links are always handled correctly whether loaded at bzz://name.eth or http://gateway/bzz:/name.eth
  • It should be possible to have the empty path resolve to a hash ("hard link") or to a string - specifying another entry in the manifest ("soft link").
  • we did not discuss URL fragments
  • we did not reach a conclusion about what the default behaviour for undermatches should be - 404 or 300 or ... but whatever the default is, we said that the manifest could provide an explicit override.
  • we will schedule another call to carry this discussion forward.
  • we did not discuss ENS
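
As a rough sketch of the overmatch bullet, assuming the tentative default floated above (404 unless the extra part begins with `?`): the `serve_overmatch` flag, the `Manifest` class and `resolve` are hypothetical names, and nothing here reflects a decided format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Manifest:
    entries: Dict[str, str]        # path -> content hash
    serve_overmatch: bool = False  # hypothetical explicit policy declared by the manifest


def resolve(m: Manifest, path: str) -> Tuple[int, str]:
    """Return (status, content hash); the hash is empty on a 404."""
    if path in m.entries:
        return 200, m.entries[path]
    # Overmatch: the request extends a stored key. Pick the longest such key.
    candidates = [key for key in m.entries if path.startswith(key)]
    if candidates:
        key = max(candidates, key=len)
        extra = path[len(key):]
        # Tentative default from the notes: only a query string is tolerated.
        if m.serve_overmatch or extra.startswith("?"):
            return 200, m.entries[key]
    return 404, ""


strict = Manifest({"file": "h"})
lenient = Manifest({"file": "h"}, serve_overmatch=True)
print(resolve(strict, "fileX"))     # (404, '')
print(resolve(strict, "file?x=y"))  # (200, 'h')
print(resolve(lenient, "fileX"))    # (200, 'h')
```

Undermatches are deliberately left out, since no default was agreed for them.
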

A note on mounting file systems:

  • Several team members feel that mounting a manifest as a filesystem is not of primary concern - or rather: not every manifest needs to be mountable as a filesystem.
  • Any default file in a manifest (hard link above) will be invisible to the mounted directory. [Unless further tooling is developed in which this default hash can be some form of attribute of the directory]
  • If there are (if we allow) keys ending with a / that resolve to hashes of content other than manifests, then that content will also be invisible in the mounted directory.
  • Suggestion: Any directory uploaded with swarm --recursive up should produce manifests that are mountable as filesystems.
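
The two invisibility rules above could be checked before mounting. This is a sketch only: `mount_problems` is a hypothetical helper, and the manifest is modelled as a simple mapping from key to content type, with the manifest content type string taken as an assumption.

```python
from typing import Dict, List

# Assumed content type marking an entry that points at another manifest.
MANIFEST = "application/bzz-manifest+json"


def mount_problems(entries: Dict[str, str]) -> List[str]:
    """List entries (key -> content type) that would be hidden in a mounted directory."""
    problems: List[str] = []
    for key, content_type in entries.items():
        if key == "" and content_type != MANIFEST:
            problems.append("default file is invisible when mounted")
        elif key.endswith("/") and content_type != MANIFEST:
            problems.append(f"{key!r} ends with '/' but is not a manifest")
    return problems


print(mount_problems({"index.html": "text/html", "img/": MANIFEST}))  # []
print(len(mount_problems({"": "text/html", "x/": "image/png"})))      # 2
```

An upload tool could run a check like this to ensure the manifests it produces stay mountable.
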

@cobordism cobordism modified the milestones: 0.3 breaking changes, 0.3.1 Jun 22, 2018
@gbalint gbalint removed this from the 0.3.1 milestone Aug 2, 2018
@acud acud added the manifest label Sep 20, 2018
@acud acud self-assigned this Sep 20, 2018

5 participants