Proposal: Gateway++ Phase 1 #100
In [nft.storage](http://nft.storage) we have the following high priority needs:

- Add the Pinning API to ipfs-cluster.
- Add transactional CAR file uploads to the Pinning API.
Mind elaborating on how this endpoint should work / look?
How different would it be from `/api/v0/dag/import`?
"Transactional" is mentioned multiple times in this proposal, but I feel it's used to describe specific behavior in a specific use case.
My read: it should basically be `/api/v0/dag/import`, but nicely integrated into the Pinning API. Send it a blob, receive back a success/fail message, and it is transactional in the sense that it's all imported or not: if the CAR has a problem part way through, then bail on all of it. Details on how to send this blob of binary should hopefully be resolved this week with the binary API discussion. Is `multipart/form-data` appropriate here? If we're doing this fresh for the Pinning API, then we have an opportunity to try out an alternative approach that we might choose for a v0 binary solution.
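A minimal Python sketch of the "all imported or not" semantics described above. It is not cluster's or Kubo's importer: blocks are keyed by a plain SHA-256 hex digest as a stand-in for real CIDs, and `TransactionalStore` / `import_all_or_nothing` are hypothetical names. Every block is verified and staged first, and nothing is committed unless the whole upload checks out.

```python
import hashlib
from typing import Dict, Iterable, Tuple


class TransactionalStore:
    """Hypothetical block store with all-or-nothing CAR-style imports."""

    def __init__(self) -> None:
        # Keys are SHA-256 hex digests standing in for real CIDs (assumption).
        self.blocks: Dict[str, bytes] = {}

    def import_all_or_nothing(self, blocks: Iterable[Tuple[str, bytes]]) -> bool:
        staged: Dict[str, bytes] = {}
        for key, data in blocks:
            # Verify each block against its claimed key before accepting it.
            if hashlib.sha256(data).hexdigest() != key:
                return False  # bail: nothing from this upload is committed
            staged[key] = data
        # Every block verified, so commit the whole batch at once.
        self.blocks.update(staged)
        return True
```

Staging into a separate dict is what makes a failure part way through harmless: the live store is only touched after the last block has been checked.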
Cluster is adding support for CAR files on the `/add` endpoint (which otherwise mimics `/api/v0/add`).
Some choices cluster (or I) made:
- It's a multipart POST: even if cluster just accepts a single CAR part with a single root, multipart is how we usually upload things on the web, and it is flexible enough to do other things (like normal adding).
- The CAR must have a single root. The cluster API is constrained to pinning one thing, so CARs must have a single root; otherwise multiple roots would have to be wrapped in a single CID. I see the Pinning API also does not have a "multiple pin" endpoint, so this may be a reasonable limitation in the Pinning API as well.
- I did not add a new endpoint because there is significant overlap between adding CARs and adding files normally: replication factors, pin options, stream channels, pin sharding, etc. If a Pinning API add endpoint is created for CARs, I think it might be expanded in the future to do normal UnixFS adding, or raw block adding.
- Cluster added a `format=<car/unixfs>` query option to the `/add` endpoint to control how things are supposed to be added (choosing a DAG Formatter, which, given an input, produces `ipld.Node`s as output).
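To make the shape of such an upload concrete, here is a minimal Python sketch that builds (but does not send) a multipart POST with a single CAR part to cluster's `/add?format=car`. The base URL, the part name `file`, and the `application/vnd.ipld.car` part type are assumptions; check the cluster REST API docs for the exact field names it expects. `build_car_add_request` is a hypothetical helper.

```python
import uuid
import urllib.request


def build_car_add_request(cluster_url: str, car_bytes: bytes,
                          filename: str = "upload.car") -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying one CAR part."""
    boundary = uuid.uuid4().hex
    # A single multipart part; the part name "file" is an assumption.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/vnd.ipld.car\r\n\r\n",
        car_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{cluster_url.rstrip('/')}/add?format=car",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending it would then be `urllib.request.urlopen(build_car_add_request("http://127.0.0.1:9094", car_bytes))`, with whatever auth the deployment requires.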
IIUC the pinning API was designed to handle the operation of pinning content which is a separate operation from pushing content to a particular endpoint. @lidel may be able to fill in the blanks here. I'm having trouble finding the slides at the moment, but this talk from Juan (and the slides in the background) https://youtu.be/Pcv8Bt4HMVU?t=912 setting the background for the pinning API discussion differentiates between the different types of operations that might need to be provided.
Using CAR files as a mechanism for pushing data is wasteful in that it ignores the existence of duplicate data at the endpoint. For example, adding a 10kB file to a 100MB directory now requires uploading 100MB of data. Making CAR file uploads "first class citizens" and the recommended way people interact with our stack is IMO a mistake.
If there's actually high demand for HTTP-only environments having a standardized ingestion format that's not `/api/v0/import`, then providing something here seems reasonable.
However, IMO we should have tooling in place that points people down a more correct path (i.e. a libp2p node that spins up a single WSS connection to the endpoint from the pinning API and sends the data over Bitswap/GraphSync).
Additionally, it might be nice if we could allow people to be more efficient by being able to ask the pinning service "which blocks in this CAR file manifest do you already have?" and then only uploading a CAR with the delta of missing blocks. Since this is an optimization it can be done later if it's a pain.
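The "which blocks do you already have?" exchange is an optimization idea, not an existing endpoint. Assuming a service answered with a list of CIDs it already holds, the client-side delta could be as simple as this Python sketch (`missing_blocks` is a hypothetical helper); the remaining blocks, still in their original order, would then go into a smaller CAR.

```python
from typing import Dict, Iterable, Mapping


def missing_blocks(manifest: Mapping[str, bytes],
                   already_have: Iterable[str]) -> Dict[str, bytes]:
    """Return only the blocks the service reported it lacks, in manifest order."""
    have = set(already_have)
    # Dicts preserve insertion order, so block order from the manifest is kept.
    return {cid: data for cid, data in manifest.items() if cid not in have}
```

In the 10kB-file-in-a-100MB-directory example above, the service would report nearly every CID as present, and only the new file's blocks plus the changed directory nodes would be uploaded.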
Let me add that we should also work towards software like @aschmahmann describes that would support a Graphsync upload, outside this proposal, but that would add more friction for the immediate needs.
@hsanjuan can you confirm that this item is done as of ipfs-cluster/ipfs-cluster#1343?
So the only thing left in this proposal is the pinning API to cluster? (plus the doc+deploy items below)
Or is there more to do with CAR uploads?
At the very least, I'd expect the CAR upload feature to need updating to accept and validate a token in the same way the Pinning API does, once the Pinning API lands in cluster.
@hsanjuan can you confirm that this item is done as of ipfs-cluster/ipfs-cluster#1343?
The item as written is not done. Cluster added CAR-file import to its own REST API which is different than the official Pinning API (which it does not have). When the Pinning API knows how it wants to support CAR file import, it should be easy to re-use the importer that cluster includes now, along with the rest of the Pinning API and the token-based authentication.
Adding "DAG import" endpoint to Pinning Service API is being picked up in ipfs/pinning-services-api-spec#73 (comment) – would appreciate feedback.
#### Alternatives

There are alternative approaches to building thin clients. The proposals around changing/improving the RPC API could be designed for this purpose,
What specific requirements do you have for those "thin clients"?
I've been discussing "thin clients" with mobile browser and IoT vendors and most of their needs could be accomplished by regular IPFS node with disabled p2p transports and discovery and doing content via CAR import/export via Gateway.
Sounds like the only additional piece here is remote pinning. Perhaps we could identify common needs and spec out a variant of our stack tailored for thin clients? Mobile browsers would really like having this mode as a pre-built preset.
regular IPFS node with disabled p2p transports and discovery and doing content via CAR import/export via Gateway
So what's left in a "regular IPFS node" when you strip out these bits? This description sounds just like what this proposal wants but without the notion of being a "regular IPFS node". But that probably comes back to the problems we have of "IPFS node" being something different for everyone! Has import via the gateway been something already on the table? How has that been imagined so far and is there an alternative here to pulling in the Pinning API to achieve this?
Symmetric use of CAR for import and export would certainly be worth exploring as part of this proposal.
So what's left in a "regular IPFS node" when you strip out these bits?
Integrity guarantees provided by content addressing (data can be fetched in a trustless manner) and the ability to use IPLD for advanced data structures.
Has import via the gateway been something already on the table? How has that been imagined so far and is there an alternative here to pulling in the Pinning API to achieve this?
Yes, we are planning to add DAG import/export directly to gateway endpoints (`/ipfs/`, `/ipns/`). Longer discussion in ipfs/in-web-browsers#170, but the tl;dr idea is:
- Improve the concept of a writable gateway to support DAG import via `HTTP PUT /ipfs/{cid}`
- IPNS publishing could be as easy as `HTTP PUT /ipns/{libp2p-key}`
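Under this idea, a thin client (or a bash script with curl) would only need a plain HTTP request. A Python sketch that builds one: it assumes the body is a CAR whose root is `{cid}` and that the `application/vnd.ipld.car` media type is used. The endpoint is a proposal at this point, so whether a given gateway accepts it, and with what auth, is not something this sketch asserts. `build_dag_import_put` is a hypothetical helper.

```python
import urllib.request


def build_dag_import_put(gateway: str, root_cid: str,
                         car_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT of a CAR to a writable gateway's /ipfs/{cid}."""
    return urllib.request.Request(
        f"{gateway.rstrip('/')}/ipfs/{root_cid}",
        data=car_bytes,
        # The CAR media type is an assumption about what a writable gateway would accept.
        headers={"Content-Type": "application/vnd.ipld.car"},
        method="PUT",
    )
```

Because the CID is in the path, the gateway can check that the uploaded CAR actually has that root before accepting it, which keeps the request self-describing and stateless.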
How many of the environments that we're concerned about are unable to sustain a basic libp2p node that makes a single connection via TCP/WebSockets and really need to just have HTTP?
As mentioned in some of my other comments (https://github.com/protocol/web3-dev-team/pull/100/files#r617675012, https://github.com/protocol/web3-dev-team/pull/100/files#r617641097, https://github.com/protocol/web3-dev-team/pull/100/files#r617641520), we can efficiently use libp2p to transfer IPLD data between two peers because all the transports we support have bidirectional streaming; with a unidirectional upload we lose that efficiency.
How many of the environments that we're concerned about are unable to sustain a basic libp2p node
I think it is less about resources and more about "deployment style", more specifically about preferring statelessness where possible.
A libp2p node is an active unit, requiring an actively running process, servicing of periodic protocol chatter, etc.
An HTTP client interface on the other hand is completely and utterly "dumb". You could drive such an "http-only ipfs-node" from a bash script, which is decidedly not possible today.
How many of the environments that we're concerned about are unable to sustain a basic libp2p node that makes a single connection via TCP/WebSockets and really need to just have HTTP?
Serverless (Lambda), Cloudflare Workers, and mobile devices.
Pretty much all the highest growth application environments have trouble with long running processes and connections and prefer or require a stateless protocol.
have trouble with long running processes and connections and prefer or require a stateless protocol.
Is a long running HTTP upload exempt from this? If not then spinning up a temporary libp2p node shouldn't be very different.
HTTP upload is not exempt; we can't push too much data at once. We're going to have to break up large files by encoding in the client and doing uploads under 100 MB to get around Cloudflare Worker limits.
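The client-side split could look roughly like this Python sketch: blocks that the client has already encoded are grouped greedily, in order, into batches that each stay under a byte limit, so every batch can be sent as its own CAR upload. The 100 MB figure comes from the comment above (taken here as 100,000,000 bytes); the sketch ignores CAR header and per-block framing overhead, so a real uploader would leave some headroom. `batch_blocks` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

Block = Tuple[str, bytes]  # (CID string, block bytes)


def batch_blocks(blocks: Iterable[Block], limit: int = 100_000_000) -> List[List[Block]]:
    """Greedily group blocks, in order, into batches whose payload stays <= limit."""
    batches: List[List[Block]] = []
    current: List[Block] = []
    size = 0
    for cid, data in blocks:
        if len(data) > limit:
            raise ValueError(f"block {cid} alone exceeds the {limit}-byte limit")
        # Start a new batch when this block would push the current one over the limit.
        if current and size + len(data) > limit:
            batches.append(current)
            current, size = [], 0
        current.append((cid, data))
        size += len(data)
    if current:
        batches.append(current)
    return batches
```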
Cloudflare Workers seem to have support for WebSockets https://blog.cloudflare.com/introducing-websockets-in-workers/ so using libp2p shouldn't be a problem there.
It has an interface for being a websocket service but there’s no client in CF workers.
proposals/gateway-plusplus-phase1.md
### Content Routing for Large Providers

Gateways and large providers need to be directly peered since large providers have too much content to provide in the DHT.
Did we explore whether a provider strategy that only announces file root blocks improves things for big providers?
Most of the data is UnixFS, and most of the announced blocks could be skipped. Only file roots matter in practice.
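A rough Python sketch of the "only announce file roots" idea, over a deliberately simplified, hypothetical UnixFS model (`Node` and `cids_to_announce` are illustrative names, not Kubo's API): directories and file roots are announced, and the walk never descends into a file's chunks, which are typically the vast majority of blocks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    kind: str  # "dir", "file" or "chunk" in this simplified model
    links: List[str] = field(default_factory=list)


def cids_to_announce(dag: Dict[str, Node], root: str) -> Set[str]:
    """Announce directories and file roots; never walk into a file's chunks."""
    announce: Set[str] = set()
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in announce:
            continue  # already handled (DAGs can share subtrees)
        announce.add(cid)
        if dag[cid].kind == "dir":
            # Only directories are expanded; files are announced but not descended into.
            stack.extend(dag[cid].links)
    return announce
```

For a directory of large files, the number of announcements drops from "every chunk" to "one per file plus one per directory".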
It’ll help, but we’re throwing incremental improvements at an exponential problem. nft.storage will have too many CIDs to keep in the DHT by the end of the month, even with roots only and the improvements Adin made that haven’t been released.
We’ve come to the same conclusion other large providers like Pinata came to, we can’t support the DHT with this much content.
But this is going to work out because content discovery has always been about more than just the DHT. We should work on a protocol for a federation of large providers to use and continue to improve the DHT for a larger network of more nodes with smaller amounts of content per node.
Ack. I wonder if we could leverage DNS hints here.
We discussed having websites and gateways announce their own addrs to enable clients to preconnect and skip the DHT step: ipfs/kubo#6516
In [nft.storage](http://nft.storage) we have the following high priority needs:

- Add the Pinning API to ipfs-cluster.
👍 for adding Pinning Service API to ipfs-cluster – this will not only help with NFTs, but enable people to self-host pinning infra with ease and use it with ipfs-webui v2.12.0+ and soon ipfs-desktop and Brave.
What is the benefit of adding the pinning service API to cluster when cluster already provides the required push API? Is it purely for client code generation and auth tokens?
Good point. It would be nice if Cluster did support the pinning service API, but adding files and pinning dags to a remote cluster is already well supported. We make use of this in adding websites to cluster from CI which is a great example of a constrained environment.
How much of that can be replicated in a browser if you had the auth available? Is `ipfs-cluster-ctl` just a simple wrapper around the REST API + some UnixFS slurping?
Yes! `ipfs-cluster-ctl` is a wrapper around the cluster REST API. The auth is basic, so with some HTTPS, you're good.
ipfs-cluster-ctl is an HTTP API client to the REST API endpoint with full feature-parity that always works with the HTTP API as offered by a cluster peer on the same version. Anything that ipfs-cluster-ctl can do is supported by the REST API.
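Since the cluster REST API uses HTTP Basic auth, a browser or any plain HTTP client only needs to attach an `Authorization` header. A Python sketch that builds (without sending) a pin request; the `POST /pins/{cid}` route and port 9094 reflect my understanding of the cluster REST API and should be checked against its docs, and the credentials are placeholders. `build_pin_request` is a hypothetical helper.

```python
import base64
import urllib.request


def build_pin_request(cluster_url: str, cid: str, user: str,
                      password: str) -> urllib.request.Request:
    """Build (but do not send) a Basic-auth pin request against a cluster REST API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        # The /pins/{cid} route is assumed from the cluster REST API.
        f"{cluster_url.rstrip('/')}/pins/{cid}",
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )
```

Basic auth only base64-encodes the credentials, which is why the HTTPS part matters.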
What is the benefit of adding the pinning service API to cluster when cluster already provides the required push API? Is it purely for client code generation and auth tokens?
And swapping Pinning Service providers as needed. But yeah, I don't see it as a blocker. The regular REST API can do the needed things.
And swapping Pinning Service providers as needed.
But that doesn't actually provide anything in this scenario as the Pinning Service API doesn't allow for pushing of data, so any supporting Pinning Service Provider won't be able to accept the CAR files unless we extend the pinning services API to include the ability to push.
Not opposed to specifying a Pushing API that services can implement. Actually I feel like this issue is more about pushing data directly and not about pinning at all so it is confusing to try and suggest that the pinning API is useful in solving this problem.
While this maps well to where web developers are today, it's not a "pure p2p" approach to solving problems. We're beefing up the ability to rely on large IPFS nodes that end up being federated rather than fully decentralized.
The proposed approach results in wasted bandwidth (upload from the user and download from the service provider) when some of the data already exists on the service provider.
This pushes developers away from working with modifiable/appendable data structures which is something we have otherwise been encouraging.
See my comment above regarding “wasted bandwidth.”
We don’t get to determine what data structures people use. NFT developers are already trying to use IPFS and we are not meeting their needs. We may wish they had done something different but we’re past the point of being able to determine their pattern of use.
## Problem Statement

For reading data, the IPFS Gateway is already serving these users quite well. Not only does it allow them to read data from the IPFS network without running a full node, they are able to integrate with existing HTTP caching infrastructure to improve performance.
quite well ... if the data is unixfs
The gateway supports block reads (https://github.com/ipfs/go-ipfs/blob/master/docs/gateway.md#read-only-api), so you can get the data for non-UnixFS content. For most of the stuff we built in the IPLD team, we just used block read/write interfaces; the DAG API was never quite the right fit.
If you’re working with really long chains you’ll need something like Graphsync, or we could go down the GraphQL route like I had in the future section before I pulled it :)
Or #1 :-)
I like this proposal, and the direction it's pointing.
IMO this would be much aided by a better name describing what this proposal actually contains. You can still use "Gateway++" to label a larger direction, but I think it inhibits understanding of the immediate goal and use case. Consider linking to other gateway and API-related proposals to paint the bigger picture.
- Add the Pinning API to ipfs-cluster.
- Add transactional CAR file uploads to the Pinning API.
It would be great to have ipfs/kubo#6129 as an addition to the HTTP API. It would allow users not using an IPFS node to assess the correct behaviour of the gateway(++).
@aschmahmann @lidel I'd like to focus in on this comment for a bit:
And the video link @aschmahmann helpfully provided that has this line in it: https://www.youtube.com/watch?v=Pcv8Bt4HMVU&t=912s Juan mentions a "GitHub thread" in there that touches on this, so there must be more context here. There's also a slide a little later in the video titled "Interface Wishes" that continues the point, although Juan doesn't discuss the detail in that slide. In the Pinning API we have the notion of "provider hints", and I'm wondering, in the framing of Juan's talk, what the distinction might be between "go and find it over there and pin it" vs "here it is, just pin this" (i.e. the provider hint is essentially just "it's right here!"). I imagine the thinking of the Pinning API evolved somewhat from that discussion so it shouldn't necessarily hold us back, but I don't want to miss a key distinction if there is one in here that we're not seeing clearly.
Maybe that context comes from this thread: ipfs/notes#378 (comment). Although the concerns that @lanzafame expressed there seem to be more about DAG construction and formats, which is dealt with by just using a CAR, i.e. we offload DAG construction to the user entirely and just take pure IPLD blocks. If you want UnixFS, then make it yourself and upload it.
Do we actually have a server-side implementation of the Pinning API anywhere, or did we just make a spec for the ecosystem pinning services, plus the ability to codegen clients from it? i.e. if we tackle this, we're going to be implementing the Pinning API from scratch, not just merging some things that already exist, aren't we?
I think both are truthy https://github.com/ipfs-shipyard/rb-pinning-service-api
Is this still in progress, @mikeal?
@olizilla this is all either done or being superseded by a yet-to-be-written filecoin.storage doc; are we good to close it out?
This is being superseded by filecoin.storage.