Proposal: Gateway++ Phase 1 #100

Closed
122 changes: 122 additions & 0 deletions proposals/gateway-plusplus-phase1.md
@@ -0,0 +1,122 @@
# Gateway++

Authors: @mikeal (with ideas from many people)

Initial PR: TBD <!-- Reference the PR first proposing this document. Oooh, self-reference! -->

## Background

Developers often have to deploy code into constrained environments. Mobile devices, browsers and cloud functions all have a variety of limitations that make running and maintaining a full IPFS node impractical or even impossible.

For the most part, these developers are falling back to HTTP and stringing together whatever HTTP interfaces we currently provide so that they can offload these tasks to a shared IPFS node.

HTTP is the lowest common denominator, so when we see environmental constraints in application stacks we see HTTP as the means by which developers work around these limitations in order to access IPFS. This also allows developers to leverage additional infrastructure they already have deployed for HTTP, like caching, with IPFS.

We should lean into this rather than push against it. In the short term, we can’t expect every user and developer to have a full implementation of the protocol. Strengthening the Gateway protocol to meet user needs is the path of least resistance and can enable numerous thin clients to be built for IPFS that can run in virtually any environment.

This proposal establishes "Gateway++" as a meta-project for meeting these user needs with extensions to our gateway protocols and infra.

## Problem Statement

For reading data, the IPFS Gateway is already serving these users quite well. Not only does it allow them to read data from the IPFS network without running a full node, they are able to integrate with existing HTTP caching infrastructure to improve performance.
Contributor

quite well ... if the data is unixfs

Contributor Author

the gateway supports block reads https://github.com/ipfs/go-ipfs/blob/master/docs/gateway.md#read-only-api so you can get the data for non-unixfs. for most of the stuff we built in the IPLD team we just used block read/write interfaces, the DAG API was never quite the right fit.

if you’re working with really long chains you’ll need something like Graphsync, or we could go down the GraphQL route like i had in the future section before I pulled it :)

Contributor

Or #1 :-)


However, the user needs for writing data to the network are not being met well by our current solutions. We have two interfaces that support writes over HTTP and neither was designed as a full solution to these user needs.

- Pinning Service API
  - 😡 Requires you to already be running a full IPFS node as a client.
  - 😡 Is not transactional. You have to poll in order to tell when the data is actually available.
  - 😃 Is intended to be multi-tenant and authenticated using a flexible bearer token.
- HTTP RPC API
  - 😡 Is single-tenant. No authentication system.
    - The RPC API is designed for a single user to manage a remote node; it was not designed and built to be multi-tenant and would look very different if it were.
  - 😡 Is not always transactional. Many write calls return before the data is available.
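
The polling burden named in the Pinning Service API item can be pictured roughly as follows. This is a minimal sketch, assuming the spec's `GET /pins/{requestid}` status lookup and its `queued`, `pinning`, `pinned` and `failed` status values; `fetch_status` is a hypothetical callable standing in for the authenticated HTTP request, and the interval and attempt limit are arbitrary.

```python
import time
from typing import Callable

# Terminal status values for a pin request (per the Pinning Service API spec).
TERMINAL = {"pinned", "failed"}


def wait_for_pin(
    fetch_status: Callable[[str], str],
    request_id: str,
    interval_s: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the pin request reaches a terminal status.

    fetch_status is a hypothetical helper that would issue
    GET /pins/{requestid} with a bearer token and return the "status" field.
    """
    for _ in range(max_attempts):
        status = fetch_status(request_id)
        if status in TERMINAL:
            return status
        sleep(interval_s)  # the client has to wait and ask again
    raise TimeoutError(f"pin request {request_id} still pending")
```

Every client has to carry a loop like this today, which is the cost a transactional write interface would remove.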

These interfaces were built with particular user profiles in mind. The users we’re seeing in NFTs (including our own service as a user of IPFS) are not well served by these interfaces and we need a new “project” definition (Gateway++) that is centered around these user needs.

Developers are building applications that have to run in a lot of constrained environments that can’t support a full node. Strengthening the Gateway to serve these needs over HTTP leverages endpoints they are already using and provides the features they need in a stateless protocol that has ubiquitous support.

We are never going to hear “Sorry, I can’t use HTTP in this environment” from a developer.

## Phase 1

We should expect this to be a long term project with additional proposals in due time.

For now, we have a list of high priority tasks that need to be completed.

In [nft.storage](http://nft.storage) we have the following high priority needs:

- Add the Pinning API to ipfs-cluster.
Member

👍 for adding Pinning Service API to ipfs-cluster – this will not only help with NFTs, but enable people to self-host pinning infra with ease and use it with ipfs-webui v2.12.0+ and soon ipfs-desktop and Brave.

What is the benefit of adding the pinning service API to cluster when cluster already provides the required push API? Is it purely for client code generation and auth tokens?

Contributor @olizilla, Apr 27, 2021

Good point. It would be nice if Cluster did support the pinning service API, but adding files and pinning dags to a remote cluster is already well supported. We make use of this in adding websites to cluster from CI which is a great example of a constrained environment.

Contributor

How much of that can be replicated in a browser if you had the auth available? Is ipfs-cluster-ctl just a simple wrapper around the REST API + some UnixFS slurping?

Contributor

yes! ipfs-cluster-ctl is a wrapper around the cluster REST api. The auth is basic, so with some https, you're good.

ipfs-cluster-ctl is an HTTP API client to the REST API endpoint with full feature-parity that always works with the HTTP API as offered by a cluster peer on the same version. Anything that ipfs-cluster-ctl can do is supported by the REST API.

https://cluster.ipfs.io/documentation/reference/api/

> What is the benefit of adding the pinning service API to cluster when cluster already provides the required push API? Is it purely for client code generation and auth tokens?

And swapping Pinning Service providers as needed. But yeah, I don't see it as a blocker. The regular REST API can do the needed things.

> And swapping Pinning Service providers as needed.

But that doesn't actually provide anything in this scenario as the Pinning Service API doesn't allow for pushing of data, so any supporting Pinning Service Provider won't be able to accept the CAR files unless we extend the pinning services API to include the ability to push.

Not opposed to specifying a Pushing API that services can implement. Actually I feel like this issue is more about pushing data directly and not about pinning at all so it is confusing to try and suggest that the pinning API is useful in solving this problem.

- Add transactional CAR file uploads to the Pinning API.
Member

Mind elaborating on how this endpoint should work / look like?
How different it would be from /api/v0/dag/import ?

transactional is being mentioned multiple times in this proposal, but I feel it's used to describe specific behavior in specific use case.

Contributor

My read: it should basically be /api/v0/dag/import, but nicely integrated into the Pinning API. Send it a blob, receive back a success/fail message, and transactional in the sense that it's all imported or not. If the CAR has a problem part way through, then bail on all of it. Details on how to do this blob of binary should be resolved this week hopefully with the binary API discussion, is multipart/form-data appropriate here? If we're doing this fresh for the Pinning API then we have an opportunity to try out an alt approach that we might choose for a v0 binary solution.
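
One reading of "it's all imported or not", sketched under loud assumptions: blocks arrive as `(digest, data)` pairs, a hex SHA-256 digest stands in for a real CID (which would need multihash and CID decoding), and the block store is a plain dict. None of these names come from the Pinning API or from go-ipfs.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def import_all_or_nothing(
    store: Dict[str, bytes], blocks: Iterable[Tuple[str, bytes]]
) -> bool:
    """Commit every block to store, or none of them.

    Each block is (digest, data); the hex SHA-256 digest is a stand-in
    for a real CID in this sketch.
    """
    staged: Dict[str, bytes] = {}
    for digest, data in blocks:
        if hashlib.sha256(data).hexdigest() != digest:
            return False  # bail out part way through; store is untouched
        staged[digest] = data
    store.update(staged)  # single commit step after full validation
    return True
```

The point of the staging dict is that a bad block halfway through the CAR leaves nothing behind, so the caller gets a plain success or failure.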

Cluster is adding support for CAR files on the /add endpoint (which otherwise mimics api/v0/add).

Some choices cluster (or I) made:

  • It's a POST multipart - even if cluster just accepts a single CAR part with a single root, multipart is how we usually upload things in the web and it is flexible enough to do other things (like normal adding).
  • CAR must have a single root. The cluster API is constrained by being able to Pin one thing, so CARs must have a single root, or otherwise multiple roots would have to be wrapped in a single CID. I see the pinning API also does not have a "multiple pin" endpoint so this may be a reasonable limitation also in the pinning API.
  • I did not add a new endpoint because there is significant overlap between adding CARs and adding files normally: replication factors, pin options, stream channels, pin sharding etc. If a Pinning API add endpoint is added for CARs, I think it might be expanded in the future to do normal unixfs-adding, or raw block-adding.
  • Cluster added a format=<car/unixfs> query option to the /add endpoint to control how things are supposed to be added (choosing a DAG Formatter, which given an input produces ipld.Nodes as output).
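
For illustration, the request shape those choices imply might be assembled like this. It is a sketch, not cluster's documented client: the `/add` path and `format=car` option come from the list above, while the form field name `file`, the filename `data.car`, the `application/octet-stream` part type and the helper name are all assumptions. Nothing is sent over the network.

```python
import uuid
from typing import Optional, Tuple
from urllib.parse import urlencode


def build_car_add_request(
    base_url: str, car_bytes: bytes, boundary: Optional[str] = None
) -> Tuple[str, dict, bytes]:
    """Return (url, headers, body) for a multipart POST carrying one CAR part."""
    boundary = boundary or uuid.uuid4().hex
    # format=car selects the CAR importer on the /add endpoint.
    url = f"{base_url.rstrip('/')}/add?{urlencode({'format': 'car'})}"
    head = (
        f"--{boundary}\r\n"
        # Field name and filename are assumptions made for this sketch.
        'Content-Disposition: form-data; name="file"; filename="data.car"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return url, headers, head + car_bytes + tail
```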

Contributor

IIUC the pinning API was designed to handle the operation of pinning content which is a separate operation from pushing content to a particular endpoint. @lidel may be able to fill in the blanks here. I'm having trouble finding the slides at the moment, but this talk from Juan (and the slides in the background) https://youtu.be/Pcv8Bt4HMVU?t=912 setting the background for the pinning API discussion differentiates between the different types of operations that might need to be provided.

Using CAR files as a mechanism for pushing data is wasteful in that it ignores the existence of duplicate data at the endpoint. For example, adding a 10kB file to a 100MB directory now requires uploading 100MB of data. Making CAR file uploads "first class citizens" and the recommended way people interact with our stack is IMO a mistake.

Contributor

If there's actually high demand for HTTP-only environments having a standardized ingestion format that's not /api/v0/import then providing something here seems reasonable.

However, IMO we should have tooling in place that points people down a more correct path (i.e. a libp2p node that spins up a single WSS connection to the endpoint from the pinning API and sends the data over Bitswap/GraphSync).

Additionally, it might be nice if we could allow people to be more efficient by being able to ask the pinning service "which blocks in this CAR file manifest do you already have?" and then only uploading a CAR with the delta of missing blocks. Since this is an optimization it can be done later if it's a pain.
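
The client half of that optimization could be as small as the sketch below. It assumes a hypothetical endpoint has already answered with the set of CIDs the service holds; no such endpoint exists in the Pinning Service API today, and the CIDs are treated as opaque strings.

```python
from typing import Iterable, List, Set


def blocks_to_upload(manifest: Iterable[str], already_stored: Set[str]) -> List[str]:
    """Return manifest CIDs the service lacks, in order, without duplicates."""
    seen: Set[str] = set()
    missing: List[str] = []
    for cid in manifest:
        if cid not in already_stored and cid not in seen:
            seen.add(cid)
            missing.append(cid)
    return missing
```

Only the blocks in the returned list would then go into the delta CAR.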

Contributor @anorth, Apr 23, 2021

Let me add that we should also work towards software like @aschmahmann describes that would support a graphsync upload, outside this proposal, but that would be more friction for the immediate needs.

Contributor

@hsanjuan can you confirm that this item is done as of ipfs-cluster/ipfs-cluster#1343?
So the only thing left in this proposal is the pinning API to cluster? (plus the doc+deploy items below)
Or is there more to do with CAR uploads?

Contributor Author

at the very least i’d expect the CAR upload feature to need to be updated to accept and validate a token in the same way the Pinning API does after the Pinning API lands in cluster.

> @hsanjuan can you confirm that this item is done as of ipfs-cluster/ipfs-cluster#1343?

The item as written is not done. Cluster added CAR-file import to its own REST API which is different than the official Pinning API (which it does not have). When the Pinning API knows how it wants to support CAR file import, it should be easy to re-use the importer that cluster includes now, along with the rest of the Pinning API and the token-based authentication.

Member

Adding "DAG import" endpoint to Pinning Service API is being picked up in ipfs/pinning-services-api-spec#73 (comment) – would appreciate feedback.


It would be great to have ipfs/kubo#6129 as an addition to the HTTP API. It would allow users not using an IPFS node to assess the correct behaviour of the gateway(++).

We also need to do the following to kick off the project.

- Document configuring IPFS Gateway to pass Pinning API requests to a cluster or other IPFS node with the Pinning API enabled.
- Configure and deploy our own gateway with this configuration. (We won’t be handing out auth tokens to anyone who doesn’t already have one, this is just to eat our own dogfood).

#### What does done look like?

Each work item in Phase 1 is something nft.storage needs so we have a builtin "first user" who would have all their requirements satisfied by this proposal.

Once this is deployed you'll be able to write thin client interfaces with only a gateway URL and bearer token as configuration.
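
As a rough picture of such a thin client, the sketch below holds only a gateway URL and a bearer token and builds (without sending) a Pinning Service API `POST /pins` request. The class name, and the assumption that the gateway forwards `/pins` at its root path, are illustrative only and are not defined by this proposal.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinClient:
    gateway_url: str  # the only two pieces of configuration
    token: str

    def pin_request(self, cid: str) -> urllib.request.Request:
        """Build (but do not send) a Pinning Service API POST /pins request."""
        return urllib.request.Request(
            f"{self.gateway_url.rstrip('/')}/pins",
            data=json.dumps({"cid": cid}).encode(),
            headers={
                "Authorization": f"Bearer {self.token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
```

Passing the returned object to `urllib.request.urlopen` would send it; there is no connection state or long-running process to manage.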

#### What does success look like?

Beyond just having satisfied the needs of nft.storage, this should also satisfy the needs that Cloudflare has and we would hopefully see support for IPFS in Cloudflare Workers as a thin
client to their Gateway.

#### Alternatives

There are alternative approaches to building thin clients. The proposals around changing/improving the RPC API could be designed for this purpose,
Member

What specific requirements do you have for those "thin clients"?

I've been discussing "thin clients" with mobile browser and IoT vendors and most of their needs could be accomplished by regular IPFS node with disabled p2p transports and discovery and doing content via CAR import/export via Gateway.

Sounds like the only additional piece here is remote pinning. Perhaps we could identify common needs and spec out a variant of our stack tailored for thin clients? Mobile browsers would really like having this mode as a pre-built preset.

Contributor

> regular IPFS node with disabled p2p transports and discovery and doing content via CAR import/export via Gateway

So what's left in a "regular IPFS node" when you strip out these bits? This description sounds just like what this proposal wants but without the notion of being a "regular IPFS node". But that probably comes back to the problems we have of "IPFS node" being something different for everyone! Has import via the gateway been something already on the table? How has that been imagined so far and is there an alternative here to pulling in the Pinning API to achieve this?

Symmetric use of CAR for import and export would certainly be worth exploring as part of this proposal.

Member

> So what's left in a "regular IPFS node" when you strip out these bits?

Integrity guarantees provided by content addressing (data can be fetched in a trustless manner) and the ability to use IPLD for advanced data structures.

> Has import via the gateway been something already on the table? How has that been imagined so far and is there an alternative here to pulling in the Pinning API to achieve this?

Yes, we are planning to add DAG import/export directly to gateway endpoints (/ipfs/, /ipns/). Longer discussion in ipfs/in-web-browsers#170, but the tl;dr idea is:

  • Improve the concept of a writable gateway to support DAG import via HTTP PUT /ipfs/{cid}
  • IPNS publishing could be as easy as HTTP PUT /ipns/{libp2p-key}

Contributor @aschmahmann, Apr 21, 2021

How many of the environments that we're concerned about are unable to sustain a basic libp2p node that makes a single connection via TCP/WebSockets and really need to just have HTTP?

As mentioned in some of my other comments (https://github.com/protocol/web3-dev-team/pull/100/files#r617675012, https://github.com/protocol/web3-dev-team/pull/100/files#r617641097, https://github.com/protocol/web3-dev-team/pull/100/files#r617641520) we can efficiently use libp2p to transfer IPLD data between two peers as all the transports we support have bidirectional streaming, otherwise we lose efficiency by being unidirectional.

> How many of the environments that we're concerned about are unable to sustain a basic libp2p node

I think it is less about resources and more about "deployment style", more specifically about preferring stateless-ness where possible.

A libp2p node is an active unit, requiring an actively running process, servicing of periodic protocol chatter, etc.

An HTTP client interface on the other hand is completely and utterly "dumb". You could drive such an "http-only ipfs-node" from a bash script, which is decidedly not possible today.

Contributor Author

> How many of the environments that we're concerned about are unable to sustain a basic libp2p node that makes a single connection via TCP/WebSockets and really need to just have HTTP?

Serverless (Lambda), Cloudflare Workers, and mobile devices.

Pretty much all the highest growth application environments have trouble with long running processes and connections and prefer or require a stateless protocol.

Contributor

> have trouble with long running processes and connections and prefer or require a stateless protocol.

Is a long running HTTP upload exempt from this? If not then spinning up a temporary libp2p node shouldn't be very different.

Contributor Author

HTTP upload is not exempt, we can’t push too much data at once. We’re going to have to break up large files by encoding in the client and doing uploads under 100mb to get around CF Worker limits.

Contributor @aschmahmann, May 2, 2021

Cloudflare Workers seem to have support for WebSockets https://blog.cloudflare.com/introducing-websockets-in-workers/ so using libp2p shouldn't be a problem there.

Contributor Author

It has an interface for being a websocket service but there’s no client in CF workers.

but the RPC interface is a larger API (not very "thin") and wasn't designed for multi-tenant. There would probably be **more** work involved in getting
the RPC interface to support these users than this proposal.

#### Counterpoints & pre-mortem

While this maps well to where web developers are today, it's not a "pure p2p" approach to solving problems. We're beefing up the ability to rely on large IPFS nodes that end up
being federated rather than fully decentralized.

Contributor

The proposed approach results in wasted bandwidth (upload from the user and download from the service provider) when some of the data already exists on the service provider.

This pushes developers away from working with modifiable/appendable data structures which is something we have otherwise been encouraging.

Contributor Author @mikeal, Apr 21, 2021

See my comment above regarding “wasted bandwidth.”

We don’t get to determine what data structures people use. NFT developers are already trying to use IPFS and we are not meeting their needs. We may wish they had done something different but we’re past the point of being able to determine their pattern of use.

#### Dependencies/prerequisites

None

#### Impact

*Just* for nft.storage, 7. Overall, a 9 or 10, this should unlock numerous thin clients to be built and will allow developers to leverage a lot more of their existing infrastructure.

#### Confidence

Confidence is a 10 for "nft.storage needs this" but is an 8 or 9 for "this is absolutely the right thin client interface."

## Future opportunities
### New Thin Clients

Rather than thinking of a thin client as the current IPFS API with less code, let’s consider what a smaller overall API profile looks like for this new interface.

The JS library work we’ve been doing for the last year could be leveraged for a much smaller and higher impact JS library built against the Gateway++ interface.

Since all writes that pass through the CAR file interface are transactional we now have availability guarantees for all of the writes.

We can do file encoding in the client (we’re going to have to write this anyway for [nft.storage](http://nft.storage) in order to get over the 100MB Cloudflare Worker limit) using the latest codecs.
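
A client-side encoder working around a per-request size limit would need some way to group encoded blocks into separate uploads. The sketch below shows one greedy approach under stated assumptions: blocks are `(cid, data)` pairs with placeholder CID strings, the limit is passed in by the caller, and CAR header and length-prefix overhead is ignored, so a real encoder would need some headroom below the limit.

```python
from typing import Iterable, List, Tuple

Block = Tuple[str, bytes]  # (cid, data); CID strings are placeholders here


def batch_blocks(blocks: Iterable[Block], limit: int) -> List[List[Block]]:
    """Greedily group blocks so each batch's payload stays within limit bytes."""
    batches: List[List[Block]] = []
    current: List[Block] = []
    size = 0
    for cid, data in blocks:
        if len(data) > limit:
            raise ValueError(f"block {cid} alone exceeds the limit")
        if current and size + len(data) > limit:
            batches.append(current)  # close the full batch, start a new one
            current, size = [], 0
        current.append((cid, data))
        size += len(data)
    if current:
        batches.append(current)
    return batches
```

Each batch could then be written out as its own CAR file and uploaded in a separate request.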

Finally, since the protocol is so simple we should expect to see several thin clients designed to meet a variety of user needs and integrations.

### Content Routing for Large Providers

Gateways and large providers need to be directly peered since large providers have too much content to provide in the DHT.
Member

Did we explore whether a provider strategy that only announces file root blocks would improve things for big providers?

Most of the data is unixfs, and most of the announced blocks could be skipped. Only file roots matter in practice.

Contributor Author

It’ll help, but we’re throwing incremental improvements at an exponential problem. nft.storage will have too many CIDs to keep in the DHT by the end of the month even with roots only and the improvements Adin made that haven’t been released.

We’ve come to the same conclusion other large providers like Pinata came to, we can’t support the DHT with this much content.

But this is going to work out because content discovery has always been about more than just the DHT. We should work on a protocol for a federation of large providers to use and continue to improve the DHT for a larger network of more nodes with smaller amounts of content per node.

Member

Ack. I wonder if we could leverage DNS hints here.
We discussed having websites and gateways announce their own addrs to enable clients to preconnect and skip the DHT step: ipfs/kubo#6516


As a stopgap we’re maintaining a list of peers all large providers and gateways should peer to and encouraging large providers to add themselves to it. This is not a viable long term solution.

We need “BGP for Content Routing,” which is to say that we need a federated protocol for efficient content routing in a network of directly peered large providers.

### GraphQL

It's @gozala’s idea to get GraphQL in the Gateway but there has been GraphQL IPLD stuff happening for about a year now.

If we want to reach a lot of developers quickly we should consider ways that we could expose GraphQL access to IPLD data through the Gateway using GraphQL’s standard HTTP protocol.

The amount of tooling that already exists for GraphQL is quite large so this would allow for a number of high impact integrations.

## Required resources

The first phase of Gateway++ is primarily work in Cluster and Infra, so I'll defer to that team to estimate the resources required. Hector has already made some progress on adding CAR file input to cluster so it would be great if he could continue with this work as well.