PoC: TAP 19 - Add support for Content Addressable Systems like IPFS in TUF #2415

@shubham4443 shubham4443 commented Jun 27, 2023

Fixes #2325

Description of the changes being introduced by the pull request:
This PR is part of a GSoC'23 project. It introduces support for content addressable systems like IPFS in TUF. A detailed implementation document can be found here.

To test with an actual content addressable system, I have created a sample repository which contains metadata with an IPFS target. Simply run the following commands to download the target file.

# initialize the client with Trust-On-First-Use
./client --url https://shubham4443.github.io/tuf-ipfs tofu

# Then download example file from the IPFS:
./client --url https://shubham4443.github.io/tuf-ipfs download ipfs:QmSFEbC6Y17cdti7damkjoqESWftkyfSXjdKDQqnf4ECV7

The PR is in draft status. The hashes and length fields in targets.json are redundant for content addressable systems and need to be removed.

cc: @adityasaky @Ericson2314 @mnm678

Please verify and check that the pull request fulfills the following requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

Signed-off-by: Shubham Nazare <shubham4443@gmail.com>
@trishankatdatadog
Member

Quite an elegant implementation: I like it!

@jku
Member

jku commented Jun 28, 2023

This looks neat. A couple of quick comments for now (I hope to have time to review this next week):

# initialize the client with Trust-On-First-Use
./client --url https://shubham4443.github.io/python-tuf tofu

# Then download example file from the IPFS:
./client --url https://shubham4443.github.io/python-tuf download ipfs:QmSFEbC6Y17cdti7damkjoqESWftkyfSXjdKDQqnf4ECV7

I think the url should be https://shubham4443.github.io/tuf-ipfs...
With that url the client initialization works here but the download does not:

INFO:tuf.api.metadata:Key 889515d9d623948c081b180b7b48a2a8a269803ca1975fb917af66512f33a76c failed to verify targets

That sounds a bit strange, likely isn't related to your patch as such... Does this repository really work for you (if you wipe your local metadata cache first to ensure you really get the same files)?

I have a question as well: I have never used IPFS and don't really know how it works. What additional software is this code expecting me to run?

    ipfs_gateway_url = 'http://127.0.0.1:8081/ipfs/'
    
    ...
    
    file_url = self.ipfs_gateway_url + self.cid
    response = requests.get(file_url, timeout=5)

@shubham4443
Author

Thanks for correcting the url!

The code requires an IPFS daemon, which can be installed from https://ipfs.tech/#install. The code uses a private gateway to fetch the IPFS content. Public gateways can also be used and do not require anything to be installed, but they are sometimes slow. How we should use these gateways to download files is yet to be discussed with my mentors.
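
To make the private-versus-public gateway trade-off concrete, here is a minimal sketch of a fallback, not what the PR implements. The local gateway URL is the default quoted earlier in this thread; the public gateway, the function name `fetch_cid`, and the injected `fetch` callable are assumptions made for illustration.

```python
from typing import Callable, Iterable

# The local URL mirrors the default quoted in this thread; the public
# gateway is only an illustrative choice (assumption).
LOCAL_GATEWAY = "http://127.0.0.1:8081/ipfs/"
PUBLIC_GATEWAY = "https://ipfs.io/ipfs/"


def fetch_cid(
    cid: str,
    fetch: Callable[[str], bytes],
    gateways: Iterable[str] = (LOCAL_GATEWAY, PUBLIC_GATEWAY),
) -> bytes:
    """Return the content for ``cid`` from the first gateway that answers.

    ``fetch`` is a hypothetical callable taking a URL and returning the
    response body; it is expected to raise OSError when a gateway fails.
    """
    errors = []
    for base in gateways:
        url = base + cid
        try:
            return fetch(url)
        except OSError as exc:
            # Remember the failure and try the next gateway in order.
            errors.append(f"{url}: {exc}")
    raise OSError("all gateways failed: " + "; ".join(errors))
```

A `fetch` built on `requests.get(url, timeout=5)` would also need to call `raise_for_status()` so that HTTP error responses count as failures; the exceptions raised by requests derive from `OSError`, so the fallback loop would catch them.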

@jku
Member

jku commented Jun 28, 2023

ipfs_gateway_url = 'http://127.0.0.1:8081/ipfs/'

I think I got it now, it's expecting that there is an ipfs application running on localhost that serves as an HTTP-to-IPFS proxy: https://daniel.haxx.se/blog/2022/08/10/ipfs-and-their-gateways/

Maybe this has advantages that I don't see or is something IPFS users know to expect... but I think this is not a great idea for a client library. It's basically an undocumented runtime dependency on a webservice. Even if the URL was not hard coded it feels wrong.

This may be a stupid question and IPFS just does not work like that but ... Is there no self-contained IPFS python module we could depend on?

@shubham4443
Author

@jku There seems to be no actively maintained IPFS python module (ref: https://discuss.ipfs.tech/t/why-there-is-no-python-working-library-to-work-with-ipfs/15871). However, I can see some recent activity in https://github.com/ipfs-shipyard/py-ipfs-http-client (see: ipfs-shipyard/py-ipfs-http-client#316)

@jku
Member

jku commented Jul 3, 2023

I've taken a closer look now:

  • I still think this is cool, nice work
  • However I have a feeling 95% of this does not have to live within the python-tuf codebase:
    • most importantly I get the feeling that client application developers must be aware that they are using an IPFS-enabled TUF repository because of the gateway setup requirement. In other words, an existing TUF repository cannot just start using IPFS because their downloading applications would effectively break since no-one has an IPFS gateway running. As a result, it's not an issue to ask downloader application developers to use an IPFS/CAS-specific module instead of a generic tuf library
    • enabling a feature like this in a generic tuf library based on purely the targetpath seems wrong (sounds like it would require a spec change), confirming that a generic TUF client application should not enable it without explicit developer choice -- again making the case that a separate module would be fine
    • As the TAP explains, TUF is no longer in control of verifying the artifact integrity. If this is the case, forcing the download to happen through a generic TUF library seems like an abstraction for the sake of abstraction. python-tuf has reasons to manage the http download case: none of those reasons apply here
    • As a technical detail, the adapter is a TargetFile field... but that is not actually necessary: The adapter could be constructed on demand by the download or cache lookup code and not stored in the TargetFile. This way there is no requirement to modify the get_targetinfo() code path at all -- all changes are in download and artifact cache lookup.
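
As a rough illustration of the last two points (constructing the adapter on demand, and only when the application opts in), a sketch could look like the following. `IPFSAdapter`, `adapter_for`, and the `enable_ipfs` flag are hypothetical names and are not the classes from this PR; the gateway URL is the default quoted earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional

IPFS_PREFIX = "ipfs:"  # prefix used by the sample repository's target paths


@dataclass(frozen=True)
class IPFSAdapter:
    """Hypothetical adapter, built when needed instead of stored in TargetFile."""

    cid: str
    gateway_url: str = "http://127.0.0.1:8081/ipfs/"

    def url(self) -> str:
        # Gateway URL the download code would request for this CID.
        return self.gateway_url + self.cid


def adapter_for(target_path: str, enable_ipfs: bool = False) -> Optional[IPFSAdapter]:
    """Return an adapter only if the application explicitly enabled IPFS."""
    if enable_ipfs and target_path.startswith(IPFS_PREFIX):
        return IPFSAdapter(cid=target_path[len(IPFS_PREFIX):])
    # Not an IPFS target, or the feature is off: use the normal code path.
    return None
```

With this shape nothing in the targetinfo lookup changes; only the download and cache lookup code would call `adapter_for()`.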

My hand wavy suggestion would be to

  • build an independent library or application on top of python-tuf (e.g. python-tuf-ipfs for specifically this use case or python-tuf-cas if you think the abstraction holds for all CAS)
  • You could even derive from tuf.ngclient.Updater if you want to: for get_targetinfo() you'd just use the underlying ngclient implementation and for download_target() and find_cached_target() you would use your own code
  • If there are potential improvements to metadata API (TargetFile most importantly) that would make this work better, let's discuss them
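
A minimal sketch of the subclassing idea, under stated assumptions: `BaseUpdater` below is only a stand-in so the example runs without python-tuf installed, and it mirrors just the three method names mentioned above. `IPFSUpdater`, its constructor arguments, and the injected `fetch_cid` callable are hypothetical.

```python
import os
from types import SimpleNamespace
from typing import Callable, Optional


class BaseUpdater:
    """Stand-in for tuf.ngclient.Updater (assumption, not the real class)."""

    def get_targetinfo(self, target_path: str):
        # The real method returns verified target metadata; this stub
        # only echoes the path so the sketch is self-contained.
        return SimpleNamespace(path=target_path)

    def find_cached_target(self, targetinfo, filepath: Optional[str] = None):
        raise NotImplementedError

    def download_target(self, targetinfo, filepath=None, target_base_url=None):
        raise NotImplementedError


class IPFSUpdater(BaseUpdater):
    """Hypothetical subclass: metadata lookup inherited, artifact I/O replaced."""

    def __init__(self, target_dir: str, fetch_cid: Callable[[str], bytes]):
        self._target_dir = target_dir
        self._fetch_cid = fetch_cid  # e.g. a gateway client chosen by the app

    @staticmethod
    def _cid(targetinfo) -> str:
        # Strip the "ipfs:" prefix used in the sample repository's paths.
        path = targetinfo.path
        return path[len("ipfs:"):] if path.startswith("ipfs:") else path

    def find_cached_target(self, targetinfo, filepath=None):
        path = filepath or os.path.join(self._target_dir, self._cid(targetinfo))
        return path if os.path.exists(path) else None

    def download_target(self, targetinfo, filepath=None, target_base_url=None):
        path = filepath or os.path.join(self._target_dir, self._cid(targetinfo))
        with open(path, "wb") as f:
            f.write(self._fetch_cid(self._cid(targetinfo)))
        return path
```

Note that this sketch skips any integrity check on the downloaded bytes; as the TAP discussion above says, that responsibility would sit with the content addressable layer.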

I'm available for a meeting (in EEST office hours) if the above sounds like I'm not making sense or there's a significant disagreement: like I said elsewhere, CAS and IPFS are not something I'm familiar with so I could be making assumptions...

@Ericson2314

@jku So on one hand, I think it should be possible to support both HTTP and IPFS downloading, and then the client can choose. Allowing the server side to easily do both (and possibly more content addressable stores) would be

On the other hand, I am very interested in trying to IPFS-ify the metadata itself as a follow-up task, and for that I think further differences from the way things are done today are desirable. E.g. I think explicit/TUF-level snapshots are no longer needed if the root always signs a single Merkle DAG containing everything --- consistency is effectively delegated to the IPFS layer below. More divergences of course mean less to leverage for python-tuf anyways.

@adityasaky
Collaborator

@jku I largely agree with your points. We've discussed having this be a standalone application (though we were also considering whether -ipfs and others could be plugins for python-tuf). @shubham4443 is updating this implementation based on the discussion in the last meeting to better use IPFS CIDs and include semantic information for each artifact (i.e., foo-1.0.0.tar.zst: <IPFS_CID>). We'll discuss how it'd function as a standalone application after that's done.

enabling a feature like this in a generic tuf library based on purely the targetpath seems wrong (sounds like it would require a spec change), confirming that a generic TUF client application should not enable it without explicit developer choice -- again making the case that a separate module would be fine

Could you elaborate so I'm not misunderstanding it? We do have this change proposed in the TAP for CAS, do you mean this can't be included until the TAP is approved and text merged into the spec? Also, a generic TUF client cannot choose the CAS, it must first be enabled on the server side by updating the metadata. A generic client that does not support the CAS at this point will of course fail.

E.g. I think explicit/TUF-level snapshots are no longer needed if the root always signs a single Merkle DAG containing everything --- consistency is effectively delegated to the IPFS layer below. More divergences of course mean less to leverage for python-tuf anyways.

@Ericson2314 I'm not sure if this is practical, though it depends on "root" in your message. Do you mean we remove the snapshot role and have the timestamp role identify the IPFS root node that contains the current set of all TUF metadata?

@jku
Member

jku commented Jul 4, 2023

enabling a feature like this in a generic tuf library based on purely the targetpath seems wrong (sounds like it would require a spec change), confirming that a generic TUF client application should not enable it without explicit developer choice -- again making the case that a separate module would be fine

Could you elaborate so I'm not misunderstanding it? We do have this change proposed in the TAP for CAS, do you mean this can't be included until the TAP is approved and text merged into the spec? Also, a generic TUF client cannot choose the CAS, it must first be enabled on the server side by updating the metadata. A generic client that does not support the CAS at this point will of course fail.

Today a TUF client that sees targetpath "ipfs:abcdef" will use that string to build a URL and will try to download that URL with HTTP. With this PR, the client would not do that and would instead connect to an ipfs gateway on localhost. It looks like a change in functionality that client apps should explicitly enable... Maybe this is not very important in practice but it has a bit of a smell.
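
To illustrate the current behaviour described here, a simplified sketch of the URL construction follows. It is not python-tuf's actual code: the function name is hypothetical, the base URL is a placeholder, and details such as hash-prefixed file names are ignored.

```python
def legacy_target_url(target_base_url: str, target_path: str) -> str:
    """Join the target base URL and the target path as a plain HTTP URL."""
    # Make sure there is exactly one slash between base URL and target path.
    if not target_base_url.endswith("/"):
        target_base_url += "/"
    return target_base_url + target_path


# The "ipfs:" prefix has no special meaning here; it is just part of the path.
print(legacy_target_url("https://example.com/targets", "ipfs:abcdef"))
```

A client without IPFS support would therefore request that HTTP URL and fail if the repository does not serve a file there.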

The more important point was in the previous paragraph: No existing repository can just start using IPFS targetpaths because the clients would just stop working even if the TUF library had the IPFS feature (since approximately no-one runs an IPFS gateway).

@jku
Member

jku commented Jul 4, 2023

@shubham4443 the discussion is a bit removed from your PR, sorry about that. I know this is still marked as a draft and I feel a bit bad about drowning the PR with these comments: if you'd rather continue in peace and quiet just say so, we can find another place to have these talks.

On the other hand, I am very interested in trying to IPFS-ify the metadata itself as a follow-up task, and for that I think further differences from the way things are done today are desirable.

Yeah I have seen that mentioned but never expanded on: that would indeed change the equation... but I think that idea is quite far from the current PR and I'm not too keen on letting the existence of that idea affect the decision made here. To be more specific:

  • TUF as a design is based on versioning things which seems like the exact opposite of content addressing so the two implementations likely have entirely different sets of problems to solve... As an example, the way TUF root version updates are implemented seems fairly incompatible with the core concept of CAS
  • a TUF implementation using IPFS-metadata and merkle tree snapshots sounds very cool. I'm not opposed to building one within the python-tuf project, but I don't think the resulting client will look a lot like python-tuf's ngclient, so the question would be... why modify ngclient instead of just starting a new client?

If this PR should be viewed more as ground work for metadata-over-IPFS then I think I'd like to see at least a brief design doc for that.

@Ericson2314

Ericson2314 commented Jul 4, 2023

@jku So the GSOC as formally proposed is just about targets to keep the scope manageable.

I personally am most interested in IPFS metadata (+ targets) and the conceptual refactoring that comes with it --- as I think that's when we cross the threshold from "adding misc features" to "reducing complexity via separating concerns" --- but I don't want to speak for the others.

I wanted to include where things might go next to give you additional context, but yes the location of this work should probably be decided mostly/entirely based on the scope of the GSOC.

@jku
Member

jku commented Sep 6, 2023

Closing since this now lives in https://github.com/theupdateframework/tap19-ipfs-poc

@jku jku closed this Sep 6, 2023
Successfully merging this pull request may close these issues.

Prototype support for content addressable systems such as IPFS
5 participants