From 2615c4855b7f2272fc682afc8b69a5bba497a3ff Mon Sep 17 00:00:00 2001
From: Alexander von Gluck IV
Date: Mon, 3 May 2021 16:58:32 -0500
Subject: [PATCH] blog: Add IPFS experiment post

---
 .../kallisti5/2021-05-03_ipfs_experiment.md | 95 +++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 content/blog/kallisti5/2021-05-03_ipfs_experiment.md

diff --git a/content/blog/kallisti5/2021-05-03_ipfs_experiment.md b/content/blog/kallisti5/2021-05-03_ipfs_experiment.md
new file mode 100644
index 000000000..825812b04
--- /dev/null
+++ b/content/blog/kallisti5/2021-05-03_ipfs_experiment.md
@@ -0,0 +1,95 @@
++++
+type = "blog"
+title = "Haiku's CDN, an IPFS Experiment"
+author = "kallisti5"
+date = "2021-05-03 15:50:05-05:00"
+tags = ["haiku", "software"]
++++
+
+Hello! I'm Alex, a member of our systems administration team and on the Haiku, Inc. board of directors. I've been playing with moving our repositories over to IPFS, and wanted to collect some user feedback.
+
+## First a little history
+
+With the addition of package management in 2013, the amount of data Haiku has to manage has been growing.
+
+In ~2018 I moved our Haiku package repositories (and nightly images, and release images) to S3 object storage. This helped to reduce the large amount of data we were lugging around on our core infrastructure, and offloaded it onto an externally managed service that we could manage programmatically. All of our CI/CD could securely and programmatically build artifacts into these S3 buckets. We found a great vendor which let us host a lot of data with unlimited egress bandwidth for an amazing price. This worked great through 2021; however, the vendor recently began walking back their "unlimited egress bandwidth" position. Last week they shut down our buckets, resulting in a repo + nightly outage of ~24 hours while we negotiated with their support team.
+
+## The problem
+
+In these S3 buckets, we host around 1 TiB of data, and have between 2 and 3 TiB of egress data monthly. We also have around 4 TiB of egress data on our servers.
+
+***This is almost 8 TiB of bandwidth a month***
+
+We have a large number of wonderful organizations and individuals offering to mirror our package repositories and artifacts via rsync (the de facto way to mirror large amounts of data in the open source world)... however, we have one major issue which historically has prevented us from taking people up on these offers for anything except release images. Haiku's package management kit doesn't have any kind of built-in signature checking of packages. While our CI/CD system **does** sign Haiku nightly images, releases, and repositories with minisig (and haikuports buildmaster could be extended to do the same), our package management tools today perform no verification of packages or repositories at all.
+
+This means a malicious actor could add tainted packages to a mirror, regenerate the repository file (which contains checksums of each package), and redistribute "bad things" to every Haiku user using the mirror.
+
+Is this likely to happen? No. Is it possible? Yes.
+
+## The solution
+
+In steps IPFS (InterPlanetary File System). In mid-2020, I (quietly) set up http://ipfs.haiku-os.org as an IPFS mirror of our release images.
+
+You can access this on any public IPFS gateway...
+
+* https://ipfs.io/ipns/ipfs.haiku-os.org
+* https://cloudflare-ipfs.com/ipns/ipfs.haiku-os.org
+
+The official description states *"A peer-to-peer hypermedia protocol designed to make the web faster, safer, and more open."* In slightly more technical terms, IPFS is essentially a network of peer-to-peer systems exchanging chunks of data based on a hash of the data within the chunk. (Think BitTorrent, where every seed is also an HTTP gateway, and you're 1/4 of the way there.) A great overview is available on their [website](https://ipfs.io/#how) (which is also hosted on IPFS).
+
+**Essentially:**
+
+* We add repositories and artifacts to our "central" IPFS node (a Raspberry Pi 4 on my home network today)
+* We update /ipns/hpkg.haiku-os.org (using our private key) to point to the latest "hash" of our repositories
+* If you want to help host our repositories and artifacts, you *pin* /ipns/hpkg.haiku-os.org nightly, weekly, etc.
+* People mirroring our repositories don't need a "static IP"; they only need to be able to expose port 4001 to the internet from their instance of IPFS
+* Users can access our repositories, artifacts, etc. on **any** public IPFS gateway node.
+  * Gateway nodes "pull in all of the chunks" from all of the users pinning /ipns/hpkg.haiku-os.org when requested, and serve them to users
+* Haiku hosts a few dedicated gateway nodes globally. These act as regional gateways to the "closest people hosting artifacts"
+
+**Out of this we get:**
+
+* Anyone can mirror Haiku (not just those with large amounts of free bandwidth or static IP addresses)
+* We no longer have to worry about country IP blocks
+  * Russia blocks our DigitalOcean VMs, for example.
+  * Throw up some Russian gateways and have a few folks pin the data to mirror it.
+* We get transparent deduplication
+  * The repo today is ~140 GiB of data, and takes ~95 GiB on disk when mirrored.
+* We get transparent repo signatures
+  * As long as you trust the gateway node, the data is secure
+* Users can just "access data", or mirror everything locally for a "hyper fast" software repository.
+
+Cloudflare has been an early adopter and offers a free public IPFS gateway (with SSL): [cloudflare-ipfs.com](https://cloudflare-ipfs.com)
+
+## The downsides
+
+IPFS *is* a new technology, and there are a lot of pointy bits.
+
+* Everyone mirroring has to manually and regularly "repin" the data to pick up the latest repository chunks
+* If few people are pinning the latest data, initial lookups can be a bit slow (3-4 minutes)
+* IPFS has a steep learning curve for anyone mirroring. It takes time to figure out how to do what.
+* IPFS (the application) does have bugs. I've run into several.
+
+## Summary
+
+I have no idea if this will work.
+The idea is great since it fixes "pretty much all" of our content distribution issues.
+
+* We empower more tech-savvy individuals to leverage IPFS locally, while still offering "turn key" access to our software
+* We decouple the "large amount of storage" from the "large amount of bandwidth", which makes finding reasonable hosting solutions easier
+* We enable getting Haiku's software into restrictive geographic regions
+* It has built-in signature checking, ensuring some level of security
+* It has deduplication built in, saving space
+
+Time will tell if the implementation is viable and reliable enough. In the short term, our current more traditional repositories
+are not going away as long as we can continue to host data from our S3 buckets. I'm hopeful we can get enough
+people playing with the new system to reduce S3 bandwidth and give us some time to investigate this alternative path.
+
+A few people have mentioned adding native IPFS support to pkgman... this would enable Haiku to obtain updates
+directly from a peer-to-peer network. That seems like an awesome potential future.
+
+## What *you* can do
+
+* Learn about IPFS
+* [Try to pin Haiku's repositories at locations that don't have bandwidth caps](https://github.com/haiku/infrastructure/blob/master/docs/ipfs-pinning.md)
+* Provide feedback below
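
As a rough illustration of the content addressing described under "The solution" (chunks exchanged by the hash of their contents), here is a minimal Python sketch. It is not how IPFS is implemented: the `ChunkStore` class, the 4-byte `CHUNK_SIZE`, and the use of bare SHA-256 hex digests as keys are all assumptions made for this example, and real IPFS uses far larger chunks and its own content identifier format. The sketch only shows why identical chunks end up stored once and why a modified chunk can be detected.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical value, tiny on purpose so the demo is easy to follow


class ChunkStore:
    """Toy content-addressed store (illustrative only, not the IPFS API)."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Split data into chunks and store each under the SHA-256 of its bytes."""
        keys = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks[key] = chunk  # identical chunks share a single entry
            keys.append(key)
        return keys

    def get(self, keys: List[str]) -> bytes:
        """Reassemble data, re-hashing every chunk to detect tampering."""
        out = bytearray()
        for key in keys:
            chunk = self.chunks[key]
            if hashlib.sha256(chunk).hexdigest() != key:
                raise ValueError("chunk does not match its hash: " + key)
            out += chunk
        return bytes(out)


store = ChunkStore()
keys = store.put(b"abcdabcdefgh")  # chunks: "abcd", "abcd", "efgh"
restored = store.get(keys)
```

In this run three chunks are referenced but only two are stored, which is the same kind of saving the post reports for the repo (~140 GiB of data, ~95 GiB on disk). If a stored chunk is overwritten, `get` raises `ValueError` because the bytes no longer match their key.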