
S3 datastore support #1261

Closed
wants to merge 47 commits into from

Conversation

tv42
Contributor

@tv42 tv42 commented May 20, 2015

For your reviewing pleasure; do not merge yet. Sorry for the noise in the pull request: this is on top of the pin branch, so start reading from the most recent commit with "pin:" in the commit message. Once pinning is fully merged, I'll rebase this and ask for a merge, but for now, living on top of that branch causes less noise in the datastore.

Note that S3 object names are hex-encoded, because datastore keys can be binary. That means S3 objects with names like 2f626c6f636b732f122014d8c7e2382fe98c052b96b7a86daa9b176fde6324cbfdd37314617dc1058a7d.
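A minimal sketch of that encoding: hex-encoding is standard-library functionality, and the `s3Key` helper name here is hypothetical, not code from this PR.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Datastore keys may contain arbitrary bytes, so they are
// hex-encoded before being used as S3 object names.
// s3Key is a hypothetical helper, not from this PR.
func s3Key(dsKey []byte) string {
	return hex.EncodeToString(dsKey)
}

func main() {
	// A "/blocks/..." key encodes to "2f626c6f636b732f...",
	// matching the example object name above.
	fmt.Println(s3Key([]byte("/blocks/x")))
}
```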

@jbenet jbenet added the status/in-progress In progress label May 20, 2015
@whyrusleeping
Member

@jbenet dont you dare git push each this one, i want to run other tests this month.

@tv42 tv42 mentioned this pull request May 22, 2015
@whyrusleeping
Member

@tv42 do you have any thoughts on query for the s3datastore? That's gonna be the hard part, I think.

Also, how hard would it be to run a local 's3' server for testing this code?

Most of this looks good so far.

@tv42
Contributor Author

tv42 commented May 22, 2015

@whyrusleeping I don't expect S3 to bring any special difficulties to Query. I do want to whack the Query API with a mallet; whether that happens as part of the continuation of this or not depends on whether I come up with a good enough replacement API soon enough. If not, s3 Query will look a lot like flatfs Query, except it could support more (unused) features.

I'm not aware of any "tiny S3 clones" worth recommending, and didn't want to start writing one on a whim. Making a true S3 clone is ridiculously hard (been there: https://github.com/ceph/s3-tests/ ), though this code uses a small enough fraction of the features that httptest.Server might be plenty.

Also, remember that it's currently only 70 lines of code, plus 83 lines to configure that (most of which validate the JSON, for better error reporting), and maybe 60-100 LoC for a simple Query implementation. And it's pretty much all very simple code.

Perhaps some of the sharness stuff could be run in a special mode, where they talk to actual AWS S3?

@whyrusleeping
Member

Perhaps some of the sharness stuff could be run in a special mode, where they talk to actual AWS S3?

Maybe... that's probably okay to do if we make sure they only run when a given env var (or set of env vars) is set

@whyrusleeping whyrusleeping added codereview and removed status/in-progress In progress labels May 31, 2015
// All values of DSOpener must have a JSON representation that is
// semantically equivalent to the Datastore section of the config
// file that they were instantiated by, including the "Type" key.
type DSOpener interface {
Member


good idea 👍

@jbenet
Member

jbenet commented Jun 7, 2015

@tv42 this PR is great. +1 everything LGTM. sorry for the huge delay on my end.

I do want to whack the Query API with a mallet; whether that happens as part of the continuation of this or not depends on whether I come up with a good enough replacement API soon enough. If not, s3 Query will look a lot like flatfs Query, except it could support more (unused) features.

Either sounds good to me.

though this code uses a small enough of a fraction of features that httptest.Server might be plenty enough.

Yeah, I think this works for normal tests. And there are many S3 mocks out there that already implement the exact API for us (not that it's complicated...).

Perhaps some of the sharness stuff could be run in a special mode, where they talk to actual AWS S3?

Maybe... that's probably okay to do if we make sure they only run when a given env var (or set of env vars) is set

Yeah, we can have a special set of sharness tests that can be run against S3. I wouldn't want to put this on CI for all PRs. (We could make a simple side repo with the right config that pulls go-ipfs and tests on S3, and make sure to pull it into this repo before merging to stable. We'd need a way to keep credentials private or whatever.)
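The env-var gating discussed above might look like the following shell sketch; the `TEST_S3_BUCKET` variable name is an assumption, not from this PR.

```shell
#!/bin/sh
# Hypothetical gate: only run S3 tests when a bucket and AWS
# credentials are configured in the environment.
s3_tests_enabled() {
	test -n "$TEST_S3_BUCKET" &&
	test -n "$AWS_ACCESS_KEY_ID" &&
	test -n "$AWS_SECRET_ACCESS_KEY"
}

if s3_tests_enabled; then
	echo "running S3 tests against $TEST_S3_BUCKET"
	# ... actual sharness tests would go here ...
else
	echo "skipping S3 tests: env vars not set"
fi
```

A real sharness test would express the skip via `skip_all` and `test_done`, but the gating condition is the same.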


@whyrusleeping

@jbenet dont you dare git push each this one, i want to run other tests this month.

@GitCop

GitCop commented Jul 15, 2015

There were the following issues with your Pull Request

  • Commit: ef34768
    • Invalid signoff. Commit message must end with License: MIT
      Signed-off-by: .* <.*>

Guidelines are available to help. Your feedback on GitCop is welcome on this issue


This message was auto-generated by https://gitcop.com


@tv42
Contributor Author

tv42 commented Jul 15, 2015

Gitcop is wrong (ipfs/infra#46), and my commits have been amended to have the right trailer.

Unit tests pass, sharness passes.

This is on top of dev0.4.0 (with a fix for a random bug I found in dev0.4.0) and ready to roll! Paging @jbenet @whyrusleeping

@tv42 tv42 changed the title S3 datastore support (for review, don't merge yet) S3 datastore support Jul 15, 2015
@tv42
Contributor Author

tv42 commented Jul 15, 2015

I have no idea about those circleci & travis errors in https://circleci.com/gh/ipfs/go-ipfs/640 https://travis-ci.org/ipfs/go-ipfs/jobs/71014668 but none of that happens locally and none of it seems to be related to these changes.

tv42 added 11 commits July 14, 2015 21:40
This used to lead to large refcount numbers, causing Flush to create a
lot of IPFS objects, and merkledag to consume tens of gigabytes of
RAM.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
OS X sed is documented as "-i SUFFIX", GNU sed as "-iSUFFIX". The one
consistent case seems to be "-iSUFFIX", where the suffix cannot be empty
(or OS X will parse the next argument as the suffix).

This used to leave around files named `refsout=` on Linux, and was
just confusing.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
These secondary copies were never actually queried, and didn't contain
the indirect refcounts so they couldn't become the authoritative
source anyway as is. New goal is to move pinning into IPFS objects.

A migration will be needed to remove the old data from the datastore.
This can happen at any time after this commit.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
Pinner had method GetManual that returned a ManualPinner, so every
Pinner had to implement ManualPinner anyway.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
Platform-dependent behavior is not nice, and negative refcounts are
not very useful.

License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
License: MIT
Signed-off-by: Jeromy <jeromyj@gmail.com>
@GitCop

GitCop commented Jul 15, 2015

There were the following issues with your Pull Request

  • Commit: 82afdb1
    • Invalid signoff. Commit message must end with License: MIT
      Signed-off-by: .* <.*>

Guidelines are available to help. Your feedback on GitCop is welcome on this issue


This message was auto-generated by https://gitcop.com

@tv42
Contributor Author

tv42 commented Jul 15, 2015

Rebased again to get on top of latest dev0.4.0.

tv42 added 13 commits July 15, 2015 08:36
The generated file went through some changes because of differing
go-bindata versions.

License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
This gives us a clean slate for the new code, avoiding leftovers.

License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
Earlier, it also checked the leveldb directory. That part added no
crash safety to the application, and just hardcoded assumptions about
the datastore.

If anything, this should rely on the absolute last item created by
fsrepo.Init, and there should be fsync guarantees about ordering.

License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
…m for S3

License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
…"private"

License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
To test it, set up an S3 bucket (in an AWS region that is not US
Standard, for read-after-write consistency), run `ipfs init`, then
edit `~/.ipfs/config` to say

      "Datastore": {
        "Type": "s3",
        "Region": "us-west-1",
        "Bucket": "mahbukkit",
        "ACL": "private"
      },

with the right values. Set `AWS_ACCESS_KEY_ID` and
`AWS_SECRET_ACCESS_KEY` in the environment and you should be able to
run `ipfs add` and `ipfs cat` and see the bucket be populated.

No automated tests exist, unfortunately. S3 is thorny to simulate.

License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>
License: MIT
Signed-off-by: Tommi Virtanen <tv@eagain.net>

@tv42 tv42 mentioned this pull request Jul 16, 2015
@tv42
Contributor Author

tv42 commented Jul 16, 2015

Closing in favor of #1488 because that one targets dev0.4.0. This one is from before the dev branch existed.

4 participants