
[WIP] Badger datastore #4007

Merged 11 commits into master from feat/badger-ds on Sep 8, 2017

Conversation

@magik6k (Member) commented Jun 24, 2017

Depends on:

TODO:

  • Put badger/badger-ds into gx

@Kubuxu added the status/in-progress label on Jun 24, 2017
@magik6k changed the base branch from feat/datastore-configs to master on June 24, 2017 22:17
p = filepath.Join(r.path, p)
}

os.MkdirAll(p, 0755) //TODO: find better way
Member:

i would check errors here
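
For illustration, a minimal sketch of the kind of fix being suggested here (the helper name and error wrapping are mine, not the code that landed):

```go
import (
	"fmt"
	"os"
)

// ensureDir is a hypothetical helper: create the directory and
// propagate the error instead of discarding it.
func ensureDir(p string) error {
	if err := os.MkdirAll(p, 0755); err != nil {
		return fmt.Errorf("failed to create directory %s: %s", p, err)
	}
	return nil
}
```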

Member Author:

This was unpushed for some reason.

@Kubuxu (Member) commented Jun 27, 2017

Please rebase.

@whyrusleeping (Member)

Some initial numbers, adding all of the ipfs dists (~2GB):

badgerds: 19.88 seconds, 9% cpu
flatfs(sync): 109.46 seconds, 2% cpu
flatfs(nosync): 11.91 seconds, 12% cpu

So badger is significantly faster than flatfs, and comparable to flatfs-nosync, while still actually syncing data to disk. This is really great stuff :)

@whyrusleeping (Member)

The downside is that operations that query the datastore (ipfs repo gc, ipfs refs local, etc.) are much slower.

@whyrusleeping (Member)

Also, for context:

sha256 hashing all the files from the above experiment took 5.43 seconds, and copying them all to another directory (same disk) took 0.80 seconds.

@Kubuxu (Member) commented Jun 29, 2017

It seems to me like badger isn't syncing.

@Kubuxu (Member) commented Jun 29, 2017

Can you try running the sync test and doing a manual sync later?

@whyrusleeping (Member)

Some more benchmark numbers, adding ipfs dists again:

sha256sum everything: 0:05.62
add with badgerds: 0:17.06
add with flatfs: 0:34.58
filestore add w/badger: 0:13.14
filestore add w/flatfs: 0:18.14
add badger+blake2b: 0:11.92
add flatfs+blake2b: 0:33.25

@whyrusleeping (Member)

This looks like it's ready to go. It just needs some code review, cc @magik6k @Stebalien @kevina @Kubuxu

Also, thank you to @manishrjain and the badger crew for implementing the error handling and pushing more improvements to this. It's looking all-around better than our current filesystem-based blockstore.

@whyrusleeping added this to the Ipfs 0.4.11 milestone on Sep 5, 2017
@manishrjain

Glad that it's working for you guys!

I looked at the Badger usage in the code; you are not setting SyncWrites=true. Did you set it during the benchmarks that you ran above? (By default, we set it to false for better performance.)

Also, can you ensure that you set GOMAXPROCS=128 -- a value large enough so that if you're running this on SSD, you'll be able to see the max IOPS allowed by the SSD. This is useful for key-value iteration and random Get throughput.

Also, Badger writes work best if you do batch + goroutines. You can also now use BatchSetAsync without the goroutines, which would give you good performance (and invokes a callback when the write is done).

We're also working on mmap-ing the value log, which would significantly improve random Get latency.
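
For readers following along, a minimal sketch of both suggestions (SyncWrites plus batched writes). It is written against the later badger v2 API, not the 2017 KV/BatchSet API this PR actually used, so treat the names and path as illustrative:

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// SyncWrites makes every commit hit the disk, as suggested above.
	opts := badger.DefaultOptions("/tmp/badger-example").WithSyncWrites(true)
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Batch many writes into one transaction so the sync cost is
	// amortized over all of them instead of paid per Set.
	err = db.Update(func(txn *badger.Txn) error {
		for i := 0; i < 100; i++ {
			k := []byte(fmt.Sprintf("key-%03d", i))
			if err := txn.Set(k, []byte("value")); err != nil {
				return err
			}
		}
		return nil // one commit, one sync, for all 100 writes
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

The design point is that one commit pays for one sync, however many Sets it carries.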

@whyrusleeping (Member)

@manishrjain ah, no. We aren't setting SyncWrites to true. We should probably do that by default, and then add that to our configuration blob. Why the need for so many threads? That seems odd to me.

Our calling code probably isn't taking advantage of the parallelism of the batch put as much as it could. I can look at tweaking that and see how it affects performance.

"path": "badgerds",
}
return nil
},
}
Contributor:

I may be missing something, but won't this config bypass the measure datastore? If so, will this create a problem?

Member Author:

IIRC there is a global measure in fsrepo.

@manishrjain commented Sep 5, 2017

> Why the need for so many threads? That seems odd to me.

It's got to do with the fact that disk reads block OS threads, and how many threads can be scheduling disk reads at the same time. Full discussion here:

https://groups.google.com/forum/#!topic/golang-nuts/jPb_h3TvlKE

We set Dgraph to 128 threads by default. It doesn't add much overhead, so it's a safe change to make.

Consider either using BatchSetAsync or calling BatchSet from multiple goroutines. A single Set will always be much slower, because every write call to Badger has to hit the disk; the more calls we can batch, the more the disk cost is amortized and the higher the throughput.

For values below 100 bytes, we have seen 400K key-value writes per second on the cheapest i3 instance (with local SSD).

P.S. Sent a PR to go-ipfs-badger. Small changes to how Badger is accessed.
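
A hypothetical sketch of the batch-plus-goroutines pattern described above, again substituting the later transaction API for the PR-era BatchSet/BatchSetAsync (imports: sync plus the badger package from the previous sketch):

```go
import (
	"sync"

	badger "github.com/dgraph-io/badger/v2"
)

// fanOutWrites splits the keys across nWorkers goroutines, each of
// which commits its share in a single batched transaction.
// Illustrative only; function name and sizing are assumptions.
func fanOutWrites(db *badger.DB, keys [][]byte, nWorkers int) error {
	var wg sync.WaitGroup
	errs := make(chan error, nWorkers)
	chunk := (len(keys) + nWorkers - 1) / nWorkers
	for w := 0; w < nWorkers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if lo >= len(keys) {
			break
		}
		if hi > len(keys) {
			hi = len(keys)
		}
		wg.Add(1)
		go func(part [][]byte) {
			defer wg.Done()
			// Each worker amortizes the sync cost over its batch.
			errs <- db.Update(func(txn *badger.Txn) error {
				for _, k := range part {
					if err := txn.Set(k, k); err != nil {
						return err
					}
				}
				return nil
			})
		}(keys[lo:hi])
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}
```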

@magik6k (Member Author) commented Sep 5, 2017

We might want to put this in the experimental-features doc with some info on how to use it.

@Stebalien (Member)

We can also increase GOMAXPROCS from within Go (runtime.GOMAXPROCS).
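
A one-liner sketch of that suggestion; the 128 figure comes from @manishrjain's comment above:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Disk reads block OS threads, so a higher GOMAXPROCS lets more
	// reads be scheduled at once (per the golang-nuts thread above).
	prev := runtime.GOMAXPROCS(128)
	fmt.Println("GOMAXPROCS raised from", prev, "to 128")
}
```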

@whyrusleeping (Member)

I think this is good to go now. Thoughts? @magik6k @Stebalien @kevina?

@magik6k (Member Author) commented Sep 6, 2017

LGTM, though badger-ds could be updated to 0.2.1 (see https://github.com/ipfs/go-ds-badger/commits/master)

@kevina (Contributor) commented Sep 6, 2017

This LGTM, but I have not been following the Badger datastore discussion, so I am not really qualified to review this.


badgerds "gx/ipfs/QmNWbaGdPCA3anCcvh4jm3VAahAbmmAsU58sp8Ti4KTJkL/go-ds-badger"
levelds "gx/ipfs/QmPdvXuXWAR6gtxxqZw42RtSADMwz4ijVmYHGS542b6cMz/go-ds-leveldb"
badger "gx/ipfs/QmQL7yJ4iWQdeAH9WvgJ4XYHS6m5DqL853Cck5SaUb8MAw/badger"
Member:

This isn't declared in package.json. However, if you update to go-ds-badger 0.2.1, you can use the re-exported badgerds.DefaultOptions and badgerds.Options and avoid this import altogether.
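
Sketched out, that suggestion looks roughly like this (plain import path shown instead of the gx-pinned one; treat the exact option fields as assumptions):

```go
import badgerds "github.com/ipfs/go-ds-badger"

// openBadger relies on the options re-exported by go-ds-badger 0.2.1,
// so badger never needs to be imported (or gx-pinned) here directly.
func openBadger(path string) (*badgerds.Datastore, error) {
	opts := badgerds.DefaultOptions // re-export of badger's defaults
	opts.SyncWrites = true          // assumed: promoted from the embedded badger.Options
	return badgerds.NewDatastore(path, &opts)
}
```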

@hoffmabc commented Sep 7, 2017

Is there going to be a guide to show how to use this?

@whyrusleeping (Member)

@Stebalien Could you handle updating the badger-ds dependency?

@hoffmabc Yes, we will add docs around this. But the simple case of making a new ipfs node that uses this is just ipfs init --profile=badgerds. I'm curious though: what sort of structure would you expect the docs to take? It's generally pretty easy for us to write "how to use a thing", but it's hard to know what users expect.

(and use the re-exported options instead of importing badger directly)

@Stebalien (Member)

@whyrusleeping done.

@hoffmabc commented Sep 7, 2017

I'm not sure, @whyrusleeping, to be honest. That seems like a pretty straightforward way to use this.

@hoffmabc commented Sep 7, 2017

Out of curiosity, do any tests need to be written for this?

@whyrusleeping (Member) commented Sep 7, 2017 via email

@hoffmabc commented Sep 7, 2017

Also, we initialize ipfs using a config file where leveldb is the only option. Will this be extended to support this datastore?

EDIT: I see the prerequisite above now.

@magik6k (Member Author) commented Sep 7, 2017

For tests: we can/should try running the whole sharness suite over this datastore. To keep it from taking forever on travis/circle, it could be done as a jenkins pipeline stage that passes additional profiles to ipfs/iptb init in sharness.

@magik6k (Member Author) commented Sep 7, 2017

The Datastore section in docs/config.md needs correcting too.

@hoffmabc commented Sep 8, 2017

Is this getting merged today?

@whyrusleeping merged commit 71d72e2 into master on Sep 8, 2017
@whyrusleeping deleted the feat/badger-ds branch on September 8, 2017 19:53
@whyrusleeping (Member)

@hoffmabc Yeap! :)
