
Provide APIs for interacting with dats #10

Closed
RangerMauve opened this issue Feb 21, 2018 · 36 comments

@RangerMauve

Having a daemon for easily saving dats locally is useful, but I think people would get even more out of it if the daemon could allow them to actually interact with the dats.

  • Ability to read data from a dat (read file, directory, stats, etc)
  • Ability to listen on changes to a dat
  • Ability to create new dats
  • Ability to modify dats that have been created (add / update / remove)

My end goal is to abstract interaction with the daemon using the same interface as the DatArchive API in the Beaker Browser.
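
For reference, here's the rough shape of that DatArchive interface (the dat URL below is just a placeholder):

// the DatArchive global as exposed in Beaker; a daemon client would mimic this shape
const archive = new DatArchive('dat://0123456789abcdef/')
const entries = await archive.readdir('/')                  // list a directory
const html = await archive.readFile('/index.html', 'utf8')  // read a file as text
await archive.writeFile('/hello.txt', 'world')              // modify an owned archive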

@soyuka
Owner

soyuka commented Feb 21, 2018

I wonder if it'd be possible to re-use the DatArchive module. It's not exposed as a module on npm though, and it'd be a shame to write it again. Thoughts @pfrazee?

@RangerMauve
Author

The related source code in beaker is the web API for the web side and the background app API for the non-web side.

@RangerMauve
Author

All of which seems to be wrapping pauls-dat-api.

@RangerMauve
Author

Maybe the daemon could expose pauls-dat-api using the manifest here over a protobuf RPC
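
Roughly, the daemon side could be a thin dispatch from decoded instructions onto pauls-dat-api calls (a sketch; the action names here are made up, not an agreed manifest):

const pda = require('pauls-dat-api')

// hypothetical dispatch: decode an instruction, call the matching helper on the archive
async function handle (archive, instruction) {
  switch (instruction.action) {
    case 'readdir': return pda.readdir(archive, instruction.path)
    case 'readFile': return pda.readFile(archive, instruction.path, 'utf8')
    case 'writeFile': return pda.writeFile(archive, instruction.path, instruction.data)
    default: throw new Error('unknown action: ' + instruction.action)
  }
}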

@pfrazee

pfrazee commented Feb 21, 2018

There's a somewhat hacked-together module on npm right now, https://github.com/beakerbrowser/node-dat-archive

pauls-dat-api is the core of the logic, if you want to mimic its interface, that's where I'd start. The question I'd ask is, how do you plan to expose the API? Is the idea that dat-daemon could be embedded in a node app and then leverage DatArchive, or are you planning to have apps connect to an active daemon process and leverage DatArchive as a wrapper around RPC?

@soyuka
Owner

soyuka commented Feb 21, 2018

Yes, but the high-level API is also neat.

@pfrazee

pfrazee commented Feb 21, 2018

@soyuka at one point I found myself needing the DatArchive interface in Beaker's background process, and I had to re-implement the interface there: https://github.com/beakerbrowser/beaker/blob/beaker-0.8/app/lib/bg/dat-archive.js. Not my proudest but I couldn't think of a better solution. I think I ended up getting rid of the code that required that though.

@soyuka
Owner

soyuka commented Feb 21, 2018

are you planning to have apps connect to an active daemon process and leverage DatArchive as a wrapper around RPC?

This. With the help of websockets for example, it'd allow building dat-aware extensions or whatever. IMO the dat command line utility should also be decoupled through a daemon (using websockets or tcp).

Though, my first intent was to have a daemon that shares dats on my own server and/or a 24/7 low-power computer at home. As we're seeing more interest/ideas that could wrap around an RPC, I'm all ears :).

About the last message: thanks, that might be useful.

See also initial thoughts here: dat-ecosystem-archive/dat-desktop#434

@pfrazee

pfrazee commented Feb 21, 2018

Yeah an RPC-able DatArchive daemon/backend has been on my mind. Should be useful.

@martinheidegger

Well, Hyperdrive uses a random-access-... storage approach to collect data. A random-access-datdaemon could allow transparent communication with the dats managed by dat-daemon?

@soyuka
Owner

soyuka commented Feb 23, 2018

IMO using hyperdrive or random-access is too low-level for this library, which should try to stick with dat-node and high-level stuff.

@pfrazee

pfrazee commented Feb 23, 2018

@martinheidegger Yeah that's an interesting idea. I'd ask: 1) would the low-level semantics make it difficult to add things like permissions, if ever needed? 2) would that access level be inefficient because it puts messaging at the wrong layer? Point 2 would concern me because of how hyperdrive does lookups (against the metadata log).

@RangerMauve
Author

@pfrazee From the work the IPFS community has been doing on their daemon, access restrictions are being done within the browser extension.

The IPFS-companion extension has full access to the IPFS daemon, and then it provides the web with an IPFS global that has access restrictions scoped to the origin being served (kinda like how browser permissions already work for webrtc and such).

@RangerMauve
Author

So the daemon would allow anything running locally to connect to it, and higher-level applications that require access restrictions based on some sort of criteria can then do so.

@RangerMauve
Author

There's already a library, node-dat-archive, which implements the same API as DatArchive in the Beaker browser.

What about providing an RPC API that looks like this?

@soyuka
Owner

soyuka commented Mar 2, 2018

What about providing an RPC API that looks like this?

Totally! I've some pending work on this but there are some edge cases to consider when trying to translate this to an RPC API.

I'm going to document my progress on the protocol this weekend. One thing that bothers me with the DatArchive API is the readFile/writeFile pair. I'd prefer the RPC to work only with streams when writing/reading. I have a proof of concept with streams and it works really well.

Also, if I want to continue to use protocol buffers, I need to structure everything. For example, say we want to readdir: what does the response look like? Do we allow some kind of "generic" data in the response protocol, or do we "type" everything?
When talking about streams there's also the same problem. If we stick with the following statement:

The daemon receives an "Instruction" and sends an "Answer"

How would streaming work? I don't really want to wrap binary data with metadata. Therefore I've come up with file access "endpoints" (#11 (comment)). I'm still not sure that this is the best solution, but it allows us to:

  1. keep the "instruction"/"answer" pattern for standard instructions
  2. have a streaming fs interface to read/write

Anyway I'm going to first document the RPC protocol before I implement it :).

@RangerMauve
Author

Good point about readFile/writeFile. They're higher-level APIs meant to make things easier to work with, so it makes sense to do something with more control at the protocol level.

Do we allow some kind of "generic" data in the response protocol or do we "type" everything?

From what I've seen, people will set up multiple message types for the responses and requests. That way if you know you're sending a ReadDir message, you're going to expect a DirResults in the response.
If you allow for "generic" data, you might as well just go with JSON or something else more dynamic since you're losing a lot of the benefit of protobuf structured messages.
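
For illustration, a minimal sketch of "typing everything" with the protocol-buffers module (the ReadDir/DirResults schema is hypothetical, not an agreed format):

const protobuf = require('protocol-buffers')

// one message type per request and per response
const messages = protobuf(`
  message ReadDir {
    required string key = 1;
    required string path = 2;
  }
  message DirResults {
    repeated string files = 1;
  }
`)

const request = messages.ReadDir.encode({ key: 'dat-key', path: '/' })
// ...send request to the daemon, then decode the typed answer:
// const results = messages.DirResults.decode(answerBuffer)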

What about using websockets, but instead of having the "endpoints" in the URL, having that information in the first protobuf message sent to the daemon?

@RangerMauve
Author

What about using pauls-electron-rpc as a basis? It's what beaker already uses for RPC between the browser window and the node process for the DatArchive API

@soyuka
Owner

soyuka commented Mar 2, 2018

If you allow for "generic" data, you might as well just go with JSON or something else more dynamic since you're losing a lot of the benefit of protobuf structured messages.

Exactly! I'm going to stick with protobuf.

What about using websockets, but instead of having the "endpoints" in the URL, having that information in the first protobuf message sent to the daemon?

Yes, I thought about this as well, but it's not that easy. Say it works like this:

  1. [client] => I want to write a File
  2. [daemon] => ok I'm ready (and now only accepting a stream)
  3. [client] => sends stream

Now, how do we know that the stream ends? Is the client sending an End instruction? But what if the daemon only accepts streams, without checking for more instructions?

I've prototyped something that works like this, but it's less developer-friendly and more complicated to handle than just opening a stream when you want to write/read.
Translated to JavaScript, it's ~10 lines of client code vs. only 1 with streams, for example.

Oh, and this is client => daemon writing; reading is the same, but the complexity will be on the client side.

Also, with dedicated stream channels you know that you'll only get data and no protobuf there; it's a good separation of concerns imo.

What about using pauls-electron-rpc as a basis? It's what beaker already uses for RPC between the browser window and the node process for the DatArchive API

Because I'm not fond of the abstraction proposed in this module, I dunno why. I'm still trying to re-use as much code as I can from pauls-dat-api!

@RangerMauve
Author

RangerMauve commented Mar 2, 2018

Now, how do we know that the stream ends?

I assumed you were using one connection per request, so the socket closing would be enough.
Alternately, one could have a content length in the first message.
Another option is to use something like multistream from IPFS, which allows you to multiplex several streams over one, along with all the stream lifecycle events.
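
For the content-length option, the daemon could simply count bytes (a sketch, assuming a hypothetical first message already told it to expect "length" bytes on this socket):

// daemon side: consume exactly `length` raw bytes, then treat the write as complete
function readBody (socket, length, done) {
  let received = 0
  const chunks = []
  socket.on('data', function ondata (chunk) {
    chunks.push(chunk)
    received += chunk.length
    if (received >= length) {
      socket.removeListener('data', ondata)
      done(Buffer.concat(chunks).slice(0, length))
    }
  })
}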

@RangerMauve
Author

The main worry I have with putting information in the URL is that it doesn't translate to arbitrary streams, so it won't work for unix sockets or TCP sockets.

@soyuka
Owner

soyuka commented Mar 2, 2018

I assumed you were using one connection per request, so the socket closing would be enough.

Opening a connection for each request looks weird; usually you keep the connection open and send a bunch of requests, no?

The main worry I have with putting information in the URL is that it doesn't scale to arbitrary streams, so it won't work for unix sockets or TCP sockets (or arbitrary streams)

My thought as well. Though, I could still implement simple readFile/writeFile calls that transfer the whole data in one go as a fallback. Or we could just use websockets everywhere :D.

@soyuka
Owner

soyuka commented Mar 2, 2018

I think we also have to keep in mind that this daemon is only used to interact with local dats and will not be available from the outside. There's little interest in tcp/unix sockets, as a websocket also works well.

@RangerMauve
Author

If you're not opening a socket per request, how are you specifying the data you want in the URL? Is the URL just the dat archive ID?
Do you have any code I could read for this yet? I think I'm misunderstanding and confusing myself. :P

@soyuka
Owner

soyuka commented Mar 2, 2018

Haha, sorry, this is my bad.
You indeed have one socket per stream or live-data feed.
On the other hand, you have one and only one socket open to interact with the daemon (add, list, remove, readdir, etc.).

For example (say it's a client interface in JS):

var client = new Client() // this keeps an open connection; now that I think about it, maybe it doesn't need to keep it open, but it definitely could

var answer = await client.send({action: LIST})
// answer is a list of dats

var write = client.createWriteStream('key/path/bar') // creates a new connection
write.write('foo')
write.end() // which closes here

var readdir = await client.send({action: READDIR, path: 'path'})

assert(readdir[0] === 'bar')

var statistics = client.createFileActivityStream() // or createNetworkActivityStream(); these also open new connections

statistics.on('data', function (stats) {
  // do something
})

@pfrazee

pfrazee commented Mar 2, 2018

FYI the only reason the DatArchive interface doesn't have streams yet is because I'm waiting to see if browser stream APIs stabilize more

@RangerMauve
Author

What about targeting async iterators instead of actual streams for the DatArchive API?
Plus, with Firefox and Chrome both supporting WHATWG streams, I doubt they'll be making massive changes there.

@pfrazee

pfrazee commented Mar 2, 2018

@RangerMauve If WHATWG streams have taken the lead then I'll probably use them. Based on https://streams.spec.whatwg.org/#example-manual-read it looks like you can do async iterators in that spec. (https://jakearchibald.com/2017/async-iterators-and-generators/ seems to confirm that)
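
For reference, the manual-read loop from that spec example wraps into an async iterator in a few lines (assuming a standard WHATWG ReadableStream):

// consume a WHATWG ReadableStream with for-await-of
async function * iterateStream (stream) {
  const reader = stream.getReader()
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) return
      yield value
    }
  } finally {
    reader.releaseLock()
  }
}

// usage: for await (const chunk of iterateStream(readable)) { /* ... */ }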

@martinheidegger

I have given this a little more thought, and to me it seems that the best API for interacting with dats is the same API that dats use to sync with each other. Once dat-daemon returns the list of dats, the web/desktop app could simply open the dats "sparse".

const Dat = require('dat-node')
const ram = require('random-access-memory')

const client = new Client() // the daemon client from above
const dats = {}

client.watch({ action: LIST_DATS }, (datsDiff) => {
  removeDats(datsDiff.removed)
  addDats(datsDiff.added)
})

function addDats (datKeys) {
  datKeys.forEach(datKey => {
    // open each dat sparsely, in memory: metadata and data are only fetched on demand
    Dat(ram, { key: datKey, sparse: true, sparseMetadata: true }, (err, dat) => {
      if (err) throw err
      dats[datKey] = dat
    })
  })
}

// etc.

The new Dat could now show what files there are, what the latest version is, what versions exist, etc., all given by the Dat API. Now: going through the bittorrent stuff for a local app talking to a local service is definitely overkill, but lucky for us Dat doesn't really specify which transport protocol is supposed to be used. It could simply run over a tcp port that it's automatically connected to.
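
As a sketch of that transport idea (the port and key are placeholders; the daemon would serve plain hyperdrive replication streams on a local TCP port):

const net = require('net')

// client side: sync the sparse in-memory archive straight from the daemon
const socket = net.connect(3282, '127.0.0.1')
const replication = dats[someKey].archive.replicate({ live: true })
socket.pipe(replication).pipe(socket)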

@RangerMauve
Author

Are there docs on what the protocol can do?

@martinheidegger

@RangerMauve What do you mean? It contains all the information on everything about a dat. https://github.com/datproject/docs/blob/master/papers/dat-paper.pdf

@soyuka
Owner

soyuka commented Apr 17, 2018

I'm working on a high-level API that'll have the same API as the DatArchive from beaker, backed by the daemon:
https://github.com/soyuka/dat-daemon/tree/master/packages/client (more work tbd). I'll be closing this.
Daemon rfc: https://github.com/soyuka/dat-daemon/blob/master/rfc.md

soyuka closed this as completed Apr 17, 2018
@RangerMauve
Author

That's awesome. Have you looked into the approach I took that accomplishes a similar goal?

Instead of having more API in the gateway, I had the gateway provide a replication stream through websockets and create a hyperdrive instance client-side.

https://github.com/RangerMauve/dat-archive-web
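
The gist of it, roughly (the gateway URL and key are placeholders; this assumes websocket-stream plus an in-memory hyperdrive on the client):

const websocket = require('websocket-stream')
const hyperdrive = require('hyperdrive')
const ram = require('random-access-memory')

// sparse, in-memory hyperdrive for the archive we want to browse
const archive = hyperdrive(ram, someKey, { sparse: true })

// the gateway exposes one replication stream per key over websockets
const ws = websocket('ws://localhost:3000/' + someKey)
ws.pipe(archive.replicate({ live: true })).pipe(ws)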

@soyuka
Owner

soyuka commented Apr 17, 2018

Oh nice! You should've pinged me there! I'm going to take a look at it!

@RangerMauve
Author

Sorry, I got really engrossed in the development and didn't think to. :D

You should check out this issue, there's gonna be a video call between some people interested in this stuff.

@soyuka
Owner

soyuka commented Apr 17, 2018

Oh, real nice, thanks for this one :D.
