-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-playable Event Graph (REG) storage format specification #8
Comments
Let me try to answer the above from the perspective of the
With regard to requirements, I think the following are important:
I'll skip the implementation details for now, as I am a REG newbie, but I would love to hear from the REG experts what they think about the above. |
You describe parents as "a set of operations that this transaction depends on" — presumably you mean a set of transactions that it depends on? We also need to specify how these transactions are identified; probably easiest to just give each an ID that can be an arbitrary string. The In the edge case where two concurrent transactions insert at the same position, the trace might need to contain an indicator to show which one of the two should come first in the merged result (otherwise it will not be possible to reliably reproduce Are you thinking this REG data format will be only for the purpose of encoding test datasets, benchmarks, etc., or will it also be used for storing end-user documents in production software? If it's the former, I don't think we need a binary encoding, since human-readable verboseness is preferable for this use case. If we want to consider extending the format beyond plain text, there are lots of potential opportunities, but also a lot of complexity. For example, there are several different ways of representing rich text, and it's not immediately obvious to me whether a single representation is sufficient to encode edits for different rich text collaboration algorithms. If you want to represent spreadsheets or vector graphics it gets more complex still. My suggestion would be to focus the REG format on plain text initially, and to discuss extensions to other data types separately once we have some experience with plain text. |
Sounds great - I'd love this too!
Absolutely!
I think what you've suggested is a good start. We'd also need to specify the IDs of each patch - which is a bit trickier because:
It might be possible to use an opaque string and encode your algorithms' ID format into the string in such a way that its always ordered as expected. For example, if your system orders concurrent items first by agent ID then sequence number, the opaque ID could be "AAAAASSSSSSS", storing the agent bytes then sequence number bytes. Doing this could be done in such a way that it preserves lexical ordering. The problem is that it'd make it harder to load an arbitrary log into an arbitrary system. Another approach would be to just have a top level field specifying the type of the IDs for the trace. But again, that'd quite dramatically limit compatibility. @ept:
The format above looks like its based on the concurrent editing trace format I've defined in this repository. In that case, yes - the JSON file itself contains a big list of transactions. The parents field lists the indexes of previous transactions. This is simple, straightforward and I think it should work for any format.
I absolutely agree we want these sort of traces. I have a few traces like this that I use for benchmarking and conformance testing. And a script that lets us reconstruct an editing trace from any file in any git repository. I think it'd be pretty tricky to use an indicator like that in the transaction itself. It could operate as a checksum, but what if the expected ordering doesn't match your local algorithm? Perhaps a better approach would be to just name the algorithm used at the top level when replaying the graph of events - eg Re: rich text, at the risk of eating my words, I think the format is reasonably consistent even if different CRDTs implement rich text somewhat differently. The format I'd expect is a combination of text editing events (insert / delete) and annotations, annotating ranges like But, baby steps. Lets start with a plain text editing format and go from there. I think JSON would be reasonably easy too - though I'm hesitant to support the full range of JSON-patch operations. But we could start with a set operation and object-delete operation. |
Ah yes, that's good.
That works. The merging algorithm can then take into account other information on the transactions when ordering the insertions, e.g. using the
Yes, that would be fine for Peritext.
Yeah, insertion at the boundary of formatting spans is tricky. Many rich text editors I've tried treat different types of annotation differently – for example, adding chars at the end of a bold span makes them bold too, but adding chars at the end of a link places them outside of the link. If there's the end of a link and the end of a bold in the same place, the behaviour varies depending on the app. But anyway, I agree that let's start with plain text, and we can cross the rich text bridge later when we come to it. |
The idea for standardizing is to hopefully use it outside of just testing. One place I have in mind is exporting full history of |
Prior work on an update format for collaborative rich-text documents: https://github.com/matrix-org/collaborative-documents Instead of directly capturing the causal graph, that proposal leaves space for algorithm-specific metadata; I found it hard to picture how that would work with a concrete algorithm. But maybe the document format could serve as inspiration here. |
This is a proposal to formalize and document serialization formats for Re-playable Event Graphs (REGs).
What are REGs? I am not an expert on REGs, but essentially it is a way to represents a sequence of all editing events in a format like this:
The key here is that:
[position: number, remove: number, insert: string]
. (There is no CRDT metadata.)parents: []
, which is a set of operations that this transaction depends on (knows about at the time of creation).What I would like to get from this issue:
cc @josephg @ept @dmonad @mweidner037 @gritzko @archagon @zxch3n
The text was updated successfully, but these errors were encountered: