Skip to content

Commit

Permalink
Data Builder Record Abstraction (#5)
Browse files Browse the repository at this point in the history
* RFC for introducing an abstraction layer to the databuilder records

Signed-off-by: Andrew <andrjc4@vt.edu>

* Update templates/000-databuilder-record-abstraction.md

Co-authored-by: Marcos Iglesias <190833+Golodhros@users.noreply.github.com>

* RFC for introducing an abstraction layer to the databuilder records

Signed-off-by: Andrew <andrjc4@vt.edu>

* remove merge tags

Signed-off-by: Andrew <andrjc4@vt.edu>

* move the rfc to the rfc folder

Signed-off-by: Andrew <andrjc4@vt.edu>

* delete old rfc in template folder

Signed-off-by: Andrew <andrjc4@vt.edu>

Co-authored-by: Marcos Iglesias <190833+Golodhros@users.noreply.github.com>
  • Loading branch information
AndrewCiambrone and Golodhros authored Nov 5, 2020
1 parent a77d234 commit 193f897
Showing 1 changed file with 72 additions and 0 deletions.
72 changes: 72 additions & 0 deletions rfcs/002-databuilder-record-abstraction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
- Feature Name: Add a abstraction layer to the databuilder's record model
- Start Date: 2020-10-02
- RFC PR: [amundsen-io/rfcs#5](https://github.com/amundsen-io/rfcs/pull/5)

- Amundsen Issue: [amundsen-io/amundsen#0000](https://github.com/amundsen-io/amundsen/issues/0000) (leave this empty for now)

# Add a abstraction layer to the databuilder model

## Summary

Currently the databuilder's records are all subclasses of the class `Neo4jCsvSerializable` and the output of these models are Dictionaries formatted for neo4j. This RFC proposes this coupling is broken and neo4j serialization is removed from the models and moved into the the loader portion.

## Motivation

While SeatGeek was in the process of writing the Neptune integration portion for the databuilder a problem came up of seperating Neo4j from the databuilder's records. The way Neo4j is receives nodes and edges is different from the way Neptune or other graph databases do. So SeatGeek wanted to come up with a process that:
> Maintained feature parity between the datastores.
> Did not add additional overhead for adding new features to the databuilder
> Made it easier to add other data stores in the future.
## Guide-level Explanation (aka Product Details)

This is less of a feature and more of a refactor. For this rfc this section is duplicate of the `Reference-level Explanation` section.

## UI/UX-level Explanation

This would be a backend only change no UI level changes are required.

## Reference-level Explanation (aka Technical Details)

Each of the models in the databuilder are built off of the `GraphSerializable` class. The `GraphSerializable` class is an abstract class that is responsible for validating and returning the nodes and relationships of its subclasses. The interface on the `GraphSerializable` is `next_node` with a return type of `Union[GraphNode, None]` and `next_relation` with a return type of `Union[GraphRelationship, None]`.

GraphNode would be defined as named tuple with the attributes:
- key: str
- label: str
- attributes: Dict

GraphRelationship would be a named tuple with the attributes:
- start_label: str
- end_label: str
- start_key: str
- end_key: str
- forward_label: str
- reverse_label: str
- attributes: Dict

The loader would be then be responsible for taking these graph nodes and graph relationships and converting them to the format expected by the publisher.

The unpolished changes mentioned above can be found: https://github.com/AndrewCiambrone/amundsendatabuilder/pull/6

## Drawbacks

This adds a level of complexity because it adds an additional layer to the databuilder. Instead of having the neo4j serialization built into the models it becomes a separate process. This also would add a small performance degradation because the models are producing named tuples and then they are being convered into another format.


## Alternatives

1. Duplicate all the models for each datastore but that opens the possibility of missing features in one datastore over the other.

2. Translating from neo4j models into Neptune. However this would require future contributions to remember to make sure the changes are compatible with the neptune translator.

## Prior art

N/A

## Unresolved questions

> What parts of the design do you expect to resolve through the RFC process before this gets merged?
General agreement over the approach

## Future possibilities

This is the first step in introducing other data stores into the databuilder. In the future SeatGeek would like to introduce the other portions of the Neptune integration that they have been working on but since this part of the change is a bit of a overhaul of all the models, its important to get this portion in first.

0 comments on commit 193f897

Please sign in to comment.