Lineage Stage 0 #25
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
…et number now Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
@danwom use this one instead
## Summary

Currently, Amundsen has no way of surfacing lineage information for tables and columns. The idea for this first iteration is to show users upstream and downstream tables and columns on the Table Details page, so they can explore the current resource's lineage and navigate to related resources in Amundsen.
FYI, we could already use programmatic descriptions to surface the lineage. I think what is lacking is a graph UI to surface the lineage intuitively.
- Added photos into a nested assets/ folder. Signed-off-by: Daniel Won <dwon@lyft.com>
the png in https://github.com/amundsen-io/rfcs/blob/59bd5b98c15e823fcf1757fcd5ff97790e215dae/rfcs/025-lineage-stage-0.md doesn't seem to work
I will take a look tonight
![Lineage Stage 0 Architecture](assets/025/lineage-arch.png)

Implementing this feature will require defining a Lineage API on the metadata service for Tables and Columns. When the API is called, it will make calls to neo4j and to whatever the source of lineage data is. An interface needs to be created to interact with an implementer's lineage service in a generic way. The data returned by these services will be combined to form the lineage response as defined below.
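
To make the idea of a generic interface more concrete, here is a minimal sketch of how a pluggable lineage source might be combined with details from the metadata store. Everything in it is an assumption made for illustration: the names `LineageService`, `LineageItem`, `build_lineage` and `InMemoryLineageService`, the `direction` and `depth` parameters, and the dictionary layout are hypothetical and are not the response format this RFC defines.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class LineageItem:
    key: str                  # resource key of the related table or column
    level: int                # number of hops away from the root resource
    metadata: Dict[str, str]  # extra details looked up in the metadata store


class LineageService(ABC):
    """Hypothetical adapter around an implementer's lineage source."""

    @abstractmethod
    def related(self, key: str, direction: str, depth: int) -> List[Tuple[str, int]]:
        """Return (key, level) pairs for 'upstream' or 'downstream' resources."""


def build_lineage(
    key: str,
    service: LineageService,
    lookup_metadata: Callable[[str], Dict[str, str]],
    depth: int = 1,
) -> Dict[str, object]:
    """Combine lineage-service results with metadata-store details."""
    response: Dict[str, object] = {"key": key, "depth": depth}
    for direction in ("upstream", "downstream"):
        response[direction] = [
            LineageItem(k, level, lookup_metadata(k))
            for k, level in service.related(key, direction, depth)
        ]
    return response


class InMemoryLineageService(LineageService):
    """Toy implementation backed by a dict, for illustration only."""

    def __init__(self, edges: Dict[str, List[str]]) -> None:
        self.edges = edges  # maps a key to its direct downstream keys

    def related(self, key: str, direction: str, depth: int) -> List[Tuple[str, int]]:
        if direction == "downstream":
            neighbours = self.edges.get(key, [])
        else:
            neighbours = [src for src, dsts in self.edges.items() if key in dsts]
        # Only direct neighbours are returned; depth > 1 is left out of this sketch.
        return [(n, 1) for n in neighbours]
```

In this sketch, supporting a different lineage source would only mean writing another `LineageService` subclass; `build_lineage` stays unchanged. `lookup_metadata` stands in for whatever call fetches details from neo4j.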
Would this generic API call be made to any of the metadata proxies? Or is this simply a configurable function which users will be able to extend themselves?
Suppose a user has a proxy set to Atlas. For Stage 0, will this endpoint call a function inside the atlas_proxy?
Yeah, on the Base Proxy class I added a get_lineage method that can be implemented and will be called for any proxy when the endpoint is hit: https://github.com/amundsen-io/amundsenmetadatalibrary/blob/master/metadata_service/proxy/base_proxy.py#L160
So the idea is that with this method any lineage provider (like Atlas, Marquez, Spline or OpenLineage) can be supported, right? Any proxy would support any lineage provider really.
@mgorsk1 I think this is just step 0; we should bring (or I assume we will bring) the backend model with ingestion in step 1/2. The reason we are doing it this way is that Lyft still uses a third-party vendor lineage service. It is easier to implement in steps like these: focus on the FE first with a backend proxy, then build a backend model and ingestion with a push mechanism in a subsequent step.
Yeah we will hopefully add a backend model for lineage in the future so that we can query the db directly for lineage rather than making ad hoc calls to a provider from metadata. In that case we would have to add support for lineage extraction on databuilder.
Clear, for me support for external service is more than fine, just wanted to make sure I got it right.
This looks good to me for Stage 0.
@danwom @allisonsuarez here is the preview (https://github.com/amundsen-io/rfcs/blob/59bd5b98c15e823fcf1757fcd5ff97790e215dae/rfcs/025-lineage-stage-0.md). Could you update and fix the broken image links? thanks

We will add two additional tabs to the `Table Details` page, `Upstream` and `Downstream`. Each tab will contain a list of tables: those from which the current table's data is derived (`Upstream`) or those that consume its data (`Downstream`). This allows users to view a table's lineage in a very simple manner.

![Column Lineage Preview](assets/025/column-lineage-preview.png)
![Column Lineage Preview](../assets/025/column-lineage-preview.png)
Any reason we don't go with a single lineage tab alongside the column and dashboard tabs, then have upstream/downstream subtabs underneath?
I think there's a preference to avoid nested tabs when possible. At some point if we have too many tabs then we can reconsider nesting or other options.
For me it feels weird to break it into two tabs. Lineage is, after all, a big picture of data flow, so it would make sense to have it in one place. It'd make it easier for the end user to analyze.
Yeah, +1 to @mgorsk1. I also felt we should have a single lineage tab with nested tabs for upstream/downstream (not sure how much complexity that adds to the FE implementation). @danwom @allisonsuarez could you share more details about the preference?
+1 for not using the nested tabs where possible.
For Stage 0, it will be a list of items for each direction, which IMHO is okay to be placed in different tabs. But yes, for the next milestone, where we'll have a graph/chart of the complete lineage (and not the list of tables), that must be placed in one view under 1 tab to have a clear view of the complete lineage.
I am not really advocating for nested tabs, as this sounds more like an implementation preference than a UX question, and I'm fine avoiding it if we don't like nesting.
But is nesting the only way this could be implemented in a single tab? I could even see something like a single list (sorted by upstream and then downstream) with table names and an icon/abbreviation showing whether the table is upstream or downstream.
lgtm for stage 0
* started writing rfc and added dir
  Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
* made small change on README and added more to RFC will create PR to get number now
  Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
* had to redo this
  Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
* reverted weird changes
  Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
* changed naming again
  Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
* Added details to rfc 025 lineage.
  - Added photos into a nested assets/ folder.
  Signed-off-by: Daniel Won <dwon@lyft.com>
* Fixed relative image links in rfc 025
  Signed-off-by: Daniel Won <dwon@lyft.com>

Co-authored-by: Daniel Won <dwon@lyft.com>
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
Closed #24 in favor of this one
This is an RFC describing the first iteration of Lineage related work.