Lineage Stage 0 #25

Merged
merged 9 commits into from
Mar 9, 2021

Conversation

allisonsuarez
Contributor

Closed #24 in favor of this one

This is an RFC describing the first iteration of Lineage related work.

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
…et number now

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
@allisonsuarez allisonsuarez requested a review from a team as a code owner February 23, 2021 18:40
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
@allisonsuarez
Contributor Author

@danwom use this one instead

## Summary


Currently Amundsen doesn't have a way of surfacing lineage information for tables and columns. The idea for this first iteration is to have a way to show upstream and downstream tables and columns to users through the Table Details page so they can explore the current resource's lineage as well as navigate to related resources in Amundsen.
Member

FYI, we could actually use programmatic descriptions to surface the lineage. I think what is lacking is a graph UI to surface the lineage intuitively.

allisonsuarez and others added 2 commits February 23, 2021 14:41
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
- Added photos into a nested assets/ folder.

Signed-off-by: Daniel Won <dwon@lyft.com>
@feng-tao
Member

feng-tao commented Mar 2, 2021

I will take a look tonight


![Lineage Stage 0 Architecture](assets/025/lineage-arch.png)

Implementing this feature will require defining a Lineage API on the metadata service for Tables and Columns. When the API is called, it will make calls to neo4j and to whatever the source of lineage data is. An interface needs to be created to interact with an implementer's lineage service in a generic way. The data from the calls to these services will be put together to form the lineage response as defined below.
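
As a rough sketch of the flow described in this paragraph, the example below combines keys returned by a lineage source with table metadata from a graph store into one response. Every name here (`LineageSource`, `MetadataStore`, `build_lineage_response`, and the response field names) is a hypothetical stand-in for illustration, not the API defined by the RFC or by Amundsen.

```python
from typing import Dict, List, Protocol


class LineageSource(Protocol):
    """Hypothetical wrapper around an implementer's lineage service."""

    def related_keys(self, table_key: str, direction: str) -> List[str]: ...


class MetadataStore(Protocol):
    """Hypothetical wrapper around the graph store (neo4j in the RFC)."""

    def table_summary(self, table_key: str) -> Dict[str, str]: ...


class StaticLineageSource:
    """In-memory stand-in for a real lineage provider."""

    def __init__(self, upstream: Dict[str, List[str]], downstream: Dict[str, List[str]]) -> None:
        self._keys = {"upstream": upstream, "downstream": downstream}

    def related_keys(self, table_key: str, direction: str) -> List[str]:
        return self._keys[direction].get(table_key, [])


class DictMetadataStore:
    """In-memory stand-in for the metadata lookup."""

    def __init__(self, summaries: Dict[str, Dict[str, str]]) -> None:
        self._summaries = summaries

    def table_summary(self, table_key: str) -> Dict[str, str]:
        return self._summaries.get(table_key, {})


def build_lineage_response(table_key: str, source: LineageSource, store: MetadataStore) -> Dict[str, object]:
    """Ask the lineage source for related keys, then enrich each with stored metadata."""

    def enrich(direction: str) -> List[Dict[str, str]]:
        return [
            {"key": key, **store.table_summary(key)}
            for key in source.related_keys(table_key, direction)
        ]

    return {
        "key": table_key,
        "upstream_entities": enrich("upstream"),
        "downstream_entities": enrich("downstream"),
    }
```

The two protocols keep the lineage provider and the metadata lookup independent, so either side could be swapped without touching the function that assembles the response.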
Member

Would this generic API call be made to any of the metadata proxies? Or is this simply a configurable function that users will be able to extend themselves?
Suppose a user has a proxy set to Atlas. For Stage 0, will this endpoint call a function inside the atlas_proxy?

Contributor Author

Yeah, on the Base Proxy class I added a get_lineage method that each proxy can implement; it will be called when the endpoint is hit: https://github.com/amundsen-io/amundsenmetadatalibrary/blob/master/metadata_service/proxy/base_proxy.py#L160
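
To make the idea in this reply concrete, here is a minimal sketch of an abstract `get_lineage` hook on a base proxy with one toy implementation. It is not the code at the linked line: the `BaseProxy` stand-in, the `Lineage` and `LineageItem` shapes, and the `resource_key` and `direction` parameters are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LineageItem:
    key: str    # key of a related table (hypothetical format)
    level: int  # distance from the resource being inspected


@dataclass
class Lineage:
    key: str
    upstream_entities: List[LineageItem] = field(default_factory=list)
    downstream_entities: List[LineageItem] = field(default_factory=list)


class BaseProxy(ABC):
    """Stand-in for the metadata service's base proxy (assumed shape)."""

    @abstractmethod
    def get_lineage(self, *, resource_key: str, direction: str = "both") -> Lineage:
        """Return lineage for a resource; each concrete proxy supplies its own lookup."""


class InMemoryLineageProxy(BaseProxy):
    """Toy proxy backed by a dict instead of Atlas, neo4j, or a vendor service."""

    def __init__(self, edges: Dict[str, List[str]]) -> None:
        # edges maps a table key to the keys of its direct downstream tables
        self._edges = edges

    def get_lineage(self, *, resource_key: str, direction: str = "both") -> Lineage:
        lineage = Lineage(key=resource_key)
        if direction in ("upstream", "both"):
            lineage.upstream_entities = [
                LineageItem(key=src, level=1)
                for src, dsts in self._edges.items()
                if resource_key in dsts
            ]
        if direction in ("downstream", "both"):
            lineage.downstream_entities = [
                LineageItem(key=dst, level=1)
                for dst in self._edges.get(resource_key, [])
            ]
        return lineage
```

With this shape, the endpoint only depends on the abstract method, and a proxy that does not implement it cannot be instantiated, which surfaces missing support early.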

Contributor

@mgorsk1 mgorsk1 Mar 3, 2021

So the idea is that with this method any lineage provider (like Atlas, Marquez, Spline or OpenLineage) can be supported, right? Any proxy could really support any lineage provider.

Member

@feng-tao feng-tao Mar 3, 2021

@mgorsk1 I think this is just step 0; I assume we will bring in the backend model with ingestion in step 1/2. The reason we are doing it this way is that Lyft still uses a 3rd party vendor lineage service. It is easier to implement in steps like these: focus on the FE first with a backend proxy, then build a backend model and ingestion with a push mechanism in a subsequent step.

Contributor Author

Yeah, we will hopefully add a backend model for lineage in the future so that we can query the db directly for lineage rather than making ad hoc calls to a provider from metadata. In that case we would have to add support for lineage extraction in databuilder.

Contributor

Clear. For me, support for an external service is more than fine; I just wanted to make sure I got it right.

Member

@verdan verdan left a comment

This looks good to me for Stage 0.

@feng-tao
Member

feng-tao commented Mar 3, 2021

@danwom @allisonsuarez here is the preview (https://github.com/amundsen-io/rfcs/blob/59bd5b98c15e823fcf1757fcd5ff97790e215dae/rfcs/025-lineage-stage-0.md). Could you update and fix the broken image links? Thanks.


We will add two additional tabs to the `Table Details` page, `Upstream` and `Downstream`. Each tab will contain a list of tables from which data is inherited or consumed. This allows users to view a table's lineage in a very simple manner.

![Column Lineage Preview](assets/025/column-lineage-preview.png)
![Column Lineage Preview](../assets/025/column-lineage-preview.png)
Member

Any reason we don't go with a lineage tab with column and dashboard, then have upstream/downstream subtabs underneath?

I think there's a preference to avoid nested tabs when possible. At some point if we have too many tabs then we can reconsider nesting or other options.

Contributor

For me it feels weird to break it into two tabs. Lineage is, after all, a big picture of data flow, so it would make sense to have it in one place. That would make it easier for the end user to analyze.

Member

Yeah, +1 on @mgorsk1. I felt the same: have a single lineage tab with nested tabs for upstream/downstream (not sure how much complexity that adds to the FE implementation). @danwom @allisonsuarez could you share more details about the preference?

Member

+1 for not using the nested tabs where possible.

For Stage 0, it will be a list of items for each direction, which IMHO is okay to be placed in different tabs. But yes, for the next milestone, where we'll have a graph/chart of the complete lineage (and not the list of tables), that must be placed in one view under 1 tab to have a clear view of the complete lineage.

Contributor

@mgorsk1 mgorsk1 Mar 4, 2021

I am not really advocating for nested tabs, as this sounds more like an implementation preference than a UX question, and I'm fine avoiding it if we don't like nesting.

But is nesting the only way this could be implemented in a single tab? I could even see something like a single list (sorted by upstream and then downstream) with table names and an icon/abbreviation indicating whether the table is upstream or downstream.
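
The single-list idea in this comment could be prototyped roughly as follows: upstream tables first, then downstream, each row tagged with its direction so the UI can render an icon or abbreviation. The function name `merge_lineage` and the row shape are hypothetical, chosen only to illustrate the suggestion.

```python
from typing import List, Tuple


def merge_lineage(upstream: List[str], downstream: List[str]) -> List[Tuple[str, str]]:
    """Build one list of (direction, table name) rows: upstream first, then downstream."""
    rows = [("upstream", name) for name in sorted(upstream)]
    rows += [("downstream", name) for name in sorted(downstream)]
    return rows
```

A frontend could map the direction tag to an icon while keeping both directions in one view.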

Signed-off-by: Daniel Won <dwon@lyft.com>
Member

@feng-tao feng-tao left a comment

lgtm for stage 0

@allisonsuarez allisonsuarez added Status: Final Comment Period (FCP) On final comment period (seven days) and removed Status: Active labels Mar 4, 2021
@feng-tao feng-tao added Status: Landed The proposed changes are shipped in an actual release and removed Status: Final Comment Period (FCP) On final comment period (seven days) labels Mar 9, 2021
@feng-tao feng-tao merged commit e52fb0d into master Mar 9, 2021
@feng-tao feng-tao deleted the asm-lineage-0 branch March 9, 2021 05:03
allisonsuarez added a commit that referenced this pull request May 5, 2021
* started writing rfc and added dir

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

* made small change on README and added more to RFC will create PR to get number now

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

* had to redo this

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

* reverted weird changes

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

* changed naming again

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

* Added details to rfc 025 lineage.
- Added photos into a nested assets/ folder.

Signed-off-by: Daniel Won <dwon@lyft.com>

* Fixed relative image links in rfc 025

Signed-off-by: Daniel Won <dwon@lyft.com>

Co-authored-by: Daniel Won <dwon@lyft.com>
Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
Labels
Status: Landed The proposed changes are shipped in an actual release

5 participants