Add sourceCode facet for open lineage #1467

Closed
10 tasks
sunank200 opened this issue Dec 20, 2022 · 2 comments · Fixed by #1537
Comments

@sunank200
Contributor

Please describe the feature you'd like to see
The Python transform should also expose its content through the sourceCode facet, similar to the sql facet. Example: https://github.com/OpenLineage/OpenLineage/blob/3090ced24604c95716dacd667c2cff52bf438aba/integration/airflow/openlineage/airflow/extractors/python_extractor.py#L31 [Action item: Create an issue on astro-sdk]
Include the Python code that the transformations ran in the OpenLineage events so that it shows up in the Info tab.
For demo purposes I hard-coded both of these in a custom openlineage-airflow fork like:

code = inspect.getsource(task.python_callable)
job_facet = {"sql": SqlJobFacet(query=code), "sourceCodeLocation": SourceCodeLocationJobFacet("git", "https://github.com/astronomer/astro-days-chicago/blob/9cca4e166d73106e903f3d9f32af334d7b5560a3/dags/airflow_ecosystem.py")}

^ The code would probably be cleaner as {"sourceCode": SourceCodeJobFacet("python", code)}. I stuffed it into sql only because of https://github.com/astronomer/astro/issues/2150, which is a temporary limitation.
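
As a rough illustration of the cleaner shape suggested above, here is a minimal sketch that uses only the standard library. The helper name `build_source_code_facet` and the example callable `double_values` are hypothetical, and the returned plain dict is only a stand-in for `SourceCodeJobFacet("python", code)`. It is not the implementation that landed in PR #1537.

```python
import inspect
import textwrap
from typing import Any, Callable, Dict, Optional


def build_source_code_facet(
    python_callable: Callable[..., Any],
) -> Optional[Dict[str, Dict[str, str]]]:
    """Return a sourceCode-style job facet for a callable, or None if unavailable.

    Hypothetical helper: the dict mirrors the shape of
    {"sourceCode": SourceCodeJobFacet("python", code)} from the issue.
    """
    try:
        source = inspect.getsource(python_callable)
    except (OSError, TypeError):
        # Builtins and callables without a backing source file
        # (for example, ones defined in a REPL) have no retrievable source.
        return None
    # Dedent so that nested or method-level callables read cleanly in a UI.
    return {"sourceCode": {"language": "python", "source": textwrap.dedent(source)}}


def double_values(values):
    """Hypothetical transform used only to demonstrate the helper."""
    return [v * 2 for v in values]


facet = build_source_code_facet(double_values)
builtin_facet = build_source_code_facet(len)  # no Python source, so None
```

Returning `None` instead of raising lets a caller skip the facet without failing the lineage event, which is one possible way to meet the "Exception handling in case of errors" criterion below.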

Describe the solution you'd like

  • Add the sourceCode facet to operators for the OpenLineage integration

Acceptance Criteria

  • Run all the operators on astro-cloud and Marquez and post screenshots of the facets, including the sourceCode facet.
  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an Example DAG showing how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
  • How to use Guide for the feature (example)
@sunank200 sunank200 added the feature New feature or request label Dec 20, 2022
@sunank200 sunank200 added this to the 1.4.0 milestone Dec 20, 2022
@pankajastro pankajastro added the product/python-sdk Label describing products label Jan 4, 2023
@pankajastro
Contributor

@rajaths010494 can you please add the reason here for why, in PR #1537, get_source_code currently exists only in DataframeOperator and BaseSQLDecoratedOperator?

The OpenLineage SourceCodeJobFacet is not visible in the Marquez UI: https://github.com/OpenLineage/OpenLineage#1478

utkarsharma2 pushed a commit that referenced this issue Jan 17, 2023
**Please describe the feature you'd like to see**
closes: #1467

The python transform should also expose its content with the sourceCode
facet, similar to the sql facet. Example:
[link](https://github.com/OpenLineage/OpenLineage/blob/3090ced24604c95716dacd667c2cff52bf438aba/integration/airflow/openlineage/airflow/extractors/python_extractor.py#L31)
[Action item: Create an issue on astro-sdk]
Include the transformation python code that the transformations were
running in the OpenLineage events so that they showed up in the Info tab
For demo purposes I hard-coded both of these in a custom
openlineage-airflow fork like:
```
code = inspect.getsource(task.python_callable)
job_facet = {"sql": SqlJobFacet(query=code), "sourceCodeLocation": SourceCodeLocationJobFacet("git", "https://github.com/astronomer/astro-days-chicago/blob/9cca4e166d73106e903f3d9f32af334d7b5560a3/dags/airflow_ecosystem.py")}
```
^ the code would probably be cleaner as {"sourceCode":
SourceCodeJobFacet("python", code)} I stuffed it in sql only due to the
lack of: astronomer/astro#2150 which is a
temporary limitation

**Describe the solution you'd like**
- Add source code facet for operators for open lineage integrations

Currently, the source code facet is implemented for the base decorator operators.

Co-authored-by: Pankaj <pankaj.singh@astronomer.io>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>