fix(ingest/glue): handle error when generating s3 tags for virtual view tables #5398

Conversation

timcosta
Contributor

@timcosta timcosta commented Jul 15, 2022

When using Glue Virtual Views and tagging data with s3 bucket/object tags, the ingestion currently errors out when it finds a virtual view, as there is no location set and therefore the location doesn't begin with s3:// or one of the other acceptable prefixes. This PR checks for the presence of a Location block and returns None if one does not exist.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@anshbansal anshbansal added the ingestion (PR or Issue related to the ingestion of metadata) and community-contribution (PR or Issue raised by member(s) of DataHub Community) labels Jul 15, 2022
@github-actions

github-actions bot commented Jul 15, 2022

Unit Test Results (build & test)

403 tests  ±0   403 ✔️ ±0   8m 31s ⏱️ -31s
  97 suites ±0       0 💤 ±0
  97 files   ±0       0 ❌ ±0

Results for commit 17b9ae0. ± Comparison against base commit 65a308a.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Jul 15, 2022

Unit Test Results (metadata ingestion)

       8 files  ±0         8 suites  ±0   1h 12m 24s ⏱️ - 3m 41s
   601 tests ±0     597 ✔️ ±0    3 💤 ±0  1 ❌ ±0
1 140 runs  ±0  1 095 ✔️ +1  44 💤 ±0  1 ❌ - 1

For more details on these failures, see this check.

Results for commit 17b9ae0. ± Comparison against base commit 65a308a.

♻️ This comment has been updated with latest results.

@timcosta
Contributor Author

Not really sure why CI failed. In the raw output the old and new values look the same to me: urn:li:domain:Engineering. It seems like there may be an added newline, but I'm not sure if that's just the test output or an actual change. The failure also seems to be in an area of code that would be unaffected by this PR.

Contributor

@shirshanka shirshanka left a comment

Thanks for this @timcosta!
I've added some suggestions on simplifying this code change. I don't think we need to preserve compatibility with old behavior, instead we just need to fix how we are getting the bucket name to fetch tags.

# and we should be able to return no tags rather than fail the entire ingestion
try:
    bucket_name = s3_util.get_bucket_name(
        table["StorageDescriptor"]["Location"]

It seems to me that we just need to be more defensive about this. When a StorageDescriptor -> Location entry is not found, don't try to get tags for that bucket. We shouldn't raise an exception either.

So something like:

if table.get("StorageDescriptor", {}).get("Location"):
    bucket_name = s3_util.get_bucket_name(table["StorageDescriptor"]["Location"])...
    ...
else:
    return None

might be enough.
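
For readers following along, the suggestion above can be sketched as a small, self-contained function. This is only an illustration, not the code that was merged: `get_bucket_name` below is a simplified stand-in for `s3_util.get_bucket_name`, the prefix tuple is an assumed example of "acceptable prefixes", and `bucket_for_table` is a hypothetical helper name.

```python
from typing import Optional

# Assumption: an illustrative set of accepted prefixes; the real s3_util list may differ.
S3_PREFIXES = ("s3://", "s3n://", "s3a://")


def get_bucket_name(location: str) -> str:
    """Simplified stand-in for s3_util.get_bucket_name: raises on non-S3 locations."""
    for prefix in S3_PREFIXES:
        if location.startswith(prefix):
            # The bucket is the first path segment after the prefix.
            return location[len(prefix):].split("/")[0]
    raise ValueError(f"Not an S3 URI: {location!r}")


def bucket_for_table(table: dict) -> Optional[str]:
    """Hypothetical helper: return the bucket for a Glue table dict, or None if it has no Location."""
    # `or {}` also covers a StorageDescriptor key that is present but set to None.
    location = (table.get("StorageDescriptor") or {}).get("Location")
    if not location:
        # Virtual views carry no storage location, so there is no bucket to fetch tags from.
        return None
    return get_bucket_name(location)
```

With this shape, a table without a Location simply yields no bucket (and therefore no S3 tags) instead of failing the whole ingestion run, while a Location with an unexpected prefix still surfaces as an error from the stand-in parser.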

@@ -123,6 +123,10 @@ class GlueSourceConfig(AwsSourceConfig, PlatformSourceConfigBase, GlueProfilingC
        default=False,
        description="If an S3 Objects Tags should be created for the Tables ingested by Glue.",
    )
    s3_tags_ignore_virtual_view_tables: Optional[bool] = Field(

Shouldn't need this flag to preserve old behavior... I don't think anyone will want the old behavior.

@timcosta timcosta force-pushed the feat_glue_s3_tag_handle_virtual_views branch from 6c995ef to 30c585e Compare July 18, 2022 15:39
@timcosta timcosta changed the title feat(ingest/glue): handle error when generating s3 tags for virtual view tables fix(ingest/glue): handle error when generating s3 tags for virtual view tables Jul 18, 2022
@timcosta
Contributor Author

Thanks for the comments @shirshanka, I've updated the branch to contain only the logic fix rather than the flag to preserve backwards compatibility. I agree that people likely wouldn't want the original behavior, but it's usually easier to get PRs merged without changes that could be perceived as breaking :)

@shirshanka shirshanka added the on-deck (PR or Issue that will be reviewed and/or addressed by the DataHub Maintainers in future cycles) label Jul 18, 2022
@shirshanka shirshanka self-assigned this Jul 18, 2022
Contributor

@shirshanka shirshanka left a comment

LGTM! Will merge once CI is green.

@shirshanka shirshanka merged commit d065553 into datahub-project:master Jul 20, 2022
Santhin pushed a commit to Santhin/datahub that referenced this pull request Jul 26, 2022
…ew tables (datahub-project#5398)

Co-authored-by: Tim Costa <timcosta@amazon.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
maggiehays pushed a commit to maggiehays/datahub that referenced this pull request Aug 1, 2022
…ew tables (datahub-project#5398)

Co-authored-by: Tim Costa <timcosta@amazon.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
neojunjie added a commit to aqden/datahub that referenced this pull request Aug 29, 2022
* feat(ingest): working with multiple bigquery projects (datahub-project#5240)

* fix(build): missing libs (datahub-project#5254)

* fix(build): use correct creds (datahub-project#5261)

* feat(ingest): Option to define path spec for Redshift lineage generation (datahub-project#5256)

* fix(ui): Enable previews properly when browsing for DataJob (datahub-project#5250)

* fix(docs): Fix acronym on mxe docs (datahub-project#5249)

* fix(ui): Support deleting references to glossary terms / nodes, users, assertions, and groups (datahub-project#5248)

* Adding referential integrity to deletes API

* Updating comments

* Fix build

* fix checkstyle

* Fixing Delete Entity utils Test

* feat(docs) add links in quickstart for adding users (datahub-project#5267)

* fix(siblings) Display sibling assertions in Validations tab (datahub-project#5268)

* fix(siblings) Display sibling assertions in Validations tab

* query changes

Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2-192.lan>

* feat(domain) Add ability to edit a Domain name from the UI (datahub-project#5266)

* feat(ingest): delta-lake: adding support for delta lake (datahub-project#5259)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* fix(siblings) Update the names of siblings utils args for readability (datahub-project#5269)

Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2-193.lan>

* docs(adopters): add showroomprive and n26 as DataHub adopters (datahub-project#5271)

* feat(glossary) Add Source section to sidebar for Glossary Terms (datahub-project#5262)

* fix(ingest): delta-lake - fix dependency issue for snowflake due to s3_util (datahub-project#5274)

* fix(ingest): s3 - Remove unneeded methods from s3_util (datahub-project#5276)

* feat(ui): Selector recommendations in Owner, Tag and Domain Modal (datahub-project#5197)

* fix(security) Sanitize rich text before sending to backend or rendering on frontend (datahub-project#5278)

* feat(GraphQL): Support for Deleting Domains, Tags via GraphQL API (datahub-project#5272)

* feat(build): reduce build time for ingestion image (datahub-project#5225)

* fix(ingestion): profiling - Fixing partitioned table profiling in BQ (datahub-project#5283)

* fix(ingest) redshift: Adding missing dependencies and relaxing sqlalchemy dependency (datahub-project#5284)

Relaxing sqlalchemy deps to make our plugins work with Airflow 2.3

* fix(ingestion): Reverting sqlalchemy upgrade because it caused issues with mssql and redshift-usage (datahub-project#5289)

* fix(Siblings): Have sibling hook use entity client (datahub-project#5279)

* fixing dbt platform issues

* have sibling hook use entity client over entity service

* switching search service as well

* lint

* more lint

* more specific exceptions

* refactor(ui): Show message when related glossary terms are empty. (datahub-project#5285)

* docs(adopter): add Digital Turbine as DataHub adopter (datahub-project#5290)

* docs(docker): Update schema-registry  docker.env (datahub-project#5231)

* feat(siblings): index sibling aspects for historical dbt metadata (datahub-project#5291)

* fixing dbt platform issues

* starting sibling restore index job work

* finish restore indices

* migrating to list urns

* rename constant

* disaster recovery

* feat(ui) Adding support for deleting Tags and Domains via the UI (datahub-project#5280)

* Adding support for deleting tags and domains via the UI

* Fixing tests

* fix(test): add cleanup in tests, make urls configurable (datahub-project#5287)

* fix(docs,quickstart): release related changes for 0.8.40 (datahub-project#5299)

* fix(doc): config typo on confluent cloud doc (datahub-project#5293)

* fix(cli): suppress secrets in stacktraces (datahub-project#5302)

* Minor UI bug fuix (datahub-project#5292)

* fix(cli): timeline - category should be owner not ownership (datahub-project#5304)

* perf(ui): reduce data fetched by siblings in lineage (datahub-project#5308)

* fix(ingest): bigquery - Fix for bigquery error when there was no bigquery catalog specified (datahub-project#5303)

* fix(ui) Fix entity profile sidebar width issues (datahub-project#5305)

Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2.lan>

* perf(search): Improve search default performance  (datahub-project#5311)

* perf(ui): Performance improvements and misc refactorings in the UI (datahub-project#5310)

* feat(ui): Modified the drop down of Menu Items (datahub-project#5301)

* fix(validation) Fail validation error silently instead of crashing (datahub-project#5314)

* feat(docs) Add documentation on authorization & authentication (datahub-project#5265)

* fix(ui) Make profile icon clickable to expand header menu (datahub-project#5317)

* refactor(ui): Extract searchable page into its own component (perf + ux)  (datahub-project#5318)

* fix(gms) Remove auto-creating status aspects if not present when ingesting metadata (datahub-project#5315)

* fix(ui): Add missing SearchRoutes component (datahub-project#5321)

* feat(ingest): looker - ingest dashboard create/update/delete timestamps (datahub-project#5312)

* fix(ui): Fix pipeline tasks list loading (datahub-project#5332)

* feat(ingest): lookml - adding support for only emitting reachable views from explores (datahub-project#5333)

* fix(ingest): tableau - omit schema fields when name is absent (datahub-project#5275)

* fix(siblings) Combine siblings data but remove duplicate data (datahub-project#5337)

* fix(docs): Fix typo in metadata-ingestion.md (datahub-project#5338)

* fix(me) Cache the me query for performance reasons (datahub-project#5316)

* fix(tokens) Adds non-admin tests for access tokens (datahub-project#5174)

* feat(bigquery): support size, rowcount, lastmodified based table selection for profiling (datahub-project#5329)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* chore: Refactor Python Codebase (datahub-project#5113)

* docs(bigquery): profiling report enhancement (datahub-project#5342)

* feat(ingest): update CSV source to support description and ownership type (datahub-project#5346)

* fix(ui): fixed the ui issue (datahub-project#5341)

* feat(ingest): salesforce - add connector (datahub-project#5104)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Vincent Koc <koconder@users.noreply.github.com>

* feat(bootstrap): create abstract class UpgradeStep to abstract away upgrade logic (datahub-project#5349)

* fix(ingest): bigquery-usage - dataset name for sharded tables (datahub-project#5347)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* docs(features): update grammar (datahub-project#5350)

* fix(ci): fix mysql test and attempt kafka-connect ingestion (datahub-project#5352)

* feat(ui): add copy function for stats table sample value (datahub-project#5331)

* fix(ui) Correct show/hide tabs in Settings based on privileges (datahub-project#5355)

Co-authored-by: Chris Collins <chriscollins@Chriss-MacBook-Pro-2.local>

* fix(siblings): add useMutationUrn to domain section (datahub-project#5270)

* fixing dbt platform issues

* useMutationUrn for domains modal

* feat(schema) Show last observed timestamp in the schema tab (datahub-project#5348)

* fix(glossary) Fixes a bug for yaml ingested terms without source_url (datahub-project#5356)

* feat(lineage) Add Lineage tab to Chart and Dashboard entity profiles (datahub-project#5357)

* fix(cassandra): fix Cassandra queries used by IngestDataPlatformInstancesStep (datahub-project#5199)

* refactor(ui): Use createTag mutation for creating new tags from the UI (datahub-project#5359)

* feat(ui): Added recommendation on group modal (datahub-project#5362)

* refactor(ui): Remove unnecessary fields in GraphQL (datahub-project#5358)

* feat(ingest) - add audit actor urn to auditStamp (datahub-project#5264)

* feat(ingest): improve domain ingestion usability (datahub-project#5366)

* fix(config): fixes config key in DataHubAuthorizerFactory (datahub-project#5371)

* fix(ingest): domains - check whether urn based domain exists during resolution (datahub-project#5373)

* feat(quickstart): Adding env variables and cli options for customizing mapped ports in  quickstart (datahub-project#5353)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* fix(build): tweak ingestion build (datahub-project#5374)

* feat(sdk): python - add get_aspects_for_entity (datahub-project#5255)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* fix(airflow): fix for failing serialisation when Param was specified + support for external task sensor (datahub-project#5368)

fixes datahub-project#4546

* fix(users): fix to not get invite token unless the invite token modal is visible (datahub-project#5380)

* fix(gms) Propagate cache exception upstream (datahub-project#5381)

* fix(bootstrap): skip ingesting data platforms that already exist (datahub-project#5382)

* fix(cli): respect server telemetry settings correctly (datahub-project#5384)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* fix(ingest): bigquery - Graceful bq partition id date parsing failure (datahub-project#5386)

* feat(airflow): Circuit breaker and python api for Assertion and Operation (datahub-project#5196)

* feat(kafka-setup): add options for sasl_plaintext (datahub-project#5385)

allow sasl_plaintext options using environment variables

* fix(bigquery): multi-project GCP setup run query through correct project (datahub-project#5393)

* fix(bigquery): add storage project name (datahub-project#5395)

* Add Changes to support smoke test on Datahub deployed on kubernetes Cluster (datahub-project#5334)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>

* fix(PlayCookie) PLAY_TOKEN cookie rejected because userprofile exceeds 4096 chars (datahub-project#5114)

* feat(dashboards): add datasets field to DashboardInfo aspect (datahub-project#5188)

Co-authored-by: John Joyce <john@acryl.io>

* feat(siblings): allow viewing siblings separately (datahub-project#5390)

* allow pulling back curtain for siblings

* sibling pullback working for lineage + property merge

* propagating provinence to ui

* fixups from merge & some renames

* fix styling & add tooltip

* adding cypress tests

* fix lint

* updating mocks

* updating smoke test

* fixing domains smoke test

* responding to comments

* refactor(ui): Added Cursor pointer to tags (datahub-project#5389)

* feat(GMS): Adding Dashboard Usage Models (datahub-project#5399)

* fix(quickstart): use platform agnostic way to get folder (datahub-project#5400)

* Adds support for Domains in CSV source (datahub-project#5372)

* feat(ingestion) Build out UI form for Snowflake Managed Ingestion (datahub-project#5391)

* fix(kafka): add missing configs (datahub-project#5394)

* feat(model): dashboard usage model, is_null condition added (datahub-project#5397)

* fix(datahub-client): Fix kafka config issue (datahub-project#5403)

* build: improve comprehensiveness of gradle clean (datahub-project#5003)

* fix(gms): Change MessageDigest to be thread safe (datahub-project#5405)

* fix(metadata-ingestion) Fix broken csv enricher test (datahub-project#5406)

* fix(tests): Removes duplicate policies tests & makes DataHub user configurable (datahub-project#5365)

* feat(quickstart,docs): updates for v0.8.41 (datahub-project#5409)

* fix(ingest): ensure upgrade checks run async (datahub-project#5383)

* fix(ingest): looker - pass transport options to all api calls (datahub-project#5417)

* feat(quickstart): moving to official confluent images for m1 (datahub-project#5416)

* fix(documentation) Fix erratic cursor in documentation editor bug (datahub-project#5411)

Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2-280.lan>

* feat(ui): Supporting enriched search preview + misc improvements  (datahub-project#5419)

* chore: remove unnecessary modules from codebase (datahub-project#5420)

* fix(ingest): looker - extract usage for dashboards allowed by pattern (datahub-project#5424)

* fix(docker): fix kafka-setup command to support same capabilities as previous (datahub-project#5428)

* fix(protobuf) Set undeprecated ownership type & fix case sentitive urn corpGroup (datahub-project#5425)

* fix(ui): add dataset qualifiedName parameter to lineage query (datahub-project#5427)

* fix(glossary) Fix dropdown where disabled buttons are still clickable (datahub-project#5430)

Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2.lan>

* docs(bigquery): add changelog and unittest for profiling limits (datahub-project#5407)

* fix(siblings): fixing lineage fetching for siblings & sources (datahub-project#5415)

* fix(ui): Fixing unreleased search preview bugs  (datahub-project#5432)

* feat(ui): Adding Statistics Summary to Dataset + Dashboard Profiles  (datahub-project#5440)

* feat(ingest): add test source connection feature, structured report file (datahub-project#5442)

* fix(ingest/glue): handle error when generating s3 tags for virtual view tables (datahub-project#5398)

Co-authored-by: Tim Costa <timcosta@amazon.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* feat(ingest): model - adding a small extension to support communicating structured responses (datahub-project#5429)

* fix(ingest): bigquery-usage - fix dataset name for sharded table (datahub-project#5412)

* feat(ingestion) Add new endpoint to test an ingestion connection (datahub-project#5438)

* feat(cli,build): remove deprecated variables GMS_HOST/_PORT (datahub-project#5451)

* fix(search): make filters by default an empty list if null (datahub-project#5454)

* fix(ingest): hive - add column comment as a column description (datahub-project#5449)

* feat(groups): add native groups concept to DataHub (datahub-project#5443)

* fix(ingest): fix serialization of report to handle nesting (datahub-project#5455)

* fix(ingest): tableau - fix tableau db error, add more logs (datahub-project#5423)

* build(deps): bump terser from 5.9.0 to 5.14.2 in /docs-website (datahub-project#5448)

Bumps [terser](https://github.com/terser/terser) from 5.9.0 to 5.14.2.
- [Release notes](https://github.com/terser/terser/releases)
- [Changelog](https://github.com/terser/terser/blob/master/CHANGELOG.md)
- [Commits](https://github.com/terser/terser/commits)

---
updated-dependencies:
- dependency-name: terser
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs: spark-lineage - configuration details for Amazon EMR (datahub-project#5459)

* feat(app): schema-history - remove blame language for the schema history feature (datahub-project#5457)

* Worked on the alignment of menu icon in search header (datahub-project#5458)

* build(deps): bump terser from 4.8.0 to 4.8.1 in /datahub-web-react (datahub-project#5446)

Bumps [terser](https://github.com/terser/terser) from 4.8.0 to 4.8.1.
- [Release notes](https://github.com/terser/terser/releases)
- [Changelog](https://github.com/terser/terser/blob/master/CHANGELOG.md)
- [Commits](https://github.com/terser/terser/commits)

---
updated-dependencies:
- dependency-name: terser
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(ingest): snowflake - basic test connection capability (datahub-project#5464)

* fix(ingest/trino): Avoid exception if $properties table empty or not readable (datahub-project#5447)

Under some configuration of access rules in Trino, the user may not have
read access to the content of the table, which will result in an exception
(`fetchone()` returns `None`)

This commit ensures no exception are raised and the ingestion can proceed.

* feat(ingest): preflight - Add way to check/upgrade brew package version in preflight if needed (datahub-project#5435)

* fix(build): add base image with gradle wrapper cached (datahub-project#5467)

* doc(bigquery): groups grants by requirements (datahub-project#5468)

* fix(docs,build): remove base image not needed, cleanup docs (datahub-project#5469)

* feat(ui): Partial support for Chart usage (datahub-project#5473)

* fix(ingest): bigquery: multiproject profiling fix (datahub-project#5474)

* fix(ingest): kafka - revert deps back to < 1.9.0 (datahub-project#5476)

* feat(ci): datahub-upgrade - support multiplatform image (datahub-project#5477)

* feat(cli): quickstart - experimental support for backup restore (datahub-project#5418)

* feat(ingest): dbt - updating source lineage logic (datahub-project#5414)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* Ingestion: Added form in Big Query type to edit the queries. (datahub-project#5431)

* docs(reindex): fix docsearch config (datahub-project#5479)

* refactor(ui): Adding checkbox option to select multiple results at once. (datahub-project#5422)

* feat(cli): delete - hard delete deletes soft deleted entities (datahub-project#5478)

* fix(docs): add missing closing marker for note section (datahub-project#5480)

* fix(ci): intermittent failure in github actions (datahub-project#5452)

* feat(model, ingest): add user email in dashboard user usage counts (datahub-project#5471)

* feat(ingest): snowflake - test_connection add support for capability report (datahub-project#5472)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* feat(build): automatically mark issues as stale to close inactive issues (datahub-project#5482)

* fix(ingest): loosen confluent-kafka dep requirement (datahub-project#5489)

* refactor(ingest): cleanup importlib.import_module calls (datahub-project#5490)

* build(ingest): make gradle build less chatty (datahub-project#5491)

* fix(ingest): dbt - add support for trino datatypes (datahub-project#5379)

* refactor(ci): use custom action for checking codegen status (datahub-project#5493)

* feat(spark-lineage, java-emitter): Support ssl cert disable verification functionality (datahub-project#5488)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* docs(auth): fix link to point to new doc (datahub-project#5501)

* docs(updating-datahub): add note for breaking change in looker usage ingestion (datahub-project#5499)

* fix(ingest): cleanup unused flake8 noqa statements (datahub-project#5492)

* fix(ingest): cleanup unused flake8 noqa statements

In the future, we can discover these using `flake8-noqa`.

* add back c901

* refactor(ci): refactor Docker build-and-push workflows (datahub-project#5494)

* docs(slack): update to Slack guidelines (datahub-project#5504)

* feat(cli): delete - add --only-soft-deleted option, perf improvements (datahub-project#5485)

* fix(ingest): use temp dir for file generated during test (datahub-project#5505)

* feat(ui) Show Glossary and Domains header links to everyone (datahub-project#5506)

Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2.lan>

* fix(ui): Fix Flickering Issue on search input field (datahub-project#5503)

* fix(ingest): respect rest emitter timeout setting (datahub-project#5508)

* fix(ui): Flickering Issue on search input field (datahub-project#5515)

* feat(ui): Added form to Looker and Tableau (datahub-project#5487)

* feat(identity): update azure and okta connectors to emit Origin aspects (datahub-project#5495)

* feat(ui): Adding Search Select feature(frontend only)  (datahub-project#5507)

* test(ingest): limit GMS retries in test (datahub-project#5509)

* fix(ingest): airflow: update subdag check for compatibility with older Airflow versions (datahub-project#5523)

* use getattr to default None if no subdag

* add None check

* add other None check

* Apply suggestions from code review- double quotes

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* minor tweak to fix lint

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* fix(ingest): fix unbound variable bug in cli ingest list-runs (datahub-project#5527)

* fix(ui) Display Term Group name properly in Recently Viewed (datahub-project#5528)

* feat(ingestion) Add frontend connection test for Snowflake (datahub-project#5520)

* fix(glossary) Fix Glossary success messages and sort Glossary (datahub-project#5533)

* show error and success messages in glossary properly

* sort glossary nodes and terms alphabetically

Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2.lan>

* feat(apache-ranger): Apache Ranger Authorizer support in datahub-gms (datahub-project#4999)

* feat(ingest): add deprecation warning for Python 3.6 (datahub-project#5519)

* docs(townhall) add past townhall agendas (datahub-project#5536)

* feat(ingestion): add groups to ldap users (datahub-project#5470)

* chore(issues): reduce time for issues to be marked stale and then closed (datahub-project#5537)

* fix(ingestion) Set pipeline_name on UI recipes with forms (datahub-project#5535)

* Fixing OIDC logout issues (datahub-project#5538)

* fix(analytics-tab) - fix analytics tab config variable for gms (datahub-project#5529)

* feat(ui): Support batch adding / remove tags from search lists. (Batch Actions part 2/7)  (datahub-project#5534)

* fix(ingestionSource): improve error experience when ingestion source is in an inconsistent state (datahub-project#5522)

* fix(docs): Fixed typo in schema history markdown! (datahub-project#5545)

* fix(docker): Fixing dev docker and quickstart  (datahub-project#5550)

* feat(ui): Support Batch adding and removing Glossary Terms (Batch Actions 3/7) (datahub-project#5544)

* feat(ci): test quickstart works (datahub-project#5518)

* feat(ci): test quickstart works

* do not fail fast

* remove macos

* add some debug information

* tweak triggers

* fix workflow file

* remove running on every PR

* Update .github/workflows/check-quickstart.yml

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* Update .github/workflows/check-quickstart.yml

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* test(ingest): mark trino/hana tests as xfail due to flakes (datahub-project#5549)

* feat(ingestion): superset - add display_uri to config (datahub-project#5408)

* fix(quickstart): failure on a path not being present (datahub-project#5554)

* fix(dbt): fix issue of assertion error when stateful ingestion is used with dbt tests (datahub-project#5540)

* fix(dbt): fix issue of dbt stateful ingestion with tests

Co-authored-by: MugdhaHardikar-GSLab <mugdha.hardikar@gslab.com>
Co-authored-by: MohdSiddique Bagwan <mohdsiddique.bagwan@gslab.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>

* feat(ui): Batch add & remove Owners to assets via the UI (datahub-project#5552)

* feat(ingestion) Update managed ingestion scheduler to be easier to use (datahub-project#5559)

* fix(ingestion): correct trino datatype handling (datahub-project#5541)

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>

* feat(ingest) Allow ingestion of Elasticsearch index template (datahub-project#5444)


Co-authored-by: Ravindra Lanka <rlanka@acryl.io>

* fix(ingest): fix some typos and logging issues (datahub-project#5564)

* feat(transformers): Add domain transformer for dataset (datahub-project#5456)

Co-authored-by: MohdSiddique Bagwan <mohdsiddique.bagwan@gslab.com>

* chore(0.8.42): update breaking changes doc (datahub-project#5563)

* fix(ingest): activate mypy support for ParamSpec typing annotation (datahub-project#5551)

* (chore): upgrading ingestion to 0.8.42 (datahub-project#5562)

* fix(gms): ensure directory is present (datahub-project#5568)

* fix(ci): flaky smoke test fix (datahub-project#5569)

* fix(gms): missing directory for gms (datahub-project#5570)

* chore(build): tweak stale issue timing (datahub-project#5571)

* feat(ui): Batch set & unset Domain for assets via the UI (datahub-project#5560)

* extending assertion std model (datahub-project#5575)

* feat(ui): Support batch deprecation from the UI (Batch actions part 6/7) (datahub-project#5572)

* feat(graphql): add MutableTypeBatchResolver (datahub-project#4976)

* feat(ingestion) Implement secrets in new managed ingestion form (datahub-project#5574)

* fix(ui): Fixing batch set domains bug (datahub-project#5580)

* chore(gradle): update node version for docs site (datahub-project#5581)

* feat(test): add read-only smoke tests (datahub-project#5558)

* feat(ingestion) Add Save & Run button to managed ingestion builder (datahub-project#5579)

* fix(ingest): handle when current server version is unavailable (datahub-project#5547)

* feat(ingest): dbt - control over emitting test_results, test_definitions, etc. (datahub-project#5328)

Co-authored-by: Piotr Sierkin <piotr.sierkin@getindata.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* feat(datahub-client): add java file emitter (datahub-project#5578)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* feat(ingest): infer aspectName from aspect type in MCP (datahub-project#5566)

* fix(ingest): sql-common - db2, snowflake bug fixes to extract table descriptions (datahub-project#5526)

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* fix(ingest): moving delta-lake connector to be 3.7+ only (datahub-project#5584)

* feat(ingest): delta-lake - extract table history into operation aspect (datahub-project#5277)

Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* fix apache ranger plugin readme file rendering (datahub-project#5585)

* feat(ui): make container description searchable and have description show up in results (datahub-project#5586)

* fix(groups): fix user, search, and preview group membership to be fetched for both external and native group memberships (datahub-project#5587)

* feat(ingest): power-bi - make ownership ingestion optional (datahub-project#5335)


Co-authored-by: MohdSiddique Bagwan <mohdsiddique.bagwan@gslab.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>

* Expose catalog_name in athena.py (datahub-project#5548)

* expose catalog_name to the sql alchemy uri that is passed into pyathena

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>

* Fix profiling when using {table}. (datahub-project#5531)

* profiling fix for when using {table}

Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>

* feat(ui): Support batch deleting from ui (datahub-project#5582)

* feat(ingest): clickhouse - add metadata modification time and data size (datahub-project#5330)

Co-authored-by: Ravindra Lanka <rlanka@acryl.io>

* feat(ui): Add rich UI ingestion run summary (datahub-project#5577)

* fix(ci): smoke test less flaky, add src, dev dep in smoke image (datahub-project#5594)

* updated mock custom to pass the test suite

* added env for mysql-setup for smoketest to pass

* added env for mysql-setup for smoketest to pass

* added env for mysql-setup for smoketest to pass

* push to heroku repo instead of linkedin

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
Co-authored-by: Tamas Nemeth <treff7es@gmail.com>
Co-authored-by: Michael A. Schlosser <mikeschlosser16@gmail.com>
Co-authored-by: John Joyce <john@acryl.io>
Co-authored-by: Pedro Silva <pedro.cls93@gmail.com>
Co-authored-by: Chris Collins <chriscollins3456@gmail.com>
Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2-192.lan>
Co-authored-by: Mugdha Hardikar <mugdha.hardikar@gslab.com>
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2-193.lan>
Co-authored-by: Maggie Hays <maggiem.hays@gmail.com>
Co-authored-by: Ankit keshari <86347578+Ankit-Keshari-Vituity@users.noreply.github.com>
Co-authored-by: Gabe Lyons <itsgabelyons@gmail.com>
Co-authored-by: liyuhui666 <71497399+liyuhui666@users.noreply.github.com>
Co-authored-by: Tengis Batsaikhan <tengee0411@gmail.com>
Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2.lan>
Co-authored-by: Pedro Silva <pedro@acryl.io>
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
Co-authored-by: dougpm <60357516+dougpm@users.noreply.github.com>
Co-authored-by: Vincent Koc <koconder@users.noreply.github.com>
Co-authored-by: Aditya Radhakrishnan <aditya@acryl.io>
Co-authored-by: Amanda Ng <10681923+ngamanda@users.noreply.github.com>
Co-authored-by: Chris Collins <chriscollins@Chriss-MacBook-Pro-2.local>
Co-authored-by: Justin Marozas <justin.marozas@ext.gresearch.co.uk>
Co-authored-by: Sergio Gómez Villamor <sgomezvillamor@gmail.com>
Co-authored-by: Navin Sharma <103643430+NavinSharma13@users.noreply.github.com>
Co-authored-by: Aezo <45879156+aezomz@users.noreply.github.com>
Co-authored-by: abiwill <abhi13101993@gmail.com>
Co-authored-by: Felix Lüdin <13187726+Masterchen09@users.noreply.github.com>
Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Co-authored-by: Chris Collins <chriscollins@Chriss-MBP-2-280.lan>
Co-authored-by: leifker <leifker@users.noreply.github.com>
Co-authored-by: Alexey Kravtsov <Havok.08@mail.ru>
Co-authored-by: Tim Costa <tim@timcosta.io>
Co-authored-by: Tim Costa <timcosta@amazon.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Guillaume Gardey <glinmac@gmail.com>
Co-authored-by: Vishal Shah <vshah@etsy.com>
Co-authored-by: mohdsiddique <mohdsiddiquebagwan@gmail.com>
Co-authored-by: Salih Can <salih.can@udemy.com>
Co-authored-by: RyanHolstien <RyanHolstien@users.noreply.github.com>
Co-authored-by: Skyler Sinclair <skyler.r.sinclair@gmail.com>
Co-authored-by: Dan Andreescu <dan.andreescu@gmail.com>
Co-authored-by: MohdSiddique Bagwan <mohdsiddique.bagwan@gslab.com>
Co-authored-by: Ravindra Lanka <rlanka@acryl.io>
Co-authored-by: Marcin Szymański <ms32035@gmail.com>
Co-authored-by: xiphl <50935738+xiphl@users.noreply.github.com>
Co-authored-by: NoahFournier <63198198+NoahFournier@users.noreply.github.com>
Co-authored-by: Piotr Sierkin <psierkin@gmail.com>
Co-authored-by: Piotr Sierkin <piotr.sierkin@getindata.com>
Co-authored-by: Jordan Wolinsky <jordan@zephyrai.bio>
Labels
community-contribution: PR or Issue raised by member(s) of DataHub Community
ingestion: PR or Issue related to the ingestion of metadata
on-deck: PR or Issue that will be reviewed and/or addressed by the DataHub Maintainers in future cycles
3 participants