Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flink: Ignore the Forbidden when creating a database #7795

Merged
merged 5 commits into from
Jun 26, 2023

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Jun 7, 2023

It can be that the user that runs a Flink job, doesn't have the privileges to create a database. In that case we just assume that it already exists.

It can be that the user that runs a Flink job, doesn't have
the privileges to create a database. In that case we just
assume that it already exists.
@Fokko Fokko requested a review from stevenzwu June 7, 2023 15:04
@github-actions github-actions bot added the flink label Jun 7, 2023
@ConeyLiu
Copy link
Contributor

Looks reasonable to me.

if (!databaseExists(defaultDatabase)) {
try {
createDatabase(getDefaultDatabase(), ImmutableMap.of(), true);
} catch (DatabaseAlreadyExistException e) {
Copy link
Contributor

@stevenzwu stevenzwu Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also a little weird since we already did the dataExists check earlier. I understand why this is needed to handle race condition when multiple jobs can run this code path concurrently.

The first attempt of ignoring ForbiddenException was still not ideal to me because it can mask real issue that default database wasn't created due to permission issue.

I checked the HiveCatalog and GenericInMemoryCatalog implementation from Flink code. They don't try to create the default database in the open method. Flink HiveCatalog just assert that default database exists. That behavior seems reasonable to me.

cc @pvary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting removing the create database call? The person that initially reported this issue, was very supprised that this call was in there. So I think that removing the call seems reasonable to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I was thinking about remove the create database call and just add a Preconditions check that the default database exists. that is also consistent with Flink's HiveCatalog implementation.

The Preconditions check is also debatable. I am not very clear about the expectation of catalog. is the default database always expected/required? not sure if @rdblue has any input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be okay with removing the check, although I think as long as attempting to create it doesn't cause a failure it is probably slightly better to create it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fokko I am still leaning toward removing the create. This is a Flink Catalog impl (not Iceberg catalog). I think we can stick with Flink HiveCatalog style, where open method doesn't create the default database automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevenzwu I agree. I would not expect writing data would also create a database. I'll update the PR

Copy link
Contributor Author

@Fokko Fokko Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the tests as well. I think this one is good to go @stevenzwu

@stevenzwu stevenzwu merged commit f5bb0c0 into apache:master Jun 26, 2023
@stevenzwu
Copy link
Contributor

thanks @Fokko for the fix. thanks @rdblue and @ConeyLiu for the review

@Fokko Fokko deleted the fd-consume-flink-error branch June 26, 2023 18:05
Fokko added a commit to Fokko/iceberg that referenced this pull request Jul 11, 2023
nastra pushed a commit to nastra/iceberg that referenced this pull request Jul 18, 2023
nastra pushed a commit to nastra/iceberg that referenced this pull request Aug 15, 2023
laithalzyoud added a commit to revolut-engineering/iceberg that referenced this pull request Jan 30, 2024
* Hive: Set commit state as Unknown before throwing CommitStateUnknownException (apache#7931) (apache#8029)

* Spark 3.4: WAP branch not propagated when using DELETE without WHERE (apache#7900) (apache#8028)

* Core: Include all reachable snapshots with v1 format and REF snapshot mode (apache#7621) (apache#8027)

* Spark 3.3: Backport 'WAP branch not propagated when using DELETE without WHERE' (apache#8033) (apache#8036)

* Flink: remove the creation of default database in FlinkCatalog open method (apache#7795) (apache#8039)

* Core: Handle optional fields (apache#8050) (apache#8064)

* Core: Handle allow optional fields

We expect:

- current-snapshot-id
- properties
- snapshots

to be there, but they are actually optional.

* Use AssertJ

* Core: Abort file groups should be under same lock as committerService (apache#7933) (apache#8060)

* Spark 3.4: Fix rewrite_position_deletes for certain partition types (apache#8059)

* Spark 3.3: Fix rewrite_position_deletes for certain partition types (apache#8059) (apache#8069)

* Spark: Add actions for disater recovery.

* Fix the compile error.

* Fix merge conflicts and formatting

* All tests are working and code integrated with Spark 3.3

* Fix union error and snapshots test

* Fix Spark broadcast error

* Add RewritePositionDeleteFilesSparkAction

---------

Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Xianyang Liu <liu-xianyang@hotmail.com>
Co-authored-by: Szehon Ho <szehon.apache@gmail.com>
Co-authored-by: Yufei Gu <yufei_gu@apple.com>
Co-authored-by: yufeigu <yufei@apache.org>
Co-authored-by: Laith Alzyoud <laith.alzyoud@revolut.com>
Co-authored-by: vaultah <4944562+vaultah@users.noreply.github.com>
rodmeneses pushed a commit to rodmeneses/iceberg that referenced this pull request Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants