[Bugfix] Populate table name from the identifier in Iceberg conversion #630

jalpan-randeri · 2025-01-28T08:33:37Z

What is the purpose of the pull request

While converting an Iceberg table to another format such as Hudi, the Iceberg source table does not populate the table name. This happens because the Iceberg table is treated as a Hadoop table. As a result, the table is identified by its location rather than its name, which leads to a confusing conversion.

BUG= #494

Brief change log

This commit handles the conversion logic: when the Iceberg table manager provides a HadoopTable, it populates the table name from the provided input TableIdentifier. This ensures that the source table name is carried over to the transformation.
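The fallback described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `TableNameFallback` and `resolveName` are hypothetical names, and plain strings stand in for the Iceberg `Table` and the XTable source config. It assumes that a table loaded by path reports its location as its name.

```java
public final class TableNameFallback {
  private TableNameFallback() {}

  /**
   * Picks the name to carry into the target format.
   *
   * @param icebergName    name reported by the loaded table (assumed to be the
   *                       location when the table was loaded by path)
   * @param location       base location of the table
   * @param configuredName name supplied by the user in the dataset config
   */
  static String resolveName(String icebergName, String location, String configuredName) {
    // A name that contains the location is treated as "no real name available",
    // so the user-supplied name is used instead.
    return icebergName.contains(location) ? configuredName : icebergName;
  }

  public static void main(String[] args) {
    String location = "/warehouse/prod/db/sample";
    // Path-loaded table: name equals location, so the configured name wins.
    System.out.println(resolveName(location, location, "sample"));
    // Catalog-loaded table: the reported name is kept as-is.
    System.out.println(resolveName("catalog.db.sample", location, "sample"));
  }
}
```

Running `main` exercises both branches: the first call prints `sample`, the second prints `catalog.db.sample`.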

Verify this pull request

Updated the unit test to cover this scenario.

Manual Verification

#### Step 1. create iceberg table
./bin/spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.0 \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hive \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
    --conf spark.sql.defaultCatalog=local

spark.sql(""" 
CREATE TABLE prod.db.sample (
 id bigint NOT NULL COMMENT 'unique id',
 data string)
USING iceberg 
""").show()

spark.sql("INSERT INTO prod.db.sample VALUES (1, 'jalpan')").show()

#### Step 2. create xtable config
$ cat xtable_iceberg.yaml

sourceFormat: ICEBERG
targetFormats:
  - HUDI
datasets:
  -
    tableBasePath: /Users/jalpan/workspace/spark/warehouse/prod/db/sample
    tableName: sample

#### Step 3. perform sync
java -jar xtable-utilities/target/xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar --datasetConfig xtable_iceberg.yaml

#### Step 4. validate 
cat /Users/jalpan/workspace/spark/warehouse/prod/db/sample/.hoodie/hoodie.properties | grep name
hoodie.table.name=sample
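The same validation could also be scripted. The sketch below is an assumption-laden helper, not part of this PR: `HoodieTableNameCheck` is a hypothetical class, and it assumes `hoodie.properties` is a standard Java properties file, which the `key=value` output above suggests.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public final class HoodieTableNameCheck {
  private HoodieTableNameCheck() {}

  /** Returns the value of hoodie.table.name, or null if the key is absent. */
  static String readTableName(Reader reader) {
    Properties props = new Properties();
    try {
      props.load(reader);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props.getProperty("hoodie.table.name");
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: HoodieTableNameCheck <path/to/hoodie.properties> <expectedName>");
      System.exit(2);
    }
    try (Reader reader = Files.newBufferedReader(Paths.get(args[0]))) {
      String actual = readTableName(reader);
      if (!args[1].equals(actual)) {
        System.err.println("expected " + args[1] + " but found " + actual);
        System.exit(1);
      }
      System.out.println("hoodie.table.name=" + actual);
    }
  }
}
```

Invoked with the properties file from step 4 and `sample` as the expected name, the check exits with a non-zero status if the table name was derived from the location instead.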

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergTableManager.java

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionSource.java

the-other-tim-brown

LGTM! Can you squash down to a single commit?

What is the problem?
While converting an Iceberg table to another format such as Hudi, the Iceberg source table does not populate the table name. This happens because the Iceberg table is treated as a Hadoop table. As a result, the table is identified by its location rather than its name, which leads to a confusing conversion.

Solution:
This commit handles the conversion logic: when the Iceberg table manager provides a HadoopTable, it populates the table name from the provided input TableIdentifier. This ensures that the source table name is carried over to the transformation.

Testing:
- Added unit test to cover this scenario.

Co-authored-by: Tim Brown <tim.brown126@gmail.com>

ashvina · 2025-02-13T06:47:06Z

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionSource.java

+        .name(
+            iceTable.name().contains(iceTable.location())
+                ? sourceTableConfig.getName()
+                : iceTable.name())


Could you please confirm whether the existing test cases cover both paths?

jalpan-randeri mentioned this pull request Jan 28, 2025

Wrong hoodie.table.name generated (iceberg->hudi) #494

Open

4 tasks

jalpan-randeri changed the title ~~[Bugfix] Populate table name from the identifier in Iceberge conversion~~ [Bugfix] Populate table name from the identifier in Iceberg conversion Jan 28, 2025

the-other-tim-brown reviewed Jan 30, 2025

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergTableManager.java (outdated, resolved)

the-other-tim-brown reviewed Feb 10, 2025

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionSource.java (resolved)

the-other-tim-brown reviewed Feb 13, 2025

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionSource.java (outdated, resolved)

the-other-tim-brown approved these changes Feb 13, 2025


jalpan-randeri force-pushed the jalpan/bugfix-iceberg-filebased-name branch from c357040 to 828112b Compare February 13, 2025 03:14

ashvina requested changes Feb 13, 2025
