From 51d24134d1d5df267a0f51c93aa84f6af7cc7df6 Mon Sep 17 00:00:00 2001 From: Marius Grama Date: Mon, 8 Aug 2022 18:00:09 +0200 Subject: [PATCH] Document SQL Support for the Iceberg connector --- docs/src/main/sphinx/connector/iceberg.rst | 230 +++++++++++++++++---- 1 file changed, 192 insertions(+), 38 deletions(-) diff --git a/docs/src/main/sphinx/connector/iceberg.rst b/docs/src/main/sphinx/connector/iceberg.rst index a12c080061be..3309f7f3e040 100644 --- a/docs/src/main/sphinx/connector/iceberg.rst +++ b/docs/src/main/sphinx/connector/iceberg.rst @@ -50,6 +50,9 @@ metastore service (HMS) or AWS Glue. The catalog type is determined by the ``iceberg.catalog.type`` property, it can be set to either ``HIVE_METASTORE`` or ``GLUE``. + +.. _iceberg-hive-catalog: + Hive metastore catalog ^^^^^^^^^^^^^^^^^^^^^^ @@ -64,6 +67,8 @@ configuration properties as the Hive connector. At a minimum, connector.name=iceberg hive.metastore.uri=thrift://localhost:9083 +.. _iceberg-glue-catalog: + Glue catalog ^^^^^^^^^^^^ @@ -222,25 +227,150 @@ Iceberg. In addition to the :ref:`globally available ` and :ref:`read operation ` statements, the connector supports the following features: -* :doc:`/sql/insert` -* :doc:`/sql/delete`, see also :ref:`iceberg-delete` -* :doc:`/sql/update` -* :doc:`/sql/merge` -* :ref:`sql-schema-table-management`, see also :ref:`iceberg-tables` -* :ref:`sql-materialized-view-management`, see also - :ref:`iceberg-materialized-views` -* :ref:`sql-view-management` +* :ref:`sql-write-operations`: + + * :ref:`iceberg-schema-table-management` and :ref:`iceberg-tables` + * :ref:`iceberg-data-management` + * :ref:`sql-view-management` + * :ref:`sql-materialized-view-management`, see also :ref:`iceberg-materialized-views` + +.. _iceberg-schema-table-management: + +Schema and table management +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The :ref:`sql-schema-table-management` functionality includes support for: + +* :doc:`/sql/create-schema` +* :doc:`/sql/drop-schema` +* :doc:`/sql/alter-schema` +* :doc:`/sql/create-table` +* :doc:`/sql/create-table-as` +* :doc:`/sql/drop-table` +* :doc:`/sql/alter-table` +* :doc:`/sql/comment` + +.. _iceberg-create-schema: + +CREATE SCHEMA +~~~~~~~~~~~~~ + +The connector supports creating schemas. You can create a schema with or without +a specified location. + +You can create a schema with the :doc:`/sql/create-schema` statement and the +``location`` schema property. The tables in this schema, which have no explicit +``location`` set in :doc:`/sql/create-table` statement, are located in a +subdirectory under the directory corresponding to the schema location. + +Create a schema on S3:: + + CREATE SCHEMA iceberg.my_s3_schema + WITH (location = 's3://my-bucket/a/path/'); + +Create a schema on a S3 compatible object storage such as MinIO:: + + CREATE SCHEMA iceberg.my_s3a_schema + WITH (location = 's3a://my-bucket/a/path/'); + +Create a schema on HDFS:: + + CREATE SCHEMA iceberg.my_hdfs_schema + WITH (location='hdfs://hadoop-master:9000/user/hive/warehouse/a/path/'); + +Optionally, on HDFS, the location can be omitted:: + + CREATE SCHEMA iceberg.my_hdfs_schema; + +.. _iceberg-create-table: + +Creating tables +~~~~~~~~~~~~~~~ + +The Iceberg connector supports creating tables using the :doc:`CREATE +TABLE ` syntax. Optionally specify the +:ref:`table properties ` supported by this connector:: + + CREATE TABLE my_table ( + c1 integer, + c2 date, + c3 double + ) + WITH ( + format = 'PARQUET', + partitioning = ARRAY['c1', 'c2'], + location = 's3://my-bucket/a/path/' + ); + +When the ``location`` table property is omitted, the content of the table +is stored in a subdirectory under the directory corresponding to the +schema location. + +The Iceberg connector supports creating tables using the :doc:`CREATE +TABLE AS ` with :doc:`SELECT ` syntax:: + + CREATE TABLE tiny_nation + WITH ( + format = 'PARQUET' + ) + AS + SELECT * + FROM nation + WHERE nationkey < 10; + +Another flavor of creating tables with :doc:`CREATE TABLE AS ` +is with :doc:`VALUES ` syntax:: + + CREATE TABLE yearly_clicks ( + year, + clicks + ) + WITH ( + partitioning = ARRAY['year'] + ) + AS VALUES + (2021, 10000), + (2022, 20000); + +``NOT NULL`` column constraint +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The Iceberg connector supports setting ``NOT NULL`` constraints on the table columns. + +The ``NOT NULL`` constraint can be set on the columns, while creating tables by +using the :doc:`CREATE TABLE ` syntax:: + + CREATE TABLE my_table ( + year INTEGER NOT NULL, + name VARCHAR NOT NULL, + age INTEGER, + address VARCHAR + ); + +When trying to insert/update data in the table, the query fails if trying +to set ``NULL`` value on a column having the ``NOT NULL`` constraint. + +DROP TABLE +~~~~~~~~~~ + +The Iceberg connector supports dropping a table by using the :doc:`/sql/drop-table` +syntax. When the command succeeds, both the data of the Iceberg table and also the +information related to the table in the metastore service are removed. +Dropping tables which have their data/metadata stored in a different location than +the table's corresponding base directory on the object store is not supported. + +.. _iceberg-comment: .. _iceberg-alter-table-execute: ALTER TABLE EXECUTE -^^^^^^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~~~~~~ The connector supports the following commands for use with :ref:`ALTER TABLE EXECUTE `. optimize -~~~~~~~~ +"""""""" The ``optimize`` command is used for rewriting the active content of the specified table so that it is merged into fewer but @@ -274,7 +404,7 @@ to the filter: WHERE partition_key = 1 expire_snapshots -~~~~~~~~~~~~~~~~ +"""""""""""""""" The ``expire_snapshots`` command removes all snapshots and all related metadata and data files. Regularly expiring snapshots is recommended to delete data files that are no longer needed, @@ -293,7 +423,7 @@ otherwise the procedure will fail with similar message: The default value for this property is ``7d``. remove_orphan_files -~~~~~~~~~~~~~~~~~~~ +""""""""""""""""""" The ``remove_orphan_files`` command removes all files from table's data directory which are not linked from metadata files and that are older than the value of ``retention_threshold`` parameter. @@ -327,7 +457,7 @@ This is an experimental command to remove extended statistics from the table. .. _iceberg-alter-table-set-properties: ALTER TABLE SET PROPERTIES -^^^^^^^^^^^^^^^^^^^^^^^^^^ +~~~~~~~~~~~~~~~~~~~~~~~~~~ The connector supports modifying the properties on existing tables using :ref:`ALTER TABLE SET PROPERTIES `. @@ -352,7 +482,53 @@ Or to set the column ``my_new_partition_column`` as a partition column on a tabl The current values of a table's properties can be shown using :doc:`SHOW CREATE TABLE `. -.. _iceberg-type-mapping: +COMMENT +~~~~~~~ + +The Iceberg connector supports setting comments on the following objects: + +- tables +- views +- table columns + +The ``COMMENT`` option is supported on both the table and +the table columns for the :doc:`/sql/create-table` operation. + +The ``COMMENT`` option is supported for adding table columns +through the :doc:`/sql/alter-table` operations. + +The connector supports the command :doc:`COMMENT ` for setting +comments on existing entities. + +.. _iceberg-data-management: + +Data management +^^^^^^^^^^^^^^^ + +The :ref:`sql-data-management` functionality includes support for ``INSERT``, +``UPDATE``, ``DELETE``, and ``MERGE`` statements. + +.. _iceberg-delete: + +Deletion by partition +~~~~~~~~~~~~~~~~~~~~~ + +For partitioned tables, the Iceberg connector supports the deletion of entire +partitions if the ``WHERE`` clause specifies filters only on the identity-transformed +partitioning columns, that can match entire partitions. Given the table definition +from :ref:`Partitioned Tables ` section, +the following SQL statement deletes all partitions for which ``country`` is ``US``:: + + DELETE FROM iceberg.testdb.customer_orders + WHERE country = 'US' + +A partition delete is performed if the ``WHERE`` clause meets these conditions. + +Row level deletion +~~~~~~~~~~~~~~~~~~ + +Tables using v2 of the Iceberg specification support deletion of individual rows +by writing position delete files. Type mapping ------------ @@ -513,30 +689,8 @@ In this example, the table is partitioned by the month of ``order_date``, a hash country VARCHAR) WITH (partitioning = ARRAY['month(order_date)', 'bucket(account_number, 10)', 'country']) -.. _iceberg-delete: - -Deletion by partition -^^^^^^^^^^^^^^^^^^^^^ - -For partitioned tables, the Iceberg connector supports the deletion of entire -partitions if the ``WHERE`` clause specifies filters only on the identity-transformed -partitioning columns, that can match entire partitions. Given the table definition -above, this SQL will delete all partitions for which ``country`` is ``US``:: - - DELETE FROM iceberg.testdb.customer_orders - WHERE country = 'US' - -Tables using either v1 or v2 of the Iceberg specification will perform a partition -delete if the ``WHERE`` clause meets these conditions. - -Row level deletion -^^^^^^^^^^^^^^^^^^ - -Tables using v2 of the Iceberg specification support deletion of individual rows -by writing position delete files. - -Snapshots ---------- +Rolling back to a previous snapshot +----------------------------------- Iceberg supports a "snapshot" model of data, where table snapshots are identified by a snapshot ID.