[DOP-13252] Improve Oracle documentation
dolfinus committed Mar 12, 2024
1 parent b5a577a commit 02ba65a
Showing 24 changed files with 885 additions and 142 deletions.
2 changes: 2 additions & 0 deletions docs/changelog/next_release/211.improvement.rst
@@ -1,3 +1,5 @@
Improve Clickhouse documentation:
* Add "Types" section describing mapping between Clickhouse and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Clickhouse
* Separate documentation of ``DBReader`` and ``Clickhouse.sql``
* Add examples for ``Clickhouse.fetch`` and ``Clickhouse.execute``
3 changes: 2 additions & 1 deletion docs/changelog/next_release/228.improvement.rst
@@ -1,4 +1,5 @@
Improve Greenplum documentation:
* Add "Types" section describing mapping between Greenplum and Spark types
* Add more examples of reading and writing data from Greenplum
* Add notes about issues with IP resolution and building ``gpfdist`` URL.
* Add examples for ``Greenplum.fetch`` and ``Greenplum.execute``
* Add notes about issues with IP resolution and building ``gpfdist`` URL
3 changes: 2 additions & 1 deletion docs/changelog/next_release/229.improvement.rst
@@ -1,4 +1,5 @@
Improve Postgres documentation:
* Add "Types" section describing mapping between Postgres and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Postgres
* Separate documentation of DBReader and Postgres.sql
* Separate documentation of ``DBReader`` and ``Postgres.sql``
* Add examples for ``Postgres.fetch`` and ``Postgres.execute``
5 changes: 5 additions & 0 deletions docs/changelog/next_release/233.improvement.rst
@@ -0,0 +1,5 @@
Improve Oracle documentation:
* Add "Types" section describing mapping between Oracle and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Oracle
* Separate documentation of ``DBReader`` and ``Oracle.sql``
* Add examples for ``Oracle.fetch`` and ``Oracle.execute``
22 changes: 11 additions & 11 deletions docs/connection/db_connection/clickhouse/execute.rst
@@ -23,11 +23,11 @@ Syntax support

This method supports **any** query syntax supported by Clickhouse, like:

* ``SELECT ... FROM ...``
* ``WITH alias AS (...) SELECT ...``
* ``SHOW ...``

It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.
* ✅︎ ``SELECT ... FROM ...``
* ✅︎ ``WITH alias AS (...) SELECT ...``
* ✅︎ ``SELECT func(arg1, arg2)`` - call function
* ✅︎ ``SHOW ...``
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^
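
A minimal sketch of such a call (connection parameters, table and column names are illustrative; this mirrors the Oracle example added elsewhere in this commit):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    df = clickhouse.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=Clickhouse.JDBCOptions(query_timeout=10),
    )
    value = df.collect()[0][0]  # get value from first row and first column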
@@ -59,12 +59,12 @@ Syntax support

This method supports **any** query syntax supported by Clickhouse, like:

* ``CREATE TABLE ...``
* ``ALTER ...``
* ``INSERT INTO ... AS SELECT ...``
* etc

It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``.
* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on
* ✅︎ ``ALTER ...``
* ✅︎ ``INSERT INTO ... AS SELECT ...``
* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ✅︎ other statements not mentioned here
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^
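
A minimal sketch of a DDL call (database, table, columns and engine are illustrative assumptions):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    clickhouse.execute(
        """
        CREATE TABLE default.mytable (
            id UInt32,
            value String
        )
        ENGINE = MergeTree()
        ORDER BY id
        """,
        options=Clickhouse.JDBCOptions(query_timeout=10),
    )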
11 changes: 1 addition & 10 deletions docs/connection/db_connection/clickhouse/prerequisites.rst
@@ -46,7 +46,7 @@ used for creating a connection:

.. code-tab:: sql Read + Write

-- allow external tables in the same schema as target table
-- allow creating tables in the target schema
GRANT CREATE TABLE ON myschema.* TO username;

-- allow read & write access to specific table
@@ -57,13 +57,4 @@
-- allow read access to specific table
GRANT SELECT ON myschema.mytable TO username;

.. code-tab:: sql Write only

-- allow external tables in the same schema as target table
GRANT CREATE TABLE ON myschema.* TO username;

-- allow read access to specific table (to get column types)
-- allow write access to specific table
GRANT SELECT, INSERT ON myschema.mytable TO username;

More details can be found in the `official documentation <https://clickhouse.com/docs/en/sql-reference/statements/grant>`_.
2 changes: 1 addition & 1 deletion docs/connection/db_connection/clickhouse/sql.rst
@@ -18,7 +18,7 @@ Syntax support
Only queries with the following syntax are supported:

* ``SELECT ...``
* ``WITH ... SELECT ...``
* ``WITH alias AS (...) SELECT ...``

Queries like ``SHOW ...`` are not supported.
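
A minimal sketch of a supported call (table and column names are illustrative):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    # returns a Spark DataFrame with the query result
    df = clickhouse.sql("SELECT id, value FROM default.mytable WHERE id > 100")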

23 changes: 14 additions & 9 deletions docs/connection/db_connection/clickhouse/types.rst
@@ -17,8 +17,8 @@ This is how Clickhouse connector performs this:
* Find corresponding ``Clickhouse type (read)`` -> ``Spark type`` combination (see below) for each DataFrame column. If no combination is found, raise an exception.
* Create DataFrame from query with specific column names and Spark types.

Writing to some existing Clickhuse table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Writing to some existing Clickhouse table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is how Clickhouse connector performs this:

@@ -113,6 +113,8 @@ Here you can find source code with type conversions:
Supported types
---------------

See the `official documentation <https://clickhouse.com/docs/en/sql-reference/data-types>`_.

Generic types
~~~~~~~~~~~~~

@@ -243,7 +245,7 @@ Note: ``DateTime(P, TZ)`` has the same precision as ``DateTime(P)``.
.. [5]
Generic JDBC dialect generates DDL with Clickhouse type ``TIMESTAMP`` which is alias for ``DateTime32`` with precision up to seconds (``23:59:59``).
Inserting data with milliseconds precision (``23:59:59.999``) will lead to throwing away milliseconds (``23:59:59``).
Inserting data with milliseconds precision (``23:59:59.999``) will lead to **throwing away milliseconds**.
.. [6]
Clickhouse will raise an exception that data in format ``2001-01-01 23:59:59.999999`` has data ``.999999`` which does not match format ``YYYY-MM-DD hh:mm:ss``.
@@ -304,8 +306,11 @@ This dialect does not have type conversion between some types, like Clickhouse `

There is a way to avoid this - just cast everything to ``String``.

Read unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit type cast
------------------

``DBReader``
~~~~~~~~~~~~

Use ``CAST`` or ``toJSONString`` to get column data as string in JSON format,
and then cast string column in resulting dataframe to proper type using `from_json <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.from_json.html>`_:
@@ -332,8 +337,8 @@ and then cast string column in resulting dataframe to proper type using `from_js
from_json(df.array_column, column_type).alias("array_column"),
)
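
Pieced together, this read-side pattern looks roughly like the following sketch (``source_tbl`` and the array element type are illustrative assumptions):

.. code-block:: python

    from pyspark.sql.functions import from_json
    from pyspark.sql.types import ArrayType, IntegerType

    # fetch the unsupported column as a JSON string, serialized on the Clickhouse side
    df = clickhouse.sql(
        "SELECT id, toJSONString(array_column) AS array_column FROM default.source_tbl"
    )

    # parse the JSON string back into a proper Spark type
    column_type = ArrayType(IntegerType())
    df = df.select(
        df.id,
        from_json(df.array_column, column_type).alias("array_column"),
    )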
Write unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``DBWriter``
~~~~~~~~~~~~

Convert dataframe column to JSON using `to_json <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.to_json.html>`_,
and write it as ``String`` column in Clickhouse:
@@ -342,7 +347,7 @@ and write it as ``String`` column in Clickhouse:
clickhouse.execute(
"""
CREATE TABLE target_tbl AS (
CREATE TABLE default.target_tbl AS (
id Int32,
array_column_json String,
)
@@ -360,7 +365,7 @@ and write it as ``String`` column in Clickhouse:
writer.run(df)
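
Pieced together, the write-side pattern is roughly the following sketch (the ``DBWriter`` settings and all names are illustrative assumptions):

.. code-block:: python

    from pyspark.sql.functions import to_json

    from onetl.db import DBWriter

    # serialize the unsupported column to a JSON string on the Spark side
    df = df.select(df.id, to_json(df.array_column).alias("array_column_json"))

    writer = DBWriter(connection=clickhouse, target="default.target_tbl")
    writer.run(df)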
Then you can parse this column on Clickhouse side:
Then you can parse this column on Clickhouse side - for example, by creating a view:

.. code:: sql
27 changes: 12 additions & 15 deletions docs/connection/db_connection/greenplum/execute.rst
@@ -28,12 +28,10 @@ Syntax support

This method supports **any** query syntax supported by Greenplum, like:

* ``SELECT ... FROM ...``
* ``WITH alias AS (...) SELECT ...``

Queries like ``SHOW ...`` are not supported.

It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.
* ✅︎ ``SELECT ... FROM ...``
* ✅︎ ``WITH alias AS (...) SELECT ...``
* ✅︎ ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^
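
A minimal sketch of such a call (names are illustrative; this mirrors the Oracle example added elsewhere in this commit, assuming ``Greenplum.JDBCOptions`` as with the other JDBC-based connections):

.. code-block:: python

    from onetl.connection import Greenplum

    greenplum = Greenplum(...)

    df = greenplum.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=Greenplum.JDBCOptions(query_timeout=10),
    )
    value = df.collect()[0][0]  # get value from first row and first column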
@@ -65,15 +63,14 @@ Syntax support

This method supports **any** query syntax supported by Greenplum, like:

* ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on
* ``ALTER ...``
* ``INSERT INTO ... AS SELECT ...``
* ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ``CALL procedure(arg1, arg2) ...``
* ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions
* etc

It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``.
* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on
* ✅︎ ``ALTER ...``
* ✅︎ ``INSERT INTO ... AS SELECT ...``
* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ✅︎ ``CALL procedure(arg1, arg2) ...``
* ✅︎ ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions
* ✅︎ other statements not mentioned here
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^
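
A minimal sketch of a DDL call (table, columns and distribution key are illustrative assumptions):

.. code-block:: python

    from onetl.connection import Greenplum

    greenplum = Greenplum(...)

    greenplum.execute(
        """
        CREATE TABLE schema.mytable (
            id bigint,
            value text
        )
        DISTRIBUTED BY (id)
        """,
        options=Greenplum.JDBCOptions(query_timeout=10),
    )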
15 changes: 9 additions & 6 deletions docs/connection/db_connection/greenplum/types.rst
@@ -23,7 +23,7 @@ This is how Greenplum connector performs this:
Yes, **all columns of a table**, not just selected ones.
This means that if the source table **contains** columns with an unsupported type, the entire table cannot be read.
Writing to some existing Clickhuse table
Writing to some existing Greenplum table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is how Greenplum connector performs this:
Expand All @@ -33,7 +33,7 @@ This is how Greenplum connector performs this:
* For each column in the query result, get the column name and Greenplum type.
* Match table columns with DataFrame columns (by name, case-insensitive).
If some column is present only in the target table but not in the DataFrame (like a ``DEFAULT`` or ``SERIAL`` column), or vice versa, raise an exception.
See `Write unsupported column type`_.
See `Explicit type cast`_.
* Find corresponding ``Spark type`` -> ``Greenplum type (write)`` combination (see below) for each DataFrame column. If no combination is found, raise an exception.
* If ``Greenplum type (write)`` matches ``Greenplum type (read)``, no additional casts will be performed, and the DataFrame column will be written to Greenplum as is.
* If ``Greenplum type (write)`` does not match ``Greenplum type (read)``, the DataFrame column will be cast to the target column type **on the Greenplum side**. For example, you can write a column with text data to a ``json`` column, which the Greenplum connector currently does not support.
@@ -272,8 +272,11 @@ Columns of these types cannot be read/written by Spark:

There is a way to avoid this - just cast unsupported types to ``text``. But the way this can be done is not straightforward.

Read unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit type cast
------------------

``DBReader``
~~~~~~~~~~~~

Unfortunately, it is not possible to cast an unsupported column to some supported type on the ``DBReader`` side:

@@ -334,8 +337,8 @@ You can then parse this column on Spark side using `from_json <https://spark.apa
from_json(df.array_column_as_json, schema).alias("array_column"),
)
Write unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``DBWriter``
~~~~~~~~~~~~

It is always possible to convert data on Spark side to string, and then write it to a ``text`` column in a Greenplum table.
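
For example (a sketch; ``array_column`` is an illustrative name):

.. code-block:: python

    from pyspark.sql.functions import to_json

    # convert the unsupported column to a JSON string on the Spark side,
    # so it can be written to a ``text`` column in Greenplum
    df = df.select(df.id, to_json(df.array_column).alias("array_column_as_json"))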

93 changes: 92 additions & 1 deletion docs/connection/db_connection/oracle/execute.rst
@@ -1,7 +1,98 @@
.. _oracle-execute:

Executing statements in Oracle
==============================
==================================

How to
------

There are two ways to execute statements in Oracle:

Use :obj:`Oracle.fetch <onetl.connection.db_connection.oracle.connection.Oracle.fetch>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute some ``SELECT`` query which returns a **small number of rows**, like reading
an Oracle config value, or reading data from some reference table.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

A connection opened using this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ✅︎ ``SELECT ... FROM ...``
* ✅︎ ``WITH alias AS (...) SELECT ...``
* ✅︎ ``SELECT func(arg1, arg2) FROM DUAL`` - call function
* ✅︎ ``SHOW ...``
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    df = oracle.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=Oracle.JDBCOptions(query_timeout=10),
    )
    oracle.close()
    value = df.collect()[0][0]  # get value from first row and first column
Use :obj:`Oracle.execute <onetl.connection.db_connection.oracle.connection.Oracle.execute>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute DDL and DML operations. Each method call runs the operation in a separate transaction, and then commits it.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

A connection opened using this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``
* ✅︎ ``ALTER ...``
* ✅︎ ``INSERT INTO ... AS SELECT ...``
* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ✅︎ ``CALL procedure(arg1, arg2) ...`` or ``{call procedure(arg1, arg2)}`` - special syntax for calling procedures
* ✅︎ ``DECLARE ... BEGIN ... END`` - execute an anonymous PL/SQL block
* ✅︎ other statements not mentioned here
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    with oracle:
        oracle.execute("DROP TABLE schema.table")
        oracle.execute(
            """
            CREATE TABLE schema.table (
                id NUMBER GENERATED ALWAYS AS IDENTITY,
                key VARCHAR2(4000),
                value NUMBER
            )
            """,
            options=Oracle.JDBCOptions(query_timeout=10),
        )
References
----------

.. currentmodule:: onetl.connection.db_connection.oracle.connection

8 changes: 8 additions & 0 deletions docs/connection/db_connection/oracle/index.rst
@@ -7,12 +7,20 @@ Oracle
:maxdepth: 1
:caption: Connection

prerequisites
connection

.. toctree::
:maxdepth: 1
:caption: Operations

read
sql
write
execute

.. toctree::
:maxdepth: 1
:caption: Troubleshooting

types