[DOP-13252] Improve Oracle documentation
dolfinus committed Mar 12, 2024
1 parent 6476cc1 commit e45eb71
Showing 23 changed files with 848 additions and 98 deletions.
2 changes: 2 additions & 0 deletions docs/changelog/next_release/211.improvement.rst
@@ -1,3 +1,5 @@
Improve Clickhouse documentation:
* Add "Types" section describing mapping between Clickhouse and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Clickhouse
* Separate documentation of ``DBReader`` and ``Clickhouse.sql``
* Add examples for ``Clickhouse.fetch`` and ``Clickhouse.execute``
1 change: 1 addition & 0 deletions docs/changelog/next_release/228.improvement.rst
@@ -1,3 +1,4 @@
Improve Greenplum documentation:
* Add "Types" section describing mapping between Greenplum and Spark types
* Add more examples of reading and writing data from Greenplum
* Add examples for ``Greenplum.fetch`` and ``Greenplum.execute``
3 changes: 2 additions & 1 deletion docs/changelog/next_release/229.improvement.rst
@@ -1,4 +1,5 @@
Improve Postgres documentation:
* Add "Types" section describing mapping between Postgres and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Postgres
* Separate documentation of DBReader and Postgres.sql
* Separate documentation of ``DBReader`` and ``Postgres.sql``
* Add examples for ``Postgres.fetch`` and ``Postgres.execute``
5 changes: 5 additions & 0 deletions docs/changelog/next_release/233.improvement.rst
@@ -0,0 +1,5 @@
Improve Oracle documentation:
* Add "Types" section describing mapping between Oracle and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Oracle
* Separate documentation of ``DBReader`` and ``Oracle.sql``
* Add examples for ``Oracle.fetch`` and ``Oracle.execute``
4 changes: 3 additions & 1 deletion docs/connection/db_connection/clickhouse/execute.rst
@@ -59,9 +59,11 @@ Syntax support

This method supports **any** query syntax supported by Clickhouse, like:

* ``CREATE TABLE ...``
* ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on
* ``ALTER ...``
* ``INSERT INTO ... AS SELECT ...``
* ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ``SELECT func(arg1, arg2)``
* etc

It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``.
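
For example, a minimal sketch of running DDL via ``Clickhouse.execute`` (the table name and engine below are illustrative, not part of this diff):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    # each call executes exactly one statement
    clickhouse.execute("DROP TABLE IF EXISTS default.temp_tbl")
    clickhouse.execute(
        """
        CREATE TABLE default.temp_tbl (
            id UInt32,
            value String
        )
        ENGINE = MergeTree()
        ORDER BY id
        """
    )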
11 changes: 1 addition & 10 deletions docs/connection/db_connection/clickhouse/prerequisites.rst
@@ -46,7 +46,7 @@ used for creating a connection:

.. code-tab:: sql Read + Write

-- allow external tables in the same schema as target table
-- allow creating tables in the target schema
GRANT CREATE TABLE ON myschema.* TO username;

-- allow read & write access to specific table
@@ -57,13 +57,4 @@ used for creating a connection:
-- allow read access to specific table
GRANT SELECT ON myschema.mytable TO username;

.. code-tab:: sql Write only

-- allow external tables in the same schema as target table
GRANT CREATE TABLE ON myschema.* TO username;

-- allow read access to specific table (to get column types)
-- allow write access to specific table
GRANT SELECT, INSERT ON myschema.mytable TO username;

More details can be found in `official documentation <https://clickhouse.com/docs/en/sql-reference/statements/grant>`_.
2 changes: 1 addition & 1 deletion docs/connection/db_connection/clickhouse/sql.rst
@@ -18,7 +18,7 @@ Syntax support
Only queries with the following syntax are supported:

* ``SELECT ...``
* ``WITH ... SELECT ...``
* ``WITH alias AS (...) SELECT ...``

Queries like ``SHOW ...`` are not supported.
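
For example, a short sketch of running a query via ``Clickhouse.sql`` (the table name is illustrative):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    # returns a Spark DataFrame with the query result
    df = clickhouse.sql("SELECT id, value FROM default.some_table WHERE id > 100")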

17 changes: 11 additions & 6 deletions docs/connection/db_connection/clickhouse/types.rst
@@ -113,6 +113,8 @@ Here you can find source code with type conversions:
Supported types
---------------

See `official documentation <https://clickhouse.com/docs/en/sql-reference/data-types>`_.

Generic types
~~~~~~~~~~~~~

@@ -304,8 +306,11 @@ This dialect does not have type conversion between some types, like Clickhouse `

There is a way to avoid this - just cast everything to ``String``.

Read unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit type cast
------------------

``DBReader``
~~~~~~~~~~~~

Use ``CAST`` or ``toJSONString`` to get column data as a string in JSON format,
and then cast the string column in the resulting dataframe to a proper type using `from_json <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.from_json.html>`_:
@@ -332,8 +337,8 @@ and then cast string column in resulting dataframe to proper type using `from_js
from_json(df.array_column, column_type).alias("array_column"),
)
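
A minimal end-to-end sketch of this approach (table name, column name and element type are illustrative):

.. code-block:: python

    from pyspark.sql.functions import from_json
    from pyspark.sql.types import ArrayType, IntegerType

    from onetl.connection import Clickhouse
    from onetl.db import DBReader

    clickhouse = Clickhouse(...)

    # get the unsupported column as a JSON string on Clickhouse side
    reader = DBReader(
        connection=clickhouse,
        source="default.source_tbl",
        columns=["id", "toJSONString(array_column) AS array_column"],
    )
    df = reader.run()

    # parse the JSON string back into a proper Spark type
    column_type = ArrayType(IntegerType())
    df = df.select(
        df.id,
        from_json(df.array_column, column_type).alias("array_column"),
    )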
Write unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``DBWriter``
~~~~~~~~~~~~

Convert dataframe column to JSON using `to_json <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.to_json.html>`_,
and write it as ``String`` column in Clickhouse:
@@ -342,7 +347,7 @@ and write it as ``String`` column in Clickhouse:
clickhouse.execute(
"""
CREATE TABLE target_tbl AS (
CREATE TABLE default.target_tbl AS (
id Int32,
array_column_json String,
)
@@ -360,7 +365,7 @@ and write it as ``String`` column in Clickhouse:
writer.run(df)
Then you can parse this column on Clickhouse side:
Then you can parse this column on Clickhouse side - for example, by creating a view:

.. code:: sql
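
    -- a plausible sketch (assumption, names are illustrative):
    -- parse the JSON string into a typed array via a view
    CREATE VIEW default.target_view AS
    SELECT
        id,
        JSONExtract(array_column_json, 'Array(Int32)') AS array_column
    FROM default.target_tbl;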
11 changes: 7 additions & 4 deletions docs/connection/db_connection/greenplum/types.rst
@@ -272,8 +272,11 @@ Columns of these types cannot be read/written by Spark:

There is a way to avoid this - just cast unsupported types to ``text``. But the way this can be done is not straightforward.

Read unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit type cast
------------------

``DBReader``
~~~~~~~~~~~~

Unfortunately, it is not possible to cast an unsupported column to some supported type on ``DBReader`` side:

@@ -334,8 +337,8 @@ You can then parse this column on Spark side using `from_json <https://spark.apa
from_json(df.array_column_as_json, schema).alias("array_column"),
)
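
A sketch of the full read flow, assuming the cast to JSON text is done on Greenplum side via a view (all names are illustrative):

.. code-block:: python

    from pyspark.sql.functions import from_json
    from pyspark.sql.types import ArrayType, IntegerType

    from onetl.connection import Greenplum
    from onetl.db import DBReader

    greenplum = Greenplum(...)

    # cast the unsupported column to JSON text on Greenplum side
    greenplum.execute(
        """
        CREATE VIEW myschema.v_source_tbl AS
        SELECT id, to_json(array_column)::text AS array_column_as_json
        FROM myschema.source_tbl
        """
    )

    reader = DBReader(connection=greenplum, source="myschema.v_source_tbl")
    df = reader.run()

    # parse the JSON string on Spark side
    schema = ArrayType(IntegerType())
    df = df.select(
        df.id,
        from_json(df.array_column_as_json, schema).alias("array_column"),
    )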
Write unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``DBWriter``
~~~~~~~~~~~~

It is always possible to convert data on Spark side to string, and then write it to a ``text`` column in a Greenplum table.
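
A brief sketch of the write path under the same assumptions (names are illustrative; the target table is assumed to have a ``text`` column):

.. code-block:: python

    from pyspark.sql.functions import to_json

    from onetl.connection import Greenplum
    from onetl.db import DBWriter

    greenplum = Greenplum(...)

    # serialize the complex column to a JSON string on Spark side
    write_df = df.select(
        df.id,
        to_json(df.array_column).alias("array_column_json"),
    )

    writer = DBWriter(connection=greenplum, target="myschema.target_tbl")
    writer.run(write_df)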

95 changes: 94 additions & 1 deletion docs/connection/db_connection/oracle/execute.rst
@@ -1,7 +1,100 @@
.. _oracle-execute:

Executing statements in Oracle
==============================
==================================

How to
------

There are two ways to execute statements in Oracle:

Use :obj:`Oracle.fetch <onetl.connection.db_connection.oracle.connection.Oracle.fetch>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute some ``SELECT`` query which returns a **small number of rows**, like reading
Oracle config, or reading data from some reference table.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

The connection opened using this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ``SELECT ... FROM ...``
* ``WITH alias AS (...) SELECT ...``
* ``SHOW ...``

It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    df = oracle.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=Oracle.JDBCOptions(query_timeout=10),
    )
    oracle.close()
    value = df.collect()[0][0]  # get value from first row and first column

Use :obj:`Oracle.execute <onetl.connection.db_connection.oracle.connection.Oracle.execute>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute DDL and DML operations. Each method call runs the operation in a separate transaction, and then commits it.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

The connection opened using this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ``CREATE TABLE ...``, ``CREATE VIEW ...``
* ``ALTER ...``
* ``INSERT INTO ... AS SELECT ...``
* ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ``CALL procedure(arg1, arg2) ...`` or ``{call procedure(arg1, arg2)}`` - special syntax for calling a procedure
* ``SELECT func(arg1, arg2)``
* ``DECLARE ... BEGIN ... END`` - for executing PL/SQL statements
* etc

It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``.

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    with oracle:
        oracle.execute("DROP TABLE schema.table")
        oracle.execute(
            """
            CREATE TABLE schema.table (
                id NUMBER GENERATED ALWAYS AS IDENTITY,
                key VARCHAR2(4000),
                value NUMBER
            )
            """,
            options=Oracle.JDBCOptions(query_timeout=10),
        )

References
----------

.. currentmodule:: onetl.connection.db_connection.oracle.connection

8 changes: 8 additions & 0 deletions docs/connection/db_connection/oracle/index.rst
@@ -7,12 +7,20 @@ Oracle
:maxdepth: 1
:caption: Connection

prerequisites
connection

.. toctree::
:maxdepth: 1
:caption: Operations

read
sql
write
execute

.. toctree::
:maxdepth: 1
:caption: Troubleshooting

types
112 changes: 112 additions & 0 deletions docs/connection/db_connection/oracle/prerequisites.rst
@@ -0,0 +1,112 @@
.. _oracle-prerequisites:

Prerequisites
=============

Version Compatibility
---------------------

* Oracle Server versions: 23, 21, 19, 18, 12.2 and **probably** 11.2 (tested, but not mentioned in official docs).
* Spark versions: 2.3.x - 3.5.x
* Java versions: 8 - 20

See `official documentation <https://www.oracle.com/cis/database/technologies/appdev/jdbc-downloads.html>`_.

Installing PySpark
------------------

To use the Oracle connector you should have PySpark installed (or injected into ``sys.path``)
**before** creating the connector instance.

See :ref:`install-spark` installation instruction for more details.

Connecting to Oracle
--------------------

Connection port
~~~~~~~~~~~~~~~

Connection is usually performed on port 1521. The port may differ between Oracle instances.
Please ask your Oracle administrator to provide the required information.

Connection host
~~~~~~~~~~~~~~~

It is possible to connect to Oracle using either the DNS name of the host or its IP address.

If you're using an Oracle cluster, it is currently possible to connect only to **one specific node**.
Connecting to multiple nodes for load balancing, as well as automatic failover to a new master/replica, is not supported.
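
A minimal sketch of creating the connection (host, port, credentials and service name are placeholders; whether to pass ``sid`` or ``service_name`` depends on your instance):

.. code-block:: python

    from onetl.connection import Oracle

    # "spark" is assumed to be an existing SparkSession
    oracle = Oracle(
        host="oracle-node-1.domain.com",
        port=1521,
        user="username",
        password="***",
        service_name="MYPDB",  # or sid="XE"
        spark=spark,
    )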

Connect as proxy user
~~~~~~~~~~~~~~~~~~~~~

It is possible to connect to the database as another user without knowing that user's password.

This can be enabled by granting the user a special ``CONNECT THROUGH`` permission:

.. code-block:: sql

    ALTER USER schema_owner GRANT CONNECT THROUGH proxy_user;

Then you can connect to Oracle using the credentials of ``proxy_user``, but specify that you need the permissions of ``schema_owner``:

.. code-block:: python

    oracle = Oracle(
        ...,
        user="proxy_user[schema_owner]",
        password="proxy_user password",
    )

See `official documentation <https://oracle-base.com/articles/misc/proxy-users-and-connect-through>`_.

Required grants
~~~~~~~~~~~~~~~

Ask your Oracle cluster administrator to set the following grants for the user
used for creating a connection:

.. tabs::

.. code-tab:: sql Read + Write (schema is owned by user)

-- allow user to log in
GRANT CREATE SESSION TO username;

-- allow creating tables in user schema
GRANT CREATE TABLE TO username;

-- allow read & write access to specific table
GRANT SELECT, INSERT ON username.mytable TO username;

.. code-tab:: sql Read + Write (schema is not owned by user)

-- allow user to log in
GRANT CREATE SESSION TO username;

-- allow creating tables in any schema,
-- as Oracle does not support specifying exact schema name
GRANT CREATE ANY TABLE TO username;

-- only if if_exists="replace_entire_table" is used:
-- allow dropping/truncating tables in any schema,
-- as Oracle does not support specifying exact schema name
GRANT DROP ANY TABLE TO username;

-- allow read & write access to specific table
GRANT SELECT, INSERT ON someschema.mytable TO username;

.. code-tab:: sql Read only

-- allow user to log in
GRANT CREATE SESSION TO username;

-- allow read access to specific table
GRANT SELECT ON someschema.mytable TO username;

More details can be found in official documentation:
* `GRANT <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/GRANT.html>`_
* `SELECT <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/SELECT.html>`_
* `CREATE TABLE <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/CREATE-TABLE.html>`_
* `INSERT <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/INSERT.html>`_
* `TRUNCATE TABLE <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/TRUNCATE-TABLE.html>`_