[DOP-13252] Improve Oracle documentation
dolfinus committed Mar 12, 2024
1 parent b5a577a commit 02ba65a
Showing 24 changed files with 885 additions and 142 deletions.
2 changes: 2 additions & 0 deletions docs/changelog/next_release/211.improvement.rst
@@ -1,3 +1,5 @@
Improve Clickhouse documentation:
* Add "Types" section describing mapping between Clickhouse and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Clickhouse
* Separate documentation of ``DBReader`` and ``Clickhouse.sql``
* Add examples for ``Clickhouse.fetch`` and ``Clickhouse.execute``
3 changes: 2 additions & 1 deletion docs/changelog/next_release/228.improvement.rst
@@ -1,4 +1,5 @@
Improve Greenplum documentation:
* Add "Types" section describing mapping between Greenplum and Spark types
* Add more examples of reading and writing data from Greenplum
* Add notes about issues with IP resolution and building ``gpfdist`` URL.
* Add examples for ``Greenplum.fetch`` and ``Greenplum.execute``
* Add notes about issues with IP resolution and building ``gpfdist`` URL
3 changes: 2 additions & 1 deletion docs/changelog/next_release/229.improvement.rst
@@ -1,4 +1,5 @@
Improve Postgres documentation:
* Add "Types" section describing mapping between Postgres and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Postgres
* Separate documentation of DBReader and Postgres.sql
* Separate documentation of ``DBReader`` and ``Postgres.sql``
* Add examples for ``Postgres.fetch`` and ``Postgres.execute``
5 changes: 5 additions & 0 deletions docs/changelog/next_release/233.improvement.rst
@@ -0,0 +1,5 @@
Improve Oracle documentation:
* Add "Types" section describing mapping between Oracle and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Oracle
* Separate documentation of ``DBReader`` and ``Oracle.sql``
* Add examples for ``Oracle.fetch`` and ``Oracle.execute``
22 changes: 11 additions & 11 deletions docs/connection/db_connection/clickhouse/execute.rst
@@ -23,11 +23,11 @@ Syntax support

This method supports **any** query syntax supported by Clickhouse, like:

* ``SELECT ... FROM ...``
* ``WITH alias AS (...) SELECT ...``
* ``SHOW ...``

It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.
* ✅︎ ``SELECT ... FROM ...``
* ✅︎ ``WITH alias AS (...) SELECT ...``
* ✅︎ ``SELECT func(arg1, arg2)`` - call function
* ✅︎ ``SHOW ...``
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^
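
A minimal sketch of such a call (connection parameters, table and column names are illustrative; this mirrors the Oracle example added elsewhere in this commit):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    df = clickhouse.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=Clickhouse.JDBCOptions(query_timeout=10),
    )
    value = df.collect()[0][0]  # get value from first row and first column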
@@ -59,12 +59,12 @@ Syntax support

This method supports **any** query syntax supported by Clickhouse, like:

* ``CREATE TABLE ...``
* ``ALTER ...``
* ``INSERT INTO ... AS SELECT ...``
* etc

It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``.
* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on
* ✅︎ ``ALTER ...``
* ✅︎ ``INSERT INTO ... AS SELECT ...``
* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ✅︎ other statements not mentioned here
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^
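
A minimal sketch of a DDL call (database, table, columns and engine are illustrative assumptions):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    clickhouse.execute(
        """
        CREATE TABLE default.mytable (
            id UInt32,
            value String
        )
        ENGINE = MergeTree()
        ORDER BY id
        """,
        options=Clickhouse.JDBCOptions(query_timeout=10),
    )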
11 changes: 1 addition & 10 deletions docs/connection/db_connection/clickhouse/prerequisites.rst
@@ -46,7 +46,7 @@ used for creating a connection:

.. code-tab:: sql Read + Write

-- allow external tables in the same schema as target table
-- allow creating tables in the target schema
GRANT CREATE TABLE ON myschema.* TO username;

-- allow read & write access to specific table
@@ -57,13 +57,4 @@
-- allow read access to specific table
GRANT SELECT ON myschema.mytable TO username;

.. code-tab:: sql Write only

-- allow external tables in the same schema as target table
GRANT CREATE TABLE ON myschema.* TO username;

-- allow read access to specific table (to get column types)
-- allow write access to specific table
GRANT SELECT, INSERT ON myschema.mytable TO username;

More details can be found in the `official documentation <https://clickhouse.com/docs/en/sql-reference/statements/grant>`_.
2 changes: 1 addition & 1 deletion docs/connection/db_connection/clickhouse/sql.rst
@@ -18,7 +18,7 @@ Syntax support
Only queries with the following syntax are supported:

* ``SELECT ...``
* ``WITH ... SELECT ...``
* ``WITH alias AS (...) SELECT ...``

Queries like ``SHOW ...`` are not supported.
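
A minimal sketch of a supported call (table and column names are illustrative):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    # returns a Spark DataFrame with the query result
    df = clickhouse.sql("SELECT id, value FROM default.mytable WHERE id > 100")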

23 changes: 14 additions & 9 deletions docs/connection/db_connection/clickhouse/types.rst
@@ -17,8 +17,8 @@ This is how Clickhouse connector performs this:
* Find corresponding ``Clickhouse type (read)`` -> ``Spark type`` combination (see below) for each DataFrame column. If no combination is found, raise an exception.
* Create DataFrame from query with specific column names and Spark types.

Writing to some existing Clickhuse table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Writing to some existing Clickhouse table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is how Clickhouse connector performs this:

@@ -113,6 +113,8 @@ Here you can find source code with type conversions:
Supported types
---------------

See the `official documentation <https://clickhouse.com/docs/en/sql-reference/data-types>`_.

Generic types
~~~~~~~~~~~~~

@@ -243,7 +245,7 @@ Note: ``DateTime(P, TZ)`` has the same precision as ``DateTime(P)``.
.. [5]
Generic JDBC dialect generates DDL with Clickhouse type ``TIMESTAMP`` which is alias for ``DateTime32`` with precision up to seconds (``23:59:59``).
Inserting data with milliseconds precision (``23:59:59.999``) will lead to throwing away milliseconds (``23:59:59``).
Inserting data with milliseconds precision (``23:59:59.999``) will lead to **throwing away milliseconds**.
.. [6]
Clickhouse will raise an exception that data in format ``2001-01-01 23:59:59.999999`` has data ``.999999`` which does not match format ``YYYY-MM-DD hh:mm:ss``.
@@ -304,8 +306,11 @@ This dialect does not have type conversion between some types, like Clickhouse `

There is a way to avoid this - just cast everything to ``String``.

Read unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit type cast
------------------

``DBReader``
~~~~~~~~~~~~

Use ``CAST`` or ``toJSONString`` to get column data as string in JSON format,
and then cast string column in resulting dataframe to proper type using `from_json <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.from_json.html>`_:
@@ -332,8 +337,8 @@ and then cast string column in resulting dataframe to proper type using `from_js
from_json(df.array_column, column_type).alias("array_column"),
)
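
Pieced together, this read-side pattern looks roughly like the following sketch (``source_tbl`` and the array element type are illustrative assumptions):

.. code-block:: python

    from pyspark.sql.functions import from_json
    from pyspark.sql.types import ArrayType, IntegerType

    # fetch the unsupported column as a JSON string, serialized on the Clickhouse side
    df = clickhouse.sql(
        "SELECT id, toJSONString(array_column) AS array_column FROM default.source_tbl"
    )

    # parse the JSON string back into a proper Spark type
    column_type = ArrayType(IntegerType())
    df = df.select(
        df.id,
        from_json(df.array_column, column_type).alias("array_column"),
    )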
Write unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``DBWriter``
~~~~~~~~~~~~

Convert dataframe column to JSON using `to_json <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.to_json.html>`_,
and write it as ``String`` column in Clickhouse:
@@ -342,7 +347,7 @@ and write it as ``String`` column in Clickhouse:
clickhouse.execute(
"""
CREATE TABLE target_tbl AS (
CREATE TABLE default.target_tbl AS (
id Int32,
array_column_json String,
)
@@ -360,7 +365,7 @@ and write it as ``String`` column in Clickhouse:
writer.run(df)
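
Pieced together, the write-side pattern is roughly the following sketch (the ``DBWriter`` settings and all names are illustrative assumptions):

.. code-block:: python

    from pyspark.sql.functions import to_json

    from onetl.db import DBWriter

    # serialize the unsupported column to a JSON string on the Spark side
    df = df.select(df.id, to_json(df.array_column).alias("array_column_json"))

    writer = DBWriter(connection=clickhouse, target="default.target_tbl")
    writer.run(df)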
Then you can parse this column on Clickhouse side:
Then you can parse this column on Clickhouse side - for example, by creating a view:

.. code:: sql
27 changes: 12 additions & 15 deletions docs/connection/db_connection/greenplum/execute.rst
@@ -28,12 +28,10 @@ Syntax support

This method supports **any** query syntax supported by Greenplum, like:

* ``SELECT ... FROM ...``
* ``WITH alias AS (...) SELECT ...``

Queries like ``SHOW ...`` are not supported.

It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.
* ✅︎ ``SELECT ... FROM ...``
* ✅︎ ``WITH alias AS (...) SELECT ...``
* ✅︎ ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^
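
A minimal sketch of such a call (names are illustrative; this mirrors the Oracle example added elsewhere in this commit, assuming ``Greenplum.JDBCOptions`` as with the other JDBC-based connections):

.. code-block:: python

    from onetl.connection import Greenplum

    greenplum = Greenplum(...)

    df = greenplum.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=Greenplum.JDBCOptions(query_timeout=10),
    )
    value = df.collect()[0][0]  # get value from first row and first column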
@@ -65,15 +63,14 @@ Syntax support

This method supports **any** query syntax supported by Greenplum, like:

* ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on
* ``ALTER ...``
* ``INSERT INTO ... AS SELECT ...``
* ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ``CALL procedure(arg1, arg2) ...``
* ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions
* etc

It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``.
* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on
* ✅︎ ``ALTER ...``
* ✅︎ ``INSERT INTO ... AS SELECT ...``
* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ✅︎ ``CALL procedure(arg1, arg2) ...``
* ✅︎ ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions
* ✅︎ other statements not mentioned here
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^
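
A minimal sketch of a DDL call (table, columns and distribution key are illustrative assumptions):

.. code-block:: python

    from onetl.connection import Greenplum

    greenplum = Greenplum(...)

    greenplum.execute(
        """
        CREATE TABLE schema.mytable (
            id bigint,
            value text
        )
        DISTRIBUTED BY (id)
        """,
        options=Greenplum.JDBCOptions(query_timeout=10),
    )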
15 changes: 9 additions & 6 deletions docs/connection/db_connection/greenplum/types.rst
@@ -23,7 +23,7 @@ This is how Greenplum connector performs this:
Yes, **all columns of a table**, not just selected ones.
This means that if the source table **contains** columns with an unsupported type, the entire table cannot be read.
Writing to some existing Clickhuse table
Writing to some existing Greenplum table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is how Greenplum connector performs this:
Expand All @@ -33,7 +33,7 @@ This is how Greenplum connector performs this:
* For each column in the query result, get the column name and Greenplum type.
* Match table columns with DataFrame columns (by name, case-insensitive).
If some column is present only in the target table but not in the DataFrame (like a ``DEFAULT`` or ``SERIAL`` column), or vice versa, raise an exception.
See `Write unsupported column type`_.
See `Explicit type cast`_.
* Find corresponding ``Spark type`` -> ``Greenplum type (write)`` combination (see below) for each DataFrame column. If no combination is found, raise an exception.
* If ``Greenplum type (write)`` matches ``Greenplum type (read)``, no additional casts will be performed, and the DataFrame column will be written to Greenplum as is.
* If ``Greenplum type (write)`` does not match ``Greenplum type (read)``, the DataFrame column will be cast to the target column type **on the Greenplum side**. For example, you can write a column with text data to a ``json`` column, which the Greenplum connector currently does not support.
@@ -272,8 +272,11 @@ Columns of these types cannot be read/written by Spark:

There is a way to avoid this - just cast unsupported types to ``text``. But the way this can be done is not straightforward.

Read unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit type cast
------------------

``DBReader``
~~~~~~~~~~~~

Unfortunately, it is not possible to cast an unsupported column to some supported type on the ``DBReader`` side:

@@ -334,8 +337,8 @@ You can then parse this column on Spark side using `from_json <https://spark.apa
from_json(df.array_column_as_json, schema).alias("array_column"),
)
Write unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``DBWriter``
~~~~~~~~~~~~

It is always possible to convert data on Spark side to string, and then write it to a ``text`` column in a Greenplum table.
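
For example (a sketch; ``array_column`` is an illustrative name):

.. code-block:: python

    from pyspark.sql.functions import to_json

    # convert the unsupported column to a JSON string on the Spark side,
    # so it can be written to a ``text`` column in Greenplum
    df = df.select(df.id, to_json(df.array_column).alias("array_column_as_json"))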

93 changes: 92 additions & 1 deletion docs/connection/db_connection/oracle/execute.rst
@@ -1,7 +1,98 @@
.. _oracle-execute:

Executing statements in Oracle
==============================
==================================

How to
------

There are two ways to execute statements in Oracle:

Use :obj:`Oracle.fetch <onetl.connection.db_connection.oracle.connection.Oracle.fetch>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute some ``SELECT`` query which returns a **small number of rows**, like reading
an Oracle config value, or reading data from some reference table.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

A connection opened using this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ✅︎ ``SELECT ... FROM ...``
* ✅︎ ``WITH alias AS (...) SELECT ...``
* ✅︎ ``SELECT func(arg1, arg2) FROM DUAL`` - call function
* ✅︎ ``SHOW ...``
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    df = oracle.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=Oracle.JDBCOptions(query_timeout=10),
    )
    oracle.close()
    value = df.collect()[0][0]  # get value from first row and first column
Use :obj:`Oracle.execute <onetl.connection.db_connection.oracle.connection.Oracle.execute>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute DDL and DML operations. Each method call runs the operation in a separate transaction, and then commits it.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

A connection opened using this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``
* ✅︎ ``ALTER ...``
* ✅︎ ``INSERT INTO ... AS SELECT ...``
* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ✅︎ ``CALL procedure(arg1, arg2) ...`` or ``{call procedure(arg1, arg2)}`` - special syntax for calling procedures
* ✅︎ ``DECLARE ... BEGIN ... END`` - execute an anonymous PL/SQL block
* ✅︎ other statements not mentioned here
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    with oracle:
        oracle.execute("DROP TABLE schema.table")
        oracle.execute(
            """
            CREATE TABLE schema.table (
                id NUMBER GENERATED ALWAYS AS IDENTITY,
                key VARCHAR2(4000),
                value NUMBER
            )
            """,
            options=Oracle.JDBCOptions(query_timeout=10),
        )
References
----------

.. currentmodule:: onetl.connection.db_connection.oracle.connection

8 changes: 8 additions & 0 deletions docs/connection/db_connection/oracle/index.rst
@@ -7,12 +7,20 @@ Oracle
:maxdepth: 1
:caption: Connection

prerequisites
connection

.. toctree::
:maxdepth: 1
:caption: Operations

read
sql
write
execute

.. toctree::
:maxdepth: 1
:caption: Troubleshooting

types