[DOP-13252] Improve Oracle documentation
dolfinus committed Mar 12, 2024
1 parent 6476cc1 commit e45eb71
Showing 23 changed files with 848 additions and 98 deletions.
2 changes: 2 additions & 0 deletions docs/changelog/next_release/211.improvement.rst
@@ -1,3 +1,5 @@
Improve Clickhouse documentation:
* Add "Types" section describing mapping between Clickhouse and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Clickhouse
* Separate documentation of ``DBReader`` and ``Clickhouse.sql``
* Add examples for ``Clickhouse.fetch`` and ``Clickhouse.execute``
1 change: 1 addition & 0 deletions docs/changelog/next_release/228.improvement.rst
@@ -1,3 +1,4 @@
Improve Greenplum documentation:
* Add "Types" section describing mapping between Greenplum and Spark types
* Add more examples of reading and writing data from Greenplum
* Add examples for ``Greenplum.fetch`` and ``Greenplum.execute``
3 changes: 2 additions & 1 deletion docs/changelog/next_release/229.improvement.rst
@@ -1,4 +1,5 @@
Improve Postgres documentation:
* Add "Types" section describing mapping between Postgres and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Postgres
* Separate documentation of DBReader and Postgres.sql
* Separate documentation of ``DBReader`` and ``Postgres.sql``
* Add examples for ``Postgres.fetch`` and ``Postgres.execute``
5 changes: 5 additions & 0 deletions docs/changelog/next_release/233.improvement.rst
@@ -0,0 +1,5 @@
Improve Oracle documentation:
* Add "Types" section describing mapping between Oracle and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Oracle
* Separate documentation of ``DBReader`` and ``Oracle.sql``
* Add examples for ``Oracle.fetch`` and ``Oracle.execute``
4 changes: 3 additions & 1 deletion docs/connection/db_connection/clickhouse/execute.rst
@@ -59,9 +59,11 @@ Syntax support

This method supports **any** query syntax supported by Clickhouse, like:

* ``CREATE TABLE ...``
* ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on
* ``ALTER ...``
* ``INSERT INTO ... AS SELECT ...``
* ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ``SELECT func(arg1, arg2)``
* etc

It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``.
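
For example, a minimal sketch of running DDL via ``Clickhouse.execute`` (the table name and engine below are illustrative, not part of this diff):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    # each call executes exactly one statement
    clickhouse.execute("DROP TABLE IF EXISTS default.temp_tbl")
    clickhouse.execute(
        """
        CREATE TABLE default.temp_tbl (
            id UInt32,
            value String
        )
        ENGINE = MergeTree()
        ORDER BY id
        """
    )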
11 changes: 1 addition & 10 deletions docs/connection/db_connection/clickhouse/prerequisites.rst
@@ -46,7 +46,7 @@ used for creating a connection:

.. code-tab:: sql Read + Write

-- allow external tables in the same schema as target table
-- allow creating tables in the target schema
GRANT CREATE TABLE ON myschema.* TO username;

-- allow read & write access to specific table
@@ -57,13 +57,4 @@ used for creating a connection:
-- allow read access to specific table
GRANT SELECT ON myschema.mytable TO username;

.. code-tab:: sql Write only

-- allow external tables in the same schema as target table
GRANT CREATE TABLE ON myschema.* TO username;

-- allow read access to specific table (to get column types)
-- allow write access to specific table
GRANT SELECT, INSERT ON myschema.mytable TO username;

More details can be found in `official documentation <https://clickhouse.com/docs/en/sql-reference/statements/grant>`_.
2 changes: 1 addition & 1 deletion docs/connection/db_connection/clickhouse/sql.rst
@@ -18,7 +18,7 @@ Syntax support
Only queries with the following syntax are supported:

* ``SELECT ...``
* ``WITH ... SELECT ...``
* ``WITH alias AS (...) SELECT ...``

Queries like ``SHOW ...`` are not supported.
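
For example, a short sketch of running a query via ``Clickhouse.sql`` (the table name is illustrative):

.. code-block:: python

    from onetl.connection import Clickhouse

    clickhouse = Clickhouse(...)

    # returns a Spark DataFrame with the query result
    df = clickhouse.sql("SELECT id, value FROM default.some_table WHERE id > 100")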

17 changes: 11 additions & 6 deletions docs/connection/db_connection/clickhouse/types.rst
@@ -113,6 +113,8 @@ Here you can find source code with type conversions:
Supported types
---------------

See `official documentation <https://clickhouse.com/docs/en/sql-reference/data-types>`_.

Generic types
~~~~~~~~~~~~~

@@ -304,8 +306,11 @@ This dialect does not have type conversion between some types, like Clickhouse `

There is a way to avoid this - just cast everything to ``String``.

Read unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit type cast
------------------

``DBReader``
~~~~~~~~~~~~

Use ``CAST`` or ``toJSONString`` to get column data as a string in JSON format,
and then cast the string column in the resulting dataframe to a proper type using `from_json <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.from_json.html>`_:
@@ -332,8 +337,8 @@ and then cast string column in resulting dataframe to proper type using `from_js
from_json(df.array_column, column_type).alias("array_column"),
)
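
A minimal end-to-end sketch of this approach (table name, column name and element type are illustrative):

.. code-block:: python

    from pyspark.sql.functions import from_json
    from pyspark.sql.types import ArrayType, IntegerType

    from onetl.connection import Clickhouse
    from onetl.db import DBReader

    clickhouse = Clickhouse(...)

    # get the unsupported column as a JSON string on Clickhouse side
    reader = DBReader(
        connection=clickhouse,
        source="default.source_tbl",
        columns=["id", "toJSONString(array_column) AS array_column"],
    )
    df = reader.run()

    # parse the JSON string back into a proper Spark type
    column_type = ArrayType(IntegerType())
    df = df.select(
        df.id,
        from_json(df.array_column, column_type).alias("array_column"),
    )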
Write unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``DBWriter``
~~~~~~~~~~~~

Convert dataframe column to JSON using `to_json <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.to_json.html>`_,
and write it as ``String`` column in Clickhouse:
@@ -342,7 +347,7 @@ and write it as ``String`` column in Clickhouse:
clickhouse.execute(
"""
CREATE TABLE target_tbl AS (
CREATE TABLE default.target_tbl AS (
id Int32,
array_column_json String,
)
@@ -360,7 +365,7 @@ and write it as ``String`` column in Clickhouse:
writer.run(df)
Then you can parse this column on Clickhouse side:
Then you can parse this column on Clickhouse side - for example, by creating a view:

.. code:: sql
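
    -- a plausible sketch (assumption, names are illustrative):
    -- parse the JSON string into a typed array via a view
    CREATE VIEW default.target_view AS
    SELECT
        id,
        JSONExtract(array_column_json, 'Array(Int32)') AS array_column
    FROM default.target_tbl;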
11 changes: 7 additions & 4 deletions docs/connection/db_connection/greenplum/types.rst
@@ -272,8 +272,11 @@ Columns of these types cannot be read/written by Spark:

There is a way to avoid this - just cast unsupported types to ``text``. But the way this can be done is not straightforward.

Read unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Explicit type cast
------------------

``DBReader``
~~~~~~~~~~~~

Unfortunately, it is not possible to cast an unsupported column to some supported type on ``DBReader`` side:

@@ -334,8 +337,8 @@ You can then parse this column on Spark side using `from_json <https://spark.apa
from_json(df.array_column_as_json, schema).alias("array_column"),
)
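
A sketch of the full read flow, assuming the cast to JSON text is done on Greenplum side via a view (all names are illustrative):

.. code-block:: python

    from pyspark.sql.functions import from_json
    from pyspark.sql.types import ArrayType, IntegerType

    from onetl.connection import Greenplum
    from onetl.db import DBReader

    greenplum = Greenplum(...)

    # cast the unsupported column to JSON text on Greenplum side
    greenplum.execute(
        """
        CREATE VIEW myschema.v_source_tbl AS
        SELECT id, to_json(array_column)::text AS array_column_as_json
        FROM myschema.source_tbl
        """
    )

    reader = DBReader(connection=greenplum, source="myschema.v_source_tbl")
    df = reader.run()

    # parse the JSON string on Spark side
    schema = ArrayType(IntegerType())
    df = df.select(
        df.id,
        from_json(df.array_column_as_json, schema).alias("array_column"),
    )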
Write unsupported column type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``DBWriter``
~~~~~~~~~~~~

It is always possible to convert data on Spark side to string, and then write it to a ``text`` column in a Greenplum table.
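
A brief sketch of the write path under the same assumptions (names are illustrative; the target table is assumed to have a ``text`` column):

.. code-block:: python

    from pyspark.sql.functions import to_json

    from onetl.connection import Greenplum
    from onetl.db import DBWriter

    greenplum = Greenplum(...)

    # serialize the complex column to a JSON string on Spark side
    write_df = df.select(
        df.id,
        to_json(df.array_column).alias("array_column_json"),
    )

    writer = DBWriter(connection=greenplum, target="myschema.target_tbl")
    writer.run(write_df)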

95 changes: 94 additions & 1 deletion docs/connection/db_connection/oracle/execute.rst
@@ -1,7 +1,100 @@
.. _oracle-execute:

Executing statements in Oracle
==============================
==================================

How to
------

There are two ways to execute statements in Oracle:

Use :obj:`Oracle.fetch <onetl.connection.db_connection.oracle.connection.Oracle.fetch>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute some ``SELECT`` query which returns a **small number of rows**, like reading
Oracle config, or reading data from some reference table.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

The connection opened using this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ``SELECT ... FROM ...``
* ``WITH alias AS (...) SELECT ...``
* ``SHOW ...``

It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    df = oracle.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=Oracle.JDBCOptions(query_timeout=10),
    )
    oracle.close()
    value = df.collect()[0][0]  # get value from first row and first column

Use :obj:`Oracle.execute <onetl.connection.db_connection.oracle.connection.Oracle.execute>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute DDL and DML operations. Each method call runs the operation in a separate transaction, and then commits it.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

The connection opened using this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ``CREATE TABLE ...``, ``CREATE VIEW ...``
* ``ALTER ...``
* ``INSERT INTO ... AS SELECT ...``
* ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ``CALL procedure(arg1, arg2) ...`` or ``{call procedure(arg1, arg2)}`` - special syntax for calling a procedure
* ``SELECT func(arg1, arg2)``
* ``DECLARE ... BEGIN ... END`` - for executing PL/SQL statements
* etc

It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``.

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    with oracle:
        oracle.execute("DROP TABLE schema.table")
        oracle.execute(
            """
            CREATE TABLE schema.table (
                id NUMBER GENERATED ALWAYS AS IDENTITY,
                key VARCHAR2(4000),
                value NUMBER
            )
            """,
            options=Oracle.JDBCOptions(query_timeout=10),
        )

References
----------

.. currentmodule:: onetl.connection.db_connection.oracle.connection

8 changes: 8 additions & 0 deletions docs/connection/db_connection/oracle/index.rst
@@ -7,12 +7,20 @@ Oracle
:maxdepth: 1
:caption: Connection

prerequisites
connection

.. toctree::
:maxdepth: 1
:caption: Operations

read
sql
write
execute

.. toctree::
:maxdepth: 1
:caption: Troubleshooting

types
112 changes: 112 additions & 0 deletions docs/connection/db_connection/oracle/prerequisites.rst
@@ -0,0 +1,112 @@
.. _oracle-prerequisites:

Prerequisites
=============

Version Compatibility
---------------------

* Oracle Server versions: 23, 21, 19, 18, 12.2 and **probably** 11.2 (tested, but not mentioned in official docs).
* Spark versions: 2.3.x - 3.5.x
* Java versions: 8 - 20

See `official documentation <https://www.oracle.com/cis/database/technologies/appdev/jdbc-downloads.html>`_.

Installing PySpark
------------------

To use the Oracle connector you should have PySpark installed (or injected into ``sys.path``)
**before** creating the connector instance.

See :ref:`install-spark` installation instruction for more details.

Connecting to Oracle
--------------------

Connection port
~~~~~~~~~~~~~~~

Connection is usually performed on port 1521. The port may differ between Oracle instances.
Please ask your Oracle administrator to provide the required information.

Connection host
~~~~~~~~~~~~~~~

It is possible to connect to Oracle using either the DNS name of the host or its IP address.

If you're using an Oracle cluster, it is currently possible to connect only to **one specific node**.
Connecting to multiple nodes for load balancing, as well as automatic failover to a new master/replica, is not supported.
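
A minimal sketch of creating the connection (host, port, credentials and service name are placeholders; whether to pass ``sid`` or ``service_name`` depends on your instance):

.. code-block:: python

    from onetl.connection import Oracle

    # "spark" is assumed to be an existing SparkSession
    oracle = Oracle(
        host="oracle-node-1.domain.com",
        port=1521,
        user="username",
        password="***",
        service_name="MYPDB",  # or sid="XE"
        spark=spark,
    )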

Connect as proxy user
~~~~~~~~~~~~~~~~~~~~~

It is possible to connect to the database as another user without knowing that user's password.

This can be enabled by granting the user a special ``CONNECT THROUGH`` permission:

.. code-block:: sql

    ALTER USER schema_owner GRANT CONNECT THROUGH proxy_user;

Then you can connect to Oracle using the credentials of ``proxy_user``, but specify that you need the permissions of ``schema_owner``:

.. code-block:: python

    oracle = Oracle(
        ...,
        user="proxy_user[schema_owner]",
        password="proxy_user password",
    )

See `official documentation <https://oracle-base.com/articles/misc/proxy-users-and-connect-through>`_.

Required grants
~~~~~~~~~~~~~~~

Ask your Oracle cluster administrator to set the following grants for the user
used for creating a connection:

.. tabs::

.. code-tab:: sql Read + Write (schema is owned by user)

-- allow user to log in
GRANT CREATE SESSION TO username;

-- allow creating tables in user schema
GRANT CREATE TABLE TO username;

-- allow read & write access to specific table
GRANT SELECT, INSERT ON username.mytable TO username;

.. code-tab:: sql Read + Write (schema is not owned by user)

-- allow user to log in
GRANT CREATE SESSION TO username;

-- allow creating tables in any schema,
-- as Oracle does not support specifying exact schema name
GRANT CREATE ANY TABLE TO username;

-- only if if_exists="replace_entire_table" is used:
-- allow dropping/truncating tables in any schema,
-- as Oracle does not support specifying exact schema name
GRANT DROP ANY TABLE TO username;

-- allow read & write access to specific table
GRANT SELECT, INSERT ON someschema.mytable TO username;

.. code-tab:: sql Read only

-- allow user to log in
GRANT CREATE SESSION TO username;

-- allow read access to specific table
GRANT SELECT ON someschema.mytable TO username;

More details can be found in official documentation:
* `GRANT <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/GRANT.html>`_
* `SELECT <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/SELECT.html>`_
* `CREATE TABLE <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/CREATE-TABLE.html>`_
* `INSERT <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/INSERT.html>`_
* `TRUNCATE TABLE <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/TRUNCATE-TABLE.html>`_