[DOP-13252] Improve Oracle documentation
dolfinus committed Mar 11, 2024
1 parent 6476cc1 commit b4f5580
Showing 17 changed files with 761 additions and 30 deletions.
2 changes: 2 additions & 0 deletions docs/changelog/next_release/211.improvement.rst
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
Improve Clickhouse documentation:
* Add "Types" section describing mapping between Clickhouse and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Clickhouse
* Separate documentation of ``DBReader`` and ``Clickhouse.sql``
* Add examples for ``Clickhouse.fetch`` and ``Clickhouse.execute``
1 change: 1 addition & 0 deletions docs/changelog/next_release/228.improvement.rst
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
Improve Greenplum documentation:
* Add "Types" section describing mapping between Greenplum and Spark types
* Add more examples of reading and writing data from Greenplum
* Add examples for ``Greenplum.fetch`` and ``Greenplum.execute``
3 changes: 2 additions & 1 deletion docs/changelog/next_release/229.improvement.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
Improve Postgres documentation:
* Add "Types" section describing mapping between Postgres and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Postgres
* Separate documentation of DBReader and Postgres.sql
* Separate documentation of ``DBReader`` and ``Postgres.sql``
* Add examples for ``Postgres.fetch`` and ``Postgres.execute``
5 changes: 5 additions & 0 deletions docs/changelog/next_release/233.improvement.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Improve Oracle documentation:
* Add "Types" section describing mapping between Oracle and Spark types
* Add "Prerequisites" section describing different aspects of connecting to Oracle
* Separate documentation of ``DBReader`` and ``Oracle.sql``
* Add examples for ``Oracle.fetch`` and ``Oracle.execute``
95 changes: 94 additions & 1 deletion docs/connection/db_connection/oracle/execute.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,100 @@
.. _oracle-execute:

Executing statements in Oracle
==============================
==================================

How to
------

There are 2 ways to execute a statement in Oracle:

Use :obj:`Oracle.fetch <onetl.connection.db_connection.oracle.connection.Oracle.fetch>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute a ``SELECT`` query which returns a **small number of rows**, like reading
Oracle config, or reading data from a reference table.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

A connection opened by this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.
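One way to make sure the connection is closed even when the query fails is a ``try``/``finally`` wrapper. The sketch below uses a stand-in class instead of a real ``Oracle`` connection so that it runs without a database; ``FakeConnection`` and ``fetch_and_close`` are hypothetical names used for illustration only, not part of onETL.

```python
class FakeConnection:
    """Stand-in for an Oracle connection (assumption: only fetch() and close() matter here)."""

    def __init__(self, fail=False):
        self.fail = fail
        self.closed = False

    def fetch(self, query):
        # Simulate either a failing query or a tiny result set
        if self.fail:
            raise RuntimeError("query failed")
        return [("some_value",)]

    def close(self):
        self.closed = True


def fetch_and_close(connection, query):
    """Run a query and always close the connection afterwards, even on error."""
    try:
        return connection.fetch(query)
    finally:
        connection.close()
```

With a real connection, the same effect can be achieved with the ``with oracle:`` form used in the ``Oracle.execute`` example of this page.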

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ``SELECT ... FROM ...``
* ``WITH alias AS (...) SELECT ...``
* ``SHOW ...``

It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    # fetch is intended for small result sets only
    df = oracle.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=Oracle.JDBCOptions(query_timeout=10),
    )
    oracle.close()
    value = df.collect()[0][0]  # get value from first row and first column

Use :obj:`Oracle.execute <onetl.connection.db_connection.oracle.connection.Oracle.execute>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute DDL and DML operations. Each method call runs the operation in a separate transaction and then commits it.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

A connection opened by this method should then be closed with :obj:`Oracle.close <onetl.connection.db_connection.oracle.connection.Oracle.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by Oracle, like:

* ``CREATE TABLE ...``, ``CREATE VIEW ...``
* ``ALTER ...``
* ``INSERT INTO ... SELECT ...``
* ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ``CALL procedure(arg1, arg2) ...`` or ``{call procedure(arg1, arg2)}`` - special syntax for calling a procedure
* ``SELECT func(arg1, arg2) FROM DUAL``
* ``DECLARE ... BEGIN ... END`` - for executing PL/SQL statements
* etc

It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``.
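Since each call accepts a single statement, a script with several statements has to be split and executed one statement at a time. The following is a minimal sketch of such a splitter, not an onETL feature: ``split_statements`` and ``run_each`` are hypothetical helpers, and the splitter only understands single-quoted string literals. It does not handle comments, and PL/SQL blocks (``DECLARE ... BEGIN ... END``) contain semicolons of their own and should be passed as a whole instead.

```python
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Split a SQL script on semicolons that are outside single-quoted literals."""
    statements = []
    current = []
    in_string = False
    for char in script:
        if char == "'":
            # An escaped quote ('') toggles twice, so the state stays correct
            in_string = not in_string
        if char == ";" and not in_string:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(char)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def run_each(execute: Callable[[str], object], script: str) -> None:
    """Call execute() once per statement, in order."""
    for statement in split_statements(script):
        execute(statement)
```

Under these assumptions, ``run_each(oracle.execute, script)`` would issue one ``Oracle.execute`` call per statement, each committed in its own transaction as described above.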

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    # the context manager closes the connection on exit
    with oracle:
        oracle.execute("DROP TABLE schema.table")
        oracle.execute(
            """
            CREATE TABLE schema.table (
                id NUMBER GENERATED ALWAYS AS IDENTITY,
                key VARCHAR2(4000),
                value NUMBER
            )
            """,
            options=Oracle.JDBCOptions(query_timeout=10),
        )

References
----------

.. currentmodule:: onetl.connection.db_connection.oracle.connection

8 changes: 8 additions & 0 deletions docs/connection/db_connection/oracle/index.rst
Original file line number Diff line number Diff line change
@@ -7,12 +7,20 @@ Oracle
    :maxdepth: 1
    :caption: Connection

    prerequisites
    connection

.. toctree::
    :maxdepth: 1
    :caption: Operations

    read
    sql
    write
    execute

.. toctree::
    :maxdepth: 1
    :caption: Troubleshooting

    types
87 changes: 87 additions & 0 deletions docs/connection/db_connection/oracle/prerequisites.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
.. _oracle-prerequisites:

Prerequisites
=============

Version Compatibility
---------------------

* Oracle Server versions: 23, 21, 19, 18, 12.2 and *probably* 11.2 (tested, but not mentioned in the official documentation).
* Spark versions: 2.3.x - 3.5.x
* Java versions: 8 - 20

See `official documentation <https://www.oracle.com/cis/database/technologies/appdev/jdbc-downloads.html>`_.

Installing PySpark
------------------

To use the Oracle connector, you should have PySpark installed (or injected into ``sys.path``)
BEFORE creating the connector instance.

See the :ref:`install-spark` installation instructions for more details.

Connecting to Oracle
--------------------

Connection port
~~~~~~~~~~~~~~~

Connection is usually performed to port 1521. Port may differ for different Oracle instances.
Please ask your Oracle administrator to provide required information.
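Before asking for help, it can be useful to check whether the port is reachable from your machine at all. The sketch below uses only the Python standard library; ``is_port_open`` is a hypothetical helper, and a successful TCP connection only shows that something is listening on that port, not that it is an Oracle listener or that your credentials are valid.

```python
import socket


def is_port_open(host: str, port: int = 1521, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host name and applies the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors
        return False
```

For example, ``is_port_open("oracle.example.com")`` (a placeholder host name) checks the default port 1521.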

Oracle cluster interaction
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you're using an Oracle cluster, it is currently possible to connect only to **one specific node**.
Connecting to multiple nodes for load balancing and automatic failover to a new master/replica are not supported.

Required grants
~~~~~~~~~~~~~~~

Ask your Oracle cluster administrator to set the following grants for the user
used to create the connection:

.. tabs::

    .. code-tab:: sql Read + Write (schema is owned by user)

        -- allow user to log in
        GRANT CREATE SESSION TO username;

        -- allow creating tables in user schema
        GRANT CREATE TABLE TO username;

        -- allow read & write access to specific table
        GRANT SELECT, INSERT ON username.mytable TO username;

    .. code-tab:: sql Read + Write (schema is not owned by user)

        -- allow user to log in
        GRANT CREATE SESSION TO username;

        -- allow creating tables in any schema,
        -- as Oracle does not support specifying exact schema name
        GRANT CREATE ANY TABLE TO username;

        -- only if if_exists="replace_entire_table" is used:
        -- allow dropping/truncating tables in any schema,
        -- as Oracle does not support specifying exact schema name
        GRANT DROP ANY TABLE TO username;

        -- allow read & write access to specific table
        GRANT SELECT, INSERT ON someschema.mytable TO username;

    .. code-tab:: sql Read only

        -- allow user to log in
        GRANT CREATE SESSION TO username;

        -- allow read access to specific table
        GRANT SELECT ON someschema.mytable TO username;

More details can be found in the official documentation:

* `GRANT <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/GRANT.html>`_
* `SELECT <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/SELECT.html>`_
* `CREATE TABLE <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/CREATE-TABLE.html>`_
* `INSERT <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/INSERT.html>`_
* `TRUNCATE TABLE <https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/TRUNCATE-TABLE.html>`_
72 changes: 64 additions & 8 deletions docs/connection/db_connection/oracle/read.rst
Original file line number Diff line number Diff line change
@@ -1,18 +1,74 @@
.. _oracle-read:

Reading from Oracle
===================
Reading from Oracle using ``DBReader``
======================================

There are 2 ways of distributed data reading from Oracle:
.. warning::

* Using :obj:`DBReader <onetl.db.db_reader.db_reader.DBReader>` with different :ref:`strategy`
* Using :obj:`Oracle.sql <onetl.connection.db_connection.oracle.connection.Oracle.sql>`
Please take into account :ref:`oracle-types`

Both methods accept :obj:`JDBCReadOptions <onetl.connection.db_connection.jdbc.options.JDBCReadOptions>`
:obj:`DBReader <onetl.db.db_reader.db_reader.DBReader>` supports :ref:`strategy` for incremental data reading,
but does not support custom queries, like JOINs.
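Conceptually, an incremental read narrows the query to rows whose HWM expression is greater than the last saved value. The sketch below only illustrates that idea; the SQL that ``DBReader`` actually generates is not shown in this document, and ``build_incremental_where`` is a hypothetical helper, not an onETL function.

```python
from typing import Optional


def build_incremental_where(
    where: Optional[str],
    hwm_expression: str,
    last_value_sql: Optional[str],
) -> Optional[str]:
    """Combine a user filter with an HWM condition.

    last_value_sql is an already rendered SQL literal (assumption),
    or None when no value has been saved yet (first run).
    """
    conditions = []
    if where:
        conditions.append(f"({where})")
    if last_value_sql is not None:
        conditions.append(f"({hwm_expression} > {last_value_sql})")
    # No conditions at all means the whole table is read
    return " AND ".join(conditions) or None
```

On the first run there is no saved value, so only the user filter applies; on later runs the HWM condition is added on top of it.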

.. currentmodule:: onetl.connection.db_connection.oracle.connection
Supported DBReader features
---------------------------

.. automethod:: Oracle.sql
* ✅︎ ``columns``
* ✅︎ ``where``
* ✅︎ ``hwm``, supported strategies:
* * ✅︎ :ref:`snapshot-strategy`
* * ✅︎ :ref:`incremental-strategy`
* * ✅︎ :ref:`snapshot-batch-strategy`
* * ✅︎ :ref:`incremental-batch-strategy`
* ✅︎ ``hint``
* ❌ ``df_schema``
* ✅︎ ``options`` (see :obj:`JDBCReadOptions <onetl.connection.db_connection.jdbc.options.JDBCReadOptions>`)

Examples
--------

Snapshot strategy:

.. code-block:: python

    from onetl.connection import Oracle
    from onetl.db import DBReader

    oracle = Oracle(...)

    reader = DBReader(
        connection=oracle,
        source="schema.table",
        columns=["id", "key", "CAST(value AS VARCHAR2(4000)) value", "updated_dt"],
        where="key = 'something'",
        options=Oracle.ReadOptions(partition_column="id", num_partitions=10),
    )
    df = reader.run()

Incremental strategy:

.. code-block:: python

    from onetl.connection import Oracle
    from onetl.db import DBReader
    from onetl.strategy import IncrementalStrategy

    oracle = Oracle(...)

    reader = DBReader(
        connection=oracle,
        source="schema.table",
        columns=["id", "key", "CAST(value AS VARCHAR2(4000)) value", "updated_dt"],
        where="key = 'something'",
        hwm=DBReader.AutoDetectHWM(name="oracle_hwm", expression="updated_dt"),
        options=Oracle.ReadOptions(partition_column="id", num_partitions=10),
    )

    with IncrementalStrategy():
        df = reader.run()

Read options
------------

.. currentmodule:: onetl.connection.db_connection.jdbc_connection.options

55 changes: 55 additions & 0 deletions docs/connection/db_connection/oracle/sql.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
.. _oracle-sql:

Reading from Oracle using ``Oracle.sql``
========================================

.. warning::

Please take into account :ref:`oracle-types`

:obj:`Oracle.sql <onetl.connection.db_connection.oracle.connection.Oracle.sql>` allows passing custom SQL query,
but does not support incremental strategies.

Method also accepts :obj:`JDBCReadOptions <onetl.connection.db_connection.jdbc.options.JDBCReadOptions>`.

Syntax support
--------------

Only queries with the following syntax are supported:

* ``SELECT ...``
* ``WITH ... SELECT ...``

Queries like ``SHOW ...`` are not supported.

This method also does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.
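The rules above can be approximated by a quick check before a query is passed on. This is a rough sketch, not part of onETL: ``looks_like_supported_query`` is a hypothetical helper that only inspects the first keyword and looks for semicolons outside single-quoted literals, so it can reject valid queries (for example, ones starting with a comment) and it does not validate the SQL itself.

```python
import re

# Only SELECT ... and WITH ... SELECT ... are listed as supported above
_SUPPORTED_START = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)


def has_semicolon_outside_literals(query: str) -> bool:
    """Detect a statement separator that is not inside a '...' literal."""
    in_string = False
    for char in query:
        if char == "'":
            in_string = not in_string
        elif char == ";" and not in_string:
            return True
    return False


def looks_like_supported_query(query: str) -> bool:
    """Return True for a single statement starting with SELECT or WITH."""
    if not _SUPPORTED_START.match(query):
        return False
    return not has_semicolon_outside_literals(query)
```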

Examples
--------

.. code-block:: python

    from onetl.connection import Oracle

    oracle = Oracle(...)

    df = oracle.sql(
        """
        SELECT
            id,
            key,
            CAST(value AS VARCHAR2(4000)) value,
            updated_at
        FROM
            some.mytable
        WHERE
            key = 'something'
        """,
        options=Oracle.ReadOptions(partition_column="id", num_partitions=10),
    )

References
----------

.. currentmodule:: onetl.connection.db_connection.oracle.connection

.. automethod:: Oracle.sql