diff --git a/docs/changelog/next_release/211.improvement.rst b/docs/changelog/next_release/211.improvement.rst index 853a7cf94..5deec2051 100644 --- a/docs/changelog/next_release/211.improvement.rst +++ b/docs/changelog/next_release/211.improvement.rst @@ -1,3 +1,5 @@ Improve Clickhouse documentation: * Add "Types" section describing mapping between Clickhouse and Spark types * Add "Prerequisites" section describing different aspects of connecting to Clickhouse + * Separate documentation of ``DBReader`` and ``Clickhouse.sql`` + * Add examples for ``Clickhouse.fetch`` and ``Clickhouse.execute`` diff --git a/docs/changelog/next_release/228.improvement.rst b/docs/changelog/next_release/228.improvement.rst index c5295422e..8b5c4c284 100644 --- a/docs/changelog/next_release/228.improvement.rst +++ b/docs/changelog/next_release/228.improvement.rst @@ -1,4 +1,5 @@ Improve Greenplum documentation: * Add "Types" section describing mapping between Greenplum and Spark types * Add more examples of reading and writing data from Greenplum - * Add notes about issues with IP resolution and building ``gpfdist`` URL. + * Add examples for ``Greenplum.fetch`` and ``Greenplum.execute`` + * Add notes about issues with IP resolution and building ``gpfdist`` URL diff --git a/docs/changelog/next_release/229.improvement.rst b/docs/changelog/next_release/229.improvement.rst index 30eae3549..d0d2d6b57 100644 --- a/docs/changelog/next_release/229.improvement.rst +++ b/docs/changelog/next_release/229.improvement.rst @@ -1,4 +1,5 @@ Improve Postgres documentation: * Add "Types" section describing mapping between Postgres and Spark types * Add "Prerequisites" section describing different aspects of connecting to Postgres - * Separate documentation of DBReader and Postgres.sql + * Separate documentation of ``DBReader`` and ``Postgres.sql`` + * Add examples for ``Postgres.fetch`` and ``Postgres.execute`` diff --git a/docs/changelog/next_release/233.improvement.rst b/docs/changelog/next_release/233.improvement.rst new file mode 100644 index 000000000..5b1913eba --- /dev/null +++ b/docs/changelog/next_release/233.improvement.rst @@ -0,0 +1,5 @@ +Improve Oracle documentation: + * Add "Types" section describing mapping between Oracle and Spark types + * Add "Prerequisites" section describing different aspects of connecting to Oracle + * Separate documentation of ``DBReader`` and ``Oracle.sql`` + * Add examples for ``Oracle.fetch`` and ``Oracle.execute`` diff --git a/docs/connection/db_connection/clickhouse/execute.rst b/docs/connection/db_connection/clickhouse/execute.rst index 59d956a89..ef9ad2ecf 100644 --- a/docs/connection/db_connection/clickhouse/execute.rst +++ b/docs/connection/db_connection/clickhouse/execute.rst @@ -23,11 +23,11 @@ Syntax support This method supports **any** query syntax supported by Clickhouse, like: -* ``SELECT ... FROM ...`` -* ``WITH alias AS (...) SELECT ...`` -* ``SHOW ...`` - -It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``. +* ✅︎ ``SELECT ... FROM ...`` +* ✅︎ ``WITH alias AS (...) SELECT ...`` +* ✅︎ ``SELECT func(arg1, arg2)`` - call function +* ✅︎ ``SHOW ...`` +* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported Examples ^^^^^^^^ @@ -59,12 +59,12 @@ Syntax support This method supports **any** query syntax supported by Clickhouse, like: -* ``CREATE TABLE ...`` -* ``ALTER ...`` -* ``INSERT INTO ... AS SELECT ...`` -* etc - -It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``. 
+* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on +* ✅︎ ``ALTER ...`` +* ✅︎ ``INSERT INTO ... AS SELECT ...`` +* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on +* ✅︎ other statements not mentioned here +* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported Examples ^^^^^^^^ diff --git a/docs/connection/db_connection/clickhouse/prerequisites.rst b/docs/connection/db_connection/clickhouse/prerequisites.rst index 35c3f04e6..654add047 100644 --- a/docs/connection/db_connection/clickhouse/prerequisites.rst +++ b/docs/connection/db_connection/clickhouse/prerequisites.rst @@ -46,7 +46,7 @@ used for creating a connection: .. code-tab:: sql Read + Write - -- allow external tables in the same schema as target table + -- allow creating tables in the target schema GRANT CREATE TABLE ON myschema.* TO username; -- allow read & write access to specific table @@ -57,13 +57,4 @@ used for creating a connection: -- allow read access to specific table GRANT SELECT ON myschema.mytable TO username; - .. code-tab:: sql Write only - - -- allow external tables in the same schema as target table - GRANT CREATE TABLE ON myschema.* TO username; - - -- allow read access to specific table (to get column types) - -- allow write access to specific table - GRANT SELECT, INSERT ON myschema.mytable TO username; - More details can be found in `official documentation `_. diff --git a/docs/connection/db_connection/clickhouse/sql.rst b/docs/connection/db_connection/clickhouse/sql.rst index 5550cde7b..cd331ef33 100644 --- a/docs/connection/db_connection/clickhouse/sql.rst +++ b/docs/connection/db_connection/clickhouse/sql.rst @@ -18,7 +18,7 @@ Syntax support Only queries with the following syntax are supported: * ``SELECT ...`` -* ``WITH ... SELECT ...`` +* ``WITH alias AS (...) SELECT ...`` Queries like ``SHOW ...`` are not supported. diff --git a/docs/connection/db_connection/clickhouse/types.rst b/docs/connection/db_connection/clickhouse/types.rst index d34b427a6..e9bdd22e9 100644 --- a/docs/connection/db_connection/clickhouse/types.rst +++ b/docs/connection/db_connection/clickhouse/types.rst @@ -17,8 +17,8 @@ This is how Clickhouse connector performs this: * Find corresponding ``Clickhouse type (read)`` -> ``Spark type`` combination (see below) for each DataFrame column. If no combination is found, raise exception. * Create DataFrame from query with specific column names and Spark types. -Writing to some existing Clickhuse table -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Writing to some existing Clickhouse table +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is how Clickhouse connector performs this: @@ -113,6 +113,8 @@ Here you can find source code with type conversions: Supported types --------------- +See `official documentation `_ + Generic types ~~~~~~~~~~~~~ @@ -243,7 +245,7 @@ Note: ``DateTime(P, TZ)`` has the same precision as ``DateTime(P)``. .. [5] Generic JDBC dialect generates DDL with Clickhouse type ``TIMESTAMP`` which is alias for ``DateTime32`` with precision up to seconds (``23:59:59``). - Inserting data with milliseconds precision (``23:59:59.999``) will lead to throwing away milliseconds (``23:59:59``). + Inserting data with milliseconds precision (``23:59:59.999``) will lead to **throwing away milliseconds**. .. [6] Clickhouse will raise an exception that data in format ``2001-01-01 23:59:59.999999`` has data ``.999999`` which does not match format ``YYYY-MM-DD hh:mm:ss``. 
@@ -304,8 +306,11 @@ This dialect does not have type conversion between some types, like Clickhouse ` The is a way to avoid this - just cast everything to ``String``. -Read unsupported column type -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Explicit type cast +------------------ + +``DBReader`` +~~~~~~~~~~~~ Use ``CAST`` or ``toJSONString`` to get column data as string in JSON format, and then cast string column in resulting dataframe to proper type using `from_json `_: @@ -332,8 +337,8 @@ and then cast string column in resulting dataframe to proper type using `from_js from_json(df.array_column, column_type).alias("array_column"), ) -Write unsupported column type -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +``DBWriter`` +~~~~~~~~~~~~ Convert dataframe column to JSON using `to_json `_, and write it as ``String`` column in Clickhouse: @@ -342,7 +347,7 @@ and write it as ``String`` column in Clickhouse: clickhouse.execute( """ - CREATE TABLE target_tbl AS ( + CREATE TABLE default.target_tbl AS ( id Int32, array_column_json String, ) @@ -360,7 +365,7 @@ and write it as ``String`` column in Clickhouse: writer.run(df) -Then you can parse this column on Clickhouse side: +Then you can parse this column on Clickhouse side - for example, by creating a view: .. code:: sql diff --git a/docs/connection/db_connection/greenplum/execute.rst b/docs/connection/db_connection/greenplum/execute.rst index 356df27b3..25fbe8962 100644 --- a/docs/connection/db_connection/greenplum/execute.rst +++ b/docs/connection/db_connection/greenplum/execute.rst @@ -28,12 +28,10 @@ Syntax support This method supports **any** query syntax supported by Greenplum, like: -* ``SELECT ... FROM ...`` -* ``WITH alias AS (...) SELECT ...`` - -Queries like ``SHOW ...`` are not supported. - -It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``. +* ✅︎ ``SELECT ... FROM ...`` +* ✅︎ ``WITH alias AS (...) SELECT ...`` +* ✅︎ ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions +* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported Examples ^^^^^^^^ @@ -65,15 +63,14 @@ Syntax support This method supports **any** query syntax supported by Greenplum, like: -* ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on -* ``ALTER ...`` -* ``INSERT INTO ... AS SELECT ...`` -* ``DROP TABLE ...``, ``DROP VIEW ...``, and so on -* ``CALL procedure(arg1, arg2) ...`` -* ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions -* etc - -It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``. +* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on +* ✅︎ ``ALTER ...`` +* ✅︎ ``INSERT INTO ... AS SELECT ...`` +* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on +* ✅︎ ``CALL procedure(arg1, arg2) ...`` +* ✅︎ ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions +* ✅︎ other statements not mentioned here +* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported Examples ^^^^^^^^ diff --git a/docs/connection/db_connection/greenplum/types.rst b/docs/connection/db_connection/greenplum/types.rst index d6404569e..fb3077e9f 100644 --- a/docs/connection/db_connection/greenplum/types.rst +++ b/docs/connection/db_connection/greenplum/types.rst @@ -23,7 +23,7 @@ This is how Greenplum connector performs this: Yes, **all columns of a table**, not just selected ones. This means that if source table **contains** columns with unsupported type, the entire table cannot be read. 
-Writing to some existing Clickhuse table +Writing to some existing Greenplum table ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is how Greenplum connector performs this: @@ -33,7 +33,7 @@ This is how Greenplum connector performs this: * For each column in query result get column name and Greenplum type. * Match table columns with DataFrame columns (by name, case insensitive). If some column is present only in target table, but not in DataFrame (like ``DEFAULT`` or ``SERIAL`` column), and vice versa, raise an exception. - See `Write unsupported column type`_. + See `Explicit type cast`_. * Find corresponding ``Spark type`` -> ``Greenplumtype (write)`` combination (see below) for each DataFrame column. If no combination is found, raise exception. * If ``Greenplumtype (write)`` match ``Greenplum type (read)``, no additional casts will be performed, DataFrame column will be written to Greenplum as is. * If ``Greenplumtype (write)`` does not match ``Greenplum type (read)``, DataFrame column will be casted to target column type **on Greenplum side**. For example, you can write column with text data to ``json`` column which Greenplum connector currently does not support. @@ -272,8 +272,11 @@ Columns of these types cannot be read/written by Spark: The is a way to avoid this - just cast unsupported types to ``text``. But the way this can be done is not a straightforward. -Read unsupported column type -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Explicit type cast +------------------ + +``DBReader`` +~~~~~~~~~~~~ Unfortunately, it is not possible to cast unsupported column to some supported type on ``DBReader`` side: @@ -334,8 +337,8 @@ You can then parse this column on Spark side using `from_json ` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Use this method to execute some ``SELECT`` query which returns **small number or rows**, like reading +Oracle config, or reading data from some reference table. + +Method accepts :obj:`JDBCOptions `. + +Connection opened using this method should be then closed with :obj:`Oracle.close `. + +Syntax support +^^^^^^^^^^^^^^ + +This method supports **any** query syntax supported by Oracle, like: + +* ✅︎ ``SELECT ... FROM ...`` +* ✅︎ ``WITH alias AS (...) SELECT ...`` +* ✅︎ ``SELECT func(arg1, arg2) FROM DUAL`` - call function +* ✅︎ ``SHOW ...`` +* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported + +Examples +^^^^^^^^ + +.. code-block:: python + + from onetl.connection import Oracle + + oracle = Oracle(...) + + df = oracle.fetch( + "SELECT value FROM some.reference_table WHERE key = 'some_constant'", + options=Oracle.JDBCOptions(query_timeout=10), + ) + oracle.close() + value = df.collect()[0][0] # get value from first row and first column + +Use :obj:`Oracle.execute ` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Use this method to execute DDL and DML operations. Each method call runs operation in a separated transaction, and then commits it. + +Method accepts :obj:`JDBCOptions `. + +Connection opened using this method should be then closed with :obj:`Oracle.close `. + +Syntax support +^^^^^^^^^^^^^^ + +This method supports **any** query syntax supported by Oracle, like: + +* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...`` +* ✅︎ ``ALTER ...`` +* ✅︎ ``INSERT INTO ... AS SELECT ...`` +* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on +* ✅︎ ``CALL procedure(arg1, arg2) ...`` or ``{call procedure(arg1, arg2)}`` - special syntax for calling procedure +* ✅︎ ``DECLARE ... 
BEGIN ... END`` - execute PL/SQL statement
+* ✅︎ other statements not mentioned here
+* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported
+
+Examples
+^^^^^^^^
+
+.. code-block:: python
+
+    from onetl.connection import Oracle
+
+    oracle = Oracle(...)
+
+    with oracle:
+        oracle.execute("DROP TABLE schema.table")
+        oracle.execute(
+            """
+            CREATE TABLE schema.table (
+                id NUMBER GENERATED ALWAYS AS IDENTITY,
+                key VARCHAR2(4000),
+                value NUMBER
+            )
+            """,
+            options=Oracle.JDBCOptions(query_timeout=10),
+        )
+
+
+References
+----------
 
 .. currentmodule:: onetl.connection.db_connection.oracle.connection
diff --git a/docs/connection/db_connection/oracle/index.rst b/docs/connection/db_connection/oracle/index.rst
index 519250fb5..ec9005397 100644
--- a/docs/connection/db_connection/oracle/index.rst
+++ b/docs/connection/db_connection/oracle/index.rst
@@ -7,6 +7,7 @@ Oracle
     :maxdepth: 1
     :caption: Connection
 
+    prerequisites
     connection
 
 .. toctree::
@@ -14,5 +15,12 @@ Oracle
     :caption: Operations
 
     read
+    sql
     write
     execute
+
+.. toctree::
+    :maxdepth: 1
+    :caption: Troubleshooting
+
+    types
diff --git a/docs/connection/db_connection/oracle/prerequisites.rst b/docs/connection/db_connection/oracle/prerequisites.rst
new file mode 100644
index 000000000..c86bc393c
--- /dev/null
+++ b/docs/connection/db_connection/oracle/prerequisites.rst
@@ -0,0 +1,112 @@
+.. _oracle-prerequisites:
+
+Prerequisites
+=============
+
+Version Compatibility
+---------------------
+
+* Oracle Server versions: 23, 21, 19, 18, 12.2 and **probably** 11.2 (tested, but it's not mentioned in official docs).
+* Spark versions: 2.3.x - 3.5.x
+* Java versions: 8 - 20
+
+See `official documentation `_.
+
+Installing PySpark
+------------------
+
+To use Oracle connector you should have PySpark installed (or injected to ``sys.path``)
+BEFORE creating the connector instance.
+
+See :ref:`install-spark` installation instruction for more details.
+
+Connecting to Oracle
+--------------------
+
+Connection port
+~~~~~~~~~~~~~~~
+
+Connection is usually performed to port 1521. Port may differ for different Oracle instances.
+Please ask your Oracle administrator to provide required information.
+
+Connection host
+~~~~~~~~~~~~~~~
+
+It is possible to connect to Oracle by using either DNS name of host or its IP address.
+
+If you're using Oracle cluster, it is currently possible to connect only to **one specific node**.
+Connecting to multiple nodes to perform load balancing, as well as automatic failover to new master/replica are not supported.
+
+Connect as proxy user
+~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to connect to the database as another user without knowing this user's password.
+
+This can be enabled by granting the user a special ``CONNECT THROUGH`` permission:
+
+.. code-block:: sql
+
+    ALTER USER schema_owner GRANT CONNECT THROUGH proxy_user;
+
+Then you can connect to Oracle using credentials of ``proxy_user`` but specify that you need permissions of ``schema_owner``:
+
+.. code-block:: python
+
+    oracle = Oracle(
+        ...,
+        user="proxy_user[schema_owner]",
+        password="proxy_user password",
+    )
+
+See `official documentation `_.
+
+Required grants
+~~~~~~~~~~~~~~~
+
+Ask your Oracle cluster administrator to set the following grants for a user
+used for creating a connection:
+
+.. tabs::
+
+    .. 
code-tab:: sql Read + Write (schema is owned by user) + + -- allow user to log in + GRANT CREATE SESSION TO username; + + -- allow creating tables in user schema + GRANT CREATE TABLE TO username; + + -- allow read & write access to specific table + GRANT SELECT, INSERT ON username.mytable TO username; + + .. code-tab:: sql Read + Write (schema is not owned by user) + + -- allow user to log in + GRANT CREATE SESSION TO username; + + -- allow creating tables in any schema, + -- as Oracle does not support specifying exact schema name + GRANT CREATE ANY TABLE TO username; + + -- only if if_exists="replace_entire_table" is used: + -- allow dropping/truncating tables in any schema, + -- as Oracle does not support specifying exact schema name + GRANT DROP ANY TABLE TO username; + + -- allow read & write access to specific table + GRANT SELECT, INSERT ON someschema.mytable TO username; + + .. code-tab:: sql Read only + + -- allow user to log in + GRANT CREATE SESSION TO username; + + -- allow read access to specific table + GRANT SELECT ON someschema.mytable TO username; + +More details can be found in official documentation: + * `GRANT `_ + * `SELECT `_ + * `CREATE TABLE `_ + * `INSERT `_ + * `TRUNCATE TABLE `_ diff --git a/docs/connection/db_connection/oracle/read.rst b/docs/connection/db_connection/oracle/read.rst index ffd393e6e..5877e2caf 100644 --- a/docs/connection/db_connection/oracle/read.rst +++ b/docs/connection/db_connection/oracle/read.rst @@ -1,18 +1,76 @@ .. _oracle-read: -Reading from Oracle -=================== +Reading from Oracle using ``DBReader`` +====================================== -There are 2 ways of distributed data reading from Oracle: +.. warning:: -* Using :obj:`DBReader ` with different :ref:`strategy` -* Using :obj:`Oracle.sql ` + Please take into account :ref:`oracle-types` -Both methods accept :obj:`JDBCReadOptions ` +:obj:`DBReader ` supports :ref:`strategy` for incremental data reading, +but does not support custom queries, like JOINs. -.. currentmodule:: onetl.connection.db_connection.oracle.connection +Supported DBReader features +--------------------------- -.. automethod:: Oracle.sql +* ✅︎ ``columns`` +* ✅︎ ``where`` +* ✅︎ ``hwm``, supported strategies: +* * ✅︎ :ref:`snapshot-strategy` +* * ✅︎ :ref:`incremental-strategy` +* * ✅︎ :ref:`snapshot-batch-strategy` +* * ✅︎ :ref:`incremental-batch-strategy` +* ✅︎ ``hint`` (see `official documentation `_) +* ❌ ``df_schema`` +* ✅︎ ``options`` (see :obj:`JDBCReadOptions `) + +Examples +-------- + +Snapshot strategy: + +.. code-block:: python + + from onetl.connection import Oracle + from onetl.db import DBReader + + oracle = Oracle(...) + + reader = DBReader( + connection=oracle, + source="schema.table", + columns=["id", "key", "CAST(value AS VARCHAR2(4000)) value", "updated_dt"], + where="key = 'something'", + hint="INDEX(schema.table key_index)", + options=Oracle.ReadOptions(partition_column="id", num_partitions=10), + ) + df = reader.run() + +Incremental strategy: + +.. code-block:: python + + from onetl.connection import Oracle + from onetl.db import DBReader + from onetl.strategy import IncrementalStrategy + + oracle = Oracle(...) 
+
+    reader = DBReader(
+        connection=oracle,
+        source="schema.table",
+        columns=["id", "key", "CAST(value AS VARCHAR2(4000)) value", "updated_dt"],
+        where="key = 'something'",
+        hint="INDEX(schema.table key_index)",
+        hwm=DBReader.AutoDetectHWM(name="oracle_hwm", expression="updated_dt"),
+        options=Oracle.ReadOptions(partition_column="id", num_partitions=10),
+    )
+
+    with IncrementalStrategy():
+        df = reader.run()
+
+Read options
+------------
 
 .. currentmodule:: onetl.connection.db_connection.jdbc_connection.options
diff --git a/docs/connection/db_connection/oracle/sql.rst b/docs/connection/db_connection/oracle/sql.rst
new file mode 100644
index 000000000..3ea0832d1
--- /dev/null
+++ b/docs/connection/db_connection/oracle/sql.rst
@@ -0,0 +1,55 @@
+.. _oracle-sql:
+
+Reading from Oracle using ``Oracle.sql``
+========================================
+
+.. warning::
+
+    Please take into account :ref:`oracle-types`
+
+:obj:`Oracle.sql ` allows passing custom SQL query,
+but does not support incremental strategies.
+
+Method also accepts :obj:`JDBCReadOptions `.
+
+Syntax support
+--------------
+
+Only queries with the following syntax are supported:
+
+* ``SELECT ...``
+* ``WITH alias AS (...) SELECT ...``
+
+Queries like ``SHOW ...`` are not supported.
+
+This method also does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.
+
+Examples
+--------
+
+.. code-block:: python
+
+    from onetl.connection import Oracle
+
+    oracle = Oracle(...)
+    df = oracle.sql(
+        """
+        SELECT
+            id,
+            key,
+            CAST(value AS VARCHAR2(4000)) value,
+            updated_at
+        FROM
+            some.mytable
+        WHERE
+            key = 'something'
+        """,
+        options=Oracle.ReadOptions(partition_column="id", num_partitions=10),
+    )
+
+References
+----------
+
+.. currentmodule:: onetl.connection.db_connection.oracle.connection
+
+.. automethod:: Oracle.sql
diff --git a/docs/connection/db_connection/oracle/types.rst b/docs/connection/db_connection/oracle/types.rst
new file mode 100644
index 000000000..907afe1c9
--- /dev/null
+++ b/docs/connection/db_connection/oracle/types.rst
@@ -0,0 +1,382 @@
+.. _oracle-types:
+
+Oracle <-> Spark type mapping
+=============================
+
+Type detection & casting
+------------------------
+
+Spark's DataFrames always have a ``schema`` which is a list of columns with corresponding Spark types. All operations on a column are performed using column type.
+
+Reading from Oracle
+~~~~~~~~~~~~~~~~~~~
+
+This is how Oracle connector performs this:
+
+* For each column in query result (``SELECT column1, column2, ... FROM table ...``) get column name and Oracle type.
+* Find corresponding ``Oracle type (read)`` -> ``Spark type`` combination (see below) for each DataFrame column. If no combination is found, raise exception.
+* Create DataFrame from query with specific column names and Spark types.
+
+Writing to some existing Oracle table
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This is how Oracle connector performs this:
+
+* Get names of columns in DataFrame. [1]_
+* Perform ``SELECT * FROM table LIMIT 0`` query.
+* Take only columns present in DataFrame (by name, case insensitive). For each found column get Oracle type.
+* **Find corresponding** ``Oracle type (read)`` -> ``Spark type`` **combination** (see below) for each DataFrame column. If no combination is found, raise exception. [2]_
+* Find corresponding ``Spark type`` -> ``Oracle type (write)`` combination (see below) for each DataFrame column. If no combination is found, raise exception. 
+* If ``Oracle type (write)`` match ``Oracle type (read)``, no additional casts will be performed, DataFrame column will be written to Oracle as is.
+* If ``Oracle type (write)`` does not match ``Oracle type (read)``, DataFrame column will be casted to target column type **on Oracle side**.
+  For example, you can write column with text data to ``int`` column, if column contains valid integer values within supported value range and precision.
+
+.. [1]
+    This allows writing data to tables with ``DEFAULT`` and ``GENERATED`` columns - if DataFrame has no such column,
+    it will be populated by Oracle.
+
+.. [2]
+
+    Yes, this is weird.
+
+Create new table using Spark
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. warning::
+
+    ABSOLUTELY NOT RECOMMENDED!
+
+This is how Oracle connector performs this:
+
+* Find corresponding ``Spark type`` -> ``Oracle type (create)`` combination (see below) for each DataFrame column. If no combination is found, raise exception.
+* Generate DDL for creating table in Oracle, like ``CREATE TABLE (col1 ...)``, and run it.
+* Write DataFrame to created table as is.
+
+But Oracle connector supports only a limited number of types and almost no custom clauses (like ``PARTITION BY``, ``INDEX``, etc).
+So instead of relying on Spark to create tables:
+
+.. dropdown:: See example
+
+    .. code:: python
+
+        writer = DBWriter(
+            connection=oracle,
+            table="username.table",
+            options=Oracle.WriteOptions(if_exists="append"),
+        )
+        writer.run(df)
+
+Always prefer creating table with desired DDL **BEFORE WRITING DATA**:
+
+.. dropdown:: See example
+
+    .. code:: python
+
+        oracle.execute(
+            """
+            CREATE TABLE username.table (
+                id NUMBER,
+                business_dt TIMESTAMP(6),
+                value VARCHAR2(2000)
+            )
+            """,
+        )
+
+        writer = DBWriter(
+            connection=oracle,
+            table="username.table",
+            options=Oracle.WriteOptions(if_exists="append"),
+        )
+        writer.run(df)
+
+See Oracle `CREATE TABLE `_ documentation.
+
+Supported types
+---------------
+
+References
+~~~~~~~~~~
+
+See `List of Oracle types `_. 
+ +Here you can find source code with type conversions: + +* `JDBC -> Spark `_ +* `Spark -> JDBC `_ + +Numeric types +~~~~~~~~~~~~~ + ++----------------------------------+-----------------------------------+-------------------------------+---------------------------+ +| Oracle type (read) | Spark type | Oracle type (write) | Oracle type (create) | ++==================================+===================================+===============================+===========================+ +| ``NUMBER`` | ``DecimalType(P=38, S=10)`` | ``NUMBER(P=38, S=10)`` | ``NUMBER(P=38, S=10)`` | ++----------------------------------+-----------------------------------+-------------------------------+---------------------------+ +| ``NUMBER(P=0..38)`` | ``DecimalType(P=0..38, S=0)`` | ``NUMBER(P=0..38, S=0)`` | ``NUMBER(P=38, S=0)`` | ++----------------------------------+-----------------------------------+-------------------------------+---------------------------+ +| ``NUMBER(P=0..38, S=0..38)`` | ``DecimalType(P=0..38, S=0..38)`` | ``NUMBER(P=0..38, S=0..38)`` | ``NUMBER(P=38, S=0..38)`` | ++----------------------------------+-----------------------------------+-------------------------------+---------------------------+ +| ``NUMBER(P=..., S=-127..-1)`` | unsupported [3]_ | | | ++----------------------------------+-----------------------------------+-------------------------------+---------------------------+ +| ``FLOAT`` | ``DecimalType(P=38, S=10)`` | ``NUMBER(P=38, S=10)`` | ``NUMBER(P=38, S=10)`` | ++----------------------------------+ | | | +| ``FLOAT(N=1..126)`` | | | | ++----------------------------------+ | | | +| ``REAL`` | | | | ++----------------------------------+ | | | +| ``DOUBLE PRECISION`` | | | | ++----------------------------------+-----------------------------------+-------------------------------+---------------------------+ +| ``BINARY_FLOAT`` | ``FloatType()`` | ``NUMBER(P=19, S=4)`` | ``NUMBER(P=19, S=4)`` | ++----------------------------------+-----------------------------------+ | | +| ``BINARY_DOUBLE`` | ``DoubleType()`` | | | ++----------------------------------+-----------------------------------+-------------------------------+---------------------------+ +| ``SMALLINT`` | ``DecimalType(P=38, S=0)`` | ``NUMBER(P=38, S=0)`` | ``NUMBER(P=38, S=0)`` | ++----------------------------------+ | | | +| ``INTEGER`` | | | | ++----------------------------------+-----------------------------------+-------------------------------+---------------------------+ +| ``LONG`` | ``StringType()`` | ``CLOB`` | ``CLOB`` | ++----------------------------------+-----------------------------------+-------------------------------+---------------------------+ + +.. [3] + + Oracle support decimal types with negative scale, like ``NUMBER(38, -10)``. Spark doesn't. 
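+
+If a table does contain a column with negative scale, one possible workaround (a sketch only; ``amount_rounded`` and ``schema.table`` below are hypothetical names,
+and values must fit into the target precision) is to cast such column to a supported precision and scale on the Oracle side,
+using the same ``DBReader(columns=...)`` approach described in `Explicit type cast`_ below:
+
+.. code-block:: python
+
+    from onetl.connection import Oracle
+    from onetl.db import DBReader
+
+    oracle = Oracle(...)
+
+    reader = DBReader(
+        connection=oracle,
+        source="schema.table",
+        columns=[
+            "id",
+            # hypothetical column declared as NUMBER(38, -10),
+            # casted to a scale Spark can represent
+            "CAST(amount_rounded AS NUMBER(38, 0)) amount_rounded",
+        ],
+    )
+    df = reader.run()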
+ +Temporal types +~~~~~~~~~~~~~~ + ++--------------------------------------------+------------------------------------+---------------------------------+---------------------------------+ +| Oracle type (read) | Spark type | Oracle type (write) | Oracle type (create) | ++============================================+====================================+=================================+=================================+ +| ``DATE``, days | ``TimestampType()``, microseconds | ``TIMESTAMP(6)``, microseconds | ``TIMESTAMP(6)``, microseconds | ++--------------------------------------------+------------------------------------+---------------------------------+---------------------------------+ +| ``TIMESTAMP``, microseconds | ``TimestampType()``, microseconds | ``TIMESTAMP(6)``, microseconds | ``TIMESTAMP(6)``, microseconds | ++--------------------------------------------+ | | | +| ``TIMESTAMP(0)``, seconds | | | | ++--------------------------------------------+ | | | +| ``TIMESTAMP(3)``, milliseconds | | | | ++--------------------------------------------+ | | | +| ``TIMESTAMP(6)``, microseconds | | | | ++--------------------------------------------+------------------------------------+---------------------------------+---------------------------------+ +| ``TIMESTAMP(9)``, nanoseconds | ``TimestampType()``, microseconds, | ``TIMESTAMP(6)``, microseconds, | ``TIMESTAMP(6)``, microseconds, | +| | **precision loss** [4]_ | **precision loss** | **precision loss** | ++--------------------------------------------+------------------------------------+---------------------------------+---------------------------------+ +| ``TIMESTAMP WITH TIME ZONE`` | unsupported | | | ++--------------------------------------------+ | | | +| ``TIMESTAMP(N=0..9) WITH TIME ZONE`` | | | | ++--------------------------------------------+ | | | +| ``TIMESTAMP WITH LOCAL TIME ZONE`` | | | | ++--------------------------------------------+ | | | +| ``TIMESTAMP(N=0..9) WITH LOCAL TIME ZONE`` | | | | ++--------------------------------------------+ | | | +| ``INTERVAL YEAR TO MONTH`` | | | | ++--------------------------------------------+ | | | +| ``INTERVAL DAY TO SECOND`` | | | | ++--------------------------------------------+------------------------------------+---------------------------------+---------------------------------+ + +.. [4] + Oracle support timestamp up to nanoseconds precision (``23:59:59.999999999``), + but Spark ``TimestampType()`` supports datetime up to microseconds precision (``23:59:59.999999``). + Nanoseconds will be lost during read or write operations. 
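+
+Columns of unsupported temporal types, like ``TIMESTAMP WITH TIME ZONE``, can still be read if they are explicitly converted on the Oracle side.
+A minimal sketch (the column name and the target time zone are assumptions to adjust for your data):
+
+.. code-block:: python
+
+    from onetl.connection import Oracle
+    from onetl.db import DBReader
+
+    oracle = Oracle(...)
+
+    reader = DBReader(
+        connection=oracle,
+        source="schema.table",
+        columns=[
+            "id",
+            # hypothetical TIMESTAMP WITH TIME ZONE column,
+            # normalized to UTC and converted to a plain TIMESTAMP
+            "CAST((created_at AT TIME ZONE 'UTC') AS TIMESTAMP) created_at_utc",
+        ],
+    )
+    df = reader.run()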
+ +String types +~~~~~~~~~~~~ + ++-----------------------------+------------------+---------------------+----------------------+ +| Oracle type (read) | Spark type | Oracle type (write) | Oracle type (create) | ++=============================+==================+=====================+======================+ +| ``CHAR`` | ``StringType()`` | ``CLOB`` | ``CLOB`` | ++-----------------------------+ | | | +| ``CHAR(N CHAR)`` | | | | ++-----------------------------+ | | | +| ``CHAR(N BYTE)`` | | | | ++-----------------------------+ | | | +| ``NCHAR`` | | | | ++-----------------------------+ | | | +| ``NCHAR(N)`` | | | | ++-----------------------------+ | | | +| ``VARCHAR(N)`` | | | | ++-----------------------------+ | | | +| ``LONG VARCHAR`` | | | | ++-----------------------------+ | | | +| ``VARCHAR2(N CHAR)`` | | | | ++-----------------------------+ | | | +| ``VARCHAR2(N BYTE)`` | | | | ++-----------------------------+ | | | +| ``NVARCHAR2(N)`` | | | | ++-----------------------------+ | | | +| ``CLOB`` | | | | ++-----------------------------+ | | | +| ``NCLOB`` | | | | ++-----------------------------+------------------+---------------------+----------------------+ + +Binary types +~~~~~~~~~~~~ + ++--------------------------+------------------+---------------------+----------------------+ +| Oracle type (read) | Spark type | Oracle type (write) | Oracle type (create) | ++==========================+==================+=====================+======================+ +| ``RAW(N)`` | ``BinaryType()`` | ``BLOB`` | ``BLOB`` | ++--------------------------+ | | | +| ``LONG RAW`` | | | | ++--------------------------+ | | | +| ``BLOB`` | | | | ++--------------------------+------------------+---------------------+----------------------+ +| ``BFILE`` | unsupported | | | ++--------------------------+------------------+---------------------+----------------------+ + +Struct types +~~~~~~~~~~~~ + ++-------------------------------------+------------------+---------------------+----------------------+ +| Oracle type (read) | Spark type | Oracle type (write) | Oracle type (create) | ++=====================================+==================+=====================+======================+ +| ``XMLType`` | ``StringType()`` | ``CLOB`` | ``CLOB`` | ++-------------------------------------+ | | | +| ``URIType`` | | | | ++-------------------------------------+ | | | +| ``DBURIType`` | | | | ++-------------------------------------+ | | | +| ``XDBURIType`` | | | | ++-------------------------------------+ | | | +| ``HTTPURIType`` | | | | ++-------------------------------------+ | | | +| ``CREATE TYPE ... AS OBJECT (...)`` | | | | ++-------------------------------------+------------------+---------------------+----------------------+ +| ``JSON`` | unsupported | | | ++-------------------------------------+ | | | +| ``CREATE TYPE ... AS VARRAY ...`` | | | | ++-------------------------------------+ | | | +| ``CREATE TYPE ... 
AS TABLE OF ...`` | | | | ++-------------------------------------+------------------+---------------------+----------------------+ + +Special types +~~~~~~~~~~~~~ + ++--------------------+-------------------+---------------------+----------------------+ +| Oracle type (read) | Spark type | Oracle type (write) | Oracle type (create) | ++====================+===================+=====================+======================+ +| ``BOOLEAN`` | ``BooleanType()`` | ``BOOLEAN`` | ``NUMBER(P=1, S=0)`` | ++--------------------+-------------------+---------------------+----------------------+ +| ``ROWID`` | ``StringType()`` | ``CLOB`` | ``CLOB`` | ++--------------------+ | | | +| ``UROWID`` | | | | ++--------------------+ | | | +| ``UROWID(N)`` | | | | ++--------------------+-------------------+---------------------+----------------------+ +| ``ANYTYPE`` | unsupported | | | ++--------------------+ | | | +| ``ANYDATA`` | | | | ++--------------------+ | | | +| ``ANYDATASET`` | | | | ++--------------------+-------------------+---------------------+----------------------+ + +Explicit type cast +------------------ + +``DBReader`` +~~~~~~~~~~~~ + +It is possible to explicitly cast column of unsupported type using ``DBReader(columns=...)`` syntax. + +For example, you can use ``CAST(column AS CLOB)`` to convert data to string representation on Oracle side, and so it will be read as Spark's ``StringType()``. + +It is also possible to use `JSON_ARRAY `_ +or `JSON_OBJECT `_ Oracle functions +to convert column of any type to string representation, and then parse this column on Spark side using +`from_json `_: + +.. code-block:: python + + from pyspark.sql.types import IntegerType + + from onetl.connection import Oracle + from onetl.db import DBReader + + oracle = Oracle(...) + + DBReader( + connection=oracle, + columns=[ + "id", + "supported_column", + "CAST(unsupported_column AS VARCHAR2(4000)) unsupported_column_str", + # or + "JSON_ARRAY(array_column) array_column_json", + ], + ) + df = reader.run() + + # Spark requires all columns to have some type, describe it + column_type = IntegerType() + + # cast column content to proper Spark type + df = df.select( + df.id, + df.supported_column, + # explicit cast + df.unsupported_column_str.cast("integer").alias("parsed_integer"), + # or explicit json parsing + from_json(df.array_column_json, schema).alias("array_column"), + ) + +``DBWriter`` +~~~~~~~~~~~~ + +It is always possible to convert data on Spark side to string, and then write it to ``text`` column in Oracle table. + +For example, you can convert data using `to_json `_ function. + +.. code:: python + + from pyspark.sql.functions import to_json + + from onetl.connection import Oracle + from onetl.db import DBReader + + oracle = Oracle(...) + + oracle.execute( + """ + CREATE TABLE schema.target_table ( + id INTEGER, + supported_column TIMESTAMP, + array_column_json VARCHAR2(4000) -- any string type, actually + ) + """, + ) + + write_df = df.select( + df.id, + df.supported_column, + to_json(df.unsupported_column).alias("array_column_json"), + ) + + writer = DBWriter( + connection=oracle, + target="schema.target_table", + ) + writer.run(write_df) + +Then you can parse this column on Oracle side - for example, by creating a view: + +.. code-block:: sql + + SELECT + id, + supported_column, + JSON_VALUE(array_column_json, '$[0]' RETURNING NUMBER) AS array_item_0 + FROM + schema.target_table + +Or by using `VIRTUAL column `_: + +.. 
code-block:: sql
+
+    CREATE TABLE schema.target_table (
+        id INTEGER,
+        supported_column TIMESTAMP,
+        array_column_json VARCHAR2(4000), -- any string type, actually
+        array_item_0 GENERATED ALWAYS AS (JSON_VALUE(array_column_json, '$[0]' RETURNING NUMBER)) VIRTUAL
+    )
+
+But data will be parsed on each table read in any case, as Oracle does not support ``GENERATED ALWAYS AS (...) STORED`` columns.
diff --git a/docs/connection/db_connection/oracle/write.rst b/docs/connection/db_connection/oracle/write.rst
index 78c57d915..0f21ec2ac 100644
--- a/docs/connection/db_connection/oracle/write.rst
+++ b/docs/connection/db_connection/oracle/write.rst
@@ -1,9 +1,47 @@
 .. _oracle-write:
 
-Writing to Oracle
-=================
+Writing to Oracle using ``DBWriter``
+====================================
 
-For writing data to Oracle, use :obj:`DBWriter ` with options below.
+For writing data to Oracle, use :obj:`DBWriter `.
+
+.. warning::
+
+    Please take into account :ref:`oracle-types`
+
+.. warning::
+
+    It is always recommended to create table explicitly using :obj:`Oracle.execute `
+    instead of relying on Spark's table DDL generation.
+
+    This is because Spark's DDL generator can create columns with different precision and types than expected,
+    causing precision loss or other issues.
+
+Examples
+--------
+
+.. code-block:: python
+
+    from onetl.connection import Oracle
+    from onetl.db import DBWriter
+
+    oracle = Oracle(...)
+
+    df = ...  # data is here
+
+    writer = DBWriter(
+        connection=oracle,
+        target="schema.table",
+        options=Oracle.WriteOptions(if_exists="append"),
+    )
+
+    writer.run(df)
+
+
+Write options
+-------------
+
+Method above accepts :obj:`JDBCWriteOptions `
 
 .. currentmodule:: onetl.connection.db_connection.jdbc_connection.options
diff --git a/docs/connection/db_connection/postgres/execute.rst b/docs/connection/db_connection/postgres/execute.rst
index ea9bd5048..ac9f553a5 100644
--- a/docs/connection/db_connection/postgres/execute.rst
+++ b/docs/connection/db_connection/postgres/execute.rst
@@ -1,7 +1,7 @@
 .. _postgres-execute:
 
 Executing statements in Postgres
-==================================
+================================
 
 How to
 ------
@@ -9,7 +9,7 @@ How to
 There are 2 ways to execute some statement in Postgres
 
 Use :obj:`Postgres.fetch `
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Use this method to execute some ``SELECT`` query which returns **small number or rows**, like reading
 Postgres config, or reading data from some reference table.
@@ -23,12 +23,9 @@ Syntax support
 
 This method supports **any** query syntax supported by Postgres, like:
 
-* ``SELECT ... FROM ...``
-* ``WITH alias AS (...) SELECT ...``
-
-Queries like ``SHOW ...`` are not supported.
-
-It does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.
+* ✅︎ ``SELECT ... FROM ...``
+* ✅︎ ``WITH alias AS (...) SELECT ...``
+* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported
 
 Examples
 ^^^^^^^^
@@ -60,15 +57,14 @@ Syntax support
 
 This method supports **any** query syntax supported by Postgres, like:
 
-* ``CREATE TABLE ...``, ``CREATE VIEW ...``
-* ``ALTER ...``
-* ``INSERT INTO ... 
AS SELECT ...`` -* ``DROP TABLE ...``, ``DROP VIEW ...``, and so on -* ``CALL procedure(arg1, arg2) ...`` -* ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions -* etc - -It does not support multiple queries in the same operation, like ``SET ...; CREATE TABLE ...;``. +* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on +* ✅︎ ``ALTER ...`` +* ✅︎ ``INSERT INTO ... AS SELECT ...`` +* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on +* ✅︎ ``CALL procedure(arg1, arg2) ...`` +* ✅︎ ``SELECT func(arg1, arg2)`` or ``{call func(arg1, arg2)}`` - special syntax for calling functions +* ✅︎ other statements not mentioned here +* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported Examples ^^^^^^^^ @@ -84,7 +80,7 @@ Examples postgres.execute( """ CREATE TABLE schema.table AS ( - id biging ALWAYS GENERATED AS IDENTITY, + id bigint GENERATED ALWAYS AS IDENTITY, key text, value real ) diff --git a/docs/connection/db_connection/postgres/prerequisites.rst b/docs/connection/db_connection/postgres/prerequisites.rst index 1921a3830..509b54bc0 100644 --- a/docs/connection/db_connection/postgres/prerequisites.rst +++ b/docs/connection/db_connection/postgres/prerequisites.rst @@ -23,12 +23,6 @@ See :ref:`install-spark` installation instruction for more details. Connecting to Postgres ----------------------- -Connection port -~~~~~~~~~~~~~~~ - -Connection is usuallu performed to port 5432. Port may differ for different Postgres instances. -Please ask your Postgres administrator to provide required information. - Allowing connection to Postgres instance ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -37,8 +31,16 @@ e.g. by updating ``pg_hba.conf`` file. See `official documentation `_. -Postgres cluster interaction -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Connection port +~~~~~~~~~~~~~~~ + +Connection is usually performed to port 5432. Port may differ for different Postgres instances. +Please ask your Postgres administrator to provide required information. + +Connection host +~~~~~~~~~~~~~~~ + +It is possible to connect to Postgres by using either DNS name of host or it's IP address. If you're using Postgres cluster, it is currently possible to connect only to **one specific node**. Connecting to multiple nodes to perform load balancing, as well as automatic failover to new master/replica are not supported. @@ -57,7 +59,10 @@ used for creating a connection: GRANT USAGE, CREATE ON SCHEMA myschema TO username; -- allow read & write access to specific table - GRANT SELECT, INSERT, TRUNCATE ON myschema.mytable TO username; + GRANT SELECT, INSERT ON myschema.mytable TO username; + + -- only if if_exists="replace_entire_table" is used: + GRANT TRUNCATE ON myschema.mytable TO username; .. code-tab:: sql Read only diff --git a/docs/connection/db_connection/postgres/read.rst b/docs/connection/db_connection/postgres/read.rst index fc4a8e3b3..0733f8e08 100644 --- a/docs/connection/db_connection/postgres/read.rst +++ b/docs/connection/db_connection/postgres/read.rst @@ -1,7 +1,7 @@ .. _postgres-read: Reading from Postgres using ``DBReader`` -========================================== +======================================== .. warning:: diff --git a/docs/connection/db_connection/postgres/sql.rst b/docs/connection/db_connection/postgres/sql.rst index ae6d5602c..1430381a1 100644 --- a/docs/connection/db_connection/postgres/sql.rst +++ b/docs/connection/db_connection/postgres/sql.rst @@ -1,7 +1,7 @@ .. 
_postgres-sql: Reading from Postgres using ``Postgres.sql`` -================================================ +============================================ .. warning:: @@ -18,7 +18,7 @@ Syntax support Only queries with the following syntax are supported: * ``SELECT ...`` -* ``WITH ... SELECT ...`` +* ``WITH alias AS (...) SELECT ...`` Queries like ``SHOW ...`` are not supported. diff --git a/docs/connection/db_connection/postgres/types.rst b/docs/connection/db_connection/postgres/types.rst index b149b64bd..224d4ee4d 100644 --- a/docs/connection/db_connection/postgres/types.rst +++ b/docs/connection/db_connection/postgres/types.rst @@ -9,7 +9,7 @@ Type detection & casting Spark's DataFrames always have a ``schema`` which is a list of columns with corresponding Spark types. All operations on a column are performed using column type. Reading from Postgres -~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~ This is how Postgres connector performs this: @@ -20,8 +20,8 @@ This is how Postgres connector performs this: .. [1] All Postgres types that doesn't have corresponding Java type are converted to ``String``. -Writing to some existing Clickhuse table -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Writing to some existing Postgres table +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is how Postgres connector performs this: @@ -319,17 +319,18 @@ Geo types +----------------------+-----------------------+-----------------------+-------------------------+ Explicit type cast -------------------- +------------------ ``DBReader`` -~~~~~~~~~~~ +~~~~~~~~~~~~ It is possible to explicitly cast column of unsupported type using ``DBReader(columns=...)`` syntax. For example, you can use ``CAST(column AS text)`` to convert data to string representation on Postgres side, and so it will be read as Spark's ``StringType()``. -It is also possible to use ``to_json`` Postgres function for convert column of any type to string representation, -and then parse this column on Spark side using `from_json `_: +It is also possible to use `to_json `_ Postgres function +to convert column of any type to string representation, and then parse this column on Spark side using +`from_json `_: .. code-block:: python @@ -348,7 +349,7 @@ and then parse this column on Spark side using `from_json 'field' + array_column_json->'0' AS array_item_0 FROM schema.target_table diff --git a/docs/connection/db_connection/postgres/write.rst b/docs/connection/db_connection/postgres/write.rst index 1f5f2ed5c..ad0c42f68 100644 --- a/docs/connection/db_connection/postgres/write.rst +++ b/docs/connection/db_connection/postgres/write.rst @@ -1,7 +1,7 @@ .. _postgres-write: Writing to Postgres using ``DBWriter`` -======================================== +====================================== For writing data to Postgres, use :obj:`DBWriter `. diff --git a/onetl/connection/db_connection/oracle/connection.py b/onetl/connection/db_connection/oracle/connection.py index 7008d09b5..f6b1ad2d9 100644 --- a/onetl/connection/db_connection/oracle/connection.py +++ b/onetl/connection/db_connection/oracle/connection.py @@ -73,29 +73,9 @@ class Oracle(JDBCConnection): Based on Maven package ``com.oracle.database.jdbc:ojdbc8:23.2.0.0`` (`official Oracle JDBC driver `_). - .. dropdown:: Version compatibility - - * Oracle Server versions: 23, 21, 19, 18, 12.2 and probably 11.2 (tested, but that's not official). - * Spark versions: 2.3.x - 3.5.x - * Java versions: 8 - 20 - - See `official documentation `_. - .. 
warning:: - To use Oracle connector you should have PySpark installed (or injected to ``sys.path``) - BEFORE creating the connector instance. - - You can install PySpark as follows: - - .. code:: bash - - pip install onetl[spark] # latest PySpark version - - # or - pip install onetl pyspark=3.5.0 # pass specific PySpark version - - See :ref:`install-spark` installation instruction for more details. + Before using this connector please take into account :ref:`oracle-prerequisites` Parameters ---------- @@ -116,16 +96,16 @@ class Oracle(JDBCConnection): .. warning :: - Be careful, to correct work you must provide ``sid`` or ``service_name`` + You should provide either ``sid`` or ``service_name``, not both of them service_name : str, default: ``None`` Specifies one or more names by which clients can connect to the instance. - For example: ``MYDATA``. + For example: ``PDB1``. .. warning :: - Be careful, for correct work you must provide ``sid`` or ``service_name`` + You should provide either ``sid`` or ``service_name``, not both of them spark : :obj:`pyspark.sql.SparkSession` Spark session. @@ -133,24 +113,22 @@ class Oracle(JDBCConnection): extra : dict, default: ``None`` Specifies one or more extra parameters by which clients can connect to the instance. - For example: ``{"defaultBatchValue": 100}`` + For example: ``{"remarksReporting": "false"}`` - See `Oracle JDBC driver properties documentation - `_ - for more details + See official documentation: + * `Connection parameters `_ + * `Connection properties `_ Examples -------- - Oracle connection initialization + Connect to Oracle using ``sid``: .. code:: python from onetl.connection import Oracle from pyspark.sql import SparkSession - extra = {"defaultBatchValue": 100} - # Create Spark session with Oracle driver loaded maven_packages = Oracle.get_packages() spark = ( @@ -165,7 +143,22 @@ class Oracle(JDBCConnection): user="user", password="*****", sid="XE", - extra=extra, + extra={"remarksReporting": "false"}, + spark=spark, + ) + + Using ``service_name``: + + .. code:: python + + ... + + oracle = Oracle( + host="database.host.or.ip", + user="user", + password="*****", + service_name="PDB1", + extra={"remarksReporting": "false"}, spark=spark, )