[DOP-13252] Improve MySQL documentation
dolfinus committed Mar 12, 2024
1 parent f306a97 commit b5a577a
Showing 9 changed files with 716 additions and 35 deletions.
5 changes: 5 additions & 0 deletions docs/changelog/next_release/234.improvement.rst
@@ -0,0 +1,5 @@
Improve MySQL documentation:
* Add "Types" section describing mapping between MySQL and Spark types
* Add "Prerequisites" section describing different aspects of connecting to MySQL
* Separate documentation of ``DBReader`` and ``MySQL.sql``
* Add examples for ``MySQL.fetch`` and ``MySQL.execute``
92 changes: 91 additions & 1 deletion docs/connection/db_connection/mysql/execute.rst
@@ -1,7 +1,97 @@
.. _mysql-execute:

Executing statements in MySQL
-=============================
==================================

How to
------

There are 2 ways to execute a statement in MySQL:

Use :obj:`MySQL.fetch <onetl.connection.db_connection.mysql.connection.MySQL.fetch>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute a ``SELECT`` query which returns a **small number of rows**, like reading
MySQL config, or reading data from some reference table.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

A connection opened by this method should then be closed with :obj:`MySQL.close <onetl.connection.db_connection.mysql.connection.MySQL.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by MySQL, like:

* ✅︎ ``SELECT ... FROM ...``
* ✅︎ ``WITH alias AS (...) SELECT ...``
* ✅︎ ``SELECT func(arg1, arg2)`` or ``{?= call func(arg1, arg2)}`` - special syntax for calling function
* ✅︎ ``SHOW ...``
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import MySQL

    mysql = MySQL(...)

    df = mysql.fetch(
        "SELECT value FROM some.reference_table WHERE key = 'some_constant'",
        options=MySQL.JDBCOptions(query_timeout=10),
    )
    mysql.close()
    value = df.collect()[0][0]  # get value from first row and first column

Use :obj:`MySQL.execute <onetl.connection.db_connection.mysql.connection.MySQL.execute>`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use this method to execute DDL and DML operations. Each method call runs the operation in a separate transaction, and then commits it.

Method accepts :obj:`JDBCOptions <onetl.connection.db_connection.jdbc_mixin.options.JDBCOptions>`.

A connection opened by this method should then be closed with :obj:`MySQL.close <onetl.connection.db_connection.mysql.connection.MySQL.close>`.

Syntax support
^^^^^^^^^^^^^^

This method supports **any** query syntax supported by MySQL, like:

* ✅︎ ``CREATE TABLE ...``, ``CREATE VIEW ...``, and so on
* ✅︎ ``ALTER ...``
* ✅︎ ``INSERT INTO ... SELECT ...``
* ✅︎ ``DROP TABLE ...``, ``DROP VIEW ...``, and so on
* ✅︎ ``CALL procedure(arg1, arg2) ...`` or ``{call procedure(arg1, arg2)}`` - special syntax for calling procedure
* ✅︎ other statements not mentioned here
* ❌ ``SET ...; SELECT ...;`` - multiple statements not supported
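
Because several statements cannot be sent in one call, a script has to be split and each statement passed separately. The following is a minimal sketch of such a split; ``split_statements`` is a hypothetical helper, not part of onETL, and it deliberately ignores backslash escapes and SQL comments.

.. code-block:: python

    def split_statements(script):
        """Split a SQL script on semicolons that are outside quoted strings."""
        statements = []
        current = []
        quote = None  # currently open quote character, if any
        for char in script:
            if quote:
                # inside a quoted string or identifier: copy characters verbatim
                current.append(char)
                if char == quote:
                    quote = None
            elif char in ("'", '"', "`"):
                quote = char
                current.append(char)
            elif char == ";":
                statements.append("".join(current).strip())
                current = []
            else:
                current.append(char)
        statements.append("".join(current).strip())
        # drop empty fragments, e.g. after a trailing semicolon
        return [statement for statement in statements if statement]


    script = "SET @threshold = 10; SELECT ';' AS separator;"
    for statement in split_statements(script):
        print(statement)

Each returned string could then be passed to its own ``MySQL.execute`` or ``MySQL.fetch`` call. Whether session-level settings carry over between calls is not covered by this sketch.
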

Examples
^^^^^^^^

.. code-block:: python

    from onetl.connection import MySQL

    mysql = MySQL(...)

    with mysql:
        mysql.execute("DROP TABLE schema.table")
        mysql.execute(
            """
            CREATE TABLE schema.table (
                id bigint,
                key text,
                value float
            )
            ENGINE = InnoDB
            """,
            options=MySQL.JDBCOptions(query_timeout=10),
        )

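
The ``with mysql:`` form in the example above relies on Python's context-manager protocol. As a rough illustration of why that guarantees cleanup, here is a sketch with a hypothetical ``FakeConnection`` class (a stand-in, not the onETL implementation) that closes itself even when a statement fails.

.. code-block:: python

    class FakeConnection:
        """Hypothetical stand-in for a connection object; not part of onETL."""

        def __init__(self):
            self.closed = False
            self.executed = []

        def execute(self, statement):
            if self.closed:
                raise RuntimeError("connection is closed")
            self.executed.append(statement)

        def close(self):
            self.closed = True

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            # always release the connection, even if a statement raised
            self.close()
            return False  # do not swallow exceptions


    connection = FakeConnection()
    try:
        with connection:
            connection.execute("DROP TABLE schema.table")
            raise ValueError("simulated failure")
    except ValueError:
        pass

    print(connection.closed)

Without the ``with`` block, the same guarantee needs an explicit ``try`` / ``finally`` around the calls, with ``close()`` in the ``finally`` branch.
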
References
----------

.. currentmodule:: onetl.connection.db_connection.mysql.connection

8 changes: 8 additions & 0 deletions docs/connection/db_connection/mysql/index.rst
@@ -7,12 +7,20 @@ MySQL
    :maxdepth: 1
    :caption: Connection

    prerequisites
    connection

.. toctree::
    :maxdepth: 1
    :caption: Operations

    read
    sql
    write
    execute

.. toctree::
    :maxdepth: 1
    :caption: Troubleshooting

    types
64 changes: 64 additions & 0 deletions docs/connection/db_connection/mysql/prerequisites.rst
@@ -0,0 +1,64 @@
.. _mysql-prerequisites:

Prerequisites
=============

Version Compatibility
---------------------

* MySQL server versions: 5.7, 8.0
* Spark versions: 2.3.x - 3.5.x
* Java versions: 8 - 20

See `official documentation <https://dev.mysql.com/doc/relnotes/connector-j/en/news-8-0-33.html>`_.

Installing PySpark
------------------

To use the MySQL connector, PySpark must be installed (or injected into ``sys.path``)
BEFORE creating the connector instance.

See the :ref:`install-spark` installation instructions for more details.

Connecting to MySQL
-----------------------

Connection host
~~~~~~~~~~~~~~~

It is possible to connect to MySQL by using either the DNS name of the host or its IP address.

If you're using a MySQL cluster, it is currently possible to connect only to **one specific node**.
Connecting to multiple nodes for load balancing and automatic failover to a new master/replica are not supported.

Connection port
~~~~~~~~~~~~~~~

MySQL usually listens on port 3306, but the port may differ between MySQL instances.
Please ask your MySQL administrator to provide the required information.

Required grants
~~~~~~~~~~~~~~~

Ask your MySQL cluster administrator to set the following grants for the user
used to create a connection:

.. tabs::

    .. code-tab:: sql Read + Write

        -- allow creating tables in the same schema as the target table
        GRANT CREATE ON myschema.* TO username@'192.168.1.%';

        -- allow read & write access to specific table
        GRANT SELECT, INSERT ON myschema.mytable TO username@'192.168.1.%';

    .. code-tab:: sql Read only

        -- allow read access to specific table
        GRANT SELECT ON myschema.mytable TO username@'192.168.1.%';

In the example above, ``'192.168.1.%'`` is the network subnet ``192.168.1.0 - 192.168.1.255``
where the Spark driver and executors are running. To allow the user to connect from any IP, use ``'%'`` (not secure!).

More details can be found in `official documentation <https://dev.mysql.com/doc/refman/en/grant.html>`_.
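
To check which hosts a pattern like ``'192.168.1.%'`` covers, the equivalent ``192.168.1.0/24`` network can be inspected with Python's standard ``ipaddress`` module. This is only an illustration of the address range; ``is_allowed`` is a hypothetical helper, and MySQL itself matches the host pattern, not this code.

.. code-block:: python

    import ipaddress

    # 192.168.1.% covers the same addresses as the 192.168.1.0/24 network
    subnet = ipaddress.ip_network("192.168.1.0/24")


    def is_allowed(host_ip):
        """Return True if the address falls inside the granted subnet."""
        return ipaddress.ip_address(host_ip) in subnet


    print(subnet[0], subnet[-1])       # first and last address of the range
    print(is_allowed("192.168.1.42"))  # a host inside the subnet
    print(is_allowed("192.168.2.42"))  # a host outside the subnet

Every Spark driver and executor address has to pass such a check, otherwise connections from that host are rejected.
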
74 changes: 66 additions & 8 deletions docs/connection/db_connection/mysql/read.rst
@@ -1,18 +1,76 @@
.. _mysql-read:

-Reading from MySQL
-==================
Reading from MySQL using ``DBReader``
=====================================

-There are 2 ways of distributed data reading from MySQL:
.. warning::

-* Using :obj:`DBReader <onetl.db.db_reader.db_reader.DBReader>` with different :ref:`strategy`
-* Using :obj:`MySQL.sql <onetl.connection.db_connection.mysql.connection.MySQL.sql>`
    Please take into account :ref:`mysql-types`

-Both methods accept :obj:`JDBCReadOptions <onetl.connection.db_connection.jdbc.options.JDBCReadOptions>`
:obj:`DBReader <onetl.db.db_reader.db_reader.DBReader>` supports :ref:`strategy` for incremental data reading,
but does not support custom queries, like JOINs.

-.. currentmodule:: onetl.connection.db_connection.mysql.connection
Supported DBReader features
---------------------------

-.. automethod:: MySQL.sql
* ✅︎ ``columns``
* ✅︎ ``where``
* ✅︎ ``hwm``, supported strategies:
* * ✅︎ :ref:`snapshot-strategy`
* * ✅︎ :ref:`incremental-strategy`
* * ✅︎ :ref:`snapshot-batch-strategy`
* * ✅︎ :ref:`incremental-batch-strategy`
* ✅︎ ``hint`` (see `official documentation <https://dev.mysql.com/doc/refman/en/optimizer-hints.html>`_)
* ❌ ``df_schema``
* ✅︎ ``options`` (see :obj:`JDBCReadOptions <onetl.connection.db_connection.jdbc.options.JDBCReadOptions>`)

Examples
--------

Snapshot strategy:

.. code-block:: python

    from onetl.connection import MySQL
    from onetl.db import DBReader

    mysql = MySQL(...)

    reader = DBReader(
        connection=mysql,
        source="schema.table",
        # MySQL's CAST accepts CHAR as a target type, not TEXT
        columns=["id", "key", "CAST(value AS CHAR) value", "updated_dt"],
        where="key = 'something'",
        hint="SKIP_SCAN(schema.table key_index)",
        options=MySQL.ReadOptions(partition_column="id", num_partitions=10),
    )
    df = reader.run()

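
The ``partition_column`` and ``num_partitions`` options split one read into several range queries that run in parallel. The sketch below shows the general idea under assumed bounds of 0 and 1000; ``partition_predicates`` is a hypothetical helper, and the exact predicates generated by Spark may differ in detail.

.. code-block:: python

    def partition_predicates(column, lower_bound, upper_bound, num_partitions):
        """Build one WHERE predicate per partition, together covering all rows."""
        if num_partitions == 1:
            return ["1 = 1"]  # a single partition reads everything
        stride = max(1, (upper_bound - lower_bound) // num_partitions)
        predicates = []
        for index in range(num_partitions):
            start = lower_bound + index * stride
            end = start + stride
            if index == 0:
                # first partition also takes NULLs and values below the lower bound
                predicates.append(f"{column} < {end} OR {column} IS NULL")
            elif index == num_partitions - 1:
                # last partition is open-ended to include values above the upper bound
                predicates.append(f"{column} >= {start}")
            else:
                predicates.append(f"{column} >= {start} AND {column} < {end}")
        return predicates


    for predicate in partition_predicates("id", lower_bound=0, upper_bound=1000, num_partitions=4):
        print(predicate)

An evenly distributed numeric column such as an auto-increment ``id`` keeps the partitions roughly equal in size; a skewed column leaves some tasks with most of the rows.
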
Incremental strategy:

.. code-block:: python

    from onetl.connection import MySQL
    from onetl.db import DBReader
    from onetl.strategy import IncrementalStrategy

    mysql = MySQL(...)

    reader = DBReader(
        connection=mysql,
        source="schema.table",
        # MySQL's CAST accepts CHAR as a target type, not TEXT
        columns=["id", "key", "CAST(value AS CHAR) value", "updated_dt"],
        where="key = 'something'",
        hint="SKIP_SCAN(schema.table key_index)",
        hwm=DBReader.AutoDetectHWM(name="mysql_hwm", expression="updated_dt"),
        options=MySQL.ReadOptions(partition_column="id", num_partitions=10),
    )

    with IncrementalStrategy():
        df = reader.run()

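
Conceptually, an incremental read remembers the highest value of the HWM expression seen so far and only asks for newer rows on the next run. The following sketch illustrates that idea with plain strings; ``incremental_where`` is a hypothetical helper and the sample rows are made up, so this is not how onETL builds its queries internally.

.. code-block:: python

    def incremental_where(base_where, hwm_expression, previous_value):
        """Combine the user filter with a condition that skips already-read rows."""
        if previous_value is None:
            # first run: nothing was read yet, so only the user filter applies
            return base_where
        hwm_condition = f"{hwm_expression} > '{previous_value}'"
        return f"({base_where}) AND ({hwm_condition})"


    # made-up rows standing in for the result of the first run
    rows = [
        {"id": 1, "updated_dt": "2024-03-01"},
        {"id": 2, "updated_dt": "2024-03-10"},
    ]

    first_where = incremental_where("key = 'something'", "updated_dt", None)
    # after the first run, the stored HWM would be the maximum value read
    hwm_value = max(row["updated_dt"] for row in rows)
    second_where = incremental_where("key = 'something'", "updated_dt", hwm_value)

    print(first_where)
    print(second_where)

The HWM column should only ever grow (for example an update timestamp), otherwise rows written with an older value are skipped by later runs.
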
Read options
------------

.. currentmodule:: onetl.connection.db_connection.jdbc_connection.options

55 changes: 55 additions & 0 deletions docs/connection/db_connection/mysql/sql.rst
@@ -0,0 +1,55 @@
.. _mysql-sql:

Reading from MySQL using ``MySQL.sql``
======================================

.. warning::

    Please take into account :ref:`mysql-types`

:obj:`MySQL.sql <onetl.connection.db_connection.mysql.connection.MySQL.sql>` allows passing a custom SQL query,
but does not support incremental strategies.

The method also accepts :obj:`JDBCReadOptions <onetl.connection.db_connection.jdbc.options.JDBCReadOptions>`.

Syntax support
--------------

Only queries with the following syntax are supported:

* ``SELECT ...``
* ``WITH alias AS (...) SELECT ...``

Queries like ``SHOW ...`` are not supported.

This method also does not support multiple queries in the same operation, like ``SET ...; SELECT ...;``.
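
The rules above can be turned into a rough pre-check before sending a query. This is a minimal sketch; ``looks_supported`` is a hypothetical helper, not part of onETL, and it is intentionally naive: a semicolon inside a string literal is also treated as a statement separator.

.. code-block:: python

    import re

    SUPPORTED_PREFIX = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)


    def looks_supported(query):
        """Rough pre-check: a single statement that starts with SELECT or WITH."""
        body = query.strip().rstrip(";").strip()
        if ";" in body:
            # more than one statement (or a semicolon inside a literal)
            return False
        return bool(SUPPORTED_PREFIX.match(body))


    for query in (
        "SELECT id FROM some.mytable",
        "WITH t AS (SELECT 1) SELECT * FROM t",
        "SHOW TABLES",
        "SET x = 1; SELECT 1;",
    ):
        print(query, "->", looks_supported(query))

Queries rejected by such a check, like ``SHOW ...``, are candidates for ``MySQL.fetch`` instead.
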

Examples
--------

.. code-block:: python

    from onetl.connection import MySQL

    mysql = MySQL(...)
    df = mysql.sql(
        """
        SELECT
            id,
            key,
            CAST(value AS CHAR) value,
            updated_at
        FROM
            some.mytable
        WHERE
            key = 'something'
        """,
        options=MySQL.ReadOptions(partition_column="id", num_partitions=10),
    )

References
----------

.. currentmodule:: onetl.connection.db_connection.mysql.connection

.. automethod:: MySQL.sql