Functions Connection Data Types Testing
Uses the SQLAlchemy Inspector
class to:
- Extract metadata from database dialects supported by SQLAlchemy
- Convert the extracted output into a mojap
Metadata
object
Currently installs SQLAlchemy v1.4 but compatible with SQLAlchemy v2.0
-
convert_to_mojap_type() converts a SQLAlchemy data type into a mojap
Metadata
data type -
generate_to_meta() extracts the metadata for a given table and schema name and returns a
Metadata
object -
generate_to_meta_list() returns a list of
Metadata
objects for all the tables in a given schema sorted by table name
Inspector
comes with many functions to extract metadata such as get_schema_names()
.
These functions are not recreated in order to limit boiler plate code and maintenance. You can use the inpector
object which is instantiated by SQLAlchemyConverter
instead of creating your own one.
You will need to provide a SQLAlchemy database engine or connection for a given dialect and database when instantiating a SQLAlchemyConverter
object for example:
```
from sqlalchemy import create_engine
from mojap_metadata.converters.sqlalchemy_converter import SQLAlchemyConverter
engine = create_engine("postgresql+psycopg2://scott:tiger@localhost:5432/mydatabase")
sqlc = SQLAlchemyConverter(engine)
```
See Engine Configuration for more details and how to configure for the different database dialects.
"oracledb" has replaced "oracle_cx" drivers
To create an oracle SQLAlchemy.engine:
```
import sys
import oracledb
oracledb.version = "8.3.0"
sys.modules["cx_Oracle"] = oracledb
engine = create_engine("oracle://scott:tiger@127.0.0.1:1521/sidname")
```
SQLAlchemy converts specific dialects into a common type variant.
Therefore, SQLAlchemy has its own Type definitions sqlalchemy.sql.sqltypes
.
There are three categories:
- “CamelCase” datatypes are to the greatest degree possible database agnostic.
- “UPPERCASE” datatypes are always inherited from a particular “CamelCase” datatype, and always represent an exact datatype.
- Backend-specific “UPPERCASE” datatypes are either fully specific to those databases, or add additional arguments that are specific to those databases.
There is a class private method called _get_dtype
which infers the corresponding mojap data type by comparing with the SQlAlchemy instance type. Order is important: the mojap data type will correspond to the instance type that first gets matched.
This is approximate and the mapping might need to be modified as less familiar data types are encountered.
In the oracle documentation it conflates int, smallint, numeric, number and decimal. There some inter-operability that makes it confusing. https://www.oracletutorial.com/oracle-basics/oracle-number-data-type/
- NUMBER(precision=9, scale=0, asdecimal=False) - should return integer
- NUMBER(precision=10, scale=2, asdecimal=True) - should return decimal
SQLAlchemy v1.4 does appear to handle this. Where the 'asdecimal' flag triggers a SQLAlchemy type switch.
In the event the data-type received back from SQL-Alchemy is not found, the default return value from the private method _approx_dtype
is 'string'.
When converting the SQLAlchemy inherited metadata to mojap-metadata, we have made the choice to (by default) convert all letters to lower case. This is because Athena is case insensitive. However, many SQL databases use camel/pascal case and so the safest option to convert to is to snake case. For this we have created an option in the SQL converter called convert_to_snake
, which is False by default, but if set to true will convert camel and pascal case column names in the metadata to snake case.
The SQLAlchemyConverter
is tested against the following database dialects in test_sqlachemy_converter.py
:
- sqlite
- duckdb
- postgres
- oracle
Whilst all return a Metadata
objects with broadly the same features, there are differences. This is because whilst Inspector
provides a consistent interface, a feature may not be supported by the database or by the sqlalchemy dialect.
For example only the postgres dialect recognises the table comment. For more examples of differences have a look at the parameters passed in to test_generate_to_meta()
.
The sqlite and duckdb databases are in memory and can be tested directly.
The postgres and oracle dialects are tested in the [test-sqlalchemy.yml] GitHub action by creating service containers and specifying the relevant Docker image. It should be straightforward to add more service containers and test more dialects.
The oracle dialect is tested against the Oracle Database Express Edition Container.
To run the test locally first create a docker oracle container:
source tests/scripts/.sqlalchemy_envs.sh
sh tests/scripts/.oracle_image.sh
If you are using an M1 please follow these instructions.
The postgres dialect is tested against the Postgres Container.
To run the test locally first create a docker postgres container:
source tests/scripts/.sqlalchemy_envs.sh
sh tests/scripts/.postgres_image.sh