Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error get sample data in Oracle when columns in quotes #17660

Open
AlexeySmirnov74 opened this issue Sep 2, 2024 · 0 comments
Open

Error get sample data in Oracle when columns in quotes #17660

AlexeySmirnov74 opened this issue Sep 2, 2024 · 0 comments
Assignees
Labels
P0 Highest priority profiler

Comments

@AlexeySmirnov74
Copy link

Affected module
sqlalchemy

Describe the bug
Error get sample data in Table Oracle 19 when columns in quotes.

To Reproduce

  1. Create a table TESTTABLE with quotes in columns:
CREATE TABLE DQ."TESTTABLE" ("col1" NUMBER,"col2" NVARCHAR2(255) COLLATE "USING_NLS_COMP")
INSERT INTO DQ."TESTTABLE" ("col1","col2") VALUES (1,'testvalue')
  1. Create a table TESTTABLE2 without quotes in columns:
CREATE TABLE "DQ"."TESTTABLE2" (col1 NUMBER,col2 NVARCHAR2(255) COLLATE "USING_NLS_COMP") 
INSERT INTO DQ."TESTTABLE2" (col1,col2) VALUES (1,'testvalue')
  1. Go to Profiler and scan table ^TESTTABLE., ^TESTTABLE2.

image
image

Expected behavior

Shows sample data with column in quotes

Version:

  • OS: Ubuntu 22.04.4 LTS
  • Docker version 27.0.3
  • Python version: 3.10
  • OpenMetadata version: 1.4.7.0
  • OpenMetadata Ingestion package version: 1.4.7
  • Oracle: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0
  • scheme: oracle+cx_oracle
  • instantClientDirectory: /usr/lib/oracle/19.23
  • encoding: UTF-8

Additional context

1. Error Log in TESTTABLE

[2024-08-28T07:50:30.823+0000] {profiler_interface.py:458} ERROR - TESTTABLE.col1 metric_type.value: (oracledb.exceptions.DatabaseError) ORA-00904: "COL1": invalid identifier
[SQL: /* {"app": "OpenMetadata", "version": "1.4.7.0"} */
SELECT anon_1.mean, anon_1."valuesCount", anon_1."distinctCount", anon_1.min, anon_1.max, anon_1."nullCount", anon_1.stddev, anon_1.sum 
FROM (SELECT avg(col1) AS mean, count(col1) AS "valuesCount", count(DISTINCT col1) AS "distinctCount", MIN(col1) AS min, MAX(col1) AS max, SUM(CAST(CASE WHEN (col1 IS NULL) THEN :param_1 ELSE :param_2 END AS NUMBER)) AS "nullCount", STDDEV_POP(col1) AS stddev, SUM(CAST(col1 AS NUMBER)) AS sum 
FROM dq."TESTTABLE") anon_1 
WHERE ROWNUM <= 1]
[parameters: {'param_1': 1, 'param_2': 0}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
[2024-08-28T07:50:30.828+0000] {profiler_interface.py:458} ERROR - TESTTABLE.col2 metric_type.value: (oracledb.exceptions.DatabaseError) ORA-00904: "COL2": invalid identifier
[SQL: /* {"app": "OpenMetadata", "version": "1.4.7.0"} */
SELECT anon_1.mean, anon_1."valuesCount", anon_1."distinctCount", anon_1.min, anon_1.max, anon_1."nullCount", anon_1.stddev, anon_1.sum 
FROM (SELECT avg(LENGTH(col2)) AS mean, count(col2) AS "valuesCount", count(DISTINCT col2) AS "distinctCount", MIN(LENGTH(col2)) AS min, MAX(LENGTH(col2)) AS max, SUM(CAST(CASE WHEN (col2 IS NULL) THEN :param_1 ELSE :param_2 END AS NUMBER)) AS "nullCount", STDDEV_POP(LENGTH(col2)) AS stddev, SUM(CAST(LENGTH(col2) AS NUMBER)) AS sum 
FROM dq."TESTTABLE") anon_1 
WHERE ROWNUM <= 1]
[parameters: {'param_1': 1, 'param_2': 0}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
[2024-08-28T07:50:30.841+0000] {profiler_interface.py:59} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/home/airflow/.local/lib/python3.10/site-packages/oracledb/cursor.py", line 383, in execute
    impl.execute(self)
  File "src/oracledb/impl/thick/cursor.pyx", line 243, in oracledb.thick_impl.ThickCursorImpl.execute
  File "src/oracledb/impl/thick/utils.pyx", line 431, in oracledb.thick_impl._raise_from_odpi
  File "src/oracledb/impl/thick/utils.pyx", line 421, in oracledb.thick_impl._raise_from_info
oracledb.exceptions.DatabaseError: ORA-00904: "COL1": invalid identifier
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/profiler/interface/sqlalchemy/profiler_interface.py", line 274, in _compute_query_metrics
    row = runner.select_first_from_query(metric_query)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/profiler/processor/runner.py", line 132, in select_first_from_query
    return query.first()
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2824, in first
    return self.limit(1)._iter().first()
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2916, in _iter
    result = self.session.execute(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1717, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/home/airflow/.local/lib/python3.10/site-packages/oracledb/cursor.py", line 383, in execute
    impl.execute(self)
  File "src/oracledb/impl/thick/cursor.pyx", line 243, in oracledb.thick_impl.ThickCursorImpl.execute
  File "src/oracledb/impl/thick/utils.pyx", line 431, in oracledb.thick_impl._raise_from_odpi
  File "src/oracledb/impl/thick/utils.pyx", line 421, in oracledb.thick_impl._raise_from_info
sqlalchemy.exc.DatabaseError: (oracledb.exceptions.DatabaseError) ORA-00904: "COL1": invalid identifier
[SQL: /* {"app": "OpenMetadata", "version": "1.4.7.0"} */
SELECT anon_1."uniqueCount" 
FROM (SELECT count(*) AS "uniqueCount" 
FROM (SELECT count(col1) AS count_1 
FROM dq."TESTTABLE" GROUP BY col1 
HAVING count(col1) = :count_2) only_once) anon_1 
WHERE ROWNUM <= 1]
[parameters: {'count_2': 1}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
[2024-08-28T07:50:30.841+0000] {profiler_interface.py:60} WARNING - Error trying to compute profile for TESTTABLE.col1: (oracledb.exceptions.DatabaseError) ORA-00904: "COL1": invalid identifier
[SQL: /* {"app": "OpenMetadata", "version": "1.4.7.0"} */
SELECT anon_1."uniqueCount" 
FROM (SELECT count(*) AS "uniqueCount" 
FROM (SELECT count(col1) AS count_1 
FROM dq."TESTTABLE" GROUP BY col1 
HAVING count(col1) = :count_2) only_once) anon_1 
WHERE ROWNUM <= 1]
[parameters: {'count_2': 1}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
[2024-08-28T07:50:30.842+0000] {status.py:76} WARNING - TESTTABLE.col2 metric_type.value: (oracledb.exceptions.DatabaseError) ORA-00904: "COL2": invalid identifier
[SQL: /* {"app": "OpenMetadata", "version": "1.4.7.0"} */
SELECT anon_1.mean, anon_1."valuesCount", anon_1."distinctCount", anon_1.min, anon_1.max, anon_1."nullCount", anon_1.stddev, anon_1.sum 
FROM (SELECT avg(LENGTH(col2)) AS mean, count(col2) AS "valuesCount", count(DISTINCT col2) AS "distinctCount", MIN(LENGTH(col2)) AS min, MAX(LENGTH(col2)) AS max, SUM(CAST(CASE WHEN (col2 IS NULL) THEN :param_1 ELSE :param_2 END AS NUMBER)) AS "nullCount", STDDEV_POP(LENGTH(col2)) AS stddev, SUM(CAST(LENGTH(col2) AS NUMBER)) AS sum 
FROM dq."TESTTABLE") anon_1 
WHERE ROWNUM <= 1]
[parameters: {'param_1': 1, 'param_2': 0}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
[2024-08-28T07:50:30.842+0000] {status.py:77} DEBUG - Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
  File "/home/airflow/.local/lib/python3.10/site-packages/oracledb/cursor.py", line 383, in execute
    impl.execute(self)
  File "src/oracledb/impl/thick/cursor.pyx", line 243, in oracledb.thick_impl.ThickCursorImpl.execute
  File "src/oracledb/impl/thick/utils.pyx", line 431, in oracledb.thick_impl._raise_from_odpi
  File "src/oracledb/impl/thick/utils.pyx", line 421, in oracledb.thick_impl._raise_from_info
oracledb.exceptions.DatabaseError: ORA-00904: "COL2": invalid identifier
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/profiler/interface/sqlalchemy/profiler_interface.py", line 446, in compute_metrics_in_thread
    row = self._get_metric_fn[metric_func.metric_type.value](
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/profiler/interface/sqlalchemy/profiler_interface.py", line 236, in _compute_static_metrics

2. OK Log in TESTTABLE2

[2024-08-28T00:08:34.982+0000] {profiler_interface.py:426} DEBUG - Running profiler for TESTTABLE2 on thread <Thread(ThreadPoolExecutor-1_1, started 140331118184192)>
[2024-08-28T00:08:35.052+0000] {core.py:276} DEBUG - Running post Profiler...
[2024-08-28T00:08:35.052+0000] {core.py:289} DEBUG - Running composed metric distinctProportion for col1
[2024-08-28T00:08:35.053+0000] {core.py:289} DEBUG - Running composed metric nullProportion for col1
[2024-08-28T00:08:35.053+0000] {core.py:289} DEBUG - Running composed metric uniqueProportion for col1
[2024-08-28T00:08:35.053+0000] {core.py:289} DEBUG - Running composed metric interQuartileRange for col1
[2024-08-28T00:08:35.053+0000] {core.py:289} DEBUG - Running composed metric nonParametricSkew for col1
[2024-08-28T00:08:35.054+0000] {core.py:305} DEBUG - Running distribution metrics...
[2024-08-28T00:08:35.054+0000] {core.py:315} DEBUG - Running hybrid metric histogram for col1
[2024-08-28T00:08:35.054+0000] {core.py:276} DEBUG - Running post Profiler...
[2024-08-28T00:08:35.054+0000] {core.py:289} DEBUG - Running composed metric distinctProportion for col2
[2024-08-28T00:08:35.054+0000] {core.py:289} DEBUG - Running composed metric nullProportion for col2
[2024-08-28T00:08:35.055+0000] {core.py:289} DEBUG - Running composed metric uniqueProportion for col2
[2024-08-28T00:08:35.055+0000] {core.py:289} DEBUG - Running composed metric interQuartileRange for col2
[2024-08-28T00:08:35.055+0000] {core.py:289} DEBUG - Running composed metric nonParametricSkew for col2
[2024-08-28T00:08:35.055+0000] {core.py:305} DEBUG - Running distribution metrics...
[2024-08-28T00:08:35.055+0000] {core.py:315} DEBUG - Running hybrid metric histogram for col2
[2024-08-28T00:08:35.055+0000] {core.py:518} DEBUG - Fetching sample data for oracle-il.IL.dq.TESTTABLE2...
[2024-08-28T00:08:35.245+0000] {metadata_rest.py:534} DEBUG - Successfully ingested profile metrics for oracle-il.IL.dq.TESTTABLE2
[2024-08-28T00:08:35.267+0000] {metadata_rest.py:550} DEBUG - Successfully ingested sample data for oracle-il.IL.dq.TESTTABLE2
@harshach harshach moved this to Integration in Release 1.7.0 Dec 14, 2024
@harshach harshach assigned TeddyCr and unassigned Siddhanttimeline Dec 14, 2024
@harshach harshach moved this from Integration to Data Observability in Release 1.7.0 Dec 14, 2024
@ShaileshParmar11 ShaileshParmar11 added the P0 Highest priority label Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 Highest priority profiler
Projects
Status: Data Observability
Development

No branches or pull requests

4 participants