Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to load_file to SQLite when using sqlite_default #158

Closed
tatiana opened this issue Mar 3, 2022 · 2 comments
Closed

Unable to load_file to SQLite when using sqlite_default #158

tatiana opened this issue Mar 3, 2022 · 2 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@tatiana
Copy link
Collaborator

tatiana commented Mar 3, 2022

Version: astro 0.6.0

How to reproduce the problem (originally found in branch readme as part of the PR #157 ):

from datetime import datetime
from airflow import DAG
from astro import sql as aql
from astro.sql.table import Table

START_DATE = datetime(2000, 1, 1)

with DAG( "example_sqlite_load_transform", schedule_interval=None, start_date=START_DATE) as dag:

    imdb_movies = aql.load_file(
        path="https://raw.githubusercontent.com/astro-projects/astro/readme/tests/data/imdb.csv",
        task_id="load_csv",
        output_table=Table(
            table_name="imdb_movies", database="sqlite", conn_id="sqlite_default"
        )
    )

Error message:

[2022-03-03 14:19:58,989] {base_executor.py:82} INFO - Adding to queue: ['<TaskInstance: example_sqlite_load_transform.load_csv backfill__2022-03-03T00:00:00+00:00 [queued]>']
[2022-03-03 14:20:03,997] {taskinstance.py:1429} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=example_sqlite_load_transform
AIRFLOW_CTX_TASK_ID=load_csv
AIRFLOW_CTX_EXECUTION_DATE=2022-03-03T00:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=backfill__2022-03-03T00:00:00+00:00
[2022-03-03 14:20:04,311] {base.py:70} INFO - Using connection to: id: sqlite_default. Host: /tmp/sqlite_default.db, Port: None, Schema: None, Login: None, Password: None, extra: {}
[2022-03-03 14:20:04,315] {base.py:70} INFO - Using connection to: id: sqlite_default. Host: /tmp/sqlite_default.db, Port: None, Schema: None, Login: None, Password: None, extra: {}
[2022-03-03 14:20:04,329] {taskinstance.py:1718} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
    return fn()
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 364, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
    rec = pool._do_get()
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 241, in _do_get
    return self._create_connection()
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
    return _ConnectionRecord(self)
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
    self.__connect(first_connect_check=True)
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
    connection = pool._invoke_creator(self)
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/tati/.virtualenvs/astro-main/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 508, in connect
    return self.dbapi.connect(*cargs, **cparams)
sqlite3.OperationalError: unable to open database file
@tatiana tatiana added the bug Something isn't working label Mar 3, 2022
@tatiana tatiana added this to the 0.6.1 milestone Mar 3, 2022
@tatiana tatiana self-assigned this Mar 3, 2022
@tatiana
Copy link
Collaborator Author

tatiana commented Mar 3, 2022

There seems to be an issue with apache-airflow==2.2.4, when using the sqlite_default connection:

>>> hook = SqliteHook()
>>> hook.get_uri()
'sqlite:///tmp/sqlite_default.db'

According to RFC 3986 and the SQLite documentation, the URI should be:

file:///tmp/sqlite_default.db

Airflow's SqliteHook.get_uri is sort of compatible with SQAlchemy URI representation of URIs, except it is not representing the absolute path it should - in this case.

The URI to refer to the absolute path /tmp/sqlite_default.db should be:

'sqlite:////tmp/sqlite_default.db'

I reported this issue to @kaxil, and in the meantime, I've done a workaround. Temporarily, for SQLite, we'll be using the function create_engine as opposed to getting the SQAlchemy engine from the SqliteHook.

tatiana added a commit that referenced this issue Mar 4, 2022
@tatiana
Copy link
Collaborator Author

tatiana commented Mar 4, 2022

Ready for review: #157

@tatiana tatiana mentioned this issue Mar 4, 2022
@tatiana tatiana closed this as completed Mar 4, 2022
utkarsharma2 pushed a commit that referenced this issue Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant