load_via_dbt_ls() requires dbt is installed in base airflow environment #670

Closed
dylanharper-qz opened this issue Nov 13, 2023 · 6 comments
Labels
area:dependencies Related to dependencies, like Python packages, library versions, etc area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc dbt:list Primarily related to dbt list command or functionality execution:virtualenv Related to Virtualenv execution environment parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing priority:high High priority issues are blocking or critical issues without a workaround and large impact

Comments

@dylanharper-qz
Contributor

Apologies if I'm misunderstanding anything here, we're big fans of Cosmos at Quizlet.

In version 1.2.2, we are seeing this error when using LoadMode.DBT_LS:

Broken DAG: [/usr/local/airflow/dags/my_dbt_hourly.py] Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/cosmos/dbt/graph.py", line 184, in load
    load_method[method]()
  File "/usr/local/lib/python3.11/site-packages/cosmos/dbt/graph.py", line 233, in load_via_dbt_ls
    raise CosmosLoadDbtException(f"Unable to find the dbt executable: {self.dbt_cmd}")
cosmos.dbt.graph.CosmosLoadDbtException: Unable to find the dbt executable: dbt

If we add dbt to our default airflow environment, the dag works as expected.

Is it required to install dbt outside of the venv described in the docs?

This code seems to be checking for dbt in the airflow environment:

if not shutil.which(self.dbt_cmd):
    raise CosmosLoadDbtException(f"Unable to find the dbt executable: {self.dbt_cmd}")
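
To illustrate why a check like this can fail when dbt lives only in a venv, here is a minimal standard-library sketch, assuming a POSIX system. `make_fake_dbt` and the temporary directories are hypothetical stand-ins and not part of Cosmos. It shows that `shutil.which` resolves a bare name only against the search path, while a full path to the binary is accepted regardless of `PATH`.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def make_fake_dbt(directory: Path) -> Path:
    """Create an executable stub named ``dbt`` in ``directory`` (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    exe = directory / "dbt"
    exe.write_text("#!/bin/sh\necho fake dbt\n")
    exe.chmod(exe.stat().st_mode | stat.S_IXUSR)
    return exe


with tempfile.TemporaryDirectory() as tmp:
    venv_bin = Path(tmp) / "dbt_venv" / "bin"
    fake_dbt = make_fake_dbt(venv_bin)
    empty_dir = Path(tmp) / "empty"
    empty_dir.mkdir()

    # Bare name, searched on a path that does not contain the venv: not found.
    found_by_name = shutil.which("dbt", path=str(empty_dir))

    # Full path to the venv binary: resolved without consulting the search path.
    found_by_path = shutil.which(str(fake_dbt), path=str(empty_dir))

    # Bare name again, but with the venv's bin directory on the search path: found.
    found_on_venv_path = shutil.which("dbt", path=str(venv_bin))
```

So a bare `dbt` command only passes the check if the venv's `bin` directory is on the `PATH` of the process that parses the DAG, whereas a full path to the binary passes either way.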


dosubot bot commented Nov 13, 2023

🤖

Hello @dylanharper-qz,

Nice to meet you! I'm Dosu, an AI bot, here to assist you with your questions, bugs, and help you with your contributions to the Cosmos project. While we wait for a human maintainer, I'll do my best to provide you with some immediate help.

I see you've posted an issue about LoadMode.DBT_LS in Cosmos v1.2.2, and I'm currently looking into it. I'll provide a full answer to your issue shortly. Let's solve this together!

@dosubot dosubot bot added area:dependencies Related to dependencies, like Python packages, library versions, etc area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc dbt:list Primarily related to dbt list command or functionality execution:virtualenv Related to Virtualenv execution environment parsing:dbt_ls Issues, questions, or features related to dbt_ls parsing priority:high High priority issues are blocking or critical issues without a workaround and large impact labels Nov 13, 2023

dosubot bot commented Nov 13, 2023

🤖

Thank you for bringing this to our attention. The error you're encountering is due to the load_via_dbt_ls() method not being able to find the dbt executable in your environment. This method is used to load a dbt project into a DbtGraph instance using the dbt ls command, which is considered the most accurate way of loading and filtering dbt projects as it uses the dbt command line for both parsing and filtering the nodes.

The load_via_dbt_ls() method checks if the necessary configurations are available and if the dbt executable is available in the system. If not, it raises a CosmosLoadDbtException. This is likely the exception you're seeing.

A similar issue was encountered in the past (issue #401) and was resolved by a pull request (PR #402). The changes made in the pull request involve the addition of a condition to check if the execution mode is ExecutionMode.LOCAL and if a profile configuration is available. If these conditions are met, the method tries to load the project using dbt ls and falls back to the custom parser in case of a FileNotFoundError.

However, since you're still encountering this issue in version 1.2.2, it's possible that your setup doesn't meet these conditions or there might be a different underlying issue.
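
The try-then-fall-back behaviour described for PR #402 can be sketched roughly as follows. This is not the Cosmos implementation: `load_nodes` and `hypothetical_custom_parser` are invented names for illustration. The only fact relied on is that `subprocess.run` raises `FileNotFoundError` when the executable itself cannot be found.

```python
import subprocess
from typing import Callable, List


def load_nodes(dbt_cmd: str, fallback_parser: Callable[[], List[str]]) -> List[str]:
    """Try ``<dbt_cmd> ls``; if the executable is missing, use ``fallback_parser``."""
    try:
        result = subprocess.run(
            [dbt_cmd, "ls"],
            capture_output=True,
            text=True,
            check=True,
        )
    except FileNotFoundError:
        # Raised by subprocess when the executable itself cannot be found.
        return fallback_parser()
    # One node per non-empty output line.
    return [line for line in result.stdout.splitlines() if line.strip()]


def hypothetical_custom_parser() -> List[str]:
    # Stand-in for a parser that would read the project files directly.
    return ["model.example.placeholder"]


nodes = load_nodes("definitely-not-a-real-dbt-binary", hypothetical_custom_parser)
```

Note that a fallback like this only triggers when the command is missing. A dbt command that exists but exits with an error would raise `subprocess.CalledProcessError` here instead.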

To help us better understand and resolve your issue, could you please provide more information about your setup? Specifically, we would like to know:

  1. What is your execution mode (ExecutionMode.LOCAL or otherwise)?
  2. Do you have a profile configuration available?
  3. Is dbt installed in your base Airflow environment or in a virtual environment?

This information will help us determine if the issue is with the load_via_dbt_ls() method or with your specific setup. Thank you for your patience and cooperation.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dylanharper-qz
Contributor Author

dylanharper-qz commented Nov 13, 2023

@dosu-bot, thank you.

  1. execution mode is ExecutionMode.LOCAL
  2. profile configuration is shared below
  3. dbt is installed in a virtual environment

from airflow.decorators import dag
from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig, ExecutionConfig, RenderConfig
from cosmos.constants import LoadMode, TestBehavior
from cosmos.profiles import GoogleCloudServiceAccountDictProfileMapping
from pendulum import datetime
import os


CONNECTION_ID = "dbt_conn"
DBT_PROJECT_PATH = f"{os.environ['AIRFLOW_HOME']}/dbt/quizlet"
DBT_EXECUTABLE_PATH = f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt"

profile_config = ProfileConfig(
    profile_name="default",
    target_name="prod",
    profile_mapping=GoogleCloudServiceAccountDictProfileMapping(
        conn_id=CONNECTION_ID,
        profile_args={
            "project": "dbt-project",
            "dataset": "dbt_prod",
        },
    ),
)

execution_config = ExecutionConfig(
    dbt_executable_path=DBT_EXECUTABLE_PATH,
)


@dag(
    start_date=datetime(2023, 8, 1),
    schedule=None,
    catchup=False,
)
def my_dbt_hourly():
    transform_data = DbtTaskGroup(
        group_id="transform_data",
        project_config=ProjectConfig(
            dbt_project_path=DBT_PROJECT_PATH,
        ),
        profile_config=profile_config,
        execution_config=execution_config,
        operator_args={
            "install_deps": True,
            "vars": {
                "execution_date": "{{ data_interval_start.strftime('%Y-%m-%d') }}",
            },
        },
        render_config=RenderConfig(
            load_method=LoadMode.DBT_LS,
            test_behavior=TestBehavior.NONE,
            select=["+tag:hourly"],
            exclude=["config.materialized:model", "tag:manual_training_data"],
        ),
    )

my_dbt_hourly()


dosubot bot commented Nov 13, 2023

🤖

Based on the information you've provided and the context from the Cosmos repository, it seems like the issue you're encountering is related to how Cosmos is trying to find the dbt executable. The get_system_dbt function in cosmos/dbt/executable.py uses shutil.which("dbt") to find the dbt executable. This function returns the path of the dbt executable if it's found in the system's PATH. If it's not found, it returns "dbt".

If your dbt executable is in a virtual environment and not in the system's PATH, shutil.which("dbt") won't be able to find it, and the function will return "dbt". This could be why Cosmos is unable to find the dbt executable.

To resolve this issue, you could modify the get_system_dbt function to also look for the dbt executable in your virtual environment. Here's an example of how you could do this:

import os
import shutil

def get_system_dbt() -> str:
    """
    Tries to identify which is the path to the dbt executable, return "dbt" otherwise.
    """
    # Try to find the dbt executable in the system's PATH
    dbt_path = shutil.which("dbt")

    # If the dbt executable wasn't found in the system's PATH, try to find it in the virtual environment
    if dbt_path is None:
        venv_path = os.environ.get("VIRTUAL_ENV")
        if venv_path is not None:
            # shutil.which returns None unless this file exists and is executable
            dbt_path = shutil.which(os.path.join(venv_path, "bin", "dbt"))

    # If the dbt executable wasn't found in the virtual environment either, return "dbt"
    return dbt_path or "dbt"

This modified function first tries to find the dbt executable in the system's PATH. If it doesn't find it there, it tries to find it in the virtual environment. If it doesn't find it in either of these places, it returns "dbt".

Alternatively, you can set dbt_executable_path on both RenderConfig and ExecutionConfig (defined in cosmos/config.py) to the path of the dbt executable in your virtual environment. This should override the default behavior of get_system_dbt and allow Cosmos to find the dbt executable without modifying any library code.

I hope this helps! Let me know if you have any other questions.


@dylanharper-qz
Contributor Author

@dosu-bot I have confirmed that setting dbt_executable_path in the RenderConfig resolved this issue.

Thank you, you have been very helpful.
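
For anyone landing here with the same error, the change confirmed above amounts to something like the following fragment. It is a sketch based only on the DAG posted earlier in this thread: `DBT_EXECUTABLE_PATH` and the other values are copied from that DAG, and it assumes a Cosmos version (1.2.2 in this report) where RenderConfig accepts `dbt_executable_path`.

```python
import os

from cosmos import ExecutionConfig, RenderConfig
from cosmos.constants import LoadMode, TestBehavior

# Same venv location as in the DAG posted earlier in this thread.
DBT_EXECUTABLE_PATH = f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt"

# Already present in the original DAG.
execution_config = ExecutionConfig(
    dbt_executable_path=DBT_EXECUTABLE_PATH,
)

# Consulted when the project is loaded with LoadMode.DBT_LS.
# Adding dbt_executable_path here is the change that resolved the error.
render_config = RenderConfig(
    dbt_executable_path=DBT_EXECUTABLE_PATH,
    load_method=LoadMode.DBT_LS,
    test_behavior=TestBehavior.NONE,
    select=["+tag:hourly"],
    exclude=["config.materialized:model", "tag:manual_training_data"],
)
```

Both objects are then passed to DbtTaskGroup as `execution_config=` and `render_config=`, exactly as in the original DAG.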


dosubot bot commented Nov 14, 2023

🤖

Based on the information you've provided and the similar issues I found in the Cosmos repository, setting dbt_executable_path in the RenderConfig is indeed a correct solution to resolve the error Unable to find the dbt executable: dbt when using LoadMode.DBT_LS in Cosmos v1.2.2.

The RenderConfig class in cosmos/config.py has a method validate_dbt_command which checks if the dbt_executable_path is valid and if the dbt executable can be found at that path. If the dbt executable cannot be found, it raises a CosmosConfigException with the error message Unable to find the dbt executable.

In the context shared, the get_system_dbt function in cosmos/dbt/executable.py tries to identify the path to the dbt executable and returns "dbt" if it can't find it. This function can be used to get the system dbt executable path, which can then be set as the dbt_executable_path in the RenderConfig.

The RenderConfig class also has a fallback mechanism. If dbt_executable_path cannot be found, it tries fallback_cmd as the dbt executable path. If fallback_cmd cannot be found either, it raises a CosmosConfigException.

Here is the relevant code:

def validate_dbt_command(self, fallback_cmd: str | Path = "") -> None:
    """
    When using LoadMode.DBT_LS, the dbt executable path is necessary for rendering.

    Validates that the original dbt command works, if not, attempt to use the fallback_dbt_cmd.
    If neither works, raise an exception.

    The fallback behaviour is necessary for Cosmos < 1.2.2 backwards compatibility.
    """
    if not shutil.which(self.dbt_executable_path):
        if isinstance(fallback_cmd, Path):
            fallback_cmd = fallback_cmd.as_posix()

        if fallback_cmd and shutil.which(fallback_cmd):
            self.dbt_executable_path = fallback_cmd
        else:
            raise CosmosConfigException(
                "Unable to find the dbt executable, attempted: "
                f"<{self.dbt_executable_path}>" + (f" and <{fallback_cmd}>." if fallback_cmd else ".")
            )

So, if you are getting the error Unable to find the dbt executable: dbt, you can resolve it by setting dbt_executable_path in the RenderConfig to the correct path of your dbt executable. If the path is not found, you can also provide a fallback_cmd as a backup.
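
The primary-then-fallback lookup quoted above can be tried in isolation with the standard library. The following is a minimal sketch, assuming a POSIX system. `resolve_dbt_executable` and `DbtNotFoundError` are hypothetical names that mirror the quoted logic, and they are not part of Cosmos.

```python
import shutil
import stat
import tempfile
from pathlib import Path
from typing import Union


class DbtNotFoundError(Exception):
    """Hypothetical stand-in for CosmosConfigException."""


def resolve_dbt_executable(primary: Union[str, Path], fallback: Union[str, Path] = "") -> str:
    """Return ``primary`` if shutil.which resolves it, else ``fallback``, else raise."""
    primary, fallback = str(primary), str(fallback)
    if shutil.which(primary):
        return primary
    if fallback and shutil.which(fallback):
        return fallback
    attempted = f"<{primary}>" + (f" and <{fallback}>" if fallback else "")
    raise DbtNotFoundError(f"Unable to find the dbt executable, attempted: {attempted}.")


with tempfile.TemporaryDirectory() as tmp:
    # An executable stub standing in for a dbt binary inside a venv.
    real = Path(tmp) / "dbt"
    real.write_text("#!/bin/sh\n")
    real.chmod(real.stat().st_mode | stat.S_IXUSR)
    missing = Path(tmp) / "missing" / "dbt"

    # The primary path is missing, so the fallback is chosen.
    chosen = resolve_dbt_executable(missing, fallback=real)

    # With no usable fallback, the lookup fails with a descriptive message.
    try:
        resolve_dbt_executable(missing)
        error_message = ""
    except DbtNotFoundError as exc:
        error_message = str(exc)
```

Unlike the quoted method, this sketch returns the chosen path instead of storing it on a config object, which makes it easier to test on its own.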

I hope this helps! If you have any other questions, feel free to ask.

