ModuleNotFoundError: No module named 'kafka' in Airflow DAG #3

Closed
yehoon17 opened this issue Jan 20, 2025 · 4 comments

@yehoon17

I encountered an issue with the Airflow DAG kafka_to_hdfs.py. The DAG is failing due to a missing Python module, kafka. The error traceback is as follows:

Broken DAG: [/opt/airflow/dags/test/kafka_to_hdfs.py]
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/opt/airflow/dags/test/kafka_to_hdfs.py", line 4, in <module>
    from kafka import KafkaConsumer
ModuleNotFoundError: No module named 'kafka'
@yehoon17

After attempting to update the Docker Compose configuration with the following:

environment: 
  - _PIP_ADDITIONAL_REQUIREMENTS="kafka-python"

I encountered the following error:

Invalid requirement: '"kafka-python"': Expected package name at the start of dependency specifier "kafka-python"

It seems the quotes around the package name are causing the issue. I'll try to remove them and re-test.
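To illustrate why the quotes matter: in the list form of environment, everything after the first = appears to be passed through as the value, so the quotes become part of the string that pip receives. The following is a minimal sketch under that assumption, not how Docker Compose or pip is actually implemented. The helper name env_value is hypothetical, and the regex is the package-name pattern given in PEP 508.

```python
import re

# Package-name pattern from PEP 508 (matched case-insensitively).
NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def env_value(entry):
    """Return the VALUE part of a list-form 'KEY=VALUE' entry, unmodified.

    Hypothetical helper: it assumes no shell-style quote removal happens,
    which matches the error message reported above.
    """
    _key, value = entry.split("=", 1)
    return value


quoted = env_value('_PIP_ADDITIONAL_REQUIREMENTS="kafka-python"')
bare = env_value('_PIP_ADDITIONAL_REQUIREMENTS=kafka-python')

# The quoted value starts with a literal '"', so it is not a valid package name.
print(repr(quoted), bool(NAME_RE.match(quoted)))
print(repr(bare), bool(NAME_RE.match(bare)))
```

The first value keeps its surrounding quote characters and fails the name check, while the second passes, which is consistent with the "Expected package name at the start of dependency specifier" message.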

@yehoon17

I removed the quotes around the package name in the Docker Compose configuration, as follows:

environment: 
  - _PIP_ADDITIONAL_REQUIREMENTS=kafka-python

After rebuilding the Airflow container, I verified that kafka-python is installed by running pip list inside the container, and it appears in the output.

However, I still encounter the following error in Airflow webserver:

Traceback (most recent call last): 
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/opt/airflow/dags/test/kafka_to_hdfs.py", line 4, in <module>
    from kafka import KafkaConsumer
ModuleNotFoundError: No module named 'kafka'

Additionally, when I enter the Airflow webserver container and run python, trying to import kafka results in:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/airflow/.local/lib/python3.12/site-packages/kafka/__init__.py", line 23, in <module>
    from kafka.consumer import KafkaConsumer
  File "/home/airflow/.local/lib/python3.12/site-packages/kafka/consumer/__init__.py", line 3, in <module>
    from kafka.consumer.group import KafkaConsumer
  File "/home/airflow/.local/lib/python3.12/site-packages/kafka/consumer/group.py", line 13, in <module>
    from kafka.consumer.fetcher import Fetcher
  File "/home/airflow/.local/lib/python3.12/site-packages/kafka/consumer/fetcher.py", line 19, in <module>
    from kafka.record import MemoryRecords
  File "/home/airflow/.local/lib/python3.12/site-packages/kafka/record/__init__.py", line 1, in <module>
    from kafka.record.memory_records import MemoryRecords, MemoryRecordsBuilder
  File "/home/airflow/.local/lib/python3.12/site-packages/kafka/record/memory_records.py", line 27, in <module>
    from kafka.record.legacy_records import LegacyRecordBatch, LegacyRecordBatchBuilder
  File "/home/airflow/.local/lib/python3.12/site-packages/kafka/record/legacy_records.py", line 50, in <module>
    from kafka.codec import (
  File "/home/airflow/.local/lib/python3.12/site-packages/kafka/codec.py", line 9, in <module>
    from kafka.vendor.six.moves import range
ModuleNotFoundError: No module named 'kafka.vendor.six.moves'
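The two tracebacks above fail for different reasons: in one the top-level kafka module cannot be found at all, in the other kafka is found but one of its own submodules is missing. A small sketch that tells these cases apart, using the name attribute of ModuleNotFoundError. The function name diagnose_import and its return strings are hypothetical, chosen only for illustration.

```python
import importlib


def diagnose_import(module_name):
    """Try to import module_name and classify a failure.

    Returns one of:
      "ok"                      import succeeded
      "not-installed"           the top-level package itself is missing
      "broken: <name>"          the package exists but one of its own
                                submodules is missing
      "missing-dependency: <name>"  some other module it needs is missing
    """
    top_level = module_name.split(".")[0]
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        if missing == top_level:
            return "not-installed"
        if missing.startswith(top_level + "."):
            return f"broken: {missing}"
        return f"missing-dependency: {missing}"
    return "ok"


if __name__ == "__main__":
    # Run this inside each container (webserver, scheduler, ...) to compare.
    print(diagnose_import("kafka"))
```

Under these assumptions, a container without the package would report "not-installed", while the Python 3.12 failure shown above would report "broken: kafka.vendor.six.moves".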

@yehoon17

After further research, I found that this is a Python 3.12 compatibility problem: the kafka-python package is not fully compatible with Python 3.12, which is why the import fails on kafka.vendor.six.moves even though the package is installed.

The resources I came across suggest two ways to resolve this: either downgrade to Python 3.11 or install an alternative package called kafka-python-ng.

To implement this solution, I updated the Docker Compose configuration to use the kafka-python-ng package instead:

environment: 
  - _PIP_ADDITIONAL_REQUIREMENTS=kafka-python-ng

I will rebuild the Docker container and verify if this resolves the issue.

@yehoon17

After switching to kafka-python-ng, I still encountered the following error in the Airflow webserver:

Traceback (most recent call last): 
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/opt/airflow/dags/test/kafka_to_hdfs.py", line 4, in <module>
    from kafka import KafkaConsumer
ModuleNotFoundError: No module named 'kafka'

However, when I entered the Airflow webserver container and ran python, importing kafka worked fine. This led me to discover that the kafka-python-ng package was installed only in the webserver container and not in the scheduler container.

To fix this, I updated the Airflow scheduler container to include the kafka-python-ng package by adding the following to the Docker Compose configuration:

environment: 
  - _PIP_ADDITIONAL_REQUIREMENTS=kafka-python-ng

After rebuilding the scheduler container with this change, the issue was resolved, and the DAG is now working properly.
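Since the root cause was one service lacking the setting, a quick consistency check across services can catch this earlier. The sketch below works on an already-parsed Compose services mapping given as a plain Python dict; the service names and the function services_missing_requirement are hypothetical, and loading the actual YAML file is left out.

```python
def services_missing_requirement(services, package,
                                 var="_PIP_ADDITIONAL_REQUIREMENTS"):
    """Return names of services whose `var` does not list `package`.

    `services` mirrors the `services:` section of a Compose file.
    Both list-form ("KEY=VALUE") and mapping-form environments are handled.
    """
    missing = []
    for name, spec in services.items():
        env = spec.get("environment") or {}
        if isinstance(env, list):
            # Convert ["KEY=VALUE", ...] into {"KEY": "VALUE", ...}.
            env = dict(item.split("=", 1) for item in env if "=" in item)
        requirements = str(env.get(var) or "").split()
        if package not in requirements:
            missing.append(name)
    return missing


# Hypothetical services section resembling the situation in this issue.
services = {
    "airflow-webserver": {
        "environment": ["_PIP_ADDITIONAL_REQUIREMENTS=kafka-python-ng"],
    },
    "airflow-scheduler": {
        "environment": [],
    },
}

print(services_missing_requirement(services, "kafka-python-ng"))
```

With the example data, only the scheduler is reported, which matches what was observed here: the webserver could import kafka, but the scheduler that parses the DAG could not.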
