diff --git a/metadata-ingestion/README.md b/metadata-ingestion/README.md
index 34187900ef6d8..81c8cb8afa87f 100644
--- a/metadata-ingestion/README.md
+++ b/metadata-ingestion/README.md
@@ -979,8 +979,11 @@ If you're simply looking to run ingestion on a schedule, take a look at these sa
 The Airflow lineage backend is only supported in Airflow 1.10.15+ and 2.0.2+.
 :::

-
-1. First, you must configure an Airflow hook for Datahub. We support both a Datahub REST hook and a Kafka-based hook, but you only need one.
+1. You need to install the required dependency in your airflow. See https://registry.astronomer.io/providers/datahub/modules/datahublineagebackend
+   ```shell
+   pip install acryl-datahub[airflow]
+   ```
+2. You must configure an Airflow hook for Datahub. We support both a Datahub REST hook and a Kafka-based hook, but you only need one.

    ```shell
    # For REST-based:
@@ -989,7 +992,7 @@ The Airflow lineage backend is only supported in Airflow 1.10.15+ and 2.0.2+.
    airflow connections add --conn-type 'datahub_kafka' 'datahub_kafka_default' --conn-host 'broker:9092' --conn-extra '{}'
    ```

-2. Add the following lines to your `airflow.cfg` file.
+3. Add the following lines to your `airflow.cfg` file.
    ```ini
    [lineage]
    backend = datahub_provider.lineage.datahub.DatahubLineageBackend
@@ -1005,8 +1008,8 @@ The Airflow lineage backend is only supported in Airflow 1.10.15+ and 2.0.2+.
    - `capture_ownership_info` (defaults to true): If true, the owners field of the DAG will be capture as a DataHub corpuser.
    - `capture_tags_info` (defaults to true): If true, the tags field of the DAG will be captured as DataHub tags.
    - `graceful_exceptions` (defaults to true): If set to true, most runtime errors in the lineage backend will be suppressed and will not cause the overall task to fail. Note that configuration issues will still throw exceptions.
-3. Configure `inlets` and `outlets` for your Airflow operators. For reference, look at the sample DAG in [`lineage_backend_demo.py`](./src/datahub_provider/example_dags/lineage_backend_demo.py), or reference [`lineage_backend_taskflow_demo.py`](./src/datahub_provider/example_dags/lineage_backend_taskflow_demo.py) if you're using the [TaskFlow API](https://airflow.apache.org/docs/apache-airflow/stable/concepts/taskflow.html).
-4. [optional] Learn more about [Airflow lineage](https://airflow.apache.org/docs/apache-airflow/stable/lineage.html), including shorthand notation and some automation.
+4. Configure `inlets` and `outlets` for your Airflow operators. For reference, look at the sample DAG in [`lineage_backend_demo.py`](./src/datahub_provider/example_dags/lineage_backend_demo.py), or reference [`lineage_backend_taskflow_demo.py`](./src/datahub_provider/example_dags/lineage_backend_taskflow_demo.py) if you're using the [TaskFlow API](https://airflow.apache.org/docs/apache-airflow/stable/concepts/taskflow.html).
+5. [optional] Learn more about [Airflow lineage](https://airflow.apache.org/docs/apache-airflow/stable/lineage.html), including shorthand notation and some automation.

 ### Emitting lineage via a separate operator
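
Note on the renumbered step 3: since the steps are now easier to follow in order, a quick sanity check of the `[lineage]` setting may be useful when trying the change locally. The following is only a minimal sketch, assuming `airflow.cfg` is plain INI that Python's standard `configparser` can read; the helper name `lineage_backend_configured` is hypothetical and is not part of DataHub or Airflow.

```python
import configparser

# Backend path taken from the README text in this diff.
EXPECTED_BACKEND = "datahub_provider.lineage.datahub.DatahubLineageBackend"


def lineage_backend_configured(cfg_text: str) -> bool:
    """Return True if the [lineage] section names the DataHub lineage backend.

    Hypothetical helper for illustration only.
    """
    # Disable interpolation so literal '%' characters in other values
    # of the config text do not raise errors while parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # fallback=None covers both a missing section and a missing option.
    return parser.get("lineage", "backend", fallback=None) == EXPECTED_BACKEND


if __name__ == "__main__":
    sample = (
        "[lineage]\n"
        "backend = datahub_provider.lineage.datahub.DatahubLineageBackend\n"
    )
    print(lineage_backend_configured(sample))
```

To check a real installation, the contents of the `airflow.cfg` file could be read and passed in as `cfg_text`; the location of that file depends on the local Airflow setup.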