feat(ingest): add salesforce connector #5104

Merged
6 changes: 6 additions & 0 deletions datahub-web-react/src/images/logo-salesforce.svg
30 changes: 30 additions & 0 deletions metadata-ingestion/docs/sources/salesforce/salesforce.md
@@ -0,0 +1,30 @@
### Prerequisites

To ingest metadata from Salesforce, you need either:

- a Salesforce username, password, and [security token](https://developer.Salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_concepts_security.htm), or
- a Salesforce instance URL and access token (session ID). This option is suitable for one-shot ingestion only, because the access token typically expires after 2 hours of inactivity.
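
The two credential sets can be sketched as a small helper that picks whichever set is complete. This is a minimal illustration, not the connector's code: `build_auth_kwargs` is a hypothetical name, and the returned keys are chosen to match the keyword arguments that `simple_salesforce.Salesforce` accepts (`username`, `password`, `security_token`, `instance_url`, `session_id`).

```python
from typing import Dict, Optional


def build_auth_kwargs(
    username: Optional[str] = None,
    password: Optional[str] = None,
    security_token: Optional[str] = None,
    instance_url: Optional[str] = None,
    access_token: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return keyword arguments for one credential set."""
    # Option 1: username + password + security token (long-lived login).
    if username and password and security_token:
        return {
            "username": username,
            "password": password,
            "security_token": security_token,
        }
    # Option 2: instance URL + access token. simple-salesforce names the
    # access token "session_id".
    if instance_url and access_token:
        return {"instance_url": instance_url, "session_id": access_token}
    raise ValueError(
        "Provide username/password/security_token or instance_url/access_token"
    )
```

The result could then be passed on as `simple_salesforce.Salesforce(**kwargs)`; that call contacts Salesforce, so it is left out of the sketch.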

## Integration Details
This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance.
The Python library [simple-salesforce](https://pypi.org/project/simple-salesforce/) is used to authenticate and to call the [Salesforce REST API](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/intro_what_is_rest_api.htm) to retrieve details from the Salesforce instance.

### REST API Resources used in this integration
- [Versions](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_versions.htm)
- [Tooling API Query](https://developer.salesforce.com/docs/atlas.en-us.api_tooling.meta/api_tooling/intro_rest_resources.htm) on the objects `EntityDefinition`, `EntityParticle`, `CustomObject`, and `CustomField`
- [Record Count](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_record_count.htm)
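
As a rough sketch of how these three resources are addressed, the helpers below build the request URLs. The function names, the API version `54.0`, and the example SOQL query are assumptions made for illustration; the paths follow the Salesforce REST and Tooling API documentation linked above, and the connector itself may build its requests differently.

```python
from typing import List
from urllib.parse import quote


def versions_url(instance_url: str) -> str:
    """Versions resource: lists the API versions the instance supports."""
    return instance_url.rstrip("/") + "/services/data/"


def tooling_query_url(instance_url: str, api_version: str, soql: str) -> str:
    """Tooling API Query resource for a URL-encoded SOQL statement."""
    base = instance_url.rstrip("/")
    return f"{base}/services/data/v{api_version}/tooling/query/?q={quote(soql)}"


def record_count_url(
    instance_url: str, api_version: str, object_names: List[str]
) -> str:
    """Record Count resource for a comma-separated list of object names."""
    base = instance_url.rstrip("/")
    names = ",".join(object_names)
    return f"{base}/services/data/v{api_version}/limits/recordCount?sObjects={names}"


if __name__ == "__main__":
    url = "https://mydomain.my.salesforce.com/"
    print(versions_url(url))
    # The query text is illustrative only.
    print(tooling_query_url(url, "54.0", "SELECT QualifiedApiName FROM EntityDefinition"))
    print(record_count_url(url, "54.0", ["Account", "Lead"]))
```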

### Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

| Source Concept | DataHub Concept | Notes |
| -- | -- | -- |
| `Salesforce` | [Data Platform](../../metamodel/entities/dataPlatform.md) | |
| Standard Object | [Dataset](../../metamodel/entities/dataset.md) | subtype "Standard Object" |
| Custom Object | [Dataset](../../metamodel/entities/dataset.md) | subtype "Custom Object" |
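
To make the mapping concrete, the sketch below turns an object name into a DataHub dataset URN and a subtype. Both helpers are hypothetical. The URN layout follows DataHub's general `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` convention with the platform instance prefixed to the name, and the `__c` suffix test is a simplifying assumption based on Salesforce naming rules rather than the connector's actual detection logic.

```python
from typing import Optional


def dataset_urn(
    object_name: str, platform_instance: Optional[str] = None, env: str = "PROD"
) -> str:
    """Hypothetical helper: build a dataset URN on the salesforce platform."""
    name = f"{platform_instance}.{object_name}" if platform_instance else object_name
    return f"urn:li:dataset:(urn:li:dataPlatform:salesforce,{name},{env})"


def subtype(object_name: str) -> str:
    """Assumption: custom objects are recognised by their __c API-name suffix."""
    return "Custom Object" if object_name.endswith("__c") else "Standard Object"


if __name__ == "__main__":
    for obj in ("Account", "Invoice__c"):
        print(dataset_urn(obj, "mydomain-dev-ed"), "->", subtype(obj))
```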

### Caveats
- This connector has only been tested with Salesforce Developer Edition.
- This connector currently supports only table-level profiling (row and column counts). Row counts are approximate, as returned by the [Salesforce RecordCount REST API](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_record_count.htm).
- This integration does not support ingesting Salesforce [External Objects](https://developer.Salesforce.com/docs/atlas.en-us.object_reference.meta/object_reference/sforce_api_objects_external_objects.htm).
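
For the row-count caveat, a minimal sketch of reading a Record Count response is shown below. The payload shape (an `sObjects` list of `name`/`count` pairs) follows the Salesforce documentation linked above; the sample values and the `parse_record_counts` helper are made up for illustration.

```python
import json
from typing import Dict


def parse_record_counts(payload: str) -> Dict[str, int]:
    """Map each object name to its (approximate) record count."""
    data = json.loads(payload)
    return {entry["name"]: entry["count"] for entry in data.get("sObjects", [])}


if __name__ == "__main__":
    # Sample payload with invented counts.
    sample = '{"sObjects": [{"count": 3, "name": "Account"}, {"count": 10, "name": "Lead"}]}'
    print(parse_record_counts(sample))
```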
25 changes: 25 additions & 0 deletions metadata-ingestion/docs/sources/salesforce/salesforce_recipe.yml
@@ -0,0 +1,25 @@
pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed
    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"

    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
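
The `allow` entries in the recipe above are regular expressions, and the trailing `$` matters. The sketch below assumes the patterns are applied with `re.match` (anchored at the start of the name), which is how DataHub's allow/deny patterns are generally understood to work; `is_allowed` and `domain_for` are hypothetical helpers, not connector functions.

```python
import re
from typing import Dict, List, Optional


def is_allowed(name: str, allow: List[str]) -> bool:
    """True if any allow pattern matches from the start of the name."""
    return any(re.match(pattern, name) for pattern in allow)


def domain_for(name: str, domains: Dict[str, List[str]]) -> Optional[str]:
    """Return the first domain whose allow patterns match the object name."""
    for domain, allow in domains.items():
        if is_allowed(name, allow):
            return domain
    return None


if __name__ == "__main__":
    object_pattern = ["Account$", "Opportunity$", "Lead$"]
    domains = {"sales": ["Opportunity$", "Lead$"]}
    for obj in ("Account", "AccountContactRole", "Lead"):
        print(obj, is_allowed(obj, object_pattern), domain_for(obj, domains))
```

Without the `$`, a pattern such as `Account` would also admit `AccountContactRole`, because only the start of the name is anchored.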
3 changes: 3 additions & 0 deletions metadata-ingestion/setup.py
@@ -260,6 +260,7 @@ def get_long_description():
        "sqllineage==1.3.5",
    },
    "sagemaker": aws_common,
    "salesforce":{"simple-salesforce"},
    "snowflake": snowflake_common,
    "snowflake-usage": snowflake_common
    | usage_common
@@ -364,6 +365,7 @@ def get_long_description():
            "starburst-trino-usage",
            "powerbi",
            "vertica",
            "salesforce"
            # airflow is added below
        ]
        for dependency in plugins[plugin]
@@ -507,6 +509,7 @@ def get_long_description():
        "vertica = datahub.ingestion.source.sql.vertica:VerticaSource",
        "presto-on-hive = datahub.ingestion.source.sql.presto_on_hive:PrestoOnHiveSource",
        "pulsar = datahub.ingestion.source.pulsar:PulsarSource",
        "salesforce = datahub.ingestion.source.salesforce:SalesforceSource",
    ],
    "datahub.ingestion.sink.plugins": [
        "file = datahub.ingestion.sink.file:FileSink",