feat(ingest): salesforce - add connector (#5104)
Co-authored-by: Shirshanka Das <shirshanka@apache.org>
Co-authored-by: Vincent Koc <koconder@users.noreply.github.com>
3 people authored Jul 6, 2022
1 parent e6662a7 commit 4b515e0
Showing 16 changed files with 6,797 additions and 0 deletions.
6 changes: 6 additions & 0 deletions datahub-web-react/src/images/logo-salesforce.svg
30 changes: 30 additions & 0 deletions metadata-ingestion/docs/sources/salesforce/salesforce.md
@@ -0,0 +1,30 @@
### Prerequisites

To ingest metadata from Salesforce, you will need one of the following:

- Salesforce username, password, and [security token](https://developer.Salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_concepts_security.htm), or
- Salesforce instance URL and access token (session ID). This option suits one-shot ingestion only, because an access token typically expires after 2 hours of inactivity.
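The two prerequisite options correspond to two ways of constructing a `simple_salesforce.Salesforce` client: username, password and security token, or an instance URL plus a session ID. The sketch below only assembles the keyword arguments; the helper name `build_auth_kwargs` and the config key `access_token` are hypothetical, and the connector's real config schema may differ.

```python
from typing import Any, Dict


def build_auth_kwargs(config: Dict[str, Any]) -> Dict[str, str]:
    """Pick keyword arguments for simple_salesforce.Salesforce(...).

    Hypothetical helper: keys mirror the recipe fields (username, password,
    security_token, instance_url) plus an assumed `access_token` key.
    """
    if config.get("access_token"):
        # Option 2: reuse an existing session. The token expires, so this
        # only suits one-shot ingestion.
        if not config.get("instance_url"):
            raise ValueError("instance_url is required with access_token")
        return {
            "instance_url": config["instance_url"],
            "session_id": config["access_token"],
        }
    # Option 1: username + password + security token.
    required = ("username", "password", "security_token")
    missing = [key for key in required if not config.get(key)]
    if missing:
        raise ValueError(f"missing credentials: {', '.join(missing)}")
    return {key: config[key] for key in required}
```

With `simple-salesforce` installed, `Salesforce(**build_auth_kwargs(cfg))` would then create the client.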

## Integration Details
This plugin extracts Salesforce Standard and Custom Objects and their details (fields, record count, etc.) from a Salesforce instance.
The Python library [simple-salesforce](https://pypi.org/project/simple-salesforce/) is used to authenticate and to call the [Salesforce REST API](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/intro_what_is_rest_api.htm), which retrieves details from the Salesforce instance.

### REST API Resources used in this integration
- [Versions](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_versions.htm)
- [Tooling API Query](https://developer.salesforce.com/docs/atlas.en-us.api_tooling.meta/api_tooling/intro_rest_resources.htm) on the objects `EntityDefinition`, `EntityParticle`, `CustomObject`, and `CustomField`
- [Record Count](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_record_count.htm)
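As a rough illustration of how these three resources fit together, the sketch below picks the newest API version from a Versions response and builds request paths for a Tooling API query and a Record Count call. The helper names and the sample SOQL are hypothetical; the exact queries the connector issues are not shown in this commit.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlencode


def latest_api_version(versions: List[Dict[str, str]]) -> str:
    """Return the highest "version" from a Versions response (GET /services/data/)."""
    # Compare numerically; as strings, "9.0" would sort above "55.0".
    return max(versions, key=lambda v: float(v["version"]))["version"]


def tooling_query_path(version: str, soql: str) -> str:
    """Path for a Tooling API query, e.g. against EntityDefinition."""
    return f"/services/data/v{version}/tooling/query/?" + urlencode({"q": soql})


def record_count_path(version: str, objects: Iterable[str]) -> str:
    """Path for the Record Count resource for the given object API names."""
    return f"/services/data/v{version}/limits/recordCount?" + urlencode(
        {"sObjects": ",".join(objects)}
    )
```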

### Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

| Source Concept | DataHub Concept | Notes |
| -- | -- | -- |
| `Salesforce` | [Data Platform](../../metamodel/entities/dataPlatform.md) | |
| Standard Object | [Dataset](../../metamodel/entities/dataset.md) | subtype "Standard Object" |
| Custom Object | [Dataset](../../metamodel/entities/dataset.md) | subtype "Custom Object" |
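The mapping above can be sketched as follows. The URN layout follows DataHub's `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` convention; prefixing the name with `platform_instance` and a dot is an assumption here. Deciding the subtype from the `__c` suffix that Salesforce uses for custom object API names is likewise only a heuristic; the connector itself may decide via the Tooling API. Both helper names are hypothetical.

```python
from typing import Optional


def dataset_urn(object_name: str, platform_instance: Optional[str] = None, env: str = "PROD") -> str:
    """Build a DataHub dataset URN for a Salesforce object (assumed naming scheme)."""
    name = f"{platform_instance}.{object_name}" if platform_instance else object_name
    return f"urn:li:dataset:(urn:li:dataPlatform:salesforce,{name},{env})"


def object_subtype(api_name: str) -> str:
    """Heuristic: custom object API names end in "__c"."""
    return "Custom Object" if api_name.endswith("__c") else "Standard Object"
```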

### Caveats
- This connector has only been tested with Salesforce Developer Edition.
- Currently, this connector supports only table-level profiling (row and column counts). Row counts are approximate, as returned by the [Salesforce RecordCount REST API](https://developer.Salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_record_count.htm).
- This integration does not support ingesting Salesforce [External Objects](https://developer.Salesforce.com/docs/atlas.en-us.object_reference.meta/object_reference/sforce_api_objects_external_objects.htm).
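A minimal sketch of the table-level profile described above, assuming the Record Count response has the shape `{"sObjects": [{"name": ..., "count": ...}]}` and that the column count is simply the number of fields described for the object. The helper `table_profile` is hypothetical; the keys `rowCount` and `columnCount` mirror the fields of DataHub's dataset profile.

```python
from typing import Any, Dict, List, Optional


def table_profile(
    record_count_response: Dict[str, Any],
    object_name: str,
    field_names: List[str],
) -> Dict[str, Optional[int]]:
    """Combine a Record Count response and a field list into a table-level profile."""
    counts = {
        entry["name"]: entry["count"]
        for entry in record_count_response.get("sObjects", [])
    }
    return {
        # Approximate count as reported by Salesforce; None if the object is absent.
        "rowCount": counts.get(object_name),
        # Number of fields described for the object.
        "columnCount": len(field_names),
    }
```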
25 changes: 25 additions & 0 deletions metadata-ingestion/docs/sources/salesforce/salesforce_recipe.yml
@@ -0,0 +1,25 @@
pipeline_name: my_salesforce_pipeline
source:
  type: "salesforce"
  config:
    instance_url: "https://mydomain.my.salesforce.com/"
    username: user@company
    password: password_for_user
    security_token: security_token_for_user
    platform_instance: mydomain-dev-ed
    domain:
      sales:
        allow:
          - "Opportunity$"
          - "Lead$"

    object_pattern:
      allow:
        - "Account$"
        - "Opportunity$"
        - "Lead$"

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
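The `object_pattern` and `domain` entries in this recipe are regular expressions. The sketch below shows how such allow/deny lists are commonly evaluated, assuming DataHub-style semantics: `re.match`, so patterns are anchored at the start of the name, and deny patterns win. Case sensitivity in the real implementation may differ, and `is_allowed` and `assign_domain` are hypothetical helpers.

```python
import re
from typing import Dict, Optional, Sequence


def is_allowed(name: str, allow: Sequence[str] = (".*",), deny: Sequence[str] = ()) -> bool:
    """Deny patterns win; otherwise the name must match at least one allow pattern."""
    # re.match anchors at the start, so "Lead$" matches "Lead" but not "MyLead".
    if any(re.match(p, name) for p in deny):
        return False
    return any(re.match(p, name) for p in allow)


def assign_domain(name: str, domains: Dict[str, Sequence[str]]) -> Optional[str]:
    """Return the first domain whose allow patterns match the object name."""
    for domain, allow in domains.items():
        if is_allowed(name, allow):
            return domain
    return None
```

With the recipe above, `Account`, `Opportunity` and `Lead` are ingested, and only `Opportunity` and `Lead` land in the `sales` domain.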
3 changes: 3 additions & 0 deletions metadata-ingestion/setup.py
@@ -262,6 +262,7 @@ def get_long_description():
"redshift": sql_common | redshift_common,
"redshift-usage": sql_common | usage_common | redshift_common,
"sagemaker": aws_common,
"salesforce": {"simple-salesforce"},
"snowflake": snowflake_common,
"snowflake-usage": snowflake_common
| usage_common
@@ -366,6 +367,7 @@ def get_long_description():
"starburst-trino-usage",
"powerbi",
"vertica",
"salesforce"
# airflow is added below
]
for dependency in plugins[plugin]
@@ -509,6 +511,7 @@ def get_long_description():
"vertica = datahub.ingestion.source.sql.vertica:VerticaSource",
"presto-on-hive = datahub.ingestion.source.sql.presto_on_hive:PrestoOnHiveSource",
"pulsar = datahub.ingestion.source.pulsar:PulsarSource",
"salesforce = datahub.ingestion.source.salesforce:SalesforceSource",
],
"datahub.ingestion.sink.plugins": [
"file = datahub.ingestion.sink.file:FileSink",
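The new `datahub.ingestion.source.plugins` entry registers the source under the name `salesforce` as a `name = module:attribute` string. As a rough illustration of what that string encodes, this sketch parses and resolves one with `importlib`. The helpers are hypothetical; in practice DataHub's plugin registry and `importlib.metadata` do this work.

```python
import importlib
from typing import Any, Tuple


def parse_entry_point(spec: str) -> Tuple[str, str, str]:
    """Split 'name = package.module:Attr' into (name, module, attr)."""
    name, sep, target = spec.partition("=")
    module, colon, attr = target.strip().partition(":")
    if not sep or not colon:
        raise ValueError(f"not an entry point spec: {spec!r}")
    return name.strip(), module.strip(), attr.strip()


def load_entry_point(spec: str) -> Any:
    """Import the module and walk the (possibly dotted) attribute path."""
    _, module, attr = parse_entry_point(spec)
    obj: Any = importlib.import_module(module)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj
```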