[docs] Vendor-Specific limitations added to troubleshooting sections (#38351)

Co-authored-by: Stephane Geneix <147216312+stephane-airbyte@users.noreply.github.com>
evantahler and stephane-airbyte authored May 21, 2024
1 parent 5dac10a commit a914027
Showing 12 changed files with 285 additions and 85 deletions.
25 changes: 12 additions & 13 deletions docs/integrations/destinations/postgres.md
@@ -2,21 +2,11 @@

This page guides you through the process of setting up the Postgres destination connector.

:::caution
## Warning

Postgres, while an excellent relational database, is not a data warehouse. Please only consider using postgres as a destination for small data volumes (e.g. less than 10GB) or for testing purposes. For larger data volumes, we recommend using a data warehouse like BigQuery, Snowflake, or Redshift.
:::warning

1. Postgres is likely to perform poorly with large data volumes. Even postgres-compatible
destinations (e.g. AWS Aurora) are not immune to slowdowns when dealing with large writes or
updates over ~100GB. Especially when using [typing and deduplication](/using-airbyte/core-concepts/typing-deduping) with `destination-postgres`, be sure to
monitor your database's memory and CPU usage during your syncs. It is possible for your
destination to 'lock up', and incur high usage costs with large sync volumes.
2. When attempting to scale a postgres database to handle larger data volumes, scaling IOPS (disk throughput) is as important as increasing memory and compute capacity.
3. Postgres column [name length limitations](https://www.postgresql.org/docs/current/limits.html)
are likely to cause collisions when used as a destination receiving data from highly-nested and
flattened sources, e.g. `{63 byte name}_a` and `{63 byte name}_b` will both be truncated to
`{63 byte name}` which causes postgres to throw an error that a duplicate column name was
specified. This limit is applicable to table names too.
Postgres, while an excellent relational database, is not a data warehouse. Please only consider using Postgres as a destination for small data volumes (e.g., less than 10GB) or for testing purposes. For larger data volumes, we recommend using a data warehouse like BigQuery, Snowflake, or Redshift. Learn more [here](/integrations/destinations/postgres/postgres-troubleshooting#postgres-is-not-a-data-warehouse).

:::

@@ -261,6 +251,15 @@ Now that you have set up the Postgres destination connector, check out the follo
- [Migrate from mysql to postgres](https://airbyte.com/tutorials/migrate-from-mysql-to-postgresql)
- [Postgres replication](https://airbyte.com/tutorials/postgres-replication)

## Vendor-Specific Connector Limitations

:::warning

Not all implementations or deployments of a database will be the same. This section lists specific limitations and known issues with the connector based on _how_ or
_where_ it is deployed.

:::

## Changelog

| Version | Date | Pull Request | Subject |
35 changes: 35 additions & 0 deletions docs/integrations/destinations/postgres/postgres-troubleshooting.md
@@ -0,0 +1,35 @@
# Troubleshooting Postgres Destinations

## Connector Limitations

### Postgres is not a Data Warehouse

:::danger

Postgres, while an excellent relational database, is not a data warehouse. Please only consider using Postgres as a destination for small data volumes (e.g., less than 10GB) or for testing purposes. For larger data volumes, we recommend using a data warehouse like BigQuery, Snowflake, or Redshift.

:::

1. Postgres is likely to perform poorly with large data volumes. Even postgres-compatible
destinations (e.g. AWS Aurora) are not immune to slowdowns when dealing with large writes or
updates over ~100GB. Especially when using [typing and deduplication](/using-airbyte/core-concepts/typing-deduping) with `destination-postgres`, be sure to
monitor your database's memory and CPU usage during your syncs. It is possible for your
destination to 'lock up', and incur high usage costs with large sync volumes.
2. When attempting to scale a postgres database to handle larger data volumes, scaling IOPS (disk throughput) is as important as increasing memory and compute capacity.
3. Postgres column [name length limitations](https://www.postgresql.org/docs/current/limits.html)
are likely to cause collisions when used as a destination receiving data from highly-nested and
flattened sources, e.g. `{63 byte name}_a` and `{63 byte name}_b` will both be truncated to
`{63 byte name}` which causes postgres to throw an error that a duplicate column name was
specified. This limit is applicable to table names too.
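
As an illustration, here is a minimal sketch, assuming a reachable PostgreSQL instance and the `psycopg2` package (the DSN and table name below are hypothetical), that reproduces the duplicate-column error caused by 63-byte identifier truncation:

```python
# Minimal sketch: two distinct column names that share their first 63 bytes
# are truncated to the same identifier, so PostgreSQL rejects the table.
import psycopg2

long_prefix = "x" * 63  # already at PostgreSQL's 63-byte identifier limit

conn = psycopg2.connect("dbname=airbyte_test user=postgres")  # hypothetical DSN
conn.autocommit = True
with conn.cursor() as cur:
    try:
        cur.execute(
            f'CREATE TABLE collision_demo ("{long_prefix}_a" text, "{long_prefix}_b" text)'
        )
    except psycopg2.ProgrammingError as exc:
        # SQLSTATE 42701 (duplicate_column): both names were truncated to the
        # same 63-byte identifier.
        print(f"collision after truncation: {exc}")
conn.close()
```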

### Vendor-Specific Connector Limitations

:::warning

Not all implementations or deployments of a database will be the same. This section lists specific limitations and known issues with the connector based on _how_ or _where_ it is deployed.

:::

#### Disk Access

The Airbyte Postgres destination relies on sending files to the database's temporary storage and then loading them in bulk. If your Postgres database does not have access to the `/tmp` file system, data loading will not succeed.
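
As a quick check, a sketch like the following, assuming `psycopg2` and a role allowed to run server-side `COPY` (superuser or `pg_write_server_files`; the DSN and file name are hypothetical), probes whether the database server itself can write to `/tmp`:

```python
# Minimal probe: server-side COPY writes on the database host's filesystem,
# so a failure here suggests the server cannot stage files under /tmp.
import psycopg2

conn = psycopg2.connect("dbname=airbyte_test user=postgres")  # hypothetical DSN
conn.autocommit = True
with conn.cursor() as cur:
    try:
        cur.execute("COPY (SELECT 1) TO '/tmp/airbyte_tmp_probe.csv' WITH (FORMAT csv)")
        print("server-side /tmp is writable")
    except psycopg2.Error as exc:
        print(f"server cannot write to /tmp: {exc}")
conn.close()
```
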
23 changes: 18 additions & 5 deletions docs/integrations/destinations/s3.md
@@ -19,7 +19,7 @@ Otherwise, if you are using AWS credentials you must provide the following:
- **Access Key ID**
- **Secret Access Key**

If you are using an Instance Profile, you may omit the Access Key ID and Secret Access Key,
If you are using an Instance Profile, you may omit the Access Key ID and Secret Access Key,
as well as the Role ARN.

Additionally the following prerequisites are required:
@@ -46,9 +46,11 @@ support for AWS's
[shared responsibility model](https://aws.amazon.com/compliance/shared-responsibility-model/)

#### Create a Bucket Policy

1. Open the [IAM console](https://console.aws.amazon.com/iam/home#home).
2. In the IAM dashboard, select **Policies**, then click **Create Policy**.
3. Select the **JSON** tab, then paste the following JSON into the Policy editor (be sure to substitute in your bucket name):

```json
{
"Version": "2012-10-17",
@@ -83,14 +85,17 @@ At this time, object-level permissions alone are not sufficient to successfully
#### Authentication Option 1: Using an IAM Role (Most secure)

<!-- env:cloud -->

:::note
This authentication method is currently in the testing phase. To enable it for your workspace, please contact our Support Team.
:::

<!-- /env:cloud -->

1. In the IAM dashboard, click **Roles**, then **Create role**. <!-- env:oss -->
2. Choose the appropriate trust entity and attach the policy you created.
3. Set up a trust relationship for the role. For example, for the **AWS account** trusted entity, use the default AWS account on your instance (it will be used to assume the role). To use an **External ID**, set it as an environment variable: `export AWS_ASSUME_ROLE_EXTERNAL_ID="{your-external-id}"`. Edit the trust relationship policy to reflect this:

```
{
"Version": "2012-10-17",
@@ -109,11 +114,14 @@ This authentication method is currently in the testing phase. To enable it for y
}
]
}
```
```

<!-- /env:oss -->
<!-- env:cloud -->

2. Choose the **AWS account** trusted entity type.
3. Set up a trust relationship for the role. This allows the Airbyte instance's AWS account to assume this role. You will also need to specify an external ID, which is a secret key that the trusting service (Airbyte) and the trusted role (the role you're creating) both know. This ID is used to prevent the "confused deputy" problem. The External ID should be your Airbyte workspace ID, which can be found in the URL of your workspace page. (A sketch for verifying the assumed role appears after these steps.) Edit the trust relationship policy to include the external ID:

```
{
"Version": "2012-10-17",
@@ -133,11 +141,12 @@ This authentication method is currently in the testing phase. To enable it for y
]
}
```

<!-- /env:cloud -->

4. Complete the role creation and note the Role ARN.
5. Select **Attach policies directly**, then find and check the box for your new policy. Click **Next**, then **Add permissions**.
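
As a rough way to verify the trust relationship above, a sketch like the following (assuming `boto3` is installed; the role ARN and external ID are placeholders) attempts to assume the role with the external ID and prints when the temporary credentials expire:

```python
# Minimal sketch: assume the role using the external ID configured in the
# trust policy; a successful call returns temporary credentials.
import boto3

sts = boto3.client("sts")
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/airbyte-s3-destination",  # hypothetical ARN
    RoleSessionName="airbyte-trust-check",
    ExternalId="your-airbyte-workspace-id",  # must match sts:ExternalId in the trust policy
)
print("temporary credentials expire at:", response["Credentials"]["Expiration"])
```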


#### Authentication Option 2: Using an IAM User

Use an existing or create new
@@ -212,7 +221,7 @@ Use an existing or create new
on how to create an instanceprofile. _ We recommend creating an Airbyte-specific user. This user
will require
[read and write permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_s3_rw-bucket.html)
to objects in the staging bucket. _ If the Access Key and Secret Access Key are not provided, the
to objects in the staging bucket. \_ If the Access Key and Secret Access Key are not provided, the
authentication will rely either on the Role ARN using STS Assume Role or on the instanceprofile.
5. _ **Secret Access Key** _ Corresponding key to
the above key id. _ Make sure your S3 bucket is accessible from the machine running Airbyte. _
@@ -237,7 +246,7 @@ Use an existing or create new
placeholders, as they won't be recognized.
<!-- /env:oss -->

5. Click `Set up destination`.
6. Click `Set up destination`.

The full path of the output data with the default S3 Path Format
`${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_` is:
@@ -494,6 +503,10 @@ In order for everything to work correctly, it is also necessary that the user wh
}
```

## Limitations & Troubleshooting

To see connector limitations or to troubleshoot your S3 connector, see our [S3 troubleshooting guide](/integrations/destinations/s3/s3-troubleshooting).

## CHANGELOG

| Version | Date | Pull Request | Subject |
16 changes: 16 additions & 0 deletions docs/integrations/destinations/s3/s3-troubleshooting.md
@@ -0,0 +1,16 @@
# Troubleshooting S3 Destinations

## Connector Limitations

### Vendor-Specific Connector Limitations

:::warning

Not all implementations or deployments of S3-compatible destinations will be the same. This section lists specific limitations and known issues with the connector based on _how_ or _where_ it is deployed.

:::

#### Linode Object Storage

Linode Object Storage does not properly return ETags after they are set, and Airbyte relies on ETags to verify data integrity. This makes Linode Object Storage currently incompatible with Airbyte.
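
To illustrate the dependency, here is a minimal sketch, assuming `boto3` and a hypothetical endpoint, bucket, and key, of the kind of ETag check that breaks when a provider does not return proper ETags (for a single-part, unencrypted upload, S3 reports the MD5 of the object body as its ETag):

```python
# Minimal sketch: upload a small object and compare its MD5 against the ETag
# the service reports (valid for single-part, unencrypted uploads).
import hashlib
import boto3

s3 = boto3.client("s3", endpoint_url="https://us-east-1.linodeobjects.com")  # hypothetical endpoint
body = b"id,name\n1,airbyte\n"
resp = s3.put_object(Bucket="my-bucket", Key="etag-probe.csv", Body=body)  # hypothetical bucket/key

expected = hashlib.md5(body).hexdigest()
returned = resp["ETag"].strip('"')
print("ETag matches body MD5" if returned == expected else f"ETag mismatch: {returned} != {expected}")
```
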
42 changes: 3 additions & 39 deletions docs/integrations/sources/mongodb-v2.md
@@ -172,43 +172,7 @@ When Schema is not enforced there is no way to deselect fields as all fields ar

## Limitations & Troubleshooting

### MongoDB Oplog and Change Streams

[MongoDB's Change Streams](https://www.mongodb.com/docs/manual/changeStreams/) are based on the [Replica Set Oplog](https://www.mongodb.com/docs/manual/core/replica-set-oplog/). This has retention limitations. Syncs that run less frequently than the retention period of the Oplog may encounter issues with missing data.

We recommend adjusting the Oplog size for your MongoDB cluster to ensure it holds at least 24 hours of changes. For optimal results, we suggest expanding it to maintain a week's worth of data. To adjust your Oplog size, see the corresponding tutorials for [MongoDB Atlas](https://www.mongodb.com/docs/atlas/cluster-additional-settings/#set-oplog-size) (fully-managed) and [MongoDB shell](https://www.mongodb.com/docs/manual/tutorial/change-oplog-size/) (self-hosted).

If you are running into an issue similar to "invalid resume token", it may mean you need to:

1. Increase the Oplog retention period.
2. Increase the Oplog size.
3. Increase the Airbyte sync frequency.

You can run the commands outlined [in this tutorial](https://www.mongodb.com/docs/manual/tutorial/troubleshoot-replica-sets/#check-the-size-of-the-oplog) to verify the current size of your Oplog. The expected output is:

```yaml
configured oplog size: 10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time: Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time: Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now: Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)
```
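
A sketch along these lines, assuming `pymongo` and a hypothetical replica-set connection string, reports a similar oplog window directly from the `local.oplog.rs` collection:

```python
# Minimal sketch: read the first and last oplog entries and report the window
# they span, so it can be compared against the Airbyte sync interval.
from pymongo import MongoClient

client = MongoClient("mongodb://user:pass@host:27017/?replicaSet=rs0")  # hypothetical URI
oplog = client["local"]["oplog.rs"]

first = oplog.find_one(sort=[("ts", 1)])
last = oplog.find_one(sort=[("ts", -1)])
window_hours = (last["ts"].time - first["ts"].time) / 3600
print(f"oplog window: {window_hours:.1f} hours")
```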

When importing a large MongoDB collection for the first time, the import duration might exceed the Oplog retention period. The Oplog is crucial for incremental updates, and an invalid resume token will require the MongoDB collection to be re-imported to ensure no source updates were missed.

### Supported MongoDB Clusters

- Only supports [replica set](https://www.mongodb.com/docs/manual/replication/) cluster type.
- TLS/SSL is required by this connector. TLS/SSL is enabled by default for MongoDB Atlas clusters. To enable TLS/SSL connections for a self-hosted MongoDB instance, please refer to the [MongoDB Documentation](https://docs.mongodb.com/manual/tutorial/configure-ssl/).
- Views, capped collections and clustered collections are not supported.
- Empty collections are excluded from schema discovery.
- Collections with different data types for the values in the `_id` field among the documents in a collection are not supported. All `_id` values within the collection must be the same data type.
- Atlas DB clusters are only supported in a dedicated M10 tier and above. Lower tiers may fail during connection setup.

### Schema Discovery & Enforcement

- Schema discovery uses [sampling](https://www.mongodb.com/docs/manual/reference/operator/aggregation/sample/) of the documents to collect all distinct top-level fields. This value is universally applied to all collections discovered in the target database. The approach is modelled after [MongoDB Compass sampling](https://www.mongodb.com/docs/compass/current/sampling/) and is used for efficiency. By default, 10,000 documents are sampled. This value can be increased up to 100,000 documents to increase the likelihood that all fields will be discovered. However, the trade-off is time, as a higher value will take the process longer to sample the collection. A sketch of this sampling approach appears after this list.
- When running with Schema Enforced set to `false`, there is no attempt to discover any schema. See more in [Schema Enforcement](#Schema-Enforcement).
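
The sketch below, assuming `pymongo` and hypothetical database and collection names, mirrors that sampling approach: draw a `$sample` of documents and collect their distinct top-level keys.

```python
# Minimal sketch: sample documents and gather the distinct top-level field
# names, similar in spirit to sampling-based schema discovery.
from pymongo import MongoClient

client = MongoClient("mongodb://user:pass@host:27017/?replicaSet=rs0")  # hypothetical URI
collection = client["my_database"]["my_collection"]  # hypothetical names

discovered_fields = set()
for document in collection.aggregate([{"$sample": {"size": 10_000}}]):
    discovered_fields.update(document.keys())
print(sorted(discovered_fields))
```
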
To see connector limitations or to troubleshoot your MongoDB connector, see our [MongoDB troubleshooting guide](/integrations/sources/mongodb-v2/mongodb-v2-troubleshooting).

## Configuration Parameters

@@ -231,8 +195,8 @@ For more information regarding configuration parameters, please see [MongoDb Doc
## Changelog

| Version | Date | Pull Request | Subject |
|:--------|:-----------| :------------------------------------------------------- |:----------------------------------------------------------------------------------------------------------|
| 1.3.13 | 2024-05-09 | [36851](https://github.com/airbytehq/airbyte/pull/36851) | Support reading collection with a binary _id type. |
| :------ | :--------- | :------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------- |
| 1.3.13 | 2024-05-09 | [36851](https://github.com/airbytehq/airbyte/pull/36851) | Support reading collection with a binary \_id type. |
| 1.3.12 | 2024-05-07 | [36851](https://github.com/airbytehq/airbyte/pull/36851) | Upgrade debezium to version 2.5.1. |
| 1.3.11 | 2024-05-02 | [37753](https://github.com/airbytehq/airbyte/pull/37753) | Chunk size(limit) should correspond to ~1GB of data. |
| 1.3.10 | 2024-05-02 | [37781](https://github.com/airbytehq/airbyte/pull/37781) | Adopt latest CDK. |
@@ -1,16 +1,16 @@
# MongoDb Migration Guide
# Mongo DB Migration Guide

## Upgrading to 1.0.0

This version introduces a general availability version of the MongoDB V2 source connector, which leverages
[Change Data Capture (CDC)](https://docs.airbyte.com/understanding-airbyte/cdc) to improve the performance and
[Change Data Capture (CDC)](/understanding-airbyte/cdc) to improve the performance and
reliability of syncs. This version provides better error handling, incremental delivery of data and improved
reliability of large syncs via frequent checkpointing.

**THIS VERSION INCLUDES BREAKING CHANGES FROM PREVIOUS VERSIONS OF THE CONNECTOR!**

The changes will require you to reconfigure your existing MongoDB V2 configured source connectors. To review the
breaking changes and to learn how to upgrade the connector, refer to the [MongoDB V2 source connector documentation](mongodb-v2#upgrade-from-previous-version).
breaking changes and to learn how to upgrade the connector, refer to the [MongoDB V2 source connector documentation](/integrations/sources/mongodb-v2#upgrade-from-previous-version).
Additionally, you can manually update existing connections prior to the next scheduled sync to perform the upgrade or
re-create the source using the new configuration.

@@ -22,4 +22,4 @@ Worthy of specific mention, this version includes:
- Sampling of fields for schema discovery
- Required SSL/TLS connections

Learn more about what's new in the connection, view the updated documentation [here](mongodb-v2).
To learn more about what's new in the connection, view the updated documentation [here](/integrations/sources/mongodb-v2/).
