[AD-1014] Developer Guide. (#451)
* [AD-1014] Developer Guide.

* Commit Code Coverage Badge

* [AD-1014] Updates to use existing GETTING_STARTED.md and added schema-caching.md

* Commit Code Coverage Badge

Co-authored-by: birschick-bq <birschick-bq@users.noreply.github.com>
Bruce Irschick and birschick-bq authored Dec 12, 2022
1 parent b1ee65c commit fa7a513
Showing 4 changed files with 202 additions and 24 deletions.
102 changes: 79 additions & 23 deletions GETTING_STARTED.md
@@ -156,8 +156,45 @@ rather than the cluster endpoint since we have set up the SSH tunnel.
~~~
mongo --host 127.0.0.1:27017 --username <master-username> --password <master-password>
~~~

## Database User Account Definitions

The integration tests assume that the following two user accounts exist
on the target database server.

### Administrative User

User: `documentdb`

#### Definition

```json
{
  "user" : "documentdb",
  "roles" : [ {
    "db" : "admin",
    "role" : "root"
  } ]
}
```

### Restricted Access User

User: `docDbRestricted`

#### Definition

```json
{
  "user" : "docDbRestricted",
  "roles" : [ {
    "db" : "admin",
    "role" : "readAnyDatabase"
  } ]
}
```

##### Connect with TLS
## Connect with TLS
When connecting to a TLS-enabled cluster, follow the same steps to set up an SSH tunnel, but also
download the Amazon DocumentDB Certificate Authority (CA) file before trying to connect.
1. Download the CA file.
@@ -178,8 +215,8 @@ access the cluster from localhost, the server certificate does not match the hos
mongo --host 127.0.0.1:27017 --username <master-username> --password <master-password> --tls --tlsCAFile rds-combined-ca-bundle.pem --tlsAllowInvalidHostnames
~~~

##### Connect Programmatically
###### Without TLS
### Connect Programmatically
#### Without TLS
Connecting without TLS is very straightforward. We essentially follow the same steps as when connecting using the
`mongo` shell.
1. Set up the SSH tunnel. See step 3 in section [Setting Up Environment Variables](#setting-up-environment-variables) for
@@ -201,7 +238,7 @@ Make sure to set the hostname, username, password and target database. The targe
}
~~~

###### With TLS
#### With TLS
Connecting with TLS programmatically is slightly different from how we did it with the `mongo` shell.
1. Create a test or simple main to run.
2. Use either the Driver Manager, Data Source class or Connection class to establish a connection to `localhost:27017`.
@@ -224,36 +261,57 @@ class:
}
~~~

#### Setting Up Environment Variables
1. Create and set the Environment Variables:
## Integration Testing

~~~
DOC_DB_USER_NAME=<secret-username>
DOC_DB_PASSWORD=<secret-password>
DOC_DB_LOCAL_PORT=27019
DOC_DB_USER=<ec2-username>@<public-IPv4-DNS-name>
DOC_DB_HOST=<cluster-endpoint>
DOC_DB_PRIV_KEY_FILE=~/.ssh/<key-pair-name>.pem
~~~
By default, integration testing is disabled for local development. To enable
integration testing, follow the directions below.

### Setting Up Environment Variables

2. Ensure the private key file <key pair name>.pem is in the location set by the environment variable
The following environment variables let you customize the credentials and
DocumentDB cluster settings used for integration testing.

1. Create and set the following environment variables:

| Variable | Description | Example |
|------------------------|--------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| `DOC_DB_USER_NAME` | This is the DocumentDB user. | `documentdb` |
| `DOC_DB_PASSWORD` | This is the DocumentDB password. | `aSecret` |
| `DOC_DB_LOCAL_PORT`    | This is the port number used locally via an SSH Tunnel. It is recommended to use a value other than the default 27017.   | `27019`                                                                     |
| `DOC_DB_USER` | This is the user and host of SSH Tunnel EC2 instance. | `ec2-user@254.254.254.254` |
| `DOC_DB_HOST` | This is the host of the DocumentDB cluster server. | `docdb-jdbc-literal-test.cluster-abcdefghijk.us-east-2.docdb.amazonaws.com` |
| `DOC_DB_PRIV_KEY_FILE` | This is the path to the SSH Tunnel private key-pair file. | `~/.ssh/ec2-literal.pem` |
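
Before starting the tunnel or the tests, it can help to confirm that all six variables from the table are set. The following is a minimal sketch, not part of the project: the class name `IntegrationEnv` and the method `missing` are hypothetical and only illustrate the check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the driver): reports which of the
// integration-test variables from the table above are unset or blank.
final class IntegrationEnv {
    static final List<String> REQUIRED = Arrays.asList(
            "DOC_DB_USER_NAME", "DOC_DB_PASSWORD", "DOC_DB_LOCAL_PORT",
            "DOC_DB_USER", "DOC_DB_HOST", "DOC_DB_PRIV_KEY_FILE");

    // Returns the names of required variables that are missing from env,
    // in the order they appear in REQUIRED.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> unset = missing(System.getenv());
        if (unset.isEmpty()) {
            System.out.println("All integration test variables are set.");
        } else {
            System.out.println("Missing variables: " + unset);
        }
    }
}
```

Taking the environment as a `Map` rather than calling `System.getenv()` inside the check keeps the helper easy to exercise without changing the real environment.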

### SSH Tunnel

1. Ensure the private key file <key pair name>.pem is in the location set by the environment variable
`DOC_DB_PRIV_KEY_FILE`.
3. Start an SSH port-forwarding tunnel:
2. With the environment variables above set, start an SSH tunnel from the command line:

~~~shell
ssh [-f] -N -i $DOC_DB_PRIV_KEY_FILE -L $DOC_DB_LOCAL_PORT:$DOC_DB_HOST:27017 $DOC_DB_USER
~~~
ssh [-f] -N -i ~/.ssh/<key-pair-name>.pem -L $DOC_DB_LOCAL_PORT:$DOC_DB_HOST:27017 $DOC_DB_USER
~~~


- The `-L` flag defines the port forwarded to the remote host and remote port. The `-N` flag tells SSH not to
  execute a remote command, so you will not get a shell. The `-f` switch instructs SSH to run in the
  background.
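
The same command can be assembled from the environment variables in Java, for example to start the tunnel from a test harness. This is a sketch under assumptions, not code from the project: the class `SshTunnelCommand` and its methods are hypothetical, and the remote port `27017` is taken from the command above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the driver): assembles the ssh
// port-forwarding command shown above from the environment variables.
final class SshTunnelCommand {
    // Builds: ssh [-f] -N -i <key> -L <localPort>:<host>:27017 <user@host>
    static List<String> build(Map<String, String> env, boolean background) {
        List<String> cmd = new ArrayList<>();
        cmd.add("ssh");
        if (background) {
            cmd.add("-f"); // run ssh in the background
        }
        cmd.add("-N"); // do not execute a remote command (no shell)
        cmd.add("-i");
        cmd.add(require(env, "DOC_DB_PRIV_KEY_FILE"));
        cmd.add("-L"); // forward the local port to the remote host and port
        cmd.add(require(env, "DOC_DB_LOCAL_PORT") + ":" + require(env, "DOC_DB_HOST") + ":27017");
        cmd.add(require(env, "DOC_DB_USER"));
        return cmd;
    }

    // Fails fast with a readable message when a variable is not set.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Environment variable not set: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // Prints the command instead of running it; the list could be
        // passed to ProcessBuilder to actually start the tunnel.
        System.out.println(String.join(" ", build(System.getenv(), false)));
    }
}
```

Note that `ProcessBuilder` does not go through a shell, so a `~` in `DOC_DB_PRIV_KEY_FILE` would not be expanded; an absolute path is safer if the command is launched this way.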

#### Bypass Testing DocumentDB
### Enable Integration Testing of Amazon DocumentDB

To enable integration testing in the IDE, update the Gradle property as instructed below.

1. Modify the */gradle.properties* file in the source code and uncomment the following line:
`runRemoteIntegrationTests=false`
`runRemoteIntegrationTests=true`

### Project Secrets

For the purposes of automated integration testing in **GitHub**, this project maintains the values of the environment variables above
as project secrets. See the workflow file [gradle.yml](https://github.com/aws/amazon-documentdb-jdbc-driver/blob/1edd9e21fdcccfe62d366580702f2904136298e5/.github/workflows/gradle.yml).

## Troubleshooting

### Issues with JDK

1. Confirm the project SDK is Java Version 1.8 via the IntelliJ top menu toolbar under
   *File → Project Structure → Platform Settings → SDK* and reload the JDK home path by browsing to the path and clicking
   *apply* and *ok*. Restart IntelliJ IDEA.
@@ -277,5 +335,3 @@ class:
below. Go to EC2 Dashboard → **Network & Security** Group in the left menu → **Security** Group.

![Security Policy for EC2 Instance](src/markdown/images/getting-started/security-policy-ec2-instance.png)


7 changes: 6 additions & 1 deletion README.md
@@ -67,4 +67,9 @@ your issue.

## Security Notice

If you discover a potential security issue in this project, please consult our [security guidance page](SECURITY.md).
If you discover a potential security issue in this project, please consult our [security guidance page](SECURITY.md).

## Contributor's Getting Started Guide

If you're a developer and want to contribute to this project, be sure to read and follow the
[Getting Started as a Developer](GETTING_STARTED.md) guide.
6 changes: 6 additions & 0 deletions src/markdown/index.md
@@ -51,6 +51,12 @@ The Amazon DocumentDB JDBC driver can perform automatic schema discovery and gen
DocumentDB schema mapping. See the [schema discovery documentation](schema/schema-discovery.md)
for more details of this process.

## Schema Caching

Once a schema is discovered, it is cached in the database to improve performance on subsequent access.
See the [schema caching documentation](schema/schema-caching.md) to learn
more about schema caching behaviour and access requirements.

## Schema Management

The SQL to DocumentDB schema mapping can be managed in the following ways:
111 changes: 111 additions & 0 deletions src/markdown/schema/schema-caching.md
@@ -0,0 +1,111 @@
# Schema Caching

## Schema Caching Behaviour

When a connection is made to an Amazon DocumentDB database, the Amazon DocumentDB JDBC driver
checks for a previously cached version of the mapped schema. If a previous version exists,
the latest version of the cached schema is read and used for all further interaction with the database.

If a previously cached version does not exist, the process of [schema discovery](schema-discovery.md) is automatically
started on all the accessible collections in the database. The discovery process uses the properties
`scanMethod` (default `random`) and `scanLimit` (default `1000`) when sampling documents from the database.
At the end of the discovery process, the resulting schema mapping is written to the cache using the name
associated with the property `schemaName` (default `_default`).

If for some reason the resulting schema cannot be saved to the cache, it is still used
in memory for the life of the connection. Without a cached version of the schema,
schema discovery has to be performed for each connection, which can seriously
degrade performance.
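
The lookup order described above can be modelled in a few lines of Java. This is a simplified sketch, not the driver's implementation: the class `SchemaLookupSketch`, its fields, and the `Supplier` standing in for discovery are all hypothetical, and schema versions are ignored for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of the lookup order described above; the names are
// illustrative and are not the driver's API.
final class SchemaLookupSketch {
    static final String DEFAULT_SCHEMA_NAME = "_default";

    // Stand-in for the cache collections: schema name -> schema mapping.
    private final Map<String, String> cache = new HashMap<>();
    private final boolean cacheWritable;
    int discoveryRuns = 0;

    SchemaLookupSketch(boolean cacheWritable) {
        this.cacheWritable = cacheWritable;
    }

    // Called once per connection: use the cached schema if present,
    // otherwise discover it and try to save it for later connections.
    String schemaForConnection(String schemaName, Supplier<String> discovery) {
        String cached = cache.get(schemaName);
        if (cached != null) {
            return cached;
        }
        discoveryRuns++;
        String discovered = discovery.get();
        if (cacheWritable) {
            cache.put(schemaName, discovered);
        }
        // Without a writable cache the schema lives only in memory, so the
        // next connection has to run discovery again.
        return discovered;
    }

    public static void main(String[] args) {
        SchemaLookupSketch writable = new SchemaLookupSketch(true);
        SchemaLookupSketch readOnly = new SchemaLookupSketch(false);
        for (int i = 0; i < 3; i++) {
            writable.schemaForConnection(DEFAULT_SCHEMA_NAME, () -> "mapping");
            readOnly.schemaForConnection(DEFAULT_SCHEMA_NAME, () -> "mapping");
        }
        System.out.println("writable cache: " + writable.discoveryRuns + " discovery run(s)");
        System.out.println("read-only cache: " + readOnly.discoveryRuns + " discovery run(s)");
    }
}
```

In this model, three connections trigger one discovery run when the cache can be written and three when it cannot, which is the performance cost the paragraph above warns about.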

## Cache Location

The SQL schema mapping cache is stored in two collections on the same database as
the sampled collections. The collection `_sqlSchemas` stores the names and versions of
all the sampled schemas for the given database. The collection `_sqlTableSchemas` stores the
column to field mappings for all the cached SQL schema mappings. The two cache collections
have a strong parent/child relationship and must be maintained in a consistent way. Always use
the [schema management CLI](manage-schema-cli.md) to ensure consistency in the cache collections.

## User Permissions for Creating and Updating the Schema Cache

To store or update the SQL schema mappings in the cache collections, the connected
Amazon DocumentDB user account must have write permission to create and update the
cache collections. Once the schema is cached, users need only read permission on the
cache collections.

To allow access for an Amazon DocumentDB user, set or add the appropriate roles as
described below.

### Enable Access per Database

To allow read and write access to specific databases in your server, add
a `readWrite` [built-in role](https://www.mongodb.com/docs/manual/reference/built-in-roles/#mongodb-authrole-readWrite)
for each database in which the user should be able to create and update the cached schema.

```json
roles: [
{role: "readWrite", db: "yourDatabase1"},
{role: "readWrite", db: "yourDatabase2"} ...
]
```

### Enable Access for Any Database

To allow read and write access to any database in your server, add
the `readWriteAnyDatabase` [built-in role](https://www.mongodb.com/docs/manual/reference/built-in-roles/#mongodb-authrole-readWriteAnyDatabase)
on the `admin` database. The user can then create and update the cached schema in any database.

```json
roles: [
{role: "readWriteAnyDatabase", db: "admin"}
]
```

### Collection-Level Access Control

If [collection-level access control](https://www.mongodb.com/docs/manual/core/collection-level-access-control/)
is implemented, then ensure `find`, `insert`, and `update` actions are
allowed on the cache collections (`_sqlSchemas` and `_sqlTableSchemas`).

## User Permissions for Reading an Existing Schema Cache

To read the SQL schema mappings from the cache collections, the connected
Amazon DocumentDB user account must have read permission on the
cache collections.

To allow access for an Amazon DocumentDB user, set or add the appropriate roles as
described below.

### Enable Access per Database

To allow read access to specific databases in your server, add
a `read` [built-in role](https://www.mongodb.com/docs/manual/reference/built-in-roles/#mongodb-authrole-read)
for each database in which the user should be able to read the cached schema.

```json
roles: [
{role: "read", db: "yourDatabase1"},
{role: "read", db: "yourDatabase2"} ...
]
```

### Enable Access for Any Database

To allow read access to any database in your server, add
the `readAnyDatabase` [built-in role](https://www.mongodb.com/docs/manual/reference/built-in-roles/#mongodb-authrole-readAnyDatabase)
on the `admin` database. The user can then read the cached schema in any database.

```json
roles: [
{role: "readAnyDatabase", db: "admin"}
]
```

### Collection-Level Access Control

If [collection-level access control](https://www.mongodb.com/docs/manual/core/collection-level-access-control/)
is implemented, then ensure `find` actions are
allowed on the cache collections (`_sqlSchemas` and `_sqlTableSchemas`).
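
The two permission levels on this page can be summarised as a small check. This is only an illustrative sketch: the class `CachePrivileges` and its methods are hypothetical and are not part of the driver. The action names are the ones listed in the collection-level sections of this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not part of the driver): lists the actions this
// page says a user needs on the cache collections.
final class CachePrivileges {
    static final Set<String> CACHE_COLLECTIONS = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList("_sqlSchemas", "_sqlTableSchemas")));

    // Readers need find; users who create or update the cache also need
    // insert and update.
    static Set<String> requiredActions(boolean createsOrUpdatesCache) {
        Set<String> actions = new LinkedHashSet<>();
        actions.add("find");
        if (createsOrUpdatesCache) {
            actions.add("insert");
            actions.add("update");
        }
        return actions;
    }

    // Returns the required actions that are absent from the granted set.
    static Set<String> missingActions(Set<String> granted, boolean createsOrUpdatesCache) {
        Set<String> missing = requiredActions(createsOrUpdatesCache);
        missing.removeAll(granted);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> granted = Collections.singleton("find");
        for (String collection : CACHE_COLLECTIONS) {
            System.out.println(collection + " (read): missing " + missingActions(granted, false));
            System.out.println(collection + " (write): missing " + missingActions(granted, true));
        }
    }
}
```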
