Modified readme to reflect the updated features in HarbourBridge (#391)
* updated readme

* addressing comments

* addressed comments

* addressing comments
shreyakhajanchi authored Nov 15, 2022
1 parent bf73edf commit 828e86a
Showing 7 changed files with 54 additions and 41 deletions.
48 changes: 38 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -7,18 +7,15 @@ migration, using data from an existing PostgreSQL, MySQL, SQL Server, Oracle or
The tool ingests schema and data from either a pg_dump/mysqldump file or directly
from the source database, and supports both schema and data migration. For schema
migration, HarbourBridge automatically builds a Spanner schema from the schema
of the source database. This schema can be customized using the HarbourBridge schema assistant, and
a new Spanner database is created using the resulting schema.

For more details on schema customization and use of the schema assistant, see
[web/README](webv2/README.md). The rest of this README describes the command-line
capabilities of HarbourBridge.

HarbourBridge is designed to simplify Spanner evaluation and migration.
Certain features of relational databases, especially those that don't
map directly to Spanner features, are ignored, e.g. stored functions and
procedures, and sequences. Types such as integers, floats, char/text, bools,
timestamps, and (some) array types, map fairly directly to Spanner, but many
@@ -31,6 +28,14 @@ critical things like tuning performance and getting the most out of
Spanner. Expect that you'll need to tweak and enhance what HarbourBridge
produces.

## Data Migration

HarbourBridge supports two types of data migrations:

* Streaming migration - A streaming migration consists of two components: migration of the existing data from the source database, and the stream of changes (writes and updates) made to the source database during the migration, referred to as change data capture (CDC). Using HarbourBridge, the entire process, in which Datastream reads data from the source database and writes it to a GCS bucket, and Dataflow reads data from the GCS bucket and writes it to the Spanner database, can be orchestrated through a unified interface. Performing schema changes on the source database during the migration is not supported. This is the suggested mode of migration for most databases.

* Bulk migration - HarbourBridge reads data from the source database and writes it to the database created in Cloud Spanner. Changes made to the source database during the bulk migration may or may not be written to Spanner. To get a consistent version of the data, stop writes on the source while the migration is in progress, or use a read replica. Performing schema changes on the source database during the migration is not supported. While there is no technical limit on the size of the database, this mode is recommended for migrating moderate-size datasets to Spanner (up to about 100GB).
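To make the bulk mode concrete, here is a sketch of a single `schema-and-data` run (this combined mode is described later in this README); the host, credentials, instance, and database names below are placeholders:

```sh
# Bulk migration sketch: schema conversion plus data copy in one run.
# All profile values are placeholders; substitute your own.
harbourbridge schema-and-data -source=mysql \
  -source-profile="host=localhost,port=3306,user=root,password=secret,dbName=mydb" \
  -target-profile="instance=my-spanner-instance,dbName=my-spanner-database"
```

For a streaming migration, the `streamingCfg` option described under [Source Profile](#source-profile) points HarbourBridge at the streaming configuration.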

For some quick starter examples on how to run HarbourBridge, take a look at
[Quickstart Guide](#quickstart-guide).

@@ -308,7 +313,7 @@ This will print the usage pattern, a few examples, and a list of all available subcommands.
#### harbourbridge `schema`
This subcommand can be used to perform schema conversion and report on the quality of the conversion. The generated schema mapping file (session.json) can then be further edited using the HarbourBridge web UI to make custom edits to the destination schema. This session file
is then passed to the data subcommand to perform data migration while honoring the defined
schema mapping. HarbourBridge also generates a Spanner schema, which users can modify manually and use directly as well.
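For instance, the two-step flow described above might look like the following sketch; the file, instance, and database names are placeholders:

```sh
# Step 1: schema conversion. Writes a session file such as mydb.session.json,
# along with the generated Spanner schema and a conversion report.
harbourbridge schema -source=mysql < my_mysqldump_file

# Step 2: data migration, honoring the schema mapping captured in the session file.
harbourbridge data -session=mydb.session.json -source=mysql \
  -target-profile="instance=my-spanner-instance,dbName=my-spanner-database" < my_mysqldump_file
```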
@@ -356,7 +361,7 @@ conversion state encoded as JSON.
`-target-profile` Specifies detailed parameters for the target database. See [Target Profile](#target-profile) for details.
`-dry-run` Controls whether the migration runs in dry-run mode. Using this mode generates the session file, schema, and report for schema and/or data conversion without actually creating the Spanner database.
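As an illustration, a dry run of schema conversion on a PostgreSQL dump might look like this sketch (the flag placement is an assumption based on the flag list above):

```sh
# Generates the session file, schema, and report only;
# no Spanner database is created.
harbourbridge schema -dry-run -source=postgres < my_pg_dump_file
```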
### Source Profile
@@ -375,6 +380,29 @@ have read permissions to the GCS bucket you would like to use.
defaults to `dump`. This may be extended in future to support other formats
such as `csv`, `avro` etc.
`host` Specifies the host name for the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
`user` Specifies the user for the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
`dbName` Specifies the name of the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
`port` Specifies the port for the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
`password` Specifies the password for the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
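Putting these connection parameters together, a direct-connection source profile might look like the following sketch; all values are placeholders:

```sh
# Direct connection: every key below is a placeholder value.
harbourbridge schema -source=mysql \
  -source-profile="host=localhost,port=3306,user=root,password=secret,dbName=mydb"
```

Any key omitted from the profile falls back to the environment variables as described above.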
`streamingCfg` Optional flag. Specifies the file path for the streaming config.
Note that streaming migration is currently only supported for MySQL and Oracle databases.
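For example, a streaming migration for a MySQL source might pass the config path alongside the connection parameters; the file name and other values here are placeholders, and the contents of the streaming config file are not covered in this README:

```sh
# Streaming migration sketch: streamingCfg added to the source profile.
harbourbridge data -session=mydb.session.json -source=mysql \
  -source-profile="host=localhost,port=3306,user=root,password=secret,dbName=mydb,streamingCfg=streaming_cfg.json" \
  -target-profile="instance=my-spanner-instance,dbName=my-spanner-database"
```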
### Target Profile
HarbourBridge accepts the following options for --target-profile,
13 changes: 3 additions & 10 deletions sources/dynamodb/README.md
@@ -30,28 +30,21 @@ can also pass corresponding source profile connection parameters `aws-access-key-id`
, `aws-secret-access-key`, `aws-region`. Custom endpoint can be specified using
`dydb-endpoint` param.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=dynamodb -source-profile="aws-access-key-id=<>,aws-secret-access-key=<>,aws-region=<>"
```

This will generate a session file with `session.json` suffix. This file contains
schema mapping from source to destination. You will need to specify this file
during data migration. You also need to specify a particular Spanner instance and database to use
during data migration.

For example, run

```sh
harbourbridge data -session=mydb.session.json -source=dynamodb -source-profile="aws-access-key-id=<>,..." -target-profile="instance=my-spanner-instance,dbName=my-spanner-database-name"
```

You can also run HarbourBridge in a schema-and-data mode, where it will perform both
13 changes: 3 additions & 10 deletions sources/mysql/README.md
@@ -15,26 +15,19 @@ in the [Installing HarbourBridge](https://github.com/cloudspannerecosystem/harbo

### Using HarbourBridge with mysqldump

The tool can be used to migrate schema from an existing mysqldump file:

```sh
harbourbridge schema -source=mysql < my_mysqldump_file
```

This will generate a session file with `session.json` suffix. This file contains
schema mapping from source to destination. You will need to specify this file
during data migration. You also need to specify a particular Spanner instance and database to use
during data migration.

For example, run

```sh
harbourbridge data -session=mydb.session.json -source=mysql -target-profile="instance=my-spanner-instance,dbName=my-spanner-database-name" < my_mysqldump_file
```
@@ -71,7 +64,7 @@ In this case, HarbourBridge connects directly to the MySQL database to retrieve
table schema and data. Set the `-source=mysql` and corresponding source profile
connection parameters `host`, `port`, `user`, `dbName` and `password`.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=mysql -source-profile="host=<>,port=<>,user=<>,dbName=<>"
```
2 changes: 1 addition & 1 deletion sources/oracle/README.md
@@ -23,7 +23,7 @@ retrieve table schema and data. Set the `-source=oracle` and corresponding
source profile connection parameters `host`, `port`, `user`, `dbName` and
`password`.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=oracle -source-profile="host=<>,port=<>,user=<>,dbName=<>,password=<>"
```
11 changes: 2 additions & 9 deletions sources/postgres/README.md
@@ -26,18 +26,11 @@ You can use any of `postgresql`, `postgres`, or `pg` as the argument to the

This will generate a session file with `session.json` suffix. This file contains
schema mapping from source to destination. You will need to specify this file
during data migration. You also need to specify a particular Spanner instance and database to use
during data migration.

For example, run

```sh
harbourbridge data -session=mydb.session.json -source=pg -target-profile="instance=my-spanner-instance,dbName=my-spanner-database-name" < my_pg_dump_file
```
@@ -75,7 +68,7 @@ retrieve table schema and data. Set the `-source=postgres` and corresponding
source profile connection parameters `host`, `port`, `user`, `dbName` and
`password`.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=postgres -source-profile="host=<>,port=<>,user=<>,dbName=<>"
```
2 changes: 1 addition & 1 deletion sources/sqlserver/README.md
@@ -23,7 +23,7 @@ retrieve table schema and data. Set the `-source=sqlserver` and corresponding
source profile connection parameters `host`, `port`, `user`, `dbName` and
`password`.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=sqlserver -source-profile="host=<>,port=<>,user=<>,dbName=<>"
```
6 changes: 6 additions & 0 deletions webv2/README.md
@@ -19,6 +19,12 @@ alias harbourbridge="go run github.com/cloudspannerecosystem/harbourbridge"
HarbourBridge's Web API feature can be used with all the driver modes available,
using mysql or postgres dump or direct connection.

To generate the HarbourBridge binary, run:

```sh
make build
```

To start HarbourBridge web server, run:

```sh
```
