Modified readme to reflect the updated features in HarbourBridge (#391)
* updated readme

* addressing comments

* addressed comments

* addressing comments
shreyakhajanchi authored Nov 15, 2022
1 parent bf73edf commit 828e86a
Showing 7 changed files with 54 additions and 41 deletions.
48 changes: 38 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -7,18 +7,15 @@ migration, using data from an existing PostgreSQL, MySQL, SQL Server, Oracle or
The tool ingests schema and data from either a pg_dump/mysqldump file or directly
from the source database, and supports both schema and data migration. For schema
migration, HarbourBridge automatically builds a Spanner schema from the schema
of the source database. This schema can be customized using the HarbourBridge schema assistant, and
a new Spanner database is created using the resulting schema.

For more details on schema customization and use of the schema assistant, see
[web/README](webv2/README.md). The rest of this README describes the command-line
capabilities of HarbourBridge.

HarbourBridge is designed to simplify Spanner evaluation and migration.
Certain features of relational databases, especially those that don't
map directly to Spanner features, are ignored, e.g. stored functions and
procedures, and sequences. Types such as integers, floats, char/text, bools,
timestamps, and (some) array types, map fairly directly to Spanner, but many
@@ -31,6 +28,14 @@ critical things like tuning performance and getting the most out of
Spanner. Expect that you'll need to tweak and enhance what HarbourBridge
produces.

## Data Migration

HarbourBridge supports two types of data migrations:

* Streaming migration - A streaming migration consists of two components: migration of the existing data from the source database, and the stream of changes (writes and updates) made to the source database during the migration, referred to as change data capture (CDC). Using HarbourBridge, the entire process, in which Datastream reads data from the source database and writes it to a GCS bucket, and Dataflow reads data from the GCS bucket and writes it to the Spanner database, can be orchestrated through a unified interface. Performing schema changes on the source database during the migration is not supported. This is the suggested mode of migration for most databases.

* Bulk migration - HarbourBridge reads data from the source database and writes it to the database created in Cloud Spanner. Changes made to the source database during the bulk migration may or may not be written to Spanner. To get a consistent version of the data, stop writes on the source while the migration is in progress, or use a read replica. Performing schema changes on the source database during the migration is not supported. While there is no technical limit on the size of the database, this mode is recommended for migrating moderate-size datasets to Spanner (up to about 100GB).
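To make the bulk mode concrete, here is a sketch of a single `schema-and-data` run (this combined mode is described later in this README); the host, credentials, instance, and database names below are placeholders:

```sh
# Bulk migration sketch: schema conversion plus data copy in one run.
# All profile values are placeholders; substitute your own.
harbourbridge schema-and-data -source=mysql \
  -source-profile="host=localhost,port=3306,user=root,password=secret,dbName=mydb" \
  -target-profile="instance=my-spanner-instance,dbName=my-spanner-database"
```

For a streaming migration, the `streamingCfg` option described under [Source Profile](#source-profile) points HarbourBridge at the streaming configuration.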

For some quick starter examples on how to run HarbourBridge, take a look at
[Quickstart Guide](#quickstart-guide).

@@ -308,7 +313,7 @@ This will print the usage pattern, a few examples, and a list of all available subcommands.
#### harbourbridge `schema`
This subcommand can be used to perform schema conversion and report on the quality of the conversion. The generated schema mapping file (session.json) can then be further edited using the HarbourBridge web UI to make custom edits to the destination schema. This session file
is then passed to the data subcommand to perform data migration while honoring the defined
schema mapping. HarbourBridge also generates a Spanner schema, which users can modify manually and use directly as well.
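For instance, the two-step flow described above might look like the following sketch; the file, instance, and database names are placeholders:

```sh
# Step 1: schema conversion. Writes a session file such as mydb.session.json,
# along with the generated Spanner schema and a conversion report.
harbourbridge schema -source=mysql < my_mysqldump_file

# Step 2: data migration, honoring the schema mapping captured in the session file.
harbourbridge data -session=mydb.session.json -source=mysql \
  -target-profile="instance=my-spanner-instance,dbName=my-spanner-database" < my_mysqldump_file
```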
@@ -356,7 +361,7 @@ conversion state encoded as JSON.
`-target-profile` Specifies detailed parameters for the target database. See [Target Profile](#target-profile) for details.
`-dry-run` Controls whether the migration runs in dry-run mode. Using this mode generates the session file, schema, and report for schema and/or data conversion without actually creating the Spanner database.
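As an illustration, a dry run of schema conversion on a PostgreSQL dump might look like this sketch (the flag placement is an assumption based on the flag list above):

```sh
# Generates the session file, schema, and report only;
# no Spanner database is created.
harbourbridge schema -dry-run -source=postgres < my_pg_dump_file
```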
### Source Profile
@@ -375,6 +380,29 @@ have read permissions to the GCS bucket you would like to use.
defaults to `dump`. This may be extended in future to support other formats
such as `csv`, `avro` etc.
`host` Specifies the host name for the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
`user` Specifies the user for the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
`dbName` Specifies the name of the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
`port` Specifies the port for the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
`password` Specifies the password for the source database.
If not specified when connecting directly to the source database, HarbourBridge
fetches it from the environment variables ([Example usage](#21-generating-pgdump-file)).
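Putting these connection parameters together, a direct-connection source profile might look like the following sketch; all values are placeholders:

```sh
# Direct connection: every key below is a placeholder value.
harbourbridge schema -source=mysql \
  -source-profile="host=localhost,port=3306,user=root,password=secret,dbName=mydb"
```

Any key omitted from the profile falls back to the environment variables as described above.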
`streamingCfg` Optional flag. Specifies the file path for the streaming config.
Note that streaming migration is currently only supported for MySQL and Oracle databases.
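For example, a streaming migration for a MySQL source might pass the config path alongside the connection parameters; the file name and other values here are placeholders, and the contents of the streaming config file are not covered in this README:

```sh
# Streaming migration sketch: streamingCfg added to the source profile.
harbourbridge data -session=mydb.session.json -source=mysql \
  -source-profile="host=localhost,port=3306,user=root,password=secret,dbName=mydb,streamingCfg=streaming_cfg.json" \
  -target-profile="instance=my-spanner-instance,dbName=my-spanner-database"
```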
### Target Profile
HarbourBridge accepts the following options for --target-profile,
13 changes: 3 additions & 10 deletions sources/dynamodb/README.md
@@ -30,28 +30,21 @@ can also pass corresponding source profile connection parameters `aws-access-key-id`
, `aws-secret-access-key`, `aws-region`. Custom endpoint can be specified using
`dydb-endpoint` param.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=dynamodb -source-profile="aws-access-key-id=<>,aws-secret-access-key=<>,aws-region=<>"
```

This will generate a session file with `session.json` suffix. This file contains
schema mapping from source to destination. You will need to specify this file
during data migration. You also need to specify a particular Spanner instance and database to use
during data migration.

For example, run

```sh
harbourbridge data -session=mydb.session.json -source=dynamodb -source-profile="aws-access-key-id=<>,..." -target-profile="instance=my-spanner-instance,dbName=my-spanner-database-name"
```

You can also run HarbourBridge in a schema-and-data mode, where it will perform both
13 changes: 3 additions & 10 deletions sources/mysql/README.md
@@ -15,26 +15,19 @@ in the [Installing HarbourBridge](https://github.com/cloudspannerecosystem/harbo

### Using HarbourBridge with mysqldump

The tool can be used to migrate schema from an existing mysqldump file:

```sh
harbourbridge schema -source=mysql < my_mysqldump_file
```

This will generate a session file with `session.json` suffix. This file contains
schema mapping from source to destination. You will need to specify this file
during data migration. You also need to specify a particular Spanner instance and database to use
during data migration.

For example, run

```sh
harbourbridge data -session=mydb.session.json -source=mysql -target-profile="instance=my-spanner-instance,dbName=my-spanner-database-name" < my_mysqldump_file
```
@@ -71,7 +64,7 @@ In this case, HarbourBridge connects directly to the MySQL database to retrieve
table schema and data. Set the `-source=mysql` and corresponding source profile
connection parameters `host`, `port`, `user`, `dbName` and `password`.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=mysql -source-profile="host=<>,port=<>,user=<>,dbName=<>"
```
2 changes: 1 addition & 1 deletion sources/oracle/README.md
@@ -23,7 +23,7 @@ retrieve table schema and data. Set the `-source=oracle` and corresponding
source profile connection parameters `host`, `port`, `user`, `dbName` and
`password`.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=oracle -source-profile="host=<>,port=<>,user=<>,dbName=<>,password=<>"
```
11 changes: 2 additions & 9 deletions sources/postgres/README.md
@@ -26,18 +26,11 @@ You can use any of `postgresql`, `postgres`, or `pg` as the argument to the

This will generate a session file with `session.json` suffix. This file contains
schema mapping from source to destination. You will need to specify this file
during data migration. You also need to specify a particular Spanner instance and database to use
during data migration.

For example, run

```sh
harbourbridge data -session=mydb.session.json -source=pg -target-profile="instance=my-spanner-instance,dbName=my-spanner-database-name" < my_pg_dump_file
```
@@ -75,7 +68,7 @@ retrieve table schema and data. Set the `-source=postgres` and corresponding
source profile connection parameters `host`, `port`, `user`, `dbName` and
`password`.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=postgres -source-profile="host=<>,port=<>,user=<>,dbName=<>"
```
2 changes: 1 addition & 1 deletion sources/sqlserver/README.md
@@ -23,7 +23,7 @@ retrieve table schema and data. Set the `-source=sqlserver` and corresponding
source profile connection parameters `host`, `port`, `user`, `dbName` and
`password`.

For example, to perform schema conversion, run

```sh
harbourbridge schema -source=sqlserver -source-profile="host=<>,port=<>,user=<>,dbName=<>"
```
6 changes: 6 additions & 0 deletions webv2/README.md
@@ -19,6 +19,12 @@ alias harbourbridge="go run github.com/cloudspannerecosystem/harbourbridge"
HarbourBridge's Web API feature can be used with all the driver modes available,
using mysql or postgres dump or direct connection.

To generate the HarbourBridge binary, run:

```sh
make build
```

To start HarbourBridge web server, run:

```sh
```
