
Adds sql creation scripts for postgres #1299

Merged: 3 commits merged into google:master on Sep 24, 2018

Conversation

@vishalkuo (Contributor):

Referencing #1298, this PR adds the SQL creation script for a Postgres data store, plus a stripped-down reset script. I've done my best to translate the MySQL types to their Postgres equivalents; the mapping may not be perfect and further changes may be needed, but it should be a good starting point for now.

Note that I switched away from camel casing, since Postgres is case-insensitive and normalizes unquoted identifiers to lowercase, which is hard to read without underscores.
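
For context, not from the PR itself: a minimal sketch of the identifier case folding that motivates the underscore naming. The table and column names below are illustrative only.

-- Unquoted identifiers are folded to lowercase, so a camel-case name such as
-- TreeId is stored as "treeid" and loses its word boundaries.
CREATE TABLE case_folding_demo (TreeId BIGINT);
-- Quoting preserves case, but then every later reference must be quoted too.
CREATE TABLE "CaseFoldingDemo" ("TreeId" BIGINT);
-- Snake-case names stay readable after lowercasing, hence the style used here.
CREATE TABLE case_folding_demo2 (tree_id BIGINT);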

codecov bot commented Sep 19, 2018

Codecov Report

Merging #1299 into master will increase coverage by 0.31%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1299      +/-   ##
==========================================
+ Coverage   60.74%   61.06%   +0.31%     
==========================================
  Files         125      125              
  Lines       10072    10072              
==========================================
+ Hits         6118     6150      +32     
+ Misses       3259     3227      -32     
  Partials      695      695
Impacted Files               Coverage Δ
testonly/hammer/hammer.go    67.06% <0%> (+6.42%) ⬆️


-- are allowed they will all reference this row.
CREATE TABLE IF NOT EXISTS leaf_data(
tree_id BIGINT NOT NULL,
-- This is a personality specific has of some subset of the leaf data.
Contributor:

s/has/hash/

-- This is a personality specific has of some subset of the leaf data.
-- It's only purpose is to allow Trillian to identify duplicate entries in
-- the context of the personality.
leaf_identity_hash BIT VARYING(255) NOT NULL,
Contributor:

should this be BYTEA or BYTEA(32)?

Member:

In other storage schemas, we've deliberately kept the field size overly large in order to accommodate > 256bit hashes for future expansion.

-- This is a personality specific has of some subset of the leaf data.
-- It's only purpose is to allow Trillian to identify duplicate entries in
-- the context of the personality.
leaf_identity_hash BIT VARYING(255) NOT NULL,
Contributor:

Same BYTEA vs BIT question

@gdbelvin (Contributor) left a comment:

Thanks for taking a crack at adding Postgres support.
Some schema questions with respect to BYTEA vs. BIT.

Could you include some local integration test results?
I presume integration tests will follow in another PR, but some confirmation that the code base is happy with these types would be nice.


CREATE TABLE IF NOT EXISTS subtree(
tree_id BIGINT NOT NULL,
subtree_id BIT VARYING(255) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

tree_id BIGINT NOT NULL,
tree_head_timestamp BIGINT,
tree_size BIGINT,
root_hash BIT VARYING(255) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

tree_head_timestamp BIGINT,
tree_size BIGINT,
root_hash BIT VARYING(255) NOT NULL,
root_signature BIT VARYING(1024) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

leaf_value BYTEA NOT NULL,
-- This is extra data that the application can associate with the leaf should it wish to.
-- This data is not included in signing and hashing.
extra_dta BYTEA,
Member:

Consider extra_data to mirror naming elsewhere?

leaf_identity_hash BIT VARYING(255) NOT NULL,
-- This is a MerkleLeafHash as defined by the treehasher that the log uses. For example for
-- CT this hash will include the leaf prefix byte as well as the leaf data.
merkle_leaf_hash BIT VARYING(255) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

leaf_identity_hash BIT VARYING(255) NOT NULL,
-- This is a MerkleLeafHash as defined by the treehasher that the log uses. For example for
-- CT this hash will include the leaf prefix byte as well as the leaf data.
merkle_leaf_hash BIT VARYING(255) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

-- This is a SHA256 hash of the TreeID, LeafIdentityHash and QueueTimestampNanos. It is used
-- for batched deletes from the table when trillian_log_server and trillian_log_signer are
-- built with the batched_queue tag.
queue_id BIT VARYING(32) DEFAULT NULL UNIQUE,
Member:

Perhaps BYTEA here too (and consider size)

@vishalkuo (Contributor, Author):

@gdbelvin I've switched from BIT VARYING to BYTEA; however, it's worth noting that BYTEA doesn't allow one to specify length (although it has a limit of 1GB).

Regarding integration tests: are you referring to tests similar to those under the integration directory? If so, it seems I'll need to finish the rest of the pg implementation before I can provide those. My plan was to use these types as a starting point, build log_storage.go and admin_storage.go, and modify these SQL types as needed until they reach a satisfactory state. Let me know if this isn't how you intended the process to go, and I can close this and come back with a full PR once I have a locally working version.

@gdbelvin (Contributor):

Smaller PRs are easier to review. Let's proceed with this and we can modify as needed when we get some unit tests working.

@gdbelvin (Contributor):

Thanks for pointing out the lack of a size limit on BYTEA.
BYTEA is probably still more appropriate for hash outputs than BIT which is designed for bit masks.
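
As an aside, not something the PR does: if an explicit upper bound on hash length were ever wanted despite BYTEA taking no length argument, it could be expressed as a CHECK constraint. The sketch below is illustrative only; the 64-byte ceiling is an arbitrary choice that leaves room for hashes longer than 256 bits.

-- Illustrative only: BYTEA has no declared length, so any bound would have to
-- be a constraint rather than part of the type.
ALTER TABLE leaf_data
ADD CONSTRAINT leaf_identity_hash_len
CHECK (octet_length(leaf_identity_hash) <= 64);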

@gdbelvin (Contributor) left a comment:

LGTM

@gdbelvin (Contributor):

@AlCutter Would you mind merging this after taking a look?

@AlCutter (Member):

Thanks for the updates. I think it's probably OK to have BYTEA with no limit: the on-disk size requirements are essentially the same anyway (len(DATA) + 1 or 4 bytes), and the personality layer can enforce sensible lower size limits for leaves.
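
As a side note, not from the PR: the per-value overhead mentioned above can be checked directly with pg_column_size; the temporary table below is purely illustrative.

-- Short BYTEA values (under 127 bytes) carry a 1-byte varlena header, longer
-- ones a 4-byte header, so a 32-byte hash typically occupies 33 bytes in the tuple.
CREATE TEMP TABLE size_check(h BYTEA);
INSERT INTO size_check VALUES (decode(repeat('ab', 32), 'hex')); -- a 32-byte value
SELECT pg_column_size(h) FROM size_check; -- typically returns 33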

root_hash BYTEA NOT NULL,
root_signature BYTEA NOT NULL,
tree_revision BIGINT,
PRIMARY KEY(tree_id, tree_head_timestamp),
Member:

I think it would be better to have the PK be tree_id, tree_revision; you could then drop the index below (I don't think anything needs this key to contain the timestamp; the cloudspanner implementation uses /treeID/tree_revision as its PK on the equivalent table there, BTW).

Separately, and I must admit I don't know much about PGSQL, but other storage impls define the PK here as tree_id, tree_rev DESC, because by far the most common operation on this table is a scan which wants to fetch the newest entry. It looks like PG doesn't support the DESC modifier on the PK fields, so I don't know what implications this will have for performance when you come to write the code itself.

Might be worth popping a comment or a TODO here as a note to yourself to look at this area again in detail once the code lands too.

@vishalkuo (Contributor, Author):

@AlCutter you're right, it seems like Postgres doesn't support scan ordering for primary keys. That said, the workaround seems to be to also create a unique index, seen here. I left the index below the CREATE TABLE statement and added a DESC modifier on tree_revision so scans hit the most recent changes first. I also added a TODO to benchmark this once we have more code in place, to make sure the change is acceptable. Let me know if that works.
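
For reference, a sketch of the workaround described above; this is not the exact text from the PR, and the table and index names are assumed.

-- Postgres primary keys cannot carry a DESC modifier, so the newest-revision-first
-- ordering is expressed through a separate unique index instead of the PK.
CREATE UNIQUE INDEX tree_head_tree_id_tree_revision_idx
ON tree_head(tree_id, tree_revision DESC);
-- TODO: benchmark this ordering once the Postgres storage code lands.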

@AlCutter (Member) left a comment:

LGTM!

AlCutter merged commit 40fbfc9 into google:master on Sep 24, 2018