
Adds sql creation scripts for postgres #1299

Merged: 3 commits merged into google:master on Sep 24, 2018

Conversation

@vishalkuo (Contributor):

Referencing #1298, this PR adds the SQL creation script for a Postgres data store, plus a stripped-down reset script. I've done my best to translate the MySQL types to their Postgres equivalents; the mapping may not be perfect and further changes may be needed, but it should be a good starting point for now.

Note that I switched away from camel casing, since Postgres is case-insensitive and normalizes unquoted identifiers to lowercase, which is hard to read without underscores.
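
For context, not from the PR itself: a minimal sketch of the identifier case folding that motivates the underscore naming. The table and column names below are illustrative only.

-- Unquoted identifiers are folded to lowercase, so a camel-case name such as
-- TreeId is stored as "treeid" and loses its word boundaries.
CREATE TABLE case_folding_demo (TreeId BIGINT);
-- Quoting preserves case, but then every later reference must be quoted too.
CREATE TABLE "CaseFoldingDemo" ("TreeId" BIGINT);
-- Snake-case names stay readable after lowercasing, hence the style used here.
CREATE TABLE case_folding_demo2 (tree_id BIGINT);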

codecov bot commented Sep 19, 2018

Codecov Report

Merging #1299 into master will increase coverage by 0.31%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1299      +/-   ##
==========================================
+ Coverage   60.74%   61.06%   +0.31%     
==========================================
  Files         125      125              
  Lines       10072    10072              
==========================================
+ Hits         6118     6150      +32     
+ Misses       3259     3227      -32     
  Partials      695      695
Impacted Files               Coverage Δ
testonly/hammer/hammer.go    67.06% <0%> (+6.42%) ⬆️


-- are allowed they will all reference this row.
CREATE TABLE IF NOT EXISTS leaf_data(
tree_id BIGINT NOT NULL,
-- This is a personality specific has of some subset of the leaf data.
Contributor:

s/has/hash/

-- This is a personality specific has of some subset of the leaf data.
-- It's only purpose is to allow Trillian to identify duplicate entries in
-- the context of the personality.
leaf_identity_hash BIT VARYING(255) NOT NULL,
Contributor:

should this be BYTEA or BYTEA(32)?

Member:

In other storage schemas, we've deliberately kept the field size overly large in order to accommodate > 256bit hashes for future expansion.

-- This is a personality specific has of some subset of the leaf data.
-- It's only purpose is to allow Trillian to identify duplicate entries in
-- the context of the personality.
leaf_identity_hash BIT VARYING(255) NOT NULL,
Contributor:

Same BYTEA vs BIT question

@gdbelvin (Contributor) left a comment:

Thanks for taking a crack at adding Postgres support.
Some schema questions with respect to BYTEA vs. BIT.

Could you include some local integration test results?
I presume integration tests will follow in another PR, but some confirmation that the code base is happy with these types would be nice.


CREATE TABLE IF NOT EXISTS subtree(
tree_id BIGINT NOT NULL,
subtree_id BIT VARYING(255) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

tree_id BIGINT NOT NULL,
tree_head_timestamp BIGINT,
tree_size BIGINT,
root_hash BIT VARYING(255) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

tree_head_timestamp BIGINT,
tree_size BIGINT,
root_hash BIT VARYING(255) NOT NULL,
root_signature BIT VARYING(1024) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

leaf_value BYTEA NOT NULL,
-- This is extra data that the application can associate with the leaf should it wish to.
-- This data is not included in signing and hashing.
extra_dta BYTEA,
Member:

Consider extra_data to mirror naming elsewhere?

leaf_identity_hash BIT VARYING(255) NOT NULL,
-- This is a MerkleLeafHash as defined by the treehasher that the log uses. For example for
-- CT this hash will include the leaf prefix byte as well as the leaf data.
merkle_leaf_hash BIT VARYING(255) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

leaf_identity_hash BIT VARYING(255) NOT NULL,
-- This is a MerkleLeafHash as defined by the treehasher that the log uses. For example for
-- CT this hash will include the leaf prefix byte as well as the leaf data.
merkle_leaf_hash BIT VARYING(255) NOT NULL,
Member:

Perhaps BYTEA here too (and consider size)

-- This is a SHA256 hash of the TreeID, LeafIdentityHash and QueueTimestampNanos. It is used
-- for batched deletes from the table when trillian_log_server and trillian_log_signer are
-- built with the batched_queue tag.
queue_id BIT VARYING(32) DEFAULT NULL UNIQUE,
Member:

Perhaps BYTEA here too (and consider size)

@vishalkuo (Contributor, Author):

@gdbelvin I've switched from BIT VARYING to BYTEA; however, it's worth noting that BYTEA doesn't allow one to specify length (although it has a limit of 1GB).

Regarding integration tests: are you referring to tests similar to those under the integration directory? If so, it seems I'll need to finish the rest of the pg implementation before I can provide those. My plan was to use these types as a starting point, build log_storage.go and admin_storage.go, and modify these SQL types as needed until they reach a satisfactory state. Let me know if this isn't how you intended the process to go, and I can close this and come back with a full PR once I have a locally working version.

@gdbelvin (Contributor):

Smaller PRs are easier to review. Let's proceed with this and we can modify as needed when we get some unit tests working.

@gdbelvin (Contributor):

Thanks for pointing out the lack of a size limit on BYTEA.
BYTEA is probably still more appropriate for hash outputs than BIT which is designed for bit masks.
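
As an aside, not something the PR does: if an explicit upper bound on hash length were ever wanted despite BYTEA taking no length argument, it could be expressed as a CHECK constraint. The sketch below is illustrative only; the 64-byte ceiling is an arbitrary choice that leaves room for hashes longer than 256 bits.

-- Illustrative only: BYTEA has no declared length, so any bound would have to
-- be a constraint rather than part of the type.
ALTER TABLE leaf_data
ADD CONSTRAINT leaf_identity_hash_len
CHECK (octet_length(leaf_identity_hash) <= 64);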

@gdbelvin (Contributor) left a comment:

LGTM

@gdbelvin (Contributor):

@AlCutter Would you mind merging this after taking a look?

@AlCutter (Member):

Thanks for the updates. I think it's probably OK to have BYTEA with no limit: the on-disk size requirements are essentially the same anyway (len(DATA) + 1 or 4 bytes), and the personality layer can enforce sensible lower size limits for leaves.
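
As a side note, not from the PR: the per-value overhead mentioned above can be checked directly with pg_column_size; the temporary table below is purely illustrative.

-- Short BYTEA values (under 127 bytes) carry a 1-byte varlena header, longer
-- ones a 4-byte header, so a 32-byte hash typically occupies 33 bytes in the tuple.
CREATE TEMP TABLE size_check(h BYTEA);
INSERT INTO size_check VALUES (decode(repeat('ab', 32), 'hex')); -- a 32-byte value
SELECT pg_column_size(h) FROM size_check; -- typically returns 33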

root_hash BYTEA NOT NULL,
root_signature BYTEA NOT NULL,
tree_revision BIGINT,
PRIMARY KEY(tree_id, tree_head_timestamp),
Member:

I think it would be better to have the PK be tree_id, tree_revision; you could then drop the index below (I don't think anything needs this key to contain the timestamp; the cloudspanner implementation uses /treeID/tree_revision as its PK on the equivalent table there, BTW).

Separately, and I must admit I don't know much about PGSQL, but other storage impls define the PK here as tree_id, tree_rev DESC, because by far the most common operation on this table is a scan which wants to fetch the newest entry. It looks like PG doesn't support the DESC modifier on the PK fields, so I don't know what implications this will have for performance when you come to write the code itself.

Might be worth popping a comment or a TODO here as a note to yourself to look at this area again in detail once the code lands too.

@vishalkuo (Contributor, Author):

@AlCutter you're right, it seems like Postgres doesn't support scan ordering for primary keys. That said, the workaround seems to be to also create a unique index, seen here. I left the index below the CREATE TABLE statement and added a DESC modifier on tree_revision so scans hit the most recent changes first. I also added a TODO to benchmark this once we have more code in place, to make sure the change is acceptable. Let me know if that works.
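
For reference, a sketch of the workaround described above; this is not the exact text from the PR, and the table and index names are assumed.

-- Postgres primary keys cannot carry a DESC modifier, so the newest-revision-first
-- ordering is expressed through a separate unique index instead of the PK.
CREATE UNIQUE INDEX tree_head_tree_id_tree_revision_idx
ON tree_head(tree_id, tree_revision DESC);
-- TODO: benchmark this ordering once the Postgres storage code lands.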

@AlCutter (Member) left a comment:

LGTM!

AlCutter merged commit 40fbfc9 into google:master on Sep 24, 2018