-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CDAP-21144] Spanner Compression for String & bytes type fields #15910
base: develop
Are you sure you want to change the base?
Conversation
<name>data.storage.properties.gcp-spanner.compression.config</name> | ||
<value>application_specs:application_data:SNAPPY,provisioner_data:provisioner_task_info:SNAPPY,destination_fields_table.destination_data:SNAPPY,run_records.run_records_data:SNAPPY</value> | ||
<description> | ||
Represents table_name:column_name:compressor_type which possibly can cross spanner cell limit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Support for chunk size should be added now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be done in follow up PR
...ge-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/compression/CompressionConfig.java
Outdated
Show resolved
Hide resolved
...ge-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/compression/CompressionConfig.java
Outdated
Show resolved
Hide resolved
/** | ||
* Interface for various implementations of compressors used for spanner. | ||
*/ | ||
public interface Compressor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should go in storage-spi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now everything compression related, we want to limit to Spanner. We do not have use case for this for different implementations.
Also the compression config to be introduced later would have fields related to chunking etc - again spanner specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that but making it a spi will help anyone add it for any extensions in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any extension can implement their own compressor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me evaluate this with the chunking related changes - and move it to spi if there is nothing too specific.
For chunking, we might have to pass in some other args apart from just the value.
/** | ||
* Configuration class for compression settings. | ||
*/ | ||
public class CompressionConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar this should also go in storage-spi?
Every extension can have its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these compressor configs are limited to spanner right now.
...ge-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/compression/CompressionConfig.java
Outdated
Show resolved
Hide resolved
...age-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/compression/SnappyCompressor.java
Outdated
Show resolved
Hide resolved
cdap-storage-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/compression/Compressor.java
Show resolved
Hide resolved
...age-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/compression/SnappyCompressor.java
Show resolved
Hide resolved
...age-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/SpannerStructuredTableSchema.java
Outdated
Show resolved
Hide resolved
...age-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/SpannerStructuredTableSchema.java
Outdated
Show resolved
Hide resolved
...age-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/SpannerStructuredTableSchema.java
Show resolved
Hide resolved
...age-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/SpannerStructuredTableSchema.java
Outdated
Show resolved
Hide resolved
cdap-storage-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/SpannerStructuredRow.java
Outdated
Show resolved
Hide resolved
cdap-storage-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/SpannerStructuredRow.java
Outdated
Show resolved
Hide resolved
a47f30b
to
0270315
Compare
/** | ||
* A {@link Compressor} that uses Snappy for compression and decompression. | ||
*/ | ||
public class SnappyCompressor implements Compressor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add UTs for this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Snappy UTs have some issue when I run on local mac - It is a known issue & it needs some extra libraries
Will take this up in follow up
[FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Mac and os.arch=aarch64
...ge-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/compression/CompressorFactory.java
Outdated
Show resolved
Hide resolved
...ge-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/compression/CompressorFactory.java
Outdated
Show resolved
Hide resolved
...ge-ext-spanner/src/main/java/io/cdap/cdap/storage/spanner/compression/CompressorFactory.java
Outdated
Show resolved
Hide resolved
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also disallow any range queries on compressed columns.
.append(escapeName(compressedFieldName)) | ||
.append(" ") | ||
.append(getSpannerType(f.getType())); | ||
if (primaryKeys.contains(fieldName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will not be non-null as "no compression" value is null. I would probably disallow having primary keys compressed as there is a slight chance of collision between compressed and uncompressed value and making a two-column primary key index with nulls won't work well
@@ -388,6 +446,7 @@ private void executeDdlStatements(List<String> ddlStatements) throws IOException | |||
LOG.debug("No ddl statements to execute"); | |||
return; | |||
} | |||
LOG.info("DDL statements {}", ddlStatements); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why info? It's not that important to customer. If you want to see it in raw logs, make it debug
Columns to be compressed should be added to cConf
table_name:column_name:compressor_type:(TO BE USED IN FUTURE)number_of_chunks
All this would be part of the compressionConfig
High level overview of changes:
In case, we have any cx issue related to some new columns in future, we can append that column to the cConf property & thus enable compression.
Tested these changes using Distributed docker image
Unit tests to be added in follow up PR.
data:image/s3,"s3://crabby-images/24082/240820af28ae4f79014956af81a2fda9120c0731" alt="image"