Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added self documentation to the KSQL config public config variables. #422

Merged

Conversation

hjafarpour
Copy link
Contributor

No description provided.

@hjafarpour
Copy link
Contributor Author

retest this please

ConfigDef.Type.SHORT,
defaultSinkNumberOfReplications,
ConfigDef.Importance.MEDIUM,
"The default number of replications for the topics created by KSQL."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replications -> replicas

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

ConfigDef.Type.STRING,
KSQL_PERSISTENT_QUERY_NAME_PREFIX_DEFAULT,
ConfigDef.Importance.MEDIUM,
"Second part of the prefix for persitent queries.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a description of what the final query name would look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an example.

ConfigDef.Type.STRING,
KSQL_TRANSIENT_QUERY_NAME_PREFIX_DEFAULT,
ConfigDef.Importance.MEDIUM,
"Second part of the prefix for transient queries.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an example.


public int defaultSinkNumberOfPartitions = 4;
public short defaultSinkNumberOfReplications = 1;
public static int defaultSinkNumberOfPartitions = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be public, i.e., they are set as the defaults anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, changed them to private.

KSQL_TABLE_STATESTORE_NAME_SUFFIX_DEFAULT,
ConfigDef.Importance.MEDIUM,
"Suffix for state store names in Tables.")
.define(DEFAULT_SINK_NUMBER_OF_PARTITIONS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we have DEFAULT_SOME_PROPERTY and SOME_PROPERTY? We just define what the default value is here and then we can use SOME_PROPERTY everywhere. The ConfigDef takes care of providing the default value if none is supplied. So further down from line 141 etc you don't need to create a new copy of the Map and put the defaults in, they will already be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to let user define the default for a set of statements. For instance if user want to have the default value of partitions to be 10 for all of his/her queries they can set the default and just write their queries instead of having to specify it in each of their queries. On the other hand, if user wants to have a different value for a specific query he/she can set the value in the specific query syntax and it will be applied to only that one. User can change the default to another value and that value will be used as default partition number for all the queries after that.

@miguno miguno changed the title Added self documentation to the KSQL config publis config variables. Added self documentation to the KSQL config public config variables. Oct 30, 2017
Copy link
Contributor

@dguy dguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hjafarpour in the KsqlConfig ctor you shouldn't need to create a map for the ksqlConfigProps and add the defaults. That will already be done for you by the ConfigDef. The only map we need to create is the one for ksqStreamConfigProps and for that you should still be able to pull the defaults out of the ConfigDef

@hjafarpour
Copy link
Contributor Author

@dguy the properties in ksqlConfigProps and ksqlStreamConfigProps are not final and can be changed so in addition to the ConfigDef I have these two maps so we can have the latest values.

@dguy
Copy link
Contributor

dguy commented Nov 1, 2017

@hjafarpour sure, but you can still populate the maps from the defaults provided by the config, i.e, you can use super.originals() to get a map that has the user provided configs and/or defaults, so you can remove most of the code that is populating the ksqlConfigProps with default values.

@hjafarpour
Copy link
Contributor Author

@dguy changed the init for the configs based on your feedback.


ksqlStreamConfigProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, defaultAutoOffsetRestConfig);
ksqlStreamConfigProps.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, defaultCommitIntervalMsConfig);
ksqlStreamConfigProps.put(
StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, defaultCacheMaxBytesBufferingConfig);
ksqlStreamConfigProps.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, defaultNumberOfStreamsThreads);

for (Map.Entry<?, ?> entry : props.entrySet()) {
for (Map.Entry<?, ?> entry : originals().entrySet()) {
final String key = entry.getKey().toString();
if (key.toLowerCase().startsWith(KSQL_CONFIG_PREPERTY_PREFIX)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we'd already have all the properties in the ksqlConfigProps by virtue of constructing the map from super.values(), so we could skip updating the ksqlConfigProps and only update ksqlStreamConfigProps if the key doesn't start with KSQL_CONFIG_PROPERTY_PREFIX

Copy link
Contributor

@dguy dguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one more comment, otherwise LGTM

@hjafarpour
Copy link
Contributor Author

@dguy made the change.

public static final String SINK_NUMBER_OF_REPLICATIONS = "REPLICATIONS";
public static final String SINK_NUMBER_OF_REPLICATIONS_PROPERTY = "ksql.sink.replications";
public static final String DEFAULT_SINK_NUMBER_OF_REPLICATIONS = "ksql.sink.replications.default";
public static final String SINK_NUMBER_OF_REPLICAS = "REPLICAS";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name and the value of this property don't seem to line up. What does the REPLICAS value mean exactly? Would it be more appropriate to have a numeric value? This applies to the SINK_NUMBER_OF_PARTITIONS property above as well. Also where is this SINK_NUMBER_OF_REPLICAS string used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not part of the config variables that can be set by the user via the KsqlConfig but they will be set in the query to specify the number of partitions, replications and the timestamp column for the result stream/table. This is just a constant to represent the name of the attributes.
Here is an example:
CREATE STREAM enrichedpv1 with (timestamp='pageid', partitions = 3) AS SELECT users.userid AS userid, pageid as pageid, region, gender FROM pageview LEFT JOIN users ON pageview.userid = users.userid;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Then why is it in KsqlConfig? There is also a SINK_NUMBER_OF_REPLICAS_PROPERTY which I assume does go in the ConfigDef? It would be better to have the literals that are in the query moved to a separate file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I'll move them to a new file, KsqlConstants.

ConfigDef.Type.STRING,
KSQL_PERSISTENT_QUERY_NAME_PREFIX_DEFAULT,
ConfigDef.Importance.MEDIUM,
"Second part of the prefix for persitent queries. For instance if the prefix is transient_"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean transient_ here? IT doesn't fit in with the rest of the description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the second part. The first part come from KSQL_SERVICE_ID_CONFIG(ksql.service.id)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I am still not sure I follow. I think it would make sense for the example to include the full chain, so that it is clearer to the user how it all comes together at each stage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more details with an example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc string here as written is:

Second part of the prefix for persitent queries. For instance if the prefix is transient_query_ the query name will be ksql_query_1.

Did you really mean for the example prefix to be 'transient_query_'?

ConfigDef.Type.LONG,
defaultSinkWindowChangeLogAdditionalRetention,
ConfigDef.Importance.MEDIUM,
"The default window change log additional retention time."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to drop the 'default' here. Also, it may make sense to expand the description. It doesn't make sense as is. Perhaps a sentence like 'The amount of time to retain window change logs'. Actually I am still not sure what that means Also, there should be a time unit in the name of the config and the description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added the time unit and more details to the description.

"The default window change log additional retention time. This is a streams "
+ "config value which will be added to a windows maintainMs to ensure data is not "
+ "deleted from "
+ "the "
Copy link
Contributor

@apurvam apurvam Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these new lines?

ie. why have this string over so many lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it, seems that my intelliJ formatted it that way automatically.

Copy link
Contributor

@apurvam apurvam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. It looks good to me, barring some doc strings which I think have some copy/paste errors.

ConfigDef.Type.STRING,
KSQL_PERSISTENT_QUERY_NAME_PREFIX_DEFAULT,
ConfigDef.Importance.MEDIUM,
"Second part of the prefix for persitent queries. For instance if the prefix is transient_"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc string here as written is:

Second part of the prefix for persitent queries. For instance if the prefix is transient_query_ the query name will be ksql_query_1.

Did you really mean for the example prefix to be 'transient_query_'?

ConfigDef.Type.STRING,
KSQL_TRANSIENT_QUERY_NAME_PREFIX_DEFAULT,
ConfigDef.Importance.MEDIUM,
"Second part of the prefix for transient queries. For instance if the prefix is "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the example. It makes sense now!

ConfigDef.Importance.MEDIUM,
"Suffix for state store names in Tables. For instance if the suffix is _ksql_statestore the state "
+ "store name would be ksql_query_1_ksql_statestore"
+ "_ksql_statestore ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you add this extra line by mistake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch for 'transient_query_', it should be 'query_'. Fixed it.
IntelliJ added this, fixed it :)

Copy link
Contributor

@apurvam apurvam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Contributor

@dguy dguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hjafarpour hjafarpour merged commit 6ae07fa into confluentinc:4.0.x Nov 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants