Introduced BigDecimalConverter #4557

Merged

Conversation

san81
Collaborator

@san81 san81 commented May 21, 2024

Introduced a BigDecimalConverter that users can use as part of the existing convert_entry_type processor. Optionally, users can also specify the scale to apply while converting to BigDecimal. If the user provides a scale and the original number has more decimal digits than that scale, we apply the HALF_EVEN rounding strategy.

Description

Fix for the issue described in #3840.

IMPORTANT Note: This PR disables STRIP_TRAILING_BIGDECIMAL_ZEROES by default. In other words, we no longer call the bigDecimal.stripTrailingZeros() method while deserializing a BigDecimal value: a value of 1.00 is deserialized as 1.00 rather than being stripped down to 1.0 or 1. This could affect existing pipelines if they depend on the BigDecimal datatype, but since this PR is the one introducing the BigDecimal converter, the assumption is that no one depends on the BigDecimal type yet. Even pipelines that do depend on BigDecimal and ingest this non-stripped version of a number into an OpenSearch sink won't be affected; it only becomes an issue if they expect a specific number of decimal places in a numeric value or do String comparison on the numbers.
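For context, a minimal illustration of the difference, using only standard java.math APIs (the class name is illustrative, not part of the PR):

import java.math.BigDecimal;

public class TrailingZerosDemo {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1.00");
        System.out.println(value);                      // 1.00 (scale preserved; the new default)
        System.out.println(value.stripTrailingZeros()); // 1    (what stripping would have produced)
    }
}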

Once this change is merged, users will have the option to choose a BigDecimal converter with an optional scale (number of positions after the decimal point), applied with the HALF_EVEN rounding strategy. See the java.math.RoundingMode documentation for more details on this rounding strategy.
Example to use this new converter:

processor:
   - convert_entry_type:
       key: "column1"
       type: "big_decimal"
       scale: 5
   - convert_entry_type:
       key: "column2"
       type: "big_decimal"
       scale: 2

Note: Usage of the scale attribute is optional. If specified, trailing zeros are added to match the given scale, as in the examples below.

Example-1:
1.703908412707e11 will be deserialized into 170390841270.70000 with the scale set to 5. If no scale is given for this case, it will be deserialized into 170390841270.7.

Example-2:
If no scale value is given, a number like 17020622024e1 will be deserialized into 1.7020622024E+11. Users who want to avoid scientific notation in the deserialized value need to provide a scale value that covers all of the digits in the decimal places (see the sketch after the notes below).

  • scale = 0 is also an acceptable value; it behaves essentially as if no scale were given.
  • A negative scale value is also acceptable.
  • If given, scale must be a numeric value; otherwise, Data Prepper will fail to start.
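The scaling behavior in the examples above can be reproduced with the standard java.math APIs; a minimal sketch (the class name is illustrative, not part of the PR):

import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScaleDemo {
    public static void main(String[] args) {
        // Example-1: scale 5 pads with trailing zeros (HALF_EVEN rounding applies
        // only when the input has more decimal digits than the scale)
        BigDecimal a = new BigDecimal("1.703908412707e11").setScale(5, RoundingMode.HALF_EVEN);
        System.out.println(a); // 170390841270.70000

        // Example-2: with no scale, a value parsed with a negative scale
        // keeps its scientific notation when printed
        BigDecimal b = new BigDecimal("17020622024e1");
        System.out.println(b);                                      // 1.7020622024E+11
        System.out.println(b.setScale(1, RoundingMode.HALF_EVEN));  // 170206220240.0
    }
}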

Issues Resolved

Resolves #3840
The above issue will be resolved once this change is merged, as it gives the user the option to convert the output to BigDecimal with a specific scale. That should eliminate the scientific-notation values printed in the output stream, which caused the failure described in the issue.

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kkondaka
Collaborator

Thanks for the PR @san81 .

Please add a test file for every src Java file that you created or modified.

- import org.opensearch.dataprepper.typeconverter.DoubleConverter;
- import org.opensearch.dataprepper.typeconverter.BooleanConverter;
- import org.opensearch.dataprepper.typeconverter.LongConverter;
+ import org.opensearch.dataprepper.typeconverter.*;
Member

Will need to change this back to not using * imports. You can make this the default behavior if you are using IntelliJ.

Collaborator Author

Reverted this change now 👍 Also configured my IntelliJ to not do this going forward.

…entry_type processor that currently exists. Optionally, users can also specify required scaling needed on the converted

Signed-off-by: Santhosh Gandhe <gandheaz@amazon.com>
@san81 san81 force-pushed the dynamodb-scientific-notation-fix branch from 9e571b9 to 9a6a439 on May 21, 2024 22:01
san81 added 4 commits May 22, 2024 10:33
…per the review comment

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
kkondaka previously approved these changes May 24, 2024
@@ -53,7 +58,7 @@ static Record<Event> buildRecordWithEvent(final Map<String, Object> data) {
    }

    @BeforeEach
-   private void setup() {
+   public void setup() {
Collaborator

Why does this need to be public?

Collaborator Author

A method annotated with @BeforeEach shouldn't be private. More details here => https://junit.org/junit5/docs/5.0.2/api/org/junit/jupiter/api/BeforeEach.html

Member

You can make this package-private and just remove public.
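For illustration, a package-private lifecycle method that JUnit 5 accepts (a minimal sketch; the test class name is assumed, not the one in the PR):

import org.junit.jupiter.api.BeforeEach;

class ConvertEntryTypeProcessorTest {
    @BeforeEach
    void setup() {
        // package-private is allowed; JUnit 5 rejects only private lifecycle methods
    }
}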

Collaborator Author

removed

…assing the scale while converting the instance only when the instance is BigDecimalConverter

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
san81 added 2 commits May 24, 2024 15:51
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
kkondaka previously approved these changes May 25, 2024

private static Stream<Arguments> decimalToBigDecimalValueProvider() {
    return Stream.of(
        Arguments.of(new BigDecimal("0.0"), 0, 1),
Member

@graytaylor0 graytaylor0 May 28, 2024

Can we add another test case that covers the conversion described in the issue (#3840), where we convert 1.70206220242E+12 to 1702062202420?

Collaborator Author

Added an additional scenario for this case.
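For reference, the requested conversion can be verified with a single assertion (a minimal sketch; the test class and method names are assumed, not the ones in the PR):

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.math.BigDecimal;
import org.junit.jupiter.api.Test;

class ScientificNotationTest {
    @Test
    void convertsScientificNotationToPlainDecimal() {
        BigDecimal converted = new BigDecimal("1.70206220242E+12");
        assertEquals("1702062202420", converted.toPlainString());
    }
}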

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Member

@dlvenable dlvenable left a comment

Thank you @san81 for this great contribution!

/**
 * Optional scale value used only in the case of BigDecimal converter
 */
@JsonProperty("scale")
Member

Should we expose this? What would be the value of changing the scale to something else?

You could remove the field and keep the getScale() method such that we don't expose the config, but are flexible to use in the future.

Collaborator Author

As I listed in the PR description, I thought we could let the user choose the scale they need per column, like:

   - convert_entry_type:
       key: "column1"
       type: "bigdecimal"
       scale: 5

 *
 * @since 2.8
 */
BIGDECIMAL("bigdecimal"),
Member

Should we just call this decimal?

If we keep the "big", we should name the enum BIG_DECIMAL. I also think the type name should have an underscore for consistency: big_decimal. This would match field names in OpenSearch like geo_point, etc.

Collaborator Author

Kept the word BIG and renamed the references to BIG_DECIMAL. Both the enum and the data type name are now updated.

@@ -67,7 +71,11 @@ public Collection<Record<Event>> doExecute(final Collection<Record<Event>> recor
        if (keyVal != null) {
            if (!nullValues.contains(keyVal.toString())) {
                try {
-                   recordEvent.put(key, converter.convert(keyVal));
+                   if (converter instanceof BigDecimalConverter) {
Member

A better design might be to make an interface that allows for passing additional arguments.

interface ConverterArguments {
  int getScale();
}

You can implement this interface on the ConvertEntryTypeProcessorConfig class.

Then, you can simplify this:

recordEvent.put(key, converter.convert(keyVal, convertEntryTypeProcessorConfig));

You would need to add ConverterArguments as an argument to convert, but most implementations won't use it. (You could even make use of Java's default methods to reduce changes, but that is not necessary.)

No more conditionals. And future changes similar to this will not require conditionals here either.
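A minimal sketch of that design, assuming a simplified TypeConverter shape (the real Data Prepper interfaces may differ):

import java.math.BigDecimal;
import java.math.RoundingMode;

// Arguments holder; ConvertEntryTypeProcessorConfig would implement this.
interface ConverterArguments {
    int getScale();
}

interface TypeConverter<T> {
    T convert(Object value);

    // The default method keeps existing converters unchanged; only
    // BigDecimalConverter overrides it to honor the extra arguments.
    default T convert(Object value, ConverterArguments arguments) {
        return convert(value);
    }
}

class BigDecimalConverter implements TypeConverter<BigDecimal> {
    @Override
    public BigDecimal convert(Object value) {
        return new BigDecimal(value.toString());
    }

    @Override
    public BigDecimal convert(Object value, ConverterArguments arguments) {
        BigDecimal result = convert(value);
        int scale = arguments.getScale();
        // Apply HALF_EVEN rounding only when a scale is configured.
        return scale != 0 ? result.setScale(scale, RoundingMode.HALF_EVEN) : result;
    }
}

With this shape, the processor calls converter.convert(keyVal, convertEntryTypeProcessorConfig) for every converter, and the instanceof check disappears.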

san81 added 2 commits May 29, 2024 17:47
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
…o the converter and avoided conditional statement for calling converter methods

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
graytaylor0 previously approved these changes May 31, 2024
Member

@graytaylor0 graytaylor0 left a comment

Thanks for making this change!

san81 added 2 commits May 31, 2024 12:31
… the code

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
dlvenable previously approved these changes Jun 3, 2024
Member

@dlvenable dlvenable left a comment

Thank you for this contribution!

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Member

@dlvenable dlvenable left a comment

Thank you for this contribution!

To your comment about trailing zeros, I think this should be fine. This is a new converter, so it won't impact existing pipelines.

@kkondaka kkondaka merged commit 7d15115 into opensearch-project:main Jun 4, 2024
40 of 46 checks passed
@san81 san81 deleted the dynamodb-scientific-notation-fix branch June 4, 2024 20:50