Introduced BigDecimalConverter #4557

Merged

Conversation

san81
Collaborator

@san81 san81 commented May 21, 2024

Introduced a BigDecimalConverter that users can use as part of the existing convert_entry_type processor. Optionally, users can also specify the scale to apply while converting to BigDecimal. If the user provides a scale and the original number has more decimal digits than that scale, we apply the HALF_EVEN rounding strategy.

Description

Fix for the issue described in #3840.

IMPORTANT Note: This PR disables STRIP_TRAILING_BIGDECIMAL_ZEROES by default. In other words, we no longer call the bigDecimal.stripTrailingZeros() method while deserializing a BigDecimal value: a value of 1.00 is deserialized as 1.00 rather than being stripped down to 1.0 or 1. This could affect existing pipelines if they depend on the BigDecimal datatype, but since this PR is the one introducing the BigDecimal converter, the assumption is that no one depends on the BigDecimal type yet. Even pipelines that do depend on BigDecimal and ingest this non-stripped version of a number into an OpenSearch sink won't be affected; it only becomes an issue if they expect a specific number of decimal places in a numeric value or do String comparison on the numbers.
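For context, a minimal illustration of the difference, using only standard java.math APIs (the class name is illustrative, not part of the PR):

import java.math.BigDecimal;

public class TrailingZerosDemo {
    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1.00");
        System.out.println(value);                      // 1.00 (scale preserved; the new default)
        System.out.println(value.stripTrailingZeros()); // 1    (what stripping would have produced)
    }
}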

Once this change is merged, users will have the option to choose a BigDecimal converter with an optional scale (number of positions after the decimal point), applied with the HALF_EVEN rounding strategy. See the java.math.RoundingMode documentation for more details on this rounding strategy.
Example to use this new converter:

processor:
   - convert_entry_type:
       key: "column1"
       type: "big_decimal"
       scale: 5
   - convert_entry_type:
       key: "column2"
       type: "big_decimal"
       scale: 2

Note: Usage of the scale attribute is optional. If specified, trailing zeros are added to match the given scale, as in the examples below.

Example-1:
1.703908412707e11 will be deserialized into 170390841270.70000 with the scale set to 5. If no scale is given for this case, it will be deserialized into 170390841270.7.

Example-2:
If no scale value is given, a number like 17020622024e1 will be deserialized into 1.7020622024E+11. Users who want to avoid scientific notation in the deserialized value need to provide a scale value that covers all of the digits in the decimal places (see the sketch after the notes below).

  • scale = 0 is also an acceptable value; it behaves essentially as if no scale were given.
  • A negative scale value is also acceptable.
  • If given, scale must be a numeric value; otherwise, Data Prepper will fail to start.
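The scaling behavior in the examples above can be reproduced with the standard java.math APIs; a minimal sketch (the class name is illustrative, not part of the PR):

import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScaleDemo {
    public static void main(String[] args) {
        // Example-1: scale 5 pads with trailing zeros (HALF_EVEN rounding applies
        // only when the input has more decimal digits than the scale)
        BigDecimal a = new BigDecimal("1.703908412707e11").setScale(5, RoundingMode.HALF_EVEN);
        System.out.println(a); // 170390841270.70000

        // Example-2: with no scale, a value parsed with a negative scale
        // keeps its scientific notation when printed
        BigDecimal b = new BigDecimal("17020622024e1");
        System.out.println(b);                                      // 1.7020622024E+11
        System.out.println(b.setScale(1, RoundingMode.HALF_EVEN));  // 170206220240.0
    }
}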

Issues Resolved

Resolves #3840
The above issue will be resolved once this change is merged, as it gives the user the option to convert the output to BigDecimal with a specific scale. That should eliminate the scientific-notation values printed in the output stream, which caused the failure described in the issue.

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kkondaka
Collaborator

Thanks for the PR @san81 .

Please add a test file for every src Java file that you created or modified.

- import org.opensearch.dataprepper.typeconverter.DoubleConverter;
- import org.opensearch.dataprepper.typeconverter.BooleanConverter;
- import org.opensearch.dataprepper.typeconverter.LongConverter;
+ import org.opensearch.dataprepper.typeconverter.*;
Member

Will need to change this back to not using * imports. You can make this the default behavior if you are using IntelliJ.

Collaborator Author

Reverted this change now 👍 Also configured my IntelliJ to not do this going forward.

…entry_type processor that currently exists. Optionally, users can also specify required scaling needed on the converted

Signed-off-by: Santhosh Gandhe <gandheaz@amazon.com>
@san81 san81 force-pushed the dynamodb-scientific-notation-fix branch from 9e571b9 to 9a6a439 on May 21, 2024 22:01
san81 added 4 commits May 22, 2024 10:33
…per the review comment

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
kkondaka previously approved these changes May 24, 2024
@@ -53,7 +58,7 @@ static Record<Event> buildRecordWithEvent(final Map<String, Object> data) {
    }

    @BeforeEach
-   private void setup() {
+   public void setup() {
Collaborator

Why does this need to be public?

Collaborator Author

A method annotated with @BeforeEach shouldn't be private. More details here => https://junit.org/junit5/docs/5.0.2/api/org/junit/jupiter/api/BeforeEach.html

Member

You can make this package-private and just remove public.
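For illustration, a package-private lifecycle method that JUnit 5 accepts (a minimal sketch; the test class name is assumed, not the one in the PR):

import org.junit.jupiter.api.BeforeEach;

class ConvertEntryTypeProcessorTest {
    @BeforeEach
    void setup() {
        // package-private is allowed; JUnit 5 rejects only private lifecycle methods
    }
}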

Collaborator Author

removed

…assing the scale while converting the instance only when the instance is BigDecimalConverter

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
san81 added 2 commits May 24, 2024 15:51
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
kkondaka previously approved these changes May 25, 2024

private static Stream<Arguments> decimalToBigDecimalValueProvider() {
    return Stream.of(
        Arguments.of(new BigDecimal("0.0"), 0, 1),
Member

@graytaylor0 graytaylor0 May 28, 2024

Can we add another test case that covers the conversion described in the issue (#3840), where we convert 1.70206220242E+12 to 1702062202420?

Collaborator Author

Added an additional scenario for this case.
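For reference, the requested conversion can be verified with a single assertion (a minimal sketch; the test class and method names are assumed, not the ones in the PR):

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.math.BigDecimal;
import org.junit.jupiter.api.Test;

class ScientificNotationTest {
    @Test
    void convertsScientificNotationToPlainDecimal() {
        BigDecimal converted = new BigDecimal("1.70206220242E+12");
        assertEquals("1702062202420", converted.toPlainString());
    }
}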

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Member

@dlvenable dlvenable left a comment

Thank you @san81 for this great contribution!

/**
 * Optional scale value used only in the case of BigDecimal converter
 */
@JsonProperty("scale")
Member

Should we expose this? What would be the value of changing the scale to something else?

You could remove the field and keep the getScale() method such that we don't expose the config, but are flexible to use in the future.

Collaborator Author

As I listed in the PR description, I thought we could let the user choose the scale they need per column, like:

   - convert_entry_type:
       key: "column1"
       type: "bigdecimal"
       scale: 5

 *
 * @since 2.8
 */
BIGDECIMAL("bigdecimal"),
Member

Should we just call this decimal?

If we keep the "big", we should name the enum BIG_DECIMAL. I also think the type name should have an underscore for consistency: big_decimal. This would match field names in OpenSearch like geo_point, etc.

Collaborator Author

Kept the word BIG and renamed the references to BIG_DECIMAL. Both the enum and the data type name are now updated.

@@ -67,7 +71,11 @@ public Collection<Record<Event>> doExecute(final Collection<Record<Event>> recor
        if (keyVal != null) {
            if (!nullValues.contains(keyVal.toString())) {
                try {
-                   recordEvent.put(key, converter.convert(keyVal));
+                   if (converter instanceof BigDecimalConverter) {
Member

A better design might be to make an interface that allows for passing additional arguments.

interface ConverterArguments {
  int getScale();
}

You can implement this interface on the ConvertEntryTypeProcessorConfig class.

Then, you can simplify this:

recordEvent.put(key, converter.convert(keyVal, convertEntryTypeProcessorConfig));

You would need to add ConverterArguments as an argument to convert, but most implementations won't use it. (You could even make use of Java's default methods to reduce changes, but that is not necessary.)

No more conditionals. And future changes similar to this will not require conditionals here either.
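A minimal sketch of that design, assuming a simplified TypeConverter shape (the real Data Prepper interfaces may differ):

import java.math.BigDecimal;
import java.math.RoundingMode;

// Arguments holder; ConvertEntryTypeProcessorConfig would implement this.
interface ConverterArguments {
    int getScale();
}

interface TypeConverter<T> {
    T convert(Object value);

    // The default method keeps existing converters unchanged; only
    // BigDecimalConverter overrides it to honor the extra arguments.
    default T convert(Object value, ConverterArguments arguments) {
        return convert(value);
    }
}

class BigDecimalConverter implements TypeConverter<BigDecimal> {
    @Override
    public BigDecimal convert(Object value) {
        return new BigDecimal(value.toString());
    }

    @Override
    public BigDecimal convert(Object value, ConverterArguments arguments) {
        BigDecimal result = convert(value);
        int scale = arguments.getScale();
        // Apply HALF_EVEN rounding only when a scale is configured.
        return scale != 0 ? result.setScale(scale, RoundingMode.HALF_EVEN) : result;
    }
}

With this shape, the processor calls converter.convert(keyVal, convertEntryTypeProcessorConfig) for every converter, and the instanceof check disappears.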

san81 added 2 commits May 29, 2024 17:47
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
…o the converter and avoided conditional statement for calling converter methods

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
graytaylor0 previously approved these changes May 31, 2024
Member

@graytaylor0 graytaylor0 left a comment

Thanks for making this change!

san81 added 2 commits May 31, 2024 12:31
… the code

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
dlvenable previously approved these changes Jun 3, 2024
Member

@dlvenable dlvenable left a comment

Thank you for this contribution!

Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Signed-off-by: Santhosh Gandhe <1909520+san81@users.noreply.github.com>
Member

@dlvenable dlvenable left a comment

Thank you for this contribution!

To your comment about trailing zeros, I think this should be fine. This is a new converter, so it won't impact existing pipelines.

@kkondaka kkondaka merged commit 7d15115 into opensearch-project:main Jun 4, 2024
40 of 46 checks passed
@san81 san81 deleted the dynamodb-scientific-notation-fix branch June 4, 2024 20:50