Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Merge Category Option Combo [DHIS2-18321] #19488

Merged
merged 64 commits into from
Jan 15, 2025
Merged

Conversation

david-mackessy
Copy link
Contributor

@david-mackessy david-mackessy commented Dec 16, 2024

Feature

Adds a new feature, which allows the merging of CategoryOptionCombos.
More details in the Jira issue with full acceptance criteria.
Docs link below also details everything involved in this merge.

Extracted the handling of merging DataValues into a common handler class as this is used by multiple merge types (DataElement & CategoryOptionCombo).

Testing

Automated

Tests added for

  • new store methods
  • integration tests for each merging/handling of metadata/data type
  • E2E tests

Performance

Some performance testing was carried out for types that may have thousands/millions of entries [dataValue, event].
Tests were carried out comparing Java v SQL versions of data merging for [dataValue, event].

Quick summary (results and data further below):

  • Java with 776k DataValues -> completed in 3m 40s
  • Java with 5 million DataValues & 365k Events produced OOM Error
  • SQL with DataValue merging, 5 million DataValues & 365k Events -> completed in 4m 14s
  • SQL with DataValue & Event merging, 5 million DataValues & 365k Events -> completed in 2m 13s
  • SQL with DataValue & Event deleting, 5 million DataValues & 365k Events -> completed in 1m 17s

DB: SL DB
machine: macbook pro - m2 pro - 16GB
data:

coc id data count
358964 data value 288,922
358966 data value 245,269
358963 data value 242,002
4 data value 4,941,155
4 event 363,585
4 comp dataset reg 162,449
424067 event 675
424068 event 639

SCENARIO 1

{
    "sources": [
        "S34ULMcHMca",
        "wHBMVthqIX4"
    ],
    "target": "jOkIbJVhECg",
    "deleteSources": true,
    "dataMergeStrategy": "LAST_UPDATED"
}

Java: 3m 40s
SQL (data value): 13s

SCENARIO 2

{
    "sources": [
        "HllvX50cXC0",
        "iR1xska5MM2"
    ],
    "target": "j9PqKr6Djnx",
    "deleteSources": true,
    "dataMergeStrategy": "LAST_UPDATED"
}

Java: OOM error
SQL (data value): 4m 14s

SCENARIO 3

{
    "sources": [
        "HllvX50cXC0",
        "iR1xska5MM2"
    ],
    "target": "j9PqKr6Djnx",
    "deleteSources": true,
    "dataMergeStrategy": "LAST_UPDATED"
}

SQL (data value & event): 2m 13s

SCENARIO 4

{
    "sources": [
        "HllvX50cXC0",
        "iR1xska5MM2"
    ],
    "target": "j9PqKr6Djnx",
    "deleteSources": true,
    "dataMergeStrategy": "DISCARD"
}

SQL (data value & event): 1m 17s

Docs

PR - dhis2/dhis2-docs#1466

}

@Override
public int hardDeleteEvents(List<String> eventsToDelete, String eventSelect, String eventDelete) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ameenhere this method has been extracted from the existing code to allow reuse of hard deleting Events. This ensures that all Event references are handled consistently & appropriately.

*/
@Slf4j
@Component
@RequiredArgsConstructor
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class will be removed in a later PR as we want to merge data values using SQL, not Java. Data Element merge will be updated next, which will remove this code.

import org.hisp.dhis.common.BaseIdentifiableObject;
import org.hisp.dhis.dataelement.DataElement;

public final class DataValueUtil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class will be removed when the Data Element merge code is updated. It will no longer be required.

"""
do
$$
declare
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2 data value SQL scripts are basically the same, but they are tailored for the correct data value field (category option combo & attribute option combo).
It felt wrong to reuse the same script with parameters as the script would then become unreadable (~8 parameters), requiring to scroll to the bottom of the script to see what parameter would go where, losing the ability to read through the script.

@david-mackessy david-mackessy merged commit 5e883da into master Jan 15, 2025
17 checks passed
@david-mackessy david-mackessy deleted the DHIS2-18321 branch January 15, 2025 06:42
Copy link
Contributor

@jbee jbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

PR is already merged but I had some comments already made from a review in progress. Nothing of significance but maybe you can work something in a future merge PR.

import org.hisp.dhis.category.CategoryOption;
import org.hisp.dhis.category.CategoryOptionCombo;

public record TestCategoryMetadata(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the record but wonder why you opted to not use List<CategoryCombo> instead of cc1+cc2 and so on. That seems more flexible, future proof and maybe easier to read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just went for direct access to a few types as a start. How I would use it v how others would, might be different. It's not just for CategoryCombos. It's primarily to use CategoryOptionCombos. And this makes it easy to use a properly created CategoryOptionCombo. Maybe we can chat about it at the next meeting as well.

* @param cocs {@link CategoryOptionCombo}s to update
* @param coc {@link CategoryOptionCombo} to use as the new value
*/
void setAttributeOptionCombo(Set<Long> cocs, long coc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK much about events but why are we using long IDs here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the DB type is bigint which maps to Java long.
This update is done in native SQL, so no COC object is used here. The foreign key ID is used directly.

@@ -88,4 +90,18 @@ where jsonb_exists_any(e.eventdatavalues, :searchStrings)
"searchStrings", searchStrings.toArray(String[]::new), StringArrayType.INSTANCE)
.getResultList();
}

@Override
public void setAttributeOptionCombo(Set<Long> cocs, long coc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stick to update as the verb for this type of thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, can update that.

}

private void assertWithNewValue(
@Nonnull DataValue dv, @Nonnull String newValue, @Nonnull String property) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like instead of property this should have been 3 different assert methods if they do a if-else internally anyhow. The shared part then would be another assert method they can call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these will be deleted when the data element merge is updated.


@Test
@DisplayName("Retrieving CategoryCombos by CategoryOptionCombos returns the expected entries")
void getCatOptionComboTest() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that setup looks very familiar - isn't that what you did in a helper utility method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I need to find places where similar things are done & use the new test method.


@Test
@DisplayName("Retrieving CategoryOptions by CategoryOptionCombos returns the expected entries")
void getCatOptionComboTest() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, looks familiar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants