API does not appear to be applying LIDVID-sort by default #250

Open
alexdunnjpl opened this issue Feb 16, 2023 · 7 comments
Labels
enhancement New feature or request icebox

Comments

@alexdunnjpl
Contributor

alexdunnjpl commented Feb 16, 2023

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

While using a test script to validate pagination correctness, I found that sorting does not appear to be applied correctly.

If true, this completely breaks the ability to query across multiple pages of data, as some LIDVIDs will be duplicated while others will be missing.

🕵️ Expected behavior

Given values of start and limit, I expect them to provide repeatable and consistent slices of data, even when an explicit sort query parameter is not provided.
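To make the failure mode concrete, here is a minimal self-contained simulation (hypothetical data, not the registry-api itself) of what happens when a backend returns results in an unstable order while the client pages with start/limit:

```python
# Minimal simulation of paginating over an unstable ordering (hypothetical data).
items = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Suppose the backend happens to use one ordering for the first request...
order_on_first_request = items
# ...and a different ordering for the second request.
order_on_second_request = list(reversed(items))


def fetch_page(ordering, start, limit):
    # stand-in for a start/limit paginated query against the current ordering
    return ordering[start:start + limit]


page_1 = fetch_page(order_on_first_request, 0, 4)   # ['a', 'b', 'c', 'd']
page_2 = fetch_page(order_on_second_request, 4, 4)  # ['d', 'c', 'b', 'a']
combined = page_1 + page_2

# Without a stable sort, paging both duplicates and drops items:
duplicated = sorted({x for x in combined if combined.count(x) > 1})
missing = [x for x in items if x not in combined]
```

In this toy run every returned item is duplicated and `e` through `h` are never seen at all, which is exactly the "missing/duplicated LIDVIDs" risk described above.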

📜 To Reproduce

TBD - need to fix #240 first and then validate that the test script isn't what's borked.

🖥 Environment Info

No response

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

No response

⚙️ Engineering Details

No response

@alexdunnjpl
Contributor Author

alexdunnjpl commented Feb 17, 2023

I have confirmed this behaviour with a script.

Testing start values in [0,3] with limits in [1,3], the following hashed LIDVIDs are obtained (blank entries pad positions outside the requested slice):

['030037', '      ', '      ', '      ', '      ', '      ']
['1d58f1', '030037', '      ', '      ', '      ', '      ']
['1d58f1', '86c734', '030037', '      ', '      ', '      ']
['      ', '1d58f1', '      ', '      ', '      ', '      ']
['      ', '1d58f1', '86c734', '      ', '      ', '      ']
['      ', '1d58f1', '86c734', '66fdfc', '      ', '      ']
['      ', '      ', '86c734', '      ', '      ', '      ']
['      ', '      ', '86c734', '66fdfc', '      ', '      ']
['      ', '      ', '96d8ce', '86c734', '66fdfc', '      ']
['      ', '      ', '      ', '66fdfc', '      ', '      ']
['      ', '      ', '      ', '96d8ce', '66fdfc', '      ']
['      ', '      ', '      ', '96d8ce', '756d24', '66fdfc']

@alexdunnjpl
Contributor Author

More test data from start 0, limits in [1,10]:

96d8ce - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0012_0668002890_477_ca1_scam15210_maaz_________________01p::1.1
a7492c - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0037_0670224605_709_ca0_scam01037_hedgehog_____________04p::1.1
1d58f1 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0004_0667311609_463_ca2_scam15104_atm__________________00p::1.1
ab0c1e - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0037_0670224376_725_ca0_scam01037_hedgehog_____________02p::1.1
0e6df8 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0037_0670224491_697_ca0_scam01037_hedgehog_____________03p::1.1
756d24 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0013_0668090218_211_ca0_scam15217_scct_titanium________33p::1.1
659b1c - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0037_0670224261_699_ca0_scam01037_hedgehog_____________01p::1.1
86c734 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0011_0667916985_616_ca2_scam15204_atm__________________01p::1.1
66fdfc - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0011_0667917054_338_ca2_scam15204_atm__________________01p::1.1
030037 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0001_0667036439_088_ca2_scam15104______________________00p::1.1
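For reference, the six-character IDs in these tables are truncated MD5 digests of the LIDVIDs, the same display transformation the test script below applies. A small sketch of that mapping (the example LIDVID is illustrative):

```python
import hashlib


def short_hash(lidvid: str) -> str:
    # first 6 hex characters of the MD5 digest, for ease of visual comparison
    return hashlib.md5(lidvid.encode()).hexdigest()[:6]


# e.g. short_hash('urn:nasa:pds:example_bundle:example_collection:product::1.0')
# always yields the same 6-character lowercase-hex string for the same input
```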

['030037', '      ', '      ', '      ', '      ', '      ', '      ', '      ', '      ', '      ']
['1d58f1', '030037', '      ', '      ', '      ', '      ', '      ', '      ', '      ', '      ']
['1d58f1', '86c734', '030037', '      ', '      ', '      ', '      ', '      ', '      ', '      ']
['1d58f1', '86c734', '66fdfc', '030037', '      ', '      ', '      ', '      ', '      ', '      ']
['96d8ce', '1d58f1', '86c734', '66fdfc', '030037', '      ', '      ', '      ', '      ', '      ']
['96d8ce', '1d58f1', '756d24', '86c734', '66fdfc', '030037', '      ', '      ', '      ', '      ']
['96d8ce', '1d58f1', '756d24', '659b1c', '86c734', '66fdfc', '030037', '      ', '      ', '      ']
['96d8ce', '1d58f1', 'ab0c1e', '756d24', '659b1c', '86c734', '66fdfc', '030037', '      ', '      ']
['96d8ce', '1d58f1', 'ab0c1e', '0e6df8', '756d24', '659b1c', '86c734', '66fdfc', '030037', '      ']
['96d8ce', 'a7492c', '1d58f1', 'ab0c1e', '0e6df8', '756d24', '659b1c', '86c734', '66fdfc', '030037']

@alexdunnjpl
Contributor Author

Completed test script:

import hashlib
import string
from typing import List, Union

import requests


def get_ids(start: int, limit: int, hash_ids=True) -> List[str]:
    collection_urn = 'urn:nasa:pds:mars2020_supercam:data_calibrated_audio'
    url = f'http://localhost:8080/collections/{collection_urn}::4.0/products?start={start}&limit={limit}'
    # print(f'Requesting start={start}, limit={limit}')
    response = requests.get(url)
    try:
        data = response.json()['data']
    except KeyError:
        print(f'Response for start={start}, limit={limit} did not contain a "data" attribute')
        return []

    ids = [hashlib.md5(product['id'].encode()).hexdigest()[:6] if hash_ids else product['id'] for product in data]  # for ease of visual comparison
    return ids


max_start = 0
min_limit = 1
max_limit = 10
result_end_idx = max_start + max_limit

print(f'Hits 0 through {result_end_idx - 1} are as follows:')
ids = get_ids(0, result_end_idx, hash_ids=False)
hashes = get_ids(0, result_end_idx)
pairs = zip(hashes, ids)
[print(f'{x[0]} - {x[1]}') for x in pairs]

results: List[List[Union[str, None]]] = []
for start in range(0, max_start + 1):
    for limit in range(min_limit, max_limit + 1):
        left_pad = [None] * start
        ids = get_ids(start, limit)
        padded_ids = (left_pad + ids + [None] * result_end_idx)[:result_end_idx]
        results.append(padded_ids)

for result_set in results:
    print(['      ' if val is None else val for val in result_set])

for idx in range(0, result_end_idx):
    values_at_idx = set()
    for result in results:
        value_at_idx = result[idx]
        if value_at_idx is not None:
            values_at_idx.add(value_at_idx)

    assert len(values_at_idx) == 1, f'index {idx} saw multiple distinct values: {values_at_idx}'

@tloubrieu-jpl
Member

@alexdunnjpl this is not really a bug but a missing feature from the specification.
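For context, one common way such a feature is implemented (a sketch, assuming an Elasticsearch/OpenSearch-style backend and a hypothetical `lidvid` field name, not the actual registry-api internals) is to inject a deterministic default sort whenever the client does not supply one, so that from/size pagination becomes repeatable:

```python
from typing import Optional


def build_query(start: int, limit: int, sort_field: Optional[str] = None) -> dict:
    """Sketch of building a paginated search body with a guaranteed sort key.

    'lidvid' is an assumed field name used as the default tiebreaker; the real
    registry-api schema may differ.
    """
    return {
        "from": start,
        "size": limit,
        # always present, so result order is stable across requests
        "sort": [{sort_field or "lidvid": "asc"}],
    }
```

With a default like this, two requests for adjacent slices see the same total ordering, and the confusing position shuffling shown in the tables above goes away.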

@alexdunnjpl
Contributor Author

alexdunnjpl commented Feb 22, 2023

@tloubrieu-jpl isn't it a bug in the context of registry-api offering pagination, given that pagination (appearing to be) provides incorrect results (missing/duplicated data) because of this issue?

@alexdunnjpl
Contributor Author

alexdunnjpl commented Feb 22, 2023

Never mind - testing with the following script, I wasn't actually able to produce duplicated/missing data, so while the results above are confusing, pagination is not broken:

# uses get_ids() from the earlier script
page_size = 50
pages_to_fetch = 50
expected_lidvids_count = page_size*pages_to_fetch
results = []

for page_idx in range(0, pages_to_fetch):
    start = page_size*page_idx
    limit = page_size
    ids = get_ids(start, limit, hash_ids=False)
    results.extend(ids)

unique_results = set(results)
print(f'Fetched {len(results)} results, with {len(unique_results)} unique, vs. {expected_lidvids_count} expected')

@alexdunnjpl alexdunnjpl added enhancement New feature or request and removed bug Something isn't working s.high High severity labels Feb 22, 2023
@alexdunnjpl alexdunnjpl removed their assignment Feb 22, 2023
@tloubrieu-jpl
Member

I will put this ticket in the icebox.

@github-project-automation github-project-automation bot moved this to Release Backlog in EN Portfolio Backlog Nov 20, 2023