API does not appear to be applying LIDVID-sort by default #250

alexdunnjpl · 2023-02-16T19:19:04Z

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I used a test script to validate pagination correctness, results indicated that sorting did not appear to be correctly applied.

If true, this completely breaks ability to query across multiple pages of data as some LIDVIDs will be duplicated while others will be missing.

🕵️ Expected behavior

Given values of start and limit, I expect them to provide repeatable and consistent slices of data even when an explicit sort queryparam is not provided

📜 To Reproduce

TBD - need to fix #240 first and then validate that the test script isn't what's borked.

🖥 Environment Info

No response

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

No response

⚙️ Engineering Details

No response

The text was updated successfully, but these errors were encountered:

alexdunnjpl · 2023-02-17T02:09:17Z

Have confirmed this behaviour with script.

Testing starts [0,3] with limits [1,3], the following hashed lidvids are obtained:

['030037', '      ', '      ', '      ', '      ', '      ']
['1d58f1', '030037', '      ', '      ', '      ', '      ']
['1d58f1', '86c734', '030037', '      ', '      ', '      ']
['      ', '1d58f1', '      ', '      ', '      ', '      ']
['      ', '1d58f1', '86c734', '      ', '      ', '      ']
['      ', '1d58f1', '86c734', '66fdfc', '      ', '      ']
['      ', '      ', '86c734', '      ', '      ', '      ']
['      ', '      ', '86c734', '66fdfc', '      ', '      ']
['      ', '      ', '96d8ce', '86c734', '66fdfc', '      ']
['      ', '      ', '      ', '66fdfc', '      ', '      ']
['      ', '      ', '      ', '96d8ce', '66fdfc', '      ']
['      ', '      ', '      ', '96d8ce', '756d24', '66fdfc']

alexdunnjpl · 2023-02-17T02:32:04Z

More test data from start 0, limits [1,11]

96d8ce - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0012_0668002890_477_ca1_scam15210_maaz_________________01p::1.1
a7492c - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0037_0670224605_709_ca0_scam01037_hedgehog_____________04p::1.1
1d58f1 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0004_0667311609_463_ca2_scam15104_atm__________________00p::1.1
ab0c1e - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0037_0670224376_725_ca0_scam01037_hedgehog_____________02p::1.1
0e6df8 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0037_0670224491_697_ca0_scam01037_hedgehog_____________03p::1.1
756d24 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0013_0668090218_211_ca0_scam15217_scct_titanium________33p::1.1
659b1c - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0037_0670224261_699_ca0_scam01037_hedgehog_____________01p::1.1
86c734 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0011_0667916985_616_ca2_scam15204_atm__________________01p::1.1
66fdfc - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0011_0667917054_338_ca2_scam15204_atm__________________01p::1.1
030037 - urn:nasa:pds:mars2020_supercam:data_calibrated_audio:scam_0001_0667036439_088_ca2_scam15104______________________00p::1.1

['030037', '      ', '      ', '      ', '      ', '      ', '      ', '      ', '      ', '      ']
['1d58f1', '030037', '      ', '      ', '      ', '      ', '      ', '      ', '      ', '      ']
['1d58f1', '86c734', '030037', '      ', '      ', '      ', '      ', '      ', '      ', '      ']
['1d58f1', '86c734', '66fdfc', '030037', '      ', '      ', '      ', '      ', '      ', '      ']
['96d8ce', '1d58f1', '86c734', '66fdfc', '030037', '      ', '      ', '      ', '      ', '      ']
['96d8ce', '1d58f1', '756d24', '86c734', '66fdfc', '030037', '      ', '      ', '      ', '      ']
['96d8ce', '1d58f1', '756d24', '659b1c', '86c734', '66fdfc', '030037', '      ', '      ', '      ']
['96d8ce', '1d58f1', 'ab0c1e', '756d24', '659b1c', '86c734', '66fdfc', '030037', '      ', '      ']
['96d8ce', '1d58f1', 'ab0c1e', '0e6df8', '756d24', '659b1c', '86c734', '66fdfc', '030037', '      ']
['96d8ce', 'a7492c', '1d58f1', 'ab0c1e', '0e6df8', '756d24', '659b1c', '86c734', '66fdfc', '030037']

alexdunnjpl · 2023-02-17T02:33:14Z

Completed test script

import hashlib
import string
from typing import List, Union

import requests


def get_ids(start: int, limit: int, hash_ids=True) -> List[str]:
    collection_urn = 'urn:nasa:pds:mars2020_supercam:data_calibrated_audio'
    url = f'http://localhost:8080/collections/{collection_urn}::4.0/products?start={start}&limit={limit}'
    # print(f'Requesting start={start}, limit={limit}')
    response = requests.get(url)
    try:
        data = response.json()['data']
    except KeyError as err:
        print(f'Response for start={start}, limit={limit} did not contain a "data" attribute')
        return []

    ids = [hashlib.md5(product['id'].encode()).hexdigest()[:6] if hash_ids else product['id'] for product in data]  # for ease of visual comparison
    return ids


max_start = 0
min_limit = 1
max_limit = 10
result_end_idx = max_start + max_limit

print(f'Hits 0 through {result_end_idx - 1} are as follows:')
ids = get_ids(0, result_end_idx - 1, hash_ids=False)
hashes = get_ids(0, result_end_idx)
pairs = zip(hashes, ids)
[print(f'{x[0]} - {x[1]}') for x in pairs]

results: List[List[Union[str, None]]] = []
for start in range(0, max_start + 1):
    for limit in range(min_limit, max_limit + 1):
        left_pad = [None] * start
        ids = get_ids(start, limit)
        padded_ids = (left_pad + ids + [None] * result_end_idx)[:result_end_idx]
        results.append(padded_ids)

for result_set in results:
    print(['      ' if val is None else val for val in result_set])

for idx in range(0, result_end_idx):
    values_at_idx = set()
    for result in results:
        value_at_idx = result[idx]
        if value_at_idx is not None:
            values_at_idx.add(value_at_idx)

    assert len(values_at_idx) == 1

tloubrieu-jpl · 2023-02-22T04:54:52Z

@alexdunnjpl this is not really a bug but a missing feature from the specification.

alexdunnjpl · 2023-02-22T05:53:26Z

@tloubrieu-jpl is it not a bug in the context of registry-api offering pagination, and that pagination (appeaing to be) providing incorrect results (missing/duplicated data) because of this issue?

alexdunnjpl · 2023-02-22T20:13:06Z

Nevermind - testing with the following script, I wasn't actually able to produce duplicated/missing data, so while the results above are confusing, pagination is not broken

page_size = 50
pages_to_fetch = 50
expected_lidvids_count = page_size*pages_to_fetch
results = []

for page_idx in range(0, pages_to_fetch):
    start = page_size*page_idx
    limit = page_size
    ids = get_ids(start, limit, hash_ids=False)
    results.extend(ids)

unique_results = set(results)
print(f'Fetched {len(results)} results, with {len(unique_results)} unique, vs. {expected_lidvids_count} expected')

tloubrieu-jpl · 2023-02-23T21:52:26Z

I will put this ticket in the icebox.

alexdunnjpl added bug Something isn't working needs:triage labels Feb 16, 2023

alexdunnjpl assigned jordanpadams Feb 16, 2023

alexdunnjpl added s.high High severity sprint-backlog and removed needs:triage labels Feb 16, 2023

alexdunnjpl assigned alexdunnjpl and unassigned jordanpadams Feb 16, 2023

alexdunnjpl mentioned this issue Feb 16, 2023

Pagination not working as expected with /collections/{identifier}/products #240

Closed

alexdunnjpl closed this as completed Feb 17, 2023

alexdunnjpl reopened this Feb 17, 2023

alexdunnjpl mentioned this issue Feb 17, 2023

Fix pagination bug #251

Merged

jordanpadams added the B13.1 label Feb 18, 2023

alexdunnjpl mentioned this issue Feb 21, 2023

/classes/collections/<lidvid>/members does not return data when it should #247

Closed

tloubrieu-jpl removed the sprint-backlog label Feb 22, 2023

alexdunnjpl added enhancement New feature or request and removed bug Something isn't working s.high High severity labels Feb 22, 2023

alexdunnjpl removed their assignment Feb 22, 2023

tloubrieu-jpl added icebox and removed B13.1 labels Feb 23, 2023

alexdunnjpl mentioned this issue Apr 26, 2023

PDS API is not respecting the sort field NASA-PDS/pds-api#262

Closed

jordanpadams added this to EN Portfolio Backlog Nov 20, 2023

github-project-automation bot moved this to Release Backlog in EN Portfolio Backlog Nov 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API does not appear to be applying LIDVID-sort by default #250

API does not appear to be applying LIDVID-sort by default #250

alexdunnjpl commented Feb 16, 2023 •

edited

Loading

alexdunnjpl commented Feb 17, 2023 •

edited

Loading

alexdunnjpl commented Feb 17, 2023

alexdunnjpl commented Feb 17, 2023

tloubrieu-jpl commented Feb 22, 2023

alexdunnjpl commented Feb 22, 2023 •

edited

Loading

alexdunnjpl commented Feb 22, 2023 •

edited

Loading

tloubrieu-jpl commented Feb 23, 2023

API does not appear to be applying LIDVID-sort by default #250

API does not appear to be applying LIDVID-sort by default #250

Comments

alexdunnjpl commented Feb 16, 2023 • edited Loading

Checked for duplicates

🐛 Describe the bug

🕵️ Expected behavior

📜 To Reproduce

🖥 Environment Info

📚 Version of Software Used

🩺 Test Data / Additional context

🦄 Related requirements

⚙️ Engineering Details

alexdunnjpl commented Feb 17, 2023 • edited Loading

alexdunnjpl commented Feb 17, 2023

alexdunnjpl commented Feb 17, 2023

tloubrieu-jpl commented Feb 22, 2023

alexdunnjpl commented Feb 22, 2023 • edited Loading

alexdunnjpl commented Feb 22, 2023 • edited Loading

tloubrieu-jpl commented Feb 23, 2023

alexdunnjpl commented Feb 16, 2023 •

edited

Loading

alexdunnjpl commented Feb 17, 2023 •

edited

Loading

alexdunnjpl commented Feb 22, 2023 •

edited

Loading

alexdunnjpl commented Feb 22, 2023 •

edited

Loading