Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYNPY-1513] Validate input submission ID in getSubmission(...) #1135

Merged
merged 16 commits into from
Oct 1, 2024

Conversation

jaymedina
Copy link
Collaborator

@jaymedina jaymedina commented Sep 25, 2024

problem

The endpoint evaluation/submission/{id} was receiving communications with submission IDs that had decimal points in them (e.g. 123.0). This was obstructing an ingestion process in Snowflake, and should not exist in the first place. Submission IDs are not ID types that support decimal points.

solution

In getSubmission(...) and other related methods in client.py, the id_of(...) method from utils.py is utilized to retrieve a valid entity ID. This method was being used for submission IDs, but was effectively useless, as it would just return the same string object or numbers object (as a string) without verifying its format.

The solution involves creating a separate method called validate_submission_id and replacing id_of with it, to enforce the no-decimal rule in the input.

  • Create utils.validate_submission_id to handle decimals in input IDs
  • Add/update relevant unit and integration tests

testing

Test getSubmission

In [1]: import synapseclient

In [2]: syn = synapseclient.login()
Welcome, Jenny Medina!

# Testing an expected input in `getSubmission`
In [3]: syn.getSubmission("9745366")
Out[3]: 
{'id': '9745366',
 [ ... ]
 'filePath': None}

# Testing an input with a decimal in `getSubmission`
In [4]: syn.getSubmission("9745366.0")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 syn.getSubmission("9745366.0")

[ ... ]

ValueError: Invalid submission ID: 9745366.0. ID can either be an integer or a string with no decimals.

In [5]: 

Test getSubmissionStatus

In [5]: syn.getSubmissionStatus("9745366")
Out[5]: 
{'id': '9745366',
[ ... ]
 'statusVersion': 7}

In [6]: syn.getSubmissionStatus("9745366.0")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 syn.getSubmissionStatus("9745366.0")

[ ... ]

ValueError: Invalid submission ID: 9745366.0. ID can either be an integer or a string with no decimals.

In [7]: 

@pep8speaks
Copy link

pep8speaks commented Sep 25, 2024

Hello @jaymedina! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 4819:89: E501 line too long (104 > 88 characters)
Line 4863:89: E501 line too long (117 > 88 characters)
Line 4866:89: E501 line too long (103 > 88 characters)

Line 280:89: E501 line too long (104 > 88 characters)
Line 283:89: E501 line too long (90 > 88 characters)

Line 140:89: E501 line too long (109 > 88 characters)

Line 3010:89: E501 line too long (89 > 88 characters)
Line 3040:24: F541 f-string is missing placeholders
Line 3137:89: E501 line too long (90 > 88 characters)

Comment last updated at 2024-10-01 13:47:24 UTC

@jaymedina
Copy link
Collaborator Author

Hey @BryanFauble @thomasyu888 , before I move forward with this solution I'd like to run it by you both since it is a breaking change (see latest tests) and involves a little bit of re-structuring everywhere where id_of is called. When you have some time, see my comment in "Design considerations" for my reasoning, and let me know what you think.

Thanks!

@BryanFauble
Copy link
Contributor

@jaymedina

it made more sense to refactor this ID-handling method to correct for decimals in all these other entities, since the only entity type that can support decimals is File entities.

The issue with this is that is we might not know ahead of time if an ID is for an entity is a File, or some other entity in Synapse. Take the example where syn.get() is a factory that returns back several types of entities/resources, and we don't know what the entity type is until after we call Synapse, after we have called id_of.

I tested out the dot notation with a project, and it seems like on the backend these entities do have a version applied to them, but it can only ever be version 1, "syn54022402.2" fails:

from synapseclient import Synapse


syn = Synapse()
syn.login()

project = syn.get("syn54022402")
print(project)
project = syn.get("syn54022402.1")
print(project)

I think at the moment it just makes sense to worry about what's in front of us and only update syn.getSubmission() with the new logic.

Should Synapse itself should be performing this data check?

@jaymedina
Copy link
Collaborator Author

jaymedina commented Sep 26, 2024

Should Synapse itself should be performing this data check?

That's a really good point @BryanFauble and I guess it depends on the scope of responsibility within synpy. Is the current expectation when we develop on this client that synpy validates inputs before communicating with the API or do we leave that responsibility with the API to handle these cases (in this context, a submission ID having a decimal)? Validating on the synpy side feels more flexible and would prevent unnecessary communications with the Synapse API, but also adds complexity to the codebase like what's going to be in this PR.

Right now I lean towards keeping small validation logic like this in synpy (under utils.py) because of the flexibility of being able to write our own warning/error messages, and the low overhead of our code review process.

If we go with handling the inputs on the client side, I would go back to my previous solution of handling the submission IDs with a new function in utils.py, since id_of is for synapse IDs and not submission IDs.

@BryanFauble
Copy link
Contributor

Is the current expectation when we develop on this client that synpy validates inputs before communicating with the API or do we leave that responsibility with the API to handle these cases (in this context, a submission ID having a decimal)?

It's important for both. Think about the client/server relationship as it pertains to a Web UI and a backend server. The UI can have input validation, but the server needs to have that final strict enforcement of the rules.

Right now I lean towards keeping small validation logic like this in synpy (under utils.py) because of the flexibility of being able to write our own warning/error messages, and the low overhead of our code review process.

I agree that we can handle this client side. There are a bunch of these similar checks in the code base already - For example making sure that an entity ID starts as syn#####.

If we go with handling the inputs on the client side, I would go back to my previous solution of handling the submission IDs with a new function in utils.py, since id_of is for synapse IDs and not submission IDs.

I agree that this should be a new function, since the rules for a submission ID are different from those of an entity/synapse ID.

@jaymedina jaymedina force-pushed the synpy-1513-remove-dot0-get-submissions branch from 08fc8d0 to 390c6ca Compare September 26, 2024 18:42
@jaymedina jaymedina marked this pull request as ready for review September 27, 2024 06:08
@jaymedina jaymedina requested a review from a team as a code owner September 27, 2024 06:08
Copy link
Member

@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 LGTM! I'll leave it to @BryanFauble for final review.

@jaymedina jaymedina requested a review from BryanFauble October 1, 2024 13:58
Copy link
Contributor

@BryanFauble BryanFauble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks for the thorough tests!

@jaymedina jaymedina merged commit d9f3786 into develop Oct 1, 2024
22 of 23 checks passed
@jaymedina jaymedina deleted the synpy-1513-remove-dot0-get-submissions branch October 1, 2024 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants