Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent users from submitting multiple tasks in parallel #122

Merged
merged 2 commits into from
Feb 28, 2025

Conversation

benoit74
Copy link
Collaborator

@benoit74 benoit74 commented Feb 27, 2025

Fix #56

Changes

  • (slightly) track users based on their IP address and a uniquely generated tracker ID
  • when a user opens the main UI page, there is now four situations:
    • situation 1: user has a tracker ID and there is already too many ongoing tasks for this tracker ID => block user and display link(s) to ongoing task
    • situation 2: user has no tracker ID but there is already too many ongoing tasks for this tracker ID => block user inviting him to come back later (tasks are probably not his tasks, we do not want to "leak" tasks to other users behind same IP
    • situation 3: user has an invalid tracker ID, or other weird situations linked to a user probably trying to tamper our system => block user with quite obscure message avoiding to leak too much details, but still displaying status so that it is a bit easier to debug should this be a bug
    • situation 4: we fail to get status (bug, ...) => simple error message
    • situation 5: user has quota left for his tracker ID or his IP => display usual UI
  • for now, too many tasks for one single user is set at 1 ongoing task, but code is already a bit prepared to have more complex logic (e.g. grant more quota to some users)
Situation 1 Situation 2
Screenshot 2025-02-27 at 08 44 41 Screenshot 2025-02-27 at 08 45 46
Situation 3 Situation 4
Screenshot 2025-02-27 at 08 48 52 Screenshot 2025-02-27 at 08 41 52

Wording can obviously be fine-tuned. Everything is meant to be internationalized.

"Contact us" is really "poor" for now, but I assume this will be solved in #93

  • add a button to cancel ongoing task
    • cancellation is only possible when user has a tracker ID (which is normally always the case)

Screenshot 2025-02-27 at 10 35 05

  • handle cases where task is marked for cancellation / cancelled (we can expect this to happen more often and need to provide clear feedback to the user - for now we just displayed that the task was in error, which is wrong)

Screenshot 2025-02-27 at 10 33 47

Screenshot 2025-02-27 at 10 34 13

Technical details

  • all data about tracking is stored in zimit-frontend API ; for now, we do not have any storage (DB, ...) ; this is hence stored in-memory. This work because we have one single (uvicorn) process. Should we have a need to scale, we should definitely scale at k8s level (more Docker container, not more uvicorn process) and enable sticky sessions by IP so that users are constantly assigned same Docker container. This is obviously not perfect, to be further discussed / solved down the road.
  • all logic to block users based on tracking is done in zimit-frontend API ; this is deemed the most reasonable compromise in terms of coding (IP detection can mostly only be done at API level) but also in terms of "security" (blocking at UI level is fragile in many scenario)
  • tracker ID is composed of two parts: <unique_id>|<digest>:
    • <unique_id> is a randomly generated 64 bits number in hexadecimal representation
    • <digest> is the sha-256 HMAC of the <unique_id>, in hexadecimal representation again
      • this allows to prove authenticity of the tracker ID
      • otherwise it is too easy for users to generate new identities by just generating a new unique_id on their own
      • sha-256 is probably overkill, but we do not really mind about size of the HMAC either, so probably a reasonable compromise
  • tracker ID is stored in user local storage, associated with frontend domain
    • storing it in a cookie is more complex due to cross-site concerns between frontend and backend
    • we do not really mind to use JS to add tracker ID in the requests to backend, no real benefit from the automagic nature of cookies

@Popolechien @kelson42 @rgaudin feedback needed obviously

@benoit74 benoit74 self-assigned this Feb 27, 2025
Copy link

codecov bot commented Feb 27, 2025

Codecov Report

Attention: Patch coverage is 62.00000% with 57 lines in your changes missing coverage. Please review.

Project coverage is 56.66%. Comparing base (9adb4c7) to head (cf4b7fa).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
api/src/zimitfrontend/routes/requests.py 0.00% 37 Missing ⚠️
api/src/zimitfrontend/routes/tracker.py 0.00% 11 Missing ⚠️
api/src/zimitfrontend/tracker.py 92.04% 7 Missing ⚠️
api/src/zimitfrontend/main.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #122       +/-   ##
===========================================
+ Coverage   45.64%   56.66%   +11.01%     
===========================================
  Files          10       12        +2     
  Lines         390      533      +143     
  Branches       44       77       +33     
===========================================
+ Hits          178      302      +124     
- Misses        210      229       +19     
  Partials        2        2               

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benoit74 benoit74 force-pushed the block_multiple_requests branch from 7461595 to f66efb5 Compare February 27, 2025 09:44
@benoit74 benoit74 marked this pull request as ready for review February 27, 2025 09:47
@benoit74
Copy link
Collaborator Author

Updated first comment with more screenshots, this is ready for review / comments

I would like to move this fast to production to reduce issues we have with zimit pipe being filled.

@Popolechien
Copy link
Contributor

user has no tracker ID but there is already too many ongoing tasks for this tracker ID

Seems to be contradictory.

Situation 3 seems quite niche (I hope), but why not simply state that we suspect tracker ID has been tampered with?

Other than that LGTM, I probably would have been content with option 1 alone.

@rgaudin
Copy link
Member

rgaudin commented Feb 27, 2025

I've already expressed my concerns about blocking by IP in the ticket so I won't complain about it but I think the wording remains important. I am confident there will be false positive and saying “You are blocked” to someone legitimately using the service is terrible communication. I even think a generic somewhat vague message with a dedicated error code would be better.
On the other hand, I think that if we block people because of IP usage, we should tell them that's the reason of the blocage so they can take action if for instance they're using a VPN.

@benoit74
Copy link
Collaborator Author

user has no tracker ID but there is already too many ongoing tasks for this tracker ID

Fixed in first comment, this is a typo: for this tracker ID => for this IP

Situation 3 seems quite niche (I hope), but why not simply state that we suspect tracker ID has been tampered with?

Because as the code is today, it might be more than that, vague statements are always superior to wrong ones. I hope it is quite niche, but I'm sure it will happen.

I probably would have been content with option 1 alone.

But then it would have been quickly circumvented by someone finding the trick, loosing again our resources and our time/effort, since we would not even be aware that traffic is high again because some tricked our systems.

I think that if we block people because of IP usage, we should tell them that's the reason of the blocage so they can take action if for instance they're using a VPN.

Or so that they know in plain text they just have to use a VPN with another exit IP to circumvent our blockage. I'm not in favor of giving too much directions on how to circumvent protections.

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

class Tracker:
def __init__(self):
self.known_clients: list[ClientInfo] = []
self.ongoing_task_has_finished = self._ongoing_task_has_finished
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is meant to allow customizing both functions more easily. Used only for tests for now. I don't find it particularly elegant, but at least it works. And does not require tons of code. Suggestions welcomed.

@benoit74
Copy link
Collaborator Author

Let's merge, please open a dedicated issue if there is still things to enhance / fix.

@benoit74 benoit74 merged commit f77b192 into main Feb 28, 2025
7 of 8 checks passed
@benoit74 benoit74 deleted the block_multiple_requests branch February 28, 2025 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Block users submitting more than one task in parallel to ensure fair use
3 participants