Spike: Revisit the issue of adding rate limiting logic to the application, create a list of actionable issues to start the effort. #23

kcondon · 2015-01-14T17:34:36Z

This ticket is a placeholder for general API rate and access limiting logic to better control the load placed on the service and provide options in case of system instability.

Rate limiting was mentioned during search api testing and github search api uses this concept too:
https://developer.github.com/v3/search/

Limiting access might involve varying degrees of options: general api access on/off switch, per api, and/or whitelist/blacklist of ip addresses/ users. The last might be integrated with groups and permissions.

Update: additional terms for this:

API throttling

pdurbin · 2018-03-15T14:14:26Z

For Harvard Dataverse we have been talking about investigating rate limiting solutions offered by AWS and I just pushed b1b703a to mention the new "Rate-Based Rules" offering that's part of AWS WAF (Web Application Firewall). This blog post provides a good overview: https://aws.amazon.com/blogs/aws/protect-web-sites-services-using-rate-based-rules-for-aws-waf/

pdurbin · 2018-05-18T16:54:59Z

At standup this morning I inquired if there is any specific technical plan or approach and got feedback that we are fine with an AWS-specific solution for now so I went ahead and made pull request IQSS/dataverse#4693 based on the commit I mentioned above and moved this issue to code review at https://waffle.io/IQSS/dataverse

djbrooke · 2018-05-18T17:01:53Z

Thanks @pdurbin. I re-titled the issue to reflect that this is AWS-specific. We'll want a general solution at some point, but I think it's good to get this small chunk tested and out in a release.

djbrooke · 2018-05-21T15:34:28Z

Talked after standup this morning. The approach is good, but we need some boxes from LTS to test. @landreev will look into this with LTS. @djbrooke will get involved if a credit card or something is needed :)

landreev · 2018-05-21T21:43:47Z

I sent a note to LTS:

We've been thinking about creating a test AWS setup that would mock the production; for testing new releases before they go into production.

Something like a low-power instance or two. And we specifically want it to sit behind an ELB - in order to be able to test load-balancing and rate-limiting mechanisms. (This is one thing we have no way of testing as of now).

Is this something you could help us setting up, or would you recommend that we just set it up on our own?

(by the time I hit send I kinda felt like I was maybe pushing it with them... well, if that's the case they'll tell us to do it ourselves and we will. but I figured I'd ask)

djbrooke · 2018-06-18T14:35:54Z

Meeting with LTS on Wednesday, will discuss.

landreev · 2018-07-31T17:12:51Z

@matthew-a-dunlap
The test cluster is made up of 2 app nodes:
dvn-cloud-dev-1.lts.harvard.edu
dvn-cloud-dev-2.lts.harvard.edu
I created a shell account for you, with the username mdunlap an sudo powers.
I'll slack the password to you.
The elb for the cluster is
https://dvn-dev.lts.harvard.edu/

Both nodes are using the database on dvn-cloud-dev-1.

matthew-a-dunlap · 2018-07-31T18:46:11Z

This story is mostly blocked until we hear back from LTS about access to the web console, they only provided us access to the boxes themselves.

I'll can do some deeper research into web application firewall in the meantime.

djbrooke · 2021-10-27T18:46:59Z

The tech team will discuss and bring a well scoped issue to a future planning meeting.

djbrooke · 2021-11-17T16:31:00Z

@scolapasta - when you pick this up for discussion, one thing that @landreev mentioned is that it may be a good idea to check the number of locks a person has - for example a person can start a bunch of publishing requests, and the individual datasets are locked but what's to stop them from firing several thousand requests in parallel?

PaulBoon · 2021-11-29T16:44:24Z

Since last week we are experiencing lots of problems with lots of request to the '/api/access/datafiles/{id}' endpoint.
These download request are probably not malicious, but we don't know for sure of course.

Besides thinking about using something like mod_evasive, we also looked into our payara configuration, which might be tuned to give better performance.
This blog https://blog.payara.fish/fine-tuning-payara-server-5-in-production contains most useful information,
but I was wondering if there are some Dataverse specific tips available in the guides, or maybe it should be added?

mreekie · 2023-01-10T22:12:17Z

Prio meeting with Stefano.

Moved from Dataverse Team Backlog to ordered backlog

mreekie · 2023-01-11T19:28:35Z

Top priority for upcoming sprint

mreekie · 2023-01-11T21:00:40Z

Sizing:

The first step here is a spike. time limited to a sprint
We don't have a nailed down approach to this though there has been some research and discussion.
As part of the spike, could include a tech hour session.
This could be big enough that it becomes a "deliverable"

landreev · 2023-01-18T22:15:21Z

This came up yet again, recently.
The reason it hasn't gone anywhere in 8 years is that it's way too fat, an elephant-sized issue that's too broadly defined. We've gone through this cycle quite a few times - of talking about it during tech hours, giving it to somebody to research and investigate, etc. But it's hard to even talk about, when we are defining it like this, that we want to "throttle everything", the full spectrum of our incoming traffic - it's not clear where to start even.
Our traffic is not uniform. It would be easy if we were only serving cat pictures (of roughly the same size) all day long. But our users' requests vary immensely in their impact on the system, plus we have different classes of users etc. In general, you need to know a lot about the specifics of our application and this makes adopting existing third party solutions difficult at least.

What I'm proposing is that instead of trying to re-visit this issue as a whole, we should just start chipping away at the problem by addressing certain specific cases of limiting excessive load that we can define and know how to address. I've proposed some, like detecting and blocking aggressive crawlers (basically what I do by hand occasionally; also blocking crawlers may be one area where some off the shelf solution may/should work); or limiting specific expensive activity on the user level (like a limit on how many files/data an unprivileged user can upload per hour). Features like this are in fact long overdue. And I'm convinced by now that it would be more productive to just work on them one clearly defined case at a time.

mreekie · 2023-01-27T17:33:56Z

Sprint board review

I'm pulling this one back off onto the sprint backlog, pending an OK from Stefano.
The reasons are:
- This may need additional discussion
- I added an extra 69 points to this sprint over the team estimate of 400.

(I can't wait until some of this is automated)

mreekie · 2023-01-27T20:26:00Z

Sprint board review

Resized as a time bounded spike.
Size 33.
Added back on the sprint.

landreev · 2023-02-02T15:57:51Z

There are few specific areas that have been identified where we can start working immediately.
The list below is the first set of such issues catalogued as part of this spike, some old and some brand-new.

This new issue has been opened as a followup to the discussion with @siacus and @scolapasta, as a sensible area to focus on:

Add execution rate metering to the command engine dataverse#9356

During recent discussions it was suggested that metering and limiting file uploads should also be handled under this umbrella, since uploads are a very serious part of the overall practical system load, and there seems to be an agreement that this needs to be addressed urgently. Another practical consideration is that file uploads are not handled through the command engine, and therefore will not be subject to limiting by the technology described in 1. above.
A few issues have been opened for storage quotas and limits over the years. There is some overlap between them.

Add mechanism for collection-wise storage size quotas dataverse#8549
Limit the amount of data depositors may upload every day dataverse#7829
Storage allocation quota per user dataverse#4339
File and dataset limits: Add a programmatic way to limit file size and dataset size dataverse#3939
I added a new issue for adding a generic quota check mechanism to the file creation pipeline that is narrowly defined and could be implemented first, but will then allow us to address the specific quota cases requested in the issues above:
Add GENERIC storage quota check to the file upload framework. dataverse#9361

Add Apache-level solution for detecting bot/scripted or otherwise automated crawling, before it gets to the application:

Investigate adding Apache-level mechanism for rejecting aggressive robot crawling dataverse#9359

landreev · 2023-02-02T16:30:22Z

Per feedback from @qqmyers, I'll run some quick practical analysis on the ActionLogRecord data in production, to see if any obvious results can be derived from it immediately, smoking guns/worst offenders, etc.

landreev · 2023-02-02T21:19:42Z

Actually, I'll add any useful stats from the prod. ActionLogRecord to the "command engine" issue (#9356).

scolapasta · 2023-02-02T21:32:44Z

Reviewed the new issues added - I think they look good and represent what we can first get done, in order to help with rate limiting. There may well be more to do after those, but let's get them working (I've gone ahead and added them to the Dataverse Dev column in the backlog board) and we can revisit after, as needed.

mreekie · 2023-02-28T21:55:55Z

Grooming:

This is closed but it is the start of a dev effort to revisit rate limiting issues.
Added a deliverable label:

mreekie · 2023-04-10T13:10:21Z

grooming:

This needs to be looked at again.
For now I put it in the backlog and added the deliverable label.

scolapasta · 2024-01-17T22:52:51Z

Closing this, now that we have IQSS/dataverse#10211 in progress.

landreev · 2024-01-18T22:50:28Z

@scolapasta Are you sure you wanted to close this one?
Note that this spike was for being able to limit everything across the application; with the idea, I think, that more than one solution may be need in parallel, for different parts of the application.
IQSS/dataverse#10211, and the corresponding issue are specifically for the Command Engine only.

I can see how an argument can be made that if there is anything potentially expensive that we want to ration, that's done bypassing the command system, then it could potentially be addressed by creating dedicated commands for all such things... But I still think that would need to be discussed to make sure we're not missing anything.

scolapasta · 2024-01-29T19:58:33Z

@landreev If there are other areas that we do need to ration, outside of the command system, then I'd vote for creating more specific actionable issues for it. This one here was in the dm-project and I do think we've made plenty of headway on different aspects and I think that accomplished the goal of "creat[ing] a list of actionable issues to start the effort". But if you feel otherwise and think there's something more we can do for this one specifically, that's fine too.

bencomp mentioned this issue May 29, 2015

Consider options for opening APIs without tokens IQSS/dataverse#1838

Closed

pdurbin mentioned this issue Jun 8, 2016

Sensitive Data: Mitigate against password guessing attacks IQSS/dataverse#3153

Closed

pdurbin mentioned this issue Jun 19, 2017

Search API: Support search without an API Token IQSS/dataverse#3900

Closed

pdurbin referenced this issue in IQSS/dataverse Mar 15, 2018

add AWS rate limiting tip to new page on scaling #1339

b1b703a

pdurbin self-assigned this May 18, 2018

pdurbin mentioned this issue May 18, 2018

add AWS rate limiting tip to new page on scaling #1339 IQSS/dataverse#4693

Closed

pdurbin removed their assignment May 18, 2018

djbrooke changed the title ~~API: Add rate limiting logic~~ API: Add rate limiting logic on AWS May 18, 2018

djbrooke changed the title ~~API: Add rate limiting logic on AWS~~ API: Add rate limiting logic for AWS May 18, 2018

djbrooke assigned djbrooke and landreev and unassigned djbrooke May 21, 2018

djbrooke assigned djbrooke and unassigned landreev May 22, 2018

djbrooke assigned landreev Jun 11, 2018

djbrooke assigned landreev and unassigned landreev Jun 18, 2018

djbrooke removed their assignment Jul 13, 2018

djbrooke unassigned landreev Jul 26, 2018

matthew-a-dunlap self-assigned this Jul 30, 2018

djbrooke assigned landreev Jul 31, 2018

landreev removed their assignment Jul 31, 2018

scolapasta removed their assignment Mar 4, 2022

landreev self-assigned this Jan 30, 2023

landreev changed the title ~~Add rate limiting logic~~ Spike: Revisit the issue of adding rate limiting logic to the application, create a list of actionable issues to start the effort. Jan 30, 2023

scolapasta self-assigned this Feb 2, 2023

scolapasta closed this as completed Feb 2, 2023

mreekie reopened this Mar 6, 2023

mreekie transferred this issue from IQSS/dataverse Mar 6, 2023

mreekie added the D: FixRateLimitingBehaviors label Mar 22, 2023

mreekie added the bklog: Deliverable label Apr 10, 2023

scolapasta closed this as completed Jan 17, 2024

scolapasta unassigned landreev and scolapasta Feb 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spike: Revisit the issue of adding rate limiting logic to the application, create a list of actionable issues to start the effort. #23

Spike: Revisit the issue of adding rate limiting logic to the application, create a list of actionable issues to start the effort. #23

kcondon commented Jan 14, 2015 •

edited by pdurbin

Loading

pdurbin commented Mar 15, 2018

pdurbin commented May 18, 2018

djbrooke commented May 18, 2018

djbrooke commented May 21, 2018

landreev commented May 21, 2018 •

edited by pdurbin

Loading

djbrooke commented Jun 18, 2018

landreev commented Jul 31, 2018

matthew-a-dunlap commented Jul 31, 2018

djbrooke commented Oct 27, 2021

djbrooke commented Nov 17, 2021

PaulBoon commented Nov 29, 2021 •

edited

Loading

mreekie commented Jan 10, 2023

mreekie commented Jan 11, 2023

mreekie commented Jan 11, 2023

landreev commented Jan 18, 2023

mreekie commented Jan 27, 2023

mreekie commented Jan 27, 2023

landreev commented Feb 2, 2023 •

edited

Loading

landreev commented Feb 2, 2023 •

edited

Loading

landreev commented Feb 2, 2023

scolapasta commented Feb 2, 2023

mreekie commented Feb 28, 2023

mreekie commented Apr 10, 2023

scolapasta commented Jan 17, 2024

landreev commented Jan 18, 2024

scolapasta commented Jan 29, 2024

Spike: Revisit the issue of adding rate limiting logic to the application, create a list of actionable issues to start the effort. #23

Spike: Revisit the issue of adding rate limiting logic to the application, create a list of actionable issues to start the effort. #23

Comments

kcondon commented Jan 14, 2015 • edited by pdurbin Loading

pdurbin commented Mar 15, 2018

pdurbin commented May 18, 2018

djbrooke commented May 18, 2018

djbrooke commented May 21, 2018

landreev commented May 21, 2018 • edited by pdurbin Loading

djbrooke commented Jun 18, 2018

landreev commented Jul 31, 2018

matthew-a-dunlap commented Jul 31, 2018

djbrooke commented Oct 27, 2021

djbrooke commented Nov 17, 2021

PaulBoon commented Nov 29, 2021 • edited Loading

mreekie commented Jan 10, 2023

mreekie commented Jan 11, 2023

mreekie commented Jan 11, 2023

landreev commented Jan 18, 2023

mreekie commented Jan 27, 2023

mreekie commented Jan 27, 2023

landreev commented Feb 2, 2023 • edited Loading

landreev commented Feb 2, 2023 • edited Loading

landreev commented Feb 2, 2023

scolapasta commented Feb 2, 2023

mreekie commented Feb 28, 2023

mreekie commented Apr 10, 2023

scolapasta commented Jan 17, 2024

landreev commented Jan 18, 2024

scolapasta commented Jan 29, 2024

kcondon commented Jan 14, 2015 •

edited by pdurbin

Loading

landreev commented May 21, 2018 •

edited by pdurbin

Loading

PaulBoon commented Nov 29, 2021 •

edited

Loading

landreev commented Feb 2, 2023 •

edited

Loading

landreev commented Feb 2, 2023 •

edited

Loading