-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return 400 status codes when query parameters are bad #678
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Should just be added to /requirements
and /governing
(although I'm not sure why the Yearly Documents part is there) as well.
Something similar but for pagination should probably be added to /news
, /events
and any other pages with pagination as well
True! I'll add it to more routes. As for why we have yearly documents at |
I made some refactoring and found other places where I think we should throw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice finding more places, but I think we should have an upper bound for the pages with normal pagination as well. Looks great otherwise!
If upper bound is not set (undefined), it falls back to |
I was mostly thinking if the goal is to avoid bots we should not allow any page numbers that shouldn't be visited. And evidently we do know the value since we have the pagination component with the numbers |
Good point actually, I'll do the requested changes |
It's more tricky to determine valid upper bounds for Also I found some off by one errors. Please take a look at those and see if you agree with me. |
const eventCount = await prisma.event.count(); | ||
const pageSize = getPageSizeOrThrowSvelteError(url); | ||
const page = getPageOrThrowSvelteError(url, { | ||
fallbackValue: 1, | ||
lowerBound: 1, | ||
upperBound: Math.ceil(eventCount / pageSize), | ||
}); | ||
const [[events, pageCount], allTags] = await Promise.all([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just use the existing pageCount on line 23? I guess it's a bit inefficient to have to do all the queries before possibly throwing an error because of the pageCount, but on the other hand I don't think we should care about that if you're trying to visit a page that doesn't exist anyways.
Same thing for news and songbook (expenses looks fine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do this, but this is where I think it's not worth it. I would rather see that we invalidate the parameters before doing any DB queries, both to minimize traffic, and as a general principle (since parameters could affect the DB query), even if that means allowing some strange parameter values. (As, long as we don't filter out any valid parameters)
But, we could do two validations. The first one before the DB query, where we use a rough filtering of valid parameters, and the second one after the DB query to do fine grained filtering of any bad parameters. This will introduce some complexity, but it's sort of like best of both scenarios. Any thoughts about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I see your point. Could we do the validation in the transaction? For example for events:
const [events, count] = await prisma.$transaction(async (tx) => {
const events = tx.event.findMany({
where,
orderBy: {
startDatetime: filters.pastEvents ? "desc" : "asc",
},
skip: Math.max(pageNumber - 1, 0) * pageSize,
take: pageSize,
include,
});
const count = tx.event.count({ where });
return [await events, await count];
});
Do the count query first and validate. It looks like we have all the information we need in the getAllEvents function. Though it might be a bad idea to slow down the transaction, so an alternative could be to just do the same query before the transaction in this function. But I feel like doing one extra query definitely takes longer than doing the validation so it would be better to just do it in the transaction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do it like this:
- Count (DB query)
- Validate
- Fetch data (DB query)
Related to this: me and @danieladugyan had a short conversation about this code you brought up as example, and we both agree that running this as a transaction is overkill. The odds of it being a data race are slim to none, and even if it were to happen, it doesn't break the application. So the transaction just puts on loads of overhead for basically no reason. Do you agree with this? That way we could remove the transaction while we are refactoring this.
I guess we could do some benchmarking to see which is faster before making any decisions though: do validation in the middle of transaction vs. splitting up the queries (removing the transaction) and do the validation in between.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Count (DB query)
- Validate
- Fetch data (DB query)
That's exactly what i had in mind
I definitely agree that running it as a transaction is overkill for the reasons you mentioned. Removing the transaction sounds like a good idea 👍. I highly doubt it's going to make much difference performance wise but you can do some benchmarking if you really want to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transactions completely kill performance under heavy load as far as I can tell. I think there is a maximum number of concurrent transactions and anything beyond that just fails after a timeout. I was never able to confirm this though, but if you're curious you can spam requests at the /news
endpoint for instance and check the logs for errors relating to transactions.
Currently, when visiting e.g. https://www.dsek.se/documents?year=-1, https://www.dsek.se/documents?year=99999, or https://www.dsek.se/documents?year=hello, the server happily responds with status code 200, even though these values for the year parameter makes no sense.
This PR limits the value on the year parameter, and throws if it's invalid (i.e. return 400).
Some endpoints I have chosen to allow a higher value on the year parameter, like committees, since it could be fun to see "who has this position NEXT year".
One could see in our logs that we get many bots/crawlers, constantly scraping e.g. https://www.dsek.se/documents?year=-1 and other weird values on the year parameter, but returning 400 here might stop them from trying that.
(Also fixes some comments that were out of date)