
Return 400 status codes when query parameters are bad #678

Merged · 13 commits into main · Feb 14, 2025
Conversation

@alfredgrip (Contributor)

Currently, when visiting e.g. https://www.dsek.se/documents?year=-1, https://www.dsek.se/documents?year=99999, or https://www.dsek.se/documents?year=hello, the server happily responds with status code 200, even though these values for the year parameter make no sense.

This PR limits the allowed values of the year parameter and throws if it's invalid (i.e. returns 400).

For some endpoints, like committees, I have chosen to allow a higher value on the year parameter, since it could be fun to see "who has this position NEXT year".

One can see in our logs that many bots/crawlers constantly scrape e.g. https://www.dsek.se/documents?year=-1 and other weird values of the year parameter; returning 400 here might stop them from trying that.

(Also fixes some comments that were out of date)
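A minimal sketch of the kind of validation the PR describes, assuming a SvelteKit-style load function. The helper name, the HttpError class, and the concrete bounds below are hypothetical illustrations, not the repo's actual code:

```typescript
// Hypothetical sketch: validate a `year` query parameter and throw a
// 400-style error when it is missing structure or out of range.
// HttpError here stands in for SvelteKit's error(400, ...) helper.
class HttpError extends Error {
  constructor(
    public status: number,
    message: string,
  ) {
    super(message);
  }
}

function getYearOrThrow(
  params: URLSearchParams,
  opts: { fallback: number; lowerBound: number; upperBound: number },
): number {
  const raw = params.get("year");
  if (raw === null) return opts.fallback; // missing param -> default year
  const year = Number(raw);
  if (
    !Number.isInteger(year) ||
    year < opts.lowerBound ||
    year > opts.upperBound
  ) {
    // year=-1, year=99999, year=hello all end up here
    throw new HttpError(400, `invalid year parameter: ${raw}`);
  }
  return year;
}
```

With bounds like `{ fallback: 2025, lowerBound: 1982, upperBound: 2026 }` (made-up numbers), `?year=2024` passes through while `?year=-1` and `?year=hello` throw with status 400.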

@Isak-Kallini (Member) left a comment

Looks great! Should just be added to /requirements and /governing (although I'm not sure why the Yearly Documents part is there) as well.
Something similar, but for pagination, should probably be added to /news, /events, and any other pages with pagination.

@alfredgrip (Contributor, Author)

> Looks great! Should just be added to /requirements and /governing (although I'm not sure why the Yearly Documents part is there) as well. Something similar but for pagination should probably be added to /news, /events and any other pages with pagination as well

True! I'll add it to more routes. As for why we have yearly documents at /governing, I think it's intended for documents like "Strategisk plan", which are sort of governing but they only apply for one year.

@alfredgrip (Contributor, Author)

I did some refactoring and found other places where I think we should throw.

@Isak-Kallini (Member) left a comment

Nice finding more places, but I think we should have an upper bound for the pages with normal pagination as well. Looks great otherwise!

@alfredgrip (Contributor, Author)

> Nice finding more places, but I think we should have an upper bound for the pages with normal pagination as well. Looks great otherwise!

If the upper bound is not set (undefined), it falls back to Number.MAX_SAFE_INTEGER, so it's not needed. You only need to provide it if you know a value to set it to, like year.
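The fallback described here could look roughly like the following sketch (an assumed signature, not the repo's actual getPageOrThrowSvelteError implementation):

```typescript
// Sketch of the described fallback: an undefined upperBound effectively
// means "no upper bound" via Number.MAX_SAFE_INTEGER. Names are assumptions.
function validatePage(
  raw: string | null,
  opts: { fallbackValue: number; lowerBound: number; upperBound?: number },
): number {
  // Nullish coalescing gives the "no upper bound" behaviour for free.
  const upper = opts.upperBound ?? Number.MAX_SAFE_INTEGER;
  if (raw === null) return opts.fallbackValue; // param absent -> default
  const page = Number(raw);
  if (!Number.isInteger(page) || page < opts.lowerBound || page > upper) {
    throw new Error(`400: invalid page parameter: ${raw}`);
  }
  return page;
}
```

Callers that know a real bound (like year) pass upperBound explicitly; everyone else can omit it.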

@Isak-Kallini (Member)

>> Nice finding more places, but I think we should have an upper bound for the pages with normal pagination as well. Looks great otherwise!
>
> If upper bound is not set (undefined), it falls back to Number.MAX_SAFE_INTEGER so it's not needed. You only need to provide it if you know a value to set it to, like year

I was mostly thinking that if the goal is to avoid bots, we should not allow any page numbers that shouldn't be visited. And evidently we do know the value, since we have the pagination component with the numbers.

@alfredgrip (Contributor, Author)

Good point actually, I'll make the requested changes.

@alfredgrip (Contributor, Author)

It's trickier to determine valid upper bounds for page than I first thought. It works fine if we have no special where filters in Prisma, but once we have those, we first need to do a DB query with the filters before we can validate the page query parameter. I found this too complex, and the benefits aren't worth it, so I just determine valid values using a simple .count().

Also, I found some off-by-one errors. Please take a look at those and see if you agree with me.

Comment on lines +16 to 23
const eventCount = await prisma.event.count();
const pageSize = getPageSizeOrThrowSvelteError(url);
const page = getPageOrThrowSvelteError(url, {
fallbackValue: 1,
lowerBound: 1,
upperBound: Math.ceil(eventCount / pageSize),
});
const [[events, pageCount], allTags] = await Promise.all([
Member

Can't we just use the existing pageCount on line 23? I guess it's a bit inefficient to have to do all the queries before possibly throwing an error because of the pageCount, but on the other hand I don't think we should care about that if you're trying to visit a page that doesn't exist anyway.

Same thing for news and songbook (expenses looks fine)

Contributor (Author)

We could do this, but this is where I think it's not worth it. I would rather invalidate the parameters before doing any DB queries, both to minimize traffic and as a general principle (since the parameters could affect the DB query), even if that means allowing some strange parameter values (as long as we don't filter out any valid parameters).

But we could do two validations: the first one before the DB query, using rough filtering of valid parameters, and the second one after the DB query, doing fine-grained filtering of any bad parameters. This introduces some complexity, but it's sort of the best of both scenarios. Any thoughts on this?
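The two-phase idea could be sketched like this (both helper names are hypothetical, purely for illustration):

```typescript
// Phase 1: cheap structural check before any DB query is issued.
// Rejects year=hello and page=-1 style garbage without touching the DB.
function roughValidatePage(raw: string | null): number {
  const page = raw === null ? 1 : Number(raw);
  if (!Number.isInteger(page) || page < 1) {
    throw new Error(`400: structurally invalid page parameter: ${raw}`);
  }
  return page; // structurally valid; upper bound still unknown
}

// Phase 2: bounds check against the pageCount the DB query returned.
// Rejects page=99999 on a 5-page listing after the (cheap) count query.
function fineValidatePage(page: number, pageCount: number): void {
  if (page > Math.max(pageCount, 1)) {
    throw new Error(`400: page ${page} out of range 1..${pageCount}`);
  }
}
```

Phase 1 keeps obviously bad input away from the DB entirely; phase 2 only needs the count that the page query produces anyway.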

Member

Yeah I see your point. Could we do the validation in the transaction? For example for events:

const [events, count] = await prisma.$transaction(async (tx) => {
  const events = tx.event.findMany({
    where,
    orderBy: {
      startDatetime: filters.pastEvents ? "desc" : "asc",
    },
    skip: Math.max(pageNumber - 1, 0) * pageSize,
    take: pageSize,
    include,
  });
  const count = tx.event.count({ where });
  return [await events, await count];
});

Do the count query first and validate. It looks like we have all the information we need in the getAllEvents function. It might be a bad idea to slow down the transaction, though, so an alternative would be to just do the same query before the transaction in this function. But I feel like doing one extra query definitely takes longer than doing the validation, so it would be better to just do it in the transaction.

Contributor (Author)

We can do it like this:

  1. Count (DB query)
  2. Validate
  3. Fetch data (DB query)

Related to this: @danieladugyan and I had a short conversation about the code you brought up as an example, and we both agree that running it as a transaction is overkill. The odds of a data race are slim to none, and even if one happened, it wouldn't break the application. So the transaction just adds loads of overhead for basically no reason. Do you agree? If so, we could remove the transaction while we're refactoring this.

I guess we could do some benchmarking to see which is faster before making any decisions, though: doing the validation in the middle of the transaction vs. splitting up the queries (removing the transaction) and doing the validation in between.
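The count → validate → fetch flow without a transaction could be sketched as follows, using an in-memory array as a stand-in for the Prisma queries (all names below are hypothetical; in the real code the two helpers would be prisma.event.count({ where }) and prisma.event.findMany({ skip, take, ... })):

```typescript
// In-memory stand-in for the events table, so the sketch is self-contained.
type Event = { id: number; title: string };

async function countEvents(db: Event[]): Promise<number> {
  return db.length; // real code: prisma.event.count({ where })
}

async function fetchEventsPage(
  db: Event[],
  page: number,
  pageSize: number,
): Promise<Event[]> {
  // real code: prisma.event.findMany({ where, skip, take, ... })
  return db.slice((page - 1) * pageSize, page * pageSize);
}

async function getEvents(db: Event[], page: number, pageSize: number) {
  // 1. Count (DB query)
  const count = await countEvents(db);
  const pageCount = Math.max(Math.ceil(count / pageSize), 1);
  // 2. Validate the page parameter before fetching any rows
  if (!Number.isInteger(page) || page < 1 || page > pageCount) {
    throw new Error(`400: page ${page} out of range 1..${pageCount}`);
  }
  // 3. Fetch data (DB query)
  return { events: await fetchEventsPage(db, page, pageSize), pageCount };
}
```

Since the count and the fetch are now plain sequential queries, there is no transaction to slow down, and the invalid-page case never reaches the expensive findMany.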

Member

>   1. Count (DB query)
>   2. Validate
>   3. Fetch data (DB query)

That's exactly what I had in mind.

I definitely agree that running it as a transaction is overkill, for the reasons you mentioned. Removing the transaction sounds like a good idea 👍. I highly doubt it will make much difference performance-wise, but you can do some benchmarking if you really want to.

Member

Transactions completely kill performance under heavy load, as far as I can tell. I think there is a maximum number of concurrent transactions, and anything beyond that just fails after a timeout. I was never able to confirm this, but if you're curious you can spam requests at the /news endpoint, for instance, and check the logs for errors relating to transactions.

@alfredgrip alfredgrip merged commit 7aebe7d into main Feb 14, 2025
3 checks passed
@alfredgrip alfredgrip deleted the status-codes branch February 14, 2025 14:41
Labels: none · Projects: Status ✅ Done · 3 participants