Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cold Boot Performance Issues #1673

Closed
mod-flux opened this issue Apr 8, 2021 · 17 comments
Closed

Cold Boot Performance Issues #1673

mod-flux opened this issue Apr 8, 2021 · 17 comments
Labels
adapters Changes related to the core code concerning database adapters enhancement New feature or request

Comments

@mod-flux
Copy link

mod-flux commented Apr 8, 2021

Describe the bug
I have observed extended response times for any NextAuth endpoint in the scenario of a cold boot of a serverless function. This is regardless of whatever method or encryption you're using (JWT/database, encrypted and non-encrypted).

These performance issues are in the range of multiple seconds of response time. For context, 'cold boot' occurs in these scenarios:

  • Launch a new version of an app on Vercel so that there is no pre-existing hot Lambda and hit the endpoint with a new request
  • Hit the Lambda endpoint with concurrent requests
  • Wait 15 mins for the Lambda endpoint cache to expire

Steps to reproduce

  1. Fresh Next.js application using the latest version.
  2. Implement next-auth library using the standard implementation
  3. Implement any provider, custom or pre-defined (I've used custom & Google for testing)
  4. Launch on Vercel (or run locally)
  5. Go to any endpoint provided by NextAuth (I've been using <your deployment>/api/auth/session for testing)
  6. Note that on the initial request, it can take upward of ~5 seconds to respond (this is regardless of whether they have a JWT cookie to decode or a database to connect to)
  7. Repeated requests to the same endpoint once the function is 'hot' and available takes under ~100ms

Further to the above, I've noted that when 'hot', the /session endpoint uses 131MB of memory so to optimize serverless costs, we've adjusted our vercel.json for /session to provision itself with 192MB of available memory as opposed to the default of 1GB. In this scenario, the initial requests takes upward of ~9 seconds.

Expected behavior
Initial requests on cold functions don't take over a second to respond, especially for those without an active session (e.g. no JWT cookie for stateless sessions).

Screenshots or error logs
I did some profiling as to what was taking the majority of the time in the aforementioned scenarios and got the following results from a Vercel function on the /session endpoint using 192MB of memory:

typeorm: 6241.299ms
prisma: 1.078ms
adapters: 6280.308ms
jwt: 480.770ms
parse-url: 0.546ms
logger: 0.304ms
cookie: 1.041ms
default-events: 0.649ms
default-callbacks: 0.378ms
providers: 0.301ms
callback-url-handler: 0.458ms
extend-req: 0.244ms
routes: 318.520ms
pages: 42.476ms
csrf-token-handler: 36.856ms
create-secret: 0.382ms
pkce-handler: 1.179ms
state-handler: 0.491ms
_NextAuthHandler: 19.533ms
session: 0.012ms

Additional context
It's clear that most functions/libraries are fairly quick and performant but the real outlier here is typeorm which is part of adapters. This is included in the first line of the server index.js.

I'm not proficient with this library but at a glance, everything in adapters only appears to be being used when someone provides a customer adapter in the options or database is being used to store sessions, neither of which is default to next-auth and need to be explicitly set as stateless sessions is the default.

I tested my theory and removed the import adapters from '../adapters' entirely, deployed and then attempted the /session endpoint again and found it worked. More crucially it reduced the cold start response time to a more manageable ~2 seconds. Furthermore, it reduced the memory usage from 131MB to 94MB, which meant we can reduce our memory allocation even further.

To this end, I propose that it doesn't make sense to include this significant overhead in the majority of cases when users are using the next-auth default config of stateless sessions. May I suggest we look to dynamically import the ./adapters in the server/index.js only in the situations where it's necessary (e.g. user has configured database sessions).

I'm also looking at other areas to optimize this initial load, specifically jwt and routes but clearly this is an improvement on what it was initially.

@mod-flux mod-flux added the bug Something isn't working label Apr 8, 2021
@balazsorban44
Copy link
Member

balazsorban44 commented Apr 8, 2021

This is part of our goals with extracting adapters to https://github.com/nextauthjs/adapters. Removing typeorm from the default imports would drop the package size with 71%(!)
image

@ndom91 has an ongoing effort in splitting out the adapters from the core at https://github.com/nextauthjs/adapters/issues/37, and I think he would appreciate any help!

jose is another package that we haven't updated in a while (we use 1.x, when only 2.x and 3.x are maintained anymore). Upgrading it would probably result in some package size reduction as well. We just discussed this with @lluia. Although I don't know how hard this would be. I am happy to accept PRs for it!

@balazsorban44 balazsorban44 added adapters Changes related to the core code concerning database adapters enhancement New feature or request and removed bug Something isn't working labels Apr 8, 2021
@mod-flux
Copy link
Author

mod-flux commented Apr 8, 2021

This is all great stuff, thanks @balazsorban44 and glad to see it's already in progress. I will checkout https://github.com/nextauthjs/adapters/issues/37 and have a peak at jose, which I assume is involved in some way in jwt.js. :-)

@ndom91
Copy link
Member

ndom91 commented Apr 8, 2021

Yeah we're hoping to be able to get the adapters out of the primary package so users who are just using jwts get the fastest version possible!

Unfortunately there hasn't been a ton of progress lately due to a lack of time on our side, so if your interested we'd be happy for any and all support :)

@balazsorban44
Copy link
Member

Some work has been laid out here just now: #1682

@mod-flux
Copy link
Author

mod-flux commented Apr 13, 2021

Some work has been laid out here just now: #1682

This is awesome, thanks for the heads up @balazsorban44 and your size reductions are looking awesome! Shall we close this issue and relate it to #1682? I can mention the outdated version of jose on there aswell so everything is in one thread.

@eak12913
Copy link

Before you folks close this - I just wanted to add that we're experiencing a very similar issue but we're not running our app on Lambda/Vercel - but in EKS and we're seeing the same behavior: The first request always takes " a while" (6-15 seconds observed). We are also seeing periodic multi second delays occasionally. It feels very similar to what @mod-flux describes in their issue but rather than a lambda function cache expiring - it seems like this is present even when lambda is not involved.

We are not really using TypeORM in our application (other than next-auth using it internally). We could consider writing our own DB adapter using something like KnexJS - but what I don't understand is whether next-auth would still load TypeORM even if we use a custom DB adapter?

For reference we're using:

    "next": "^10.0.7",
    "next-auth": "3.4.1",

@ndom91
Copy link
Member

ndom91 commented Apr 14, 2021

Hey no so if you don't define an adapter at all, i.e. your just using JWTs and not saving user info, then next-auth won't load an adapter at all. If you define your own adapter, it'll only load that one, whatever you define in the adapters object in the config 👍

We have some examples in the docs for writing your own adapter and are in the process of pulling them out into their own repo in nextauthjs/adapters if you want to take a look.

@eak12913
Copy link

eak12913 commented Apr 14, 2021

your just using JWTs and not saving user info,

Interesting - We are indeed using JWTs - but we are also using two Providers:

  • Goole (OAuth)
  • Email

By using these two providers next-auth is interacting with our DB. I believe that it's using TypeORM to do this (as we don't specify anything to the contrary in config)

I think the crux of my question above is:

  • Is my understanding that when next-auth uses TypeORM as the db adapter then we incur the "slow" initial response correct?
  • If I instead implement a custom db adapter via: https://next-auth.js.org/schemas/adapters#custom-adapter. Will this allow me to bypass TypeORM entirely and avoid the problematic behavior I'm seeing? In other words - will the Google and Email providers be forced to using my custom DB adapter and never load TypeORM?

We have some examples in the docs for writing your own adapter

Yep - I think this will likely be the route we have to take - but before we invest the time in writing an adapter I just wanted to double check that my own adapter would bypass having TypeORM.

@balazsorban44
Copy link
Member

balazsorban44 commented Apr 14, 2021

Currently typeorm (and prisma as well, actually) is pulled in/downloaded/bundled, no matter if you use a DB or not. (Or even if you use a custom adapter)

import adapters from '../adapters'

const adapter = userOptions.adapter ?? (userOptions.database && adapters.Default(userOptions.database))

This will hopefully be fixed in #1682, but this will require that you will have to install typeorm yourself if you actually use it.

In case you don't rely on a database (or you will write your own adapter with a much smaller footprint), the bundle size will drastically drop. At least that is the intent. Follow the PR on the progress.

@ndom91
Copy link
Member

ndom91 commented Apr 14, 2021

@balazsorban44 Yeah it'll get installed and what not obviously, but is it really then also bundled and shipped to the user even if you don't use any adapter?

@balazsorban44
Copy link
Member

balazsorban44 commented Apr 14, 2021

checked with webpack-bundle-analyzer, it was in the production build, bundled in the api handler, yes. (see the PR, I have a screenshot before and after)

@eak12913
Copy link

@balazsorban44 - Thanks for the clarification. This makes sense and I think our approach will have to be to wait for #1682 and then to write our own DB adapter.

Would you have a reasonable guess/noncommittal guess as to when that PR might make it into a release?

@robert-moore
Copy link

I understand that typeorm is large, but it seems that the big 6 second initial delay is probably from connecting to the DB, not from loading up the module within the serverless context?

@mod-flux
Copy link
Author

@robert-moore I'm not connecting to any external DB or source in the context of my original issue and investigation. That 6 seconds is purely booting with what's on the Lambda, no external factors.

@kripod
Copy link
Contributor

kripod commented May 3, 2021

Thank you for all these observations!

I’ve just set up connection pooling today (with pgbouncer) and found that cold starts still go way slower than expected, possibly due to the overwhelmingly large size of Lambda functions that next-auth might be responsible for.

@rozenmd
Copy link

rozenmd commented May 9, 2021

Random observation: I noticed if I check the request for a session, and check if the session is valid and in the db myself (via pg-promise), cold-boot time goes down to 2.8 seconds for logged in users (likely due to the size of the function).

The pseudocode of my hack is:

if (session in req.cookies){
  session = await check_session_in_db();
} else {
  session = await getSession({req});
}

@stale stale bot added the stale Did not receive any activity for 60 days label Jul 8, 2021
@nextauthjs nextauthjs deleted a comment from stale bot Jul 8, 2021
@stale stale bot removed the stale Did not receive any activity for 60 days label Jul 8, 2021
@balazsorban44
Copy link
Member

balazsorban44 commented Jul 19, 2021

So it's a bit hard to track this further down, as the issue at hand is not directly related to code. In a best effort, typeorm and prisma at least won't be bundled in v4, and we aim to keep the number of dependencies as low as possible for a small bundle size. If anyone has a better idea what could cause this issue, please open a new issue with a reproduction. 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adapters Changes related to the core code concerning database adapters enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants