Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quirrel queue is blocked indefinitely by using exclusive: true #739

Closed
mattjis opened this issue Aug 19, 2021 · 26 comments · Fixed by #741
Closed

Quirrel queue is blocked indefinitely by using exclusive: true #739

mattjis opened this issue Aug 19, 2021 · 26 comments · Fixed by #741

Comments

@mattjis
Copy link

mattjis commented Aug 19, 2021

I'm not too sure what the issue is, but my api endpoints aren't being called.

Are you able to lookup my account and figure out why it's not working?

The domain is hal.vercel.app

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

When I schedule them everything seems fine:

{
	id: '6885672091808',
	endpoint: 'https://hal.vercel.app/api/products/raffle',
	body: null,
	runAt: 2021-08-19T07:00:34.449Z,
	exclusive: true,
	retry: [],
	count: 1,
	delete: [Function: delete],
	invoke: [Function: invoke]
}

But the endpoint is never called :(

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

Hi @mattjis! When did you enqueue this job? I see that it has exclusive: true and suspect it could be related to a change in Owl (Quirrel's scheduler) that I merged 2 days ago.

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

Hey Simon!

I've attempted to queue about 5 or 10 jobs over the past couple of hours and none of them are triggering.

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

Thank you. Looking into it now.

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

I did get a weird error right before it stopped working, probably unrelated to Quirrel but I'll paste it here anyway. I'm not doing anything with base64 so it was a bit odd.

[POST] /api/products/raffle
13:30:46:07
2021-08-19T05:30:46.782Z	ca5c8e12-1a82-4ed9-bb23-360ddbc5acf4	ERROR	TypeError: Cannot read property 'length' of undefined
	at getLens (/var/task/node_modules/base64-js/index.js:23:17)
	at Object.toByteArray (/var/task/node_modules/base64-js/index.js:55:14)
	at unpackMessage (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:104:51)
	at Encryptor.<anonymous> (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:156:30)
	at step (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:75:23)
	at Object.next (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:56:53)
	at /var/task/node_modules/secure-e2ee/dist/base-encryptor.js:50:71
	at new Promise (<anonymous>)
	at __awaiter (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:46:12)
	at Encryptor.BaseEncryptor.decrypt (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:151:16)
2021-08-19T05:30:46.782Z	ca5c8e12-1a82-4ed9-bb23-360ddbc5acf4	ERROR	TypeError: Cannot read property 'length' of undefined
	at getLens (/var/task/node_modules/base64-js/index.js:23:17)
	at Object.toByteArray (/var/task/node_modules/base64-js/index.js:55:14)
	at unpackMessage (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:104:51)
	at Encryptor.<anonymous> (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:156:30)
	at step (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:75:23)
	at Object.next (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:56:53)
	at /var/task/node_modules/secure-e2ee/dist/base-encryptor.js:50:71
	at new Promise (<anonymous>)
	at __awaiter (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:46:12)
	at Encryptor.BaseEncryptor.decrypt (/var/task/node_modules/secure-e2ee/dist/base-encryptor.js:151:16)
RequestId: ca5c8e12-1a82-4ed9-bb23-360ddbc5acf4 Error: Runtime exited with error: exit status 1
Runtime.ExitError

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

Oh, that is indeed not unrelated! Could you update to v1.7.1 and tell me if that fixes your issue?

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

@Skn0tt I'm already on 1.7.1

"quirrel": "^1.7.1",

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

Hmm, interesting. The log you posted is related to Quirrel's E2E encryption and was fixed via #677. I then suspect you're observing a different error now. Looking into it.

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

Should I try deleting node_modules and package-lock.json to reinstall everything?

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

This is what my enqueue code looks like too, in case there's anything helpful (e.g null as payload):

await RaffleQueue.enqueue(null, {
	id: "6771665862836",
	override: true,
	exclusive: true,
	runAt: new Date("2021-09-01T06:00:00.000Z")
})

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

No, that won't help. I had a look at Quirrel's production database, and it seems like there's a case where queues with exclusive: true can get blocked indefinitely. If you need a quick fix for this, I recommend to change the name of the queue (place it at a different API endpoint) and don't use the exclusive: true feature for now.
I'll need to do some more digging to find the root cause of this issue. Is it urgent for you? If yes, I can try looking into it tonight.

@Skn0tt Skn0tt changed the title Quirrel callbacks aren't triggering Quirrel queue is blocked indefinitely by using exclusive: true Aug 19, 2021
@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

Whenever you get a chance! We'll just have to delay a few products from releasing until Monday, which isn't a huge issue.

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

Okay. Will try sorting this tonight, but can't guarantee anything.

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

Digging through the logs, I find some entries that make me think your Queue enqueues jobs into itself, re-using IDs. A bit like this:

const queue = Queue<Product | null>("/api/raffle", async (payload, { id }) => {
  if (!payload) {
    const product = await getProduct(id)
    await queue.enqueue(product, { id, override: true, exclusive: true });
  }
})

export default queue

// somewhere:
queue.enqueue(null, { id: "....", delay: "3 seconds", override: true, exclusive: true });

Can you confirm that?

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

I did some more testing, and this seems to indeed be the case. Here's roughly what happens:

  1. You enqueue job A (exclusive, override: true).
  2. Job A starts execution.
  3. Your application is called, which will enqueue a new job under the same ID (let's call it job A').
  4. Job A finishes execution, and consequently deletes all of its data, because it assumes that data is no longer needed.
  5. Job A' starts execution, but all it's data has been deleted in the previous step! Undefined behaviour happens, and leads to your queue being blocked indefinitely.

This can not only occur because a queue enqueues into itself, but also because of bad timing in general. I'll prioritise introducing a fix for that.
For your specific case, I think there's an easier way to go forward:

  1. reset your queue by moving it to another endpoint. /api/products/raffle is in an undefined state at the moment, and I'd prefer to not fix that manually in the database. So maybe rename it to /api/products/raffle?reset=1 or /api/products/raffle-2.
  2. make sure that, while executing a job, you don't enqueue a job with the same ID.

Please let me know if this resolved the issue you're seeing.

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

Yes that's right.

First I enqueue the job when people enter a raffle:

await RaffleQueue.enqueue(null, {
	id: String(sanityProduct.shopifyId),
	override: true,
	exclusive: true,
	runAt: new Date(sanityProduct.date)
})

And then when the raffle has ended it needs to be run x (unknown) amount of times to charge all of the winners. Each batch sets up another callback via quirrel using:

return RaffleQueue.enqueue(
	{ variants: product.variants },
	{
		id: String(product.id),
		exclusive: true,
		override: true,
		delay: 1
	}
)

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

The thing is though, to me, the queue ID being the product ID is a good way of making sure it's never called back multiple times. And I use it to look up all the product information required to finalise the raffle.

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

I see. Is there a way for you to split up the two kinds of work into two different queues?

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

Or by appending some incrementing counter to the second call's ID?

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

Well there's really only one job.

If I sent null as the payload in "part 2" it would work exactly the same. It's just a double check in case quantities don't match.

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

I guess a counter could work, but it does concern me that I'll somehow end up queuing multiple for the same product and then charge extra people accidentally.

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

It only sets up another queue once it's finished processing the current one — as Vercel has a 60s limit on serverless functions. So that's why I have it set up to "start again" under the same ID to sort of continue where it left off.

@Skn0tt
Copy link
Member

Skn0tt commented Aug 19, 2021

It seems like I was able to fix the issue: quirrel-dev/owl#180

I hope I can get this merged & shipped tomorrow 🤞

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

Thank you!

@mattjis
Copy link
Author

mattjis commented Aug 19, 2021

/api/products/raffle is in an undefined state at the moment

Will I have to change endpoints or ids?

@Skn0tt
Copy link
Member

Skn0tt commented Aug 20, 2021

Alright, fix is merged & deployed! You'll have to change the endpoint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants