
Add limit to eventsAfter #1740

Closed
wants to merge 9 commits

Conversation

snowteamer
Collaborator

Close #1324

Summary of changes:

  • New MAX_EVENTS_AFTER environment variable, used in 'backend/db/streamEntriesAfter' and 'chelonia/out/eventsAfter'. If it is not set, the older behavior applies (no limit).

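The unset-means-no-limit behavior could be parsed along these lines (a sketch; `parseMaxEventsAfter` is an illustrative name, not from the PR):

```javascript
// Hypothetical sketch of the env var handling described above.
function parseMaxEventsAfter (raw) {
  const n = Number.parseInt(raw, 10)
  // Unset, empty or invalid values mean no limit (the older behavior).
  return Number.isInteger(n) && n > 0 ? n : undefined
}

const MAX_EVENTS_AFTER = parseMaxEventsAfter(process.env.MAX_EVENTS_AFTER)
```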
@snowteamer snowteamer force-pushed the limit-events-after branch 2 times, most recently from 543ddbd to 841ef42 Compare September 26, 2023 13:39
Member

@Silver-IT Silver-IT left a comment


Nice work, @snowteamer. Just left one question.

@@ -51,6 +85,7 @@ sbp('sbp/selectors/register', {
const entry = await sbp('chelonia/db/getEntry', currentHEAD)
const json = `"${strToB64(entry.serialize())}"`
if (currentHEAD !== hash) {
const json = `"${strToB64(entry.serialize())}"`
Member


Why is this added? Is it something intended?

Collaborator Author


Indeed this doesn't seem necessary.

Member


@snowteamer what's the update on this?

Collaborator Author


This has been simplified and the duplicate line has been removed

Member

@corrideat corrideat Feb 21, 2024


I see this was done the same way previously, but I'd use JSON.stringify(foo) instead of `"${foo}"` because that'll take care of escaping if there's ever a need to do so.
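The difference matters as soon as the value contains a quote; a minimal illustration:

```javascript
// Only JSON.stringify escapes embedded quotes, producing valid JSON.
const s = 'say "hi"'
const manual = `"${s}"`           // '"say "hi""' -- not valid JSON
const escaped = JSON.stringify(s) // '"say \"hi\""' -- valid JSON
```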

Member

@taoeffect taoeffect left a comment


Hey @snowteamer - great PR!

Just left a review. I think we'd like to try a different approach based on an idea @corrideat had. What do you think?

@@ -44,10 +47,42 @@ sbp('sbp/selectors/register', {
}
let { HEAD: currentHEAD } = latestHEADinfo
let prefix = '['
if (MAX_EVENTS_AFTER) {
const circularList = createCircularList(MAX_EVENTS_AFTER, undefined)
Member


There is a comment in createCircularList:

  // NOTE: this code doesn't let distinct instances share their method objects,
  // which would be bad for memory usage if many instances were created.
  // But that's fine since we're only using one so far.

Perhaps because this is no longer true because of this PR, we should break our "avoid classes" rule and create a CircularList class? Data structures are good candidates for classes.
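A class version might look roughly like this (a sketch only; the real createCircularList API may expose different methods):

```javascript
// Keeps only the last `capacity` items added, overwriting the oldest.
class CircularList {
  constructor (capacity) {
    this.buffer = new Array(capacity)
    this.capacity = capacity
    this.index = 0
    this.size = 0
  }

  add (item) {
    this.buffer[this.index] = item
    this.index = (this.index + 1) % this.capacity
    if (this.size < this.capacity) this.size++
  }

  // Returns the retained items, oldest first.
  toArray () {
    const start = this.size < this.capacity ? 0 : this.index
    return Array.from({ length: this.size }, (_, i) =>
      this.buffer[(start + i) % this.capacity])
  }
}
```

Methods live on the prototype here, so the memory concern in the original comment goes away regardless of how many instances are created.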

Collaborator Author


Only using two instances is still not much, but I agree that data structures are good candidates for classes

Member


With the way this is written, an instance will be created on every request, and there can be multiple requests going on at the same time.

@@ -44,10 +47,42 @@ sbp('sbp/selectors/register', {
}
let { HEAD: currentHEAD } = latestHEADinfo
let prefix = '['
if (MAX_EVENTS_AFTER) {
Member


Instead of doing this check each time the selector is called, the check can be done once when defining the selector function, and using it to pick between two complete functions to assign to the selector.
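The define-once pattern being suggested could be sketched like this (MAX_EVENTS_AFTER and both function bodies are placeholders, not the real implementations):

```javascript
// The branch on MAX_EVENTS_AFTER is taken once, when the selector
// function is defined, rather than on every call.
const MAX_EVENTS_AFTER = 100

const streamEntriesAfter = MAX_EVENTS_AFTER
  ? function limited (hash) { return `limited traversal from ${hash}` }
  : function unlimited (hash) { return `unlimited traversal from ${hash}` }
```

The chosen function would then be registered as the selector, e.g. via sbp('sbp/selectors/register', ...).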

Comment on lines 52 to 58
while (currentHEAD !== hash) {
const entry = await sbp('chelonia/db/getEntry', currentHEAD)
currentHEAD = entry.message().previousHEAD
circularList.add(entry)
}
const entry = await sbp('chelonia/db/getEntry', currentHEAD)
circularList.add(entry)
Member


However, @corrideat had a very interesting idea for how to avoid having to fetch all of the messages only to discard most of them: we can create a doubly-linked list in effect, by storing additional keys in the database that point forward instead of backward.

For example, if we have the chain:

hash1 <- hash2 <- hash3 <- hash4

Then when we add hash2, we also create an entry that looks like this: hash2-next with the value hash3

This way, when calling eventsAfter with a hashX, it can simply look up hashX-next and pull the value there, and so on until it hits MAX_EVENTS_AFTER.
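The forward traversal could be sketched like this, with a plain Map standing in for the chelonia/db key-value store and a hypothetical `next=` key prefix for the forward pointers:

```javascript
// Follow forward pointers from startHash until the limit is hit
// or the chain ends.
function eventsAfter (db, startHash, limit) {
  const events = []
  let hash = startHash
  while (hash && events.length < limit) {
    events.push(db.get(hash))
    hash = db.get(`next=${hash}`)
  }
  return events
}

// hash1 <- hash2 <- hash3 <- hash4, with forward pointers added.
const db = new Map([
  ['hash1', 'e1'], ['hash2', 'e2'], ['hash3', 'e3'], ['hash4', 'e4'],
  ['next=hash1', 'hash2'], ['next=hash2', 'hash3'], ['next=hash3', 'hash4']
])
```

Each step is a single key lookup, so no messages before the starting hash are ever fetched.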

Collaborator Author


Yeah, very interesting. That should already be much faster than fetching every message, although we'd have to write these new entries to the database. So maybe not every contract chain needs this database field, depending on whether eventsAfter is used often enough with it.

Member


@snowteamer While I agree not every contract might make much use of this feature, for simplicity's sake let's do it for every contract. In the future, if we need to, we can worry about optimizing it.

Member


@snowteamer I think it's easier to do it for all contracts as it's done now. Otherwise, you should probably also disable the eventsAfter API for contracts that don't have the nextHEAD record, as computing it from scratch is pretty expensive in a large contract.

@@ -149,6 +149,10 @@ export default (sbp('sbp/selectors/register', {
await sbp('chelonia/db/set', entry.hash(), entry.serialize())
await sbp('chelonia/db/set', getLogHead(contractID), JSON.stringify({ HEAD: entry.hash(), height: entry.height() }))
console.debug(`[chelonia.db] HEAD for ${contractID} updated to:`, entry.hash())
if (entryPreviousHEAD) {
await sbp('chelonia/db/set', `next=${entryPreviousHEAD}`, entry.hash())
console.debug(`[chelonia.db] next hash for ${entryPreviousHEAD} updated to:`, entry.hash())
Member


I think we have enough logging for this function, and adding this extra logging would be a bit unnecessary and excessive. Can remove?

Collaborator Author


Good catch

@taoeffect taoeffect requested a review from corrideat February 21, 2024 16:54
corrideat
corrideat previously approved these changes Feb 21, 2024
Member

@corrideat corrideat left a comment


Looks good to me. I left a comment with one small potential improvement.

Member

@taoeffect taoeffect left a comment


Review ready!

}
}
} catch (e) {
console.error(`read(): ${e.message}:`, e)
Member


A few concerns:

  1. Error messages need to immediately point us to where in the code the problem comes from. "read()" is less helpful than something like: ([backend] streamEntriesAfter error: ${e.message}, e)
  2. If there was an error, that needs to be sent back to the client as a Boom.internalError (check me on that) instead of proceeding as if there was no error.

} catch (e) {
console.error(`read(): ${e.message}:`, e)
}
entries.reverse()
Member


The entries were reversed before because they had to be (since we couldn't go forwards).

Now that that's no longer the case, there's no reason to artificially reverse the entries, so please remove this line and update the other code accordingly.

I've gone ahead and also opened an issue to update the chel command once this PR is merged: okTurtles/chel#33

@@ -36,32 +37,49 @@ if (!fs.existsSync(dataFolder)) {
fs.mkdirSync(dataFolder, { mode: 0o750 })
}

// Streams stored contract log entries since the given entry hash (inclusive!), most recent first.
Member


Just like with eventsBefore API, clients should be able to specify the number of eventsAfter, so please add an extra parameter to this function to take that number in (and update routes.js as well).

If the number isn't passed in, then use MAX_EVENTS_AFTER.

Comment on lines 53 to 59
while (currentHash && (entries.length < MAX_EVENTS_AFTER)) {
const entry = await sbp('chelonia/db/getEntry', currentHash)
if (entry) {
entries.push(entry)
currentHash = await sbp('chelonia/db/get', `next=${currentHash}`)
}
}
Member


The entire idea behind streams is that they allow us to avoid having to read all of the data into memory.

With the way this is currently implemented, we are reading all of the entries into memory, and therefore there's no reason to even return a readable stream, we could just return the array we read into memory.

But we do want to return a readable stream. What if there's a contract with 1000000 entries in it? Reading all of those entries into memory and returning them would be very hard on the server, and could crash it.

So, could you please remove this code, and instead use the readable stream below to pull out entries on an as-needed basis (e.g. no more than 5 entries per call to read())

prefix = ','
} else {
this.push(prefix + json + ']')
read (): void {
Member


To be safe, since promises are being used, can you please re-mark this function as async and use await instead of .then inside of the function?

Collaborator Author


If I do, then Flow is going to yell at me 😄

Member


Oh really? What's the error? @snowteamer

Collaborator Author


@taoeffect Roughly that read() should return void, but making it async causes it to return a Promise instead.

Member

@taoeffect taoeffect left a comment


Left 2 comments + one more request: I noticed that 'chelonia/out/eventsBetween' is also calling .reverse() because it's based on the old way of traversing backwards instead of forwards.

What are your thoughts on this? We could technically update the code to use the new method, and then the call to .reverse() wouldn't be necessary there either. The only place it would still be needed is in the call to 'chelonia/out/eventsBefore'.

sbp('sbp/selectors/register', {
'backend/db/streamEntriesAfter': async function (contractID: string, hash: string): Promise<*> {
'backend/db/streamEntriesAfter': async function (contractID: string, hash: string, limit: number = MAX_EVENTS_AFTER): Promise<*> {
Member


I think both the calls from the frontend (in chelonia.js) and on the server (in routes.js) need to be updated to support this parameter now too.

if (!Array.isArray(nextEvents) || nextEvents.length < 2) break
// Avoid duplicating the intermediate event in the result.
events.pop()
events.push(...nextEvents)
Member

@taoeffect taoeffect Feb 26, 2024


I know that in some languages, there's a limit to how many arguments you can use in a function call. I don't know if that's true for JavaScript though...

It seems this limit is large and somewhat undefined: https://stackoverflow.com/questions/22747068/is-there-a-max-number-of-arguments-javascript-functions-can-accept

So if we're worried about this, an alternative would be to reassign the variable using events = events.concat(nextEvents).

Personally I would do it this way just to be safe
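The suggested reassignment, shown with small placeholder arrays:

```javascript
// Reassigning with concat avoids spreading a potentially huge array
// into push()'s argument list.
let events = ['e1', 'e2']
const nextEvents = ['e2', 'e3', 'e4'] // first entry duplicates the boundary
events.pop() // drop the duplicated intermediate event
events = events.concat(nextEvents)
```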

@corrideat corrideat mentioned this pull request Mar 4, 2024
@taoeffect
Member

Thank you @snowteamer for your work on this, it's being picked up and iterated on further in #1876 if you'd like to follow along there (you are welcome to help review if you're back)

@taoeffect taoeffect closed this Mar 12, 2024
Successfully merging this pull request may close these issues.

Add limit to eventsAfter (formerly eventsSince)