-
-
Notifications
You must be signed in to change notification settings - Fork 886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change from v6 to v7: schema object itself used as a key for compiled schema function, not serialization #1413
Comments
That is not a memory leak, it is a change in v7. v6 used serialised schema as a key to compiled schema functions, v7 uses the actual schema object as a key to the map - it actually reduces memory utilisation in real life use cases by the size of the schema. The change is covered in release notes, but it may be worth mentioning in the docs. |
Just like OP, I was bitten by this change because the JSON schema was constant but with different references: export function getValidatorForMapping(
mappings: ObjectMapping
): ValidateFunction<unknown> {
return ajv.compile(translateObjectMapping(mappings));
} It was totally my mistake for missing that part of the release notes, but I'd like to point out that you have at least two people presenting real life use cases that the new behavior didn't favor. I'll update our code to cache a reference for the schema ourselves and I understand that in most scenarios the new behavior will save memory and be beneficial. |
It is covered in this post: https://www.poberezkin.com/posts/2021-02-11-ajv-version-7-big-changes-and-improvements.html#caching-compiled-schemas Given that his change highlights incorrect usage of Ajv I think it should be kept as is for now - to be re-assessed as migration to v7 continues. The argument presented in #1098 to remove serialisation was quite compelling... |
@epoberezkin can you confirm that this is expected and desirable behaviour - it seems according to everything I've read in this and related issues that incorrect usage of ajv is using the same id for two schema that don't pass const childSchema = {
$id: '#/child',
type: 'string'
} as const
const parentSchema = {
$id: '#/parent',
type: 'object',
properties: {
child1: childSchema,
child2: childSchema
}
} as const
ajv.addSchema( parentSchema )
This only happens when sibling properties share a schema - this works fine: const childSchema = {
$id: '#/child',
type: 'string'
} as const
const parent1Schema = {
$id: '#/parent-1',
type: 'object',
properties: {
child: childSchema
}
} as const
const parent2Schema = {
$id: '#/parent-2',
type: 'object',
properties: {
child: childSchema
}
} as const
ajv.addSchema( parent1Schema )
ajv.addSchema( parent2Schema ) I can do some pre-processing to eg remove the $id from subschema before passing to ajv if necessary, but I would like to confirm with you that ajv is working as expected in this specific instance first Thank you! |
That is expected, you should not be assigning IDs like that. You can put a shared schema in the definition and use the same reference, you do not really need $id in it. |
... or you can reference it as an external schema (added to ajv instance with addSchema, not inlined) |
OK - thank you - in our case I cannot do either of those and will have to do a |
The memory leak also occurs when there's no We could switch to using a WeakMap at some places. Maybe we should create a new issue for it. Here's a reproduction: 'use strict'
async function start () {
const Ajv = require('ajv')
const baseSchema = {
type: 'object',
required: ['first', 'second'],
properties: {
first: {type: 'string'},
second: {type: 'string'},
third: {type: 'string'},
fourth: {type: 'string'},
fifth: {type: 'string'},
sixth: {type: 'string'},
seventh: {type: 'string'}
}
}
const total = 10000
let ajv, i, remaining
function fixedCompile (schema) {
const validate = ajv.compile(schema)
ajv.removeSchema(schema)
// Force reset values that aren't reset with removeSchema
ajv.scope._values.schema.delete(schema)
ajv.scope._values.validate.delete(validate)
const schemaIdx = ajv.scope._scope.schema.indexOf(schema)
const validateIdx = ajv.scope._scope.validate.indexOf(validate)
if (schemaIdx !== -1) ajv.scope._scope.schema.splice(schemaIdx, 1)
if (validateIdx !== -1) ajv.scope._scope.validate.splice(validateIdx, 1)
return validate
}
function brokenCompile (schema) {
const validate = ajv.compile(schema)
ajv.removeSchema(schema)
return validate
}
setInterval(() => {
global.gc?.()
const used = process.memoryUsage().heapUsed / 1024 / 1024
console.log(
`${total - remaining}:`,
`The script uses ${Math.round(used * 100) / 100} MB`
)
}, 500).unref()
console.log('With leak\n----------')
ajv = new Ajv({addUsedSchema: false})
i = 1000
remaining = total
while (remaining--) {
brokenCompile({...baseSchema})
if (--i === 0) {
i = 1000
await new Promise((r) => setTimeout(r, 1))
}
}
console.log('\n\nWithout leak\n----------')
ajv = new Ajv({addUsedSchema: false})
i = 1000
remaining = total
while (remaining--) {
fixedCompile({...baseSchema})
if (--i === 0) {
i = 1000
await new Promise((r) => setTimeout(r, 1))
}
}
}
start()
|
I understand the demo - this is an incorrect usage. You should not allocate constant javascript objects within the functions - they will be indeed different object instances - they should be allocated on the module level. In case of using these schemas with Ajv, they are compiled every time. As I wrote above, this tradeoff is explained in #1098 - it penalises incorrect usage more than the previous choice in v6, but it reduces memory usage in case schemas are allocated once. |
In our case we need to generate validation functions for dynamic user-defined schemas which are persisted in a remote storage. The return values of I expected that at least the I expected that behavior for In the specific case we have, only the first level properties are custom. We could loop through all the keys on the first level and call the validate function for each structure. But extending the |
For better performance, you could generate "standalone" code and save it to storage as well, and then require it from string - so it would need to be compiled only on creation / update
there is an option |
looking at your code - you are right, the shared scope needs cleaning up. |
I'd say we want to have the following behavior: // Remove should always try to remove by Object and $id
ajv.removeSchema({type: 'object', properties: {first: {type: 'string'}}})
// With addUsedSchema: true
// Cached by Object
ajv.validate({type: 'object', properties: {first: {type: 'string'}}})
// Cache by Object and $id
ajv.validate({$id: 'something', type: 'object', properties: {first: {type: 'string'}}})
// No caching at all (not optimized for multiple calls, maybe we could still cache for consistency)
ajv.compile({type: 'object', properties: {first: {type: 'string'}}})
// Cache by Object and $id
ajv.compile({$id: 'something', type: 'object', properties: {first: {type: 'string'}}})
// With addUsedSchema: false
// Cached by Object
ajv.validate({type: 'object', properties: {first: {type: 'string'}}})
// Cache by Object
ajv.validate({$id: 'something', type: 'object', properties: {first: {type: 'string'}}})
// No caching at all (not optimized for multiple calls)
ajv.compile({type: 'object', properties: {first: {type: 'string'}}})
// No caching at all (not optimized for multiple calls)
ajv.compile({$id: 'something', type: 'object', properties: {first: {type: 'string'}}}) Maybe another option is to get rid of the |
I am facing with the same problem, over 2GB of memory leak and increasing :( Definitely there is a bug or I don't understand the const Ajv = require("ajv")
const ajv = new Ajv({
allowUnionTypes: true
})
function printMemory(counter) {
const memoryUsage = Object.entries(
process.memoryUsage()
).reduce((object, [name, memory]) => {
object[name] = Math.ceil(memory / 1024 / 1024)
return object
}, {})
console.log({
counter,
...memoryUsage
})
}
function repeat(total, rutine) {
while(total--) rutine(counter++)
}
function rutine(counter) {
ajv.compile({ $id: counter.toString(), type: "string" })
ajv.removeSchema(counter.toString())
ajv.removeSchema() // it has not effect
if (!(counter % 1000))
printMemory(counter)
}
let counter = 0
//while(true) rutine(index++)
setInterval(() => repeat(1000, rutine), 0) How could I clear compile memory usage? Meanwhile I am going to downgrade the version. |
The problem that needs fixing is that in case
So the solution would be to use A better option would be whenever possible to compile schemas only once - see this page on some suggestions: https://ajv.js.org/guide/managing-schemas.html |
Firstly, thanks for your reply and your awesome job.
I don't understand why
That is not possible, at least for my case. The JSON schemas must be reloaded without restarting the server and others schemas are dynamics. |
The dependency graph that can exist between different schemas makes it impossible in some cases to fully remove it. While the top level schema is removed, some of its fragments may be used elsewhere... If you really need a full clean-up you need to discard the instance and create the new one - which is even more inefficient than re-compiling every time. Another approach is not to use Ajv as cache - you can serialise and cache schemas on the application level - doing what Ajv v6 was doing.
I understand it, but using |
Hi! I have noticed that after updating from v6 to v7, if you pass the same schema but with a different reference there's an increment on the memory used leading this to a potential memory leak. I understand that this is not the recommended way to use the library, but because there's not explicit mention about this change (that I have found) and I think other people could have not noticed this, adding a warning to the changelog or reviewing this if it was not an intentional change would be helpful.
Here is an example where the difference in memory can be observed between 6.12.6 and 7.0.2
The text was updated successfully, but these errors were encountered: