Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix indexing large smithy-build.json #13

Merged

Conversation

milesziemer
Copy link
Contributor

Fixes an issue where a UTFDataFormatException would be thrown from SmithyBuildConfigurationIndex when saving the json value. Based on the DataOutput.writeUTF docs:
https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String- this happens when the number of bytes to be written is > 65535, so if you have a very large smithy-build.json it throws. For some reason this
can also cause intellij to just index in an infinite loop, eventually running out of memory.

I fixed it by writing the value out as bytes, which also required reading it in differently too otherwise it couldn't be deserialized after being read in again. There may be a better way to do this though.

Tested with runIde and opening the project that was previously giving me problems.

@@ -47,9 +47,17 @@ class SmithyBuildConfigurationIndex : SingleEntryFileBasedIndexExtension<SmithyB
override fun getValueExternalizer(): DataExternalizer<SmithyBuildConfiguration> =
object : DataExternalizer<SmithyBuildConfiguration> {
override fun save(out: DataOutput, config: SmithyBuildConfiguration) =
out.writeUTF(SmithyJson.writeValueAsString(config))
out.writeBytes(SmithyJson.writeValueAsString(config))
Copy link
Owner

@iancaffey iancaffey Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

override fun save(out: DataOutput, config: SmithyBuildConfiguration) {
    val bytes = SmithyJson.writeValueAsBytes(config)
    out.write(bytes.size)
    out.write(bytes)
}

would simplify the read logic a bit

override fun read(`in`: DataInput): SmithyBuildConfiguration {
    val bytes = ByteArray(`in`.readInt())
    `in`.readFully(bytes)
    return SmithyJson.readValue<SmithyBuildConfiguration>(bytes)
}

typically I wouldn't mind either approach, but DataInput#readLine is kinda funky (e.g. DataInputStream's implementation is deprecated due to some bugs), so would be great to avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea makes sense. I had to make one minor adjustment to the suggestion, using DataOutput#writeInt to write the size of the file to avoid integer overflow when reading it back in - not sure if that was just a typo in the suggestion.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. :) Was moving too quickly, saw that write took in an int and went with it. :P

line = `in`.readLine()
}
return SmithyJson.readValue<SmithyBuildConfiguration>(value)
}
}

override fun getVersion() = 2
Copy link
Owner

@iancaffey iancaffey Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index versions need to get bumped to force re-indexing.

ignore this, I don't think IntelliJ caches "negative"/exceptional results (esp. since you mentioned seeing infinite indexing), so no need to force it.

Pretty sure the issue I had seen before that was similar to this had logic in the indexer to default to null when failing to parse AST files (since there was no bonafide way to infer a json file was a valid AST without blindly parsing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up changing the version - it did cause Intellij to immediately re-index the file, which was useful when I was testing. Previously I had to wait a few minutes for it to index it

Fixes an issue where a UTFDataFormatException would be thrown from
SmithyBuildConfigurationIndex when saving the json value. Based on
the DataOutput.writeUTF docs:
https://docs.oracle.com/javase/8/docs/api/java/io/DataOutput.html#writeUTF-java.lang.String-
this happens when the number of bytes to be written is > 65535, so
if you have a very large smithy-build.json it throws.

I fixed it by writing the value out as bytes, which also required
reading it in differently too otherwise it couldn't be deserialized
after being read in again.

Tested with `runIde` and opening the project that was previously
giving me problems.
@milesziemer milesziemer force-pushed the fix-utf-data-format-exception branch from 79c757f to d1c754b Compare December 4, 2023 14:43
@iancaffey iancaffey merged commit 13c9f12 into iancaffey:main Dec 4, 2023
@iancaffey
Copy link
Owner

Thanks @milesziemer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants