Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NETBEANS-4123] Initial implementation of handling large strings #5299

Merged
merged 1 commit into from
Feb 10, 2023

Conversation

SirIntellegence
Copy link
Contributor

@SirIntellegence SirIntellegence commented Jan 16, 2023

in Java 9+ where the String class uses a byte[] instead of a char[]

Not a final implementation at the moment, but should be enough for now. I didn't see any tests related to this class, so I didn't write any. I have tested it and it works properly. May need more testing.

Handling UTF16 byte[] strings would require more effort since it becomes a nightmare.

@SirIntellegence
Copy link
Contributor Author

SirIntellegence commented Jan 16, 2023

This is intended to (mostly) fix #4123
Mostly because I am not sure how to handle the UTF16 stuff for the reasons mentioned in the source file.

@mbien mbien added the Java [ci] enable extra Java tests (java.completion, java.source.base, java.hints, refactoring.java, form) label Jan 16, 2023
@mbien mbien added this to the NB18 milestone Jan 16, 2023
@matthiasblaesing
Copy link
Contributor

Please add the necessary imports - the proposed changes also need corresponding import adjustments. Thank you.

@SirIntellegence
Copy link
Contributor Author

Ya. I am aware. I am still working in it. I did find out how to grab the hi byte field, so I added logic in for that. I will get to that tomorrow.

private static boolean isSmallEndian(VirtualMachine virtualMachine) {
//TODO: Possibly cache this value
List<ReferenceType> possibleClasses = virtualMachine.classesByName(
"java.lang.StringUTF16");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be cacheable with a WeakHashMap using the Virtualmachine as Key. This might be slow if the target VM is not local. I experimented a bit with this code and on JDK 17 initially possibleClasses was empty for me. Not sure how the VM handled this interernally, but I could make it work using this code:

        if (possibleClasses.isEmpty()) {
            ClassType ct = (ClassType) virtualMachine.classesByName("java.lang.Class").iterator().next();
            Method m = ct.concreteMethodByName("forName", "(Ljava/lang/String;)Ljava/lang/Class;");
            StringReference referenceString = virtualMachine.mirrorOf("java.lang.StringUTF16");
            try {
                ThreadReference threadReference = virtualMachine.allThreads().get(0);
                ct.invokeMethod(threadReference, m, Collections.singletonList(referenceString), 0);
                possibleClasses = virtualMachine.classesByName("java.lang.StringUTF16");
            } catch (InvalidTypeException | ClassNotLoadedException | IncompatibleThreadStateException | InvocationException ex) {
                Exceptions.printStackTrace(ex);
            }
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye. Will get to that. Do we need to handle multiple threads or are we good without using concurrent structures here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my test I had to make isSmallEndian synchronized - else I got into a IncompatibleThreadStateException. Not really sure whether this hammer just fixed the problem accidentally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Should I slap a synchronized onto it, not worry about it, or find another solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me and looks sane. The caching you introduced works (at least in my trivial testcase).

One final idea would be to rework the synchronization. The maps are protected by individual sychronized() {} blocks. The pattern could be applied here two, which would also not leak the sychronization to the outside.

My suggestion is to move the isLittleEndianCache.clear(); into a separate block synchronized (isLittleEndianCache) {isLittleEndianCache.clear();}, drop the synchronized from the signature of isLittleEndian and instead wrap the whole method contents into a synchronized (isLittleEndianCache) {....} block.

Copy link
Contributor

@matthiasblaesing matthiasblaesing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final cleanup (see inline comments). Please also squash the commits into a single one.

@SirIntellegence
Copy link
Contributor Author

While applying those changes, noticed that getStringWithLengthControl is declared to throw some exceptions. Should isLittleEndian throw the internal exceptions as well instead of catching them?

@matthiasblaesing
Copy link
Contributor

While applying those changes, noticed that getStringWithLengthControl is declared to throw some exceptions. Should isLittleEndian throw the internal exceptions as well instead of catching them?

I have mixed feelings here. The use sites don't give information about what went wrong, which would make it hard to debug. What would you think about this idea: let isLittleEndian throw all its exceptions, but add a multi catch for these before the finally-clause before 367, log errors there and rethrow. I would then not use Exceptions#printStackTrace, but a java.util.Logger on level INFO. printStackTrace creates useless message window.

@SirIntellegence
Copy link
Contributor Author

Pushing the exception lift change so it can be reviewed before squishing everything

@matthiasblaesing
Copy link
Contributor

To me this looks sane, thank you. It is an impressive list of Exceptions and we might want to refactor these calls later, but I think as a fix for the issue at hand this should be merged.

I requested a first CI run and the system is happy. So, I suggest that you squash the commit as you see fit. I would suggest to squash into a single commit as all commit cover the same area and there are not really smaller building blocks. If then GH actions is happy, I'll merge.

In the debugger (+7 squashed commits)
Squashed commits:
[5e45018] [NETBEANS-4123] Lifted exceptions from ShortenedStrings.isLittleEndian

Waiting for this change to be reviewed before squishing
[fc97d86] [NETBEANS-4123] Checkpoint before extracting exceptions...
[e6cedcd] [NETBEANS-4123] Moved sync blocks around as per @matthiasblaesing
[1b1da08] [NETBEANS-4123] Now caching whether or not a VM is little endian

Check is now deferred to when it is needed instead of right away.
[4f52c49] [NETBEANS-4123] Finished implementing the handling of long strings

Renamed 'compressed' things to 'compact'
Tested with a long UTF16 string on little endian. It detects the byte order properly
[e8d001f] [NETBEANS-4123] Apply suggestions from code review

Will apply other changes and testing in a bit since you can't do those in GitHub

Co-authored-by: Matthias Bläsing <mblaesing@doppel-helix.eu>
[2caef56] [NETBEANS-4123] Initial implementation of handling large strings

in Java 9+ where the String class uses a byte[] instead of a char[]
@SirIntellegence
Copy link
Contributor Author

And squashed

@matthiasblaesing
Copy link
Contributor

Thank you. As the unittests finally run cleanly through, lets get this in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java [ci] enable extra Java tests (java.completion, java.source.base, java.hints, refactoring.java, form)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants