-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent Scavenger Read Barrier evaluator on X86-64 #3075
Conversation
Requires eclipse-omr/omr#2974 and eclipse-omr/omr#3012 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting to see some ground work for a concurrent scavenger on x86!
I've called out a couple things that should be addressed before this is merged.
5edcc5f
to
071b21d
Compare
@@ -700,13 +700,20 @@ MM_CollectorLanguageInterfaceImpl::scavenger_switchConcurrentForThread(MM_Enviro | |||
j9tty_printf(PORTLIB, "%p: Nursery [%p,%p] Evacuate [%p,%p] GS [%p,%p] Section size 0x%zx, sections %lu bit offset %lu bit mask 0x%zx\n", | |||
vmThread, nurseryBase, nurseryTop, gsBase, gsTop, pageBase, pageTop, _extensions->getConcurrentScavengerPageSectionSize() , sectionCount, startOffsetInBits, bitMask); | |||
} | |||
#if defined(J9VM_GC_COMPRESSED_POINTERS) && defined(J9VM_ARCH_X86) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see a GC reviewer on this bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amicic Could you take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think there is a need to restrict for X86. For any plaform we should have consistent behavior in changing vmThread attributes whether address range of evacutate, privateFlags etc. Some may be not be used for specific platform (in fact it's more related to whether s/w or h/w read barrier is used), but there is not much cost in maintaining them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, would it make sense to have some #error
in here when CS is defined, but the platforms options don't have a good answer for the values (and some comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked and vmThread->evacuateBase/Top is used in Z (through thisThreadGetEvacuateBaseAddressOffset() and replaceGuardedLoadWithSoftwareReadBarrier()) for our first attempt of s/w based CS. But this should go away. New implementation should be similar to X86 and should also require these fields to be compressedrefs when applicable (@fjeremic please confirm) .
It would be nice to add a comment (for example where these two fields are declared) that they are compressed refs. It is fairly rare that VM/GC meta structures store heap refs as compressed refs.
48293b1
to
6f7c949
Compare
I'd forgotten about the PARM_REG macro - doesn't work on 32-bit, but I'll implement it now that we're using fastcall (at your original suggestion), so at least things with 2 or fewer parameters will work everywhere, and will error out properly if we try to use more than 2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from the helper side of things - I'll have some updates to this code soon when I refactor how calling C works (OSX has made things a bit more difficult).
@DanHeidinga could you have a look now that @gacholio has approved to see if you are happy - I'm going to do a review now from the JIT side. @0dvictor could you confirm if this still needs to be WIP? The dependencies have been merged I believe and it looks like we should be done a review soon so I'd like to make sure this is ready to go soon if possible. |
It is currently marked as WIP only because I have a test run pending. I'll remove the WIP as soon as it cleared. |
@@ -960,6 +960,79 @@ TR::Register *J9::X86::AMD64::TreeEvaluator::conditionalHelperEvaluator(TR::Node | |||
#endif | |||
|
|||
|
|||
TR::Register *J9::X86::TreeEvaluator::irdbarEvaluator(TR::Node *node, TR::CodeGenerator *cg) | |||
{ | |||
TR_ASSERT(TR::Compiler->om.shouldGenerateReadBarriersForFieldLoads(), "iReadBarrierEvaluator"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - this will be changed soon by the field watch work so I'm wondering if we should structure this more like if shouldGenerateReadBarriersForFieldLoads then do this sequence otherwise just treat it like an aloadi and other can add other meanings and uses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion; however without seeing the actual change of field watch work, it is hard to image exactly what need be checked. If field watch work is merged before this PR, I would definitely updated this change accordingly; otherwise, I suggest to massage the code when field watch is delivered.
@amicic Are you OK with the initialization code? You're most familiar with the existing sequence on z |
@@ -700,13 +700,20 @@ MM_CollectorLanguageInterfaceImpl::scavenger_switchConcurrentForThread(MM_Enviro | |||
j9tty_printf(PORTLIB, "%p: Nursery [%p,%p] Evacuate [%p,%p] GS [%p,%p] Section size 0x%zx, sections %lu bit offset %lu bit mask 0x%zx\n", | |||
vmThread, nurseryBase, nurseryTop, gsBase, gsTop, pageBase, pageTop, _extensions->getConcurrentScavengerPageSectionSize() , sectionCount, startOffsetInBits, bitMask); | |||
} | |||
#if defined(J9VM_GC_COMPRESSED_POINTERS) && defined(J9VM_ARCH_X86) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think there is a need to restrict for X86. For any plaform we should have consistent behavior in changing vmThread attributes whether address range of evacutate, privateFlags etc. Some may be not be used for specific platform (in fact it's more related to whether s/w or h/w read barrier is used), but there is not much cost in maintaining them.
@@ -700,13 +700,20 @@ MM_CollectorLanguageInterfaceImpl::scavenger_switchConcurrentForThread(MM_Enviro | |||
j9tty_printf(PORTLIB, "%p: Nursery [%p,%p] Evacuate [%p,%p] GS [%p,%p] Section size 0x%zx, sections %lu bit offset %lu bit mask 0x%zx\n", | |||
vmThread, nurseryBase, nurseryTop, gsBase, gsTop, pageBase, pageTop, _extensions->getConcurrentScavengerPageSectionSize() , sectionCount, startOffsetInBits, bitMask); | |||
} | |||
#if defined(J9VM_GC_COMPRESSED_POINTERS) && defined(J9VM_ARCH_X86) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked and vmThread->evacuateBase/Top is used in Z (through thisThreadGetEvacuateBaseAddressOffset() and replaceGuardedLoadWithSoftwareReadBarrier()) for our first attempt of s/w based CS. But this should go away. New implementation should be similar to X86 and should also require these fields to be compressedrefs when applicable (@fjeremic please confirm) .
It would be nice to add a comment (for example where these two fields are declared) that they are compressed refs. It is fairly rare that VM/GC meta structures store heap refs as compressed refs.
@DanHeidinga I put a bunch of comments last night but they stayed at pending state. Just pushed them through with 'submit reveiw' |
auto object = cg->allocateRegister(); | ||
auto load = generateRegMemInstruction(L4RegMem, node, object, generateX86MemoryReference(address, 0, cg), cg); | ||
cg->setImplicitExceptionPoint(load); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does setting as an implicit exception point do exactly ? What is the exception that's possible ?
The other question I had was whether a read barrier opcode should only appear under a node that is at the top of the tree at codegen time (i.e. after trees have been lowered). Looking at some IL from logs with the concurrent scavenger enabled, I did not see it appear only under a treetop and so I was wondering if this was by design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The irdbari
node may be under a NULLCHK
and hence this is the implicit NULLCHK point.
There is no restriction or necessity for read barrier to have its own treetop. The discussion following https://github.com/eclipse/openj9/pull/3075/files#r221504019 concluded read barrier is not a GC point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the same node which will be used for field watch, it most certainly will be a GC point in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field watch is irrelevant to this Concurrent Scavenger change. Even it is a GC point for field watch, it is NOT a GC point when field watch is disabled/off. This evaluator does NOT assume whether irdbari
is a GC point, and should work regardless. It is up to the Optimizer generating the trees, and this evaluator generates instructions for Concurrent Scavenger regardless whether irdbari
has its own treetop or not. Whether the Optimizer should generate irdbari
under a dedicate treetop is beyond the scope of this changeset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
irdbari
node may be under aNULLCHK
and hence this is the implicit NULLCHK point.There is no restriction or necessity for read barrier to have its own treetop. The discussion following https://github.com/eclipse/openj9/pull/3075/files#r221504019 concluded read barrier is not a GC point.
Is that the right link ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the right link ?
Yes, the link is correct; but I guess this is a bug in GitHub preventing it jumping to the correct comment. If you click the link, then click "Show conversation" right under "runtime/codert_vm/xnathelp.m4", it should show the correct discussion.
@@ -9405,20 +9480,20 @@ static TR::Register* inlineCompareAndSwapObjectNative(TR::Node* node, TR::CodeGe | |||
|
|||
generateLabelInstruction(LABEL, node, begLabel, cg); | |||
|
|||
generateRegMemInstruction(CMPRegMem(use64BitClasses), node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBase), cg), cg); | |||
generateRegMemInstruction(CMPRegMem(use64BitClasses), node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBaseCompressed), cg), cg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this code work on 64-bit uncompressed or 32-bit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Concurrent Scavenger is currently only supported on 64-bit compressedrefs. I've update this part to avoid confusion.
@@ -9405,20 +9481,20 @@ static TR::Register* inlineCompareAndSwapObjectNative(TR::Node* node, TR::CodeGe | |||
|
|||
generateLabelInstruction(LABEL, node, begLabel, cg); | |||
|
|||
generateRegMemInstruction(CMPRegMem(use64BitClasses), node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBase), cg), cg); | |||
generateRegMemInstruction(CMP4RegMem, node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBaseCompressed), cg), cg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Z platforms CS runs on both 64 CR and nonCR, but not on 32. Same should be on X. But if you want to stage it and enable only CR now and later for nonCR that's ok, but nonCR should be supported eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reminding. This version only supports 64 CR. There are more changes required beyond this PR to make CS on nonCR work. CS support on nonCR will be added after CR is fully enabled.
6482e08
to
2409d22
Compare
X86-64 compressedrefs now supports read barriers for Concurrent Scavenger. Signed-off-by: Victor Ding <dvictor@ca.ibm.com>
* In short, the evacuate region is [vmThread->evacuateBase, vmThread->evacuateTop]. | ||
*/ | ||
vmThread->evacuateBase = (UDATA)gsBase; | ||
vmThread->evacuateTop = (UDATA)gsTop - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this line has been modified after I approved the PR. I approve this additional change, too.
This change is necessary for CR config in case gsTop is at highest possible for CR range, in which case evacuateTopCompressed would be 0x100000000 after the shift is applied, which would not fit into a 32bit variable/register. Like this highest possible evacuateTopCompressed would be 0xffffffff.
Jenkins test sanity,extended xlinux,win jdk8,jdk11 |
Jenkins test sanity xlinux,win jdk8,jdk11 |
This is all reviewed and all approved so I am going to merge this. Sanity is clean and test have run successfully with CS enabled in additional testing. Note this does not enable the feature it is the implementation - #3070 is the enablement |
please do not miss the fact that even #3070 would not activate CS at x86. OMR_GC_CONCURRENT_SCAVENGER should be set as well |
Thanks @dmitripivkine I'm going to copy your comment to #3070 so it is visible in the review discussion there once WIP is removed. |
X86-64 compressedrefs now supports read barriers for Concurrent Scavenger.
Part of Issue #3054
Signed-off-by: Victor Ding dvictor@ca.ibm.com