Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent Scavenger Read Barrier evaluator on X86-64 #3075

Merged
merged 1 commit into from
Oct 12, 2018

Conversation

0dvictor
Copy link
Contributor

@0dvictor 0dvictor commented Sep 28, 2018

X86-64 compressedrefs now supports read barriers for Concurrent Scavenger.

Part of Issue #3054

Signed-off-by: Victor Ding dvictor@ca.ibm.com

@0dvictor 0dvictor changed the title Concurrent Scavenger Read Barrier evaluator on X86-64 WIP: Concurrent Scavenger Read Barrier evaluator on X86-64 Sep 28, 2018
@0dvictor 0dvictor mentioned this pull request Sep 28, 2018
7 tasks
@0dvictor
Copy link
Contributor Author

Requires eclipse-omr/omr#2974 and eclipse-omr/omr#3012

Copy link
Member

@DanHeidinga DanHeidinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting to see some ground work for a concurrent scavenger on x86!

I've called out a couple things that should be addressed before this is merged.

@0dvictor 0dvictor force-pushed the csrdbar branch 2 times, most recently from 5edcc5f to 071b21d Compare October 3, 2018 02:24
@@ -700,13 +700,20 @@ MM_CollectorLanguageInterfaceImpl::scavenger_switchConcurrentForThread(MM_Enviro
j9tty_printf(PORTLIB, "%p: Nursery [%p,%p] Evacuate [%p,%p] GS [%p,%p] Section size 0x%zx, sections %lu bit offset %lu bit mask 0x%zx\n",
vmThread, nurseryBase, nurseryTop, gsBase, gsTop, pageBase, pageTop, _extensions->getConcurrentScavengerPageSectionSize() , sectionCount, startOffsetInBits, bitMask);
}
#if defined(J9VM_GC_COMPRESSED_POINTERS) && defined(J9VM_ARCH_X86)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see a GC reviewer on this bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amicic Could you take a look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think there is a need to restrict for X86. For any plaform we should have consistent behavior in changing vmThread attributes whether address range of evacutate, privateFlags etc. Some may be not be used for specific platform (in fact it's more related to whether s/w or h/w read barrier is used), but there is not much cost in maintaining them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, would it make sense to have some #error in here when CS is defined, but the platforms options don't have a good answer for the values (and some comments).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and vmThread->evacuateBase/Top is used in Z (through thisThreadGetEvacuateBaseAddressOffset() and replaceGuardedLoadWithSoftwareReadBarrier()) for our first attempt of s/w based CS. But this should go away. New implementation should be similar to X86 and should also require these fields to be compressedrefs when applicable (@fjeremic please confirm) .

It would be nice to add a comment (for example where these two fields are declared) that they are compressed refs. It is fairly rare that VM/GC meta structures store heap refs as compressed refs.

@0dvictor 0dvictor force-pushed the csrdbar branch 2 times, most recently from 48293b1 to 6f7c949 Compare October 3, 2018 03:13
@gacholio
Copy link
Contributor

gacholio commented Oct 3, 2018

I'd forgotten about the PARM_REG macro - doesn't work on 32-bit, but I'll implement it now that we're using fastcall (at your original suggestion), so at least things with 2 or fewer parameters will work everywhere, and will error out properly if we try to use more than 2.

Copy link
Contributor

@gacholio gacholio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from the helper side of things - I'll have some updates to this code soon when I refactor how calling C works (OSX has made things a bit more difficult).

@andrewcraik
Copy link
Contributor

@DanHeidinga could you have a look now that @gacholio has approved to see if you are happy - I'm going to do a review now from the JIT side.

@0dvictor could you confirm if this still needs to be WIP? The dependencies have been merged I believe and it looks like we should be done a review soon so I'd like to make sure this is ready to go soon if possible.

@0dvictor
Copy link
Contributor Author

0dvictor commented Oct 3, 2018

could you confirm if this still needs to be WIP? The dependencies have been merged I believe and it looks like we should be done a review soon so I'd like to make sure this is ready to go soon if possible.

It is currently marked as WIP only because I have a test run pending. I'll remove the WIP as soon as it cleared.

@@ -960,6 +960,79 @@ TR::Register *J9::X86::AMD64::TreeEvaluator::conditionalHelperEvaluator(TR::Node
#endif


TR::Register *J9::X86::TreeEvaluator::irdbarEvaluator(TR::Node *node, TR::CodeGenerator *cg)
{
TR_ASSERT(TR::Compiler->om.shouldGenerateReadBarriersForFieldLoads(), "iReadBarrierEvaluator");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - this will be changed soon by the field watch work so I'm wondering if we should structure this more like if shouldGenerateReadBarriersForFieldLoads then do this sequence otherwise just treat it like an aloadi and other can add other meanings and uses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion; however without seeing the actual change of field watch work, it is hard to image exactly what need be checked. If field watch work is merged before this PR, I would definitely updated this change accordingly; otherwise, I suggest to massage the code when field watch is delivered.

@DanHeidinga
Copy link
Member

@amicic Are you OK with the initialization code? You're most familiar with the existing sequence on z

@@ -700,13 +700,20 @@ MM_CollectorLanguageInterfaceImpl::scavenger_switchConcurrentForThread(MM_Enviro
j9tty_printf(PORTLIB, "%p: Nursery [%p,%p] Evacuate [%p,%p] GS [%p,%p] Section size 0x%zx, sections %lu bit offset %lu bit mask 0x%zx\n",
vmThread, nurseryBase, nurseryTop, gsBase, gsTop, pageBase, pageTop, _extensions->getConcurrentScavengerPageSectionSize() , sectionCount, startOffsetInBits, bitMask);
}
#if defined(J9VM_GC_COMPRESSED_POINTERS) && defined(J9VM_ARCH_X86)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think there is a need to restrict for X86. For any plaform we should have consistent behavior in changing vmThread attributes whether address range of evacutate, privateFlags etc. Some may be not be used for specific platform (in fact it's more related to whether s/w or h/w read barrier is used), but there is not much cost in maintaining them.

@@ -700,13 +700,20 @@ MM_CollectorLanguageInterfaceImpl::scavenger_switchConcurrentForThread(MM_Enviro
j9tty_printf(PORTLIB, "%p: Nursery [%p,%p] Evacuate [%p,%p] GS [%p,%p] Section size 0x%zx, sections %lu bit offset %lu bit mask 0x%zx\n",
vmThread, nurseryBase, nurseryTop, gsBase, gsTop, pageBase, pageTop, _extensions->getConcurrentScavengerPageSectionSize() , sectionCount, startOffsetInBits, bitMask);
}
#if defined(J9VM_GC_COMPRESSED_POINTERS) && defined(J9VM_ARCH_X86)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and vmThread->evacuateBase/Top is used in Z (through thisThreadGetEvacuateBaseAddressOffset() and replaceGuardedLoadWithSoftwareReadBarrier()) for our first attempt of s/w based CS. But this should go away. New implementation should be similar to X86 and should also require these fields to be compressedrefs when applicable (@fjeremic please confirm) .

It would be nice to add a comment (for example where these two fields are declared) that they are compressed refs. It is fairly rare that VM/GC meta structures store heap refs as compressed refs.

@amicic
Copy link
Contributor

amicic commented Oct 3, 2018

@DanHeidinga I put a bunch of comments last night but they stayed at pending state. Just pushed them through with 'submit reveiw'

auto object = cg->allocateRegister();
auto load = generateRegMemInstruction(L4RegMem, node, object, generateX86MemoryReference(address, 0, cg), cg);
cg->setImplicitExceptionPoint(load);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does setting as an implicit exception point do exactly ? What is the exception that's possible ?

The other question I had was whether a read barrier opcode should only appear under a node that is at the top of the tree at codegen time (i.e. after trees have been lowered). Looking at some IL from logs with the concurrent scavenger enabled, I did not see it appear only under a treetop and so I was wondering if this was by design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The irdbari node may be under a NULLCHK and hence this is the implicit NULLCHK point.

There is no restriction or necessity for read barrier to have its own treetop. The discussion following https://github.com/eclipse/openj9/pull/3075/files#r221504019 concluded read barrier is not a GC point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the same node which will be used for field watch, it most certainly will be a GC point in that case.

Copy link
Contributor Author

@0dvictor 0dvictor Oct 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field watch is irrelevant to this Concurrent Scavenger change. Even it is a GC point for field watch, it is NOT a GC point when field watch is disabled/off. This evaluator does NOT assume whether irdbari is a GC point, and should work regardless. It is up to the Optimizer generating the trees, and this evaluator generates instructions for Concurrent Scavenger regardless whether irdbari has its own treetop or not. Whether the Optimizer should generate irdbari under a dedicate treetop is beyond the scope of this changeset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The irdbari node may be under a NULLCHK and hence this is the implicit NULLCHK point.

There is no restriction or necessity for read barrier to have its own treetop. The discussion following https://github.com/eclipse/openj9/pull/3075/files#r221504019 concluded read barrier is not a GC point.

Is that the right link ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the right link ?

Yes, the link is correct; but I guess this is a bug in GitHub preventing it jumping to the correct comment. If you click the link, then click "Show conversation" right under "runtime/codert_vm/xnathelp.m4", it should show the correct discussion.

@@ -9405,20 +9480,20 @@ static TR::Register* inlineCompareAndSwapObjectNative(TR::Node* node, TR::CodeGe

generateLabelInstruction(LABEL, node, begLabel, cg);

generateRegMemInstruction(CMPRegMem(use64BitClasses), node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBase), cg), cg);
generateRegMemInstruction(CMPRegMem(use64BitClasses), node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBaseCompressed), cg), cg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this code work on 64-bit uncompressed or 32-bit ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Concurrent Scavenger is currently only supported on 64-bit compressedrefs. I've update this part to avoid confusion.

@@ -9405,20 +9481,20 @@ static TR::Register* inlineCompareAndSwapObjectNative(TR::Node* node, TR::CodeGe

generateLabelInstruction(LABEL, node, begLabel, cg);

generateRegMemInstruction(CMPRegMem(use64BitClasses), node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBase), cg), cg);
generateRegMemInstruction(CMP4RegMem, node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBaseCompressed), cg), cg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Z platforms CS runs on both 64 CR and nonCR, but not on 32. Same should be on X. But if you want to stage it and enable only CR now and later for nonCR that's ok, but nonCR should be supported eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reminding. This version only supports 64 CR. There are more changes required beyond this PR to make CS on nonCR work. CS support on nonCR will be added after CR is fully enabled.

X86-64 compressedrefs now supports read barriers for
Concurrent Scavenger.

Signed-off-by: Victor Ding <dvictor@ca.ibm.com>
* In short, the evacuate region is [vmThread->evacuateBase, vmThread->evacuateTop].
*/
vmThread->evacuateBase = (UDATA)gsBase;
vmThread->evacuateTop = (UDATA)gsTop - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this line has been modified after I approved the PR. I approve this additional change, too.

This change is necessary for CR config in case gsTop is at highest possible for CR range, in which case evacuateTopCompressed would be 0x100000000 after the shift is applied, which would not fit into a 32bit variable/register. Like this highest possible evacuateTopCompressed would be 0xffffffff.

@andrewcraik
Copy link
Contributor

Jenkins test sanity,extended xlinux,win jdk8,jdk11

@andrewcraik
Copy link
Contributor

Jenkins test sanity xlinux,win jdk8,jdk11

@andrewcraik
Copy link
Contributor

This is all reviewed and all approved so I am going to merge this. Sanity is clean and test have run successfully with CS enabled in additional testing. Note this does not enable the feature it is the implementation - #3070 is the enablement

@andrewcraik andrewcraik merged commit db3043a into eclipse-openj9:master Oct 12, 2018
@dmitripivkine
Copy link
Contributor

This is all reviewed and all approved so I am going to merge this. Sanity is clean and test have run successfully with CS enabled in additional testing. Note this does not enable the feature it is the implementation - #3070 is the enablement

please do not miss the fact that even #3070 would not activate CS at x86. OMR_GC_CONCURRENT_SCAVENGER should be set as well

@andrewcraik
Copy link
Contributor

Thanks @dmitripivkine I'm going to copy your comment to #3070 so it is visible in the review discussion there once WIP is removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants