Concurrent Scavenger Read Barrier evaluator on X86-64 #3075

0dvictor · 2018-09-28T22:05:45Z

X86-64 compressedrefs now supports read barriers for Concurrent Scavenger.

Part of Issue #3054

Signed-off-by: Victor Ding dvictor@ca.ibm.com

0dvictor · 2018-09-28T22:08:22Z

Requires eclipse-omr/omr#2974 and eclipse-omr/omr#3012

DanHeidinga

Exciting to see some ground work for a concurrent scavenger on x86!

I've called out a couple things that should be addressed before this is merged.

runtime/vm/vmthread.c

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp

runtime/codert_vm/xnathelp.m4

runtime/jilgen/jilconsts.c

gacholio · 2018-10-03T02:48:56Z

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp

@@ -700,13 +700,20 @@ MM_CollectorLanguageInterfaceImpl::scavenger_switchConcurrentForThread(MM_Enviro
 			j9tty_printf(PORTLIB, "%p: Nursery [%p,%p] Evacuate [%p,%p] GS [%p,%p] Section size 0x%zx, sections %lu bit offset %lu bit mask 0x%zx\n",
 					vmThread, nurseryBase, nurseryTop, gsBase, gsTop, pageBase, pageTop, _extensions->getConcurrentScavengerPageSectionSize() , sectionCount, startOffsetInBits, bitMask);
 		}
+#if defined(J9VM_GC_COMPRESSED_POINTERS) && defined(J9VM_ARCH_X86)


I'd like to see a GC reviewer on this bit.

@amicic Could you take a look?

I dont think there is a need to restrict for X86. For any plaform we should have consistent behavior in changing vmThread attributes whether address range of evacutate, privateFlags etc. Some may be not be used for specific platform (in fact it's more related to whether s/w or h/w read barrier is used), but there is not much cost in maintaining them.

In particular, would it make sense to have some #error in here when CS is defined, but the platforms options don't have a good answer for the values (and some comments).

I checked and vmThread->evacuateBase/Top is used in Z (through thisThreadGetEvacuateBaseAddressOffset() and replaceGuardedLoadWithSoftwareReadBarrier()) for our first attempt of s/w based CS. But this should go away. New implementation should be similar to X86 and should also require these fields to be compressedrefs when applicable (@fjeremic please confirm) .

It would be nice to add a comment (for example where these two fields are declared) that they are compressed refs. It is fairly rare that VM/GC meta structures store heap refs as compressed refs.

runtime/codert_vm/xnathelp.m4

gacholio · 2018-10-03T03:30:17Z

I'd forgotten about the PARM_REG macro - doesn't work on 32-bit, but I'll implement it now that we're using fastcall (at your original suggestion), so at least things with 2 or fewer parameters will work everywhere, and will error out properly if we try to use more than 2.

runtime/codert_vm/xnathelp.m4

gacholio

Looks good from the helper side of things - I'll have some updates to this code soon when I refactor how calling C works (OSX has made things a bit more difficult).

andrewcraik · 2018-10-03T15:33:24Z

@DanHeidinga could you have a look now that @gacholio has approved to see if you are happy - I'm going to do a review now from the JIT side.

@0dvictor could you confirm if this still needs to be WIP? The dependencies have been merged I believe and it looks like we should be done a review soon so I'd like to make sure this is ready to go soon if possible.

0dvictor · 2018-10-03T15:37:56Z

could you confirm if this still needs to be WIP? The dependencies have been merged I believe and it looks like we should be done a review soon so I'd like to make sure this is ready to go soon if possible.

It is currently marked as WIP only because I have a test run pending. I'll remove the WIP as soon as it cleared.

andrewcraik · 2018-10-03T15:39:19Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

@@ -960,6 +960,79 @@ TR::Register *J9::X86::AMD64::TreeEvaluator::conditionalHelperEvaluator(TR::Node
 #endif


+TR::Register *J9::X86::TreeEvaluator::irdbarEvaluator(TR::Node *node, TR::CodeGenerator *cg)
+   {
+   TR_ASSERT(TR::Compiler->om.shouldGenerateReadBarriersForFieldLoads(), "iReadBarrierEvaluator");


Hmm - this will be changed soon by the field watch work so I'm wondering if we should structure this more like if shouldGenerateReadBarriersForFieldLoads then do this sequence otherwise just treat it like an aloadi and other can add other meanings and uses?

Good suggestion; however without seeing the actual change of field watch work, it is hard to image exactly what need be checked. If field watch work is merged before this PR, I would definitely updated this change accordingly; otherwise, I suggest to massage the code when field watch is delivered.

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp

runtime/vm/vmthread.c

DanHeidinga · 2018-10-03T15:48:46Z

@amicic Are you OK with the initialization code? You're most familiar with the existing sequence on z

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp

amicic · 2018-10-03T03:17:21Z

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp

@@ -700,13 +700,20 @@ MM_CollectorLanguageInterfaceImpl::scavenger_switchConcurrentForThread(MM_Enviro
 			j9tty_printf(PORTLIB, "%p: Nursery [%p,%p] Evacuate [%p,%p] GS [%p,%p] Section size 0x%zx, sections %lu bit offset %lu bit mask 0x%zx\n",
 					vmThread, nurseryBase, nurseryTop, gsBase, gsTop, pageBase, pageTop, _extensions->getConcurrentScavengerPageSectionSize() , sectionCount, startOffsetInBits, bitMask);
 		}
+#if defined(J9VM_GC_COMPRESSED_POINTERS) && defined(J9VM_ARCH_X86)


I dont think there is a need to restrict for X86. For any plaform we should have consistent behavior in changing vmThread attributes whether address range of evacutate, privateFlags etc. Some may be not be used for specific platform (in fact it's more related to whether s/w or h/w read barrier is used), but there is not much cost in maintaining them.

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp

amicic · 2018-10-03T03:35:53Z

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp

@@ -700,13 +700,20 @@ MM_CollectorLanguageInterfaceImpl::scavenger_switchConcurrentForThread(MM_Enviro
 			j9tty_printf(PORTLIB, "%p: Nursery [%p,%p] Evacuate [%p,%p] GS [%p,%p] Section size 0x%zx, sections %lu bit offset %lu bit mask 0x%zx\n",
 					vmThread, nurseryBase, nurseryTop, gsBase, gsTop, pageBase, pageTop, _extensions->getConcurrentScavengerPageSectionSize() , sectionCount, startOffsetInBits, bitMask);
 		}
+#if defined(J9VM_GC_COMPRESSED_POINTERS) && defined(J9VM_ARCH_X86)


I checked and vmThread->evacuateBase/Top is used in Z (through thisThreadGetEvacuateBaseAddressOffset() and replaceGuardedLoadWithSoftwareReadBarrier()) for our first attempt of s/w based CS. But this should go away. New implementation should be similar to X86 and should also require these fields to be compressedrefs when applicable (@fjeremic please confirm) .

It would be nice to add a comment (for example where these two fields are declared) that they are compressed refs. It is fairly rare that VM/GC meta structures store heap refs as compressed refs.

runtime/vm/vmthread.c

amicic · 2018-10-03T16:13:46Z

@DanHeidinga I put a bunch of comments last night but they stayed at pending state. Just pushed them through with 'submit reveiw'

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp

vijaysun-omr · 2018-10-10T18:09:07Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

+   auto object = cg->allocateRegister();
+   auto load = generateRegMemInstruction(L4RegMem, node, object, generateX86MemoryReference(address, 0, cg), cg);
+   cg->setImplicitExceptionPoint(load);
+


What does setting as an implicit exception point do exactly ? What is the exception that's possible ?

The other question I had was whether a read barrier opcode should only appear under a node that is at the top of the tree at codegen time (i.e. after trees have been lowered). Looking at some IL from logs with the concurrent scavenger enabled, I did not see it appear only under a treetop and so I was wondering if this was by design.

The irdbari node may be under a NULLCHK and hence this is the implicit NULLCHK point.

There is no restriction or necessity for read barrier to have its own treetop. The discussion following https://github.com/eclipse/openj9/pull/3075/files#r221504019 concluded read barrier is not a GC point.

If this is the same node which will be used for field watch, it most certainly will be a GC point in that case.

Field watch is irrelevant to this Concurrent Scavenger change. Even it is a GC point for field watch, it is NOT a GC point when field watch is disabled/off. This evaluator does NOT assume whether irdbari is a GC point, and should work regardless. It is up to the Optimizer generating the trees, and this evaluator generates instructions for Concurrent Scavenger regardless whether irdbari has its own treetop or not. Whether the Optimizer should generate irdbari under a dedicate treetop is beyond the scope of this changeset.

The irdbari node may be under a NULLCHK and hence this is the implicit NULLCHK point.

There is no restriction or necessity for read barrier to have its own treetop. The discussion following https://github.com/eclipse/openj9/pull/3075/files#r221504019 concluded read barrier is not a GC point.

Is that the right link ?

Is that the right link ?

Yes, the link is correct; but I guess this is a bug in GitHub preventing it jumping to the correct comment. If you click the link, then click "Show conversation" right under "runtime/codert_vm/xnathelp.m4", it should show the correct discussion.

vijaysun-omr · 2018-10-10T18:11:19Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

@@ -9405,20 +9480,20 @@ static TR::Register* inlineCompareAndSwapObjectNative(TR::Node* node, TR::CodeGe

      generateLabelInstruction(LABEL, node, begLabel, cg);

-      generateRegMemInstruction(CMPRegMem(use64BitClasses), node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBase), cg), cg);
+      generateRegMemInstruction(CMPRegMem(use64BitClasses), node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBaseCompressed), cg), cg);


Will this code work on 64-bit uncompressed or 32-bit ?

Good point. Concurrent Scavenger is currently only supported on 64-bit compressedrefs. I've update this part to avoid confusion.

amicic · 2018-10-11T00:03:10Z

runtime/compiler/x/codegen/J9TreeEvaluator.cpp

@@ -9405,20 +9481,20 @@ static TR::Register* inlineCompareAndSwapObjectNative(TR::Node* node, TR::CodeGe

      generateLabelInstruction(LABEL, node, begLabel, cg);

-      generateRegMemInstruction(CMPRegMem(use64BitClasses), node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBase), cg), cg);
+      generateRegMemInstruction(CMP4RegMem, node, tmp, generateX86MemoryReference(cg->getVMThreadRegister(), offsetof(J9VMThread, evacuateBaseCompressed), cg), cg);


On Z platforms CS runs on both 64 CR and nonCR, but not on 32. Same should be on X. But if you want to stage it and enable only CR now and later for nonCR that's ok, but nonCR should be supported eventually.

Thank you for reminding. This version only supports 64 CR. There are more changes required beyond this PR to make CS on nonCR work. CS support on nonCR will be added after CR is fully enabled.

X86-64 compressedrefs now supports read barriers for Concurrent Scavenger. Signed-off-by: Victor Ding <dvictor@ca.ibm.com>

amicic · 2018-10-11T21:44:41Z

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp

+		 * In short, the evacuate region is [vmThread->evacuateBase, vmThread->evacuateTop].
+		 */
+		vmThread->evacuateBase = (UDATA)gsBase;
+		vmThread->evacuateTop = (UDATA)gsTop - 1;


I understand that this line has been modified after I approved the PR. I approve this additional change, too.

This change is necessary for CR config in case gsTop is at highest possible for CR range, in which case evacuateTopCompressed would be 0x100000000 after the shift is applied, which would not fit into a 32bit variable/register. Like this highest possible evacuateTopCompressed would be 0xffffffff.

andrewcraik · 2018-10-11T21:47:35Z

Jenkins test sanity,extended xlinux,win jdk8,jdk11

andrewcraik · 2018-10-11T21:47:56Z

Jenkins test sanity xlinux,win jdk8,jdk11

andrewcraik · 2018-10-12T12:50:20Z

This is all reviewed and all approved so I am going to merge this. Sanity is clean and test have run successfully with CS enabled in additional testing. Note this does not enable the feature it is the implementation - #3070 is the enablement

dmitripivkine · 2018-10-12T13:16:43Z

This is all reviewed and all approved so I am going to merge this. Sanity is clean and test have run successfully with CS enabled in additional testing. Note this does not enable the feature it is the implementation - #3070 is the enablement

please do not miss the fact that even #3070 would not activate CS at x86. OMR_GC_CONCURRENT_SCAVENGER should be set as well

andrewcraik · 2018-10-12T13:25:14Z

Thanks @dmitripivkine I'm going to copy your comment to #3070 so it is visible in the review discussion there once WIP is removed.

0dvictor changed the title ~~Concurrent Scavenger Read Barrier evaluator on X86-64~~ WIP: Concurrent Scavenger Read Barrier evaluator on X86-64 Sep 28, 2018

0dvictor mentioned this pull request Sep 28, 2018

Concurrent Scavenge support on X86 #3054

Closed

7 tasks

DanHeidinga requested changes Sep 29, 2018

View reviewed changes

fjeremic added comp:gc comp:jit labels Oct 2, 2018

0dvictor force-pushed the csrdbar branch 2 times, most recently from 5edcc5f to 071b21d Compare October 3, 2018 02:24

gacholio reviewed Oct 3, 2018

View reviewed changes

runtime/jilgen/jilconsts.c Outdated Show resolved Hide resolved

gacholio reviewed Oct 3, 2018

View reviewed changes

0dvictor force-pushed the csrdbar branch 2 times, most recently from 48293b1 to 6f7c949 Compare October 3, 2018 03:13

gacholio reviewed Oct 3, 2018

View reviewed changes

runtime/codert_vm/xnathelp.m4 Outdated Show resolved Hide resolved

0dvictor mentioned this pull request Oct 3, 2018

Style update of jitReferenceArrayCopy on X86 #3112

Merged

0dvictor force-pushed the csrdbar branch from 6f7c949 to 5c4ffbc Compare October 3, 2018 03:56

gacholio reviewed Oct 3, 2018

View reviewed changes

runtime/codert_vm/xnathelp.m4 Outdated Show resolved Hide resolved

0dvictor force-pushed the csrdbar branch from 5c4ffbc to cd16eed Compare October 3, 2018 04:20

gacholio approved these changes Oct 3, 2018

View reviewed changes

andrewcraik reviewed Oct 3, 2018

View reviewed changes

DanHeidinga reviewed Oct 3, 2018

View reviewed changes

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp Outdated Show resolved Hide resolved

runtime/vm/vmthread.c Outdated Show resolved Hide resolved

amicic reviewed Oct 3, 2018

View reviewed changes

amicic reviewed Oct 9, 2018

View reviewed changes

runtime/gc_glue_java/CollectorLanguageInterfaceImpl.cpp Outdated Show resolved Hide resolved

0dvictor force-pushed the csrdbar branch from 6c78a26 to 061868c Compare October 9, 2018 15:34

amicic approved these changes Oct 9, 2018

View reviewed changes

0dvictor force-pushed the csrdbar branch from 061868c to ffb11f4 Compare October 10, 2018 02:12

vijaysun-omr reviewed Oct 10, 2018

View reviewed changes

0dvictor force-pushed the csrdbar branch from ffb11f4 to 6e13f36 Compare October 10, 2018 18:28

amicic reviewed Oct 11, 2018

View reviewed changes

0dvictor force-pushed the csrdbar branch from 6e13f36 to 0d54c9c Compare October 11, 2018 01:04

0dvictor force-pushed the csrdbar branch 3 times, most recently from 6482e08 to 2409d22 Compare October 11, 2018 19:54

Concurrent Scavenger Read Barrier evaluator on X86-64

3ff53c6

X86-64 compressedrefs now supports read barriers for Concurrent Scavenger. Signed-off-by: Victor Ding <dvictor@ca.ibm.com>

0dvictor force-pushed the csrdbar branch from 2409d22 to 3ff53c6 Compare October 11, 2018 20:35

amicic reviewed Oct 11, 2018

View reviewed changes

andrewcraik approved these changes Oct 12, 2018

View reviewed changes

andrewcraik merged commit db3043a into eclipse-openj9:master Oct 12, 2018

0dvictor deleted the csrdbar branch October 12, 2018 14:14

fjeremic mentioned this pull request Oct 15, 2018

Validate software based CS logic for testing evacuateTop and evacuateBase #3279

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Concurrent Scavenger Read Barrier evaluator on X86-64 #3075

Concurrent Scavenger Read Barrier evaluator on X86-64 #3075

0dvictor commented Sep 28, 2018 •

edited

Loading

0dvictor commented Sep 28, 2018

DanHeidinga left a comment

gacholio Oct 3, 2018

0dvictor Oct 3, 2018

amicic Oct 3, 2018

gacholio Oct 3, 2018

amicic Oct 3, 2018

gacholio commented Oct 3, 2018

gacholio left a comment

andrewcraik commented Oct 3, 2018

0dvictor commented Oct 3, 2018

andrewcraik Oct 3, 2018

0dvictor Oct 5, 2018

DanHeidinga commented Oct 3, 2018

amicic Oct 3, 2018

amicic Oct 3, 2018

amicic commented Oct 3, 2018

vijaysun-omr Oct 10, 2018

0dvictor Oct 10, 2018

gacholio Oct 10, 2018

0dvictor Oct 10, 2018 •

edited

Loading

vijaysun-omr Oct 10, 2018

0dvictor Oct 10, 2018

vijaysun-omr Oct 10, 2018

0dvictor Oct 10, 2018

amicic Oct 11, 2018

0dvictor Oct 11, 2018

amicic Oct 11, 2018

andrewcraik commented Oct 11, 2018

andrewcraik commented Oct 11, 2018

andrewcraik commented Oct 12, 2018

dmitripivkine commented Oct 12, 2018

andrewcraik commented Oct 12, 2018

Concurrent Scavenger Read Barrier evaluator on X86-64 #3075

Concurrent Scavenger Read Barrier evaluator on X86-64 #3075

Conversation

0dvictor commented Sep 28, 2018 • edited Loading

0dvictor commented Sep 28, 2018

DanHeidinga left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gacholio commented Oct 3, 2018

gacholio left a comment

Choose a reason for hiding this comment

andrewcraik commented Oct 3, 2018

0dvictor commented Oct 3, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DanHeidinga commented Oct 3, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amicic commented Oct 3, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

0dvictor Oct 10, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrewcraik commented Oct 11, 2018

andrewcraik commented Oct 11, 2018

andrewcraik commented Oct 12, 2018

dmitripivkine commented Oct 12, 2018

andrewcraik commented Oct 12, 2018

0dvictor commented Sep 28, 2018 •

edited

Loading

0dvictor Oct 10, 2018 •

edited

Loading