
An effective way to compress Large Object Heap #4076

Open
ygc369 opened this issue Mar 25, 2015 · 29 comments

@ygc369

ygc369 commented Mar 25, 2015

I think I've found an effective way to compress LOH:
If CLR always alloc every large object at the beginning of a RAM page(usually 4KB per page),then the large object heap(LOH) can be compressed without much cost: CLR can compress LOH by modifying RAM page table and TLB instead of copying data. If so, small fragmentation maybe still exist (less then a memory page size per fragment), but there would be no large fragmentation, and compressing would be very fast because of no copying. To do this, OS support may be needed, fortunately Windows OS and Visual Studio are both Microsoft's softwares, so Microsoft can implement this at least on Windows.

@jkotas
Member

jkotas commented Mar 25, 2015

cc @Maoni0

@kangaroo
Contributor

To clarify: by 'modify RAM page table', your idea is that if the CLR could determine that a page in the LOH was free, we could modify the PTE to point to a new physical page?

@Maoni0
Member

Maoni0 commented Mar 25, 2015

Thanks for your interest in the GC, @ygc369. The feature you are talking about is called VA remapping or swapping - it's something that needs to be implemented in the VMM. We talked to the OS guys about it a few years ago. We don't have it yet.

@ygc369
Author

ygc369 commented Mar 25, 2015

@Maoni0 This is a great feature; not only the GC but many other operations that need to copy large amounts of memory could also benefit from it. I think you should talk to the OS guys about it again. It would obviously improve the performance of many programs without modifying their source code.

@zgxnet

zgxnet commented Mar 25, 2015

I like the idea very much. Yes, the LOH is likely to fragment, and memory for large objects should be allocated directly rather than through a managed heap. Doing the address mapping at the OS level is much better than copying and moving.

@omariom
Contributor

omariom commented Apr 1, 2015

👍

@DemiMarie

Also I would suggest filing feature requests with the Linux and *BSD kernel teams.

I believe (but could be wrong) that the last attempt to get this feature into Linux was tainted by association with the Azul Zing JVM, which – being proprietary and patent-encumbered – was looked down upon by the Linux team. The feature appeared to be only useful for a single, proprietary piece of software.

The FOSS *nix kernel developers might be much more interested if they saw that a free software VM would actually use address remapping.

@ygc369
Author

ygc369 commented Oct 16, 2015

@drbo
Does Windows OS have this feature already?

@ygc369
Author

ygc369 commented Oct 19, 2015

Is this feature very hard to implement?

@DemiMarie

@ygc369 I don't know – I was referring to the now-defunct Managed Runtime Initiative, which (I believe) only ever managed to submit a patch to Linux. Azul has since resorted to shipping a custom proprietary kernel module.

@ygc369
Author

ygc369 commented Apr 12, 2016

Virtual machine software already uses this feature; I don't think it is too hard to implement.

@ygc369
Author

ygc369 commented Jun 20, 2016

Nobody is interested in this?

@vancem
Contributor

vancem commented Jun 20, 2016

I don't want to comment directly on the merits of pursuing this proposal, but I will mention that on a 64-bit machine, fragmentation is not as problematic as you might assume. The GC does not touch freed memory, and since every object on the large object heap is > 85 K (more than twenty 4 KB pages), most of those pages simply drop out of the working set and don't 'hurt' real memory consumption, only address-space consumption, which is significantly cheaper.

I don't want to claim that fragmentation of the large object heap does not matter, but my observation above suggests we need data showing that LOH fragmentation is actually a problem in interesting scenarios.
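
One way to start gathering that kind of data, as a small sketch on current .NET (GC.GetGCMemoryInfo did not exist when this comment was written, and FragmentedBytes covers the whole managed heap, not just the LOH):

```csharp
using System;

// Allocate and drop some LOH-sized arrays, then look at heap-wide fragmentation.
byte[][] junk = new byte[10][];
for (int i = 0; i < junk.Length; i++)
    junk[i] = new byte[1_000_000];          // > 85,000 bytes, so each lands on the LOH
Array.Clear(junk, 0, junk.Length);          // drop the references
GC.Collect();

var info = GC.GetGCMemoryInfo();
Console.WriteLine($"Heap size:        {info.HeapSizeBytes:N0} bytes");
Console.WriteLine($"Fragmented bytes: {info.FragmentedBytes:N0} bytes");
```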

@ygc369
Author

ygc369 commented Jun 21, 2016

@vancem
This proposal is not only about the fragmentation problem; it would also let garbage large objects be collected earlier.
Compacting the LOH the traditional way (by copying) costs too much to do frequently. But if the LOH could be compacted the way I described, the cost would be much lower, so garbage large objects could be collected promptly.

@Maoni0
Member

Maoni0 commented Jun 21, 2016

Not compacting LOH does not mean we do not collect garbage on LOH. We just don't compact (unless you specifically tell us to). We can collect LOH as often as we need if we think it's productive.

As I mentioned above, we already talked to the OS group a few years ago about doing this and the OS has yet to implement the VA remapping feature. I will talk to them again but feel free to bring this up with the Windows group and other OS groups as @drbo mentioned.
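
For reference, the opt-in LOH compaction mentioned above ("unless you specifically tell us to") is exposed through GCSettings; a minimal usage sketch:

```csharp
using System;
using System.Runtime;

// Request that the next blocking gen-2 GC also compact the LOH.
// The setting automatically resets to Default after that collection.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
```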

@ygc369
Author

ygc369 commented Jun 21, 2016

@Maoni0
Thank you for commenting on this.
It seems that the LOH is only collected during a Generation 2 GC, so that may not happen often.
Even if the garbage objects on the LOH were allocated very recently, they will not be collected until the next Generation 2 GC.

@FlorianRainer

FlorianRainer commented Jun 15, 2017

I don't want to comment on how it would be better to compact the LOH, but I want to ask whether that is even necessary.

It is a lot of work for the GC, and for that reason it will not be done as often as it should be for memory management to work efficiently.

What comes to my mind is a kind of self-defragmenting LOH.
I was reading this very good article on LOH fragmentation https://www.codeproject.com/Articles/1191534/To-Heap-or-not-to-Heap-That-s-the-Large-Object-Que and, if the problem is still the same, I have a different idea.

The reason for the fragmentation is the search for a free gap, and the requirement that a large object fit entirely within that gap (its size must be less than or equal to the gap). So over time, big gaps get filled by smaller objects and a lot of small gaps remain.

It would be much better if the LOH also used pages of a constant size, for example a 4 KB page size.
I'm not talking about RAM pages; the size can match the RAM page size, but it doesn't have to.
At the end of each page, the last 64 bits (on x64) would store a pointer to the next used page.
This way every gap always consists of same-sized free blocks, and it is no longer necessary to allocate large memory in one contiguous block.

Because the LOH is used only for large allocations, the performance impact and the fragmentation from not consuming an entire 4 KB block would be much less than the current fragmentation.

The additional memory consumption from the 8-byte pointer per 4 KB page would be about 8.4 MB for a 4 GB allocation.

Reading the memory could be a tiny bit slower, but this way the GC would never need to reorganize the LOH.

If that is not enough, it could perhaps be optimized by using the first byte of each page to indicate whether the page should be read straight through into the next page (as in the current LOH) or whether its last 8 bytes are a reference to the next page.

That would improve performance and memory usage for use cases with a lot of free memory or only a few LOH objects.
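
A minimal managed sketch of the page-chained layout described above (purely hypothetical; the names and sizes are illustrative and nothing like this exists in the CLR):

```csharp
// Each logical "large object" is stored as a chain of fixed-size pages.
// The Next field stands in for the 8-byte link stored at the end of each 4 KB page.
sealed class PagedBuffer
{
    private const int PayloadSize = 4096 - 8;   // 4 KB page minus the 64-bit link

    private sealed class Page
    {
        public readonly byte[] Payload = new byte[PayloadSize];
        public Page Next;
    }

    private readonly Page _head = new Page();

    public PagedBuffer(long length)
    {
        Page tail = _head;
        for (long remaining = length - PayloadSize; remaining > 0; remaining -= PayloadSize)
            tail = tail.Next = new Page();
    }

    // Random access has to walk the chain from the head -- the cost ygc369
    // points out a few comments below.
    public byte this[long index]
    {
        get
        {
            Page page = _head;
            for (long hops = index / PayloadSize; hops > 0; hops--)
                page = page.Next;
            return page.Payload[index % PayloadSize];
        }
    }
}
```

With this layout a freed object always leaves behind whole reusable pages, so the heap never needs compacting; the price is the pointer chase on random access, which is exactly the objection raised two comments below.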

@svick
Contributor

svick commented Jun 15, 2017

@FlorianRainer Wouldn't that break unsafe code? For example, I can allocate a 1 GB byte[] and then use fixed to access the whole array directly through pointers. I don't think that would work with the approach you propose.
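
The kind of code meant here looks roughly like the sketch below (it requires <AllowUnsafeBlocks> and assumes, as today, that the array is one contiguous block):

```csharp
using System;

class PinningExample
{
    static unsafe void Main()
    {
        byte[] buffer = new byte[1 << 30];      // 1 GB array, allocated on the LOH

        fixed (byte* p = buffer)
        {
            // Pointer arithmetic like this assumes the whole array occupies a
            // single contiguous virtual-address range; a page-chained LOH
            // layout would break that assumption.
            p[buffer.Length - 1] = 42;
        }

        Console.WriteLine(buffer[buffer.Length - 1]);   // 42
    }
}
```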

@FlorianRainer

FlorianRainer commented Jun 15, 2017

@svick That's true; for unsafe code and pointer access this would not work.
But maybe the idea of building some kind of self-defragmenting (or never-fragmenting) LOH, instead of having the GC reorganize it, could still be useful, even if my particular approach is not.

@ygc369
Author

ygc369 commented Jun 17, 2017

@FlorianRainer
@svick
My idea would work with unsafe code, and I think that compacting the LOH my way would cost no more than compacting the SOH, if the OS and CPU supported it (the GC thread could modify the page table).
Even for safe code, I don't think FlorianRainer's idea has more advantages than mine. Assuming 4 KB per page in his scheme, if I allocate 1 GB of memory and want to access the last byte, the CPU has to chase roughly 256 K page links just to read that one byte!

@ygc369
Author

ygc369 commented Mar 15, 2018

@Maoni0
It seems that Windows already has VA remapping.
See this: Address Windowing Extensions

AWE provides a very fast remapping capability. Remapping is done by manipulating virtual memory tables, not by moving data in physical memory.
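
A hedged, untested sketch of what an experiment with AWE might look like from C# (the P/Invoke declarations are the documented kernel32 AWE APIs; the process needs the "Lock pages in memory" privilege, and real GC integration would be far more involved):

```csharp
using System;
using System.Runtime.InteropServices;

static class AweRemapSketch
{
    const uint MEM_RESERVE = 0x2000, MEM_PHYSICAL = 0x400000, PAGE_READWRITE = 0x04;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr addr, UIntPtr size, uint allocType, uint protect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool AllocateUserPhysicalPages(IntPtr process, ref UIntPtr pageCount, UIntPtr[] pageFrames);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool MapUserPhysicalPages(IntPtr addr, UIntPtr pageCount, UIntPtr[] pageFrames);

    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentProcess();

    public static unsafe void Demo()
    {
        UIntPtr pageCount = (UIntPtr)1;
        var frames = new UIntPtr[1];
        if (!AllocateUserPhysicalPages(GetCurrentProcess(), ref pageCount, frames))
            throw new InvalidOperationException("Requires the 'Lock pages in memory' privilege.");

        // Two AWE regions; the physical page can be mapped into either one.
        IntPtr viewA = VirtualAlloc(IntPtr.Zero, (UIntPtr)4096, MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);
        IntPtr viewB = VirtualAlloc(IntPtr.Zero, (UIntPtr)4096, MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);

        MapUserPhysicalPages(viewA, pageCount, frames);
        *(int*)viewA = 12345;                            // write through the first mapping
        MapUserPhysicalPages(viewA, pageCount, null);    // unmap from A ...
        MapUserPhysicalPages(viewB, pageCount, frames);  // ... remap at B: page tables change, no bytes are copied
        Console.WriteLine(*(int*)viewB);                 // prints 12345
    }
}
```

Whether these calls are fast and flexible enough for the GC's purposes is exactly what Maoni0 addresses below.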

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@ygc369
Author

ygc369 commented Mar 29, 2020

@Maoni0
Is there any progress on this feature?
If the OS still doesn't support VA remapping, I wonder why, and how virtual machine software manages to have it.
Besides, I think Windows already supports it; see this:

AWE provides a very fast remapping capability. Remapping is done by manipulating virtual memory tables, not by moving data in physical memory.

from Address Windowing Extensions

@Maoni0
Member

Maoni0 commented Mar 29, 2020

AWE has existed since Server 2003. it's not new. the APIs are quite awkward for this purpose and likely not fast enough (what I talked about with the OS folks was much more targeted at the GC usage); feel free to experiment with them.

Linux has mremap which seems much more suitable for the usage. I haven't gotten around to experimenting with it yet.
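
For anyone curious, a rough sketch of that mremap experiment from C# on Linux (hypothetical; the constants are the usual x86-64 Linux values and error handling is minimal):

```csharp
using System;
using System.Runtime.InteropServices;

static class MremapSketch
{
    const int PROT_READ = 0x1, PROT_WRITE = 0x2;
    const int MAP_PRIVATE = 0x02, MAP_ANONYMOUS = 0x20;
    const int MREMAP_MAYMOVE = 1;

    [DllImport("libc", SetLastError = true)]
    static extern IntPtr mmap(IntPtr addr, UIntPtr length, int prot, int flags, int fd, IntPtr offset);

    [DllImport("libc", SetLastError = true)]
    static extern IntPtr mremap(IntPtr oldAddr, UIntPtr oldSize, UIntPtr newSize, int flags);

    public static unsafe void Demo()
    {
        var oneMiB = (UIntPtr)(1 << 20);
        IntPtr block = mmap(IntPtr.Zero, oneMiB, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, IntPtr.Zero);
        if (block == (IntPtr)(-1)) throw new InvalidOperationException("mmap failed");

        *(long*)block = 0xC0FFEE;   // put some data in the mapping

        // Move/grow the mapping: the kernel rewires page-table entries instead of copying bytes.
        IntPtr moved = mremap(block, oneMiB, (UIntPtr)(16 << 20), MREMAP_MAYMOVE);
        if (moved == (IntPtr)(-1)) throw new InvalidOperationException("mremap failed");

        Console.WriteLine((*(long*)moved).ToString("X"));   // still C0FFEE, possibly at a new address
    }
}
```

This only demonstrates the primitive; actually driving it from the GC would need far more plumbing.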

@GSPP

GSPP commented Jun 17, 2020

I recently saw a CppCon talk where a clever technique for compacting heaps was presented. The key idea was to map the same physical page at multiple virtual locations. As long as objects on those virtual locations do not overlap physically, zero-copy compaction can be performed without relocating objects. It's not a full compaction. Rather, it uses suitable opportunities for this technique. Apparently, the authors found this to be a valuable optimization overall.

This technique is different from simply releasing free space "holes" by decommitting the pages.

I'm posting this here for consideration of the GC team.

https://www.youtube.com/watch?v=XRAP3lBivYM
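
The core trick from that talk (the Mesh allocator) relies on the same physical pages being reachable through more than one virtual address. A tiny Linux-only sketch of that primitive, hypothetical and using memfd_create plus two shared mappings:

```csharp
using System;
using System.Runtime.InteropServices;

static class DoubleMappingSketch
{
    const int PROT_READ = 0x1, PROT_WRITE = 0x2, MAP_SHARED = 0x01;

    [DllImport("libc", SetLastError = true)]
    static extern int memfd_create(string name, uint flags);

    [DllImport("libc", SetLastError = true)]
    static extern int ftruncate(int fd, long length);

    [DllImport("libc", SetLastError = true)]
    static extern IntPtr mmap(IntPtr addr, UIntPtr length, int prot, int flags, int fd, IntPtr offset);

    public static unsafe void Demo()
    {
        int fd = memfd_create("double-map-demo", 0);
        ftruncate(fd, 4096);

        // Map the same 4 KB of physical memory at two different virtual addresses.
        IntPtr viewA = mmap(IntPtr.Zero, (UIntPtr)4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, IntPtr.Zero);
        IntPtr viewB = mmap(IntPtr.Zero, (UIntPtr)4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, IntPtr.Zero);

        *(int*)viewA = 777;                // write through one mapping ...
        Console.WriteLine(*(int*)viewB);   // ... read it back through the other: prints 777
    }
}
```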

@Maoni0
Member

Maoni0 commented Jun 17, 2020

yeah, saw this when Emery's paper was published. I have considered it for GC usage.

@teo-tsirpanis
Contributor

Would Regions help with this?

@sgf

sgf commented Jul 23, 2022

It's 2022; in three more years this issue will be ten years old. Will the .NET team do something like ZGC for .NET?

@ygc369
Author

ygc369 commented Jul 25, 2022

Would Regions help with this?

I also want to ask this question: would Regions help with this? @Maoni0

@ygc369
Author

ygc369 commented May 29, 2024

@Maoni0
Any progress on this topic?
