Memory<T> and large memory mapped files #24805

Closed
hexawyz opened this issue Jan 26, 2018 · 27 comments
@hexawyz

hexawyz commented Jan 26, 2018

I'm currently experimenting with OwnedMemory<T> and Memory<T> in an existing project that I'm trying to improve, and I ran into the issue of both types being limited to int.MaxValue elements.

Scenario

I have a relatively big (> 2GB) data file that I want to fully map in memory (i.e. a database). My API exposes methods that return subsets of this big memory-mapped file, e.g.

public ReadOnlyMemory<byte> GetBytes(int something)
{
    // …
    return mainMemory.Slice(start, length).AsReadOnly();
}

Wrapping the MemoryMappedFile and associated MemoryMappedViewAccessor into an OwnedMemory<byte> seemed to be a good idea, since most of the tricky logic would then be handled by the framework.
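For illustration, here is a minimal sketch of the kind of wrapper in question, written against MemoryManager<byte> (the shape the shipped API ended up exposing in place of OwnedMemory<T>); names and error handling are simplified, and the checked int cast is exactly where a view larger than 2GB stops fitting:

using System;
using System.Buffers;
using System.IO.MemoryMappedFiles;

// Sketch only: wraps one memory-mapped view so it can be handed out as Memory<byte>.
sealed unsafe class MappedViewMemoryManager : MemoryManager<byte>
{
    private readonly MemoryMappedViewAccessor _accessor;
    private readonly byte* _pointer;
    private readonly int _length;

    public MappedViewMemoryManager(MemoryMappedViewAccessor accessor)
    {
        _accessor = accessor;
        byte* pointer = null;
        accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref pointer);
        _pointer = pointer;
        // Throws for views larger than int.MaxValue bytes - the limitation described below.
        _length = checked((int)accessor.SafeMemoryMappedViewHandle.ByteLength);
    }

    public override Span<byte> GetSpan() => new Span<byte>(_pointer, _length);

    // The mapping is already pinned by the OS, so pinning is effectively a no-op.
    public override MemoryHandle Pin(int elementIndex = 0) => new MemoryHandle(_pointer + elementIndex);

    public override void Unpin() { }

    protected override void Dispose(bool disposing)
    {
        _accessor.SafeMemoryMappedViewHandle.ReleasePointer();
        if (disposing) _accessor.Dispose();
    }
}

With this, mainMemory in the snippet above could simply be manager.Memory, but only for views of at most int.MaxValue bytes.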

Problem

The memory block that I want to wrap is bigger than 2GB and cannot currently be represented by a single Memory instance.
Since Memory<T> can only work with T[], string, or OwnedMemory<T>, it seems that having to give up on the straightforward OwnedMemory<T> implementation also means giving up on using Memory<T> at all.

(In this specific case, Span<T> being limited to 2GB would not be a problem, because the sliced memory blocks my API returns would always be much smaller than that.)

Possible solutions with the currently proposed API

  • Not using Memory<T> at all and implementing a much-simplified version of OwnedMemory<T>/Memory<T> that fits my use case
  • Keeping many overlapping instances of OwnedMemory<T> around and using the one that best fits the current case

Question

Would it be possible to improve the framework so that it is easy to work with such large memory blocks? (Maybe by implementing something like a BigMemory<T>?)

@KrzysztofCwalina
Member

We will be soon adding ReadOnlyBuffer. See https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.Primitives/System/Buffers/ReadOnlyBuffer.cs

We would be interested in your feedback on this type. Would it support your scenarios?

@hexawyz
Author

hexawyz commented Jan 31, 2018

I took some time to look into this new type and I think I could make it work (haven't had the time to try it yet, though).
I quite like the idea of having a standardized buffer type, but I am a bit worried about the complexity it induces in a case where all the memory is contiguous by design (especially for the Seek operation).

In my current case, the file is approximately 3.5 GB, so I could create four OwnedMemory<byte> instances of 1 GB or less, each backed by its owner, and I would have to chain those blocks by implementing IMemoryList<byte> on them.
If I'm not mistaken, using ReadOnlyBuffer<byte> would mean that creating a Span<byte> for a small part of the buffer, instead of being an O(1) operation such as new Span<byte>(pointer + offset, length), would be a non-trivial O(log N) operation.

As soon as I have the time, I'll try creating a small benchmark for this use case, and compare possible implementations.

@KrzysztofCwalina
Member

KrzysztofCwalina commented Jan 31, 2018

@pakrym, @davidfowl I think we could solve the O(log N) seek problem if IMemoryList<T> extended ISequence<T>. ISequence<T> has Seek, and it could be implemented as O(1) on some specialized data structures, e.g. an array of buffers of the same size.
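To illustrate the point, a tiny sketch (hypothetical helper, assuming every segment holds exactly segmentSize elements) of why Seek becomes O(1) in that case:

// With uniform segments, a long position maps straight to a (segment, offset) pair,
// so an array of equally sized buffers can be indexed directly with no list traversal.
static (int Segment, int Offset) Seek(long position, int segmentSize)
    => ((int)(position / segmentSize), (int)(position % segmentSize));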

@pakrym
Contributor

pakrym commented Jan 31, 2018

N is the number of segments. So I don't see how this has a big impact if buffers are large.

As I said before, I don't like having two sources of positions (ROB and IML).

@KrzysztofCwalina
Member

KrzysztofCwalina commented Jan 31, 2018

If IMemoryList extended ISequence, there would not be two sources of position. There would only be the APIs on ISequence (Start, TryGet, Seek).

@pakrym
Contributor

pakrym commented Jan 31, 2018

What about ReadOnlyBuffer? It edits Index to put bits into it; how would it know that IMemoryList does not rely on that bit? It's the same conversion as in the previous IMemoryList redesign.

@hexawyz
Author

hexawyz commented Feb 5, 2018

I created a benchmark comparing approaches for accessing a large memory block:
https://github.com/GoldenCrystal/MemoryLookupBenchmark

I tried to get it as close as possible to my real use-case:

  • Find the index and length of the data (I cheated a bit by using constant-length items there)
  • Create a reference to that data for later use (e.g. Span<T>)
  • Copy the item to a buffer (e.g. for Sockets)

Assuming I didn't make any mistakes in the benchmark code, the numbers tell me that using ReadOnlyBuffer would be ~1.95 times slower than implementing a custom slice type:

BenchmarkDotNet=v0.10.12, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
Intel Core i7-4578U CPU 3.00GHz (Haswell), 1 CPU, 4 logical cores and 2 physical cores
Frequency=2929690 Hz, Resolution=341.3330 ns, Timer=TSC
.NET Core SDK=2.1.4
  [Host]     : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT
| Method | Mean | Error | StdDev | Scaled | ScaledSD |
|------- |-----:|------:|-------:|-------:|---------:|
| 'Copy a random item to the stack using a locally generated Span.' | 160.455 ns | 1.7740 ns | 1.6594 ns | 1.00 | 0.00 |
| 'Copy a random item to the stack using the custom implemented SafeBufferSlice<T> struct.' | 168.540 ns | 3.3838 ns | 4.5172 ns | 1.05 | 0.03 |
| 'Copy a random item to the stack using the ReadOnlyBuffer<T> struct.' | 329.546 ns | 3.3078 ns | 3.0941 ns | 2.05 | 0.03 |

I'm not sure how much implementing ISequence<T> would improve the performance there. I tend to think it would be difficult to match the performance reached by the more direct uses of (ReadOnly)Span<T>. 🤔
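For reference, the benchmarked operation has roughly this shape (hypothetical names, written against ReadOnlySequence<byte>, the name the type eventually shipped under):

using System;
using System.Buffers;

static class ItemCopySketch
{
    static void CopyItemToStack(in ReadOnlySequence<byte> data, long itemIndex, int itemLength)
    {
        // Constant-length items: locating the item is a straight multiplication.
        ReadOnlySequence<byte> item = data.Slice(itemIndex * itemLength, itemLength);

        // Copy the item into a stack buffer, e.g. before handing it to a socket.
        Span<byte> destination = stackalloc byte[itemLength];
        item.CopyTo(destination);
    }
}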

@KrzysztofCwalina
Member

FYI: We are adding IMemoryList.GetPosition(long). It will enable O(1) random access on some IMemoryList implementations (implementations with uniform size segments).

cc: @pakrym

@benaadams
Member

benaadams commented Feb 27, 2018

Using PR dotnet/corefx#27499

                                                  Method |       Mean |        Op/s | Scaled |
-------------------------------------------------------- |-----------:|------------:|-------:|
                                   'MM item. Local Span' | 148.554 ns | 6,731,567.6 |   1.00 |
                               'MM item. BufferSlice<T>' | 154.868 ns | 6,457,113.1 |   1.04 |
                'MM item. ReadOnlySequence<T> (current)' | 272.563 ns | 3,668,870.8 |   1.84 |
 'MM item. ReadOnlySequence<T> (PR dotnet/corefx#27455)' | 254.244 ns | 3,933,232.7 |   1.71 |
 'MM item. ReadOnlySequence<T> (PR dotnet/corefx#27499)' | 211.564 ns | 4,726,706.1 |   1.43 |

Improved to 1.43× relative to the local span. Code changes to the benchmark used for testing: hexawyz/MemoryLookupBenchmark#1

Bear in mind that SafeBufferSlice works directly off a pointer to create its Span, so it couldn't be contained in the ReadOnlySequence data structure or return a ReadOnlyMemory: it doesn't use OwnedMemory and isn't backed by an array or string.

Also, ReadOnlySequence does bounds checking on Slice, which SafeBufferSlice doesn't do; it just adds the offset to the pointer and returns a Span of the requested length, so it's pretty unsafe.

Edit: updated with tweaks.

@benaadams
Member

benaadams commented Feb 27, 2018

Update to the benchmarks: PR dotnet/corefx#27499 doesn't scale badly for 100-1000 segments, as shown below.

                          Method |    Categories |        Mean |         Op/s | Scaled |
-------------------------------- |-------------- |------------:|-------------:|-------:|
 'ReadOnlySequence<T> (current)' |     1 segment |   103.83 ns |  9,630,807.9 |   1.00 |
       (PR dotnet/corefx#27455)' |     1 segment |    85.50 ns | 11,696,574.0 |   0.82 |
       (PR dotnet/corefx#27499)' |     1 segment |    74.30 ns | 13,458,594.1 |   0.72 |
                                 |               |             |              |        |
 'ReadOnlySequence<T> (current)' |  100 segments | 1,293.73 ns |    772,961.6 |   1.00 |
       (PR dotnet/corefx#27455)' |  100 segments |   969.20 ns |  1,031,774.4 |   0.75 |
       (PR dotnet/corefx#27499)' |  100 segments |   248.77 ns |  4,019,825.1 |   0.19 |
                                 |               |             |              |        |
 'ReadOnlySequence<T> (current)' | 1000 segments | 1,375.86 ns |    726,820.1 |   1.00 |
       (PR dotnet/corefx#27455)' | 1000 segments | 1,026.54 ns |    974,149.4 |   0.75 |
       (PR dotnet/corefx#27499)' | 1000 segments |   286.20 ns |  3,494,079.8 |   0.21 |
                                 |               |             |              |        |
                         Span<T> |       MM item |   147.97 ns |  6,758,249.9 |   0.54 |
                  BufferSlice<T> |       MM item |   152.01 ns |  6,578,374.7 |   0.56 |
 'ReadOnlySequence<T> (current)' |       MM item |   273.28 ns |  3,659,196.5 |   1.00 |
       (PR dotnet/corefx#27455)' |       MM item |   252.47 ns |  3,960,792.4 |   0.92 |
       (PR dotnet/corefx#27499)' |       MM item |   211.79 ns |  4,721,555.1 |   0.78 |

@hexawyz
Author

hexawyz commented Feb 27, 2018

Also, ReadOnlySequence does bounds checking on Slice, which SafeBufferSlice doesn't do; it just adds the offset to the pointer and returns a Span of the requested length, so it's pretty unsafe.

You're right about that… I just tried adding bounds checking before the creation of BufferSlice<T> to have a fairer comparison, and at least on my machine, it seems to actually increase the throughput 🤨

| Method | Mean | Error | StdDev | Op/s | Scaled | Allocated |
|------- |-----:|------:|-------:|-----:|-------:|----------:|
| Span<T> | 161.9 ns | 1.951 ns | 2.921 ns | 6,178,403.9 | 0.52 | 0 B |
| BufferSlice<T> | 151.8 ns | 2.123 ns | 3.178 ns | 6,589,287.8 | 0.49 | 0 B |
| 'BufferSlice<T> no Bounds Checking' | 166.2 ns | 1.419 ns | 2.124 ns | 6,015,079.6 | 0.54 | 0 B |
| 'ReadOnlySequence<T> (current)' | 310.0 ns | 1.916 ns | 2.868 ns | 3,226,296.4 | 1.00 | 0 B |

I may have made a mistake somewhere, or maybe it simply plays well with the JIT inlining, but I don't know what to conclude.

Anyway, good job with the improvements. The new results are great 🙂

@benaadams
Member

Latest in dotnet/corefx#27499 is much closer still

                          Span<T> |       MM item |   145.45 ns |  6,875,297.6 |   0.55 |
                   BufferSlice<T> |       MM item |   147.68 ns |  6,771,233.9 |   0.55 |
   ReadOnlySequence<T> (previous) |       MM item |   266.73 ns |  3,749,147.7 |   1.00 |
    ReadOnlySequence<T> (current) |       MM item |   246.94 ns |  4,049,523.6 |   0.93 |
    ReadOnlySequence<T> (this PR) |       MM item |   198.30 ns |  5,042,838.1 |   0.74 |

@KrzysztofCwalina
Member

Nice! These results are so close that I doubt the differences will matter outside of microbenchmarks, i.e. once the program starts doing something interesting with the data in the buffers.

@KrzysztofCwalina
Member

I am going to close this. If there is data showing that ROS still cannot support real apps with multi-segmented buffers, we can think how to improve the perf further. @GoldenCrystal thanks for bringing this scenario to our attention.

@ahsonkhan
Member

Copying conversation over from https://github.com/dotnet/coreclr/issues/5851#issuecomment-370276484

From @kstewart83:

What is the possibility of adding a Span/Memory constructor for working with memory mapped files? Currently, it looks like I have to have unsafe code in order to do this:

var dbPath = "test.txt";
var initialSize = 1024;
var mmf = MemoryMappedFile.CreateFromFile(dbPath);
var mma = mmf.CreateViewAccessor(0, initialSize).SafeMemoryMappedViewHandle;
Span<byte> bytes;
unsafe
{
    byte* ptrMemMap = (byte*)0;
    mma.AcquirePointer(ref ptrMemMap);
    bytes = new Span<byte>(ptrMemMap, (int)mma.ByteLength);
}

Also, it seems like I can only create Spans, as there aren't public constructors for Memory that take a pointer (maybe I'm missing the reason for this). But since the view accessors have safe memory handles that implement System.Runtime.InteropServices.SafeBuffer (i.e., they have a pointer and a length)...it seems natural to be able to leverage this for Span/Memory. So what would be nice is something like this:

var dbPath = "test.txt";
var initialSize = 1024;
var mmf = MemoryMappedFile.CreateFromFile(dbPath);
var mma = mmf.CreateViewAccessor(0, initialSize).SafeMemoryMappedViewHandle;
var mem = new Memory(mma);
var span = mem.Span.Slice(0, 512);

I also noticed that the indexer and internal length of Span uses int. With memory mapped files (especially for database scenarios) it is reasonable that the target file will exceed the upper limit for int. I'm not sure about the performance impact of long based indexing or if there is some magic way to have it both ways, but it would be convenient for certain scenarios.


From @kstewart83:

Unfortunately, looking at https://github.com/dotnet/corefx/issues/26603 along with the referenced code in the benchmarks didn't clear things up for me. It seems like that particular use case is geared toward copying small bits of the memory mapped files into Spans and ReadOnlySegments. It looks like the solution still involves unsafe code with OwnedMemory<T>, which is what I'd like to avoid. I don't have experience with manual memory management in C#, so some of this is a little difficult to grasp. That's what I found appealing about Span/Memory: I could access additional performance and reduce/eliminate copying data around without the headache of manual memory management and the issues that come with it. Memory mapped files seem to fit into the target paradigm of Span/Memory (unifying the APIs around contiguous random access memory), so hopefully some type of integration of memory mapped files and Span/Memory makes it in at some point.


From @davidfowl:

@KrzysztofCwalina I think we should create something first class with Memory mapped files and the new buffer primitives (ReadOnlySequence).

@kstewart83 all we have right now are extremely low level primitives that you have to string together to make something work. That specific issue was about the performance gap between using Span directly and using the ReadOnlySequence (the gap has been reduced for that specific scenario).

For anything bigger than an int, you'll need to use ReadOnlySequence<T>, which is just a view over a linked list of ReadOnlyMemory<T>.
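For concreteness, a minimal sketch (hypothetical types, assuming the per-chunk ReadOnlyMemory<byte> values come from something like a MemoryManager over each mapped view) of building that "view over a linked list" by hand with ReadOnlySequenceSegment<byte>:

using System;
using System.Buffers;

sealed class ChunkSegment : ReadOnlySequenceSegment<byte>
{
    public ChunkSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    // Links a new segment after this one and keeps the running index consistent.
    public ChunkSegment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new ChunkSegment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

static class LargeMapping
{
    // Chains the chunks into one sequence whose Length and Slice take long offsets.
    public static ReadOnlySequence<byte> ToSequence(ReadOnlyMemory<byte>[] chunks)
    {
        var first = new ChunkSegment(chunks[0]);
        var last = first;
        for (int i = 1; i < chunks.Length; i++)
            last = last.Append(chunks[i]);
        return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
    }
}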

@ahsonkhan ahsonkhan reopened this Mar 5, 2018
@GSPP

GSPP commented Mar 5, 2018

It is not generally possible to slice large files into 1GB span segments. For example, a file could contain a large stream of small serialized items. Then, it's not possible to know where to cut the file. Slicing it could lead to torn items.

So it's no longer possible to create a span and pass it to some API of the form IEnumerable<MyItem> DeserializeStream(Span<byte> span) because the caller cannot know the slicing boundaries.

It would be really good if Span supported a long length. Some .NET users are already bumping against the 2GB array size limitations. For that reason the limit was increased to 2G items, but that's only a short-term remedy. As main memory sizes continue to grow, any 2GB limit will make .NET look like ancient technology.

But I assume the int span length was consciously chosen... Unfortunately, I did not readily find a discussion about that but I would be interested to read it if somebody has a url to it at hand.

@jnm2
Contributor

jnm2 commented Mar 5, 2018

Wouldn't it be better for the API to be built to handle chunks and therefore work with streaming scenarios as well?

@hexawyz
Author

hexawyz commented Mar 5, 2018

But I assume the int span length was consciously chosen... Unfortunately, I did not readily find a discussion about that but I would be interested to read it if somebody has a url to it at hand.

If I understand correctly, the problem here would be more with Memory<T> than with Span<T>:

The current version of Memory<T> packs nicely into 16 bytes on x64, while Span<T> seems to have room for replacing the int _length with an IntPtr _length and still fitting into 8/16 bytes.
However, increasing the Length property of Span<T> would require doing the same with Memory<T>.
If I'm not mistaken, increasing the size of Memory<T> (from 16 bytes to 24 bytes) might have consequences for the performance of the code, which would impact everyone, not just those of us playing with large regions of memory.

It is true that in the case I presented, ReadOnlySequence<T> acts as a valid replacement for a 64-bit-capable Memory<T>/Span<T>, because all I needed was to copy the data somewhere.
But when you need to read/decode without copying, the API might indeed be less straightforward. 🤔
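To make that concrete, a rough sketch (hypothetical helper) of what a mostly-no-copy read path can look like: when the requested slice happens to fall inside a single segment you can hand out its span directly, and only items straddling a segment boundary need a copy.

using System;
using System.Buffers;

static class NoCopyRead
{
    // Returns a span over the item, copying only when it crosses a segment boundary.
    public static ReadOnlySpan<byte> GetItemSpan(in ReadOnlySequence<byte> data, long offset, int length, Span<byte> scratch)
    {
        ReadOnlySequence<byte> slice = data.Slice(offset, length);
        if (slice.IsSingleSegment)
            return slice.First.Span;      // zero-copy fast path

        slice.CopyTo(scratch);            // slow path: item straddles two segments
        return scratch.Slice(0, length);
    }
}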

@kstewart83

kstewart83 commented Mar 6, 2018

I suspect, though, that since Memory<T> is allocated on the heap, the performance impact would be different than, say, for Span<T>. Passing a Memory<T> object around shouldn't be any different, so I think the only performance impact would be in creating Span<T>s, or maybe the fill routines?

A compelling use case I see for combining memory mapped files with Memory<T>/Span<T> is specifically to enable zero-copy databases with only safe C#. It allows for a very understandable and uniform API by being able to present ReadOnly slices as well as ReadWrite slices. This could be combined with data formats such as FlatBuffers, which don't require explicit parsing/unpacking to access the data.

@KrzysztofCwalina
Member

Memory is not allocated on the heap (necessarily). It's a struct.

@ghost

ghost commented Mar 16, 2018

@KrzysztofCwalina, there is no API proposal for MMF Memory/Span overloads; should this issue be converted to api-needs-work? It would help downstream projects (serializers and other data processors, etc.) waiting to update to .NET Core 2.1 if MMF also joined the Span<T> and Memory<T> club. Thanks!

@KrzysztofCwalina
Member

@kasper3, please open a separate issue for adding span support to MMF. This issue was about Memory's length property not being able to deal with large files.

@attilah

attilah commented Jun 7, 2018

@kasper3 @KrzysztofCwalina is there a separate issue for MMF/Span? I was not able to find it, and it is not linked here.

@KrzysztofCwalina
Member

I am not aware of one.

@ghost

ghost commented Jun 7, 2018

@attilah, related: https://github.com/dotnet/corefx/issues/29562#issuecomment-388182098, and the overarching idea: https://github.com/dotnet/corefx/issues/30174.
In the case of MemoryMappedFile.CreateFromMemory, the file I/O operation behind every .WriteX(..) call would need to be replaced by a memory I/O operation. The use case I was thinking of: a user downloads a data file and, without persisting it to the filesystem, the file can be mapped to memory and sent back over the wire. If you have better ideas about how the API should be structured with respect to the competing/related proposals, please send a proposal.

@miloush
Contributor

miloush commented Jan 15, 2020

Sorry to be late to this, but it is not very clear to me from the above: what is currently the recommended way to turn a MemoryMappedFile into a ReadOnlySequence<byte> (or ReadOnlySpan<byte>)?

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@sakno
Contributor

sakno commented Jun 14, 2020

@miloush, you can use a third-party library; ReadOnlySequenceAccessor is probably what you need.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2020