Efficient sparse access to large memory-mapped file

We have image data in a large (e.g. 700 MB) files. The files are memory mapped on Windows 7 64-bit.

Some operations on the image data involve us reading a few bytes from each line of the image. This can be slow - no line is larger than a page so we get a page fault for every line even though we are only reading a few bytes. There's no way we can see of getting round this in our current implementation, however we would like to make sure we can squeeze the most out of the disk system.

To get the best possible performance, we are hoping that we can hint to the VM system to fetch the next image line (possibly causing a page fault) while we are processing data in the current one. This would parallelise our processing and the page faults. There does not seem to be an obvious way to do this on Windows!

So the questions:

  • Is there any equivalent on Windows 7 to madvise( MADV_WILLNEED )?

  • Is there a way of asynchronously touching a page, triggering a page fault without waiting for the page to become available?

The right long-term solution is to store our data a different way (e.g. in tiles), however we can't do that right now. We need to keep the memory-mapped approach as well at the moment.


I do not think you can hint VM system but you can pre-fetch data just by loading next line data while you process previous. You can make this parallel to processing. You should probably fetch more than 1 ahead since processing is most likely much faster then reading from file.

This actually quite nicely fits into producer-consumer pattern. Make image reader and data processor run on separate threads and use some sort of blocking collection (like C# BlockingCollection) with capacity limit to pass data from reader to processor.

Even though this is somewhat oldish topic already, I give a practical example of one implementation, which should be used as a design reference only if you understand the big picture well enough.

We had a challenge in one "big data" telco -application, where we new better than virtual memory manager what pages should be paged in from sparse "huge files". What we did was that we had a dedicated thread ("inmemadvisor") to receive "need" requests, which each had a priority. This thread maintains the prioritized list of requests and submits the memory read "hints" to pool of threads, where largest number of threads handles high priority requests and smallest part of the pool handles lowest priority (you probably get the idea).

Then we have parameters to control the number of threads in the pool and some other nifty details.

Pros of this implementation are:

  • we outperform 10x the VM's default paging model (in Windoze Server 2008/2012)
  • as long as the paging goes "soft" and we can forget the data we don't need again (no immediate need to page in with hard fault) the performance is better than good
  • we can utilize the whole available free physical RAM memory to speed up the computation
  • every new MB of memory you add can boost the performance


  • if you run out of RAM (locality broken), the performance will be a pig ... well, that would be same even without this implementation
  • this requires some careful design and good implementation to be worth of doing it
  • in some applications you may need to add additional monitoring and control logic to control "inmemadvisor" to be a good citizen

So, in short: This is something that one rather would not do, but on the other hand, these are the things wich make programming a positive challenge ;-) Btw: Our implementation beats Linux madvice() in performance with our application, but is not as generic as it is.

Cheers, //Jari

Need Your Help

C# Implicit operators and ToString()

c# operator-overloading implicit-cast

I'm creating my own type for representing css values (like pixels eg. 12px ). To be able to add/subtract/multiply/... my type and ints I've defined two implicit operators to and from int. Everything

Better way for aggregation on list of case classes

scala case-class foldleft

I have list of case classes. Output requires aggregation on different parameters of case class. Looking for more optimized way to do it.