Using memory mapping in C to read binary files
While processing a very large binary file, can using memory mapping in C make any difference compared to fread? Even small differences in time would be fine. And if it does make the process faster, any idea how to use memory mapping on a large binary file and extract data from it?
If you're going to read the entire file beginning to end, the most important thing is to let the platform know this. This will allow it to do aggressive read ahead and it will allow it to avoid polluting the cache with data that will not be read again anyway. You can do this either with memory mapping or without it. The key functions are posix_fadvise and posix_madvise.
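As a rough illustration of the non-mapped route, here is a minimal sketch that hints sequential access with posix_fadvise and then reads in large chunks. The function name sum_file_read, the byte-summing workload, and the 256KB buffer size are all made up for the example. It assumes a POSIX system that provides posix_fadvise (Linux does; some platforms, such as macOS, do not). The advice is only a hint, so its return value is ignored.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: sums every byte of the file at `path` using large
 * sequential read() calls. Returns 0 on success, -1 on error. */
int sum_file_read(const char *path, uint64_t *sum_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Tell the kernel we will read front to back. Offset 0 with length 0
     * means "the whole file". This is advisory, so failure is not fatal. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    size_t buf_size = 256 * 1024; /* illustrative chunk size */
    unsigned char *buf = malloc(buf_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    ssize_t n;
    /* One user/kernel transition (and one copy) per chunk. */
    while ((n = read(fd, buf, buf_size)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    }

    free(buf);
    close(fd);
    if (n < 0)
        return -1; /* read error (a fuller version would retry on EINTR) */

    *sum_out = sum;
    return 0;
}
```

You would call it as sum_file_read("data.bin", &total) and check for a 0 return. Whether the hint helps measurably depends on the kernel and the storage, so it is worth timing on your own data.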
Memory mapping is a huge win when you have random, small accesses. This is especially true when you have multiple writes to the same page. Without memory mapping, each read or write requires a user/kernel transition and a copy. With memory mapping, most operations don't.
But with sequential access, all memory mapping will save is the copy. Oddly, the user/kernel transitions may be even worse. With large sequential reads, you get one user/kernel transition per read, which could be one per 256KB if the reads are large. With large sequential access to a memory-mapped file, you may fault on every page (typically 4KB). It depends on the kernel's "fault ahead" optimizations.
Still, memory mapping does save the copy, assuming you don't need to make a copy anyway. If you have to copy out of the mapped pages for any reason, then you might as well let a read operation copy them into place for you. If you can operate on the data in place, though, memory mapping may be a win.
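To answer the "how" part of the question, here is a minimal sketch of the mapped route: map the whole file read-only, hint sequential access with posix_madvise, and work on the bytes in place. The function name sum_file_mmap and the byte-summing workload are made up for the example. It assumes a POSIX system and a file that fits in the process's address space, which may not hold for very large files on 32-bit builds.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sums every byte of the file at `path` by mapping it
 * and reading the mapped pages in place. Returns 0 on success, -1 on error. */
int sum_file_mmap(const char *path, uint64_t *sum_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    /* mmap rejects a zero-length mapping, so handle empty files up front. */
    if (st.st_size == 0) {
        close(fd);
        *sum_out = 0;
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    /* Advisory only: ask for aggressive read-ahead over the whole mapping. */
    (void)posix_madvise(map, len, POSIX_MADV_SEQUENTIAL);

    /* No read() calls and no copy: the data is accessed directly, and
     * pages are faulted in as they are first touched. */
    const unsigned char *p = map;
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(map, len);
    *sum_out = sum;
    return 0;
}
```

For structured binary data you would walk the mapping with offsets instead of summing bytes, taking care with alignment and byte order if you read multi-byte fields straight out of the mapped pages.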
It generally doesn't make as much of a difference as people tend to think it does, especially when you consider how slow the disk is in comparison to all this stuff.