|
Home > Archive > Unix Programming > August 2007 > mmap page cache on Linux
| Author |
mmap page cache on Linux
|
|
| softari22@gmail.com 2007-08-19, 7:17 am |
| Hi All,
I am writing an application that needs to access data in huge files.
It only needs to access parts of each file. I was thinking of mmapping
the files and letting the kernel pull in the pages that I am
accessing.
I will need to do this over several invocations of the program,
with some of the data remaining the same.
Does the kernel discard the mmapped pages from the page cache when the
program that has mapped them exits, or do they remain there until it
needs the memory for something else?
The reason I am asking is that if they are discarded, I suppose
I need to write some sort of server that maps the files and makes
them available to the programs that need the data.
Many Thanks,
Softari
| |
|
| On Sun, 19 Aug 2007 11:08:52 +0000, softari22 wrote:
> Hi All,
>
> I am writing an application that needs to access data in huge files.
> It only needs to access parts of each file. I was thinking of mmapping
> the files and letting the kernel pull in the pages that I am
> accessing.
>
> I will need to do this over several invocations of the program,
> with some of the data remaining the same.
If you mmap with MAP_SHARED, all the invocations of your program will see
the same data.
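For illustration, here is a minimal C sketch of the kind of mapping being suggested: a read-only MAP_SHARED mapping of a whole file. The helper name map_file_shared is made up for this example and does not come from the thread, and error handling is kept to the bare minimum.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and shared.
 * On success returns the mapping and stores its length in *len_out;
 * on failure (including an empty file) returns NULL.
 * The caller releases the mapping with munmap(). */
static const unsigned char *map_file_shared(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

Each invocation that maps the same file this way is backed by the same page-cache pages, so the file data is not duplicated per process.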
>
> Does the kernel discard the mmapped pages from the page cache when the
> program that has mapped them exits, or do they remain there until it
> needs the memory for something else?
If your process exits, the pages stay in the LRU-cache. Other processes
may benefit from that: if they want the same page, it MAY still be in the
cache. If the system needs more memory, the pages may be reused.
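One way to check this empirically on Linux is mincore(), which reports which pages of a mapping are currently resident in RAM. The sketch below assumes a glibc-style Linux system; the helper name resident_pages is hypothetical. Mapping the file does not itself read any data, so the count reflects what was already cached.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose mincore() under strict -std= modes */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages of a file are resident in
 * RAM right now. Stores the file's total page count in *total_out
 * (if non-NULL). Returns -1 on error or for an empty file. */
static long resident_pages(const char *path, long *total_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;

    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    if (total_out != NULL)
        *total_out = (long)npages;

    unsigned char *vec = malloc(npages);    /* one status byte per page */
    long resident = -1;
    if (vec != NULL && mincore(p, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;         /* low bit set: page is in RAM */
    }

    free(vec);
    munmap(p, len);
    return resident;
}
```

Calling this on a data file shortly after the program that used it has exited should show how many of its pages are still cached; the number can drop later if the system comes under memory pressure.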
>
> The reason I am asking is that if they are discarded, I suppose
> I need to write some sort of server that maps the files and makes
> them available to the programs that need the data.
I'm not sure what you mean here. ('Prefetching' by a separate process can
be used to minimize the chance of your main process being blocked by
page faults, but I don't think you want to do that. Yet.)
They are not discarded anyway (unless you tinker with msync() or
MAP_PRIVATE).
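If prefetching ever does become necessary, a lighter-weight option than a separate process is to hint the kernel with madvise(MADV_WILLNEED). Note that this is a different technique from the prefetching process mentioned above, and the hint is only advisory. The helper name prefetch_range below is hypothetical.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose madvise() under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to start reading `len` bytes at
 * byte offset `off` of an existing mapping (`base`, `map_len`).
 * Returns 0 on success, -1 on error or if `off` is out of range. */
static int prefetch_range(void *base, size_t map_len, size_t off, size_t len)
{
    if (off >= map_len)
        return -1;
    if (len > map_len - off)
        len = map_len - off;                /* clamp to the mapping */

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off - (off % page);      /* madvise needs a page-aligned address */

    return madvise((char *)base + start, len + (off - start), MADV_WILLNEED);
}
```

The call returns without waiting for the I/O to finish, so a later access to the range may or may not still take a page fault.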
HTH,
AvK
| |
| softari22@gmail.com 2007-08-19, 1:23 pm |
| On Aug 19, 12:50 pm, moi <r...@localhost.localdomain> wrote:
> If you mmap with MAP_SHARED, all the invocations of your program will see
> the same data.
>
> If your process exits, the pages stay in the LRU-cache. Other processes
> may benefit from that: if they want the same page, it MAY still be in the
> cache. If the system needs more memory, the pages may be reused.
>
> I'm not sure what you mean here. ('Prefetching' by a separate process can
> be used to minimize the chance of your main process being blocked by
> page faults, but I don't think you want to do that. Yet.)
> They are not discarded anyway (unless you tinker with msync() or
> MAP_PRIVATE).
>
> HTH,
> AvK
Thank you very much for your helpful answer. What I meant was that I
have two alternatives (that I can think of):
1) Have a server process that maps the files into memory and, using some
sort of IPC, serves the data to separate client processes that request
it.
2) Have each individual invocation map the files into memory, using
MAP_SHARED as you suggested.
The timing of the data accesses would be the same with
both approaches. I would consider doing 1) only if the kernel somehow
gave more priority to pages that were accessed by a process that is
still running, so that the probability of them remaining in memory
would be greater than with 2).
If I understood your answer correctly, the likelihood of the data still
being in memory should be the same for both approaches, and since 2) is
much simpler, that is the way to go?
Thank you,
Softari
| |
|
| On Sun, 19 Aug 2007 15:34:19 +0000, softari22 wrote:
> On Aug 19, 12:50 pm, moi <r...@localhost.localdomain> wrote:
>
> Thank you very much for your helpful answer. What I meant was that I
> have two alternatives (that I can think of)
> 1) Have a server process that maps the files to memory and using some
> sort of IPC serves the data to separate client processes that request
> the data.
I can see no reason (other than messaging) to use any (other) form of IPC.
Using IPC for the data would only cause more copies of the data to be kept
in memory, plus the additional copying and the necessary housekeeping.
>
> 2) Have each individual invocation map the files to memory, using
> MAP_SHARED as you suggested.
Why do you need more than one process, anyway?
[ One reason could be that the total size of the files you want to map
*at the same time* does not fit into your allowable address space,
typically 2-4 GB on a 32-bit system. ]
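When address space is the limit, a single process can still work through a large file by mapping only a window of it at a time. The sketch below is one way to do that under the constraint that mmap() offsets must be page-aligned; the struct and function names (file_window, map_window, unmap_window) are invented for this example.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window descriptor: `data` points at the requested byte,
 * `base`/`map_len` describe the real (page-aligned) mapping. */
struct file_window {
    void *base;
    size_t map_len;
    const unsigned char *data;
};

/* Map `len` bytes of `fd` starting at byte offset `off`, read-only.
 * The offset is rounded down to a page boundary as mmap() requires.
 * Returns 0 on success, -1 on failure. */
static int map_window(int fd, off_t off, size_t len, struct file_window *w)
{
    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    off_t aligned = off - (off % page);
    size_t slack = (size_t)(off - aligned); /* bytes before the requested offset */

    void *p = mmap(NULL, len + slack, PROT_READ, MAP_SHARED, fd, aligned);
    if (p == MAP_FAILED)
        return -1;

    w->base = p;
    w->map_len = len + slack;
    w->data = (const unsigned char *)p + slack;
    return 0;
}

/* Release a window obtained from map_window(). */
static void unmap_window(struct file_window *w)
{
    munmap(w->base, w->map_len);
}
```

Unmapping a window does not evict its pages from the page cache, so moving the window back over recently used data is usually cheap.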
>
> The timing of the data accesses would be the same with both
> approaches. I would consider doing 1) only if the kernel somehow gave
> more priority to pages that were accessed by a process that is still
> running, so that the probability of them remaining in memory would be
> greater than with 2).
Don't. In practice it will be very hard to perform better than LRU.
Don't try to outsmart the system; at least not until you are smart
enough...
> If I understood your answer correctly, the likelihood of the data still
> being in memory should be the same for both approaches, and since 2) is
> much simpler, that is the way to go?
(2) is much simpler, yes.
The kernel is lazy: it does not throw away things that might be needed in
the future. Empty memory == wasted memory. So if the "footprint" of your
process is smaller than "available memory", everything will (eventually)
end up cached in memory (so your "likelihood" will be close to 1).
One point to clear up: mmap()ing per se *does not cause any data to be
transferred* to RAM. mmap only places the file into your process's
*address space*. It is the first *access to the memory* that causes
the data (a page from the file) to be faulted in. Even after that,
the memory may be reclaimed by the kernel, and a later access will then
cause another page fault, and so on.
HTH,
AvK
|