mmap page cache on Linux
Web Server forum






mmap page cache on Linux
softari22@gmail.com




 
08-19-07 12:17 PM

Hi All,

I am writing an application that needs to access data in huge files.
It only needs to access parts of the file. I was thinking of mmapping
the files and letting the kernel pull in the pages that I am
accessing.
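A minimal C sketch of that plan, assuming a Linux/POSIX system; `map_file_readonly` and `byte_at` are hypothetical helper names, not library functions. The whole file is mapped read-only with `MAP_SHARED`, and only the pages that are actually read get pulled in:

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map an entire file read-only. Nothing is read from
 * disk here; pages are faulted in only when the mapping is dereferenced.
 * Returns the mapping (or NULL on error) and stores its length in *len. */
static const unsigned char *map_file_readonly(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping keeps its own reference to the file */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Hypothetical helper: read one byte at an arbitrary offset, or -1 if the
 * offset is past the end. Only the page containing it is faulted in. */
static int byte_at(const unsigned char *base, size_t len, size_t off)
{
    return off < len ? base[off] : -1;
}
```

For example, `byte_at(base, len, off)` faults in just the page holding `off` (plus whatever the kernel reads ahead); call `munmap()` on the mapping when finished.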

I will need to be doing this over several invocations of the program,
with some of the data remaining the same.

Does the kernel discard the mmap pages from the page cache when the
program that has mapped them exits, or do they remain there until it
needs the memory for something else?

Reason why I am asking is because if they are discarded then I suppose
that I need to write some sort of server that maps the files and makes
them available to the programs that need the data.

Many Thanks,
Softari









Re: mmap page cache on Linux
moi




 
08-19-07 12:17 PM

On Sun, 19 Aug 2007 11:08:52 +0000, softari22 wrote:

> Hi All,
>
> I am writing an application that needs to access data in huge files.
> It only needs to access parts of the file. I was thinking of mmapping
> the files and letting the kernel pull in the pages that I am
> accessing.
>
> I will need to be doing this over several invocations of the program,
> with some of the data remaining the same.

If you mmap with MAP_SHARED, all the invocations of your program will see
the same data.
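To show what sharing means here, a small sketch assuming Linux, where mappings and `read()` go through the same page cache; `write_via_shared_map` is a hypothetical helper. A store through a `MAP_SHARED` mapping lands in the cached copy of the file, so a plain `read()` on another descriptor, or in another process, sees it. The file must already be at least `strlen(msg)` bytes long, because mapping a file does not extend it.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: store `msg` at offset 0 of the (already large
 * enough) file at `path` through a MAP_SHARED mapping.
 * Returns 0 on success, -1 on error. */
static int write_via_shared_map(const char *path, const char *msg)
{
    size_t n = strlen(msg);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, msg, n);      /* modifies the page-cache copy of the file */
    /* msync() only forces the dirty pages to be written back now; other
     * processes already see the change through the page cache. */
    int rc = msync(p, n, MS_SYNC);
    munmap(p, n);
    return rc;
}
```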

>
> Does the kernel discard the mmap pages from the page cache when the
> program that has mapped them exits, or do they remain there until it
> needs the memory for something else?

If your process exits, the pages stay in the (LRU-managed) page cache.
Other processes may benefit from that: if they want the same page, it MAY
still be in the cache. If the system needs more memory, the pages may be
reused.
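One way to observe this, sketched under the assumption of Linux and glibc: map the file again and ask `mincore()` which of its pages are still resident. `resident_pages` is a hypothetical helper. Running it from a fresh invocation after an earlier one has exited shows how much of the file is still cached. Note that newer kernels may only report page-cache residency for files the caller owns or could write to.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages of `path` are resident in the
 * page cache right now. mmap() reads nothing and mincore() only reports
 * residency, so the check itself does not pull pages in.
 * Returns the resident count (or -1 on error); *total receives the number
 * of pages the file spans. */
static long resident_pages(const char *path, long *total)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + pagesz - 1) / pagesz;
    unsigned char *vec = malloc(npages);   /* one status byte per page */
    long resident = -1;
    if (vec != NULL && mincore(p, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;        /* bit 0 set: page is resident */
    }
    free(vec);
    munmap(p, len);
    *total = (long)npages;
    return resident;
}
```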


>
> Reason why I am asking is because if they are discarded then I suppose
> that I need to write some sort of server that maps the files and makes
> them available to the programs that need the data.

I'm not sure what you mean here. ('Prefetching' by a separate process can
be used to minimize the chance of your main process being blocked by
page faults, but I don't think you want to do that. Yet.)
They are not discarded, anyway (unless you tinker with msync() or
MAP_PRIVATE).
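If prefetching does become worthwhile, one alternative to a separate prefetching process (a different technique from the one mentioned above) is to hint the kernel with `madvise(MADV_WILLNEED)` on the part of an existing mapping you will need soon. A sketch assuming Linux; `prefetch_range` is a hypothetical name, and `base` must be the page-aligned address that `mmap()` returned:

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>      /* open(), used by callers to obtain the file */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to start reading [off, off+len) of
 * an existing file mapping into the page cache. madvise() needs a
 * page-aligned start address, so the range is widened down to the page
 * boundary. Returns 0 on success, -1 on error (errno set by madvise). */
static int prefetch_range(void *base, size_t off, size_t len)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off - (off % pagesz);       /* round down to a page */
    return madvise((char *)base + start, len + (off - start), MADV_WILLNEED);
}
```

This is only a hint: the call typically returns without waiting for the data, and the kernel may start read-ahead in the background, so later accesses to that range are less likely to block.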

HTH,
AvK









Re: mmap page cache on Linux
softari22@gmail.com




 
08-19-07 06:23 PM

On Aug 19, 12:50 pm, moi <r...@localhost.localdomain> wrote:
> On Sun, 19 Aug 2007 11:08:52 +0000, softari22 wrote:
>
> If you mmap with MAP_SHARED, all the invocations of your program will see
> the same data.
>
> If your process exits, the pages stay in the (LRU-managed) page cache.
> Other processes may benefit from that: if they want the same page, it MAY
> still be in the cache. If the system needs more memory, the pages may be
> reused.
>
> I'm not sure what you mean here. ('Prefetching' by a separate process can
> be used to minimize the chance of your main process being blocked by
> page faults, but I don't think you want to do that. Yet.)
> They are not discarded, anyway (unless you tinker with msync() or
> MAP_PRIVATE).
>
> HTH,
> AvK

Thank you very much for your helpful answer. What I meant was that I
have two alternatives (that I can think of):
1) Have a server process that maps the files into memory and, using some
sort of IPC, serves the data to separate client processes that request
it.

2) Have each individual invocation map the files into memory, using
MAP_SHARED as you suggested.

The timing of the data accesses would be the same with both
approaches. I would consider 1) if the kernel somehow gave higher
priority to pages accessed by a process that is still running, so that
the probability of them remaining in memory would be greater than with 2).

If I understood your answer correctly, the likelihood of the data still
being in memory should be the same for both approaches, and since 2) is
much simpler, that is the way to go?

Thank you,
Softari









Re: mmap page cache on Linux
moi




 
08-19-07 06:23 PM

On Sun, 19 Aug 2007 15:34:19 +0000, softari22 wrote:

> On Aug 19, 12:50 pm, moi <r...@localhost.localdomain> wrote: 
>
> Thank you very much for your helpful answer. What I meant was that I
> have two alternatives (that I can think of)
> 1) Have a server process that maps the files into memory and, using some
> sort of IPC, serves the data to separate client processes that request
> it.

I can see no reason (other than messaging) to use any (other) form of IPC.
Using IPC for the data would only cause more copies of the data to be kept
in memory, plus the additional copying and the necessary housekeeping.

>
> 2) Have each individual invocation map the files into memory, using
> MAP_SHARED as you suggested.

Why do you need more than one process, anyway?
[One reason could be that the total size of the files you want to map
*at the same time* does not fit into your allowable address space,
typically 2-4 GB on a 32-bit system.]
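When the files do not fit into the address space at once, a process can map a window of a file at a time instead. A sketch assuming POSIX/Linux; `struct window`, `map_window` and `unmap_window` are hypothetical names. The one subtlety is that the offset passed to `mmap()` must be a multiple of the page size:

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>      /* open(), used by callers to obtain `fd` */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical read-only view of bytes [off, off+len) of an open file. */
struct window {
    void *map;                  /* what mmap() returned (page-aligned) */
    size_t map_len;             /* length actually mapped */
    const unsigned char *data;  /* points at byte `off` of the file */
};

static int map_window(int fd, off_t off, size_t len, struct window *w)
{
    off_t pagesz = (off_t)sysconf(_SC_PAGESIZE);
    off_t start = off - (off % pagesz);        /* mmap offset must be aligned */
    size_t slack = (size_t)(off - start);      /* bytes before `off` */

    w->map_len = len + slack;
    w->map = mmap(NULL, w->map_len, PROT_READ, MAP_SHARED, fd, start);
    if (w->map == MAP_FAILED)
        return -1;
    w->data = (const unsigned char *)w->map + slack;
    return 0;
}

static void unmap_window(struct window *w)
{
    munmap(w->map, w->map_len);
}
```

Unmapping a window does not evict its pages; as discussed in this thread, they stay in the page cache until the kernel needs the memory.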

>
> The timing of the data accesses would be the same with both
> approaches. I would consider 1) if the kernel somehow gave higher
> priority to pages accessed by a process that is still running, so that
> the probability of them remaining in memory would be greater than with 2).

Don't. In practice it will be very hard to perform better than LRU.
Don't try to outsmart the system; at least not until you are smart
enough...

> If I understood your answer correctly, the likelihood of the data still
> being in memory should be the same for both approaches, and since 2) is
> much simpler, that is the way to go?

(2) is much simpler, yes.
The kernel is lazy: it does not throw away things that might be needed in
the future. Empty memory == wasted memory. So if the "footprint" of your
process is smaller than "available memory", everything will (eventually)
end up cached in memory. (So your "likelihood" will be close to 1.)
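On Linux the effect is easy to observe: the `Cached:` line of `/proc/meminfo` shows how much memory currently holds cached file data. A sketch assuming that file's usual `Name: value kB` format; `page_cache_kb` is a hypothetical helper:

```c
#include <stdio.h>

/* Hypothetical helper: the "Cached:" value from /proc/meminfo in kB, i.e.
 * memory currently holding cached file data, or -1 if it cannot be read. */
static long page_cache_kb(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;

    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* "SwapCached:" does not match: the literal must start the line */
        if (sscanf(line, "Cached: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```

On a machine that has been up for a while this number is often a large share of RAM, which is the "empty memory == wasted memory" policy at work.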

One point to clear up: mmap()ing per se *does not cause any data to be
transferred* to RAM. mmap only places the file into your process's
*address space*. It is the first *access to the memory* that causes
the data (a page from the file) to be faulted in. Even after that,
the memory may be reclaimed by the kernel, and a second access will cause
a second page fault, etc.
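A small experiment makes this visible, sketched under the assumption of Linux, where `getrusage()` reports per-process page-fault counts; `fault_count` and `mmap_then_touch` are hypothetical helpers. The `mmap()` call itself normally adds no faults, while reading the pages does:

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <unistd.h>

/* Total page faults (minor + major) this process has taken so far. */
static long fault_count(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt + ru.ru_majflt;
}

/* Hypothetical helper: map `path` read-only, then read one byte from each
 * page. Stores the faults seen during mmap() in *at_map and those caused
 * by the reads in *at_touch. Returns 0 on success, -1 on error. */
static int mmap_then_touch(const char *path, long *at_map, long *at_touch)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

    long t0 = fault_count();
    const volatile unsigned char *p =
        mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    long t1 = fault_count();
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    unsigned sum = 0;
    for (size_t off = 0; off < len; off += pagesz)
        sum += p[off];              /* volatile: each read really happens */
    long t2 = fault_count();
    (void)sum;

    munmap((void *)p, len);
    *at_map = t1 - t0;    /* typically 0: mmap() itself reads no data */
    *at_touch = t2 - t1;  /* at least 1: the first access faults pages in */
    return 0;
}
```

The touch count can be lower than the number of pages read, because Linux may map several neighboring, already cached pages on a single fault.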


HTH,
AvK








All times are GMT.