Available Physical Memory
| chsalvia@gmail.com 2007-06-16, 1:36 am |
| A few weeks ago I posted here asking how an application could
determine the available memory on a machine running some variety of
UNIX. There were basically two types of responses:
1. Those who scoffed at the very idea, saying that "available memory"
was too vague of a term and that no application should ever need to
know that anyway because it's exclusively the kernel's business.
2. Those who disagreed saying that there are some situations where a
program legitimately needs to know the available memory.
I'd agree with the 2nd category, but I'd rephrase the question to say:
"is there a way to determine the available physical memory on a UNIX
system?"
Apparently there is, at least on Linux. You can do something like:
size_t av_phys_mem = sysconf(_SC_AVPHYS_PAGES) *
sysconf(_SC_PAGESIZE);
I would also argue that there are legitimate reasons an application
may need to know the available memory.
1. Firstly, if the program itself is *designed* to report the
available system memory to the user, like top or atop. (granted this
is a rare scenario)
2. If the program runs on a dedicated server/work-station and is
designed to perform a specific task, like sort massive amounts of
data.
To expand on number 2, suppose you need to periodically merge-sort
files that exceed 1 terabyte in size. In order to do this, you'd want
the computer to use *ALL* the memory it can. Therefore you have two
options: either hard code the amount of memory it uses, or somehow get
the available physical memory. It seems obvious to me that the latter
option is more desirable and more practical.
Am I wrong? If so, why?
| |
| Eric Sosman 2007-06-16, 1:23 pm |
| chsalvia@gmail.com wrote:
> A few weeks ago I posted here asking how an application could
> determine the available memory on a machine running some variety of
> UNIX. There were basically two types of responses:
>
> 1. Those who scoffed at the very idea, saying that "available memory"
> was too vague of a term and that no application should ever need to
> know that anyway because it's exclusively the kernel's business.
>
> 2. Those who disagreed saying that there are some situations where a
> program legitimately needs to know the available memory.
>
> I'd agree with the 2nd category, but I'd rephrase the question to say:
> "is there a way to determine the available physical memory on a UNIX
> system?"
>
> Apparently there is, at least on Linux. You can do something like:
>
> size_t av_phys_mem = sysconf(_SC_AVPHYS_PAGES) *
> sysconf(_SC_PAGESIZE);
Note that this calculation is likely to overflow the
range of a 32-bit size_t. You'd better be a 64-bit program
or else convert to `long long' before multiplying.
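A minimal sketch of that widening, assuming a platform whose sysconf() offers the non-POSIX _SC_AVPHYS_PAGES query (glibc does). The helper names pages_to_bytes and avail_phys_bytes are made up for illustration:

```c
#include <unistd.h>

/* Multiply in long long so the product cannot wrap a 32-bit type.
 * Returns -1 if either input is negative (sysconf reports errors as -1). */
long long pages_to_bytes(long pages, long page_size)
{
    if (pages < 0 || page_size < 0)
        return -1;
    return (long long)pages * (long long)page_size;
}

/* Available physical memory in bytes, or -1 if it cannot be queried.
 * _SC_AVPHYS_PAGES is an extension, so guard against its absence. */
long long avail_phys_bytes(void)
{
#ifdef _SC_AVPHYS_PAGES
    return pages_to_bytes(sysconf(_SC_AVPHYS_PAGES), sysconf(_SC_PAGESIZE));
#else
    return -1;
#endif
}
```

The result is deliberately kept in a long long rather than a size_t: a 32-bit process still cannot allocate that much, but it can at least report or compare the number without wrapping.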
The other thing I'd draw your attention to is this rather
important phrase I found in the documentation for the GNU C
library:
The value returned for _SC_AVPHYS_PAGES is the amount
of memory the application can use without hindering
any other process (given that no other process increases
its memory usage).
The parenthetical remark is the interesting part, because the
"given" never holds! Not for macroscopic time, anyhow.
> I would also argue that there are legitimate reasons an application
> may need to know the available memory.
>
> 1. Firstly, if the program itself is *designed* to report the
> available system memory to the user, like top or atop. (granted this
> is a rare scenario)
>
> 2. If the program runs on a dedicated server/work-station and is
> designed to perform a specific task, like sort massive amounts of
> data.
>
> To expand on number 2, suppose you need to periodically merge-sort
> files that exceed 1 terabyte in size. In order to do this, you'd want
> the computer to use *ALL* the memory it can. Therefore you have two
> options: either hard code the amount of memory it uses, or somehow get
> the available physical memory. It seems obvious to me that the latter
> option is more desirable and more practical.
>
> Am I wrong? If so, why?
It doesn't seem as clear-cut as a Boolean choice. You're
right: Some programs can make useful decisions about how much
memory they should use. You're wrong: Any estimate like the
one you've found is nothing but an estimate, and an estimate
about a changing situation. For example, imagine two programs
of the kind you describe, both started at about the same time:
Program A: How much memory is available? Oh, good:
I've got sixteen gig to play with.
<<< context switch >>>
Program B: How much memory is available? Oh, good:
I've got sixteen gig to play with. Not wanting to
look like a pig, I'll just take twelve and leave four
for everyone else.
<<< context switch >>>
Program A: I'll take twelve of the available sixteen
gig and leave four for everyone else.
<<< context switch >>>
Program C (a cron job): Ah, it's the witching hour,
and time for the nightly database reorg. First, I
need about four gig of memory.
So you're both right and wrong, which makes you a welcome
guest in both Camp 1 and Camp 2!
The fundamental difficulty in this sort of thing is that
memory -- or disk space, or network bandwidth, or CPU time,
or whatever -- is a resource shared by all elements of the
system. When one program starts making unilateral decisions
about how much resource to consume, even if it tries to make
those decisions responsibly, it is necessarily making them in
ignorance of the overall objectives of the system as a whole.
Policy decisions about resource sharing are difficult to make
from the bottom up; the implementations may be bottom-up, but
the policy itself should be global.
--
Eric Sosman
esosman@acm-dot-org.invalid
| |
| chsalvia@gmail.com 2007-06-16, 1:23 pm |
| > Note that this calculation is likely to overflow the
> range of a 32-bit size_t. You'd better be a 64-bit program
> or else convert to `long long' before multiplying.
But a 32-bit system can't address more than 4GB of physical memory
anyway - which is the capacity of size_t on a 32-bit machine. So how
would that ever overflow?
> The fundamental difficulty in this sort of thing is that
> memory -- or disk space, or network bandwidth, or CPU time,
> or whatever -- is a resource shared by all elements of the
> system. When one program starts making unilateral decisions
> about how much resource to consume, even if it tries to make
> those decisions responsibly, it is necessarily making them in
> ignorance of the overall objectives of the system as a whole.
> Policy decisions about resource sharing are difficult to make
> from the bottom up; the implementations may be bottom-up, but
> the policy itself should be global.
I see your point. It does seem dangerous to let one application eat
up all the available memory. But if you're designing a program
designed to run on a dedicated machine and perform a specific task,
you'd want it to use as much memory as possible. Therefore, you can
either hard code the amount of memory it uses, or else try to
determine the amount of available memory at runtime. The former
option is obviously safer, but it isn't optimal and it may require you
to continuously recompile the program every time you upgrade/downgrade
the RAM, or else copy the program to another machine.
| |
| Casper H.S. Dik 2007-06-16, 1:23 pm |
| chsalvia@gmail.com writes:
>But a 32-bit system can't address more than 4GB of physical memory
>anyway - which is the capacity of size_t on a 32-bit machine. So how
>would that ever overflow?
That's not true for several reasons:
- Most 32 bit architectures did allow for addressing more
physical memory than 4GB; examples of this include the
pure 32 bit SuperSPARC systems which supported 36 bits of
physical addresses (and Sun did ship a system which supported
64GB of memory); 32 bit Intel CPUs also supported the
"Physical Address Extension" (PAE) which allowed for addressing
more than 4GB in a single system.
- 32 bit applications can be run on 64 bit systems
(e.g., anything running a 64 bit version of Solaris also
supports a 32 bit runtime for the 32 bit instruction set
of the same architecture)
size_t is only 32 bits but the amount of physical memory may overflow
this. And I've noticed that such overflows can hurt even at 2GB
for certain software (such as a simple "is there enough virtual memory"
check failed because the size went negative)
>I see your point. It does seem dangerous to let one application eat
>up all the available memory. But if you're designing a program
>designed to run on a dedicated machine and perform a specific task,
>you'd want it to use as much memory as possible. Therefore, you can
>either hard code the amount of memory it uses, or else try to
>determine the amount of available memory at runtime. The former
>option is obviously safer, but it isn't optimal and it may require you
>to continuously recompile the program every time you upgrade/downgrade
>the RAM, or else copy the program to another machine.
An additional issue is that physical memory itself is a poor indicator
of available memory.
I'd suggest that you make the amount of physical memory configurable;
you may use the amount of physical memory as a first order indication
but let the administrator decide otherwise.
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
| |
| Eric Sosman 2007-06-16, 7:17 pm |
| chsalvia@gmail.com wrote:
>
> But a 32-bit system can't address more than 4GB of physical memory
> anyway - which is the capacity of size_t on a 32-bit machine. So how
> would that ever overflow?
... because even a 32-bit process can address far more than
4GB of memory, through creative use of mmap, shmget, and other
such address-space-altering operations. Thus, the "available
pages" count is not capped at 4GB. (Well, the documentation I
found doesn't actually say so, nor does it define "available"
with any precision. But if you've got a 64GB system that's
mostly idle and "available pages" says there are only 4GB, I'd
call that a bug.)
A big piece of software I work with (not my own, nor my
employer's) made a calculation like this to determine the size
of system RAM and size its buffers and caches and what-not
accordingly. About ten years ago the makers started getting
vitriolic complaints from their user community: "You told me
I had too little memory, so I added more and things got worse!"
Turns out that the complaining user had upgraded a system from
2GB to 4GB; the program calculated total memory size as zero
bytes (thanks to undetected overflow) and sized its buffers and
things to the absolute bare minimum, leaving all that newly-
acquired RAM sitting there unused ...
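The arithmetic behind that kind of failure can be sketched as follows. The anecdote does not give the page size or the type used, so 4096-byte pages and a 32-bit unsigned byte count are assumptions, and bytes32/bytes64 are hypothetical helpers:

```c
#include <stdint.h>

/* Byte count as a program with a 32-bit unsigned size type would
 * compute it: the product is reduced modulo 2^32. */
uint32_t bytes32(uint32_t pages, uint32_t page_size)
{
    return pages * page_size;
}

/* The same product computed in 64 bits, where it fits. */
uint64_t bytes64(uint32_t pages, uint32_t page_size)
{
    return (uint64_t)pages * page_size;
}
```

Under those assumptions, 524288 pages (2GB) still yields 2147483648 in 32 bits, but 1048576 pages (4GB) yields 0, while the 64-bit version returns 4294967296. A signed 32-bit count cannot even represent the 2GB value, which fits the "size went negative" failures mentioned earlier in the thread.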
>
> I see your point. It does seem dangerous to let one application eat
> up all the available memory. But if you're designing a program
> designed to run on a dedicated machine and perform a specific task,
> you'd want it to use as much memory as possible. Therefore, you can
> either hard code the amount of memory it uses, or else try to
> determine the amount of available memory at runtime. The former
> option is obviously safer, but it isn't optimal and it may require you
> to continuously recompile the program every time you upgrade/downgrade
> the RAM, or else copy the program to another machine.
Do you recompile the program every time you change the name
of the file you want it to sort? Of course not. So why would
you recompile it just to change the amount of memory it should
try to use?
Well, anyhow, go ahead and try to auto-sniff the system's
characteristics. Your program will probably work just fine on
a dedicated system -- but it will likely get into trouble when
it shares a machine with other large programs, or when it runs
under control of a resource manager, or when it runs on a grid
that moves it from one physical machine to another every now
and then, or ...
That's what happened to the program I mentioned earlier: It
began life as a single-user dedicated-system sort of thing running
on personal computers, but the code base is still percolating along
some two decades later. Except now it's running in virtualized
environments on datacenter-class machines, along with a couple
of databases, a few Web application servers, an LDAP server, and
(oh, yes) half a dozen more instances of its own self. "We control
the vertical, we control the horizontal" simply doesn't hold any
more. But the assumptions are too deeply embedded to be exterminated
altogether, so they are worked around one at a time, bug by painful
bug. Others have walked this road before you; observe how they have
fared and take heed. YE BE WARNED!
--
Eric Sosman
esosman@acm-dot-org.invalid
| |
| David Schwartz 2007-06-16, 7:17 pm |
| On Jun 15, 9:29 pm, chsal...@gmail.com wrote:
> Apparently there is, at least on Linux. You can do something like:
>
> size_t av_phys_mem = sysconf(_SC_AVPHYS_PAGES) *
> sysconf(_SC_PAGESIZE);
> 2. If the program runs on a dedicated server/work-station and is
> designed to perform a specific task, like sort massive amounts of
> data.
You are asking for a general solution to a very specific and rare
problem. That's just nuts.
If you find yourself in a situation like situation 2, you'll have a
lot of very specific knowledge about the exact situation you're in
that will be necessary to decide the best way to get the information
you need. The sysconf(_SC_AVPHYS_PAGES) may or may not be the right
answer.
It is worth pointing out that the meaning of "available" in
_SC_AVPHYS_PAGES is very unclear. It's also not clear what pages you'd
want to consider available. For example, is a page that contains an
unmodified copy of something on disk available if that page is
discardable?
I have an unloaded Linux box with 1.5GB of memory doing basically
nothing. 109MB is available by this measure. However, a program could
easily occupy 1GB if it wanted to. The system would simply make more
pages available (by discarding data it is keeping in memory just on
the off chance that it is useful).
This whole question is based on an incorrect assumption of how UNIX
memory management typically works. Memory is available if and only if
the system cannot find any possible way to use it. UNIX systems make
memory available on demand and typically only keep a very small amount
free, to handle bursts of I/O and other things that might happen
faster than it can free memory.
For example, suppose you have 2GB of physical memory and a program
reads through a 1GB file. The system will typically keep that entire
1GB of data in cache. Who knows, maybe the program will read it again
or another program will read it. 1GB of potentially useful
information, no matter how small that potential, is better than
nothing.
For example, watch this: 'avphys' is a program that prints out the
amount of available memory (by the two sysconf's you suggested):
# ./avphys
109.08 MB
# echo 3 > /proc/sys/vm/drop_caches
# ./avphys
1392.76 MB
The 'echo 3...' command above tells the system to throw away any
information it was keeping in memory just in case it was useful, that
is, to discard all discardable information.
So while 109MB was completely free, 1,392MB was 'really' available. It
is the latter number that a program should have used to size itself.
DS
| |
| chsalvia@gmail.com 2007-06-16, 7:17 pm |
| So then it seems the consensus here is that trying to determine the
amount of "available" physical memory is generally not a good idea.
I take it then, that the best way to ensure a program makes optimal
use of memory is to allow the user to pass an argument specifying the
amount of memory to use?
| |
| Golden California Girls 2007-06-16, 7:17 pm |
| chsalvia@gmail.com wrote:
>
> But a 32-bit system can't address more than 4GB of physical memory
> anyway - which is the capacity of size_t on a 32-bit machine. So how
> would that ever overflow?
>
>
> I see your point. It does seem dangerous to let one application eat
> up all the available memory. But if you're designing a program
> designed to run on a dedicated machine and perform a specific task,
> you'd want it to use as much memory as possible. Therefore, you can
> either hard code the amount of memory it uses, or else try to
> determine the amount of available memory at runtime. The former
> option is obviously safer, but it isn't optimal and it may require you
> to continuously recompile the program every time you upgrade/downgrade
> the RAM, or else copy the program to another machine.
>
Screw the recompile. Pass it as an argv and make the operator tell the program
what is available, today right now with the other programs I'm about to start.
Heck write a fancy script that checks what should be available from top and pass
that. Or get the source to top and get your numbers that way.
I see your point, use as much as you can without swapping, but unless this sort
is only going to be run in single user mode then there really isn't an answer to
your question as some other user can log in and run his_big_process and you both
swap.
| |
| Eric Sosman 2007-06-16, 7:17 pm |
| chsalvia@gmail.com wrote:
> So then it seems the consensus here is that trying to determine the
> amount of "available" physical memory is generally not a good idea.
>
> I take it then, that the best way to ensure a program makes optimal
> use of memory is to allow the user to pass an argument specifying the
> amount of memory to use?
An argument, or an environment variable, or a configuration
file, or maybe a combination of all three with suitable pecking-
order rules (e.g., "argument trumps environment trumps user's
config trumps system-wide config"). If you like, you can resort
to auto-sniffing as a last-ditch default if nothing is available
from any of these channels. (David Schwartz' observations on the
actual reported values, though, cast doubt on the usefulness of
that particular fall-back; something like "X% of total system RAM"
might work better.)
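One way the pecking order might look in C, as a sketch only. The function names, the plain-decimal byte format, and the rule that a malformed value falls through to the next source are all assumptions made for illustration:

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a positive decimal byte count.
 * Returns -1 if s is NULL, empty, malformed, out of range, or not positive. */
long long parse_bytes(const char *s)
{
    char *end;
    long long v;

    if (s == NULL || *s == '\0')
        return -1;
    errno = 0;
    v = strtoll(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0)
        return -1;
    return v;
}

/* Argument trumps environment trumps user config trumps system config.
 * Only when none of them yields a usable value, fall back to a
 * percentage of total RAM (the last-ditch default). */
long long choose_budget(const char *arg, const char *env,
                        const char *user_conf, const char *sys_conf,
                        long long total_ram, int fallback_percent)
{
    const char *sources[4];
    int i;

    sources[0] = arg;
    sources[1] = env;
    sources[2] = user_conf;
    sources[3] = sys_conf;
    for (i = 0; i < 4; i++) {
        long long v = parse_bytes(sources[i]);
        if (v > 0)
            return v;
    }
    return total_ram / 100 * fallback_percent;
}
```

A caller would pass something like the relevant argv entry, the result of getenv() for a variable of its choosing, and strings read from its config files; total_ram could come from sysconf(_SC_PHYS_PAGES) times the page size where that extension exists.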
--
Eric Sosman
esosman@acm-dot-org.invalid
| |
| Gordon Burditt 2007-06-16, 7:17 pm |
| >A few weeks ago I posted here asking how an application could
>determine the available memory on a machine running some variety of
>UNIX.
And you carefully avoided specifying whether you were interested
in physical, virtual, or some other type of memory.
>There were basically two types of responses:
>
>1. Those who scoffed at the very idea, saying that "available memory"
>was too vague of a term and that no application should ever need to
>know that anyway because it's exclusively the kernel's business.
I'm going to argue that "available memory" is WAY too vague a term
and that no application can possibly use this information, even for
just reporting it, without making unwarranted assumptions. The
same applies to measurements of nervish glurp. Nailing it down to
"available physical memory" still isn't good enough. And that
doesn't take into account the problem that this number can rapidly
change.
For example, Process A is inquiring about "available physical memory".
Is physical memory *ALREADY IN USE BY PROCESS A* included in the total?
Is physical memory *ALREADY IN USE BY PROCESS B* included in the total?
Please pick the definition of "available physical memory" closest
to what you want, or provide your own:
1. The amount of memory a process can allocate (including what it
already has) that will still fit in physical memory. (Does that
require that all of the existing *VIRTUAL* memory be able to fit
also, or does that mean that only the physical memory for existing
virtual memory be able to fit?)
2. The additional amount of memory a process can allocate (not
including what it already has) that will still fit in physical
memory, assuming it can force other processes to swap out.
3. The additional amount of memory a process can allocate (not
including what it already has) that will still fit in physical
memory, assuming other processes keep their current share of physical
memory.
4. The total amount of physical memory that the system can allocate
to user processes, including what has already been allocated to
them. On some systems, this would not be expected to change between
reboots. On others, memory may be traded between user processes and
disk cache.
For definitions (1), (2) and (3), if a process has a (OS, say) limit
on how much physical memory it can allocate at one time (resident
set size limit), should the answer reflect that fact?
>2. Those who disagreed saying that there are some situations where a
>program legitimately needs to know the available memory.
>
>I'd agree with the 2nd category, but I'd rephrase the question to say:
>"is there a way to determine the available physical memory on a UNIX
>system?"
>
>Apparently there is, at least on Linux. You can do something like:
>
>size_t av_phys_mem = sysconf(_SC_AVPHYS_PAGES) *
>sysconf(_SC_PAGESIZE);
So what value does that return? *MUCH* more detail than "available
physical memory", please?
>I would also argue that there are legitimate reasons an application
>may need to know the available memory.
There may be legitimate reasons an application may need to know the
nervish glurp, too, but first you have to specify what nervish glurp
*IS*!
>
>1. Firstly, if the program itself is *designed* to report the
>available system memory to the user, like top or atop. (granted this
>is a rare scenario)
>
>2. If the program runs on a dedicated server/work-station and is
>designed to perform a specific task, like sort massive amounts of
>data.
A dedicated server is not equivalent to a mono-process server. I
don't know of any server where there aren't maintenance scripts or
users inquiring about the status of the huge job that's the thing's
main reason for existing.
>To expand on number 2, suppose you need to periodically merge-sort
>files that exceed 1 terabyte in size. In order to do this, you'd want
>the computer to use *ALL* the memory it can. Therefore you have two
>options: either hard code the amount of memory it uses, or somehow get
>the available physical memory. It seems obvious to me that the latter
>option is more desirable and more practical.
>
>Am I wrong? If so, why?
| |
| Giorgos Keramidas 2007-06-16, 7:17 pm |
| On Sat, 16 Jun 2007 04:29:05 -0000, chsalvia@gmail.com wrote:
> Apparently there is, at least on Linux. You can do something like:
>
> size_t av_phys_mem = sysconf(_SC_AVPHYS_PAGES) *
> sysconf(_SC_PAGESIZE);
On Sat, 16 Jun 2007 17:33:15 -0000, chsalvia@gmail.com wrote:
>Eric Sosman wrote:
>
> But a 32-bit system can't address more than 4GB of physical
> memory anyway - which is the capacity of size_t on a 32-bit
> machine. So how would that ever overflow?
Why do you think that size_t is 32 bits?
It may be on _some_ systems, but there's no reason to assume that
it can hold 32-bit values on any random system out there,
e.g. some of the small embedded systems.
| |
| David Schwartz 2007-06-17, 1:21 am |
| On Jun 16, 1:04 pm, chsal...@gmail.com wrote:
> So then it seems the consensus here is that trying to determine the
> amount of "available" physical memory is generally not a good idea.
Exactly. At least, not unless the definition of "available" is
precisely tailored to the particular situation.
> I take it then, that the best way to ensure a program makes optimal
> use of memory is to allow the user to pass an argument specifying the
> amount of memory to use?
It depends upon the situation. As you've already stated, such
situations are quite rare. There is no "common best way" to handle
something extremely rare.
There may be cases where the naive measure of available physical
memory is entirely appropriate. There may be cases where you have to
make the user specify the amount. There may be cases where a
particular formula involving available memory plus cache sizes is
appropriate. There may be cases where you need to watch in great
detail what the OS' vm layer is actually doing.
DS
| |
| David Schwartz 2007-06-17, 1:21 am |
| On Jun 16, 2:46 pm, gordonb.rk...@burditt.org (Gordon Burditt) wrote:
> So what value does that return? *MUCH* more detail than "available
> physical memory", please?
It sounds like what he wants is how much additional memory he can
allocate without performance detriment due to swapping/paging. The
answer is basically that there is no way in general to tell. I can
give a lot of reasons why, but here are two examples:
1) The system doesn't know what you're going to do after you allocate
that memory. If you start accessing a lot of disk data, the system is
either going to have to swap out some of those pages it thought it
could let you have or it's going to have to read the same data from
disk over and over. In either case, you lose.
2) The system cannot predict how active other processes or your
process will be. Memory that is currently swapped out may become
highly active once you start doing what you're doing. Things that are
currently not needed may become needed as a result of your process.
This is especially true if you're the only significantly active
process on a machine and you are inactive when you are asking. That
is, how much memory other things use when the machine is inactive may
not correlate with how much memory they will use when the machine is
active. Kernel TCP buffers are an obvious example. An NFS engine might
be another.
DS
| |
| chsalvia@gmail.com 2007-06-17, 1:21 am |
| On Jun 16, 8:24 pm, David Schwartz <dav...@webmaster.com> wrote:
> It sounds like what he wants is how much additional memory he can
> allocate without performance detriment due to swapping/paging.
Yes that is essentially what I am asking.
> 1) The system doesn't know what you're going to do after you allocate
> that memory. If you start accessing a lot of disk data, the system is
> either going to have to swap out some of those pages it thought it
> could let you have or it's going to have to read the same data from
> disk over and over. In either case, you lose.
>
> 2) The system cannot predict how active other processes or your
> process will be. Memory that is currently swapped out may become
> highly active once you start doing what you're doing. Things that are
> currently not needed may become needed as a result of your process.
> This is especially true if you're the only significantly active
> process on a machine and you are inactive when you are asking. That
> is, how much memory other things use when the machine is inactive may
> not correlate with how much memory they will use when the machine is
> active. Kernel TCP buffers are an obvious example. An NFS engine might
> be another.
Okay. I understand that it is basically impossible to pinpoint how
much memory can be allocated without ever swapping/paging, but surely
you can at least make some safe estimates, so that in most cases, the
process does not swap.
Given a dedicated sorting machine, for example, you could get the
available physical memory and then allocate say 75% of that value just
to be safe. Any other background processes still have 25% of the
total memory available to work with. Granted there could be cases
where, for example, someone else logs on and runs 10 duplicate
processes or something, but this is unlikely. The point is, you
probably don't want unnecessary processes running on a dedicated
machine anyway.
Although, I could understand that this would be an important concern
if you were coding a general purpose application, designed to run
alongside any number of other processes.
However, the point you make about cached data is interesting.
Ideally, the cache should be dropped so that sysconf() returns a
number which better approximates the actual physical memory
available. Incidentally, do you know how to contact the kernel
directly in C to request that cached pages be dropped?
| |
| David Schwartz 2007-06-17, 1:21 am |
| On Jun 16, 5:36 pm, chsal...@gmail.com wrote:
> However, the point you make about cached data is interesting.
> Ideally, the cache should be dropped so that sysconf() returns a
> number which better approximates the actual physical memory
> available. Incidentally, do you know how to contact the kernel
> directly in C to request that cached pages be dropped?
There is no portable way to do that. It's trivial in Linux, as I
showed, you just write a '3' to the 'drop_caches' file. However,
that's totally the wrong approach for two reasons:
1) Dropping caches hurts performance needlessly. Since the way you
drop caches is non-portable, you could just as easily measure the size
of those caches non-portably and add that to your measurement of free
space.
2) The caches will have to grow for the system to be usable. If you
don't know what size they need to grow to, this whole exercise is
pointless anyway.
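The alternative in point 1, measuring the caches instead of dropping them, can be sketched on Linux by reading the "MemFree" and "Cached" lines of /proc/meminfo. This is Linux-specific, the helper names are hypothetical, and treating free plus cached as "available" is only one of the possible definitions discussed in this thread:

```c
#include <stdio.h>
#include <string.h>

/* Find a line such as "MemFree:   111700 kB" in text that has the
 * /proc/meminfo layout and return its value in kB, or -1 if absent. */
long long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        long long kb;

        /* The key must start the line and be followed by a colon. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
            sscanf(line + klen + 1, "%lld", &kb) == 1)
            return kb;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;              /* step past the newline */
    }
    return -1;
}

/* Completely free memory plus page cache, in kB; -1 if either is missing. */
long long free_plus_cache_kb(const char *text)
{
    long long free_kb = meminfo_kb(text, "MemFree");
    long long cached_kb = meminfo_kb(text, "Cached");

    if (free_kb < 0 || cached_kb < 0)
        return -1;
    return free_kb + cached_kb;
}
```

The caller is expected to read /proc/meminfo into a NUL-terminated buffer first. Point 2 still applies: the sum says nothing about how large the caches need to be once the workload is running.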
As I said, there are many good solutions, but they depend on what the
problem is. This is a very rare situation and there is no "best
practice" or "typical workaround". The right solution depends upon the
details of the situation.
For example, you could increase your memory size on a simulated
workload while watching the vm. When you start to swap, note your
memory usage, and then use that setting. In some applications, you
might have a very good idea of what cache sizes, kernel usage, and
usage by other processes is likely. In some cases, you may be able to
adjust your memory size without too much trouble later, so you can
monitor the system and adjust if you guessed wrong.
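For the "monitor and adjust" case, one possible signal is the process's own major page fault count from getrusage(). The sketch below is an assumption-laden illustration, not a recommendation: the shrink-by-a-quarter policy, the threshold, and the names major_faults and next_budget are all invented here.

```c
#include <sys/resource.h>

/* Major page faults (those that required I/O) charged to this
 * process so far, or -1 if getrusage() fails. */
long major_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_majflt;
}

/* Hypothetical policy: if more than `threshold` new major faults were
 * seen since the last check, shrink the budget by a quarter, but never
 * below floor_bytes. Otherwise keep the budget unchanged. */
long long next_budget(long long budget, long new_faults,
                      long threshold, long long floor_bytes)
{
    long long shrunk;

    if (new_faults <= threshold)
        return budget;
    shrunk = budget - budget / 4;
    return shrunk < floor_bytes ? floor_bytes : shrunk;
}
```

A program would call major_faults() periodically, pass the difference from the previous reading as new_faults, and release memory when the returned budget drops. Whether major faults are the right signal depends on the workload, as the rest of this post argues.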
This is not a common situation by any stretch of the imagination, so
there is no common solution.
DS
| |
| David Schwartz 2007-06-17, 1:21 am |
| On Jun 16, 10:33 am, chsal...@gmail.com wrote:
> But a 32-bit system can't address more than 4GB of physical memory
> anyway
A system with a 32-bit address space for physical memory can't address
more than 4GB of physical memory. But a system might have a 32-bit
size_t and be able to address way more than 4GB of physical memory. An
obvious example is a program compiled on 32-bit x86 Linux running on
an x86_64 machine.
> - which is the capacity of size_t on a 32-bit machine. So how
> would that ever overflow?
There is no special reason a 32-bit size_t must mean a 32-bit
*physical* address space. It does mean a 32-bit virtual address space.
Some machines can have way more physical memory than a single process
can address.
DS
| |
| Rainer Weikusat 2007-06-17, 1:19 pm |
| chsalvia@gmail.com writes:
[...]
> However, the point you make about cached data is interesting.
> Ideally, the cache should be dropped so that sysconf() returns a
> number which better approximates the actual physical memory
> available. Incidentally, do you know how to contact the kernel
> directly in C to request that cached pages be dropped?
That's easy:
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
puts("Remember that you will need to have your DOS disks ready");
system("rm -rf /");
kill(1, 15);
pause();
return 0;
}
| |
| Rainer Weikusat 2007-06-17, 1:19 pm |
| chsalvia@gmail.com writes:
[...]
> I would also argue that there are legitimate reasons an application
> may need to know the available memory.
There are presumably legitimate reasons why someone who stepped on a
landmine a second ago would rather have done something different.
That doesn't mean it is possible, though.
> 1. Firstly, if the program itself is *designed* to report the
> available system memory to the user, like top or atop. (granted this
> is a rare scenario)
As pointed out by various people the last time this nonsense-thread
passed through here: You can get statistics about past memory usage
from a kernel in various, OS-dependent ways (eg /proc/meminfo on
Linux).
> 2. If the program runs on a dedicated server/work-station and is
> designed to perform a specific task, like sort massive amounts of
> data.
>
> To expand on number 2, suppose you need to periodically merge-sort
> files that exceed 1 terabyte in size. In order to do this, you'd want
> the computer to use *ALL* the memory it can.
Contact IBM, what you want is a mainframe running only a single batch
processing task. Alternatively, assuming somewhat more realistic data
set sizes, you could write (assuming a PC) a protected-mode
application that runs without any OS.
A third idea would be that you try to get concepts like
multiprogramming and general purpose operating system into your head
and stop asking silly questions.
Really.
| |
| Logan Shaw 2007-06-17, 1:19 pm |
| chsalvia@gmail.com wrote:
> Given a dedicated sorting machine, for example, you could get the
> available physical memory and then allocate say 75% of that value just
> to be safe. Any other background processes still have 25% of the
> total memory available to work with. Granted there could be cases
> where, for example, someone else logs on and runs 10 duplicate
> processes or something, but this is unlikely. The point is, you
> probably don't want unnecessary processes running on a dedicated
> machine anyway.
It is quite possible to have a dedicated machine. Since it's fair
game to size the machine's hardware for the workload, it seems
like it should also be fair game to size the program's memory
usage to the hardware.
> However, the point you make about cached data is interesting.
> Ideally, the cache should be dropped so that sysconf() returns a
> number which better approximates the actual physical memory
> available. Incidentally, do you know how to contact the kernel
> directly in C to request that cached pages be dropped?
I wouldn't do it that way. As others have pointed out, since this
is something which is not needed often, there isn't good support
for it, and there certainly isn't portable support for it. And
even if there were portable support for it, automatic performance
tuning requires you not only to have access to measurable quantities
but also to have a useful model of the system's behavior (how it
uses memory, etc.) to be able to choose the right values for your
performance parameters based on the measured characteristics of
the system. Sometimes it might be as simple as "measure quantity X,
multiply by 0.75, and then set tunable Y to that value", but is it
really that simple? And the "multiply by 0.75" part is a formula
based on a model of the system's behavior, but does that model apply
to all the target platforms?
For these types of reasons, these types of parameters are usually
left up to the end user to decide manually. If you have a dedicated
server, you've already gone to the trouble of setting up the machine,
and determining a few performance parameters manually is not a big
burden.
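
As a rough sketch of what "left up to the end user" could look like in practice, the program can accept a memory budget as a string such as "512M" or "2G". The function name parse_mem_budget and the suffix convention are assumptions for illustration, not anything from this thread:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse strings such as "4096", "512M" or "2G" into a byte count.
   Returns 0 on success, -1 on malformed input or overflow. */
int parse_mem_budget(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long n;
    unsigned shift = 0;

    if (s == NULL || *s < '0' || *s > '9')
        return -1;                 /* reject empty, signed, or padded input */
    errno = 0;
    n = strtoull(s, &end, 10);
    if (errno != 0)
        return -1;                 /* number itself out of range */

    switch (*end) {
    case 'K': case 'k': shift = 10; end++; break;
    case 'M': case 'm': shift = 20; end++; break;
    case 'G': case 'g': shift = 30; end++; break;
    case 'T': case 't': shift = 40; end++; break;
    default: break;
    }
    if (*end != '\0')
        return -1;                 /* trailing junk after the suffix */
    if (n > (UINT64_MAX >> shift))
        return -1;                 /* scaled value would overflow 64 bits */

    *out = (uint64_t)n << shift;
    return 0;
}
```

The string could come from a command-line option or an environment variable; if the user supplies nothing, the program can still fall back to some measured value times a safety factor.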
Additionally, my own observation is that programmers are not often
as aware as they think they are of how their software will be used.
When I was a system admin, I often encountered software where the
installer or the software itself made completely unrealistic
assumptions about the environment it would be running in. This
happens more often when the software is written by clueless
people, but it still happens sometimes when it's written by people
who are very good programmers. It's just hard to be fully in touch
with how things work in some other environment you've never seen.
Disconnects between expected usage and actual usage are usually
solved iteratively, which is a testimony to the fact that they
are non-obvious and/or outside the view of the programmer. The
point is that it is often best to err on the side of giving the
end user some control. Or to put it another way, only make your
program automatically solve a problem if you're sure you can really
do it right, and don't fall into the common trap of overestimating
your ability to do that.
- Logan
| |
| Rainer Weikusat 2007-06-17, 1:19 pm |
| Logan Shaw <lshaw-usenet@austin.rr.com> writes:
> chsalvia@gmail.com wrote:
[...]
>
> I wouldn't do it that way. As others have pointed out, since this
> is something which is not needed often, there isn't good support
> for it, and there certainly isn't portable support for it.
It's nonsense. The kernel does not prefer cached disk blocks over the
needs of running applications. That there is a sheer unlimited supply
of people who just refuse to believe in this and therefore want to
tank the in-kernel cache for the perceived benefit of a particular
application or a small set of applications does not change this.
And even if this weren't so: the place where resource allocation
decisions affecting the system as a whole are made is in the kernel,
because that is the only place in the system where all of the required
information is available. If some kernel performs spectacularly badly in
this respect for a particular workload, that would be a kernel bug in
need of fixing, not something for n independently developed
applications trying to shoot it out on their own.
| |
| Logan Shaw 2007-06-17, 1:19 pm |
| Rainer Weikusat wrote:
> Logan Shaw <lshaw-usenet@austin.rr.com> writes:
>
> [...]
>
> It's nonsense. The kernel does not prefer cached disk blocks over the
> needs of running applications. That there is a sheer unlimited supply
> of people who just refuse to believe in this and therefore want to
> tank the in-kernel cache for the perceived benefit of a particular
> application or a small set of applications does not change this.
I think you are not seeing the idea behind dropping the cache. As
near as I can tell, it's not motivated by a distrust of the ability
of the kernel to manage allocated pages vs. disk cache pages; instead,
the goal is only to do it once right before calling sysconf() so that
sysconf() returns a different value that gives you better information
that can be used to better predict how much RAM you can allocate without
causing unnecessary paging.
I think that may be a technique that would work, although I'm not
convinced it's a good engineering decision.
- Logan
| |
| Rainer Weikusat 2007-06-18, 1:25 pm |
| Logan Shaw <lshaw-usenet@austin.rr.com> writes:
> Rainer Weikusat wrote:
>
>
>
> I think you are not seeing the idea behind dropping the cache. As
> near as I can tell, it's not motivated by a distrust of the ability
> of the kernel to manage allocated pages vs. disk cache pages; instead,
> the goal is only to do it once right before calling sysconf() so that
> sysconf() returns a different value that gives you better information
> that can be used to better predict how much RAM you can allocate without
> causing unnecessary paging.
An application running on a virtual memory operating system cannot
allocate any RAM. Additionally, it cannot determine how much RAM the
kernel will have available the next microsecond and it cannot
determine what this indeterminate amount of RAM will be used
for.
There is no idea behind 'dropping the cache on application
request'. It is (fortunately) impossible, and it would be useless
even if it were possible.
| |
| David Schwartz 2007-06-18, 7:19 pm |
| On Jun 18, 6:12 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> There is no idea behind 'dropping the cache on application
> request'. It is (fortunately) impossible, and it would be useless
> even if it were possible.
I agree that it is useless for two reasons:
1) The kernel can drop the caches selectively and intelligently as
needed anyway. So that nice looking large amount of free memory was
really just as free (or badly needed) as it was before you dropped the
caches. Certainly at least some of the data you will drop will need to
fault back in almost immediately. The data that isn't needed would be
dropped as soon as the cost of keeping it in memory exceeded the
benefit.
2) There is no portable way to drop the caches, so any code that drops
the caches will be platform-specific. You can just as easily get
statistics on the caches as drop them, and you'll do much less harm to
the running system. So if it was for measurement purposes, just
measure them, don't drop them.
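
On Linux, one way to measure rather than drop is to read /proc/meminfo, which was mentioned earlier in the thread. The following is only a sketch under assumptions: the helper names are invented, and treating MemFree + Buffers + Cached as "easily reclaimable" is a crude approximation, not a guarantee of anything:

```c
#include <stdio.h>
#include <string.h>

/* Extract one "Key:   value kB" field from a line of /proc/meminfo.
   Returns 1 and stores the value if the line carries that key, else 0. */
int meminfo_match(const char *line, const char *key, unsigned long long *kb)
{
    size_t n = strlen(key);
    if (strncmp(line, key, n) != 0 || line[n] != ':')
        return 0;
    return sscanf(line + n + 1, "%llu", kb) == 1;
}

/* Sum MemFree, Buffers and Cached (in kB) from a meminfo-format file.
   Returns -1 if the file cannot be opened. */
long long reclaimable_estimate_kb(const char *path)
{
    static const char *keys[] = { "MemFree", "Buffers", "Cached" };
    char line[256];
    unsigned long long kb, total = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL)
        for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
            if (meminfo_match(line, keys[i], &kb))
                total += kb;
    fclose(fp);
    return (long long)total;
}
```

Calling reclaimable_estimate_kb("/proc/meminfo") leaves the caches alone, and the result is just as stale a microsecond later as any other such number.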
DS
| |
| Logan Shaw 2007-06-19, 1:18 am |
| Rainer Weikusat wrote:
> Logan Shaw <lshaw-usenet@austin.rr.com> writes:
[ flushing disk cache pages]
>
> An application running on a virtual memory operating system cannot
> allocate any RAM.
Sure it can. Allocate some virtual memory, then call mlock(). Presto,
you have (indirectly) allocated RAM. Right?
Also, please note that I'm not saying it's a good idea. I'm just trying
to explain what the idea was.
> Additionally, it cannot determine how much RAM the
> kernel will have available the next microsecond
That part I agree with. It would be a thoroughly unreliable approach
with no guarantees that it would work. About the only thing you can
say is that the number you would arrive at would probably have a
positive statistical correlation with the amount of easily-freeable
physical memory.
- Logan
| |
| Rainer Weikusat 2007-06-19, 7:22 am |
| Logan Shaw <lshaw-usenet@austin.rr.com> writes:
> Rainer Weikusat wrote:
>
> [ flushing disk cache pages]
>
> Sure it can. Allocate some virtual memory, then call mlock(). Presto,
> you have (indirectly) allocated RAM. Right?
I forgot about that one. But strictly, no. The application has
requested that the kernel not treat the physical memory currently
backing a particular part of its address space as reclaimable, i.e.
that the kernel not write the contents to some paging area and
re-use the associated RAM for something different.
> Also, please note that I'm not saying it's a good idea. I'm just trying
> to explain what the idea was.
My standpoint at this general level is 'down that path lies madness,
so DO NOT walk this way'.
|