Unix Programming - Puzzled by mmap behavior


Author Puzzled by mmap behavior
neo_in_matrix@msn.com

2005-01-22, 5:49 pm

I could not find a Linux programming group to post to (I am new to the
*nix world). I think Linux shares much with Unix, so I am posting my
question here. If anyone can direct me to one, many thanks to him/her.

I am using Fedora Core 3, which is a distro from redhat.com.

I come from a Windows background. In the Windows world, as long as a file
is memory mapped, it cannot be deleted.

However, I don't see this behavior on Linux. See my program below:

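#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

void test_mmap()
{
    int fd;
    int page_size;
    char* p;

    fd = open("/home/neo/test", O_RDWR | O_NONBLOCK);
    if(fd < 0)
    {
        puts("Unable to open target file");
        return;
    }

    page_size = getpagesize();
    printf("page size is %d\n", page_size);
    // Map the first page of the file; the file must not be empty
    p = (char*)mmap(0, page_size,
                    PROT_READ | PROT_WRITE, MAP_SHARED,
                    fd, 0);
    // mmap reports failure with MAP_FAILED, not with a null pointer
    if(p != MAP_FAILED)
    {
        puts("Dumping first 100 bytes:");
        for(int i = 0; i < 100; i++)
        {
            // Cast to unsigned char so bytes >= 0x80 are not sign-extended
            printf("%02x", (unsigned char)p[i]);
        }

        puts("\nPress ENTER to pad garbage data...");
        getchar();
        // Overwrite the whole page with a repeating byte pattern
        puts("\nPadding with garbage data...");
        for(int i = 0; i < page_size; i++)
        {
            p[i] = (char)i;
        }
        munmap(p, page_size);
    }
    else
    {
        puts("mmap failed");
    }
    close(fd);

    puts("All done!");
}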

After the pause on the getchar() line, I can actually delete the file with
rm ~/test. This is a surprise to me. But what surprises me *more* is that
even after the file is deleted, the padding operation that follows still
succeeds without an error (I would expect a segfault). So, what the hell
is the data written to?

Can anyone explain this to me, or direct me to any helpful resources?
Thanks.

Paul Pluzhnikov

2005-01-22, 5:49 pm

neo_in_matrix@msn.com writes:

> I could not find a Linux programming group to post to


Next time, try comp.os.linux.development.apps

> However, I don't see this behavior on Linux.


You would not see this behaviour in any UNIX.
It's one of the Windows mis-features.

From "man 2 unlink":

If the name was the last link to a file but any processes still
have the file open, the file will remain in existence until the
last file descriptor referring to it is closed.

> So, what the hell is the data written to?


To the disk blocks that belonged to the file before it was removed.
The blocks are cleaned up by the kernel once your program exits.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

neo_in_matrix@msn.com

2005-01-22, 5:49 pm

Thanks for replying. It sorts out some of the mess in my mind.

Ulrich Eckhardt

2005-01-22, 5:49 pm

neo_in_matrix@msn.com wrote:
> I could not find a Linux programming group to post to (I am new to the
> *nix world).


Seems rather that you are new to the search function of your newsclient...

> I come from a Windows background. In the Windows world, as long as a file
> is memory mapped, it cannot be deleted.


Almost. In fact it can't even be (easily) deleted if it is just open. The
main reason for that is the different filesystems. On Unix and similar
systems, you have a file (i.e. an area of hard-disk space) on the one hand
and you have directory entries referencing that storage. There can be
several directory entries referencing the same storage (hardlink).
However, the storage itself exists independently from any entry in a
directory!
Now, if you open a file, the filesystem driver remembers that there is an
additional reference to the storage. It needs that, because usually when
the last hardlink to the storage is gone, it is marked as free and reused.
So, when deleting the entry in the directory, it knows the storage is
still used and defers its deletion until the file is closed.

On MS Windows/FAT, there is no such separation between the file and its
storage. I know later versions can be forced to delete a file although it
is in use, but I'd rather consider that a hack. BTW, this is also the
reason you need to reboot after upgrading software: the installer couldn't
replace a library (because it is mapped into a process), so it used a
facility that replaces it at the next reboot, when it isn't in use.

This behaviour is not specific to Linux; I believe all Unix systems have it.
[example code]

> After the pause on the getchar() line, I can actually delete the file with
> rm ~/test. This is a surprise to me. But what surprises me *more* is that
> even after the file is deleted, the padding operation that follows still
> succeeds without an error (I would expect a segfault). So, what the hell
> is the data written to?


You still had a valid handle to the storage, all operations on it are as
valid as before, but I don't think this surprises you anymore. ;)

Uli

--
http://www.erlenstar.demon.co.uk/unix/

Måns Rullgård

2005-01-22, 5:49 pm

neo_in_matrix@msn.com writes:

> I could not find a Linux programming group to post to (I am new to the
> *nix world). I think Linux shares much with Unix, so I am posting my
> question here. If anyone can direct me to one, many thanks to him/her.


The Linux specific groups are called comp.os.linux.*. Your question
is equally relevant here, though.

> I am using Fedora Core 3, which is a distro from redhat.com.
>
> I come from a Windows background. In the Windows world, as long as a file
> is memory mapped, it cannot be deleted.
>
> However, I don't see this behavior on Linux. See my program below:


[...]

> After the pause on the getchar() line, I can actually delete the file with
> rm ~/test. This is a surprise to me. But what surprises me *more* is that
> even after the file is deleted, the padding operation that follows still
> succeeds without an error (I would expect a segfault). So, what the hell
> is the data written to?


In typical Unix filesystems, a file doesn't belong to any specific
directory. It simply exists as an inode with a unique number. Any
directory can contain a reference to any inode, so the same inode can
be accessible by several different names (see the "ln" command).

The OS keeps a reference count for each inode. A normal file, not
opened by any process, has a count of 1. Each time a file is opened,
its reference count is increased by 1, and decreased when the file is
closed. Deleting a file decreases the reference count. If the
reference count reaches 0, the file is deleted.

When you deleted the file using the "rm" command, you only deleted the
reference to it being stored in the "test" entry of your home
directory. However, your process was still holding a reference to the
inode, so the physical inode was not removed from the disk. What you
got was an unnamed file, only accessible through your open file
descriptor. As soon as you closed that file descriptor, the inode was
physically deleted from the filesystem.

--
Måns Rullgård
mru@inprovide.com

Rich Gibbs

2005-01-23, 2:48 am

neo_in_matrix@msn.com said the following, on 01/22/05 15:54:
> I could not find a Linux programming group to post to (I am new to the
> *nix world). I think Linux shares much with Unix, so I am posting my
> question here. If anyone can direct me to one, many thanks to him/her.
>
> I am using Fedora Core 3, which is a distro from redhat.com.
>
> I come from a Windows background. In the Windows world, as long as a file
> is memory mapped, it cannot be deleted.
>
> However, I don't see this behavior on Linux. See my program below:

[program snipped]
> After the pause on the getchar() line, I can actually delete the file with
> rm ~/test. This is a surprise to me. But what surprises me *more* is that
> even after the file is deleted, the padding operation that follows still
> succeeds without an error (I would expect a segfault). So, what the hell
> is the data written to?
>
> Can anyone explain this to me, or direct me to any helpful resources?
> Thanks.
>


This is a consequence of how files "work" in most native Unix/Linux
filesystems. A file exists as a collection of disk blocks pointed to by
a structure called an 'inode'. (Directories contain filenames with
pointers to inodes.) For every file/inode, the kernel maintains a
reference count. Conceptually, the count is incremented by 1 when a
hard link (directory entry) is made to the file, or when the file is
opened by a process. The count is decremented when a link is removed,
or when the file is closed by a process.

A file is only removed when the reference count goes to zero, which
obviously will not happen as long as any process has the file open.

You will sometimes see Unix/Linux applications that create and open a
temporary file, then immediately remove it (or, actually, remove its
directory entry). This works because the file will still be there as
long as the process has it open; it will be deleted automagically if the
process crashes or is killed.

--
Rich Gibbs
rgibbs@alumni.princeton.edu