What's the practical maximum file size using indexed allocation (inodes)?
| Olumide 2006-02-23, 2:54 am |
| Hi -
'Been reading a few texts - Operating System Concepts [5ed] by
Silberschatz and Galvin, and Operating Systems: A Modern
Perspective [2ed] by Gary Nutt, and according to the latter text (page
427), "current versions of BSD UNIX do not use the triple indirect
pointer ... partly because the 32-bit addresses used in the file system
precludes file sizes larger than 2Gb".
The former text shares the same/similar view and states (on page 380)
that: "the number of blocks that can be allocated to a file exceeds the
amount of space addressable by the 4-byte file pointers ..."
Does this mean that the theoretical maximum file size of approx 16GB
(assuming 1KB disk blocks) cannot be achieved on a 32-bit system? ...
I'm trying to get my mind round this, and this is what I've come up
with so far:
First of all, I don't see the read or write system calls failing since
they return the number of bytes read or written per call. However,
lseek would be a problem because lseek returns the position of the
read/write pointer of the file descriptor - the maximum value of which
is limited by off_t (dunno what this is; a long int?).
Anyway, because the read and write system calls also use this
"pointer", the size of off_t determines the practical maximum
file size on the system in question.
This is my reasoning. Does it make sense? Is there any other reason why
the theoretical maximum file size is unobtainable?
Thanks,
- Olumide
| |
| Alexei A. Frounze 2006-02-23, 2:54 am |
|
I think the reason could be different. Files can be mapped into memory --
that's very handy, and modern OSes usually support it and benefit from such
a feature themselves. But files whose size exceeds the size of the accessible
address space can't be mapped in whole. That could be the reason why. At least,
this reason is more reasonable than sizeof(int)...
Alex
| |
| Gordon Burditt 2006-02-26, 10:15 am |
| >'Been reading a few texts - Operating System Concepts [5ed] by
>Silberschatz and Galvin, and Operating Systems: A Modern
>Perspective [2ed] by Gary Nutt, and according to the latter text (page
>427), "current versions of BSD UNIX do not use the triple indirect
>pointer ... partly because the 32-bit addresses used in the file system
>precludes file sizes larger than 2Gb".
FreeBSD has used 64-bit file offsets for quite some time. Whether
or not it actually uses triple indirect pointers, I don't know, but
you can get some *very* big files that won't be handled by only
double-indirect. And if you're willing to deal with files that
have unallocated holes in them, you can have files big enough to
need a triple-indirect block that fit *on a floppy*.
>The former text shares the same/similar view and states (on page 380)
>that: "the number of blocks that can be allocated to a file exceeds the
>amount of space addressable by the 4-byte file pointers ..."
>Does this mean that the theoretical maximum file size of approx 16GB
>(assuming 1KB disk blocks) cannot be achieved on a 32-bit system? ...
What's a "32-bit system"? A 32-bit int, or a 32-bit long, does not
imply a 32-bit off_t. FreeBSD running on an ia32 (Pentium) processor
is generally considered 32-bit, but that's not what it uses for the
file system. MS-DOS ran on a 16-bit system (8086 processor) but
it was never limited to a maximum file size of 64K.
>I'm trying to get my mind round this, and this is what I've come up
>with so far:
>
>First of all, I don't see the read or write system calls failing since
>they return the number of bytes read or written per call. However,
>lseek would be a problem because lseek returns the position of the
>read/write pointer of the file descriptor - the maximum value of which
>is limited by off_t (dunno what this is; a long int?).
off_t (64 bits) is bigger than a long int (32 bits) on FreeBSD on ia32.
Also remember that for lseek, off_t needs to be *signed*.
>Anyway, because the read and write system calls also use this
>"pointer", the size of off_t determines the practical maximum
>file size on the system in question.
>This is my reasoning. Does it make sense? Is there any other reason why
>the theoretical maximum file size is unobtainable?
Yes. There may be insufficient addressing in the hardware devices,
drivers, or controllers. For example, a decade or two ago there
were problems with hard disks having more than 1024 cylinders because
the controller hardware didn't have enough bits in the registers.
The number of bytes you can put in a SCSI command can be a limitation.
I wrote this silly program on FreeBSD 6.0, and ran it. It seeks
1T into the file, writes one byte, seeks another 1T into the file,
and writes one byte, repeat until it fails. It failed with
write: file too large
% ls -lsh /tmp/big
6160 -rw-rw-r-- 1 root wheel 128T Feb 23 13:23 /tmp/big
% ls -ls /tmp/big
6160 -rw-rw-r-- 1 root wheel 140737488355456 Feb 23 13:23 /tmp/big
%
Let's see, this file takes up 6160K of actual disk space. The
filesystem block size is 16K. There were 128 writes of 1 byte, but
each write occupies one data, one single-indirect, and one
double-indirect (filesystem) block. That leaves 16K left over for
a triple-indirect block, which is exactly what was used.
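#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
    int fd;
    off_t offset;
    ssize_t ret;

    /* Build 1T step by step in an off_t so nothing overflows an int. */
    offset = 1024;   /* 1K */
    offset *= 1024;  /* 1M */
    offset *= 1024;  /* 1G */
    offset *= 1024;  /* 1T */

    fd = open("big", O_WRONLY | O_CREAT, 0666);
    if (fd < 0)
    {
        perror("open");
        exit(1);
    }
    /* Seek 1T into the file before the first write. */
    lseek(fd, offset, SEEK_SET);
    while (1)
    {
        /* Write one byte; stop at the first failure. */
        ret = write(fd, "A", 1);
        if (ret < 0)
        {
            perror("write");
            break;
        }
        /* Skip another 1T forward, leaving a hole behind. */
        if (lseek(fd, offset, SEEK_CUR) < 0)
        {
            perror("lseek");
            break;
        }
    }
    close(fd);
    exit(0);
}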
Gordon L. Burditt
| |
| Olumide 2006-02-26, 10:16 am |
|
Gordon Burditt wrote:
> FreeBSD has used 64-bit file offsets for quite some time. Whether
> or not it actually uses triple indirect pointers,
> ...
> off_t (64 bits) is bigger than a long int (32 bits) on FreeBSD on a ia32.
> Also remember that for lseek, off_t needs to be *signed*.
>
>
>
> Yes. There may be insufficient addressing in the hardware devices,
> drivers, or controllers.
Thanks for your reply Gordon. Just to confirm, are you saying that the
practical maximum size of a file is determined by the size of the
read/write pointer, i.e. off_t?
One other issue: According to the man page, "lseek returns the
resulting offset location as measured in bytes from the beginning of
the file." If this is true, then I suspect that off_t need not be
signed.
- Olumide
| |
| Måns Rullgård 2006-02-26, 10:16 am |
| "Olumide" <50295@web.de> writes:
> One other issue: According to the man page, "lseek returns the
> resulting offset location as measured in bytes from the beginning of
> the file." If this is true, then I suspect that off_t need not be
> signed.
Hint: SEEK_CUR
--
Måns Rullgård
mru@inprovide.com
| |
| Gordon Burditt 2006-02-26, 10:16 am |
| >Thanks for your reply Gordon. Just to confirm, are you saying that the
>practical maximum size of a file is determined by the size of the read
>write pointer i.e. off_t?
Among other possible limits. There are others (in combination)
which could make it SMALLER, such as:
    The number of bits used in a block number, combined with block size.
    (FreeBSD seems to be using AT LEAST 33 bits for the block number)
    The maximum size storage device available.
    The number of bits available in a SCSI command for the block number.
    etc.
>One other issue: According to the man page, "lseek returns the
>resulting offset location as measured in bytes from the beginning of
>the file." If this is true, then I suspect that off_t need not be
>signed.
Read the part about the second argument of lseek(), in conjunction
with a third argument of SEEK_CUR or SEEK_END. The offset is being
used as a signed number in that situation.
Gordon L. Burditt
| |
| Maxim S. Shatskih 2006-02-26, 10:16 am |
| > Thanks for your reply Gordon. Just to confirm, are you saying that the
> practical maximum size of a file is determined by the size of the read
> write pointer i.e. off_t?
Correct. The older UNIXen had the 4GB limit only due to using 32bit types for
off_t, and 32bit type for a "file size" field in the on-disk metadata. Nothing
more.
--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
| |
| Jordan Abel 2006-02-26, 10:16 am |
| On 2006-02-24, Gordon Burditt <gordonb.l20vx@burditt.org> wrote:
>
> Where is that supposed to be defined? I don't see it in
> any source file for FreeBSD (other than as part of some
> symbols of the form *_TOO_LARGE, mostly in openssl ).
>
>
> No such manual page.
>
> Where can I buy storage, cheap, that needs more than 64 bits for
> the length of the file? And what do I need it for? Archiving
> the entire contents of Google hourly?
I bet he uses Linux. Linux has traditionally maintained two separate and
parallel APIs, one in which off_t is a 'long' (32 bits on a 32-bit system),
the other in which it's 64 bits. llseek is one of the underlying system
calls for the latter mode, and it is also sometimes misused (where
lseek64 _should_ be used) to be able to access files in "large file"
mode in the former. O_LARGEFILE (misspelled above as O_LARGE) is a bit
used internally by open64() which is also sometimes misused in the same
way.
You can apparently #define _FILE_OFFSET_BITS 64 at the top of your
source file to get all the off64_t crap to be transparently used, but
avoid use of any libraries [other than glibc itself, which magically
knows the difference, or, more likely, has _all_ functions that would
use an off_t replaced with alternate versions] that do anything
involving off_t if you do that. There's probably a way around this, but
it gets progressively less sane.
llseek itself takes two longs, high-order first - which I suppose are
probably converted from a long long or "loff_t" in the userspace
version.
I assume this was done because they didn't know better at first, and
wanted to maintain binary compatibility later.
In conclusion, llseek and O_LARGEFILE are for idiots who don't know how
to _really_ use the largefile interface.
| |
| Olumide 2006-02-26, 10:16 am |
|
Maxim S. Shatskih wrote:
>
> Correct. The older UNIXen had the 4GB limit only due to using 32bit types for
> off_t, and 32bit type for a "file size" field in the on-disk metadata. Nothing
> more.
Erm ... don't you mean 2GB as off_t is ... erm .. signed?
| |
| Olumide 2006-02-26, 10:16 am |
| Thanks everyone!
While we're on the subject of file systems, I would like to ask about
hard links.
(1) I know they can only refer to data on the same
volume/filestore/partition. The question is why? My reasoning (and I
may have read this somewhere a long time ago) is that because each
volume/filestore/partition has a list of inodes numbered from 0 or 1 to
whatever, and linking a target name merely creates a new directory
entry that points to the same inode (number 821 for example) as the
source, and because it is possible to have more than 1
volume/filestore/partition available, restricting hard links to inodes
on the same volume/filestore/partition as the source is the only
option, right? (After all, each volume/filestore/partition has its own
inode 821.)
(2) Why can't directories be hard-linked to? After all, this is what
the OS does when it automatically creates the entries "." and ".."
- Olumide
| |
| Gordon Burditt 2006-02-26, 10:16 am |
| >While we're on the subject of file systems, I would like to ask about
>hard links.
>
>(1) I know they can only refer to data on the same
>volume/filestore/partition. The question is why? My reasoning (and I
How do you refer to data on another filesystem, assuming that there
was such a field present along with the inode number (dev_t, perhaps?
The pair (dev_t, inode) is supposed to be unique in the system.
Not sure what NFS does for the dev_t of a mounted volume, though.)
You have a problem here that filesystems (both the referring and
referred-to filesystems) aren't always mounted in the same place
(or even the same system), and that filesystems are sometimes on
removable media.
Note also that sometimes if you plug in another disk drive, some
of the others get renumbered.
>may have read this somewhere a long time ago) is that because each
>volume/filestore/partition has a list of inodes numbered from 0 or 1 to
>whatever, and linking a target name merely creates a new directory
>entry that points to the same inode (number 821 for example) as the
>source, and because it is possible to have more than 1
>volume/filestore/partition available, restricting hard links to inodes
>on the same volume/filestore/partition as the source is the only
>option, right? (After all, each volume/filestore/partition has its own
>inode 821.)
It's NOT the only option, but how do you reasonably refer to another
volume when you don't know if or where it's mounted?
>(2) Why can't directories be hard-linked to? After all, this is what
>the OS does when it automatically creates the entries "." and ".."
In UNIX V7, you could. mkdir() as a system call didn't exist.
And you could make an awful mess where, for example:
/
/.
/./.
/..
/../.
/./..
referred to 6 different directories and the last 5 were reachable
*ONLY* by the paths given. Other than neat ways for viruses
to hide stuff, I don't see what use hard-linking to directories has.
And where .. in such a directory is supposed to point is problematical.
Gordon L. Burditt
| |
| Brian Raiter 2006-02-26, 10:16 am |
| > (2) Why can't directories be hard-linked to? After all, this is what
> the OS does when it automatically creates the entries "." and ".."
They can, but you need root access. Basically, your average Unix
system back then was not equipped to handle arbitrary loops in the
directory hierarchy intelligently. Special code handles "..", but
other circular structures could lead to infinite loops. So, the idea
was that only the superuser should be trusted not to introduce such
structures haphazardly.
b
| |
| Maxim S. Shatskih 2006-02-26, 10:16 am |
| > option, right? (After all, each volume/filestore/partition has its own
> inode 821.)
Correct, and so, allowing cross-volume hardlinks would require storing the
volume name in dirent together with the inode number.
> (2) Why can't directories be hard-linked to? After all, this is what
> the OS does when it automatically creates the entries "." and ".."
Because this pollutes the notion of the "parent directory" - which directory is
the parent - link1 or link2?
This also can break any tool which does directory recursion.
--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
| |
| Olumide 2006-02-26, 10:16 am |
|
Gordon Burditt wrote:
>
> In UNIX V7, you could. mkdir() as a system call didn't exist.
> And you could make an awful mess where, for example:
> /
> /.
> /./.
> /..
> /../.
> /./..
> referred to 6 different directories and the last 5 were reachable
> *ONLY* by the paths given. Other than neat ways for viruses
> to hide stuff, I don't see what use hard-linking to directories has.
> And where .. in such a directory is supposed to point is problematical.
I'm sorry but I don't get the point you're trying to get across.
| |
| Gordon Burditt 2006-02-26, 10:16 am |
| >> >(2) Why can't directories be hard-linked to? After all, this is what
>
>I'm sorry but I don't get the point you're trying to get across.
>
Hard linking directories can make an AWFUL mess of a filesystem
and serves no useful purpose. That's a good reason not
to allow it.
Gordon L. Burditt
| |
| Olumide 2006-02-26, 10:16 am |
| Gordon Burditt wrote:
>
> Hard linking directories can make an AWFUL mess of a filesystem
> and serves no useful purpose. That's a good reason not
> to allow it.
I suppose soft links to directories are less messy then? Right?
(scratches head)
| |
| Maxim S. Shatskih 2006-02-26, 10:16 am |
| > I suppose soft links to directories are less messy then? Right?
Yes, tools that recurse through directories just ignore softlinks. Also,
softlinks have no problem crossing volumes.
--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
| |
|
| Olumide wrote:
> Gordon Burditt wrote:
>
>
> I suppose soft links to directories are less messy then? Right?
> (scratches head)
It is conceptually different, although the effect may be the same (if
the target exists). Two hardlinked entities in a filesystem are
indistinguishable, while symlinks have a directedness. Analogies
include duplicate A records versus CNAMEs in DNS, or HTTP Redirect
versus ServerAliases... Each mechanism has its distinct purposes.
| |
| Olumide 2006-02-26, 5:51 pm |
|
Olumide wrote:
> While we're on the subject of file systems, I would like to ask about
> hard links.
>
> (1) I know they can only refer to data on the same
> volume/filestore/partition. ...
Just to confirm, is this correct? Are *NIX directory entries limited to
inodes on the same volume OR partition OR filestore? (I'm not too sure
about the partition bit. I guess directory entries will be limited, if
each partition numbered its inodes from 0 ...)
| |
| Jordan Abel 2006-02-26, 5:51 pm |
| On 2006-02-26, Logan Shaw <lshaw-usenet@austin.rr.com> wrote:
> Jordan Abel wrote:
>
> Just a data point: on Solaris 8 (and presumably later versions as well
> since Sun is anal-retentive about keeping interface compatibility at
> both the binary and source levels[1]), the documentation says that
> calling open() with O_LARGEFILE is equivalent to calling open64(),
> which to me indicates that using O_LARGEFILE with open() is kosher.
> Solaris also maintains a parallel set of APIs, so that 32-bit
> applications can use either 32-bit or 64-bit file offsets, so Linux
> isn't unique in that regard. (I don't even think Linux was first,
> but I can't remember.)
>
> - Logan
>
> [1] which, by the way, is a good thing in many cases
I won't disagree that binary compatibility is good - but off_t should
have been 64-bit to start with. On FreeBSD, off_t has NEVER been less
than 64 bits.
| |
| John S. Dyson 2006-02-27, 2:48 am |
| In article <slrne042du.1qfk.random832@random.yi.org>,
Jordan Abel <random832@gmail.com> writes:
> On 2006-02-26, Logan Shaw <lshaw-usenet@austin.rr.com> wrote:
>
> I won't disagree that binary compatibility is good - but off_t should
> have been 64-bit to start with. On FreeBSD, off_t has NEVER been less
> than 64 bits.
>
Actually, the API for FreeBSD V1.X had 32 bit lseek arguments. FreeBSD V2.X
had the proper (64bit) offset api, but wasn't fully implemented until about
V2.2.X... (I know, I wrote a lot of the lower level infrastructure.)
John
| |
| Jordan Abel 2006-02-27, 2:48 am |
| On 2006-02-27, John S. Dyson <toor@iquest.net> wrote:
> In article <slrne042du.1qfk.random832@random.yi.org>,
> Jordan Abel <random832@gmail.com> writes:
> Actually, the API for FreeBSD V1.X had 32 bit lseek arguments. FreeBSD V2.X
> had the proper (64bit) offset api, but wasn't fully implemented until about
> V2.2.X... (I know, I wrote a lot of the lower level infrastructure.)
How far back does the cvsweb go, in terms of what version of FreeBSD? I
traced off_t through all the headers it was in, and it was never
typedef'd to anything other than long long or int64_t.
| |
| Maxim S. Shatskih 2006-02-27, 8:48 pm |
| > Just to confirm, is this correct. Are *NIX directory entries limited to
> inodes on the same volume
Yes.
--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
| |
| Maxim S. Shatskih 2006-03-03, 6:42 pm |
| > And BTW, in the Unix world at least, the terms "volume", "partition",
> "drive", and "filesystem" refer to potentially divergent things.
Aren't "volume" and "filesystem" synonyms? Yes, "drive" and "partition" are
different things, but "volume" and "filesystem"?
--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
maxim@storagecraft.com
http://www.storagecraft.com
| |
| Brian Inglis 2006-03-03, 6:42 pm |
| On Wed, 1 Mar 2006 23:47:02 +0300 in comp.unix.internals, "Maxim S.
Shatskih" <maxim@storagecraft.com> wrote:
>
>Aren't "volume" and "filesystem" synonyms? Yes, "drive" and "partition" are
>different things, but "volume" and "filesystem"?
Used to be a volume was the media in a drive; a partition was a
subdivision of a volume, and a filesystem could be written in a
partition.
Now, a logical volume can span multiple partitions in various ways,
and a filesystem can be written in a logical volume.
--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
Brian.Inglis@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply
| |
| Valentin Nechayev 2006-04-30, 1:25 am |
| Mon, Feb 27, 2006 at 05:11:43, random832 (Jordan Abel) wrote about "Whats the practical maximum file size using indexed allocation (I nodes)":
JA> How far back does the cvsweb go, in terms of what version of freebsd? I
JA> traced off_t through all the headers it was in, and it was never
JA> typedef'd to anything other than long long or int64_t.
Current FreeBSD CVS repository doesn't contain code for versions
before 2.0.0 due to licensing reasons (BSD<->AT&T suit; FreeBSD 1.*
was built on Net/2, while FreeBSD 2.0 was built on Lite1).
-netch-