Find command
| Dear All,
I have a directory containing more than 300,000 files.
I am searching for files in this directory, but the find command takes too
long. Is there another command that can give me the files more
quickly?
Why does the find command take so long?
Cheers
| |
| Chris F.A. Johnson 2004-12-03, 6:11 pm |
| On 2004-12-03, mika wrote:
> Dear All,
>
> I have a directory containing more than 300,000 files.
> I am searching for files in this directory, but the find command takes too
> long. Is there another command that can give me the files more
> quickly?
If the files are all in the same directory, you don't need find.
> Why does the find command take so long?
Show us the command you are using, and perhaps we can help.
--
Chris F.A. Johnson http://cfaj.freeshell.org/shell
===================================================================
My code (if any) in this post is copyright 2004, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License
| |
| Michael Tosch 2004-12-04, 7:47 am |
| In article <5573513d.0412031238.3be77f9d@posting.google.com>, akim_ziadi@hotmail.com (mika) writes:
> Dear All,
>
> I have a directory containing more than 300,000 files.
> I am searching for files in this directory, but the find command takes too
> long. Is there another command that can give me the files more
> quickly?
> Why does the find command take so long?
>
> Cheers
Maybe the following is not the main problem here,
but I want to share my observations with Solaris find.
Doing a
truss -t stat,lstat find . >/dev/null
the native Solaris find does
lstat64(".", 0xFFBEE108) = 0
lstat64("file2", 0xFFBEDFE0) = 0
lstat64("file1", 0xFFBEDFE0) = 0
lstat64("dir", 0xFFBEDFE0) = 0
lstat64("file4", 0xFFBEDEB8) = 0
lstat64("file3", 0xFFBEDEB8) = 0
lstat64("dir2", 0xFFBEDEB8) = 0
lstat64("file5", 0xFFBEDD90) = 0
lstat64("file6", 0xFFBEDD90) = 0
lstat64(".", 0xFFBEDEB8) = 0
lstat64(".", 0xFFBEDFE0) = 0
and output is
.
./file2
./file1
./dir
./dir/file4
./dir/file3
./dir/dir2
./dir/dir2/file5
./dir/dir2/file6
while GNU find does only
lstat64(".", 0xFFBEEA10) = 0
lstat64(".", 0xFFBEE918) = 0
lstat64("file2", 0xFFBEE7B0) = 0
lstat64("file1", 0xFFBEE7B0) = 0
lstat64("dir", 0xFFBEE7B0) = 0
lstat64("file4", 0xFFBEE648) = 0
lstat64("file3", 0xFFBEE648) = 0
lstat64("dir2", 0xFFBEE648) = 0
and output is the same:
.
./file2
./file1
./dir
./dir/file4
./dir/file3
./dir/dir2
./dir/dir2/file5
./dir/dir2/file6
This makes GNU find faster than the Solaris find,
especially if there are many files on the deepest
directory level.
--
Michael Tosch
IT Specialist
Managed Services Germany
Hewlett-Packard GmbH
Phone: +49 2407 575 313
Mail: michael.tosch:hp.com
| |
| Barry Margolin 2004-12-04, 6:03 pm |
| In article <cosh1k$l68$1@aken.eed.ericsson.se>,
eedmit@NO.eed.SPAM.ericsson.PLS.se (Michael Tosch) wrote:
> Maybe the following is not the main problem here,
> but I want to share my observations with Solaris find.
That's really confusing. How can GNU find work properly if it doesn't
stat file5 and file6? How does it know they're not directories that it
should recurse into?
>
> Doing a
>
> truss -t stat,lstat find . >/dev/null
>
> the native Solaris find does
> lstat64(".", 0xFFBEE108) = 0
> lstat64("file2", 0xFFBEDFE0) = 0
> lstat64("file1", 0xFFBEDFE0) = 0
> lstat64("dir", 0xFFBEDFE0) = 0
> lstat64("file4", 0xFFBEDEB8) = 0
> lstat64("file3", 0xFFBEDEB8) = 0
> lstat64("dir2", 0xFFBEDEB8) = 0
> lstat64("file5", 0xFFBEDD90) = 0
> lstat64("file6", 0xFFBEDD90) = 0
> lstat64(".", 0xFFBEDEB8) = 0
> lstat64(".", 0xFFBEDFE0) = 0
> and output is
> .
> ./file2
> ./file1
> ./dir
> ./dir/file4
> ./dir/file3
> ./dir/dir2
> ./dir/dir2/file5
> ./dir/dir2/file6
>
> while GNU find does only
> lstat64(".", 0xFFBEEA10) = 0
> lstat64(".", 0xFFBEE918) = 0
> lstat64("file2", 0xFFBEE7B0) = 0
> lstat64("file1", 0xFFBEE7B0) = 0
> lstat64("dir", 0xFFBEE7B0) = 0
> lstat64("file4", 0xFFBEE648) = 0
> lstat64("file3", 0xFFBEE648) = 0
> lstat64("dir2", 0xFFBEE648) = 0
> and output is the same:
> .
> ./file2
> ./file1
> ./dir
> ./dir/file4
> ./dir/file3
> ./dir/dir2
> ./dir/dir2/file5
> ./dir/dir2/file6
>
> This makes GNU find faster than the Solaris find,
> especially if there are many files on the deepest
> directory level.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
| |
| Michael F Gordon 2004-12-04, 6:03 pm |
| Barry Margolin <barmar@alum.mit.edu> writes:
>That's really confusing. How can GNU find work properly if it doesn't
>stat file5 and file6? How does it know they're not directories that it
>should recurse into?
It looks at the link count on directories; from the link count of dir2
it knows that it has no subdirectories so it doesn't bother looking for
them. Generally it stops calling stat() when it's seen nlink-2 directories
(to account for . and ..)
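The link-count rule can be checked by hand. What follows is only a rough sketch, not GNU find's actual code: the function names are made up for illustration, and it assumes a filesystem that follows the traditional Unix convention (a directory has one link from its parent, one from its own ".", and one from the ".." of each subdirectory). Filesystems that ignore the convention report some other value, so the sketch prints "unknown" when the count is below 2.

```shell
# Hard-link count of a directory: the second field of `ls -ld`.
dir_nlink() {
    ls -ld "$1" | awk '{ print $2 }'
}

# Estimated number of immediate subdirectories, using the nlink - 2 rule.
# Prints "unknown" when the count is missing or below 2 (assumption: that
# is how a filesystem without the convention shows up here).
subdir_estimate() {
    n=$(dir_nlink "$1")
    if [ "$n" -ge 2 ] 2>/dev/null; then
        echo $((n - 2))
    else
        echo unknown
    fi
}

# Actual number of immediate subdirectories, found by testing every entry.
# This is the per-entry work that the link-count shortcut avoids.
subdir_actual() (
    cd "$1" || exit 1
    count=0
    for entry in * .[!.]* ..?*; do
        if [ -d "$entry" ] && [ ! -L "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
)
```

On a conventional filesystem, subdir_estimate and subdir_actual should print the same number for the same directory; where they disagree, -noleaf is probably needed.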
Michael
--
Quidquid latine dictum sit, altum viditur.
| |
| Barry Margolin 2004-12-05, 2:48 am |
| In article <cot823$qdi$1@scotsman.ed.ac.uk>,
Michael F Gordon <mfg@ee.ed.ac.uk> wrote:
> Barry Margolin <barmar@alum.mit.edu> writes:
>
> It looks at the link count on directories; from the link count of dir2
> it knows that it has no subdirectories so it doesn't bother looking for
> them. Generally it stops calling stat() when it's seen nlink-2 directories
> (to account for . and ..)
Cute optimization, although it seems like it would not often be helpful.
I suspect that most find expressions include file attributes like -type
or -mtime, which require calling lstat() on everything. The
optimization can only be used when there's no expression, or the
expression only matches on the filename.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
| |
| Stephane CHAZELAS 2004-12-05, 7:50 am |
| 2004-12-04, 23:08(-05), Barry Margolin:
[...]
> Cute optimization, although it seems like it would not often be helpful.
> I suspect that most find expressions include file attributes like -type
> or -mtime, which require calling lstat() on everything.
Unless there's a -name '*.thistypeonly'. That's why it's better
to put the predicates that don't involve stat first:
find . -name '*.txt' -type f
would be faster than:
find . -type f -name '*.txt'
Note that this optimization can be a problem on file systems
that don't follow the Unix link-count convention (Microsoft ones
or CD-ROMs, for instance); that's why there's a -noleaf option
to disable it.
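The two orderings select exactly the same files; only the cost differs. Here is a small sketch for checking that on your own tree before timing them (the function names are made up for illustration). One caveat: newer GNU find releases document that name-only tests are moved to the front by default, so on those the timings may not differ at all.

```shell
# Name test first: entries that fail the cheap -name test never reach -type.
name_first() {
    find "$1" -name '*.txt' -type f
}

# Type test first: every entry is checked for its type before its name.
type_first() {
    find "$1" -type f -name '*.txt'
}

# Succeeds when both orderings report the same set of paths.
same_matches() {
    [ "$(name_first "$1" | sort)" = "$(type_first "$1" | sort)" ]
}
```

To compare the cost, run each find command under time with the output sent to /dev/null, a few times each so that the directory is already cached.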
--
Stephane
| |
| Villy Kruse 2004-12-06, 2:47 am |
| On 3 Dec 2004 20:49:16 GMT,
Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
> On 2004-12-03, mika wrote:
>
> If the files are all in the same directory, you don't need find.
>
It will still take a long time, as looking up a file name requires
reading through the directory from the start until the file is found.
Some programs use a tree structure to avoid this problem; see,
for example, the terminfo directory or the cache directory for
squid. A forest structure stores the files in subdirectories
derived from the first few characters of the file name; for example,
the file "abcdef" is stored as "a/b/c/abcdef".
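A minimal sketch of that mapping in POSIX sh follows. The function name shard_path and the default depth of 3 are only illustrative assumptions; squid and terminfo each use their own layout.

```shell
# Print the sharded relative path for a file name, using one directory
# level per leading character (default depth 3, as in "a/b/c/abcdef").
# The body runs in a subshell so its variables do not leak to the caller.
# Caveat: a name starting with "." would yield a "." directory component.
shard_path() (
    name=$1
    depth=${2:-3}
    prefix=
    rest=$name
    i=0
    while [ "$i" -lt "$depth" ] && [ -n "$rest" ]; do
        tail=${rest#?}            # everything after the first character
        char=${rest%"$tail"}      # the first character itself
        prefix=$prefix$char/
        rest=$tail
        i=$((i + 1))
    done
    printf '%s%s\n' "$prefix" "$name"
)
```

To move a file into place one could then do something like: p=$(shard_path "$f"); mkdir -p "${p%/*}"; mv "$f" "$p". Names shorter than the depth simply get fewer levels.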
Villy
| |
| Stephane CHAZELAS 2004-12-06, 2:47 am |
| 2004-12-3, 20:49(+00), Chris F.A. Johnson:
> On 2004-12-03, mika wrote:
>
> If the files are all in the same directory, you don't need find.
But find, unlike ls or shell globbing, doesn't sort the
filenames, so it may be faster (except that it stat(2)s every
file in the current directory).
ls -f
may be useful.
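For instance, something along these lines lists a single big directory in directory order and filters the names with grep. It is only a sketch: the function name is made up, it assumes no file name contains a newline, and note that ls -f also lists "." and ".." and dot files.

```shell
# List the entries of directory $1 unsorted and keep only the names
# matching the basic regular expression $2.
list_unsorted() {
    ls -f "$1" | grep -e "$2"
}
```

Usage would be something like: list_unsorted /path/to/bigdir '\.log$'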
>
> Show us the command you are using, and perhaps we can help.
Yes, it would help.
--
Stephane