grep on all files with dedicated suffix in a directory?
Hi all,
I am new to Unix.
I want to grep for the string "SCJ" in all files with a given suffix (*.txt
files) in a directory.
Suppose I cd to the directory and type "grep -r SCJ *.txt". But
it shows no match. Please give me some suggestions.
BTW, my environment is Solaris.
Thanks in advance!
Best regards,
Davy
| |
| manish 2006-11-14, 1:38 am |
|
you could use
find . -name *.txt -exec grep "SCJ" ' { } ' \ ;
Regards,
Manish.
| |
| rooksmith 2006-11-14, 1:38 am |
Maybe I don't understand the question -- why can't you just use:
grep SCJ *.txt
Did you want to search the entire tree (recursively)?
If so, I think the find command is your friend.
| |
| victorfeng1973@yahoo.com 2006-11-14, 1:38 am |
|
You may try this if the one above does not work well
find . -type f -exec grep 'diag' {} + | grep txt
Victor
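Victor's pipeline greps every regular file and then keeps only the output *lines* that contain "txt", whether that text comes from the file name or from the matched line itself (it also searches for 'diag' rather than SCJ). The sketch below uses a hypothetical scratch tree (a.txt, sub/b.txt and c.log are invented for the example) to compare it with filtering names via -name before -exec:

```shell
#!/bin/sh
# Hypothetical scratch tree; only the *.txt files should be searched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/sub"
echo 'SCJ in a text file' > "$tmp/a.txt"
echo 'SCJ nested'         > "$tmp/sub/b.txt"
echo 'SCJ mentions txt'   > "$tmp/c.log"   # the *content* contains "txt"
cd "$tmp" || exit 1

# Victor's shape (pattern changed to SCJ): grep everything, then keep any
# output line containing "txt", so c.log slips through via its content.
piped=$(find . -type f -exec grep SCJ {} + | grep txt)

# Filtering by name first: grep never opens c.log at all.
by_name=$(find . -type f -name '*.txt' -exec grep SCJ {} +)

printf 'piped:\n%s\nby name:\n%s\n' "$piped" "$by_name"
```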
| |
|
Hi rooksmith,
Yes, I want to search the entire tree recursively.
Best regards,
Davy
| |
| Martin Jørgensen 2006-11-14, 1:38 am |
"manish" <manishmodgil@gmail.com> writes:
> Davy wrote:
>
> you could use
>
> find . -name *.txt -exec grep "SCJ" ' { } ' \ ;
I never really understood the last part:
' { } ' \ ;
What exactly does it do? My guess is that the { } are substituted by the
individual .txt filenames but besides that?
Best regards
Martin Jørgensen
--
---------------------------------------------------------------------------
Home of Martin Jørgensen - http://www.martinjoergensen.dk
| |
|
|
manish wrote:
> Davy wrote:
>
> you could use
>
> find . -name *.txt -exec grep "SCJ" ' { } ' \ ;
Hi Manish,
Can you tell me what " ' { } ' \ ;" means in your command?
Thanks!
Davy
>
> Regards,
> Manish.
| |
|
|
victorfeng1973@yahoo.com wrote:
> You may try this if the one above does not work well
> find . -type f -exec grep 'diag' {} + | grep txt
>
> Victor
Hi Victor,
Can you tell me what "grep 'diag' {} + | grep txt" means? Thanks!
Best regards,
Davy
| |
| Kaz Kylheku 2006-11-14, 1:38 am |
Davy wrote:
> Suppose I have cd to the directory, and I type "grep -r SCJ *.txt".
The -r option to grep is not part of the Unix specification. It may or
may not be an option which tells grep to process directories
recursively. If the Solaris grep has such an option, similar to the one
supported by GNU grep, then it will likely only descend into those
directories which are specified on the command line. Your command line
here only contains directory entries that match the pattern *.txt,
which are likely all text files. grep will look into those files, of
course, but most likely none of them are directories it can recurse
into.
The way you'd process everything in the current directory and its
children would be:
grep -r SCJ .
But of course, this doesn't express that you only want *.txt files. GNU
grep has additional options that let you specify patterns to match
while descending into directories. You'd use it like this:
grep -r --include=*.txt SCJ .
The . tells it "search the current directory". The -r means
"recursively process subdirectories". And the --include=*.txt specifies
the pattern to apply to the directory entries. Note that the *.txt is
not protected by quotes; we are relying on the assumption that nothing
in your current directory matches the pattern --include=*.txt.
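A minimal sketch of that form, with the pattern quoted so the shell never gets a chance to expand it. It assumes GNU grep (-r and --include are not in POSIX, and the Solaris grep may lack them); the file names are hypothetical:

```shell
#!/bin/sh
# Assumes GNU grep; the sample tree is hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/docs/old"
echo 'SCJ here'   > "$tmp/docs/notes.txt"
echo 'SCJ deeper' > "$tmp/docs/old/log.txt"
echo 'SCJ other'  > "$tmp/docs/old/data.csv"
cd "$tmp" || exit 1

# '*.txt' is quoted, so grep (not the shell) matches it against each
# file name it meets while recursing from ".".
hits=$(grep -r --include='*.txt' SCJ . | sort)
printf '%s\n' "$hits"
```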
Without the --include option, or something like it, and also, arguably,
without the --exclude option, a recursive grep is crippled beyond
nearly all real-world usability. For instance, if you search a code tree
for some identifier, it will look into the wrong files, such as
debug logs, object files, documentation, etc., taking far longer than
necessary and reporting garbage along with the desired results.
The users of such a crippled recursive grep will simply resort to using
non-recursive grep in conjunction with find, like in the responses
you've been given so far.
I found this thread about the grep -r option in the OpenSolaris forums.
Looks like it's a newly requested feature.
http://www.opensolaris.org/jive/thr...=2275&tstart=-1
LOL
| |
| comp.unix.solaris@expires-on-2006-11-22.usenet 2006-11-14, 1:38 am |
On 2006-11-14, Martin Jørgensen <hotmail_spam@hotmail.com> wrote:
> "manish" <manishmodgil@gmail.com> writes:
>
> I never really understood the last part:
>
> ' { } ' \ ;
>
> What exactly does it do? My guess is that the { } are substituted by the
> individual .txt filenames but besides that?
Your guess is right. And »\;« marks the end of the command. You need
to protect the semicolon with a »\«, as it would otherwise be interpreted
by the shell as a statement separator.
Andreas.
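A small check of that point, assuming a POSIX shell and a hypothetical scratch file: a backslash, single quotes and double quotes are three equivalent ways to hand find a lone ; argument.

```shell
#!/bin/sh
# Hypothetical scratch file; the point is the quoting of ";".
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
echo 'SCJ' > "$tmp/a.txt"

# All three spellings give find the same lone ";" argument; only the way
# the shell is told "this is not a command separator" differs.
r1=$(find "$tmp" -name '*.txt' -exec grep -c SCJ {} \;)
r2=$(find "$tmp" -name '*.txt' -exec grep -c SCJ {} ';')
r3=$(find "$tmp" -name '*.txt' -exec grep -c SCJ {} ";")
echo "$r1 $r2 $r3"
```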
| |
| Bill Marcum 2006-11-14, 1:38 am |
["Followup-To:" header set to comp.unix.shell.]
On Tue, 14 Nov 2006 05:19:20 +0100, Martin Jørgensen
<hotmail_spam@hotmail.com> wrote:
>
> I never really understood the last part:
>
> ' { } ' \ ;
>
> What exactly does it do? My guess is that the { } are substituted by the
> individual .txt filenames but besides that?
>
>
There should be no space in {} or \; . The {} is substituted by the
filename and \; marks the end of the -exec arguments. If you use +
instead of \; the -exec is executed with multiple filenames. Older
versions of GNU grep don't recognize +.
--
Unless you love someone, nothing else makes any sense.
-- e.e. cummings
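Putting those corrections together, here is a sketch of Manish's command as it was presumably meant, with one more fix: *.txt is quoted so the shell cannot expand it before find sees it. Unquoted, -name receives several words as soon as the current directory holds two or more .txt files, and silently matches only one name when it holds exactly one. The scratch files are hypothetical.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/sub"
echo 'SCJ top'  > "$tmp/one.txt"
echo 'no match' > "$tmp/two.txt"
echo 'SCJ sub'  > "$tmp/sub/three.txt"
cd "$tmp" || exit 1

# One grep per file ("{}" and "\;" with no spaces). Each grep sees a
# single file, so the matching lines carry no file names.
per_file=$(find . -name '*.txt' -exec grep SCJ {} \; | sort)

# "+" batches the names into as few grep runs as possible; grep then
# sees several files and prefixes each match with its name.
batched=$(find . -name '*.txt' -exec grep SCJ {} + | sort)

printf '%s\n---\n%s\n' "$per_file" "$batched"
```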
| |
| Kaz Kylheku 2006-11-14, 1:38 am |
Davy wrote:
> Hi Manish,
>
> Can you tell me what's " ' { } ' \ ;" mean in your command?
The idiot who originally designed find, in an infinite fit of
stupidity, decided to use characters in its syntax which are also
meaningful to the shell. The vast majority of the time, find is called
from a shell which understands parentheses and semicolons, so these
characters have to be escaped.
The -exec predicate of find (execute a command for a matching file) is
followed by a variable number of arguments. Therefore, some way is
needed to indicate the end of that list of arguments. That indication
is an argument which consists of nothing but a semicolon. It must be
escaped, for instance with a backslash, because the shell will
otherwise consume it as a token indicating the end of the entire
command.
The {} notation is simply a placeholder which marks where each file
that is found should be substituted into the -exec arguments. Those
arguments then become a command which find invokes.
> My guess is that the { } are substituted by the
> individual .txt filenames but besides that?
You could do this in other ways besides -exec. The problem with -exec
is that it spawns a process for every darn file, which slows things
down.
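To see the difference in process count, the sketch below replaces grep with echo so that each printed line stands for one spawned command. It assumes a find that supports the POSIX "{} +" form (discussed later in the thread) and uses five hypothetical files.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
for n in 1 2 3 4 5; do echo SCJ > "$tmp/f$n.txt"; done

# "\;": one echo process per file, so five lines of output.
per_file=$(find "$tmp" -name '*.txt' -exec echo RUN {} \; | wc -l | tr -d ' ')

# "+": the names are packed into a single echo, so one line of output.
batched=$(find "$tmp" -name '*.txt' -exec echo RUN {} + | wc -l | tr -d ' ')

echo "per-file runs: $per_file, batched runs: $batched"
```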
If the number of '*.txt' files in the directory tree is expected to be
small, they could all fit into one big grep command line:
grep SCJ $(find . -name '*.txt')
If the number of files is large, use xargs. The xargs program reads
filenames from its standard input and assembles them into large command
lines, together with some leading arguments. So here, the grep is
called with SCJ /dev/null, followed by matching names. If the number of
names is small, only a single grep command is generated. If the names
won't fit into a single command line, xargs will generate multiple grep
command lines, as many as it takes to exhaust all of the names.
find . -name '*.txt' | xargs grep SCJ /dev/null
The /dev/null is added in case xargs feeds only a single file argument
to grep. In that case, grep behaves differently: it does not include
the name of the file among the results. The /dev/null file provides no
input to grep, but ensures that grep is working with at least two
file arguments, which causes it to print the name of the file
before each match.
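A sketch of the /dev/null effect, assuming a hypothetical tree with exactly one .txt file so that xargs hands grep a single name. Note that xargs splits its input on blanks and newlines, so names containing spaces need a different route, such as find's -exec grep SCJ /dev/null {} +.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
echo 'SCJ only file' > "$tmp/only.txt"
cd "$tmp" || exit 1

# One file in the batch: grep prints just the matching line.
bare=$(find . -name '*.txt' | xargs grep SCJ)

# /dev/null makes it two files, so grep prefixes the file name.
named=$(find . -name '*.txt' | xargs grep SCJ /dev/null)

printf '%s\n%s\n' "$bare" "$named"
```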
If you do use -exec, as recommended by Manish, you still need the
/dev/null trick. If you are searching for text in a tree of many files,
you usually need to know which ones matched!
Here is another way to use -exec, processing one directory at a time.
It's not particularly useful here, but the approach can be useful in
other situations:
find . -type d -exec sh -c 'grep SCJ /dev/null "$1"/*.txt' sh {} \;
This trick uses find to look for directories only, so what is
substituted for the {} syntax is a directory name. It reaches the inner
shell as $1 (the extra sh argument becomes its $0), which avoids
embedding {} inside a larger string, where POSIX leaves substitution
implementation-defined. Grep is called to process all matching files in
that directory. But in order to expand the *.txt pattern, we must put
the command through the shell, hence sh -c.
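A runnable sketch of the per-directory idea, assuming a POSIX sh and a hypothetical tree. Each directory is passed to the inner shell as $1 rather than being spliced into the script string:

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/a/b"
echo 'SCJ top' > "$tmp/x.txt"
echo 'SCJ a'   > "$tmp/a/y.txt"
echo 'SCJ b'   > "$tmp/a/b/z.txt"
cd "$tmp" || exit 1

# find emits directories only; the inner sh expands "$1"/*.txt, so one
# grep runs per directory. The extra "sh" becomes the inner $0. stderr is
# discarded for directories whose *.txt glob matches nothing.
hits=$(find . -type d \
         -exec sh -c 'grep SCJ /dev/null "$1"/*.txt 2>/dev/null' sh {} \; |
       sort)
printf '%s\n' "$hits"
```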
Lastly, you can do without find at all. Suppose that your directory
hierarchy only goes down three levels, and the number of files isn't
very large: not large enough to overflow the command line limit. You
can simply use shell expansion to do the search:
grep SCJ {*,*/*,*/*/*}.txt
The {...} syntax is worth knowing, although it is not part of POSIX sh,
so it needs a shell such as bash or zsh. It's a kind of Cartesian product
which causes the shell to expand the word multiple times, substituting
each of the comma-separated elements in turn as separate arguments. So
{a,b,c}x means ax bx cx, and {a,b}{1,2} means a1 a2 b1 b2. Here, that
notation serves as a shorthand for writing:
grep SCJ *.txt */*.txt */*/*.txt
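Because brace expansion is not part of POSIX sh, the sketch below runs it under bash, which it assumes is installed. set -f turns off pathname expansion, which leaves the brace-expanded words visible exactly as grep would receive them before globbing:

```shell
#!/bin/sh
# Brace expansion is a bash feature here (not POSIX sh); bash is assumed.
product=$(bash -c 'echo {a,b}{1,2}')

# With pathname expansion switched off (set -f), the braces still expand,
# showing which glob words the grep command line is built from.
words=$(bash -c 'set -f; echo grep SCJ {*,*/*,*/*/*}.txt')

echo "$product"
echo "$words"
```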
I frequently use this approach because it can be much faster than using
find. Why? Because find performs a depth-first search, whereas with
shell globbing you can go breadth first if you do it iteratively. Say I'm
searching a C language project for some identifier that is expected to
be in a .c file. First, search the top level:
grep identifier src/*.c
Then I try this, by recalling the previous command line and adding
another /*:
grep identifier src/*/*.c
Then this:
grep identifier src/*/*/*.c
At this point, grep says "no such file or directory", meaning that we
have hit the level where there are no more C files. Now we can apply a
heuristic: there are probably no more files at any lower level either.
The directory hierarchy might be considerably deeper than the level to
which we have probed it, so we'd be wasting time recursing any deeper
looking for .c files.
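The probing can be automated. Below is a minimal POSIX sh sketch of that heuristic (not Kaz's own script): it deepens the src/... glob one level at a time and stops at the first level where the glob matches nothing. The tree and the identifier foo_init are hypothetical.

```shell
#!/bin/sh
# Hypothetical tree: .c files at levels 1 and 2, then only non-C files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/src/lib/deep/deeper"
echo 'int foo_init(void);'              > "$tmp/src/main.c"
echo 'int foo_init(void) { return 0; }' > "$tmp/src/lib/foo.c"
echo '<junk/>'                          > "$tmp/src/lib/deep/deeper/big.xml"
cd "$tmp" || exit 1

probe() {
    stars=                        # "" -> src/*.c, "*/" -> src/*/*.c, ...
    level=1
    while :; do
        set -- src/${stars}*.c    # unquoted, so the shell globs it
        [ -e "$1" ] || break      # no match: the glob stayed literal
        grep -l foo_init "$@"     # list files containing the identifier
        stars="${stars}*/"
        level=$((level + 1))
    done
    echo "no .c files at level $level, stopping"
}

found=$(probe)
printf '%s\n' "$found"
```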
Suppose that the identifier is found at the second level. You've saved a
lot of time. The find program will also eventually find the same match,
but before it gets there, it may have to go very deeply into some trees
where it wastes a lot of time not finding anything.
The -maxdepth argument of find is similarly useful for limiting its
depth. When a depth-first search is limited to a fixed depth, it can
be turned into a breadth-first search, by simply repeating the search
with increasing depths. This algorithm is known as "iterative
deepening".
Also, if you simply know that all the .txt files are within three
levels, then add "-maxdepth 3". Then find won't be sucked into probing
some big, deep, subtree full of big .xml files or whatever.
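A sketch of both ideas, assuming a find that supports -mindepth and -maxdepth (GNU find does) and a hypothetical tree. The loop repeats a depth-limited search one level deeper each time, reporting only the newest level, and stops at the first level with a match:

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/a/b/c/d"
echo 'SCJ level2' > "$tmp/a/hit.txt"
echo 'SCJ level5' > "$tmp/a/b/c/d/deep.txt"
cd "$tmp" || exit 1

# Iterative deepening: one depth-limited pass per level, up to level 3.
depth=1
while [ "$depth" -le 3 ]; do
    hits=$(find . -mindepth "$depth" -maxdepth "$depth" -name '*.txt' \
                -exec grep -l SCJ {} +)
    [ -n "$hits" ] && break
    depth=$((depth + 1))
done
echo "depth $depth: $hits"

# Plain depth limit: never look below three levels, so deep.txt is skipped.
limited=$(find . -maxdepth 3 -name '*.txt' -exec grep -l SCJ {} +)
echo "limited: $limited"
```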
| |
| comp.unix.solaris@expires-on-2006-11-22.usenet 2006-11-14, 7:32 am |
On 2006-11-14, Kaz Kylheku <kkylheku@gmail.com> wrote:
> The idiot who originally designed find, [...]
You are talking about Dick Haight who belonged to the team that developed
the so-called Programmer's Workbench in the 70s. And before you call
him names, you should consider that the shell (at that time still the
old shell by John Mashey which predates the Bourne shell) was developed
in parallel.
Andreas.
| |
| manish 2006-11-14, 7:32 am |
|
Davy wrote:
> manish wrote:
> Hi Manish,
>
> Can you tell me what's " ' { } ' \ ;" mean in your command?
>
The ' { } ' part specifies that grep "SCJ" is run on each of the files
returned by the find command, and \ indicates the end of the grep
command. The semicolon follows to end the statement.
| |
| Martin Jørgensen 2006-11-14, 1:16 pm |
Bill Marcum <bmarcum@iglou.com> writes:
> ["Followup-To:" header set to comp.unix.shell.]
> On Tue, 14 Nov 2006 05:19:20 +0100, Martin Jørgensen
> <hotmail_spam@hotmail.com> wrote:
> There should be no space in {} or \; . The {} is substituted by the
> filename and \; marks the end of the -exec arguments. If you use +
> instead of \; the -exec is executed with multiple filenames. Older
> versions of GNU grep don't recognize +.
Ok, thanks... I've seen it many times and think I've used it a couple of
times but now I got the explanation for why it is like that...
Best regards
Martin Jørgensen
--
---------------------------------------------------------------------------
Home of Martin Jørgensen - http://www.martinjoergensen.dk
| |
| Stephane CHAZELAS 2006-11-14, 1:16 pm |
2006-11-14, 01:41(-05), Bill Marcum:
> ["Followup-To:" header set to comp.unix.shell.]
> On Tue, 14 Nov 2006 05:19:20 +0100, Martin Jørgensen
> <hotmail_spam@hotmail.com> wrote:
[...]
> There should be no space in {} or \; . The {} is substituted by the
> filename and \; marks the end of the -exec arguments. If you use +
> instead of \; the -exec is executed with multiple filenames. Older
> versions of GNU grep don't recognize +.
[...]
ITYM older versions of GNU find.
Also, with shells that support recursive globbing (such as zsh,
or ksh93 run with -G, recent versions only), you can do:
grep SCJ -- **/*.txt
You may end up with an "arg list too long" error, in which case
both shells have a workaround: zargs for zsh and command -x for
ksh93.
--
Stéphane
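zsh or a recent ksh93 may not be at hand, so this sketch swaps in a different shell with the same idea: bash 4 or later, whose globstar option makes ** match any number of directory levels. That substitution is an assumption of the example, not something from the thread; the files are hypothetical.

```shell
#!/bin/sh
# Swapped-in shell: bash >= 4 with globstar (the thread names zsh/ksh93).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/x/y"
echo 'SCJ top'  > "$tmp/top.txt"
echo 'SCJ deep' > "$tmp/x/y/deep.txt"
echo 'SCJ skip' > "$tmp/x/skip.log"

# With globstar on, **/ matches zero or more directories, so **/*.txt
# picks up top.txt as well as x/y/deep.txt; "--" ends grep's options.
hits=$(cd "$tmp" && bash -O globstar -c 'grep SCJ -- **/*.txt' | sort)
printf '%s\n' "$hits"
```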
| |
| maxodyne 2006-11-18, 1:29 am |
Kaz --
Thanks for taking the time to share your atomic-level knowledge
regarding 'find'. I learned a lot from you just now!
| |
| Stefaan A Eeckels 2006-11-18, 7:22 am |
On 13 Nov 2006 23:19:20 -0800
"Kaz Kylheku" <kkylheku@gmail.com> wrote:
> You could do this in other ways besides -exec. The problem with -exec
> is that it spawns a process for every darn file, which slows things
> down.
Which is why modern(*) versions of find have the '+' delimiter:
$ find . -type f -exec grep "Kaz" {} +
This makes find put as many file names on the grep command line as will
fit. It tends to be faster and less resource-intensive than xargs, and
easier than mucking around with the length of the command line yourself.
(*) ISTR it was introduced by Solaris and adopted rather quickly by the
GNU gang. It probably works on your system.
--
Stefaan A Eeckels
--
"What is stated clearly conceives easily." -- Inspired sales droid
| |
|
On 2006-11-18, Stefaan A Eeckels <hoendech@ecc.lu> wrote:
> On 13 Nov 2006 23:19:20 -0800
> "Kaz Kylheku" <kkylheku@gmail.com> wrote:
>
>
> Which is why modern(*) versions of find have the '+' delimiter:
>
> $ find . -type f -exec grep "Kaz" {} +
>
> This makes find put as many file names on the grep command line as will
> fit. It tends to be faster and less resource-intensive than xargs, and
> easier than mucking around with the length of the command line yourself.
>
> (*) ISTR it was introduced by Solaris
From the "find" man page (SunOS 5.10, User Commands find(1), last change 24 Jun 2004):
-exec command True if the executed command returns a zero
value as exit status. The end of command
must be punctuated by an escaped semicolon
(;). A command argument {} is replaced by
the current path name. If the last argument
to -exec is {} and you specify + rather than
the semicolon (;), the command is invoked
fewer times, with {} replaced by groups of
pathnames.
Thanks for that!
--
"Other people are not your property."
[email me at huge [at] huge [dot] org [dot] uk]
| |
| Geoff Clare 2006-11-20, 1:18 pm |
Stefaan A Eeckels <hoendech@ecc.lu> wrote, on Sat, 18 Nov 2006:
> Which is why modern(*) versions of find have the '+' delimiter:
>
> $ find . -type f -exec grep "Kaz" {} +
>
> This makes find put as many file names on the grep command line as will
> fit. It tends to be faster and less resource-intensive than xargs, and
> easier than mucking around with the length of the command line yourself.
>
> (*) ISTR it was introduced by Solaris and adopted rather quickly by the
> GNU gang. It probably works on your system.
It was introduced in SVR4 in the late 1980's, and was standardised
by POSIX in 2001. GNU findutils picked it up relatively recently
(in version 4.2.12, Jan 2005 - one of several improvements in GNU
find's POSIX conformance that happened after it got a new
maintainer, James Youngman).
--
Geoff Clare <netnews@gclare.org.uk>
| |
| Darren Dunham 2006-11-20, 1:18 pm |
In comp.unix.solaris Stefaan A Eeckels <hoendech@ecc.lu> wrote:
> Which is why modern(*) versions of find have the '+' delimiter:
> $ find . -type f -exec grep "Kaz" {} +
> (*) ISTR it was introduced by Solaris and adopted rather quickly by the
> GNU gang. It probably works on your system.
Actually, it's from SYSV and predates Solaris. It's in every version of
SunOS 5.x, but wasn't documented there until Solaris 9.
David Korn claims authorship.
--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
| |
| Sven Mascheck 2006-11-20, 7:22 pm |
Darren Dunham wrote:
> David Korn claims authorship.
If so, it's interesting that it's not documented in the AST package.
It's implemented but behaves unexpectedly, because you need
"find -exec cmd +" instead of "find -exec cmd {} +".
(I've informed ast-users on 2006-01-26)
| |
| Darren Dunham 2006-11-21, 1:19 pm |
In comp.unix.solaris Sven Mascheck <cus.x.mascheck@spamgourmet.com> wrote:
> Darren Dunham wrote:
> If so, it's interesting that it's not documented in the AST package.
> It's implemented but behaves unexpectedly, because you need
> "find -exec cmd +" instead of "find -exec cmd {} +".
> (I've informed ast-users on 2006-01-26)
Perhaps "claims authorship" is the incorrect phrase. Here is a message
where he mentioned adding that feature.
http://opengroup.org/austin/mailarc...l/msg03065.html
--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
| |
| Joerg Schilling 2006-11-29, 1:17 pm |
In article <RVl8h.27788$TV3.22247@newssvr21.news.prodigy.com>,
Darren Dunham <ddunham@redwood.taos.com> wrote:
>In comp.unix.solaris Stefaan A Eeckels <hoendech@ecc.lu> wrote:
>
>
>
>
>Actually, it's from SYSV and predates Solaris. It's in every version of
>SunOS 5.x, but wasn't documented there until Solaris 9.
>
>David Korn claims authorship.
BTW: it was missing from GNU find until very recently.
--
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
js@cs.tu-berlin.de (uni)
schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily