Using the grep command to filter
| transmute70@gmail.com 2006-10-28, 7:39 pm |
| Hello Everyone,
I am a college student currently enrolled in a Unix class. Unix is in
no way shape or form related to my major, and needless to say, I'm
stumped.
The question I must answer is this:
"Enter a single command-line that would find out the 5 files in/under
/etc that contain the string 'ubuntu,' case insensitively, more than
any other file in/under the /etc directory. Sort the list from most to
least and throw away standard error."
I can do each part the question asks separately, but I've no idea how
to put this all together to create a single working command. Any help
would be greatly appreciated.
Thanks,
D.
| |
| Todd H. 2006-10-28, 7:39 pm |
| transmute70@gmail.com writes:
> Hello Everyone,
>
> I am a college student currently enrolled in a Unix class. Unix is in
> no way shape or form related to my major, and needless to say, I'm
> stumped.
Finally someone asking for help with a homework question coming clean
and saying it's a homework question. Now this is the student I'd like
to help learn the material.
> The question I must answer is this:
>
> "Enter a single command-line that would find out the 5 files in/under
> /etc that contain the string 'ubuntu,' case insensitively, more than
> any other file in/under the /etc directory. Sort the list from most to
> least and throw away standard error."
Okay, so let's break this down. We need to find all files in /etc/
that have ubuntu, Ubuntu, UBUNTU, ubunTU, whatever in them. Grep is the
tool for the job: grep ubuntu /etc/* of course. But that will do a
case-sensitive search. I'll bid you to consult
$ man grep
to find the one option you need to add to the grep command to make the
search case insensitive.
That grep command will give you all the
lines of text in all the files in /etc with ubuntu in 'em. You may
even notice filenames at the beginning of each matching line.
Hrmm... maybe we can't just do a grep, and maybe we need to start with
another command to operate on each file one at a time.
Big Hint: your one big ole command might start with a find /etc -exec
command. In the -exec clause, think about grep and wc; then piping
that output somehow into sort and finishing off with head -5
might be the ticket.
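A minimal sketch of how the pieces named in this hint could fit together, not necessarily the answer the instructor had in mind. It assumes a scratch directory created with mktemp (a stand-in for /etc so the example can be run safely) and uses grep -i for the case-insensitive match; the file names are made up for illustration.

```shell
#!/bin/sh
# Scratch directory standing in for /etc (an assumption for illustration).
demo=$(mktemp -d)
printf 'Ubuntu\nubuntu\nUBUNTU\n' > "$demo/three"
printf 'ubuntu\nother\n'          > "$demo/one"
printf 'nothing here\n'           > "$demo/zero"

# For each regular file, count the matching lines with grep -i | wc -l
# and print "count path"; then sort numerically, largest first, keep 5.
# 2>/dev/null throws away standard error from find and its children.
top=$(find "$demo" -type f -exec sh -c '
    n=$(grep -i ubuntu "$1" | wc -l)
    echo "$((n)) $1"
' sh {} \; 2>/dev/null | sort -rn | head -5)

echo "$top"
rm -rf "$demo"
```

Replacing "$demo" with /etc gives the real thing. The $((n)) only strips the padding that some wc implementations put in front of the number.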
> I can do each part the question asks separately, but I've no idea how
> to put this all together to create a single working command. Any help
> would be greatly appreciated.
I think the "one file at a time" operation that find /etc -exec
brings to the table might be the concept you're looking for.
Another approach might involve a shell for loop.
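A rough sketch of the for-loop idea, under the same kind of assumptions: a mktemp scratch directory with invented file names stands in for /etc. Note that a plain glob like this only sees files directly in the directory; it does not descend into subdirectories the way find does.

```shell
#!/bin/sh
# Scratch directory standing in for /etc (an assumption for illustration).
demo=$(mktemp -d)
printf 'Ubuntu\nubuntu\n' > "$demo/two"
printf 'UBUNTU\n'         > "$demo/one"
mkdir "$demo/subdir"

# Visit each entry, skip anything that is not a regular file,
# and print "count path" for the rest.
counts=$(for f in "$demo"/*; do
    [ -f "$f" ] || continue
    n=$(grep -i ubuntu "$f" 2>/dev/null | wc -l)
    echo "$((n)) $f"
done | sort -rn | head -5)

echo "$counts"
rm -rf "$demo"
```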
Work on it a while and come back with follow on questions.
Best Regards,
--
Todd H.
http://www.toddh.net/
| |
| Barry Margolin 2006-10-29, 1:36 am |
| In article <84iri4dtde.fsf@ripco.com>, comphelp@toddh.net (Todd H.)
wrote:
> transmute70@gmail.com writes:
>
>
> Finally someone asking for help with a homework question coming clean
> and saying it's a homework question. Now this is the student I'd like
> to help learn the material.
>
>
> Okay, so let's break this down. We need to find all files in /etc/
> that have ubuntu, Ubuntu, UBUNTU, ubunTU, whatever in them. Grep is the
> tool for the job: grep ubuntu /etc/* of course. But that will do a
> case-sensitive search. I'll bid you to consult
> $ man grep
>
> to find the one option you need to add to the grep command to make the
> search case insensitive.
>
> That grep command will give you all the
> lines of text in all the files in /etc with ubuntu in 'em. You may
> even notice filenames at the beginning of each matching line.
> Hrmm... maybe we can't just do a grep, and maybe we need to start with
> another command to operate on each file one at a time.
You could also use the option to grep that tells it to just print the
count of matches, rather than the matching lines themselves.
>
> Big Hint: your one big ole command might start with a find /etc -exec
> command. In the -exec clause, think about grep and wc; then piping
> that output somehow into sort and finishing off with head -5
> might be the ticket.
If you use the option to print the count of matches, you don't need to
use wc at all.
>
>
> I think the "one file at a time" operation that find /etc -exec
> brings to the table might be the concept you're looking for.
Not if you use the -c option to grep.
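As an illustration of the -c suggestion, here is a small sketch that lets grep do the per-file counting itself. It assumes a grep that supports -r for recursion (GNU and BSD grep do; POSIX does not specify it) and, again, a mktemp scratch directory with made-up files in place of /etc.

```shell
#!/bin/sh
# Scratch directory standing in for /etc (an assumption for illustration).
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'Ubuntu\nubuntu\n' > "$demo/a"
printf 'UBUNTU\n'         > "$demo/sub/b"
printf 'none\n'           > "$demo/c"

# -r: recurse into directories, -i: ignore case,
# -c: print "path:count" (count of matching lines) instead of the lines.
out=$(grep -ric ubuntu "$demo" 2>/dev/null)

echo "$out"
rm -rf "$demo"
```

Every file is listed, including those with a count of 0, so the output still needs filtering and sorting before the top five can be picked.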
>
> Another approach might involve a shell for loop.
>
> Work on it a while and come back with follow on questions.
>
> Best Regards,
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
| |
| Jon LaBadie 2006-10-29, 7:17 am |
| Barry Margolin wrote:
> In article <84iri4dtde.fsf@ripco.com>, comphelp@toddh.net (Todd H.)
> wrote:
>
>
> You could also use the option to grep that tells it to just print the
> count of matches, rather than the matching lines themselves.
>
>
> If you use the option to print the count of matches, you don't need to
> use wc at all.
>
>
> Not if you use the -c option to grep.
>
A problem with the grep -c approach is multiple instances of the
pattern on a single line. The -c option outputs a count of "records",
i.e. lines, containing the pattern. While convenient, if multiple
instances of the pattern per line are likely, outputting the matches
and counting them separately would be more accurate.
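A small sketch of the difference, assuming a grep that supports the -o option (print each match on its own line; available in GNU and BSD grep, not specified by POSIX). The sample file is invented for illustration.

```shell
#!/bin/sh
# Sample file: three matches on the first line, one on the third.
f=$(mktemp)
printf 'ubuntu Ubuntu UBUNTU\nno match\nubuntu\n' > "$f"

# -c counts lines that contain the pattern at least once.
lines=$(grep -ci ubuntu "$f")

# -o prints every match separately, so wc -l counts occurrences.
hits=$(grep -oi ubuntu "$f" | wc -l)
hits=$((hits))   # strip any padding wc may add

echo "matching lines: $lines, occurrences: $hits"
# prints: matching lines: 2, occurrences: 4
rm -f "$f"
```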
| |
| Michael Tosch 2006-10-30, 7:30 pm |
| Jon LaBadie wrote:
> Barry Margolin wrote:
>
> A problem with the grep -c approach is multiple instances of the
> pattern on a single line. The -c option outputs a count of "records",
> i.e. lines, containing the pattern. While convenient, if multiple
> instances of the pattern per line are likely, outputting the matches
> and counting them separately would be more accurate.
Assuming this is not the case -
another problem is that -l (list filename) and -c (match count)
are mutually exclusive. (While here GNU grep 2.2 differs from
GNU grep 2.5.1, and both differ from Posix.)
In order to get the file names AND the match count in one stroke,
I suggest the /dev/null trick.
The grep output then is file:count
To allow : characters in file names, I use sed to swap the order
to count:file (and by the way delete the zero matches).
Then do the sort.
find /etc -type f -exec grep -ic ubuntu {} /dev/null \; 2>/dev/null |
sed '/:0$/d; s/\(.*\):\([0-9]*\)$/\2:\1/' |
sort -nr
To only display the file names, you can chain a sed or cut command.
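For example, the pipeline above could be finished off roughly like this: head -5 keeps the five highest counts and cut drops the leading count. This is a sketch against a mktemp scratch directory with invented file names (one of them containing a colon) rather than /etc.

```shell
#!/bin/sh
# Scratch directory standing in for /etc (an assumption for illustration).
demo=$(mktemp -d)
printf 'Ubuntu\nubuntu\nUBUNTU\n' > "$demo/three"
printf 'ubuntu\nubuntu\n'         > "$demo/two:colon"
printf 'none\n'                   > "$demo/zero"

# Same find/grep/sed/sort as above, then:
#   head -5       keep the five largest counts
#   cut -d: -f2-  drop "count:" and keep the rest, colons included
names=$(find "$demo" -type f -exec grep -ic ubuntu {} /dev/null \; 2>/dev/null |
    sed '/:0$/d; s/\(.*\):\([0-9]*\)$/\2:\1/' |
    sort -nr |
    head -5 |
    cut -d: -f2-)

echo "$names"
rm -rf "$demo"
```

Because the count was moved to the front, cut -f2- keeps any colons that belong to the file name itself.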
--
Michael Tosch @ hp : com
| |
| transmute70@gmail.com 2006-10-31, 1:21 pm |
| Wow, thanks everyone for your responses! Sorry for not getting back to
you sooner about my trials and tribulations with this question (not to
mention the rest of them).
I hope no one is offended when I say I have no clue what many of you
were referring to, however, I was able to pull some information from
your responses, such as wc, a head command, etc.
My lab partners and I came up with the following mess (I'm embarrassed
because it doesn't even work right, but we were on a deadline and this
was as close as we got):
~$ rgrep -s -i -c ubuntu /etc/ | awk -F: '{print $2 "\t" $1}' |
sort -rn | head -n5 | awk '{print $2}'
Now, this command never successfully worked as far as I saw. My shell
window just sort of sat there and did nothing, so I am not sure if it
was trying to interpret the command or didn't like it. However, when we
tried it using a different (and smaller) directory, the command line
worked, as seen below.
~$ rgrep -s -i -c ubuntu /etc/apt | awk -F: '{print $2 "\t" $1}' |
sort -rn | head -n5
15 /etc/apt/sources.list
2 /etc/apt/trusted.gpg~
2 /etc/apt/trusted.gpg
2 /etc/apt/apt.conf.d/50unattended-upgrades
0 /etc/apt/apt.conf.d/70debconf
~$ rgrep -s -i -c ubuntu /etc/apt | awk -F: '{print $2 "\t" $1}' |
sort -rn | head -n5 | awk '{print $2}'
/etc/apt/sources.list
/etc/apt/trusted.gpg~
/etc/apt/trusted.gpg
/etc/apt/apt.conf.d/50unattended-upgrades
/etc/apt/apt.conf.d/70debconf
So, there you have it, the fruit of our labors! I am sure there are more
efficient ways of doing this. Specifically, the professor mentioned
piping to grep two times, but I think it is obvious we never figured
that particular method out.
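One possible reading of "piping to grep two times", offered only as a guess at what the professor meant: a second grep can throw out the files with a count of 0 before sorting, so a file with no matches never lands in the top five. The sketch assumes rgrep behaves like grep -r, uses a mktemp scratch directory with invented files instead of /etc, and keeps the awk steps from the command above (so it still assumes file names without colons or spaces).

```shell
#!/bin/sh
# Scratch directory standing in for /etc (an assumption for illustration).
demo=$(mktemp -d)
printf 'Ubuntu\nubuntu\n' > "$demo/two"
printf 'ubuntu\n'         > "$demo/one"
printf 'none\n'           > "$demo/zero"

# First grep: count matching lines per file ("path:count").
# Second grep: -v drops the lines ending in ":0", i.e. files with no match.
result=$(grep -r -s -i -c ubuntu "$demo" |
    grep -v ':0$' |
    awk -F: '{print $2 "\t" $1}' |
    sort -rn | head -n5 | awk '{print $2}')

echo "$result"
rm -rf "$demo"
```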
I really appreciate everyone's input, thanks again!
D