removing duplicate fields within a column... should be pretty simple
| Jared McQueen 2006-03-09, 5:55 pm |
| so I have a log file that contains a timestamp and some data. I want
to sort | uniq just one column of this data.. here's the example:
logfile.log:
Jan 11, 2006 18:18:27.315 http://www.website.com/file1.jpg
Jan 11, 2006 18:18:27.745 http://www.website.com/file2.jpg <duplicate
Jan 11, 2006 18:18:27.917 http://www.website.com/file2.jpg <duplicate
Jan 11, 2006 18:18:28.321 http://www.website.com/file2.jpg <duplicate
Jan 11, 2006 18:18:29.103 http://www.website.com/file3.jpg
Jan 11, 2006 18:18:29.534 http://www.website.com/file4.jpg
I want to read in, and remove all the duplicate "file2.jpg" entries. I
don't care which one I keep, I just don't want the other 2 in there.
Since all the timestamps are unique, sort | uniq doesn't work. How can
I read this log file in and remove duplicates? I have to use bash
shell scripting. Many thanks.. my head is about to explode!
| |
| Xicheng 2006-03-09, 5:55 pm |
| Jared McQueen wrote:
> so I have a log file that contains a timestamp and some data. I want
> to sort | uniq just one column of this data.. here's the example:
>
> logfile.log:
> Jan 11, 2006 18:18:27.315 http://www.website.com/file1.jpg
> Jan 11, 2006 18:18:27.745 http://www.website.com/file2.jpg <duplicate
> Jan 11, 2006 18:18:27.917 http://www.website.com/file2.jpg <duplicate
> Jan 11, 2006 18:18:28.321 http://www.website.com/file2.jpg <duplicate
> Jan 11, 2006 18:18:29.103 http://www.website.com/file3.jpg
> Jan 11, 2006 18:18:29.534 http://www.website.com/file4.jpg
>
> I want to read in, and remove all the duplicate "file2.jpg" entries. I
> don't care which one I keep, I just don't want the other 2 in there.
perl -alne 'print unless $h{$F[4]}++' logfile.log
Xicheng
> Since all the timestamps are unique, sort | uniq doesn't work. How can
> I read this log file in and remove duplicates? I have to use bash
> shell scripting. Many thanks.. my head is about to explode!
| |
| Xicheng 2006-03-09, 8:48 pm |
| Xicheng wrote:
> Jared McQueen wrote:
>
>
> perl -alne 'print unless $h{$F[4]}++' logfile.log
>
awk '!k[$5]++' logfile.log
Xicheng
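For readers puzzling over the one-liner: `!k[$5]++` is true only the first time a given value of field 5 appears, and awk prints the line whenever a pattern with no action is true. A spelled-out sketch of the same idea follows. The function name `dedupe_by_field` and the array name `seen` are made up here for illustration and are not part of the thread.

```shell
#!/bin/sh
# dedupe_by_field FIELD FILE
# Print only the first line seen for each distinct value of FIELD
# (fields are whitespace-separated, numbered from 1 as in awk).
dedupe_by_field() {
    awk -v col="$1" '
        {
            key = $col            # value of the chosen field on this line
            if (!(key in seen))   # first time this value has appeared
                print             # keep the whole line
            seen[key] = 1         # remember the value for later lines
        }
    ' "$2"
}
```

Called as `dedupe_by_field 5 logfile.log`, this should keep the first line for each URL and leave the surviving lines in their original order.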
| |
| Jared McQueen 2006-03-09, 8:48 pm |
| YES YES YES YES YES!!!
you guys are awesome, thanks so much!
here's what I ended up using:
awk '!k[$5]++' log.log | sort -k 5
(I want to sort by domain)
| |
| Robert Bonomi 2006-03-10, 5:57 pm |
| In article <1141947482.772908.311760@j52g2000cwj.googlegroups.com>,
Jared McQueen <jaredmcqueen@gmail.com> wrote:
>so I have a log file that contains a timestamp and some data. I want
>to sort | uniq just one column of this data.. here's the example:
>
>logfile.log:
>Jan 11, 2006 18:18:27.315 http://www.website.com/file1.jpg
>Jan 11, 2006 18:18:27.745 http://www.website.com/file2.jpg <duplicate
>Jan 11, 2006 18:18:27.917 http://www.website.com/file2.jpg <duplicate
>Jan 11, 2006 18:18:28.321 http://www.website.com/file2.jpg <duplicate
>Jan 11, 2006 18:18:29.103 http://www.website.com/file3.jpg
>Jan 11, 2006 18:18:29.534 http://www.website.com/file4.jpg
>
>I want to read in, and remove all the duplicate "file2.jpg" entries. I
>don't care which one I keep, I just don't want the other 2 in there.
>
>Since all the timestamps are unique, sort | uniq doesn't work. How can
Sure they do. To wit:
sort -k4,4 -u <logfile.log |sort >logfile.uniq
>I read this log file in and remove duplicates? I have to use bash
>shell scripting. Many thanks.. my head is about to explode!
>
| |
| Robert Bonomi 2006-03-10, 5:57 pm |
| In article <12142jbomepnv60@corp.supernews.com>,
Robert Bonomi <bonomi@host122.r-bonomi.com> wrote:
>In article <1141947482.772908.311760@j52g2000cwj.googlegroups.com>,
>Jared McQueen <jaredmcqueen@gmail.com> wrote:
>
>Sure they do. to wit:
>
> sort -k4,4 -u <logfile.log |sort >logfile.uniq
WUPS!! make that '-k5,5' *sigh*
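Put together with the correction, the sort-only approach might look like the sketch below. The function name `uniq_by_url` is hypothetical, and the sketch assumes the URL is always the fifth whitespace-separated field and that the timestamps sort correctly as plain text, as they do in the sample.

```shell
#!/bin/sh
# uniq_by_url FILE
# Keep one line per distinct value of field 5, then re-sort the survivors.
uniq_by_url() {
    # -k5,5 restricts the sort key to field 5 only;
    # -u keeps a single line for each distinct key.
    # The second sort puts the remaining lines back in whole-line order,
    # which matches timestamp order for data like the sample.
    sort -k5,5 -u "$1" | sort
}
```

Usage would be `uniq_by_url logfile.log >logfile.uniq`. Which of the duplicate lines survives is up to the sort implementation, which is fine here since the original poster does not care which one is kept.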