Unix Shell - removing duplicate fields within a column... should be pretty simple


Jared McQueen

2006-03-09, 5:55 pm

so I have a log file that contains a timestamp and some data. I want
to sort | uniq just one column of this data.. here's the example:

logfile.log:
Jan 11, 2006 18:18:27.315 http://www.website.com/file1.jpg
Jan 11, 2006 18:18:27.745 http://www.website.com/file2.jpg <duplicate
Jan 11, 2006 18:18:27.917 http://www.website.com/file2.jpg <duplicate
Jan 11, 2006 18:18:28.321 http://www.website.com/file2.jpg <duplicate
Jan 11, 2006 18:18:29.103 http://www.website.com/file3.jpg
Jan 11, 2006 18:18:29.534 http://www.website.com/file4.jpg

I want to read in, and remove all the duplicate "file2.jpg" entries. I
don't care which one I keep, I just don't want the other 2 in there.

Since all the timestamps are unique, sort | uniq doesn't work. How can
I read this log file in and remove duplicates? I have to use bash
shell scripting. Many thanks.. my head is about to explode!

Xicheng

2006-03-09, 5:55 pm

Jared McQueen wrote:
> so I have a log file that contains a timestamp and some data. I want
> to sort | uniq just one column of this data.. here's the example:
>
> logfile.log:
> Jan 11, 2006 18:18:27.315 http://www.website.com/file1.jpg
> Jan 11, 2006 18:18:27.745 http://www.website.com/file2.jpg <duplicate
> Jan 11, 2006 18:18:27.917 http://www.website.com/file2.jpg <duplicate
> Jan 11, 2006 18:18:28.321 http://www.website.com/file2.jpg <duplicate
> Jan 11, 2006 18:18:29.103 http://www.website.com/file3.jpg
> Jan 11, 2006 18:18:29.534 http://www.website.com/file4.jpg
>
> I want to read in, and remove all the duplicate "file2.jpg" entries. I
> don't care which one I keep, I just don't want the other 2 in there.



perl -alne 'print unless $h{$F[4]}++' logfile.log

Xicheng


> Since all the timestamps are unique, sort | uniq doesn't work. How can
> I read this log file in and remove duplicates? I have to use bash
> shell scripting. Many thanks.. my head is about to explode!


Xicheng

2006-03-09, 8:48 pm

Xicheng wrote:
> Jared McQueen wrote:
>
>
> perl -alne 'print unless $h{$F[4]}++' logfile.log
>


awk '!k[$5]++' logfile.log
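
For anyone wondering why that one-liner works: the array is keyed on the fifth whitespace-separated field (the URL), and a line is printed only the first time its key shows up. Below is a minimal sketch that recreates the sample log from the question in a temporary file. The temp file, the shell variables `log` and `deduped`, and the array name `seen` (used in place of `k`) are illustrative choices, not something from the thread.

```shell
# Recreate the sample log from the question in a throwaway file.
# (The temp file and the variable names are illustrative assumptions.)
log=$(mktemp)
cat > "$log" <<'EOF'
Jan 11, 2006 18:18:27.315 http://www.website.com/file1.jpg
Jan 11, 2006 18:18:27.745 http://www.website.com/file2.jpg
Jan 11, 2006 18:18:27.917 http://www.website.com/file2.jpg
Jan 11, 2006 18:18:28.321 http://www.website.com/file2.jpg
Jan 11, 2006 18:18:29.103 http://www.website.com/file3.jpg
Jan 11, 2006 18:18:29.534 http://www.website.com/file4.jpg
EOF

# seen[$5] is unset (treated as 0) the first time a URL appears, so
# !seen[$5] is true and awk's default action prints the line. The
# post-increment then marks the URL as seen, so later lines with the
# same URL are skipped.
deduped=$(awk '!seen[$5]++' "$log")
printf '%s\n' "$deduped"

rm -f "$log"
```

This keeps the first line seen for each URL (here the 18:18:27.745 entry for file2.jpg) and leaves the input order alone, so no sort is needed. The cost is one array entry per distinct URL held in memory.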

Xicheng

Jared McQueen

2006-03-09, 8:48 pm

YES YES YES YES YES!!!

you guys are awesome, thanks so much!

here's what I ended up using:
awk '!k[$5]++' log.log | sort -k 5
(I want to sort by domain)

Robert Bonomi

2006-03-10, 5:57 pm

In article <1141947482.772908.311760@j52g2000cwj.googlegroups.com>,
Jared McQueen <jaredmcqueen@gmail.com> wrote:
>so I have a log file that contains a timestamp and some data. I want
>to sort | uniq just one column of this data.. here's the example:
>
>logfile.log:
>Jan 11, 2006 18:18:27.315 http://www.website.com/file1.jpg
>Jan 11, 2006 18:18:27.745 http://www.website.com/file2.jpg <duplicate
>Jan 11, 2006 18:18:27.917 http://www.website.com/file2.jpg <duplicate
>Jan 11, 2006 18:18:28.321 http://www.website.com/file2.jpg <duplicate
>Jan 11, 2006 18:18:29.103 http://www.website.com/file3.jpg
>Jan 11, 2006 18:18:29.534 http://www.website.com/file4.jpg
>
>I want to read in, and remove all the duplicate "file2.jpg" entries. I
>don't care which one I keep, I just don't want the other 2 in there.
>
>Since all the timestamps are unique, sort | uniq doesn't work. How can


Sure they do. To wit:

sort -k4,4 -u <logfile.log |sort >logfile.uniq

>I read this log file in and remove duplicates? I have to use bash
>shell scripting. Many thanks.. my head is about to explode!
>


Robert Bonomi

2006-03-10, 5:57 pm

In article <12142jbomepnv60@corp.supernews.com>,
Robert Bonomi <bonomi@host122.r-bonomi.com> wrote:
>In article <1141947482.772908.311760@j52g2000cwj.googlegroups.com>,
>Jared McQueen <jaredmcqueen@gmail.com> wrote:
>
>Sure they do. to wit:
>
> sort -k4,4 -u <logfile.log |sort >logfile.uniq


WUPS!! make that '-k5,5' *sigh*
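
With that correction applied, the sort-only approach can be sketched as below. This is a rough sketch, not a tested recipe for every sort implementation: the temp file, the variable names `log` and `uniq_lines`, and the `LC_ALL=C` setting are assumptions added here for a predictable byte-wise ordering.

```shell
# Recreate the sample log from the question in a throwaway file.
# (The temp file and the variable names are illustrative assumptions.)
log=$(mktemp)
cat > "$log" <<'EOF'
Jan 11, 2006 18:18:27.315 http://www.website.com/file1.jpg
Jan 11, 2006 18:18:27.745 http://www.website.com/file2.jpg
Jan 11, 2006 18:18:27.917 http://www.website.com/file2.jpg
Jan 11, 2006 18:18:28.321 http://www.website.com/file2.jpg
Jan 11, 2006 18:18:29.103 http://www.website.com/file3.jpg
Jan 11, 2006 18:18:29.534 http://www.website.com/file4.jpg
EOF

# First sort: compare only field 5 (-k5,5) and keep one line per
# distinct key (-u). Second sort: put the surviving lines back into
# whole-line order. LC_ALL=C is an added assumption, not from the thread.
uniq_lines=$(LC_ALL=C sort -k5,5 -u "$log" | LC_ALL=C sort)
printf '%s\n' "$uniq_lines"

rm -f "$log"
```

Two things to keep in mind. Which of the three file2.jpg lines survives `-u` may depend on the sort implementation, which is fine here since the question says any one will do. And the second, whole-line sort only comes out chronological because every line shares the same date; across months the month names would sort alphabetically.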
