Unix questions - output from cut command


Shawn

2005-01-11, 5:55 pm

Hi,

I have been using this command to extract the unique data from a file:

cut -c1-6 filea |sort -u

I'd like the output to ignore any blanks or empty fields. Just not sure how
to get around that.
Is there a better way to do this?

Thanks


Ed Morton

2005-01-11, 8:50 pm



Shawn wrote:
> Hi,
>
> I have been using this command to extract the unique data from a file:
>
> cut -c1-6 filea |sort -u
>
> I'd like the output to ignore any blanks or empty fields. Just not sure how
> to get around that.
> Is there a better way to do this?


This will ignore leading blanks and empty lines in the input file:

gawk '$1{a[gensub(/(......).*/,"\\1","",$1)]=""}
END{for(t in a)print t}' filea

If that's not what you meant, please clarify.

Regards,

Ed.



> Thanks
>
>

Bill Seivert

2005-01-12, 2:53 am



Ed Morton wrote:
>
>
> Shawn wrote:
>
>
>
> This will ignore leading blanks and empty lines in the input file:
>
> gawk '$1{a[gensub(/(......).*/,"\\1","",$1)]=""}
> END{for(t in a)print t}' filea
>
> If that's not what you meant, please clarify.
>
> Regards,
>
> Ed.
>
>
>
Perhaps more concise:
awk '{if ($1 != "") {print substr ($1, 1, 6);}}' filea | sort -u

or

sed -n -e "s/^[ ]*\(......\).*/\1/p" filea | sort -u

Though the sed command has the downside of not printing records that
have fewer than six characters.
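
To see that difference concretely, here is a minimal sketch on made-up data. The sample lines and the variable names are assumptions for illustration, not taken from the original file:

```sh
#!/bin/sh
# Hypothetical sample: one long record, one short indented record, one blank line.
sample=$(mktemp)
printf '%s\n' 'abcdefghij' '  xyz' '' > "$sample"

# awk: first field truncated to six characters; blank lines are skipped.
awk_out=$(awk '$1 != "" { print substr($1, 1, 6) }' "$sample" | LC_ALL=C sort -u)

# sed: prints only lines where six characters follow the leading blanks,
# so the short record "xyz" is dropped.
sed_out=$(sed -n -e 's/^[ ]*\(......\).*/\1/p' "$sample" | LC_ALL=C sort -u)

printf 'awk:\n%s\n' "$awk_out"
printf 'sed:\n%s\n' "$sed_out"
rm -f "$sample"
```

The awk version keeps the short record and the sed version silently loses it, which matters if short keys are valid data.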

Thanks.
Bill Seivert
seivert@pcisys.net

Stephane CHAZELAS

2005-01-12, 7:48 am

2005-01-11, 16:20(-07), Shawn:
[...]
> I have been using this command to extract the unique data from a file:
>
> cut -c1-6 filea |sort -u
>
> I'd like the output to ignore any blanks or empty fields. Just not sure how
> to get around that.

[...]

You meant "a b" to be the same as " a b" when eliminating
duplicates?

< filea cut -c1-6 | sort -k1,1 -k2,2 -k3,3 -bu

(a 6 character string can't have more than 3 fields)

Or:

< filea awk '{$0=substr($0,1,6);$1=$1;print}' | sort -u

If you want the output to be normalized.
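
As a rough illustration of what "normalized" means here, assigning $1=$1 makes awk rebuild the record from its fields with single spaces. The two sample lines below are made up for the demonstration:

```sh
#!/bin/sh
# Hypothetical sample: the same two fields, with and without a leading blank.
sample=$(mktemp)
printf '%s\n' 'a b   rest' ' a b  rest' > "$sample"

# Plain cut keeps the blanks, so the two six-character prefixes stay distinct.
raw_count=$(cut -c1-6 "$sample" | sort -u | awk 'END { print NR }')

# Truncate to six characters, then let $1=$1 squeeze the blanks.
normalized=$(awk '{ $0 = substr($0, 1, 6); $1 = $1; print }' "$sample" | sort -u)

echo "distinct prefixes from cut: $raw_count"
echo "normalized: $normalized"
rm -f "$sample"
```

Note that an empty input line still comes out as an empty line with this approach, so it normalizes blanks rather than removing blank records.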

--
Stephane

Ed Morton

2005-01-12, 5:55 pm



Bill Seivert wrote:
>
>
> Ed Morton wrote:
>
> Perhaps more concise:
> awk '{if ($1 != "") {print substr ($1, 1, 6);}}' filea | sort -u


Yeah, substr() is fine for this example and is a little less typing than
gensub(). I still wouldn't bother with the pipe to sort though, since
it's trivial to just use awk to print the unique lines, and it'd be more
awk-ish not to have an explicit "if.." test but instead specify the
condition in the condition part of the body:

awk '$1!=""{a[substr($1,1,6)]=""}END{for(t in a)print t}'

or, if "$1" can never be zero (as I had originally assumed), just:

awk '$1{...}'
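
The reason for the "never be zero" caveat can be checked with a small sketch. The sample keys are made up, and the assumption is a POSIX-style awk where a numeric-looking field used as a condition is tested by its numeric value:

```sh
#!/bin/sh
# Hypothetical sample: an all-zero key, an ordinary key, and a blank line.
sample=$(mktemp)
printf '%s\n' '000000' 'abcdef' '' > "$sample"

# Bare $1 as the condition: false for an empty field AND for a field
# whose numeric value is zero.
truthy=$(awk '$1 { n++ } END { print n + 0 }' "$sample")

# Explicit string comparison: false only for an empty field.
nonempty=$(awk '$1 != "" { n++ } END { print n + 0 }' "$sample")

echo "lines passing \$1:       $truthy"
echo "lines passing \$1 != \"\": $nonempty"
rm -f "$sample"
```

So the shorter form would silently drop a key such as 000000, while the explicit comparison keeps it.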

Regards,

Ed.

Shawn

2005-01-13, 8:48 pm


"Ed Morton" <morton@lsupcaemnt.com> wrote in message
news:MOGdnT_R8P_qsnjcRVn-rw@comcast.com...
>
>
> Bill Seivert wrote:
>
> Yeah, substr() is fine for this example and is a little less typing than
> gensub(). I still wouldn't bother with the pipe to sort though, since it's
> trivial to just use awk to print the unique lines, and it'd be more
> awk-ish not to have an explicit "if.." test but instead specify the condition
> in the condition part of the body:
>
> awk '$1!=""{a[substr($1,1,6)]=""}END{for(t in a)print t}'
>
> or, if "$1" can never be zero (as I had originally assumed), just:
>
> awk '$1{...}'
>
> Regards,
>
> Ed.
>


OK, I tried some of the suggestions. My output doesn't return any blank
lines, but it does return data that I didn't expect.

For example, here is a sample of a file:
55555512/12/04xxxxxxxx,xxxxxxx
111111
55555512/12/04yyyyyyyy,yyyyyyy
222222

What it returns is:

555555
111111
222222


What I'd want is:
555555

Now, I can get this if I run this command on the file:

cut -c1-6 filea| grep -v ' ' | sort -u

But I'm unsure whether this command will return what I want all the
time. Any suggestions?

Thanks again,
Shawn


Bill Seivert

2005-01-14, 8:47 pm



Ed Morton wrote:

>
>
> Bill Seivert wrote:
>
>
>
> Yeah, substr() is fine for this example and is a little less typing than
> gensub(). I still wouldn't bother with the pipe to sort though, since
> it's trivial to just use awk to print the unique lines, and it'd be
> more awk-ish not to have an explicit "if.." test but instead specify the
> condition in the condition part of the body:
>
> awk '$1!=""{a[substr($1,1,6)]=""}END{for(t in a)print t}'
>
> or, if "$1" can never be zero (as I had originally assumed), just:
>
> awk '$1{...}'
>
> Regards,
>
> Ed.


The for (t in a) loop will produce the unique records, but they are not
guaranteed to be in any specific order ("implementation dependent"), so
the sort -u is an option if ordering is important.
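
A minimal sketch of two ways to get a predictable order, on made-up data. The first-seen variant uses the common !seen[k]++ idiom, which is a different technique from the END loop discussed above, and the array name "seen" is just an illustrative choice:

```sh
#!/bin/sh
# Hypothetical sample: a repeated key, a second key, and a blank line.
sample=$(mktemp)
printf '%s\n' 'zzzzzz-first' 'aaaaaa-second' 'zzzzzz-third' '' > "$sample"

# Print each key the first time it appears, which preserves input order.
first_seen=$(awk '$1 != "" { k = substr($1, 1, 6); if (!seen[k]++) print k }' "$sample")

# Collect keys in an array, then sort the unordered for-in output externally.
sorted=$(awk '$1 != "" { a[substr($1, 1, 6)] = "" } END { for (t in a) print t }' "$sample" | LC_ALL=C sort)

printf 'first-seen order:\n%s\n' "$first_seen"
printf 'sorted order:\n%s\n' "$sorted"
rm -f "$sample"
```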

Bill Seivert
seivert@pcisys.net

sharma__r@hotmail.com

2005-01-15, 2:47 am


Shawn wrote:
> Hi,
>
> I have been using this command to extract the unique data from a file:
>
> cut -c1-6 filea |sort -u
>
> I'd like the output to ignore any blanks or empty fields. Just not sure how
> to get around that.
>



< filea sed -e 's/^\(.\{6\}\).*/\1/;/[^ ]/!d' | sort -bu
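
A quick check of this approach on made-up data, as a sketch only. The sample lines are assumptions, and the command is written with a plain !d, which is what a POSIX sh expects inside single quotes:

```sh
#!/bin/sh
# Hypothetical sample: two long records sharing a prefix, a record whose
# first six columns are blank, a short record, and an empty line.
sample=$(mktemp)
printf '%s\n' 'abcdefghij' 'abcdefzzzz' '      trailing' 'xy' '' > "$sample"

# Truncate to six characters, delete lines with no non-space character,
# then sort and remove duplicates.
keys=$(sed -e 's/^\(.\{6\}\).*/\1/;/[^ ]/!d' "$sample" | LC_ALL=C sort -bu)

printf '%s\n' "$keys"
rm -f "$sample"
```

Lines shorter than six characters are left untouched by the substitution, so short records survive, unlike with the sed -n ... p variant earlier in the thread.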

Ed Morton

2005-01-15, 5:49 pm



Bill Seivert wrote:
>
>
> Ed Morton wrote:
>
>
>
> The for (t in a) loop will produce the unique records, but they are not
> guaranteed to be in any specific order ("implementation dependent"), so
> the sort -u is an option if ordering is important.


That's right, but there's nothing in the OP's posting to suggest ordering
is important. He just wants to "extract the unique data from a file". If
ordering was important, he could just set the WHINY_USERS environment
variable for gawk (it's a real variable - honest!) to order the output.

Ed.

> Bill Seivert
> seivert@pcisys.net
>
