output from cut command
| Hi,
I have been using this command to extract the unique data from a file:
cut -c1-6 filea |sort -u
I'd like the output to ignore any blanks or empty fields. Just not sure how
to get around that.
Is there a better way to do this?
Thanks
| |
| Ed Morton 2005-01-11, 8:50 pm |
|
Shawn wrote:
> Hi,
>
> I have been using this command to extract the unique data from a file:
>
> cut -c1-6 filea |sort -u
>
> I'd like the output to ignore any blanks or empty fields. Just not sure how
> to get around that.
> Is there a better way to do this?
This will ignore leading blanks and empty lines in the input file:
gawk '$1{a[gensub(/(......).*/,"\\1","",$1)]=""}
END{for(t in a)print t}' filea
If that's not what you meant, please clarify.
Regards,
Ed.
> Thanks
>
>
| |
| Bill Seivert 2005-01-12, 2:53 am |
|
Ed Morton wrote:
>
>
> Shawn wrote:
>
>
>
> This will ignore leading blanks and empty lines in the input file:
>
> gawk '$1{a[gensub(/(......).*/,"\\1","",$1)]=""}
> END{for(t in a)print t}' filea
>
> If that's not what you meant, please clarify.
>
> Regards,
>
> Ed.
>
>
>
Perhaps more concise:
awk '{if ($1 != "") {print substr ($1, 1, 6);}}' filea | sort -u
or
sed -n -e "s/^[ ]*\(......\).*/\1/p" filea | sort -u
Though the sed command has the downside of not printing records that have
fewer than six characters.
Thanks.
Bill Seivert
seivert@pcisys.net
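
The difference between the two commands above can be checked with a small sketch. The sample records and the function names (sample, with_awk, with_sed) are made up for illustration; only the awk and sed commands come from the post.

```shell
#!/bin/sh
# Assumption: force byte-order collation so the sorted output is predictable.
LC_ALL=C; export LC_ALL

# Hypothetical input: a long record, a short indented record, an empty line.
sample() {
    printf '%s\n' 'abcdefgh' '  abc' ''
}

# awk variant: skips the empty line, keeps the short record as-is.
with_awk() {
    sample | awk '{if ($1 != "") {print substr ($1, 1, 6);}}' | sort -u
}

# sed variant: only prints lines where six characters can be matched.
with_sed() {
    sample | sed -n -e 's/^[ ]*\(......\).*/\1/p' | sort -u
}

echo "awk:"; with_awk
echo "sed:"; with_sed
```

With this input the awk pipeline prints abc and abcdef, while the sed pipeline prints only abcdef, because the five-character line never matches the six-dot pattern.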
| |
| Stephane CHAZELAS 2005-01-12, 7:48 am |
| 2005-01-11, 16:20(-07), Shawn:
[...]
> I have been using this command to extract the unique data from a file:
>
> cut -c1-6 filea |sort -u
>
> I'd like the output to ignore any blanks or empty fields. Just not sure how
> to get around that.
[...]
You meant "a b" to be the same as " a b" when eliminating
duplicates?
< filea cut -c1-6 | sort -k1,1 -k2,2 -k3,3 -bu
(a 6 character string can't have more than 3 fields)
Or:
< filea awk '{$0=substr($0,1,6);$1=$1;print}' | sort -u
If you want the output to be normalized.
--
Stephane
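
To see what "normalized" means here, a rough sketch follows. The three sample lines and the function names are hypothetical; the pipelines are the ones from the post, fed from a function instead of filea.

```shell
#!/bin/sh
# Assumption: force byte-order collation so the results are predictable.
LC_ALL=C; export LC_ALL

# Hypothetical input: the same two fields, spaced three different ways.
sample() {
    printf '%s\n' 'a b' ' a b' 'a  b'
}

# Plain cut + sort -u: the spacing differs, so all three lines survive.
raw() {
    sample | cut -c1-6 | sort -u
}

# Compare field by field, ignoring leading blanks: one line survives,
# printed with whatever spacing it had in the input.
dedup_ignoring_blanks() {
    sample | cut -c1-6 | sort -k1,1 -k2,2 -k3,3 -bu
}

# Assigning $1 to itself makes awk rebuild the record with single spaces
# between fields, so the output itself is normalized.
normalized() {
    sample | awk '{$0=substr($0,1,6);$1=$1;print}' | sort -u
}

echo "raw: $(raw | wc -l) line(s)"
echo "ignoring blanks: $(dedup_ignoring_blanks | wc -l) line(s)"
echo "normalized:"; normalized
```

The practical difference is that the sort-only version keeps one of the original spellings, while the awk version always prints the fields separated by a single space.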
| |
| Ed Morton 2005-01-12, 5:55 pm |
|
Bill Seivert wrote:
>
>
> Ed Morton wrote:
>
> Perhaps more concise:
> awk '{if ($1 != "") {print substr ($1, 1, 6);}}' filea | sort -u
Yeah, substr() is fine for this example and is a little less typing than
gensub(). I still wouldn't bother with the pipe to sort though since
it's trivial to just use awk to print the unique lines, and it'd be more
awk-ish to not have an explicit "if.." test but instead specify the
condition in the condition part of the body:
awk '$1!=""{a[substr($1,1,6)]=""}END{for(t in a)print t}'
or, if "$1" can never be zero (as I had originally assumed), just:
awk '$1{...}'
Regards,
Ed.
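
The caveat about "$1" being zero can be illustrated with a short sketch. The sample values and function names are invented for the demonstration, and the final sort is there only to make the demo output predictable; it is not part of Ed's suggestion.

```shell
#!/bin/sh
# Assumption: force byte-order collation so the sorted output is predictable.
LC_ALL=C; export LC_ALL

# Hypothetical input: a literal zero, a zero-padded number, an empty line, text.
sample() {
    printf '%s\n' '0' '000123456' '' 'abc'
}

# String comparison: every non-empty first field is kept, including "0".
string_test() {
    sample | awk '$1!=""{a[substr($1,1,6)]=""}END{for(t in a)print t}' | sort
}

# Bare $1 as the pattern: a field that looks like the number zero is
# treated as false, so the "0" record is dropped along with the empty line.
truthy_test() {
    sample | awk '$1{a[substr($1,1,6)]=""}END{for(t in a)print t}' | sort
}

echo "string test:"; string_test
echo "truthy test:"; truthy_test
```

Here the string test prints 0, 000123 and abc, while the bare-$1 test prints only 000123 and abc.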
| |
|
|
"Ed Morton" <morton@lsupcaemnt.com> wrote in message
news:MOGdnT_R8P_qsnjcRVn-rw@comcast.com...
>
>
> Bill Seivert wrote:
>
> Yeah, substr() is fine for this example and is a little less typing than
> gensub(). I still wouldn't bother with the pipe to sort though since it's
> trivial to just use awk to print the unique lines, and it'd be more
> awk-ish to not have an explicit "if.." test but instead specify the condition
> in the condition part of the body:
>
> awk '$1!=""{a[substr($1,1,6)]=""}END{for(t in a)print t}'
>
> or, if "$1" can never be zero (as I had originally assumed), just:
>
> awk '$1{...}'
>
> Regards,
>
> Ed.
>
Ok, I tried some of the suggestions. My output doesn't return any blank
lines, but it does return data that I didn't expect.
For example, here is a sample of a file:
55555512/12/04xxxxxxxx,xxxxxxx
111111
55555512/12/04yyyyyyyy,yyyyyyy
222222
What it would return is:
555555
111111
222222
What I'd want is:
555555
Now, I can get this if I run this command on the file:
cut -c1-6 filea| grep -v ' ' | sort -u
But I'm unsure about this command and whether it will return what I want all
the time. Any suggestions?
Thanks again,
Shawn
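
One possible reading of the sample above is that the lines 111111 and 222222 begin with blanks in the real file, and that the archive lost the indentation. That is an assumption, not something the post states. Under that assumption, the sketch below compares Shawn's pipeline with a stricter awk filter; the indentation, the extra empty line, and the function names are all made up for illustration.

```shell
#!/bin/sh
# Assumption: force byte-order collation so the sorted output is predictable.
LC_ALL=C; export LC_ALL

# Hypothetical input: Shawn's sample with assumed leading blanks on the
# unwanted lines, plus an empty line added to probe an edge case.
sample() {
    printf '%s\n' \
        '55555512/12/04xxxxxxxx,xxxxxxx' \
        '   111111' \
        '55555512/12/04yyyyyyyy,yyyyyyy' \
        '   222222' \
        ''
}

# Shawn's command: drops keys containing a space, but an empty line
# contains no space, so it passes through grep -v.
shawn_pipeline() {
    sample | cut -c1-6 | grep -v ' ' | sort -u
}

# Stricter variant: keep a key only if it is exactly six characters long
# and contains no space or tab.
strict_keys() {
    sample | awk '{ key = substr($0, 1, 6) }
        length(key) == 6 && key !~ /[ \t]/ { print key }' | sort -u
}

echo "shawn_pipeline:"; shawn_pipeline
echo "strict_keys:"; strict_keys
```

With this input the strict variant prints only 555555, while the grep -v version also lets the empty line through. Whether short or tab-indented records should be dropped depends on the real data.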
| |
| Bill Seivert 2005-01-14, 8:47 pm |
|
Ed Morton wrote:
>
>
> Bill Seivert wrote:
>
>
>
> Yeah, substr() is fine for this example and is a little less typing than
> gensub(). I still wouldn't bother with the pipe to sort though since
> it's trivial to just use awk to print the unique lines, and it'd be
> more awk-ish to not have an explicit "if.." test but instead specify the
> condition in the condition part of the body:
>
> awk '$1!=""{a[substr($1,1,6)]=""}END{for(t in a)print t}'
>
> or, if "$1" can never be zero (as I had originally assumed), just:
>
> awk '$1{...}'
>
> Regards,
>
> Ed.
The for (t in a) will produce the unique records but they are not
guaranteed to be in any specific order ("implementation dependent"), so
the sort -u is an option if ordering is important.
Bill Seivert
seivert@pcisys.net
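
If ordering does matter, two common ways to get a predictable order are sketched below. The sample records and function names are hypothetical, and the first-seen variant is a swapped-in idiom rather than anything proposed in the thread.

```shell
#!/bin/sh
# Assumption: force byte-order collation so the sorted output is predictable.
LC_ALL=C; export LC_ALL

# Hypothetical input with one duplicate key.
sample() {
    printf '%s\n' 'zebra1' 'apple2' 'zebra1' 'mango3'
}

# Collect keys in an array, then sort, because the order in which
# "for (t in a)" visits the keys is not specified.
sorted_unique() {
    sample | awk '$1!=""{a[substr($1,1,6)]=""}END{for(t in a)print t}' | sort
}

# Print each key the first time it is seen, which preserves input order
# and needs no END block or external sort.
first_seen_unique() {
    sample | awk '$1 != "" { k = substr($1, 1, 6); if (!seen[k]++) print k }'
}

echo "sorted:"; sorted_unique
echo "first seen:"; first_seen_unique
```

The first function prints apple2, mango3, zebra1; the second prints zebra1, apple2, mango3.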
| |
| sharma__r@hotmail.com 2005-01-15, 2:47 am |
|
Shawn wrote:
> Hi,
>
> I have been using this command to extract the unique data from a file:
>
> cut -c1-6 filea |sort -u
>
> I'd like the output to ignore any blanks or empty fields. Just not sure how
> to get around that.
>
< filea sed -e 's/^\(.\{6\}\).*/\1/;/[^ ]/!d' | sort -bu
| |
| Ed Morton 2005-01-15, 5:49 pm |
|
Bill Seivert wrote:
>
>
> Ed Morton wrote:
>
>
>
> The for (t in a) will produce the unique records but they are not
> guaranteed to be in any specific order ("implementation dependent"), so
> the sort -u is an option if ordering is important.
That's right, but there's nothing in the OP's posting to suggest ordering
is important. He just wants to "extract the unique data from a file". If
ordering was important, he could just set the WHINY_USERS environment
variable for gawk (it's a real variable - honest!) to order the output.
Ed.
> Bill Seivert
> seivert@pcisys.net
>