Unix Shell - ** Sort Question **

Author ** Sort Question **
validus1@gmail.com

2005-04-21, 8:48 pm

I have a question I hope some of you could shed some light on.
I was trying to figure out the proper options to the sort command
to do a multiple key sort on a file of email addresses. The file is
basically just a list of primary and alternate email addresses
separated by whitespace (tab or multiple spaces). Here is an example
file.

bob@hotmail.com bigman@yahoo.com
tony@juno.com patriotsfan@google.com
adrian@netscape.com pizzaboy@yahoo.com
stacy@netscape.com redrider@yahoo.com
johnboy@netscape.com alex@yahoo.com

Actually, I'm only interested in sorting on the primary addresses, but
the first sort key would be the domain and the second key would be the
user.

I'm having problems with key 1. Key 1 should begin at the first @
symbol and extend to the first whitespace, but I think the problem is
that it is looking at everything up to the second occurrence of the @
symbol. That is what I wasn't sure how to fix. I guess my question is:
can you tell the key to begin at the delimiter and end at the first
whitespace character, and if so, how? If possible, I would like to
avoid the older origin-zero syntax, but if I must use it, that is fine.

This is using version 4.5.3 of the sort from the FSF on Red Hat 9
running bash 2.05b.

cheers,
validus
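To make the problem concrete, here is a small illustration (not part of the original thread). With -t@, sort's field 2 runs from the first '@' to the second one, so it holds the primary domain, the whitespace and the alternate user name; the netscape.com lines therefore end up ordered by the alternate address. This is a sketch for a POSIX shell; the helper name 'sample' is hypothetical and simply prints the example file above, and LC_ALL=C is set only to make the ordering reproducible.

```shell
#!/bin/sh
# Illustration: why "sort -t@ -k2,2 -k1,1" does not sort by primary domain.
LC_ALL=C; export LC_ALL   # plain byte-order collation, for reproducible output

# Hypothetical helper: prints the sample file from the question.
sample() {
  cat <<'EOF'
bob@hotmail.com bigman@yahoo.com
tony@juno.com patriotsfan@google.com
adrian@netscape.com pizzaboy@yahoo.com
stacy@netscape.com redrider@yahoo.com
johnboy@netscape.com alex@yahoo.com
EOF
}

# Show field 2 as sort -t@ sees it: it runs on to the second '@'.
# First line printed: [hotmail.com bigman]
sample | awk -F@ '{ print "[" $2 "]" }'

# So the netscape.com lines are ordered by the alternate user name:
# johnboy@netscape.com (alternate "alex") comes before adrian@netscape.com.
sample | sort -t@ -k2,2 -k1,1
```

In the C locale the second command prints bob, tony, johnboy, adrian and stacy, in that order.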

Icarus Sparry

2005-04-22, 2:53 am

On Thu, 21 Apr 2005 18:48:08 -0700, validus1 wrote:

> I'm having problems with key 1. Key 1 should begin at the first @
> symbol and extend to the first whitespace, but I think the problem is
> that it is looking at everything up to the second occurrence of the @
> symbol. That is what I wasn't sure how to fix. I guess my question is:
> can you tell the key to begin at the delimiter and end at the first
> whitespace character, and if so, how?


Unfortunately, sort only allows a single character to be specified as the
field separator (with -t), plus some special-case code for '\0' and for
the default when no -t is given, where runs of blanks separate fields.
So you have to adapt your data to suit sort.

There are two approaches:
1) Change your data so that it uses only a single separator character.
The most obvious transformation is to change the spaces/tabs into another
'@' character.


tr -s ' \t' '@@' < data | sort -t@ -k2,2 -k1,1 | sed 's/@/ /2'

You can do more fancy stuff to get the output into neater columns if you want.
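A self-contained sketch of approach 1 follows, assuming the file layout from the question. The -t@ matters: without it sort splits on blanks, and after the tr there are none left, so -k2,2 would be empty. The sketch also uses tr -s so that a run of spaces or tabs becomes a single '@', and spells out '@@' so that SET2 is as long as SET1, which keeps it portable across tr implementations. The function name sort_by_domain is hypothetical.

```shell
#!/bin/sh
# Approach 1 (sketch): make '@' the only separator, then sort on
# field 2 (domain) and field 1 (user).
LC_ALL=C; export LC_ALL   # byte-order collation, for reproducible output

# Hypothetical helper name; reads the address list on stdin.
sort_by_domain() {
  # tr -s: map space/tab to '@' and squeeze runs, so "a  b" -> "a@b".
  # sort -t@: field 1 = user, 2 = domain, 3 = alt user, 4 = alt domain.
  # sed: turn the second '@' back into a single space.
  tr -s ' \t' '@@' | sort -t@ -k2,2 -k1,1 | sed 's/@/ /2'
}

sort_by_domain <<'EOF'
bob@hotmail.com bigman@yahoo.com
tony@juno.com patriotsfan@google.com
adrian@netscape.com pizzaboy@yahoo.com
stacy@netscape.com redrider@yahoo.com
johnboy@netscape.com alex@yahoo.com
EOF
```

This version rewrites the separator as a single space. For neater columns, one option (again not from the thread) is to pipe the result through awk '{ printf "%-24s %s\n", $1, $2 }'.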

2) Add extra data so that each line starts with the sort key in the form
you want, sort, and then remove the extra data.

sed 's/^\([^@]*@\)\([^[:blank:]]*\)/\2 \1&/' data | sort | sed 's/[^@]*@//'
([[:blank:]] matches a space or a tab character.)
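Approach 2 is the decorate-sort-undecorate pattern. The sketch below is one way to write it, assuming a POSIX or GNU sed; it uses the POSIX class [[:blank:]] (space or tab) rather than a literal tab, and the function name sort_by_domain_dsu is hypothetical. Because the decoration is stripped again with a single substitution, the original spacing between the two addresses survives.

```shell
#!/bin/sh
# Approach 2 (sketch): decorate each line with "domain user@", sort the
# whole line, then strip the decoration again.
LC_ALL=C; export LC_ALL   # byte-order collation, for reproducible output

# Hypothetical helper name; reads the address list on stdin.
sort_by_domain_dsu() {
  # \1 = "user@", \2 = domain (up to the first blank), & = the whole match,
  # so "bob@hotmail.com ..." becomes "hotmail.com bob@bob@hotmail.com ...".
  # The final sed deletes everything up to and including the first '@'.
  sed 's/^\([^@]*@\)\([^[:blank:]]*\)/\2 \1&/' | sort | sed 's/[^@]*@//'
}

sort_by_domain_dsu <<'EOF'
bob@hotmail.com bigman@yahoo.com
tony@juno.com patriotsfan@google.com
adrian@netscape.com pizzaboy@yahoo.com
stacy@netscape.com redrider@yahoo.com
johnboy@netscape.com alex@yahoo.com
EOF
```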

Icarus
Chris F.A. Johnson

2005-04-22, 2:53 am

On Fri, 22 Apr 2005 at 01:48 GMT, validus1@gmail.com wrote:
> I'm having problems with key 1. Key 1 should begin at the first @
> symbol and extend to the first whitespace, but I think the problem is
> that it is looking at everything up to the second occurrence of the @
> symbol. That is what I wasn't sure how to fix. I guess my question is:
> can you tell the key to begin at the delimiter and end at the first
> whitespace character, and if so, how?


TAB=$'\t' ## Use a literal tab with shells that don't support this.
awk -F "[@ $TAB]" '{printf "%s\t%s\n", $2, $0}' FILE | sort | cut -f2-

> This is using version 4.5.3 of the sort from the FSF on Red Hat 9
> running bash 2.05b.
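A runnable sketch of the awk pipeline above. The -F argument has to be in double quotes so that the shell expands $TAB; inside single quotes awk receives the literal characters '$', 'T', 'A' and 'B' in the bracket and never splits on a tab. Here TAB is set with printf, which also works in shells without bash's $'\t'; the function name sort_by_domain_awk is hypothetical.

```shell
#!/bin/sh
# awk approach (sketch): prefix each line with its primary domain and a
# tab, sort, then cut the prefix off again.
LC_ALL=C; export LC_ALL   # byte-order collation, for reproducible output

TAB=$(printf '\t')   # a literal tab, without relying on bash's $'\t'

# Hypothetical helper name; reads the address list on stdin.
sort_by_domain_awk() {
  # Double quotes, so the shell expands $TAB inside the bracket expression.
  # With '@', space and tab as separators, $2 is the primary domain.
  awk -F "[@ $TAB]" '{ printf "%s\t%s\n", $2, $0 }' | sort | cut -f2-
}

sort_by_domain_awk <<'EOF'
bob@hotmail.com bigman@yahoo.com
tony@juno.com patriotsfan@google.com
adrian@netscape.com pizzaboy@yahoo.com
stacy@netscape.com redrider@yahoo.com
johnboy@netscape.com alex@yahoo.com
EOF
```

Since cut -f2- prints every field from the second on, any tabs in the original line are kept as they were.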


--
Chris F.A. Johnson http://cfaj.freeshell.org/shell
===================================================================
My code (if any) in this post is copyright 2005, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License
Daniel C. von Asmuth

2005-04-30, 6:10 pm

Icarus Sparry wrote:
> On Thu, 21 Apr 2005 18:48:08 -0700, validus1 wrote:
>
> There are 2 approaches
> 1) Change your data so it only has a single key,
> The most obvious transformation is to change the spaces/tabs into another
> '@' character.
>
> tr -s ' \t' '@@' < data | sort -t@ -k2,2 -k1,1 | sed 's/@/ /2'


Or you can change the at-sign into a space, as in
sed -e '/@/s// /' data | sort -b -i -k2,2 -k1,1 | sed -e '/ /s//@/'
('data' would be the name of the input file)
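A sketch of this variant, for comparison. After the first '@' becomes a space, sort's default blank-separated fields are user, domain and alternate address, so no -t is needed. -b makes sort skip the leading blank that each field otherwise carries; -i from the post is left out here, since it only makes sort ignore non-printing characters. The function name sort_by_domain_space is hypothetical.

```shell
#!/bin/sh
# Space variant (sketch): turn the first '@' into a space so sort's default
# fields are user (1), domain (2) and alternate address (3).
LC_ALL=C; export LC_ALL   # byte-order collation, for reproducible output

# Hypothetical helper name; reads the address list on stdin.
sort_by_domain_space() {
  # -b: ignore the leading blanks that belong to each field by default.
  # The last sed turns the first space (the one inserted) back into '@'.
  sed 's/@/ /' | sort -b -k2,2 -k1,1 | sed 's/ /@/'
}

sort_by_domain_space <<'EOF'
bob@hotmail.com bigman@yahoo.com
tony@juno.com patriotsfan@google.com
adrian@netscape.com pizzaboy@yahoo.com
stacy@netscape.com redrider@yahoo.com
johnboy@netscape.com alex@yahoo.com
EOF
```

Because the user part of an address contains no blanks, the first space on each line is always the inserted one, so the original separator is restored unchanged.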

Kind regards,


Daniel von Asmuth

Copyright 2003 - 2008 webservertalk.com