Unix Programming - help with sort syntax


sam billabong

2006-05-18, 7:21 am

Hi all,
the syntax of the SORT command is baffling me :-((

I'm trying to sort this flat file by two "absolute" positions.
That is: I would like to sort it by the 9-th column (the one containing
'9' '5' '5' '5') and THEN by the 13-th column ( '3' '7' '4' '1')

1234567890123456789
zzzzzz 5a 6781....
zzz zz 5z 7481....
z b a r 5b 1181....

I've tried :

sort +0.8 -0.8 +0.12 -0.12 o.txt
and variations like : sort +0.9 -0.9 +0.13 -0.13 o.txt ...

and also
sort -k 1.9,1.9 -k 1.13,1.13 o.txt ... and variations

but I'm not even able to understand the output ( i know: i'm definitely
stupid )
I've googled and MAN'nned and re-googled and re-MAN'ed to no avail .

is there anyone willing to help me ?

the problem is:
i want to sort a file :
first based on the content of byte 9 , and then on the content of byte
13.
every "line" of the file is a record; the record does NOT contain
fields.

THANK YOU

if that matters, here's the output of "uname": Linux 2.6.5-7.202.7 ...
i386 GNU/Linux

Rainer Temme

2006-05-18, 7:21 am

sam billabong wrote:

Hi sam,

> I'm trying to sort this flat file by two "absolute" positions.
> That is: I would like to sort it by the 9-th column (the one containing
> '9' '5' '5' '5') and THEN by the 13-th column ( '3' '7' '4' '1')
>
> 1234567890123456789
> zzzzzz 5a 6781....
> zzz zz 5z 7481....
> z b a r 5b 1181....


> sort -k 1.9,1.9 -k 1.13,1.13 o.txt ... and variations


the output of this on my system (SuSE Linux 9.3) is
z b a r 5b 1181....
zzz zz 5z 7481....
zzzzzz 5a 6781....
1234567890123456789
what's wrong with that?

> the problem is:
> i want to sort a file :
> first based on the content of byte 9 , and then on the content of byte
> 13.
> every "line" of the file is a record; the record does NOT contain
> fields.


Well, maybe your sort is a bit overpicky about fields ... how about
sort --field-separator=# -k 1.9,1.9 -k 1.13,1.13 o.txt
this might help.

Rainer
Daniel P. Valentine

2006-05-18, 7:21 am

In article <1147942403.337028.298130@j33g2000cwa.googlegroups.com>,
"sam billabong" <rfacco@email.it> wrote:

> Hi all,
> the syntax of the SORT command is baffling me :-((
>
> I'm trying to sort this flat file by two "absolute" positions.
> That is: I would like to sort it by the 9-th column (the one containing
> '9' '5' '5' '5') and THEN by the 13-th column ( '3' '7' '4' '1')
>
> 1234567890123456789
> zzzzzz 5a 6781....
> zzz zz 5z 7481....
> z b a r 5b 1181....
>
> I've tried :
>
> sort +0.8 -0.8 +0.12 -0.12 o.txt
> and variations like : sort +0.9 -0.9 +0.13 -0.13 o.txt ...
>
> and also
> sort -k 1.9,1.9 -k 1.13,1.13 o.txt ... and variations


The biggest problem you are having is that sort is splitting your record
into fields based on the spaces in the record. To make this last one
work, all I had to do was to specify a field delimiter that forced it to
consider (at least) the first 13 bytes as a single field.

You can use any character you "know" will not appear in the first 13
bytes--all you need is to ensure that the first 13 bytes don't get split
up. For the data you show here, a semicolon does the trick:

% sort -t ";" +0.9 -0.9 +0.13 -0.13 testsort.txt
1234567890123456789
z b a r 5b 1181....
zzz zz 5z 7481....
zzzzzz 5a 6781....

If you can't make a prediction on what may or may not appear in the
first 13 characters, you're probably safest using the newline character
(or whatever separates records from each other) as the field delimiter.

% sort -t "\
" +0.9 -0.9 +0.13 -0.13 testsort.txt
1234567890123456789
z b a r 5b 1181....
zzz zz 5z 7481....
zzzzzz 5a 6781....

To make that happen, I typed \, ^V, and ^M between the quotes. The
backslash told the shell that the newline character was not the end of
the command line, and the control-V told the shell to take the next
character verbatim. The control-M is a carriage return, which the
terminal normally treats as a newline.
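
A minimal sketch of the newline-delimiter idea for a POSIX-style shell, written with the -k key form. It rests on assumptions: the double spaces in the sample are a guess at the original spacing, chosen so that columns 9 and 13 hold the characters sam describes, and `data`, `nl` and `sorted` are names made up for this example. In sh or bash a backslash followed by a newline inside double quotes is dropped as a line continuation, so the newline is kept in a variable instead; a csh-style shell, which the % prompt may indicate, does need the backslash.

```shell
# Sample records; the double spaces are an assumption made so that
# column 9 and column 13 hold the characters described in the question.
data=$(mktemp)
cat > "$data" <<'EOF'
1234567890123456789
zzzzzz  5a 6781....
zzz zz  5z 7481....
z b a r 5b 1181....
EOF

# A variable holding exactly one newline character.
nl='
'

# Use the newline as the field separator: it cannot occur inside a
# record, so field 1 is always the whole line.
sorted=$(sort -t "$nl" -k 1.9,1.9 -k 1.13,1.13 "$data")
printf '%s\n' "$sorted"
rm -f "$data"
```

Whether a given sort accepts a newline for -t is worth checking locally; GNU sort appears to accept any single byte there.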

> [...]
> if that matters, here's the output of "uname": Linux 2.6.5-7.202.7 ...
> i386GNU/Linux


If it had mattered, the version of sort would be more important in this
case:

% sort --version
sort (GNU coreutils) 5.93
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software. You may redistribute copies of it under the
terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

--
dpv
sam billabong

2006-05-18, 7:21 am


Daniel P. Valentine wrote:
> % sort -t ";" +0.9 -0.9 +0.13 -0.13 testsort.txt
> 1234567890123456789
> z b a r 5b 1181....
> zzz zz 5z 7481....
> zzzzzz 5a 6781....


Dan: thank you for your help !
on my system I have the same output as you ... unfortunately
I don't understand if and how it is sorted; for sure it is NOT sorted
by either the 9-th or the 13-th column ... or am i completely out to
lunch ?

the 9-th column in the output is '9' '5' '5' '5' (reverse order ?? !)
and the 13-th is '3' '1' '4' '7'

Thank you for your patience.
------------------------------------------
~ # sort --version
sort (coreutils) 5.2.1
Written by Mike Haertel and Paul Eggert.

Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is
NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.

Rainer Temme

2006-05-18, 7:21 am

sam billabong wrote:
> Daniel P. Valentine wrote:
>
> Dan: thank you for your help !
> on my system I have the same output as you ... unfortunately
> I don't understand if and how it is sorted ...


The start position is the very first character; positions count from 1.

> ~ # sort --version
> sort (coreutils) 5.2.1


$ cat in
1234567890123456789
zzzzzz 5a 6781....
zzz zz 5z 7481....
z b a r 5b 1181....

$ sort --version
sort (GNU coreutils) 5.3.0

$ cat in | sort -t '#' -k 1.9,1.9 -k 1.13,1.13
z b a r 5b 1181....
zzz zz 5z 7481....
zzzzzz 5a 6781....
1234567890123456789

Maybe you should try upgrading to version 5.3.0.
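
One way to check the order by eye is to print only the two key characters of each sorted line. A small sketch, assuming the records are padded as sam describes (the double spaces below are a guess at the original spacing) and that '#' never occurs in the data; `data`, `sorted` and `keys` are illustrative names.

```shell
# Sample records; spacing is assumed so that column 9 and column 13
# line up with the characters listed in the question.
data=$(mktemp)
cat > "$data" <<'EOF'
1234567890123456789
zzzzzz  5a 6781....
zzz zz  5z 7481....
z b a r 5b 1181....
EOF

# Sort on character 9, then character 13, with the whole line as field 1.
sorted=$(sort -t '#' -k 1.9,1.9 -k 1.13,1.13 "$data")

# Keep only characters 9 and 13 of every sorted line.
keys=$(printf '%s\n' "$sorted" | cut -c9,13)
printf '%s\n' "$keys"
rm -f "$data"
```

In each printed pair the first character is column 9 and the second is column 13, so the pairs should come out in ascending order.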

Rainer
Carl Lowenstein

2006-05-18, 1:16 pm

In article <1147947044.377207.305710@i39g2000cwa.googlegroups.com>,
sam billabong <rfacco@email.it> wrote:
>
>Daniel P. Valentine wrote:
>
>Dan: thank you for your help !
>on my system I have the same output as you ... unfortunately
>I don't understand if and how it is sorted; for sure it is NOT sorted
>by either the 9-th or the 13-th column ... or am i completely out to
>lunch ?
>
>the 9-th column in the output is '9' '5' '5' '5' (reverse order ?? !)
>and the 13-th is '3' '1' '4' '7'
>
>Thank you for your patience.
>------------------------------------------
>~ # sort --version
>sort (coreutils) 5.2.1
>Written by Mike Haertel and Paul Eggert.
>
>Copyright (C) 2004 Free Software Foundation, Inc.
>This is free software; see the source for copying conditions. There is
>NO
>warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>PURPOSE.


For better or worse, when using the old-time options to sort like +0.9
-0.9 the column that you have labeled "9" is column number 8. Likewise,
the column labeled "3" for 13 is column number 12. Numbers start from 0.

Version 5.3 of sort eliminates this confusion by no longer allowing this
+n.m column specification. Of course it adds to the confusion of those of
us who painfully learned sort(1) syntax back in 6th Edition Unix days.
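
The renumbering can be written down mechanically. The sketch below uses a hypothetical helper, `old_to_k`, that applies the conversion rule as understood here from the GNU sort manual: `+w.x -y.z` corresponds to `-k w+1.x+1,y` when z is 0 or absent, and to `-k w+1.x+1,y+1.z` otherwise. It only prints the new-style key and does not run sort.

```shell
# Hypothetical helper: print the -k key that corresponds to the obsolete
# zero-origin form "+W.X -Y.Z". Arguments are the four numbers W X Y Z.
# The case Y=0 with Z=0 has no valid -k spelling and is not handled.
old_to_k() {
    w=$1 x=$2 y=$3 z=$4
    if [ "$z" -eq 0 ]; then
        # End of key is the end of field Y (one-origin).
        printf '%s\n' "-k $((w + 1)).$((x + 1)),$y"
    else
        # End of key is character Z of field Y+1 (one-origin).
        printf '%s\n' "-k $((w + 1)).$((x + 1)),$((y + 1)).$z"
    fi
}

old_to_k 0 8 0 9     # old spelling for the column labeled "9"
old_to_k 0 12 0 13   # old spelling for the column labeled "3" (13)
old_to_k 0 9 0 9     # what "+0.9 -0.9" asks for
```

By this rule `+0.8 -0.9` is the old spelling of `-k 1.9,1.9`, while `+0.9 -0.9` becomes a key that ends before it starts, which would be consistent with output that is simply in whole-line order.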

carl
--
carl lowenstein marine physical lab u.c. san diego
clowenst@ucsd.edu
Jim Cochrane

2006-05-18, 7:15 pm

["Followup-To:" header set to comp.unix.programmer.]
On 2006-05-18, Carl Lowenstein <cdl@deeptow.ucsd.edu> wrote:
> In article <1147947044.377207.305710@i39g2000cwa.googlegroups.com>,
> sam billabong <rfacco@email.it> wrote:

Looks like, as explained by carl, this won't work on modern versions of
sort - from "info sort":

On older systems, `sort' supports an obsolete origin-zero syntax
`+POS1 [-POS2]' for specifying sort keys. POSIX 1003.1-2001 (*note
Standards conformance::) does not allow this; use `-k' instead.
>
> For better or worse, when using the old-time options to sort like +0.9
> -0.9 the column that you have labeled "9" is column number 8. Likewise,
> the column labeled "3" for 13 is column number 12. Numbers start from 0.
>
> Version 5.3 of sort eliminates this confusion by no longer allowing this
> +n.m column specification. Of course it adds to the confusion of those of
> us who painfully learned sort(1) syntax back in 6th Edition Unix days.
>
> carl


Looks like the behavior of version 5.2.1 (and, I'd bet, several versions
preceding 5.2.1) is the same as 5.3:

$ cat in
1234567890123456789
zzzzzz 5a 6781....
zzz zz 5z 7481....
z b a r 5b 1181....
$ sort -k 1.9,1.9 -k 1.13,1.13 in
z b a r 5b 1181....
zzz zz 5z 7481....
zzzzzz 5a 6781....
1234567890123456789
$ sort -t '' -k 1.9,1.9 -k 1.13,1.13 in
z b a r 5b 1181....
zzz zz 5z 7481....
zzzzzz 5a 6781....
1234567890123456789
$ sort --version
sort (coreutils) 5.2.1
Written by Mike Haertel and Paul Eggert.

Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


(Thanks to the hard work of Haertel and Eggert.)

Daniel P. Valentine

2006-05-18, 7:15 pm

In article <1147947044.377207.305710@i39g2000cwa.googlegroups.com>,
"sam billabong" <rfacco@email.it> wrote:

> Daniel P. Valentine wrote:
>
> Dan: thank you for your help !
> on my system I have the same output as you ... unfortunately
> I don't understand if and how it is sorted; for sure it is NOT sorted
> by either the 9-th or the 13-th column ... or am i completely out to
> lunch ?
>
> the 9-th column in the output is '9' '5' '5' '5' (reverse order ?? !)
> and the 13-th is '3' '1' '4' '7'
>
> Thank you for your patience.
> ------------------------------------------
> ~ # sort --version
> sort (coreutils) 5.2.1
> Written by Mike Haertel and Paul Eggert.
>
> Copyright (C) 2004 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is
> NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.


Yes, of course you are right. I pasted the wrong version of my sort
commands into the editor. As others have pointed out, this version of
the key definition is deprecated, and did not work when I was testing
it, either.

I have the results with the "-k" key definition still, which I should
have posted, and they are

% sort -t ";" -k 1.9,1.9 -k 1.13,1.13 testsort.txt
z b a r 5b 1181....
zzz zz 5z 7481....
zzzzzz 5a 6781....
1234567890123456789
% sort -t "\
" -k 1.9,1.9 -k 1.13,1.13 testsort.txt
z b a r 5b 1181....
zzz zz 5z 7481....
zzzzzz 5a 6781....
1234567890123456789
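
A quick way to confirm a result like this is sort's -c option, which checks the order under the given keys instead of sorting. A sketch under the same assumption as the commands above (';' never occurs in the data); the padded sample lines and the names `data`, `out`, `result_sorted` and `result_input` are made up for illustration.

```shell
# Sample records; the double spaces are an assumed reconstruction of
# the original spacing, so that columns 9 and 13 hold the key characters.
data=$(mktemp)
out=$(mktemp)
cat > "$data" <<'EOF'
1234567890123456789
zzzzzz  5a 6781....
zzz zz  5z 7481....
z b a r 5b 1181....
EOF

sort -t ';' -k 1.9,1.9 -k 1.13,1.13 "$data" > "$out"

# -c only checks; exit status 0 means the file is already in key order.
if sort -c -t ';' -k 1.9,1.9 -k 1.13,1.13 "$out" 2>/dev/null; then
    result_sorted=ok
else
    result_sorted=disorder
fi

# The unsorted input starts with the '9' record, so the check should fail.
if sort -c -t ';' -k 1.9,1.9 -k 1.13,1.13 "$data" 2>/dev/null; then
    result_input=ok
else
    result_input=disorder
fi

echo "sorted output: $result_sorted, original input: $result_input"
rm -f "$data" "$out"
```

Without the redirection of standard error, sort -c also names the first line it finds out of order.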

I'll have to proofread better next time.

--
dpv