Unix Shell - Sorting by basename of file

Vikas Agnihotri

2004-09-28, 3:32 am

I have a file containing 2 full pathnames per line like

/full/path/.../to/file /another/full/.../path/to/file

How can I sort this file by the "basename" of the second filename on the
line?

Thanks

[Sorry for the cross-posting, wasn't sure if there is a shell-only
solution or awk is more effective here]
Michael Tosch

2004-09-28, 3:32 am

In article <2rr2r9F1dgak2U1@uni-berlin.de>, Vikas Agnihotri <usenet@vikas.mailshell.com> writes:
> I have a file containing 2 full pathnames per line like
>
> /full/path/.../to/file /another/full/.../path/to/file
>
> How can I sort this file by the "basename" of the second filename on the
> line?
>
> Thanks
>
> [Sorry for the cross-posting, wasnt sure if there is a shell-only
> solution or awk is more effective here]



awk -F/ '{print $NF"/"$0}' file | sort | cut -d/ -f2-
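The one-liner is a decorate-sort-undecorate pipeline: prepend the basename, sort, then cut the prefix off again. Below is a minimal runnable sketch wrapped in a shell function. The function name sort_by_last_basename and the sample paths are hypothetical. LC_ALL=C is an added assumption (not in the original) so that the order is plain byte order. The sketch also assumes each line ends with the second path and has no trailing slash.

```shell
#!/bin/sh
# Sort "path path" lines by the basename of the last path on each line.
# Reads the named files, or stdin when no arguments are given.
sort_by_last_basename() {
    # 1. decorate: prepend the last /-separated field (the basename) and a /
    # 2. sort on the decorated line, in byte order
    # 3. undecorate: keep everything after the first /, i.e. the original line
    awk -F/ '{ print $NF "/" $0 }' "$@" |
        LC_ALL=C sort |
        cut -d/ -f2-
}

# Hypothetical sample input
printf '%s\n' \
    '/src/x/one /dst/p/zeta' \
    '/src/y/two /dst/q/alpha' \
    '/src/z/three /dst/mid' |
    sort_by_last_basename
```

The cut step works because the decorated line is "basename//original...", so field 2 is empty and -f2- re-joins it with the leading / of the original line.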


--
Michael Tosch
IT Specialist
HP Managed Services
Technology Solutions Group
Hewlett-Packard GmbH
Phone: +49 2407 575 313
Mail: michael.tosch:hp.com


Kenny McCormack

2004-09-28, 3:32 am

In article <cj9mjd$gnb$1@aken.eed.ericsson.se>,
Michael Tosch <eedmit@NO.eed.SPAM.ericsson.PLS.se> wrote:
>In article <2rr2r9F1dgak2U1@uni-berlin.de>, Vikas Agnihotri <usenet@vikas.mailshell.com> writes:
>
>
>awk -F/ '{print $NF"/"$0}' file | sort | cut -d/ -f2-


$ WHINY_USERS=HAPPY_USERS gawk -F/ '{x[$NF]=$0} END{for (i in x) print x[i]}' file
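Note that keying the array on $NF keeps only the last line seen for each basename. Lines that share a second basename are therefore silently dropped. WHINY_USERS was a little-documented gawk switch that made for (i in x) traverse in sorted order. Since gawk 4.0, the documented way to control that order is PROCINFO["sorted_in"] (for example "@ind_str_asc"). The sketch below uses only POSIX awk and hypothetical input to show the line loss:

```shell
#!/bin/sh
# Three hypothetical lines; the first two share the second basename "same".
kept=$(printf '%s\n' '/a/one /x/same' '/b/two /y/same' '/c/three /z/other' |
    awk -F/ '{ x[$NF] = $0 } END { n = 0; for (i in x) n++; print n }')

# Only 2 entries survive: x["same"] was overwritten by the second line.
echo "lines kept: $kept"
```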

Dana French

2004-09-28, 5:56 pm

eedmit@NO.eed.SPAM.ericsson.PLS.se (Michael Tosch) wrote in message news:<cj9mjd$gnb$1@aken.eed.ericsson.se>...
> In article <2rr2r9F1dgak2U1@uni-berlin.de>, Vikas Agnihotri <usenet@vikas.mailshell.com> writes:
>
>
> awk -F/ '{print $NF"/"$0}' file | sort | cut -d/ -f2-



Excellent one-liner and it has been added to the King/Queen of the one-liners page:

http://www.mtxia.com/fancyIndex/Too.../oneliners.html

--------------------------------------------------------
Dana French dfrench@mtxia.com
Mt Xia Technical Consulting Group http://www.mtxia.com
100% Spam Free Email http://www.ridmail.com
MicroEmacs http://uemacs.tripod.com
Korn Shell Web http://dfrench.tripod.com/kshweb.html
Juhan Leemet

2004-09-29, 3:09 am

On Mon, 27 Sep 2004 18:36:29 +0000, Michael Tosch wrote:
> In article <2rr2r9F1dgak2U1@uni-berlin.de>, Vikas Agnihotri <usenet@vikas.mailshell.com> writes:
>
>
> awk -F/ '{print $NF"/"$0}' file | sort | cut -d/ -f2-


Picking a nit...

Hmm, my natural inclination would have been to output $NF" "$0, so that
the sort works on a blank terminated 1st field, instead of that field with
the 1st fullpath concatenated to it. Turns out that since (in ASCII) '/'
is lexically less than most allowable file name characters.... except for
'.', '-', the space and a few other punctuation characters, so...

Your solution IMO sorts filenames like "a" and "a." in the wrong order,
with "a." appearing first in the list, instead of later. Tiny nit, tho,
and I agree the form of your solution is beautiful. In any case we are
impacted also by the sort order of the locale, so be careful out there. I
would argue that presenting sort with fields that only contain the
basename of the 2nd filepath is "more correct", with corresponding
adjustment to the final cut command (i.e. use -d\ (escaped space)). Maybe
one should also restrict the sort to the first field (see more below?)?

or, alternatively, you could have specified '/' to be the field separator
for sort, and restrict the sort to the first field, by using "sort -t/
-k1,1" for that part in the pipeline. I'm now buried in my own nits, e.g.
I'm not sure why I have to restrict the field -k with the -t option?
Otherwise it does not seem to work the same as with a blank terminator.

Ah, I think it was my own data, which had same pathnames in it? but that
shows up another flaw in the original? isn't the 1st pathname used in the
sort? should it be, according to the "spec"? if not specified, I would
suggest the order should not be altered, i.e. for identical 2nd basename?
That just goes to show that nit picking can make anything a huge problem.
For most applications the original solution works "well enough" IMO.
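Here is a small check of the "a" versus "a." case. It uses two hypothetical lines and forces the C locale so that the result is plain byte order:

```shell
#!/bin/sh
# Two hypothetical lines whose second basenames are "a" and "a.".
input='/p/1 /q/a
/p/2 /q/a.'

# Original pipeline: the sort key is the whole "basename/rest-of-line".
# In byte order "a./..." sorts before "a/..." because '.' (0x2E) < '/' (0x2F),
# so the "a." line comes out first.
printf '%s\n' "$input" | awk -F/ '{ print $NF "/" $0 }' | LC_ALL=C sort | cut -d/ -f2-
```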

BTW, to my chagrin I discovered that this example worked correctly on
Solaris, but on my SuSE 9.1 Linux it seemed to sort the filenames in a
funny order, with ' ' being put after [a-z], so "aaaa" came before "a"!
Huh? That went away when I explicitly forced LC_ALL=C for the sort
command. What on earth was sort using for locale in my standard SuSE 9.1
Linux installation? I do not have any locale specified in my environment,
so it must be compiled in? Off the top of my head I don't even know how to
check. You would think that gnu sort should report that with --version
switch? Even stronger warning to be careful out there! (locales... grumble!)
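To see what sort inherits, the standard locale command prints the effective settings. LC_ALL, when set, overrides LANG and every individual LC_* variable, and LC_COLLATE is the category that governs sort order. A sketch follows. The sample data is hypothetical, and the command -v guard is only there in case locale is not installed.

```shell
#!/bin/sh
# Print the locale settings that sort will inherit, if locale(1) exists.
command -v locale >/dev/null 2>&1 && locale

# Force byte-order collation for this one sort only. In the C locale the
# space (0x20) sorts before 'a', so "a /x" comes before "aaaa /y".
printf '%s\n' 'aaaa /y' 'a /x' | LC_ALL=C sort
```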

p.s. I can't think of a shell only solution (without using awk), since
there might be varying pathname depths and AFAIK sort does not have any
way to specify $NF. So, it appears that you need awk (or some such).
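A shell-only decorate step is possible with POSIX parameter expansion. ${line##*/} strips the longest prefix ending in /, which leaves the same text as awk's $NF. The sketch below assumes well-formed, newline-terminated lines. The function name decorate is hypothetical. A while read loop is also much slower than awk on large files.

```shell
#!/bin/sh
# Prepend the basename of the last path, using only shell built-ins.
decorate() {
    while IFS= read -r line; do
        # ${line##*/} removes everything up to and including the last "/"
        printf '%s/%s\n' "${line##*/}" "$line"
    done
}

# Hypothetical sample input; sort and cut still do the rest of the work.
printf '%s\n' '/a/b/c/zeta /d/zeta' '/e/alpha /f/g/alpha' |
    decorate | LC_ALL=C sort | cut -d/ -f2-
```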

--
Juhan Leemet
Logicognosis, Inc.

Michael Tosch

2004-09-29, 8:09 pm

In article <pan.2004.09.29.01.10.13.488504@logicognosis.com>, Juhan Leemet <juhan@logicognosis.com> writes:
> On Mon, 27 Sep 2004 18:36:29 +0000, Michael Tosch wrote:
>
> Picking a nit...
>
> Hmm, my natural inclination would have been to output $NF" "$0, so that
> the sort works on a blank terminated 1st field, instead of that field with
> the 1st fullpath concatenated to it. Turns out that since (in ASCII) '/'
> is lexically less than most allowable file name characters.... except for
> the character '.', so...


The separator must be / because file names can contain any character
but / (and NUL).

>
> Your solution IMO sorts filenames like "a" and "a." in the wrong order,
> with "a." appearing first in the list, instead of later. Tiny nit, tho,
> and I agree the form of your solution is beautiful. In any case we are
> impacted also by the sort order of the locale, so be careful out there. I
> would argue that presenting sort with fields that only contain the
> basename of the 2nd filepath is "more correct", with corresponding
> adjustment to the final cut command (i.e. use -d\ (escaped space)). Maybe
> one should also restrict the sort to the first field (see more below?)?
>
> or, alternatively, you could have specified '/' to be the field separator
> for sort, and restrict the sort to the first field, by using "sort -t/
> -k1,1" for that part in the pipeline. I'm now buried in my own nits, e.g.
> I'm not sure why I have to restrict the field -k with the -t option?
> Otherwise it does not seem to work the same as with a blank terminator.
>
> Ah, I think it was my own data, which had same pathnames in it? but that
> shows up another flaw in the original? isn't the 1st pathname used in the
> sort? should it be, according to the "spec"? if not specified, I would
> suggest the order should not be altered, i.e. for identical 2nd basename?
> That just goes to show that nit picking can make anything a huge problem.
> For most applications the original solution works "well enough" IMO.
>
> BTW, to my chagrin I discovered that this example worked correctly on
> Solaris, but on my SuSE 9.1 Linux it seemed to sort the filenames in a
> funny order, with ' ' being put after [a-z], so "aaaa" came before "a"!
> Huh? That went away when I explicitly forced LC_ALL=C for the sort
> command. What on earth was sort using for locale in my standard SuSE 9.1
> Linux installation? I do not have any locale specified in my environment,
> so it must be compiled in? Off the top of my head I don't even know how to
> check. You would think that gnu sort should report that with --version
> switch? Even stronger warning to be careful out there! (locales... grumble!)
>
> p.s. I can't think of a shell only solution (without using awk), since
> there might be varying pathname depths and AFAIK sort does not have any
> way to specify $NF. So, it appears that you need awk (or some such).
>


To honor your observations:

awk -F/ '{print $NF"/"$0}' file | sort -t / -k 1,1 | cut -d/ -f2-

is more correct, as it excludes the / separator from the sort-object.
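Here is the same hypothetical "a" / "a." pair run through the corrected pipeline, again assuming LC_ALL=C:

```shell
#!/bin/sh
# The sort key is now only the text before the first "/": "a." or "a".
# "a" is a prefix of "a.", so the "/q/a" line sorts first.
printf '%s\n' '/p/1 /q/a.' '/p/2 /q/a' |
    awk -F/ '{ print $NF "/" $0 }' |
    LC_ALL=C sort -t / -k 1,1 |
    cut -d/ -f2-
```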


--
Michael Tosch
IT Specialist
HP Managed Services
Technology Solutions Group
Hewlett-Packard GmbH
Phone: +49 2407 575 313
Mail: michael.tosch:hp.com


Juhan Leemet

2004-09-29, 8:09 pm

On Wed, 29 Sep 2004 07:46:04 +0000, Michael Tosch wrote:
[snippage]
> To honor your observations:
>
> awk -F/ '{print $NF"/"$0}' file | sort -t / -k 1,1 | cut -d/ -f2-
>
> is more correct, as it excludes the / separator from the sort-object.


and what's beyond the / separator as well! (like parts of 1st pathname?)

This form has the benefit of not changing the order of lines that happen
to have the same 2nd basename. Admittedly, the "spec" was not clear
(silent) on this issue. The generally accepted practice for sorting (right
back to Hollerith cards?) is to preserve the original order when the keys
are identical. Admittedly, the RDBMS guys toss out that idea, because for
them there is no intrinsic order in the bag of results, except what you
force on it, all else being "undefined". I prefer the sorting convention.
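One caveat comes from the POSIX sort specification. Unless -u is given, lines whose keys compare equal are then compared as whole lines, so -k 1,1 by itself does not keep equal basenames in input order. The tie is broken by the rest of the decorated line, i.e. by the first pathname. GNU and BSD sort offer -s (stable) to suppress that fallback, but -s is an extension, not POSIX. A sketch with hypothetical input:

```shell
#!/bin/sh
# Two hypothetical lines with the same second basename, "/z/..." line first.
pairs='/z/first /q/same
/a/second /r/same'

# Without -s: equal keys fall back to a whole-line comparison,
# so the "/a/second" line moves ahead of "/z/first".
printf '%s\n' "$pairs" | awk -F/ '{ print $NF "/" $0 }' |
    LC_ALL=C sort -t / -k 1,1 | cut -d/ -f2-

# With -s (GNU/BSD extension): equal keys keep their input order.
printf '%s\n' "$pairs" | awk -F/ '{ print $NF "/" $0 }' |
    LC_ALL=C sort -s -t / -k 1,1 | cut -d/ -f2-
```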

Don't mind my nit picking. I'm rusty at awk, tho I used to write a bit of
it. I'm just brushing up and exercising those flabby mental muscles.

Thanks for the interesting dialogue(s).

--
Juhan Leemet
Logicognosis, Inc.

Laurent Schneider

2004-10-02, 9:12 pm

sort -t/ -1

HTH
Laurent
Carl Lowenstein

2004-10-02, 9:12 pm

In article <fe0e47b1.0409300401.75fd6207@posting.google.com>,
Laurent Schneider <laurentschneider@yahoo.com> wrote:
>sort -t/ -1
>
>HTH


Neat idea, doesn't help much.

[cdl@delta tmp]$ sort -t/ -1 file.lst
sort: invalid option -- 1
Try `sort --help' for more information.
[cdl@delta tmp]$ sort --version
sort (coreutils) 5.2.1

Where did you get your "sort"?

carl
--
carl lowenstein marine physical lab u.c. san diego
clowenst@ucsd.edu
Brian Inglis

2004-10-02, 9:13 pm

On Fri, 1 Oct 2004 04:30:32 +0000 (UTC) in comp.lang.awk,
cdl@deeptow.ucsd.edu (Carl Lowenstein) wrote:

>In article <fe0e47b1.0409300401.75fd6207@posting.google.com>,
>Laurent Schneider <laurentschneider@yahoo.com> wrote:
>Neat idea, doesn't help much.
>
> [cdl@delta tmp]$ sort -t/ -1 file.lst
> sort: invalid option -- 1
> Try `sort --help' for more information.
> [cdl@delta tmp]$ sort --version
> sort (coreutils) 5.2.1
>
>Where did you get your "sort"?


Where did you get yours? ;^>

GNU sort is more helpful:
$ sort --version
sort (GNU textutils) 2.0
....

$ sort -t/ -1 file.lst
/usr/local/bin/sort: when using the old-style +POS and -POS key
specifiers, the +POS specifier must come first
Try `/usr/local/bin/sort --help' for more information.

The OP's sort may default to +0, he probably meant:

$ sort -t/ +0 -1 file.lst

--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

Brian.Inglis@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply
Paul Jarc

2004-10-02, 9:13 pm

Brian Inglis <Brian.Inglis@SystematicSW.Invalid> wrote:
> On Fri, 1 Oct 2004 04:30:32 +0000 (UTC) in comp.lang.awk,
> cdl@deeptow.ucsd.edu (Carl Lowenstein) wrote:
>
> Where did you get yours? ;^>
>
> GNU sort is more helpful:
> $ sort --version
> sort (GNU textutils) 2.0


Carl's sort *is* GNU sort, and the latest version. textutils,
fileutils, and sh-utils were merged into one package, coreutils.
5.2.1 is the latest version. This version, in many cases, conforms to
the latest POSIX/SUS standard by rejecting many -<digit> arguments.
(This is configurable at run time, and at compile time; the
compile-time default is taken from what version of POSIX the system's
libc claims to conform to.)


paul
Laurent Schneider

2004-10-04, 6:01 pm

> >> [cdl@delta tmp]$ sort -t/ -1 file.lst

Sorry about this, I have *only* AIX here and did not know the -n
option (nth field starting from the right) was AIX specific.

Regards
Laurent
Jürgen Kahrs

2004-10-04, 6:01 pm

Laurent Schneider wrote:

> Sorry about this, I have *only* aix here and did not know the -n
> option (nth field starting from the right) was aix specific.


-n is _not_ AIX specific:

http://www.opengroup.org/onlinepubs...9/xcu/sort.html
Stepan Kasal

2004-10-04, 6:01 pm

Hi all,

In article <2sd4ohF1k6jspU1@uni-berlin.de>, Jürgen Kahrs wrote:
> Laurent Schneider wrote:
>
>
> -n is _not_ AIX specific:
>
> http://www.opengroup.org/onlinepubs...9/xcu/sort.html


Laurent meant that the option -<num>, i.e. -1, -7 or -42, is AIX specific.

Hope this clears it.

Stepan
Paul Jarc

2004-10-04, 6:01 pm

Stepan Kasal <kasal@ucw.cz> wrote:
> Laurent meant that the option -<num>, ie. -1, -7 or -42 is aix specific.


Actually, that isn't AIX-specific either. Brian Inglis's sort from
GNU textutils 2.0 also supports it. The latest POSIX/SUS standard
says not to support such arguments, though, so the latest GNU
coreutils follows that, when configured that way.


paul
Brian Inglis

2004-10-04, 6:01 pm

On Mon, 4 Oct 2004 15:10:36 +0000 (UTC) in comp.lang.awk, Stepan Kasal
<kasal@ucw.cz> wrote:

>Hi all,
>
>In article <2sd4ohF1k6jspU1@uni-berlin.de>, Jürgen Kahrs wrote:

That is not what it means.
>
>Laurent meant that the option -<num>, ie. -1, -7 or -42 is aix specific.
>
>Hope this clears it.


It is *not* AIX specific, it is just the same as in the POSIX spec,
but the start field defaults to the start of the line, see:
http://publib.boulder.ibm.com/infoc...xcmds5/sort.htm

--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

Brian.Inglis@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply
Michael Tosch

2004-10-05, 5:58 pm

In article <7h63m05pv794qtovnfr6lno8aofipr4dss@4ax.com>, Brian Inglis <Brian.Inglis@SystematicSW.Invalid> writes:
> On Mon, 4 Oct 2004 15:10:36 +0000 (UTC) in comp.lang.awk, Stepan Kasal
> <kasal@ucw.cz> wrote:
>
>
> That is not what it means.
>
>
> It is *not* AIX specific, it is just the same as in the POSIX spec,
> but the start field defaults to the start of the line, see:
> http://publib.boulder.ibm.com/infoc...xcmds5/sort.htm
>


Hmm, why does this list +pos1 and -pos2 in a row?

Solaris /usr/xpg4/bin/sort claims to be Posix compliant,
but demands a +pos before a -pos argument.
sort -t/ -1
yields a syntax error and must be changed to
sort -t/ +0 -1

So maybe only the latter is Posix compliant,
and omitting +0 is AIX specific?

--
Michael Tosch
IT Specialist
HP Managed Services
Technology Solutions Group
Hewlett-Packard GmbH
Phone: +49 2407 575 313
Mail: michael.tosch:hp.com


Laurent Schneider

2004-10-06, 7:50 am

Stepan Kasal <kasal@ucw.cz> wrote in message news:<slrncm2pvt.ddo.kasal@matsrv.math.cas.cz>...
> Hi all,
>
> In article <2sd4ohF1k6jspU1@uni-berlin.de>, Jürgen Kahrs wrote:
>
>
> Laurent meant that the option -<num>, ie. -1, -7 or -42 is aix specific.
>
> Hope this clears it.
>
> Stepan


In the link above I read [+pos1[-pos2]], whereas the AIX man page
has [[+Position1][-Position2]].

However, I apologize: sort -t/ -1 file.lst only worked for me by
chance. I ran more tests, and -1 does not select the rightmost field,
not even on AIX :-(
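Why it can only have worked by chance: with -t/, a line that starts with an absolute path has an empty first field, and sort has no way to name the last field. A sketch with hypothetical lines:

```shell
#!/bin/sh
# Field 1 is empty because the line begins with the separator itself.
printf '%s\n' '/b/x /c/y' |
    awk -F/ '{ printf "field 1 = [%s], NF = %d\n", $1, NF }'

# Every key is therefore empty, and the whole-line fallback orders the
# lines by their first pathname, not by the second basename: the line
# ending in "aaa" comes out last here.
printf '%s\n' '/b/x /c/aaa' '/a/y /c/zzz' | LC_ALL=C sort -t/ -k 1,1
```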
Geoff Clare

2004-10-08, 7:49 am

"Michael Tosch" <eedmit@no.eed.spam.ericsson.pls.se> wrote, on Tue, 05 Oct 2004:

> Solaris /usr/xpg4/bin/sort claims to be Posix compliant,
> but demands a +pos before a -pos argument.
> sort -t/ -1
> yields a syntax error and must be changed to
> sort -t/ +0 -1
>
> So maybe only the latter is Posix compliant,


The latter *was* POSIX compliant between 1992 and 2001 but it was marked
as obsolescent in the 1992 POSIX.2 standard. Neither is POSIX compliant
as of 2001.

The POSIX equivalent of the old "+0 -1" is "-k 1,1".
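In general, for whole fields without character offsets, the obsolete origin-0 form +m -n corresponds to -k m+1,n; the GNU sort manual documents this equivalence. A short illustration with hypothetical input:

```shell
#!/bin/sh
# Obsolete, origin 0          POSIX, origin 1
#   sort -t/ +0 -1              sort -t/ -k 1,1
#   sort -t/ +1 -2              sort -t/ -k 2,2
# These lines start with "/", so field 1 is empty and field 2 is the
# first directory name ("b" and "a").
printf '%s\n' '/b/x' '/a/y' | LC_ALL=C sort -t/ -k 2,2
```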

--
Geoff Clare <nospam@gclare.org.uk>

Copyright 2003 - 2008 webservertalk.com