Sorting by basename of file
09-28-04 08:32 AM
I have a file containing 2 full pathnames per line like
/full/path/.../to/file /another/full/.../path/to/file
How can I sort this file by the "basename" of the second filename on the
line?
Thanks
[Sorry for the cross-posting, wasn't sure whether there is a shell-only
solution or whether awk is more effective here]
Re: Sorting by basename of file
09-28-04 08:32 AM
In article <2rr2r9F1dgak2U1@uni-berlin.de>, Vikas Agnihotri <usenet@vikas.mailshell.com> writes:
> I have a file containing 2 full pathnames per line like
>
> /full/path/.../to/file /another/full/.../path/to/file
>
> How can I sort this file by the "basename" of the second filename on the
> line?
>
> Thanks
>
> [Sorry for the cross-posting, wasnt sure if there is a shell-only
> solution or awk is more effective here]
awk -F/ '{print $NF"/"$0}' file | sort | cut -d/ -f2-
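A small worked run of the one-liner above may help show what each stage does. This is only a sketch: the sample paths are made up, the variable names `input` and `result` are hypothetical, and `LC_ALL=C` is added so the order does not depend on the local locale.

```shell
# Made-up sample input: two full pathnames per line.
input='/src/x/zeta.c /dst/one/beta.c
/src/y/alpha.c /dst/two/gamma.c
/src/z/mid.c /dst/three/alpha.c'

# Decorate: awk splits on "/" and prefixes each line with its last field
#           (the basename of the second pathname) plus a "/".
# Sort:     the decorated lines are ordered byte-wise (LC_ALL=C).
# Strip:    cut drops everything up to the first "/", restoring the line.
result=$(printf '%s\n' "$input" |
  awk -F/ '{print $NF"/"$0}' |
  LC_ALL=C sort |
  cut -d/ -f2-)

printf '%s\n' "$result"
```

The line whose second path ends in alpha.c comes out first, followed by the beta.c and gamma.c lines, each exactly as it appeared in the input.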
--
Michael Tosch
IT Specialist
HP Managed Services
Technology Solutions Group
Hewlett-Packard GmbH
Phone: +49 2407 575 313
Mail: michael.tosch:hp.com
Re: Sorting by basename of file
09-28-04 08:32 AM
In article <cj9mjd$gnb$1@aken.eed.ericsson.se>,
Michael Tosch <eedmit@NO.eed.SPAM.ericsson.PLS.se> wrote:
>In article <2rr2r9F1dgak2U1@uni-berlin.de>, Vikas Agnihotri <usenet@vikas.mailshell.com> writes:
>
>
>awk -F/ '{print $NF"/"$0}' file | sort | cut -d/ -f2-
$ WHINY_USERS=HAPPY_USERS gawk -F/ '{x[$NF]=$0} END{for (i in x) print x[i]}'
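One caveat with the array approach, independent of how the traversal gets ordered: the array is indexed by basename, so two lines sharing a basename occupy the same slot and only the later one survives. A minimal sketch in portable awk, with made-up data and the hypothetical variable names `input` and `kept`:

```shell
# Made-up input: the first and third lines share the basename "same.txt".
input='/a/one /x/same.txt
/b/two /y/other.txt
/c/three /z/same.txt'

# Index each line by its basename, as in the command above, then count
# how many distinct entries remain in the array.
kept=$(printf '%s\n' "$input" |
  awk -F/ '{x[$NF]=$0} END{n=0; for (i in x) n++; print n}')

echo "lines in: 3, lines kept: $kept"
```

So this form probably suits input where the second basenames are known to be unique.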
Re: Sorting by basename of file
09-29-04 08:09 AM
On Mon, 27 Sep 2004 18:36:29 +0000, Michael Tosch wrote:
> In article <2rr2r9F1dgak2U1@uni-berlin.de>, Vikas Agnihotri <usenet@vikas.mailshell.com> writes:
>
>
> awk -F/ '{print $NF"/"$0}' file | sort | cut -d/ -f2-
Picking a nit...

Hmm, my natural inclination would have been to output $NF" "$0, so that
the sort works on a blank-terminated 1st field, instead of that field with
the 1st full path concatenated to it. It turns out that (in ASCII) '/'
is lexically less than most allowable file name characters, except for a
few such as the character '.', so...

Your solution IMO sorts filenames like "a" and "a." in the wrong order,
with "a." appearing first in the list, instead of later. Tiny nit, tho,
and I agree the form of your solution is beautiful. In any case we are
also affected by the sort order of the locale, so be careful out there. I
would argue that presenting sort with fields that only contain the
basename of the 2nd filepath is "more correct", with a corresponding
adjustment to the final cut command (i.e. use -d\ (escaped space)). Maybe
one should also restrict the sort to the first field (see more below)?

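The "a" versus "a." ordering can be reproduced with two lines. A minimal sketch, assuming made-up paths and byte-wise comparison forced with `LC_ALL=C`; `input` and `plain` are hypothetical names:

```shell
# Made-up input: second basenames are "a" and "a.", with "a" listed first.
input='/p/two /q/a
/p/one /q/a.'

# The decorated lines start with "a//p/two" and "a.//p/one". Plain sort
# compares whole lines, so the "/" separator (0x2F) is compared against
# the "." (0x2E) of the longer name, and "a." wins.
plain=$(printf '%s\n' "$input" |
  awk -F/ '{print $NF"/"$0}' |
  LC_ALL=C sort |
  cut -d/ -f2-)

printf '%s\n' "$plain"
```

Here the "a." line is printed ahead of the "a" line, which is the order being objected to.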
Or, alternatively, you could have specified '/' to be the field separator
for sort, and restricted the sort to the first field, by using "sort -t/
-k1,1" for that part of the pipeline. I'm now buried in my own nits, e.g.
I'm not sure why I have to restrict the field with -k when using the -t
option? Otherwise it does not seem to work the same as with a blank
terminator.

Ah, I think it was my own data, which had the same pathnames in it? But
that shows up another flaw in the original: isn't the 1st pathname used in
the sort? Should it be, according to the "spec"? If not specified, I would
suggest the order should not be altered for lines with an identical 2nd
basename. That just goes to show that nit picking can make anything a huge
problem. For most applications the original solution works "well enough"
IMO.

BTW, to my chagrin I discovered that this example worked correctly on
Solaris, but on my SuSE 9.1 Linux it seemed to sort the filenames in a
funny order, with ' ' being put after [a-z], so "aaaa" came before "a"!
Huh? That went away when I explicitly forced LC_ALL=C for the sort
command. What on earth was sort using for locale in my standard SuSE 9.1
Linux installation? I do not have any locale specified in my environment,
so it must be compiled in? Off the top of my head I don't even know how to
check. You would think that GNU sort would report that with the --version
switch? Even stronger warning to be careful out there! (locales... grumble!)

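On the "how to check" question: sort takes its collation rules from the environment (LC_ALL, then LC_COLLATE, then LANG), and the POSIX `locale` utility prints the values currently in effect. A default such as LANG is most likely set by a system-wide login script rather than compiled into sort, though that is an assumption about the SuSE setup described here. A small sketch with made-up keys; `input` and `c_order` are hypothetical names:

```shell
# Show the locale settings that sort would inherit from this shell.
locale 2>/dev/null || echo "locale utility not available"

# Keys in the blank-separated style: "aaaa" and "a", each followed by a blank.
input='aaaa /p/one
a /p/two'

# In the C locale sort compares bytes, so the blank (0x20) sorts before
# any letter and the "a " line comes before the "aaaa " line.
c_order=$(printf '%s\n' "$input" | LC_ALL=C sort)

printf '%s\n' "$c_order"
```

Running the same sort without `LC_ALL=C` and comparing the two outputs is a quick way to see whether the ambient locale changes the order.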
P.S. I can't think of a shell-only solution (without using awk), since
there might be varying pathname depths and AFAIK sort does not have any
way to specify $NF. So, it appears that you need awk (or some such).
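For what it is worth, the decoration step can apparently be done without awk by letting the shell strip the directory part with the POSIX expansion `${var##*/}`. sort and cut are still external utilities, so this is "shell-only" only in the sense of avoiding awk. The sketch below assumes no whitespace inside the pathnames; `input`, `first`, `second` and `shell_only` are hypothetical names and the paths are made up.

```shell
# Made-up sample input; assumes pathnames contain no blanks.
input='/src/x/zeta.c /dst/one/beta.c
/src/y/alpha.c /dst/two/gamma.c
/src/z/mid.c /dst/three/alpha.c'

# ${second##*/} removes the longest prefix ending in "/", which leaves the
# basename of the second pathname regardless of directory depth.
shell_only=$(printf '%s\n' "$input" |
  while read -r first second; do
    printf '%s/%s %s\n' "${second##*/}" "$first" "$second"
  done |
  LC_ALL=C sort -t/ -k1,1 |
  cut -d/ -f2-)

printf '%s\n' "$shell_only"
```

A shell loop like this is usually slower than awk on large files, so it is more of a curiosity than a replacement.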
--
Juhan Leemet
Logicognosis, Inc.
Re: Sorting by basename of file
09-30-04 01:09 AM
In article <pan.2004.09.29.01.10.13.488504@logicognosis.com>, Juhan Leemet <juhan@logicognosis.com> writes:
> On Mon, 27 Sep 2004 18:36:29 +0000, Michael Tosch wrote:
>
> Picking a nit...
>
> Hmm, my natural inclination would have been to output $NF" "$0, so that
> the sort works on a blank terminated 1st field, instead of that field with
> the 1st fullpath concatenated to it. Turns out that since (in ASCII) '/'
> is lexically less than most allowable file name characters.... except for
> the character '.', so...
The separator must be / because file names can contain any character
but /.
>
> Your solution IMO sorts filenames like "a" and "a." in the wrong order,
> with "a." appearing first in the list, instead of later. Tiny nit, tho,
> and I agree the form of your solution is beautiful. In any case we are
> impacted also by the sort order of the locale, so be careful out there. I
> would argue that presenting sort with fields that only contain the
> basename of the 2nd filepath is "more correct", with corresponding
> adjustment to the final cut command (i.e. use -d\ (escaped space)). Maybe
> one should also restrict the sort to the first field (see more below?)?
>
> or, alternatively, you could have specified '/' to be the field separator
> for sort, and restrict the sort to the first field, by using "sort -t/
> -k1,1" for that part in the pipeline. I'm now buried in my own nits, e.g.
> I'm not sure why I have to restrict the field -k with the -t option?
> Otherwise it does not seem to work the same as with a blank terminator.
>
> Ah, I think it was my own data, which had same pathnames in it? but that
> shows up another flaw in the original? isn't the 1st pathname used in the
> sort? should it be, according to the "spec"? if not specified, I would
> suggest the order should not be altered, i.e. for identical 2nd basename?
> That just goes to show that nit picking can make anything a huge problem.
> For most applications the original solution works "well enough" IMO.
>
> BTW, to my chagrin I discovered that this example worked correctly on
> Solaris, but on my SuSE 9.1 Linux it seemed to sort the filenames in a
> funny order, with ' ' being put after [a-z], so "aaaa" came before "a"!
> Huh? That went away when I explicitly forced LC_ALL=C for the sort
> command. What on earth was sort using for locale in my standard SuSE 9.1
> Linux installation? I do not have any locale specified in my environment,
> so it must be compiled in? Off the top of my head I don't even know how to
> check. You would think that gnu sort should report that with --version
> switch? Even stronger warning to be careful out there! (locales... grumble!)
>
> p.s. I can't think of a shell only solution (without using awk), since
> there might be varying pathname depths and AFAIK sort does not have any
> way to specify $NF. So, it appears that you need awk (or some such).
>
To honor your observations:
awk -F/ '{print $NF"/"$0}' file | sort -t / -k 1,1 | cut -d/ -f2-
is more correct, as it excludes the / separator from the sort key.
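A quick check of the keyed form on the "a" and "a." case, as a sketch with made-up paths and `LC_ALL=C` forced; `input` and `keyed` are hypothetical names:

```shell
# Made-up input: second basenames are "a." and "a", with "a." listed first.
input='/p/one /q/a.
/p/two /q/a'

# With -t / -k 1,1 the sort key is only the text before the first "/",
# i.e. "a." and "a", so the separator no longer takes part in the compare.
keyed=$(printf '%s\n' "$input" |
  awk -F/ '{print $NF"/"$0}' |
  LC_ALL=C sort -t / -k 1,1 |
  cut -d/ -f2-)

printf '%s\n' "$keyed"
```

The "a" line now comes out ahead of the "a." line, because "a" is a prefix of "a." and the shorter key sorts first.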
--
Michael Tosch
IT Specialist
HP Managed Services
Technology Solutions Group
Hewlett-Packard GmbH
Phone: +49 2407 575 313
Mail: michael.tosch:hp.com
Re: Sorting by basename of file
09-30-04 01:09 AM
On Wed, 29 Sep 2004 07:46:04 +0000, Michael Tosch wrote:
[snippage]
> To honor your observations:
>
> awk -F/ '{print $NF"/"$0}' file | sort -t / -k 1,1 | cut -d/ -f2-
>
> is more correct, as it excludes the / separator from the sort-object.
and what's beyond the / separator as well! (like parts of the 1st pathname?)
This form should also avoid changing the order of lines that happen to
have the same 2nd basename, at least when the sort is stable (GNU sort
needs -s for that; without it, ties are broken by a last-resort comparison
of the whole line). Admittedly, the "spec" was silent on this issue. The
generally accepted practice for sorting (right back to Hollerith cards?)
is to preserve the original order when the keys are identical. Admittedly,
the RDBMS guys toss out that idea, because for them there is no intrinsic
order in the bag of results, except what you force on it, all else being
"undefined". I prefer the sorting convention.
Don't mind my nit picking. I'm rusty at awk, tho I used to write a bit of
it. I'm just brushing up and exercising those flabby mental muscles.
Thanks for the interesting dialogue(s).
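To make the point about identical keys concrete, here is a sketch that asks sort for a stable result with `-s`. That option is a common extension (GNU and BSD sort have it) rather than something every sort is guaranteed to accept, so treat it as an assumption about the local tool. Paths are made up; `input` and `stable` are hypothetical names.

```shell
# Made-up input: both lines have the second basename "same"; the line
# starting with "/z/" comes first in the file.
input='/z/first /q/same
/a/second /r/same'

# The key (text before the first "/") is "same" for both lines. With -s
# the sort is stable, so lines with equal keys keep their input order
# instead of being ordered by the rest of the line.
stable=$(printf '%s\n' "$input" |
  awk -F/ '{print $NF"/"$0}' |
  LC_ALL=C sort -s -t / -k 1,1 |
  cut -d/ -f2-)

printf '%s\n' "$stable"
```

The "/z/first" line stays ahead of the "/a/second" line, matching the order in the file.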
--
Juhan Leemet
Logicognosis, Inc.
Re: Sorting by basename of file
10-03-04 02:12 AM
In article <fe0e47b1.0409300401.75fd6207@posting.google.com>,
Laurent Schneider <laurentschneider@yahoo.com> wrote:
>sort -t/ -1
>
>HTH
Neat idea, doesn't help much.
[cdl@delta tmp]$ sort -t/ -1 file.lst
sort: invalid option -- 1
Try `sort --help' for more information.
[cdl@delta tmp]$ sort --version
sort (coreutils) 5.2.1
Where did you get your "sort"?
carl
--
carl lowenstein marine physical lab u.c. san diego
clowenst@ucsd.edu