Unix Programming - Sorting and then removing sort-by cols from a fixed-width flat file

aditya.chaudhary@gmail.com

2006-06-29, 1:23 am

Hi All,

I just joined this group to find solutions for my requirement. Maybe
this type of question has already been answered, but when I tried
searching the posts I couldn't find a proper solution in them. So I
would highly appreciate it if somebody could help me out with this again.

Basically I would be merging 3 flat files and want to sort and group
their records so that they can be transmitted in a specified format. But
the problem is that the group by col is situated at a different position
in each of the 3 flat files. So I thought of adding 2 common cols at the
beginning of each of the 3 files, sorting the records on those same
2 cols so that they can be grouped, and then removing those cols after
the sort, as they are not required to be present in the final file.
I would be creating a shell script to first merge the 3 files, then
sort them and then remove the cols.

The 2 cols are Request_Id (a string of length 15, i.e. positions 1-15
in the file) and Serial_Number (a string of length 5, i.e. positions
16-20 in the file).
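A minimal sketch of the first step of the plan described above (copying the two key values to the front of every record), assuming POSIX awk and assuming that each source record already carries the Request_Id and Serial_Number values somewhere, at a position that differs per file. The helper name `add_key`, the start positions and the sample record are hypothetical.

```sh
#!/bin/sh
# add_key FILE ID_POS SERIAL_POS   (hypothetical helper)
# ID_POS and SERIAL_POS are assumed 1-based start positions of the
# 15-char Request_Id and 5-char Serial_Number inside FILE's records.
# Every record is printed with those two values copied to the front,
# so all decorated records start with the same 20-character key.
add_key() {
    awk -v id="$2" -v sn="$3" '{
        # %-15.15s and %-5.5s pad or truncate to exactly 15 and 5 characters.
        printf "%-15.15s%-5.5s%s\n", substr($0, id, 15), substr($0, sn, 5), $0
    }' "$1"
}

# Made-up record: 2 filler characters, then Request_Id, then Serial_Number.
# A file operand of "-" makes awk read standard input.
printf 'xx%-15s%-5s\n' REQ1 00001 | add_key - 3 18
```

For the real files this would be run once per input, e.g. `add_key file1.dat 31 46 > file1.keyed`, with the file names and positions replaced by the actual ones.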

So I need to know:
1) how to write the 'sort' command of Unix so that I can use both these
cols for sorting the recs of the file.
2) the remove command or process so that I can drop/remove these 2 cols
after all the recs have been sorted out.

Do let me know if you have any doubts or queries regarding this.

Thanks and regards,
Adi

Jim Cochrane

2006-06-29, 7:37 am

On 2006-06-29, aditya.chaudhary@gmail.com <aditya.chaudhary@gmail.com> wrote:
> ...
> The 2 cols are Request_Id (string of rec length 15, i.e. position 1-15
> in file) and Serial_Number (string of rec length 5, i.e. position 16-20
> in file).


No solution here (no time for it, and I think more specs are needed), but
here are some tools, besides sort, that may be useful re. your problem:

cut
paste
sed
uniq
perl
awk

It's up to you, of course, to figure out which to use and how to use them
(by reading the man pages). (Hint: probably obvious, but you probably
wouldn't use both Perl and awk.)
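Of the tools listed, `cut` on its own can do the removal step for a fixed-width layout. A minimal sketch, assuming the 20-character key occupies columns 1-20 and the data is single-byte (e.g. ASCII), since some `cut` implementations treat `-c` as byte positions; the function name and sample record are hypothetical.

```sh
#!/bin/sh
# strip_key_cut [FILE...]   (hypothetical helper)
# Print each line from character 21 to the end, dropping the 15-char
# Request_Id (1-15) and 5-char Serial_Number (16-20) prefix.
strip_key_cut() {
    cut -c21- "$@"
}

# Made-up record: padded key followed by the original data.
printf '%-15s%-5s%s\n' REQ1 00001 PAYLOAD | strip_key_cut
```

A record shorter than 21 characters comes out as an empty line.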

Good luck.


aditya.chaudhary@gmail.com

2006-06-29, 7:37 am

Hey Logan,

Thanks for your inputs. I have some doubts and concerns regarding the same. I
have answered your questions (comments) below:

Logan Shaw wrote:
> aditya.chaudhary@gmail.com wrote:
>
> Transmitted? I thought you were just merging them.


I will first merge them, then do the sorting, then remove the cols
which were added just to make sorting easier. Finally, I have to FTP
this file.
>
>
> Does the "group by col" mean the column which contains the sort key?


No. 'Group by col' means that there is a col existing in the records
which can be used for sorting the data along with Record_Type, but the
issue is that it exists at different positions in each of the 3 flat
files' records. If it had been at, say, positions 15-18 in every file,
then I could have used it for sorting.

>
>
> That's one way to do it. The other way to do it is to do all that
> work in your comparison function, so that the information is never
> added to any files but is temporarily created only when you are
> comparing two elements.
>

I don't get what you meant by this. Maybe that's because I'm not familiar
with many shell scripting techniques.
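The standard `sort` command does not take a user-written comparison function, so the closest shell analogue of the suggestion quoted above is to build the 20-character key on the fly and let it exist only inside a pipeline, never in a file. A minimal sketch, assuming POSIX awk/sort/cut, a 15-character Request_Id and 5-character Serial_Number whose start positions are known per file, and records that contain no tab characters; the function name and its argument layout are hypothetical.

```sh
#!/bin/sh
# merge_sort_strip FILE ID_POS SERIAL_POS [FILE ID_POS SERIAL_POS ...]
# (hypothetical helper). Arguments come in triples: an input file and the
# assumed 1-based start positions of Request_Id (15 chars) and
# Serial_Number (5 chars) within that file's records.
merge_sort_strip() {
    tab=$(printf '\t')   # assumed never to occur in the data
    # Stage 1 (awk): prepend the 20-character key to every record.
    # Stage 2 (sort): order on the key; with tab as separator the whole
    #   record is field 1, so 1.N means character N of the line.
    # Stage 3 (cut): drop the key again, so it never reaches a file.
    while [ "$#" -ge 3 ]; do
        awk -v id="$2" -v sn="$3" '{
            printf "%-15.15s%-5.5s%s\n", substr($0, id, 15), substr($0, sn, 5), $0
        }' "$1"
        shift 3
    done | LC_ALL=C sort -t "$tab" -k1.1,1.15 -k1.16,1.20 | cut -c21-
}
```

Usage would be along the lines of `merge_sort_strip file1.dat 31 46 file2.dat 1 16 file3.dat 60 75 > merged.dat`, where the file names and positions are placeholders.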

>
> Merging has a specific meaning when you are talking about sorting.
> I believe what you are saying is that you will first convert all
> three files into a common format, then sort them, then remove
> the extra columns.
>

Yes. The common format is already defined: it has to be a fixed-width
file. So the 3 files have to be merged and then sorted so that the data
appears in some 'order by' fashion when it goes to the user.

>
> Sorting by character column numbers is generally not the easiest thing
> in Unix, at least if you are using the "sort" command. The "sort"
> command expects a field separator character rather than using fixed-
> length fields. There may be a way to "trick" it into using a fixed
> range of columns, but it's much easier to use some character, like
> ":" as a separator. Then you can do
>
> sort -t: +0 +1
>
> in order to sort by the 1st and 2nd colon-delimited fields. Of course
> you can use any character instead of ":" as long as that character
> does not occur within your sort keys.
>
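The `+0 +1` form quoted above is the old 0-based key syntax; current POSIX `sort` uses `-k` with 1-based field numbers instead, and some modern implementations may not accept the old form by default. A minimal sketch of sorting by the 1st and then the 2nd colon-delimited field with `-k`, where `-k1,1 -k2,2` limits each key to exactly one field; the function name and sample records are made up.

```sh
#!/bin/sh
# sort_by_id_serial [FILE...]   (hypothetical helper)
# -t: makes ":" the field separator; -k1,1 is key 1 (field 1 only) and
# -k2,2 breaks ties on field 2 only. LC_ALL=C gives plain byte order.
sort_by_id_serial() {
    LC_ALL=C sort -t: -k1,1 -k2,2 "$@"
}

printf '%s\n' 'REQ2:00001:c' 'REQ1:00002:b' 'REQ1:00001:a' | sort_by_id_serial
```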

I cannot use a delimiter for just the 2 new cols. The file is in
fixed-width format and I have to produce such a file. So kindly suggest
the sort syntax for fixed width.

>
> Removing the first 20 characters from every line of a file is easy.
> It's just as easy as this:
>
> sed -e 's/.\{20\}//'
>
> That matches the pattern
>
> .\{20\}
>
> which is 20 of any character ("." stands for any character) and
> replaces the pattern it matches with the empty string.
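A quick check of the `sed` command quoted above on a made-up fixed-width record; the function name is hypothetical. One behaviour worth knowing: a line shorter than 20 characters does not match `.\{20\}`, so `sed` passes it through unchanged rather than emptying it.

```sh
#!/bin/sh
# strip_key_sed [FILE...]   (hypothetical helper)
# Delete the first 20 characters of each line (".\{20\}" is a BRE
# interval: exactly 20 of any character). Shorter lines are left as-is.
strip_key_sed() {
    sed -e 's/.\{20\}//' "$@"
}

printf '%-15s%-5s%s\n' REQ1 00001 PAYLOAD | strip_key_sed
```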


Thanks for this solution. I will try it and let you know whether it
works.

>
> However, above I recommended that you use a delimiter instead of
> fixed-width fields. In that case, to remove the first two
> colon-delimited fields, you would instead want to use the cut
> command:
>
> cut -d: -f3-
>
> That prints field 3 and following ("3-") with ":" as the delimiter.
> Note that this is 1-based indexing, whereas the sort command (at
> least in the syntax I gave -- it accepts more than one syntax)
> uses 0-based indexing.
>
> - Logan
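A small check of the `cut` command quoted above, with a made-up record; the function name is hypothetical. Because `-f3-` selects field 3 and every later field, any colons inside the remaining data are written back out unchanged. With the `-k` form of `sort`, fields are also numbered from 1, so the two commands then share the same numbering.

```sh
#!/bin/sh
# strip_delimited_key [FILE...]   (hypothetical helper)
# Drop the first two ":"-separated fields (Request_Id and Serial_Number)
# and print field 3 onward; later ":" separators are kept in the output.
strip_delimited_key() {
    cut -d: -f3- "$@"
}

printf '%s\n' 'REQ1:00001:data:with:colons' | strip_delimited_key
```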


Jim Cochrane

2006-06-29, 7:21 pm

On 2006-06-29, aditya.chaudhary@gmail.com <aditya.chaudhary@gmail.com> wrote:
> Hey Logan,
>
> Thanks for your inputs. I have some doubts and concerns regarding the same. I
> have answered your questions (comments) below:
>
> ...
> I cannot use a delimiter for just the 2 new cols. The file is in
> fixed-width format and I have to produce such a file. So kindly suggest
> the sort syntax for fixed width.


I believe the old UNIX sort cannot do this, but the FSF sort can (with the
-k option) - check your man page. (If you're using a modern Linux, you
should be OK.)
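A minimal sketch of that `-k` approach for the fixed-width layout in this thread (Request_Id in characters 1-15, Serial_Number in 16-20). The `.N` suffixes on `-k` select character positions within a field, a form POSIX `sort` also specifies. The sketch assumes the records contain no tab characters, so that using tab as the separator makes each whole record field 1 and positions count from the start of the line; with the default blank-based splitting, keys padded with spaces would be broken into several fields. The function name and sample data are hypothetical.

```sh
#!/bin/sh
# sort_fixed_width_key [FILE...]   (hypothetical helper)
sort_fixed_width_key() {
    # Tab as separator, assumed never to occur in the data: each record
    # is then a single field 1, and 1.N means character N of the line.
    tab=$(printf '\t')
    # Key 1: Request_Id, characters 1-15. Key 2: Serial_Number, 16-20.
    LC_ALL=C sort -t "$tab" -k1.1,1.15 -k1.16,1.20 "$@"
}

# Made-up fixed-width records: 15-char id, 5-char serial, then data.
printf '%-15s%-5s%s\n' REQ2 00001 c REQ1 00002 b REQ1 00001 a |
    sort_fixed_width_key
```

Because both keys are fixed-width and adjacent, a single `-k1.1,1.20` should give the same order here; the two separate keys simply mirror the two columns.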

Copyright 2003 - 2008 webservertalk.com