Unix Shell - AIX Script to Summarize By First Column By Adding Values in Numerical Columns

This is Interesting: Free IT Magazines  
Home > Archive > Unix Shell > October 2004 > AIX Script to Summarize By First Column By Adding Values in Numerical Columns





You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

Author AIX Script to Summarize By First Column By Adding Values in Numerical Columns
SAP BASIS Consultant

2004-09-29, 8:09 pm

Hello,

I would greatly appreciate help for the following: (I am using ksh on
AIX).

I have a file which looks in style as follows (It is the output of ps
aux, though that it not important):

(The COLs are not part of the file..For description)

COL1 COL2 COL3 COL4 COL5 COL6
name1 xxx xxx xxx 1000 11
name1 xxx xxx xxx 50 40
name2 xxx xxx xxx 30 5
name1 xxx xxx xxx 50 10
name2 xxx xxx xxx 100 4


I would like to summarize the data by the first column, ignoring
columns #2, #3 and #4, and adding the values in col#5 and #6.
The output would be as follows: (For name1, 1100=1000+50+50,
etc..)

name1 1100 61
name2 130 9


Any ideas would be greatly appreciated.



Thanks,
BASIS Consultant
William Park

2004-09-29, 8:09 pm

SAP BASIS Consultant <basis_consultant@hotmail.com> wrote:
> Hello,
>
> I would greatly appreciate help for the following: (I am using ksh on
> AIX).
>
> I have a file which looks in style as follows (It is the output of ps
> aux, though that it not important):
>
> (The COLs are not part of the file..For description)
>
> COL1 COL2 COL3 COL4 COL5 COL6
> name1 xxx xxx xxx 1000 11
> name1 xxx xxx xxx 50 40
> name2 xxx xxx xxx 30 5
> name1 xxx xxx xxx 50 10
> name2 xxx xxx xxx 100 4
>
>
> I would like to summarize the data by the first column, ignoring
> columns #2, #3 and #4, and adding the values in col#5 and #6.
> The output would be as follows: (For name1, 1100=1000+50+50,
> etc..)
>
> name1 1100 61
> name2 130 9
>
>
> Any ideas would be greatly appreciated.


man awk
man ksh (associative array)

--
William Park <opengeometry@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Seb

2004-09-29, 8:09 pm

On 2004-09-29, SAP BASIS Consultant <basis_consultant@hotmail.com> wrote:
> COL1 COL2 COL3 COL4 COL5 COL6
> name1 xxx xxx xxx 1000 11
> name1 xxx xxx xxx 50 40
> name2 xxx xxx xxx 30 5
> name1 xxx xxx xxx 50 10
> name2 xxx xxx xxx 100 4
>
>
> I would like to summarize the data by the first column, ignoring
> columns #2, #3 and #4, and adding the values in col#5 and #6.
> The output would be as follows: (For name1, 1100=1000+50+50,
> etc..)
>
> name1 1100 61
> name2 130 9


sort -u foo | awk '{print $1}' | while read name ; do # find unique names
awk -v name=$name '$0 ~ name {s5+=$5 ; s6+=$6} END {print name, s5, s6}' foo
done

There is most definitely a pure-awk solution, but I haven't figured it
out yet

--Seb
Seb

2004-09-29, 8:09 pm

On 2004-09-29, Seb <seb@foo.com> wrote:
> sort -u foo | awk '{print $1}' | while read name ; do # find unique names
> awk -v name=$name '$0 ~ name {s5+=$5 ; s6+=$6} END {print name, s5, s6}' foo
> done


hrm, sorry:

awk '{print $1}' foo | sort -u | while read name ; do # find unique names
awk -v name=$name '$0 ~ name {s5+=$5 ; s6+=$6} END {print name, s5, s6}' foo
done

--Seb
Chris F.A. Johnson

2004-09-29, 8:09 pm

On 2004-09-29, SAP BASIS Consultant wrote:
> Hello,
>
> I would greatly appreciate help for the following: (I am using ksh on
> AIX).
>
> I have a file which looks in style as follows (It is the output of ps
> aux, though that it not important):
>
> (The COLs are not part of the file..For description)
>
> COL1 COL2 COL3 COL4 COL5 COL6
> name1 xxx xxx xxx 1000 11
> name1 xxx xxx xxx 50 40
> name2 xxx xxx xxx 30 5
> name1 xxx xxx xxx 50 10
> name2 xxx xxx xxx 100 4
>
>
> I would like to summarize the data by the first column, ignoring
> columns #2, #3 and #4, and adding the values in col#5 and #6.
> The output would be as follows: (For name1, 1100=1000+50+50,
> etc..)
>
> name1 1100 61
> name2 130 9


It's a one-liner:

awk '{x[$1]+=$5;y[$1]+=$6}END{for(n in x) printf "%s\t%d\t%d\n",n,x[n],y[n]}'

Which, presented more sanely, is:

awk '{ x[$1] += $5
y[$1] += $6 }
END { for ( n in x )
printf "%s\t%s\t%s\n", n, x[n], y[n]
}'


--
Chris F.A. Johnson http://cfaj.freeshell.org/shell
========================================
===========================
My code (if any) in this post is copyright 2004, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License
Dana French

2004-10-02, 9:12 pm

"Chris F.A. Johnson" <cfajohnson@gmail.com> wrote in message news:<2s0kqqF1fkkavU1@uni-berlin.de>...
> On 2004-09-29, SAP BASIS Consultant wrote:
>
> It's a one-liner:
>
> awk '{x[$1]+=$5;y[$1]+=$6}END{for(n in x) printf "%s\t%d\t%d\n",n,x[n],y[n]}'
>
> Which, presented more sanely, is:
>
> awk '{ x[$1] += $5
> y[$1] += $6 }
> END { for ( n in x )
> printf "%s\t%s\t%s\n", n, x[n], y[n]
> }'



Another execellent one-liner, I'll add it to the one-liners page:

http://www.mtxia.com/fancyIndex/Too.../oneliners.html


The following is same concept presented by Chris, only in ksh93:

typeset -A x y
while read -- VAL1 VAL2 VAL3 VAL4 VAL5 VAL6
do
x[${VAL1}]=$(( ${x[${VAL1}]:=0} + ${VAL5} ))
y[${VAL1}]=$(( ${y[${VAL1}]:=0} + ${VAL6} ))
done < data.in
for n in "${!x[@]}"
do
print "${n}\t${x[${n}]}\t${y[${n}]}"
done


--------------------------------------------------------
Dana French dfrench@mtxia.com
Mt Xia Technical Consulting Group http://www.mtxia.com
100% Spam Free Email http://www.ridmail.com
MicroEmacs http://uemacs.tripod.com
Korn Shell Web http://dfrench.tripod.com/kshweb.html
Dana French

2004-10-03, 2:48 am

"Chris F.A. Johnson" <cfajohnson@gmail.com> wrote in message news:<2s0kqqF1fkkavU1@uni-berlin.de>...
> On 2004-09-29, SAP BASIS Consultant wrote:
>
> It's a one-liner:
>
> awk '{x[$1]+=$5;y[$1]+=$6}END{for(n in x) printf "%s\t%d\t%d\n",n,x[n],y[n]}'
>
> Which, presented more sanely, is:
>
> awk '{ x[$1] += $5
> y[$1] += $6 }
> END { for ( n in x )
> printf "%s\t%s\t%s\n", n, x[n], y[n]
> }'



Another execellent one-liner, I'll add it to the one-liners page:

http://www.mtxia.com/fancyIndex/Too.../oneliners.html


The following is same concept presented by Chris, only in ksh93:

typeset -A x y
while read -- VAL1 VAL2 VAL3 VAL4 VAL5 VAL6
do
x[${VAL1}]=$(( ${x[${VAL1}]:=0} + ${VAL5} ))
y[${VAL1}]=$(( ${y[${VAL1}]:=0} + ${VAL6} ))
done < data.in
for n in "${!x[@]}"
do
print "${n}\t${x[${n}]}\t${y[${n}]}"
done


--------------------------------------------------------
Dana French dfrench@mtxia.com
Mt Xia Technical Consulting Group http://www.mtxia.com
100% Spam Free Email http://www.ridmail.com
MicroEmacs http://uemacs.tripod.com
Korn Shell Web http://dfrench.tripod.com/kshweb.html
SAP BASIS Consultant

2004-10-04, 6:01 pm

Seb <seb@foo.com> wrote in message news:<XND6d.21508$QJ3.5548@newssvr21.news.prodigy.com>...
> On 2004-09-29, Seb <seb@foo.com> wrote:
>
> hrm, sorry:
>
> awk '{print $1}' foo | sort -u | while read name ; do # find unique names
> awk -v name=$name '$0 ~ name {s5+=$5 ; s6+=$6} END {print name, s5, s6}' foo
> done
>
> --Seb



Thanks to Seb and all others who took the time to respond.
I used Seb's solution, and it works well.



SAP BASIS Consultant
Seb

2004-10-04, 6:01 pm

On 2004-10-04, SAP BASIS Consultant <basis_consultant@hotmail.com> wrote:
>
>
> Thanks to Seb and all others who took the time to respond.
> I used Seb's solution, and it works well.


Hehe, except that my solution is ugly, while Chris's one is nicer and
shorter...

--Seb
Sponsored Links






Free braindumps | Software forum | Database administration forum

Copyright 2003 - 2008 webservertalk.com