Unix Shell - Awk Array Question


Author Awk Array Question
AyOut

2007-08-30, 1:20 pm

I have a shell script that invokes an awk command on a file of the
following format:
(Date, Time)
2007-01-01, 00:00:00,121
2007-01-01, 00:00:00,311
2007-01-01, 00:00:00,432
....
....
2007-01-01, 00:01:10,778
2007-01-01, 00:01:10,981
2007-01-01, 00:01:11,121
....
....

The script I have basically parses the file and generates a comma
separated output of
Date, Time, Count
2007-01-01, 00:00:00, 3
2007-01-01, 00:01:10, 2
2007-01-01, 00:01:11, 1

The command looks like this:

awk '{++hr[$1,substr($2,1,9)]}END{for(item in hr)printf("%s%s\n",item,
hr[item])}' ${logfile} > ${logfile}.csv

At the end of the process I have an output of 86400 lines, too big for
Excel. I want to truncate the output to only output the max value in
any given minute and thus reduce the output to 1440 lines.

I have the following script that I'm struggling with...

#!/usr/bin/ksh

awk 'BEGIN{
for (i=0;i<23;i++) {
for (j=0;j<=59;j++) {
for (k=0;k<=59;k++) {
p=sprintf("%02d:%02d:%02d", i, j, k);
hr[p] = 0;
}
}
}
}
{
++hr[$1,substr($2,1,9)]
}END{
for (i=0;i<23;i++) {
for (j=0;j<=59;j++) {
h=sprintf("%02d:%02d", i, j);
hh[h] = 0;
for (k=0;k<=59;k++) {
p=sprintf("%02d:%02d:%02d", i, j, k);
if(k=0)
hh[h]=hr[p];
else{
pp=sprintf("%02d:%02d:%02d", i, j, k-1);
print hr[pp];
if(hh[pp]>hh[h])
hh[h]=hr[pp];
}
}
}
}
for(item in hr){
printf("%s%s\n",item, hr[item]);
}
}' $1


Can anyone tell me what I'm doing wrong?

I want the following output:

2007-01-01, 00:00, 4
2007-01-01, 00:01, 5
2007-01-01, 00:02, 4
....
....

Thanks
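
Reading the script as posted, a few things look like likely culprits. `if(k=0)` assigns 0 to k instead of comparing, so the condition is always false and the loop counter is reset on every pass. The loops run `i<23`, which skips hour 23. The counts are stored under keys built from `$1,substr($2,1,9)`, while the END block looks them up under plain HH:MM:SS keys, so it only ever sees the zeros seeded in BEGIN. Finally, `hh[pp]` is compared where `hr[pp]` was presumably meant. The sketch below only illustrates the first and third points; the function names and the sample line are made up for the demonstration, and it is not a fix for the whole script.

```shell
# Sketch only: shows two pitfalls in isolation, assuming a POSIX awk.

assign_vs_compare() {
    awk 'BEGIN {
        k = 5
        # "=" assigns: k becomes 0, and 0 is false, so the else branch runs
        if (k = 0) { print "then-branch" } else { print "else-branch, k=" k }
        k = 5
        # "==" compares and leaves k untouched
        if (k == 0) { print "then-branch" } else { print "else-branch, k=" k }
    }'
}

key_lookup() {
    # Count one sample line the way the script does, then try two lookups.
    printf '2007-01-01, 00:00:00,121\n' | awk '
        { ++hr[$1, substr($2, 1, 9)] }
        END {
            t    = "00:00:00"
            full = "2007-01-01," SUBSEP "00:00:00,"
            print "time-only key: " ((t in hr) ? "found" : "missing")
            print "full key: " ((full in hr) ? "found" : "missing")
        }'
}

assign_vs_compare
key_lookup
```

The comma inside `hr[a,b]` joins the two parts with SUBSEP, which is why the stored key never equals a bare time string.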

Cyrus Kriticos

2007-08-30, 1:20 pm

AyOut wrote:
> I have a shell script that invokes an awk command on a file of the
> following format:
> (Date, Time)
> 2007-01-01, 00:00:00,121
> 2007-01-01, 00:00:00,311
> 2007-01-01, 00:00:00,432
> ...
> ...
> 2007-01-01, 00:01:10,778
> 2007-01-01, 00:01:10,981
> 2007-01-01, 00:01:11,121
> ...
> ...
>


cut -d: -f1-2 FILE | uniq -c | awk '{print $2 " " $3 " " $1}'

> I want the following output:
>
> 2007-01-01, 00:00, 4
> 2007-01-01, 00:01, 5
> 2007-01-01, 00:02, 4
> ...
> ...


--
Best regards | "The only way to really learn scripting is to write
Cyrus | scripts." -- Advanced Bash-Scripting Guide
Cyrus Kriticos

2007-08-30, 1:20 pm

AyOut wrote:
> I have a shell script that invokes an awk command on a file of the
> following format:
> (Date, Time)
> 2007-01-01, 00:00:00,121
> 2007-01-01, 00:00:00,311
> 2007-01-01, 00:00:00,432
> ...
> ...
> 2007-01-01, 00:01:10,778
> 2007-01-01, 00:01:10,981
> 2007-01-01, 00:01:11,121
> ...
> ...
>


cut -d: -f1-2 FILE | uniq -c | awk '{print $2 " " $3 ", " $1}'

> I want the following output:
>
> 2007-01-01, 00:00, 4
> 2007-01-01, 00:01, 5
> 2007-01-01, 00:02, 4
> ...
> ...


--
Best regards | "The only way to really learn scripting is to write
Cyrus | scripts." -- Advanced Bash-Scripting Guide
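
For reference, a small sketch of what this pipeline produces. `cut -d: -f1-2` trims each line to the minute and `uniq -c` counts adjacent identical lines, so the result is the total number of lines per minute, not a per-second peak. It also relies on the input already being in time order, since uniq only merges adjacent lines. The function name and the sample lines are invented for illustration.

```shell
# Sketch: per-minute totals with cut | uniq -c, reading the log on stdin.
per_minute_totals() {
    cut -d: -f1-2 | uniq -c | awk '{print $2 " " $3 ", " $1}'
}

# Hypothetical sample: three hits in minute 00:00, one in minute 00:01.
printf '%s\n' \
    '2007-01-01, 00:00:00,121' \
    '2007-01-01, 00:00:00,311' \
    '2007-01-01, 00:00:59,432' \
    '2007-01-01, 00:01:10,778' | per_minute_totals
```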
Hermann Peifer

2007-08-30, 7:21 pm

AyOut wrote:
> I have a shell script that invokes an awk command on a file of the
> following format:
> (Date, Time)
> 2007-01-01, 00:00:00,121
> 2007-01-01, 00:00:00,311
> 2007-01-01, 00:00:00,432
> ...
> ...
> 2007-01-01, 00:01:10,778
> 2007-01-01, 00:01:10,981
> 2007-01-01, 00:01:11,121
> ...
> ...
>
> The script I have basically parses the file and generates a comma
> separated output of
> Date, Time, Count
> 2007-01-01, 00:00:00, 3
> 2007-01-01, 00:01:10, 2
> 2007-01-01, 00:01:11, 1
>
> The command looks like this:
>
> awk '{++hr[$1,substr($2,1,9)]}END{for(item in hr)printf("%s%s\n",item,
> hr[item])}' ${logfile} > ${logfile}.csv
>
> At the end of the process I have an output of 86400 lines, too big for
> Excel. I want to truncate the output to only output the max value in
> any given minute and thus reduce the output to 1440 lines.
>


Try substr($2,1,5) instead of substr($2,1,9).

Hermann
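
To make the difference between the two substr() calls concrete, here is a small sketch that prints the key each one yields for a sample line. The function name and the sample line are made up. Note that with a length of 9 the key keeps the comma that precedes the milliseconds, whereas a length of 8 would stop at the seconds.

```shell
# Sketch: which part of the time field each substr() call keeps.
show_keys() {
    awk '{
        printf "minute key: [%s]\n", substr($2, 1, 5)
        printf "second key: [%s]\n", substr($2, 1, 9)
        printf "second key, 8 chars: [%s]\n", substr($2, 1, 8)
    }'
}

printf '2007-01-01, 00:01:10,778\n' | show_keys
```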
AyOut

2007-08-30, 7:21 pm

On Aug 30, 2:10 pm, Hermann Peifer <pei...@gmx.eu> wrote:
> AyOut wrote:
>
>
>
>
>
> Try with substr($2,1,5)] instead of substr($2,1,9)].
>
> Hermann



I thought about doing this, but that's only going to give me TPS
granularity at a minute level. Doing $2,1,9 allows me to capture the
hits per second. I.e.:
(counts)
22
23
20
...
...
22
24
21

The max second value for the above given minute would be 24.
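
The distinction between the per-minute total and the per-minute peak can be sketched as follows: count hits per second while reading, then fold those counts into both figures in the END block. This is one possible arrangement under stated assumptions, not the only one. The array names, the function name and the sample data are all invented for illustration, and the output is piped through sort because awk's `for (x in array)` order is unspecified.

```shell
# Sketch: per-minute total and per-minute peak (max hits in any one second).
minute_total_and_peak() {
    awk '
        {
            second = substr($0, 1, 20)    # e.g. "2007-01-01, 00:00:00"
            ++persec[second]
        }
        END {
            for (s in persec) {
                m = substr(s, 1, 17)      # e.g. "2007-01-01, 00:00"
                total[m] += persec[s]
                if (persec[s] > peak[m]) peak[m] = persec[s]
            }
            for (m in total)
                printf "%s, total=%d, peak=%d\n", m, total[m], peak[m]
        }' | LC_ALL=C sort
}

# Hypothetical sample data.
printf '%s\n' \
    '2007-01-01, 00:00:00,121' \
    '2007-01-01, 00:00:00,311' \
    '2007-01-01, 00:00:00,432' \
    '2007-01-01, 00:00:07,050' \
    '2007-01-01, 00:01:10,778' \
    '2007-01-01, 00:01:10,981' \
    '2007-01-01, 00:01:11,121' | minute_total_and_peak
```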

Hermann Peifer

2007-08-31, 1:19 pm

AyOut wrote:
> On Aug 30, 2:10 pm, Hermann Peifer <pei...@gmx.eu> wrote:
>
>
> I thought about doing this, but that's only going to give me TPS
> granularity at a minute level. Doing $2,1,9 allows me to capture the
> hits per second. I.e.:
> (counts)
> 22
> 23
> 20
> ..
> ..
> 22
> 24
> 21
>
> The max second value for the above given minute would be 24.
>


I was reading too quickly and misread "max value per minute" as the
total count per minute. For the maximum number of hits per second within
each minute, you could try this:

$ awk '{
second=substr($0,1,20)
minute=substr($0,1,17)
++hr[second]
if (hr[second]>max[minute]) max[minute]=hr[second]
}END{for(item in max)printf "%s, %s\n",item,max[item]}' ...

Hermann
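
One practical note on the command above: `for (item in max)` visits the keys in an unspecified order, so the lines will not necessarily come out chronologically. Since the keys start with an ISO date and a zero-padded time, a plain sort should restore the order. A minimal sketch with the same awk body wrapped in a function that reads stdin; the function name and the sample lines are made up.

```shell
# Sketch: running per-minute maximum of per-second counts, sorted afterwards.
peak_per_minute() {
    awk '{
        second = substr($0, 1, 20)    # date plus HH:MM:SS
        minute = substr($0, 1, 17)    # date plus HH:MM
        ++hr[second]
        if (hr[second] > max[minute]) max[minute] = hr[second]
    }
    END { for (item in max) printf "%s, %s\n", item, max[item] }' |
        LC_ALL=C sort
}

# Hypothetical sample data.
printf '%s\n' \
    '2007-01-01, 00:00:00,121' \
    '2007-01-01, 00:00:00,311' \
    '2007-01-01, 00:00:00,432' \
    '2007-01-01, 00:01:10,778' \
    '2007-01-01, 00:01:10,981' \
    '2007-01-01, 00:01:11,121' | peak_per_minute
```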

Copyright 2003 - 2009 webservertalk.com