Awk Array Question
| I have a shell script that invokes an awk command on a file of the
following format:
(Date, Time)
2007-01-01, 00:00:00,121
2007-01-01, 00:00:00,311
2007-01-01, 00:00:00,432
....
....
2007-01-01, 00:01:10,778
2007-01-01, 00:01:10,981
2007-01-01, 00:01:11,121
....
....
The script I have basically parses the file and generates a comma
separated output of
Date, Time, Count
2007-01-01, 00:00:00, 3
2007-01-01, 00:01:10, 2
2007-01-01, 00:01:11, 1
The command looks like this:
awk '{++hr[$1,substr($2,1,9)]}END{for(item in hr)printf("%s%s\n",item,
hr[item])}' ${logfile} > ${logfile}.csv
At the end of the process I have an output of 86400 lines, too big for
Excel. I want to truncate the output to only output the max value in
any given minute and thus reduce the output to 1440 lines.
I have the following script that I'm struggling with...
#!/usr/bin/ksh
awk 'BEGIN{
for (i=0;i<23;i++) {
for (j=0;j<=59;j++) {
for (k=0;k<=59;k++) {
p=sprintf("%02d:%02d:%02d", i, j, k);
hr[p] = 0;
}
}
}
}
{
++hr[$1,substr($2,1,9)]
}END{
for (i=0;i<23;i++) {
for (j=0;j<=59;j++) {
h=sprintf("%02d:%02d", i, j);
hh[h] = 0;
for (k=0;k<=59;k++) {
p=sprintf("%02d:%02d:%02d", i, j, k);
if(k=0)
hh[h]=hr[p];
else{
pp=sprintf("%02d:%02d:%02d", i, j, k-1);
print hr[pp];
if(hh[pp]>hh[h])
hh[h]=hr[pp];
}
}
}
}
for(item in hr){
printf("%s%s\n",item, hr[item]);
}
}' $1
Can anyone tell me what I'm doing wrong?
I want the following output:
2007-01-01, 00:00, 4
2007-01-01, 00:01, 5
2007-01-01, 00:02, 4
....
....
Thanks
| |
| Cyrus Kriticos 2007-08-30, 1:20 pm |
| AyOut wrote:
> I have a shell script that invokes an awk command on a file of the
> following format:
> (Date, Time)
> 2007-01-01, 00:00:00,121
> 2007-01-01, 00:00:00,311
> 2007-01-01, 00:00:00,432
> ...
> ...
> 2007-01-01, 00:01:10,778
> 2007-01-01, 00:01:10,981
> 2007-01-01, 00:01:11,121
> ...
> ...
>
cut -d: -f1-2 FILE | uniq -c | awk '{print $2 " " $3 " " $1}'
> I want the following output:
>
> 2007-01-01, 00:00, 4
> 2007-01-01, 00:01, 5
> 2007-01-01, 00:02, 4
> ...
> ...
--
Best regards | "The only way to really learn scripting is to write
Cyrus | scripts." -- Advanced Bash-Scripting Guide
| |
| Cyrus Kriticos 2007-08-30, 1:20 pm |
| AyOut wrote:
> I have a shell script that invokes an awk command on a file of the
> following format:
> (Date, Time)
> 2007-01-01, 00:00:00,121
> 2007-01-01, 00:00:00,311
> 2007-01-01, 00:00:00,432
> ...
> ...
> 2007-01-01, 00:01:10,778
> 2007-01-01, 00:01:10,981
> 2007-01-01, 00:01:11,121
> ...
> ...
>
cut -d: -f1-2 FILE | uniq -c | awk '{print $2 " " $3 ", " $1}'
> I want the following output:
>
> 2007-01-01, 00:00, 4
> 2007-01-01, 00:01, 5
> 2007-01-01, 00:02, 4
> ...
> ...
--
Best regards | "The only way to really learn scripting is to write
Cyrus | scripts." -- Advanced Bash-Scripting Guide
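A minimal sketch of what the pipeline above produces, assuming the log lines are already in chronological order (uniq -c only merges adjacent duplicates). The names `sample` and `per_minute_total` are hypothetical and the sample data is made up in the format from the question. Note that this counts all lines in a minute, not the busiest second.

```shell
# Hypothetical sample data in the format from the question.
sample='2007-01-01, 00:00:00,121
2007-01-01, 00:00:00,311
2007-01-01, 00:00:59,432
2007-01-01, 00:01:10,778
2007-01-01, 00:01:10,981'

# cut keeps everything before the second colon ("date, HH:MM"),
# uniq -c counts adjacent identical lines,
# awk moves the count from the front to the end of the line.
per_minute_total() {
  cut -d: -f1-2 | uniq -c | awk '{print $2 " " $3 ", " $1}'
}

printf '%s\n' "$sample" | per_minute_total
# prints:
# 2007-01-01, 00:00, 3
# 2007-01-01, 00:01, 2
```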
| |
| Hermann Peifer 2007-08-30, 7:21 pm |
| AyOut wrote:
> I have a shell script that invokes an awk command on a file of the
> following format:
> (Date, Time)
> 2007-01-01, 00:00:00,121
> 2007-01-01, 00:00:00,311
> 2007-01-01, 00:00:00,432
> ...
> ...
> 2007-01-01, 00:01:10,778
> 2007-01-01, 00:01:10,981
> 2007-01-01, 00:01:11,121
> ...
> ...
>
> The script I have basically parses the file and generates a comma
> separated output of
> Date, Time, Count
> 2007-01-01, 00:00:00, 3
> 2007-01-01, 00:01:10, 2
> 2007-01-01, 00:01:11, 1
>
> The command looks like this:
>
> awk '{++hr[$1,substr($2,1,9)]}END{for(item in hr)printf("%s%s\n",item,
> hr[item])}' ${logfile} > ${logfile}.csv
>
> At the end of the process I have an output of 86400 lines, too big for
> Excel. I want to truncate the output to only output the max value in
> any given minute and thus reduce the output to 1440 lines.
>
Try with substr($2,1,5)] instead of substr($2,1,9)].
Hermann
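A sketch of that change applied to the original one-liner, under two assumptions that are not in the thread: the array key is built with a plain space instead of the `$1,...` subscript (which embeds the non-printing SUBSEP character in the key), and the output goes through sort because the order of `for (item in hr)` is unspecified. The function name `per_minute_count` and the sample data are hypothetical. This yields the total number of lines per minute.

```shell
# Hypothetical sample data in the format from the question.
sample='2007-01-01, 00:00:00,121
2007-01-01, 00:00:00,311
2007-01-01, 00:00:59,432
2007-01-01, 00:01:10,778
2007-01-01, 00:01:10,981'

# Count lines per "date, HH:MM" key; reads the named files, or stdin if none.
per_minute_count() {
  awk '{ ++hr[$1 " " substr($2, 1, 5)] }
       END { for (item in hr) printf "%s, %d\n", item, hr[item] }' "$@" | sort
}

printf '%s\n' "$sample" | per_minute_count
# prints:
# 2007-01-01, 00:00, 3
# 2007-01-01, 00:01, 2
```

Unlike a uniq-based pipeline, the array version does not require identical timestamps to be adjacent in the input.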
| |
|
| On Aug 30, 2:10 pm, Hermann Peifer <pei...@gmx.eu> wrote:
> AyOut wrote:
>
> Try with substr($2,1,5)] instead of substr($2,1,9)].
>
> Hermann
I thought about doing this, but that only gives me counts at minute
granularity, not TPS. Using substr($2,1,9) lets me capture the hits per
second. For example, the per-second counts within one minute might be:
(counts)
22
23
20
...
...
22
24
21
The maximum per-second count for the minute above would be 24.
| |
| Hermann Peifer 2007-08-31, 1:19 pm |
| AyOut wrote:
> On Aug 30, 2:10 pm, Hermann Peifer <pei...@gmx.eu> wrote:
>
> I thought about doing this, but that's only going to give me TPS
> granularity at a minute level. Doing $2,1,9 allows me to capture the
> hits per second. I.e.:
> (counts)
> 22
> 23
> 20
> ..
> ..
> 22
> 24
> 21
>
> The max second value for the above given minute would be 24.
>
I was reading too quickly and misinterpreted "max value per minute" as
the total count per minute. To get the maximum number of hits per second
within each minute, you could try this:
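$ awk '{
    second = substr($0, 1, 20)   # date and time up to the second
    minute = substr($0, 1, 17)   # date and time up to the minute
    ++hr[second]                 # hits seen so far in this second
    if (hr[second] > max[minute]) max[minute] = hr[second]
}
END { for (item in max) printf "%s, %s\n", item, max[item] }' ...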
Hermann
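A small worked run of the per-minute maximum, as a sketch rather than a drop-in script. The function name `max_per_minute` and the sample data are hypothetical, and the trailing sort is an addition, assuming chronological output is wanted (awk does not define the order of `for (m in max)`).

```shell
# Hypothetical sample data in the format from the question.
sample='2007-01-01, 00:00:00,121
2007-01-01, 00:00:00,311
2007-01-01, 00:00:00,432
2007-01-01, 00:00:01,005
2007-01-01, 00:01:10,778
2007-01-01, 00:01:10,981
2007-01-01, 00:01:11,121'

# For each minute, report the largest number of lines sharing one second.
# Reads the named files, or stdin if none are given.
max_per_minute() {
  awk '{
    second = substr($0, 1, 20)   # e.g. "2007-01-01, 00:00:00"
    minute = substr($0, 1, 17)   # e.g. "2007-01-01, 00:00"
    # Bump the per-second count and keep the running maximum for the minute.
    if (++hr[second] > max[minute]) max[minute] = hr[second]
  }
  END { for (m in max) printf "%s, %d\n", m, max[m] }' "$@" | sort
}

printf '%s\n' "$sample" | max_per_minute
# prints:
# 2007-01-01, 00:00, 3
# 2007-01-01, 00:01, 2
```

Because the maximum is tracked while reading, no nested hour/minute/second loops are needed. That also sidesteps two pitfalls in the looped script from the question: `if(k=0)` assigns 0 to k instead of comparing, and the BEGIN block fills hr with "HH:MM:SS" keys while the main rule indexes it by date and time, so the two sets of keys never match.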