Unix Shell - awk to extract data within an arbitrary date range


Author awk to extract data within an arbitrary date range
derek.doerr@gmail.com

2006-01-16, 8:51 pm


I have a file in which the first column is the date/time stamp (i.e.
Jan 13 18:54:58.92). I want to be able to use awk, or something else,
if awk isn't the right tool, to extract data within an arbitrary
date/time range. For example, run a script every 30 minutes to extract
data for the previous 30 minutes and then run that data through more
scans/counts with grep -c, etc.

I've tried a number of approaches including the following & don't seem
to get anything to work right


nawk -F'[|]' '$1~/^"Jan 14 00"/,$1~/^"Jan 14 00:05"/' /log/archive/log.txt.060114_012801

Any ideas on how best to approach this problem?

Ed Morton

2006-01-17, 3:11 am

derek.doerr@gmail.com wrote:
> I have a file in which the first column is the date/time stamp (i.e.
> Jan 13 18:54:58.92). I want to be able to use awk, or something else,
> if awk isn't the right tool, to extract data within an arbitrary
> date/time range. For example, run a script every 30 minutes to extract
> data for the previous 30 minutes and then run that data through more
> scans/counts with grep -c, etc.
>
> I've tried a number of approaches including the following & don't seem
> to get anything to work right
>
>
> nawk -F'[|]' '$1~/^"Jan 14 00"/,$1~/^"Jan 14 00:05"/'
> /log/archive/log.txt.060114_012801
>
> Any ideas on how best to approach this problem?
>


Use gawk as it has builtin date/time functions, and convert the date in
the file into a format that can be compared numerically with other dates
before using it. For example, put this in a file named "chkdate.awk":

#------------
function cvttime(t,   a) {
    # converts a date like "Jan 13 18:54:58.92" to epoch secs.
    split(t,a,"[ :.]")
    match("JanFebMarAprMayJunJulAugSepOctNovDec",a[1])
    a[1] = sprintf("%02d",(RSTART+2)/3)
    century = (a[6] > thisYear ? "19" : "20")
    return( mktime(century a[6]" "a[1]" "a[2]" "a[3]" "a[4]" "a[5]) )
}
BEGIN{
    thisYear = strftime("%y")
    start = cvttime("Jan 13 18:54:58.92")
    end = start + (30 * 60) # start + 30 minutes
}
{now = cvttime($1)}
(now >= start) && (now <= end)
#-------------

and run it as:

gawk -f chkdate.awk file

Note that since the century (1900, 2000, etc.) is missing from your
example, I added a guess to the cvttime code.
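
For readers without gawk (the original attempt used nawk), the same range test can be sketched without mktime(). This is only an illustration under stated assumptions, not Ed's script: it assumes "|" separates the timestamp from the rest of the record, that the trailing ".92" is hundredths of a second rather than a year, and that all records fall within one non-leap year. The names extract_range, lo and hi are made up for this example.

```shell
# extract_range START MINUTES FILE
# Print the records of FILE whose timestamp lies in [START, START + MINUTES].
extract_range() {
    awk -F'|' -v start="$1" -v minutes="$2" '
    # Convert "Mon DD HH:MM:SS.hh" to seconds since the start of the year.
    # a, m and days are extra parameters used as local variables.
    function cvttime(t,    a, m, days) {
        split(t, a, "[ :]+")   # a[1]=month a[2]=day a[3]=hh a[4]=mm a[5]=ss.hh
        m = (match("JanFebMarAprMayJunJulAugSepOctNovDec", a[1]) + 2) / 3
        # Days elapsed before each month, assuming a non-leap year.
        split("0 31 59 90 120 151 181 212 243 273 304 334", days, " ")
        return ((days[m] + a[2] - 1) * 24 + a[3]) * 3600 + a[4] * 60 + a[5]
    }
    BEGIN { lo = cvttime(start); hi = lo + minutes * 60 }
    { now = cvttime($1) }
    now >= lo && now <= hi
    ' "$3"
}
```

Something like extract_range "Jan 14 00:00:00" 30 /log/archive/log.txt.060114_012801 would then print the first half hour of Jan 14. A range that crosses New Year, or Feb 29 in a leap year, is not handled by this sketch.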

As for "grep -c" and anything else you're considering, unless it proves
untenable for some reason, just use awk rather than creating chains of
pointless pipes among various tools.
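
As a sketch of that advice, a count that might otherwise be piped into grep -c can be produced by awk itself. The function name count_matching, and the choice to match only the last "|"-separated field (the message text), are assumptions made for illustration.

```shell
# count_matching PATTERN FILE
# Print how many records have a message (the last "|"-separated field)
# matching the regular expression PATTERN, much like grep -c would.
count_matching() {
    awk -F'|' -v pat="$1" '
    $NF ~ pat { n++ }
    END { print n + 0 }    # n + 0 prints 0 instead of an empty line when nothing matched
    ' "$2"
}
```

The same pattern-action rule could sit in the date-range script after the range test, so one pass over the file does both the filtering and the counting.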

Regards,

Ed.
Bill Marcum

2006-01-17, 6:05 pm

On 16 Jan 2006 18:49:26 -0800, derek.doerr@gmail.com
<derek.doerr@gmail.com> wrote:
>
> I have a file in which the first column is the date/time stamp (i.e.
> Jan 13 18:54:58.92). I want to be able to use awk, or something else,
> if awk isn't the right tool, to extract data within an arbitrary
> date/time range. For example, run a script every 30 minutes to extract
> data for the previous 30 minutes and then run that data through more
> scans/counts with grep -c, etc.
>
> I've tried a number of approaches including the following & don't seem
> to get anything to work right
>
>
> nawk -F'[|]' '$1~/^"Jan 14 00"/,$1~/^"Jan 14 00:05"/'
> /log/archive/log.txt.060114_012801
>
> Any ideas on how best to approach this problem?
>

Can you show some sample lines of the input file? Is the date really
printed in double quotes?


--
I want the presidency so bad I can already taste the hors d'oeuvres.
derek.doerr@gmail.com

2006-01-19, 8:49 pm

Here's a sample of records from a log file:

Dec 23 10:18:46.83|TUCPU=260|TKCPU=80|0|8002||GrammarManager::DoDelayThrow() - no delayed throw scheduled
Dec 23 10:18:46.83|TUCPU=260|TKCPU=80|0|8002||VXI::DoInnerJump()
Dec 23 10:18:46.83|TUCPU=260|TKCPU=80|0|8002||VXI::CollectPhase - (block)
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|8002||VXI::block_element()
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|8002||VXI::execute_content()
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|8002||VXI::executable_element - (audio)
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|8002||VXI::PlayPrompt()
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|8002||PromptManager::from Queue() properties, bargein is ON
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|5000||AVBpromptQueue() bargein = true
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|5000||Found AUDIO type to be: audio/x-wav
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|5000||Queuing AUDIO: /gmcars/ivr/audio/gm_cars_ivr_menu_000525.wav
Dec 23 10:18:46.85|TUCPU=270|TKCPU=90|0|5000||AVBpromptQueue() - mime type: audio/x-wav
Dec 23 10:18:46.85|TUCPU=270|TKCPU=90|0|5000||AVBPromptQueue: Fresh internet file Transcode InFile(/voice1/vxmlcache/cache_sbinet/526/BAAESaOLg) MIME(audio/x-wav) VisFile(/voice1/vxmlcache/cache_sbinet/526/BAAESaOLg.ulaw)
Dec 23 10:18:46.86|TUCPU=270|TKCPU=90|0|8002||PromptManager::enabledSegmentInQueue = true
Dec 23 10:18:46.86|TUCPU=270|TKCPU=90|0|8002||PromptManager::AddSegment() - no Play() because bargein = true
Dec 23 10:18:46.86|TUCPU=270|TKCPU=90|0|8002||VXI::DoInnerJump()
Dec 23 10:18:46.86|TUCPU=270|TKCPU=90|0|8002||VXI::CollectPhase - (object)
Dec 23 10:18:46.86|TUCPU=270|TKCPU=90|0|8002||VXI::object_element()
Dec 23 10:18:46.86|TUCPU=270|TKCPU=90|0|8002||VXI::queue_prompts()
Dec 23 10:18:46.86|TUCPU=280|TKCPU=90|0|8002||PromptManager::PlayAll()
Dec 23 10:18:46.86|TUCPU=280|TKCPU=90|0|5000||AVBpromptPlay(we can can not play))
Dec 23 10:18:46.86|TUCPU=280|TKCPU=90|0|5000||AVBpromptWait(we can can play))
Dec 23 10:18:51.37|TUCPU=280|TKCPU=90|0|8002||PromptManager::PlayAll() - enabledSegmentInQueue = false
Dec 23 10:18:51.39|TUCPU=280|TKCPU=90|0|8002||VXI::object_element - done
Dec 23 10:18:51.39|TUCPU=280|TKCPU=90|0|8002||VXI::execute_content()
Dec 23 10:18:51.39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)
Dec 23 10:18:51.39|TUCPU=290|TKCPU=90|0|8002||VXI::assign_element(name=retCode exp=f_Main_CallScript_transfercallscript.retVal)
Dec 23 10:18:51.39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)
Dec 23 10:18:51.39|TUCPU=290|TKCPU=90|0|8002||VXI::assign_element(name=retArgNo exp=f_Main_CallScript_transfercallscript.numRet)
Dec 23 10:18:51.39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)
Dec 23 10:18:51.39|TUCPU=290|TKCPU=90|0|8002||VXI::assign_element(name=ani exp=f_Main_CallScript_transfercallscript.ret1)
Dec 23 10:18:51.39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)
Dec 23 10:18:51.39|TUCPU=290|TKCPU=90|0|8002||VXI::assign_element(name=dnis exp=f_Main_CallScript_transfercallscript.ret2)
Dec 23 10:18:51.39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)

derek.doerr@gmail.com

2006-01-19, 8:49 pm

Ed,

Thanks for your reply. As you can tell, I'm a newbie at awk
programming so I've been studying your script & trying to really
understand how it works. I have a number of questions about it:

(1) match() returns the index position where a[1] starts in the string
"JanFeb...". Where is the index value assigned to a variable for
subsequent use? Is this the value of RSTART?
(2) Am I correct that the script is invoked, once for each record in
"file"; how does $1 get assigned a value? I assume that this is the
date/timestamp on the record.
(3) how would I pass values into the script to set the "start" value
and the duration used to calculate the "end" value?

Thanks so much for your help!
- Derek

derek.doerr@gmail.com

2006-01-19, 8:49 pm

One more question....

Your invocation of cvttime($1) only has one argument being passed to
cvttime when the method signature has two parameters. Is this an
omission?

Ed Morton

2006-01-20, 2:50 am

derek.doerr@gmail.com wrote:
> Ed,
>
> Thanks for your reply.


Please read http://cfaj.freeshell.org/google before posting again as
you're falling prey to google groups.

> As you can tell, I'm a newbie at awk
> programming so I've been studying your script & trying to really
> understand how it works. I have a number of questions about it:
>
> (1) match() returns the index position where a[1] starts in the string
> "JanFeb...". Where is the index value assigned to a variable for
> subsequent use? Is this the value of RSTART?


Yes.
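
To make that concrete, here is a small sketch (the wrapper name month_number is made up for illustration) showing that match() both returns the position and sets RSTART, and how (RSTART+2)/3 turns that position into a month number:

```shell
# month_number NAME: show where a three-letter month name sits in the lookup string.
month_number() {
    awk -v name="$1" 'BEGIN {
        # match() returns the 1-based position of the match and also
        # stores it in the built-in variable RSTART (0 if there is no match).
        pos = match("JanFebMarAprMayJunJulAugSepOctNovDec", name)
        # Month names start at positions 1, 4, 7, ..., 34.
        printf "%s pos=%d RSTART=%d month=%02d\n", name, pos, RSTART, (RSTART + 2) / 3
    }'
}

month_number Jan   # prints: Jan pos=1 RSTART=1 month=01
month_number Dec   # prints: Dec pos=34 RSTART=34 month=12
```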

> (2) Am I correct that the script is invoked, once for each record in
> "file";


No, the script gets invoked once for the entire file and these 2 lines:

{now = cvttime($1)}
(now >= start) && (now <= end)

get executed once for every record (line by default) in the file.

> how does $1 get assigned a value?

$1 is just one of the many variables awk populates for you as it reads
records.

> I assume that this is the
> date/timestamp on the record.


It's the first field (space-separated by default), which in this case is
the date/timestamp.
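
One caveat worth checking against the sample records posted earlier in the thread: with awk's default whitespace splitting, $1 would be only the month name, because the timestamp itself contains spaces. A sketch, assuming "|" is the real delimiter as in the original nawk -F'[|]' attempt (the function names are hypothetical):

```shell
# One record taken from the posted log sample.
line='Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|5000||Found AUDIO type to be: audio/x-wav'

# $1 under the default field separator (runs of blanks).
first_field_default() { printf '%s\n' "$1" | awk '{ print $1 }'; }

# $1 when "|" is the field separator.
first_field_pipe() { printf '%s\n' "$1" | awk -F'|' '{ print $1 }'; }

first_field_default "$line"   # prints: Dec
first_field_pipe "$line"      # prints: Dec 23 10:18:46.84
```

So a script like chkdate.awk would probably need to be run with -F'|' for cvttime($1) to see the whole timestamp.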

> (3) how would I pass values into the script to set the "start" value
> and the duration used to calculate the "end" value?


See question 24 in the FAQ http://home.comcast.net/~j.p.h/cus-faq-2.html#24
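
In short, and as a sketch rather than a quote from the FAQ: awk's -v option assigns a variable before the program starts, so the value is visible even in BEGIN. The variable names start and minutes and the wrapper show_window are made up here.

```shell
# show_window START MINUTES: echo back the values awk received via -v.
show_window() {
    awk -v start="$1" -v minutes="$2" 'BEGIN {
        # Variables set with -v are already available in BEGIN.
        printf "start=%s window=%d seconds\n", start, minutes * 60
    }'
}

show_window "Jan 13 18:54:58.92" 30   # prints: start=Jan 13 18:54:58.92 window=1800 seconds
```

Applied to a script file, this would look something like gawk -v start="Jan 13 18:54:58.92" -v minutes=30 -f chkdate.awk file, with the BEGIN block computing its start and end from those variables instead of hard-coded values.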

>
> Thanks so much for your help!
> - Derek
>


You're welcome,

Ed.
Ed Morton

2006-01-20, 2:50 am

derek.doerr@gmail.com wrote:

> One more question....


Please read http://cfaj.freeshell.org/google before posting again as
you're falling prey to google groups.

> Your invocation of cvttime($1) only has one argument being passed to
> cvttime when the method signature has two parameters. Is this an
> omission?
>


No, it's deliberate. awk has no provision for declaring local variables,
so adding an unused parameter separated from the real parameters by
significant white space is a common workaround. I should've also added
"century" to the parameter list to make it local.

I recommend you buy the book "Effective Awk Programming" by Arnold
Robbins (http://www.oreilly.com/catalog/awkprog3/) if you plan to do
much awk programming.
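
A minimal sketch of the local-variable idiom described above (the function names leaky and tidy are invented for this illustration): any variable assigned inside a function is global unless it appears in the parameter list, so extra parameters that callers never pass act as locals.

```shell
demo_locals() {
    awk '
    # century is not in the parameter list, so the assignment sets a global.
    function leaky(t)            { century = "20"; return century t }
    # century is an extra, never-passed parameter, so it is local to tidy().
    function tidy(t,    century) { century = "20"; return century t }
    BEGIN {
        tidy("06");  print "after tidy: [" century "]"
        leaky("06"); print "after leaky: [" century "]"
    }'
}

demo_locals
# prints:
# after tidy: []
# after leaky: [20]
```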

Ed.
Kenny McCormack

2006-01-20, 8:13 am

In article <1137722178.942394.293550@o13g2000cwo.googlegroups.com>,
derek.doerr@gmail.com <derek.doerr@gmail.com> wrote:
>One more question....
>
>Your invocation of cvttime($1) only has one argument being passed to
>cvttime when the method signature has two parameters. Is this an
>omission?


What's a "method signature"??? Does it have something to do with checking
oneself in and out of a drug treatment facility?

In any case, I think it is irrelevant in the context of AWK programming.


Copyright 2003 - 2008 webservertalk.com