awk to extract data within an arbitrary date range
derek.doerr@gmail.com 2006-01-16, 8:51 pm
I have a file in which the first column is the date/time stamp (e.g.
Jan 13 18:54:58.92). I want to be able to use awk, or something else
if awk isn't the right tool, to extract data within an arbitrary
date/time range. For example, run a script every 30 minutes to extract
the data for the previous 30 minutes and then run that data through more
scans/counts with grep -c, etc.
I've tried a number of approaches, including the following, and can't
seem to get any of them to work right:
nawk -F'[|]' '$1~/^"Jan 14 00"/,$1~/^"Jan 14 00:05"/' /log/archive/log.txt.060114_012801
Any ideas on how best to approach this problem?

Ed Morton 2006-01-17, 3:11 am
derek.doerr@gmail.com wrote:
> I have a file in which the first column is the date/time stamp (i.e.
> Jan 13 18:54:58.92). I want to be able to use awk, or something else,
> if awk isn't the right tool, to extract data within an arbitrary
> date/time range. For example, run a script every 30 minutes to extract
> data for the previous 30 minutes and then run that data through more
> scans/counts with grep -c, etc.
>
> I've tried a number of approaches including the following & don't seem
> to get anything to work right
>
>
> nawk -F'[|]' '$1~/^"Jan 14 00"/,$1~/^"Jan 14 00:05"/'
> /log/archive/log.txt.060114_012801
>
> Any ideas on how best to approach this problem?
>
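Use gawk, since it has built-in date/time functions (mktime(), strftime(),
systime()), and convert each date in the file into a number (seconds since
the epoch) that can be compared numerically with other dates before using
it. For example, put this in a file named "chkdate.awk":
#------------
# chkdate.awk: print the records time-stamped within 30 minutes of "start".
# Needs gawk for mktime(), strftime() and systime().
function cvttime(t,    a, year, secs) {
    # converts a date like "Jan 13 18:54:58.92" to epoch secs
    # (the hundredths of a second are dropped).
    split(t, a, "[ :.]+")
    match("JanFebMarAprMayJunJulAugSepOctNovDec", a[1])
    a[1] = sprintf("%02d", (RSTART+2)/3)
    # the log has no year, so assume this one, or last year if that
    # would put the date in the future (e.g. Dec logs read in Jan)
    year = strftime("%Y")
    secs = mktime(year " " a[1] " " a[2] " " a[3] " " a[4] " " a[5])
    if (secs > systime())
        secs = mktime((year-1) " " a[1] " " a[2] " " a[3] " " a[4] " " a[5])
    return secs
}
BEGIN {
    FS = "|"                    # the date/timestamp is everything before the first "|"
    start = cvttime("Jan 13 18:54:58.92")
    end = start + (30 * 60)     # start + 30 minutes
}
# only lines that begin with a timestamp update "now"; continuation
# lines keep the time of the record they belong to
$1 ~ /^[A-Z][a-z][a-z] +[0-9]+ +[0-9]+:[0-9]+:[0-9]+/ { now = cvttime($1) }
(now >= start) && (now <= end)
#-------------
and run it as:
gawk -f chkdate.awk file
Note that since the year is missing from your timestamps, cvttime guesses
it: the current year, or last year if that would put the date in the
future. The script also sets FS to "|" so that $1 is the whole
date/timestamp rather than just the month.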
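If gawk isn't available (older awks such as the nawk used in the question may lack mktime()), the same idea of turning each timestamp into a number before comparing can be sketched in plain POSIX awk. This is a swapped-in technique, not the one above: instead of epoch seconds it computes seconds since 1 January from a month-offset table, so it assumes a non-leap year and a window that doesn't cross New Year. The helper name in_window and its FROM/TO arguments are hypothetical.

```shell
# in_window FROM TO: read "|"-separated log records on stdin and print those
# whose timestamp falls in [FROM, TO], both written like "Dec 23 10:18:00".
# Hypothetical helper, POSIX awk only (no gawk mktime): times become seconds
# since 1 January of an assumed non-leap year, so FROM and TO must be in the
# same year. Fractional seconds are ignored.
in_window() {
    awk -F'|' -v from="$1" -v to="$2" '
    function secs(t,    a, n) {
        n = split(t, a, "[ :.]+")
        if (n < 5 || !(a[1] in first)) return -1    # not a timestamp
        return (((first[a[1]] + a[2] - 1) * 24 + a[3]) * 60 + a[4]) * 60 + a[5]
    }
    BEGIN {
        # day of the year (0-based) on which each month starts, non-leap year
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
        split("0 31 59 90 120 151 181 212 243 273 304 334", d, " ")
        for (i = 1; i <= 12; i++) first[m[i]] = d[i]
        lo = secs(from); hi = secs(to); now = -1
    }
    # records that start with a timestamp set "now"; continuation lines
    # without one keep the time of the record they belong to
    { s = secs($1); if (s >= 0) now = s }
    now >= lo && now <= hi
    '
}
```

For example, `in_window 'Dec 23 10:18:00' 'Dec 23 10:48:00' < /log/archive/log.txt.060114_012801` would print the records for that half hour, including any untimestamped continuation lines that follow them.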
As for "grep -c" and anything else you're considering, unless it proves
untenable for some reason, just use awk rather than creating chains of
pointless pipes among various tools.
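Following that advice, several grep -c runs over the selected records can become counters in a single awk pass. This is a minimal sketch that assumes the records have already been selected. In chkdate.awk, the counters would go in the action of the (now >= start) && (now <= end) pattern. The helper name tally and the two example patterns, bargein and VXI::, are hypothetical choices taken from the sample log later in the thread.

```shell
# tally: one awk pass standing in for several "grep -c" runs.
# Hypothetical example: counts every record, the records mentioning
# "bargein", and the records whose message (last "|" field) starts with VXI::
tally() {
    awk -F'|' '
        { total++ }
        /bargein/ { barge++ }
        $NF ~ /^VXI::/ { vxi++ }
        END { printf "total=%d bargein=%d vxi=%d\n", total, barge, vxi }
    '
}
```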
Regards,
Ed.

Bill Marcum 2006-01-17, 6:05 pm
On 16 Jan 2006 18:49:26 -0800, derek.doerr@gmail.com
<derek.doerr@gmail.com> wrote:
>
> I have a file in which the first column is the date/time stamp (i.e.
> Jan 13 18:54:58.92). I want to be able to use awk, or something else,
> if awk isn't the right tool, to extract data within an arbitrary
> date/time range. For example, run a script every 30 minutes to extract
> data for the previous 30 minutes and then run that data through more
> scans/counts with grep -c, etc.
>
> I've tried a number of approaches including the following & don't seem
> to get anything to work right
>
>
> nawk -F'[|]' '$1~/^"Jan 14 00"/,$1~/^"Jan 14 00:05"/'
> /log/archive/log.txt.060114_012801
>
> Any ideas on how best to approach this problem?
>
Can you show some sample lines of the input file? Is the date really
printed in double quotes?
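Bill's question points at the likely bug. The regexes in the nawk command require a literal double quote right after ^, and the sample log lines don't begin with one. Below is a sketch of the same range-pattern idea with the quotes removed. The names range_by_prefix, from, and to are hypothetical. It also shows why a numeric comparison is more robust. First, the range stops at the first record matching the end prefix, so later records in that minute are dropped, and it runs to end of file if no record matches. Second, a start prefix as broad as "Jan 14 00" matches again after the range closes and restarts it.

```shell
# range_by_prefix FROM TO: print "|"-separated records from the first one whose
# timestamp starts with FROM through the next one whose timestamp starts with
# TO (hypothetical helper). FROM and TO are used as regular expressions.
range_by_prefix() {
    awk -F'|' -v from="$1" -v to="$2" '$1 ~ ("^" from), $1 ~ ("^" to)'
}
```

With FROM set to 'Jan 14 00:00' and TO to 'Jan 14 00:05', a record stamped 00:06 is not printed. With FROM set to 'Jan 14 00', it reopens the range and is printed.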

derek.doerr@gmail.com 2006-01-19, 8:49 pm
Here's a sample of records from a log file:
Dec 23 10:18:46. 83|TUCPU=260|TKCPU=80|0|8002||GrammarManager::DoDelayThrow() - no delayed throw scheduled
Dec 23 10:18:46. 83|TUCPU=260|TKCPU=80|0|8002||VXI::DoInnerJump()
Dec 23 10:18:46. 83|TUCPU=260|TKCPU=80|0|8002||VXI::CollectPhase - (block)
Dec 23 10:18:46. 84|TUCPU=260|TKCPU=80|0|8002||VXI::block_element()
Dec 23 10:18:46. 84|TUCPU=260|TKCPU=80|0|8002||VXI::execute_content()
Dec 23 10:18:46. 84|TUCPU=260|TKCPU=80|0|8002||VXI::executable_element - (audio)
Dec 23 10:18:46. 84|TUCPU=260|TKCPU=80|0|8002||VXI::PlayPrompt()
Dec 23 10:18:46. 84|TUCPU=260|TKCPU=80|0|8002||PromptManager::from Queue() properties, bargein is ON
Dec 23 10:18:46. 84|TUCPU=260|TKCPU=80|0|5000||AVBpromptQueue() bargein = true
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|5000||Found AUDIO type to be: audio/x-wav
Dec 23 10:18:46.84|TUCPU=260|TKCPU=80|0|5000||Queuing AUDIO: /gmcars/ivr/audio/gm_cars_ivr_menu_000525.wav
Dec 23 10:18:46. 85|TUCPU=270|TKCPU=90|0|5000||AVBpromptQueue() - mime type: audio/x-wav
Dec 23 10:18:46. 85|TUCPU=270|TKCPU=90|0|5000||AVBPromptQueue: Fresh internet file Transcode
InFile(/voice1/vxmlcache/cache_sbinet/526/BAAESaOLg) MIME(audio/x-wav)
VisFile(/voice1/vxmlcache/cache_sbinet/526/BAAESaOLg.ulaw)
Dec 23 10:18:46. 86|TUCPU=270|TKCPU=90|0|8002||PromptManager::enabledSegmentInQueue = true
Dec 23 10:18:46. 86|TUCPU=270|TKCPU=90|0|8002||PromptManager::AddSegment() - no Play() because bargein = true
Dec 23 10:18:46. 86|TUCPU=270|TKCPU=90|0|8002||VXI::DoInnerJump()
Dec 23 10:18:46. 86|TUCPU=270|TKCPU=90|0|8002||VXI::CollectPhase - (object)
Dec 23 10:18:46. 86|TUCPU=270|TKCPU=90|0|8002||VXI::object_element()
Dec 23 10:18:46. 86|TUCPU=270|TKCPU=90|0|8002||VXI::queue_prompts()
Dec 23 10:18:46. 86|TUCPU=280|TKCPU=90|0|8002||PromptManager::PlayAll()
Dec 23 10:18:46. 86|TUCPU=280|TKCPU=90|0|5000||AVBpromptPlay(we can can not play))
Dec 23 10:18:46. 86|TUCPU=280|TKCPU=90|0|5000||AVBpromptWait(we can can play))
Dec 23 10:18:51. 37|TUCPU=280|TKCPU=90|0|8002||PromptManager::PlayAll() - enabledSegmentInQueue = false
Dec 23 10:18:51. 39|TUCPU=280|TKCPU=90|0|8002||VXI::object_element - done
Dec 23 10:18:51. 39|TUCPU=280|TKCPU=90|0|8002||VXI::execute_content()
Dec 23 10:18:51. 39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)
Dec 23 10:18:51. 39|TUCPU=290|TKCPU=90|0|8002||VXI::assign_element(name=retCode exp=f_Main_CallScript_transfercallscript.retVal)
Dec 23 10:18:51. 39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)
Dec 23 10:18:51. 39|TUCPU=290|TKCPU=90|0|8002||VXI::assign_element(name=retArgNo exp=f_Main_CallScript_transfercallscript.numRet)
Dec 23 10:18:51. 39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)
Dec 23 10:18:51. 39|TUCPU=290|TKCPU=90|0|8002||VXI::assign_element(name=ani exp=f_Main_CallScript_transfercallscript.ret1)
Dec 23 10:18:51. 39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)
Dec 23 10:18:51. 39|TUCPU=290|TKCPU=90|0|8002||VXI::assign_element(name=dnis exp=f_Main_CallScript_transfercallscript.ret2)
Dec 23 10:18:51. 39|TUCPU=290|TKCPU=90|0|8002||VXI::executable_element - (assign)

derek.doerr@gmail.com 2006-01-19, 8:49 pm
Ed,
Thanks for your reply. As you can tell, I'm a newbie at awk
programming so I've been studying your script & trying to really
understand how it works. I have a number of questions about it:
(1) match() returns the index position where a[1] starts in the string
"JanFeb...". Where is the index value assigned to a variable for
subsequent use? Is this the value of RSTART?
(2) Am I correct that the script is invoked, once for each record in
"file"; how does $1 get assigned a value? I assume that this is the
date/timestamp on the record.
(3) how would I pass values into the script to set the "start" value
and the duration used to calculate the "end" value?
Thanks so much for your help!
- Derek

derek.doerr@gmail.com 2006-01-19, 8:49 pm
One more question....
Your invocation of cvttime($1) only has one argument being passed to
cvttime when the method signature has two parameters. Is this an
omission?

Ed Morton 2006-01-20, 2:50 am
derek.doerr@gmail.com wrote:
> Ed,
>
> Thanks for your reply.
Please read http://cfaj.freeshell.org/google before posting again as
you're falling prey to google groups.
> As you can tell, I'm a newbie at awk
> programming so I've been studying your script & trying to really
> understand how it works. I have a number of questions about it:
>
> (1) match() returns the index position where a[1] starts in the string
> "JanFeb...". Where is the index value assigned to a variable for
> subsequent use? Is this the value of RSTART?
Yes.
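To see RSTART at work, here is cvttime's month lookup as a small sketch. The name month_num is hypothetical. match() returns the position and also sets RSTART, plus RLENGTH, the length of the match. When nothing matches, they are 0 and -1, and the %02d format turns the result into 00. The sketch assumes a capitalised three-letter month name, since a fragment like "an" would also match.

```shell
# month_num MON: two-digit month number for a three-letter month name, using
# the same match()/RSTART arithmetic as cvttime (hypothetical helper).
month_num() {
    awk -v mon="$1" 'BEGIN {
        # match() returns where mon starts in the string and sets RSTART to
        # the same value (0 if there is no match)
        match("JanFebMarAprMayJunJulAugSepOctNovDec", mon)
        # Jan starts at 1, Feb at 4, ... Dec at 34, so (RSTART+2)/3 is 1..12
        printf "%02d\n", (RSTART + 2) / 3
    }'
}
```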
> (2) Am I correct that the script is invoked, once for each record in
> "file";
No, the script gets invoked once for the entire file; its BEGIN block runs once before any input is read, and these 2 lines:
{now = cvttime($1)}
(now >= start) && (now <= end)
get executed once for every record (line by default) in the file.
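One way to check this is to count how often each part of a program runs. The name run_counts is hypothetical. The input has three lines, so with the default record separator (a newline) the main rule runs three times while BEGIN and END each run once.

```shell
# run_counts: print how many times BEGIN and the main rule ran.
# BEGIN runs once before any input is read, the main rule once per record,
# END once after the last record.
run_counts() {
    awk 'BEGIN { b++ } { m++ } END { print b, m }'
}
```

For example, `printf 'a\nb\nc\n' | run_counts` prints `1 3`.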
> how does $1 get assigned a value?
$1 is just one of the many variables awk populates for you as it reads
records.
> I assume that this is the
> date/timestamp on the record.
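It's the first field. Fields are separated by runs of blanks by default,
so on your data $1 alone would be just "Jan"; for $1 to hold the whole
date/timestamp, the script has to set FS = "|" (or be run with -F'|', as
in your nawk command).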
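The difference shows up on one of the sample records. In this minimal sketch (first_field is a hypothetical name), the default FS makes the first field just the month, while a "|" separator makes it the whole timestamp. Here, -F'|' and the -F'[|]' from the original nawk command behave the same.

```shell
# first_field [SEP]: print the first field of each input line, split on SEP
# if given, otherwise on awk's default (runs of blanks).
first_field() {
    if [ -n "$1" ]; then
        awk -F"$1" '{ print $1 }'
    else
        awk '{ print $1 }'
    fi
}
```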
> (3) how would I pass values into the script to set the "start" value
> and the duration used to calculate the "end" value?
See question 24 in the FAQ http://home.comcast.net/~j.p.h/cus-faq-2.html#24
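The usual tool for this is -v, which sets a variable before BEGIN runs. By contrast, a var=value operand after the program is only assigned when awk reaches it among the file operands, so it isn't visible in BEGIN. Here is a small sketch of both. The names window_bounds, v_timing, start, and mins are hypothetical.

```shell
# window_bounds START MINUTES: compute the window end in BEGIN from -v values
# (START in epoch seconds, as cvttime would return it).
window_bounds() {
    awk -v start="$1" -v mins="$2" 'BEGIN { end = start + mins * 60; print start, end }'
}

# v_timing: -v is set before BEGIN; the b=2 operand is assigned only when
# awk reaches it, after BEGIN and before reading stdin ("-").
v_timing() {
    echo x | awk -v a=1 'BEGIN { print "BEGIN: a=" a " b=[" b "]" } { print "main: a=" a " b=[" b "]" }' b=2 -
}
```

Applied to chkdate.awk, the BEGIN block could compute start = cvttime(from) and end = start + (mins * 60), and the script would be run as `gawk -v from="Jan 13 18:54:58.92" -v mins=30 -f chkdate.awk file`. For a job that looks at the previous 30 minutes, gawk's systime() (current epoch seconds) could supply the window end instead.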
>
> Thanks so much for your help!
> - Derek
>
You're welcome,
Ed.

Ed Morton 2006-01-20, 2:50 am
derek.doerr@gmail.com wrote:
> One more question....
Please read http://cfaj.freeshell.org/google before posting again as
you're falling prey to google groups.
> Your invocation of cvttime($1) only has one argument being passed to
> cvttime when the method signature has two parameters. Is this an
> omission?
>
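No, it's deliberate. awk has no way to declare local variables other than
as function parameters, so listing extra, unused parameters, separated from
the real ones by significant white space, is a common workaround: a
parameter the caller doesn't pass starts out empty and is local to the
function. In the version I first posted I should've also added "century"
to the parameter list to make it local.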
I recommend you buy the book "Effective Awk Programming" by Arnold
Robbins (http://www.oreilly.com/catalog/awkprog3/) if you plan to do
much awk programming.
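Here is a minimal sketch of the extra-parameter idiom for local variables. The names local_demo, twice, and twice_leaky are hypothetical. tmp is local in twice because it is listed as an extra parameter, while twice_leaky assigns the global tmp.

```shell
# local_demo: tmp listed after the real parameter is local to twice();
# twice_leaky() has no such parameter, so it overwrites the global tmp.
local_demo() {
    awk '
    function twice(x,    tmp) { tmp = x * 2; return tmp }
    function twice_leaky(x) { tmp = x * 2; return tmp }
    BEGIN {
        tmp = "global"
        r = twice(21); print r, tmp
        r = twice_leaky(21); print r, tmp
    }'
}
```

Running local_demo prints `42 global` and then `42 42`.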
Ed.

Kenny McCormack 2006-01-20, 8:13 am
In article <1137722178.942394.293550@o13g2000cwo.googlegroups.com>,
derek.doerr@gmail.com <derek.doerr@gmail.com> wrote:
>One more question....
>
>Your invocation of cvttime($1) only has one argument being passed to
>cvttime when the method signature has two parameters. Is this an
>omission?
What's a "method signature"??? Does it have something to do with checking
oneself in and out of a drug treatment facility?
In any case, I think it is irrelevant in the context of AWK programming.