Unix Programming - multi dimensional array in unix


John Smith

2006-03-03, 6:43 pm

Problem:
I have to parse the following *sample* input:

A 2006-03-02 22:59 +0000 testID 1.1 ScenarioCSGJVParser.java

M 2006-03-02 23:15 +0000 testID 1.2 ScenarioCSGJVParser.java

A 2006-02-27 19:41 +0000 testID 1.1 check_instrument.sh

M 2006-02-27 19:42 +0000 testID 1.2 check_instrument.sh

M 2006-02-27 22:01 +0000 testID 1.3 check_instrument.sh

M 2006-02-28 14:56 +0000 testID 1.4 check_instrument.sh

A 2006-02-28 14:19 +0000 testID 1.1 create_instruments.sh
M 2006-02-28 14:56 +0000 testID 1.2 create_instruments.sh
A 2006-02-27 22:04 +0000 testID 1.1 create_market_instrument.sh
A 2006-03-03 15:23 +0000 testID 1.1 daily_cvs_file_changes.sh

M 2006-03-03 16:10 +0000 testID 1.2 daily_cvs_file_changes.sh

M 2006-03-03 18:29 +0000 testID 1.3 daily_cvs_file_changes.sh

M 2006-03-03 18:50 +0000 testID 1.4 daily_cvs_file_changes.sh

M 2006-03-03 19:16 +0000 testID 1.5 daily_cvs_file_changes.sh

A 2006-01-11 22:16 +0000 testID 1.1 dummy.txt

M 2006-01-11 22:28 +0000 testID 1.2 dummy.txt

M 2006-01-12 20:03 +0000 testID 1.4 dummy.txt

A 2006-02-28 14:19 +0000 testID 1.1 exec_instrument.sh
A 2006-02-27 22:04 +0000 testID 1.1 exec_market_instrument.sh
A 2006-03-03 19:08 +0000 testID 1.1 test.sh

A 2006-01-11 22:33 +0000 testID 1.1 test2.pl

A 2006-03-01 18:10 +0000 testID 1.1 var_vol_scenario.java

A 2006-02-08 21:57 +0000 testID 1.1 check_kbmacros_reports.sh

A 2006-01-13 16:37 +0000 testID 1.1 dummylist.txt

M 2005-12-15 20:56 +0000 testID 1.2 hello.pl

A 2006-02-08 21:57 +0000 testID 1.1 scandiv.sh

A 2006-01-17 21:59 +0000 testID 1.1 scanrate.sh

A 2006-01-13 16:31 +0000 testID 1.1 scanvol.sh

M 2006-01-13 16:37 +0000 testID 1.2 scanvol.sh

M 2006-02-08 21:57 +0000 testID 1.3 scanvol.sh

A 2006-02-08 21:57 +0000 testID 1.1 run_kbmacros.sh

A 2006-02-08 21:57 +0000 testID 1.1 check_divs.sh

A 2006-01-17 21:59 +0000 testID 1.1 check_rate

A 2006-01-13 16:31 +0000 testID 1.1 check_rate_main.pl

M 2006-01-13 16:37 +0000 testID 1.2 check_rate_main.pl

A 2006-01-13 16:31 +0000 testID 1.1 check_vol

M 2006-01-13 16:37 +0000 testID 1.2 check_vol

A 2006-01-13 16:31 +0000 testID 1.1 check_vol_main.pl

M 2006-01-13 16:37 +0000 testID 1.2 check_vol_main.pl

A 2006-01-13 16:31 +0000 testID 1.1 check_vols.sh

M 2006-01-25 23:43 +0000 testID 1.2 check_vols.sh

A 2006-01-13 15:43 +0000 testID 1.1 check_rate.html

M 2006-01-13 15:45 +0000 testID 1.2 check_rate.html


For each unique filename, I need to know the filename (e.g.
check_rate.html) and its 2 most recent revisions (1.1 and 1.2).

My difficulty:
I can't think of a good way to store all the filenames I encounter
while I scan the input file. The input file varies with every run of
the script, e.g. I may get *only* version 1.1 of filename1 on day 1,
but versions 1.2, 1.3, and 1.4 of filename1 on day 2.

I also don't know how many unique filenames will be in the input file
each day.

The only algorithm I can think of is:



for each line
    pick up file name
    check this file name against list of diff-ed files
    if in list
        skip current line
    else # not in list
        update $SecondLastRevision and $LastRevision
    end if
end for
add filename to list of diff-ed files


Repeat the above procedure for every unique filename in the input file.

But this algorithm is very complicated and somewhat inefficient.
Is there a better algorithm?

Barry Margolin

2006-03-03, 8:47 pm

In article <1141419233.753777.291610@t39g2000cwt.googlegroups.com>,
"John Smith" <wleung7@gmail.com> wrote:

> For each unique filename, I need to know the filename (e.g.
> check_rate.html) and its 2 most recent revisions (1.1 and 1.2).
>
> My difficulty:
> I can't think of a good way to store all the filenames I encounter
> while I scan the input file. The input file varies with every run of
> the script, e.g. I may get *only* version 1.1 of filename1 on day 1,
> but versions 1.2, 1.3, and 1.4 of filename1 on day 2.
>
> I also don't know how many unique filenames will be in the input file
> each day.
>
> The only algorithm I can think of is:
>
> for each line
>     pick up file name
>     check this file name against list of diff-ed files
>     if in list
>         skip current line
>     else # not in list
>         update $SecondLastRevision and $LastRevision
>     end if
> end for
> add filename to list of diff-ed files


This should be pretty straightforward to implement using associative
arrays (aka hashes) in awk or perl. "check this file name against list"
means looking up the name in the hash.
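Barry's suggestion can be sketched in Ruby (the language of the last reply in this thread). This is a minimal sketch, not the poster's script: it assumes the layout of the sample above (revision in the next-to-last field, file name in the last field) and that each file's revisions appear in ascending order, and `last_two_revisions` is a hypothetical helper name. Because the hash is keyed by file name, it grows to however many files appear, and "is this file already in the list?" becomes a single lookup instead of a scan:

```ruby
# Sketch of the hash-based approach: one pass, no separate
# "list of diff-ed files" to search. Assumes the sample layout
# (revision in the next-to-last field, file name in the last field)
# and that each file's revisions appear in ascending order.

# Hypothetical helper: returns { file name => [newest, previous] }.
def last_two_revisions(lines)
  revisions = Hash.new { |hash, file| hash[file] = [] } # file => revisions seen
  lines.each do |line|
    fields = line.split
    next if fields.size < 2        # skip blank or malformed lines
    file, rev = fields[-1], fields[-2]
    revisions[file] << rev         # one hash lookup instead of a list scan
  end
  # Newest first; a file seen only once yields a one-element array.
  revisions.transform_values { |revs| revs.last(2).reverse }
end

sample = <<~LOG
  A 2006-01-13 15:43 +0000 testID 1.1 check_rate.html

  M 2006-01-13 15:45 +0000 testID 1.2 check_rate.html
  A 2006-03-03 19:08 +0000 testID 1.1 test.sh
LOG

last_two_revisions(sample.lines).sort.each do |file, revs|
  printf("%30s %s\n", file, revs.join(" "))
end
```

Each input line is read exactly once, so the repeated passes of the original algorithm are not needed.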

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
Pascal Bourguignon

2006-03-03, 8:47 pm

"John Smith" <wleung7@gmail.com> writes:

> Problem:
> I have to parse the following *sample* input:


awk '
/^$/ { next; }                        # skip blank lines
{ version = $6; name = $7;            # $6 is the revision, $7 the file name
  if (last[name] == "") {             # first revision seen for this file
    last[name] = version;
    beforelast[name] = -1;
  } else if (last[name] < version) {  # newer than the newest so far
    beforelast[name] = last[name];
    last[name] = version;
  } else if (beforelast[name] < version) {
    beforelast[name] = version;
  }
}
END {
  for (name in last) {
    if (beforelast[name] < 0) {       # only one revision seen
      printf("%30s %-5s\n", name, last[name]);
    } else {
      printf("%30s %-5s %-5s\n", name, last[name], beforelast[name]);
    }
  }
}
'<<EOF
A 2006-03-02 22:59 +0000 testID 1.1 ScenarioCSGJVParser.java

M 2006-03-02 23:15 +0000 testID 1.2 ScenarioCSGJVParser.java

A 2006-02-27 19:41 +0000 testID 1.1 check_instrument.sh

M 2006-02-27 19:42 +0000 testID 1.2 check_instrument.sh

M 2006-02-27 22:01 +0000 testID 1.3 check_instrument.sh

M 2006-02-28 14:56 +0000 testID 1.4 check_instrument.sh

A 2006-02-28 14:19 +0000 testID 1.1 create_instruments.sh
M 2006-02-28 14:56 +0000 testID 1.2 create_instruments.sh
A 2006-02-27 22:04 +0000 testID 1.1 create_market_instrument.sh
A 2006-03-03 15:23 +0000 testID 1.1 daily_cvs_file_changes.sh

M 2006-03-03 16:10 +0000 testID 1.2 daily_cvs_file_changes.sh

M 2006-03-03 18:29 +0000 testID 1.3 daily_cvs_file_changes.sh

M 2006-03-03 18:50 +0000 testID 1.4 daily_cvs_file_changes.sh

M 2006-03-03 19:16 +0000 testID 1.5 daily_cvs_file_changes.sh

A 2006-01-11 22:16 +0000 testID 1.1 dummy.txt

M 2006-01-11 22:28 +0000 testID 1.2 dummy.txt

M 2006-01-12 20:03 +0000 testID 1.4 dummy.txt

A 2006-02-28 14:19 +0000 testID 1.1 exec_instrument.sh
A 2006-02-27 22:04 +0000 testID 1.1 exec_market_instrument.sh
A 2006-03-03 19:08 +0000 testID 1.1 test.sh

A 2006-01-11 22:33 +0000 testID 1.1 test2.pl

A 2006-03-01 18:10 +0000 testID 1.1 var_vol_scenario.java

A 2006-02-08 21:57 +0000 testID 1.1 check_kbmacros_reports.sh

A 2006-01-13 16:37 +0000 testID 1.1 dummylist.txt

M 2005-12-15 20:56 +0000 testID 1.2 hello.pl

A 2006-02-08 21:57 +0000 testID 1.1 scandiv.sh

A 2006-01-17 21:59 +0000 testID 1.1 scanrate.sh

A 2006-01-13 16:31 +0000 testID 1.1 scanvol.sh

M 2006-01-13 16:37 +0000 testID 1.2 scanvol.sh

M 2006-02-08 21:57 +0000 testID 1.3 scanvol.sh

A 2006-02-08 21:57 +0000 testID 1.1 run_kbmacros.sh

A 2006-02-08 21:57 +0000 testID 1.1 check_divs.sh

A 2006-01-17 21:59 +0000 testID 1.1 check_rate

A 2006-01-13 16:31 +0000 testID 1.1 check_rate_main.pl

M 2006-01-13 16:37 +0000 testID 1.2 check_rate_main.pl

A 2006-01-13 16:31 +0000 testID 1.1 check_vol

M 2006-01-13 16:37 +0000 testID 1.2 check_vol

A 2006-01-13 16:31 +0000 testID 1.1 check_vol_main.pl

M 2006-01-13 16:37 +0000 testID 1.2 check_vol_main.pl

A 2006-01-13 16:31 +0000 testID 1.1 check_vols.sh

M 2006-01-25 23:43 +0000 testID 1.2 check_vols.sh

A 2006-01-13 15:43 +0000 testID 1.1 check_rate.html

M 2006-01-13 15:45 +0000 testID 1.2 check_rate.html
EOF

check_rate.html 1.2 1.1
run_kbmacros.sh 1.1
check_rate 1.1
check_divs.sh 1.1
scanvol.sh 1.3 1.2
check_vol_main.pl 1.2 1.1
check_vol 1.2 1.1
check_rate_main.pl 1.2 1.1
test.sh 1.1
dummy.txt 1.4 1.2
scandiv.sh 1.1
check_kbmacros_reports.sh 1.1
var_vol_scenario.java 1.1
daily_cvs_file_changes.sh 1.5 1.4
ScenarioCSGJVParser.java 1.2 1.1
scanrate.sh 1.1
hello.pl 1.2
dummylist.txt 1.1
exec_market_instrument.sh 1.1
create_market_instrument.sh 1.1
check_vols.sh 1.2 1.1
exec_instrument.sh 1.1
create_instruments.sh 1.2 1.1
test2.pl 1.1
check_instrument.sh 1.4 1.3
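A caveat on the comparisons `last[name]<version` and `beforelast[name]<version` above: awk compares two values numerically when both look like numbers and come from input fields, so a revision such as `1.10` would be treated as equal to `1.1` and older than `1.9`. The sample never goes past `.5`, so the output shown is unaffected. A minimal Ruby sketch of a component-wise comparison, assuming trunk revisions made of dot-separated integers (`revision_key` and `two_newest` are hypothetical names):

```ruby
# Sketch: rank dotted revision numbers component by component.
# Assumes revisions are dot-separated integers such as "1.9" or "1.10".

# Hypothetical helper: "1.10" => [1, 10]; arrays compare element by element.
def revision_key(rev)
  rev.split(".").map(&:to_i)
end

# Hypothetical helper: the two newest distinct revisions, newest first.
def two_newest(revisions)
  revisions.uniq.sort_by { |rev| revision_key(rev) }.reverse.first(2)
end

revs = %w[1.1 1.2 1.9 1.10]
p revs.max_by(&:to_f)   # numeric comparison: prints "1.9"
p two_newest(revs)      # component-wise: prints ["1.10", "1.9"]
```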

--
__Pascal Bourguignon__ http://www.informatimago.com/

"You question the worthiness of my code? I should kill you where you
stand!"
William James

2006-03-04, 7:47 am

John Smith wrote:
> Problem:
> I have to parse the following *sample* input:
> For each unique filename, I need to know the filename (e.g.
> check_rate.html) and its 2 most recent revisions (1.1 and 1.2).
>
> My difficulty:
> I can't think of a good way to store all the filenames I encounter
> while I scan the input file. The input file varies with every run of
> the script, e.g. I may get *only* version 1.1 of filename1 on day 1,
> but versions 1.2, 1.3, and 1.4 of filename1 on day 2.
>
> I also don't know how many unique filenames will be in the input file
> each day.
>
> The only algorithm I can think of is:
>
> for each line
>     pick up file name
>     check this file name against list of diff-ed files
>     if in list
>         skip current line
>     else # not in list
>         update $SecondLastRevision and $LastRevision
>     end if
> end for
> add filename to list of diff-ed files
>
> Repeat the above procedure for every unique filename in the input file.
>
> But this algorithm is very complicated and somewhat inefficient.
> Is there a better algorithm?


The awk way. (Assumes that each file's revisions appear in ascending order.)

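NF {    # non-blank line: $NF is the file name, $(NF-1) its revision
    next_to_last[ $NF ] = last[ $NF ]
    last[ $NF ] = $(NF-1)
}
END {
    for ( file in last )
    { s = sprintf( "%30s %-4s %s", file, last[file], next_to_last[file] )
      sub( / +$/, "", s )    # drop trailing blanks if there is no older revision
      print s
    }
}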

William James

2006-03-04, 7:47 am

John Smith wrote:
> Problem:
> I have to parse the following *sample* input:
> For each unique filename, I need to know the filename (e.g.
> check_rate.html) and its 2 most recent revisions (1.1 and 1.2).


The Ruby way. (Versions can be in any order, and the output
is sorted.)

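ary = Hash.new { |hash, file| hash[file] = [] }  # file name => revisions seen
ARGF.each_line { |line|
  next if line.strip.empty?              # skip blank lines
  fields = line.split
  ary[ fields[-1] ] << fields[-2]        # last field: file name; next-to-last: revision
}
ary.sort.each { |file, versions|
  # sort revisions by their numeric components so that 1.10 ranks above 1.9
  newest = versions.sort_by { |v| v.split(".").map(&:to_i) }.reverse[0, 2]
  printf "%30s %s\n", file, newest.join(" ")
}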


Copyright 2003 - 2008 webservertalk.com