Home > Archive > Unix Programming > March 2006 > multi dimensional array in unix
multi dimensional array in unix
| John Smith 2006-03-03, 6:43 pm |
| Problem:
I have to parse the following *sample* input:
A 2006-03-02 22:59 +0000 testID 1.1 ScenarioCSGJVParser.java
M 2006-03-02 23:15 +0000 testID 1.2 ScenarioCSGJVParser.java
A 2006-02-27 19:41 +0000 testID 1.1 check_instrument.sh
M 2006-02-27 19:42 +0000 testID 1.2 check_instrument.sh
M 2006-02-27 22:01 +0000 testID 1.3 check_instrument.sh
M 2006-02-28 14:56 +0000 testID 1.4 check_instrument.sh
A 2006-02-28 14:19 +0000 testID 1.1 create_instruments.sh
M 2006-02-28 14:56 +0000 testID 1.2 create_instruments.sh
A 2006-02-27 22:04 +0000 testID 1.1 create_market_instrument.sh
A 2006-03-03 15:23 +0000 testID 1.1 daily_cvs_file_changes.sh
M 2006-03-03 16:10 +0000 testID 1.2 daily_cvs_file_changes.sh
M 2006-03-03 18:29 +0000 testID 1.3 daily_cvs_file_changes.sh
M 2006-03-03 18:50 +0000 testID 1.4 daily_cvs_file_changes.sh
M 2006-03-03 19:16 +0000 testID 1.5 daily_cvs_file_changes.sh
A 2006-01-11 22:16 +0000 testID 1.1 dummy.txt
M 2006-01-11 22:28 +0000 testID 1.2 dummy.txt
M 2006-01-12 20:03 +0000 testID 1.4 dummy.txt
A 2006-02-28 14:19 +0000 testID 1.1 exec_instrument.sh
A 2006-02-27 22:04 +0000 testID 1.1 exec_market_instrument.sh
A 2006-03-03 19:08 +0000 testID 1.1 test.sh
A 2006-01-11 22:33 +0000 testID 1.1 test2.pl
A 2006-03-01 18:10 +0000 testID 1.1 var_vol_scenario.java
A 2006-02-08 21:57 +0000 testID 1.1 check_kbmacros_reports.sh
A 2006-01-13 16:37 +0000 testID 1.1 dummylist.txt
M 2005-12-15 20:56 +0000 testID 1.2 hello.pl
A 2006-02-08 21:57 +0000 testID 1.1 scandiv.sh
A 2006-01-17 21:59 +0000 testID 1.1 scanrate.sh
A 2006-01-13 16:31 +0000 testID 1.1 scanvol.sh
M 2006-01-13 16:37 +0000 testID 1.2 scanvol.sh
M 2006-02-08 21:57 +0000 testID 1.3 scanvol.sh
A 2006-02-08 21:57 +0000 testID 1.1 run_kbmacros.sh
A 2006-02-08 21:57 +0000 testID 1.1 check_divs.sh
A 2006-01-17 21:59 +0000 testID 1.1 check_rate
A 2006-01-13 16:31 +0000 testID 1.1 check_rate_main.pl
M 2006-01-13 16:37 +0000 testID 1.2 check_rate_main.pl
A 2006-01-13 16:31 +0000 testID 1.1 check_vol
M 2006-01-13 16:37 +0000 testID 1.2 check_vol
A 2006-01-13 16:31 +0000 testID 1.1 check_vol_main.pl
M 2006-01-13 16:37 +0000 testID 1.2 check_vol_main.pl
A 2006-01-13 16:31 +0000 testID 1.1 check_vols.sh
M 2006-01-25 23:43 +0000 testID 1.2 check_vols.sh
A 2006-01-13 15:43 +0000 testID 1.1 check_rate.html
M 2006-01-13 15:45 +0000 testID 1.2 check_rate.html

For each unique filename, I have to know the filename (e.g.
check_rate.html) and its 2 most recent revisions (here, 1.1 and 1.2).

My difficulty:
I can't think of a good way to store all the filenames I encounter
while I scan the input file. The input file varies with every run of
the script, e.g. I may get *only* version 1.1 of filename1 on day 1,
but versions 1.2, 1.3, and 1.4 of filename1 on day 2.

I also don't know how many unique filenames will be in the input file
each day.

The only algorithm I can think of is:

for each line
    pick up file name
    check this file name against list of diff-ed files
    if in list
        skip current line
    else # not in list
        update $SecondLastRevision and $LastRevision
    end if
end for
add filename to list of diff-ed files

Repeat the above procedure for every unique filename in the input file.

But this algorithm is very complicated and somewhat inefficient.
Is there a better algorithm?
| |
| Barry Margolin 2006-03-03, 8:47 pm |
| In article <1141419233.753777.291610@t39g2000cwt.googlegroups.com>,
"John Smith" <wleung7@gmail.com> wrote:
> For each unique filename, I have to know the filename (e.g.
> check_rate.html) and its 2 most recent revisions (here, 1.1 and 1.2).
>
> My difficulty:
> I can't think of a good way to store all the filenames I encounter
> while I scan the input file. The input file varies with every run of
> the script, e.g. I may get *only* version 1.1 of filename1 on day 1,
> but versions 1.2, 1.3, and 1.4 of filename1 on day 2.
>
> I also don't know how many unique filenames will be in the input file
> each day.
>
> The only algorithm I can think of is:
>
> for each line
>     pick up file name
>     check this file name against list of diff-ed files
>     if in list
>         skip current line
>     else # not in list
>         update $SecondLastRevision and $LastRevision
>     end if
> end for
> add filename to list of diff-ed files

This should be pretty straightforward to implement using associative
arrays (aka hashes) in awk or Perl. "Check this file name against list"
then simply means looking up the name in the hash.
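
A minimal sketch of that idea in Ruby, assuming whitespace-separated fields with the revision second to last and the file name last, and assuming each file's revisions are listed in ascending order as in the sample. The helper name revisions_by_file and the shortened sample are made up for illustration.

```ruby
# Hypothetical helper: group revisions by file name in a single pass.
# Assumes the revision is the second-to-last field and the name the last.
def revisions_by_file(lines)
  seen = Hash.new { |hash, name| hash[name] = [] }  # unknown name -> new empty list
  lines.each do |line|
    fields = line.split
    next if fields.size < 2          # skip blank or malformed lines
    seen[fields[-1]] << fields[-2]   # hash lookup replaces scanning a list
  end
  seen
end

# Shortened sample, taken from the input in the question.
sample = [
  "A 2006-01-13 16:31 +0000 testID 1.1 scanvol.sh",
  "M 2006-01-13 16:37 +0000 testID 1.2 scanvol.sh",
  "M 2006-02-08 21:57 +0000 testID 1.3 scanvol.sh",
  "A 2006-03-03 19:08 +0000 testID 1.1 test.sh",
]

revisions_by_file(sample).each do |name, revs|
  # last(2) keeps the two most recently listed revisions; reverse puts the newest first
  puts "#{name} #{revs.last(2).reverse.join(' ')}"
end
```

Because the hash creates an entry the first time a name is looked up, there is no need to know in advance how many unique filenames the input contains.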
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
| |
| Pascal Bourguignon 2006-03-03, 8:47 pm |
| "John Smith" <wleung7@gmail.com> writes:
> Problem:
> I have to parse the following *sample* input:
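awk '
# Track the two highest revisions seen for each file name.
# Caveat: revisions are compared as plain numbers, so 1.10 would rank below 1.9.
/^$/{next;}                            # skip blank lines
{ version=$6;name=$7;
  if(last[name]==""){                  # first time this file is seen
    last[name]=version;
    beforelast[name]=-1;               # -1 marks "no second revision yet"
  }else if(last[name]<version){        # new highest: shift the old one down
    beforelast[name]=last[name];
    last[name]=version;
  }else if(version<last[name] && beforelast[name]<version){
    beforelast[name]=version;          # new second highest (repeats of the highest are ignored)
  }
}
END{
  for(name in last){                   # iteration order is unspecified
    if(beforelast[name]<0){
      printf("%30s %-5s\n",name,last[name]);
    }else{
      printf("%30s %-5s %-5s\n",name,last[name],beforelast[name]);
    }
  }
}
'<<EOF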
A 2006-03-02 22:59 +0000 testID 1.1 ScenarioCSGJVParser.java
M 2006-03-02 23:15 +0000 testID 1.2 ScenarioCSGJVParser.java
A 2006-02-27 19:41 +0000 testID 1.1 check_instrument.sh
M 2006-02-27 19:42 +0000 testID 1.2 check_instrument.sh
M 2006-02-27 22:01 +0000 testID 1.3 check_instrument.sh
M 2006-02-28 14:56 +0000 testID 1.4 check_instrument.sh
A 2006-02-28 14:19 +0000 testID 1.1 create_instruments.sh
M 2006-02-28 14:56 +0000 testID 1.2 create_instruments.sh
A 2006-02-27 22:04 +0000 testID 1.1 create_market_instrument.sh
A 2006-03-03 15:23 +0000 testID 1.1 daily_cvs_file_changes.sh
M 2006-03-03 16:10 +0000 testID 1.2 daily_cvs_file_changes.sh
M 2006-03-03 18:29 +0000 testID 1.3 daily_cvs_file_changes.sh
M 2006-03-03 18:50 +0000 testID 1.4 daily_cvs_file_changes.sh
M 2006-03-03 19:16 +0000 testID 1.5 daily_cvs_file_changes.sh
A 2006-01-11 22:16 +0000 testID 1.1 dummy.txt
M 2006-01-11 22:28 +0000 testID 1.2 dummy.txt
M 2006-01-12 20:03 +0000 testID 1.4 dummy.txt
A 2006-02-28 14:19 +0000 testID 1.1 exec_instrument.sh
A 2006-02-27 22:04 +0000 testID 1.1 exec_market_instrument.sh
A 2006-03-03 19:08 +0000 testID 1.1 test.sh
A 2006-01-11 22:33 +0000 testID 1.1 test2.pl
A 2006-03-01 18:10 +0000 testID 1.1 var_vol_scenario.java
A 2006-02-08 21:57 +0000 testID 1.1 check_kbmacros_reports.sh
A 2006-01-13 16:37 +0000 testID 1.1 dummylist.txt
M 2005-12-15 20:56 +0000 testID 1.2 hello.pl
A 2006-02-08 21:57 +0000 testID 1.1 scandiv.sh
A 2006-01-17 21:59 +0000 testID 1.1 scanrate.sh
A 2006-01-13 16:31 +0000 testID 1.1 scanvol.sh
M 2006-01-13 16:37 +0000 testID 1.2 scanvol.sh
M 2006-02-08 21:57 +0000 testID 1.3 scanvol.sh
A 2006-02-08 21:57 +0000 testID 1.1 run_kbmacros.sh
A 2006-02-08 21:57 +0000 testID 1.1 check_divs.sh
A 2006-01-17 21:59 +0000 testID 1.1 check_rate
A 2006-01-13 16:31 +0000 testID 1.1 check_rate_main.pl
M 2006-01-13 16:37 +0000 testID 1.2 check_rate_main.pl
A 2006-01-13 16:31 +0000 testID 1.1 check_vol
M 2006-01-13 16:37 +0000 testID 1.2 check_vol
A 2006-01-13 16:31 +0000 testID 1.1 check_vol_main.pl
M 2006-01-13 16:37 +0000 testID 1.2 check_vol_main.pl
A 2006-01-13 16:31 +0000 testID 1.1 check_vols.sh
M 2006-01-25 23:43 +0000 testID 1.2 check_vols.sh
A 2006-01-13 15:43 +0000 testID 1.1 check_rate.html
M 2006-01-13 15:45 +0000 testID 1.2 check_rate.html
EOF
check_rate.html 1.2 1.1
run_kbmacros.sh 1.1
check_rate 1.1
check_divs.sh 1.1
scanvol.sh 1.3 1.2
check_vol_main.pl 1.2 1.1
check_vol 1.2 1.1
check_rate_main.pl 1.2 1.1
test.sh 1.1
dummy.txt 1.4 1.2
scandiv.sh 1.1
check_kbmacros_reports.sh 1.1
var_vol_scenario.java 1.1
daily_cvs_file_changes.sh 1.5 1.4
ScenarioCSGJVParser.java 1.2 1.1
scanrate.sh 1.1
hello.pl 1.2
dummylist.txt 1.1
exec_market_instrument.sh 1.1
create_market_instrument.sh 1.1
check_vols.sh 1.2 1.1
exec_instrument.sh 1.1
create_instruments.sh 1.2 1.1
test2.pl 1.1
check_instrument.sh 1.4 1.3
--
__Pascal Bourguignon__ http://www.informatimago.com/
"You question the worthiness of my code? I should kill you where you
stand!"
| |
| William James 2006-03-04, 7:47 am |
| John Smith wrote:
> For each unique filename, I have to know the filename (e.g.
> check_rate.html) and its 2 most recent revisions (here, 1.1 and 1.2).
>
> My difficulty:
> I can't think of a good way to store all the filenames I encounter
> while I scan the input file. The input file varies with every run of
> the script, e.g. I may get *only* version 1.1 of filename1 on day 1,
> but versions 1.2, 1.3, and 1.4 of filename1 on day 2.
>
> I also don't know how many unique filenames will be in the input file
> each day.
>
> The only algorithm I can think of is:
>
> for each line
>     pick up file name
>     check this file name against list of diff-ed files
>     if in list
>         skip current line
>     else # not in list
>         update $SecondLastRevision and $LastRevision
>     end if
> end for
> add filename to list of diff-ed files
>
> Repeat the above procedure for every unique filename in the input file.
>
> But this algorithm is very complicated and somewhat inefficient.
> Is there a better algorithm?

The awk way. (Assumes that versions are in ascending order.)
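# Keep the last two revisions seen per file; relies on ascending input order.
NF {                                    # skip blank lines
    next_to_last[ $NF ] = last[ $NF ]   # previous revision shifts down
    last[ $NF ] = $(NF-1)               # revision is the field before the name
}
END {
    for ( file in last ) {              # iteration order is unspecified
        s = sprintf( "%30s %-4s %s", file, last[file], next_to_last[file] )
        sub( / +$/, "", s )             # trim padding when there is only one revision
        print s
    }
}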
| |
| William James 2006-03-04, 7:47 am |
| John Smith wrote:
> For each unique filename, I have to know the filename (e.g.
> check_rate.html) and its 2 most recent revisions (here, 1.1 and 1.2).

The Ruby way. (Versions can be in any order, and the output
is sorted.)
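# Map each file name to the list of revisions seen for it.
# The block returns a fresh [] for unknown names without storing it;
# <<= appends to that list and assigns the result back under the name.
ary = Hash.new { [] }
ARGF.each_line { |line|
  next if "\n" == line                  # skip blank lines
  fields = line.split
  ary[ fields[-1] ] <<= fields[-2]      # last field: name, one before: revision
}
# Print files alphabetically with their two highest revisions.
# Sort by numeric components so that 1.10 ranks above 1.9.
ary.sort.each {|file,versions|
  newest = versions.sort_by { |v| v.split(".").map { |n| n.to_i } }.reverse[0,2]
  printf "%30s %s\n", file, newest.join(" ")
}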
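
One detail matters when ranking CVS-style revisions: compared as plain strings (or as decimal numbers), 1.10 ranks below 1.9. A minimal sketch of ordering by numeric components follows; the helper names revision_key and two_latest are made up for illustration, and the input is assumed to be dotted revisions made of digits only.

```ruby
# Hypothetical helper: turn "1.10" into [1, 10] so revisions compare
# component by component instead of character by character.
def revision_key(revision)
  revision.split(".").map(&:to_i)
end

# Return up to two highest revisions, newest first, regardless of input order.
# uniq drops repeated revisions so the same one is not reported twice.
def two_latest(revisions)
  revisions.uniq.sort_by { |rev| revision_key(rev) }.reverse.first(2)
end

p two_latest(["1.9", "1.2", "1.10"])             # prints ["1.10", "1.9"]
p ["1.9", "1.2", "1.10"].sort.reverse.first(2)   # plain string sort prints ["1.9", "1.2"]
```

Ruby compares arrays element by element, so the integer arrays returned by revision_key can be used directly as sort keys.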