Ed Morton 2005-11-29, 5:58 pm
Meghavvarnam wrote:
<snip>
> I completely agree with you. Here is what the modified script looks
> like...
>
> gawk 'NR==FNR{strings[$0]++;next} {
> for (string in strings) {
> if (index($0,">"string"<") || index($0,"\""string"\"") ||
> index($0,">"string"\n")) {
> usedStrings[string]++
> delete strings[string] # for efficiency
> } # if
> } # for loop
> }
> END {
> for (string in usedStrings)
> print string
> }' allStrings.txt htm/*.htm > usedStringsfile
<snip>
> Given that the script is saved in a file, it would help if you can tell
> me the correct way to run it from the command line.
>
> We need to get this working.. Help please !
OK, let's just focus on one version for now. If you have the above in a
file named "listused" and it's executable, then just execute it as
/path/listused, as you appear to have been doing. So there are really only
three ways usedStringsfile could end up empty:
1) allStrings.txt is empty, or
2) There are no files matching htm/*.htm, or
3) None of the files that match htm/*.htm contain any of the strings in
allStrings.txt
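
The first two causes can be ruled out from the shell before touching the awk code at all. Here's a rough sketch, nothing more: check_inputs is a made-up helper name, and it assumes you pass it the strings file and the directory holding the .htm files.

```shell
# Hypothetical helper: report whether the two inputs look usable.
# Usage: check_inputs STRINGS_FILE HTM_DIR
check_inputs() {
    strings_file=$1
    htm_dir=$2

    # Cause 1: the strings file is missing or empty.
    if [ ! -s "$strings_file" ]; then
        echo "strings file missing or empty: $strings_file"
        return 1
    fi

    # Cause 2: the glob matches nothing. An unmatched glob is normally
    # left as a literal pattern, so test that the first word exists.
    set -- "$htm_dir"/*.htm
    if [ ! -e "$1" ]; then
        echo "no files match $htm_dir/*.htm"
        return 2
    fi

    echo "inputs look usable: $# htm file(s) found"
}
```

Run from the directory you normally start the script in, that would be "check_inputs allStrings.txt htm". If it reports that the inputs look usable, only cause 3 is left.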
So, let's instrument the program for debugging:
gawk '{printf "Working on file %s\n",FILENAME}
NR==FNR{strings[$0]++;printf "Added string %s\n",$0;next}
{
    for (string in strings) {
        printf "Searching for string \"%s\" in line \"%s\"\n",string,$0
        # awk only continues a line after "||", so the operator
        # must end the line rather than start the next one
        if (index($0,">"string"<") || index($0,"\""string"\"") ||
            index($0,">"string"\n")) {
            printf "Found string \"%s\" in line \"%s\"\n",string,$0
            usedStrings[string]++
            delete strings[string] # for efficiency
        } # if
    } # for loop
}
END {
    for (string in usedStrings)
        print string
}' allStrings.txt htm/*.htm > usedStringsfile
Then run that on a small sample input and post the result.
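
Something along these lines would do as a small sample. It's only a sketch: the file names and contents are invented, it uses plain awk since nothing here is gawk-specific, and it leaves out the ">"string"\n" test because with the default record separator a line never contains a newline.

```shell
# Build a throwaway sample in a temporary directory (all names are made up).
tmp=$(mktemp -d)
mkdir "$tmp/htm"
printf '%s\n' 'Hello' 'World' 'Unused' > "$tmp/allStrings.txt"
printf '%s\n' '<b>Hello</b>' '<a title="World">link</a>' > "$tmp/htm/sample.htm"

# Run the two index() tests that can match, without the trace output.
# The for-in order is unspecified, so sort the result.
used=$(awk 'NR==FNR { strings[$0]++; next }
{
    for (string in strings) {
        if (index($0, ">" string "<") || index($0, "\"" string "\"")) {
            usedStrings[string]++
            delete strings[string]
        }
    }
}
END {
    for (string in usedStrings)
        print string
}' "$tmp/allStrings.txt" "$tmp/htm/sample.htm" | sort)

echo "$used"
rm -rf "$tmp"
```

With that sample, "Hello" matches the >string< form and "World" matches the quoted form, while "Unused" should not be listed. If that works but your real data gives nothing, the difference is in the data.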
Ed.