01-18-06 10:55 PM
I need a script to automatically modify PostScript files. The
PostScript file has patterns like the following:
...
TR12 setfont (Page 1 of 4 ) 2 0 12 515 755 st
0 1 550 539 575 514 box
( ) 0 1 12 552 524.5 st
1 5 0 537 526 tr
1 2 180 161 324 sq <--- Pattern (1)
1 2 90 71 330 tr <--- Pattern (1)
1 2 180 196 337 tr <--- Pattern (1)
454 320 446 320 dl
...
TR12 setfont (Page 2 of 4 ) 2 0 12 515 755 st
1 5 0 161 324 tr <--- Pattern (2)
1 5 0 71 330 tr <--- Pattern (2)
1 5 0 196 337 tr <--- Pattern (2)
1 5 45 193 346 sq
...
Pattern (1) occurs only on Page 1; its columns 1-3 and 6 can be
anything.
Pattern (2) occurs only on Page 2; its columns 1-3 and 6 are always the
same, like "1 5 0 x y tr".
What I need is:
1) Find Pattern (2) first with
/^1 5 0 [1-9][0-9]*[0-9]* [1-9][0-9]*[0-9]* tr$/ and save columns 4 and
5 as a string (pattern x).
2) Then search Page 1 for that string (pattern x) and delete the
matching lines.
3) Delete all the lines matched by Pattern (2) on Page 2.
What's the best way to do this?
Thanks for any suggestions.
Scott
01-19-06 01:49 AM
I'd use awk to solve that.
If I understand your task correctly you'll need two-pass processing, so
call your program with the data file named twice, e.g.
awk -f yourprog.awk yourdata.ps yourdata.ps
Your awk program needs rules like the following ones (untested):
# Identify the page and memorize the state
/TR12 setfont \(Page / { onPage1 = 0; onPage2 = 0 }
/TR12 setfont \(Page 1 / { onPage1 = 1; onPage2 = 0 }
/TR12 setfont \(Page 2 / { onPage1 = 0; onPage2 = 1 }
# In the first pass find the pattern only on page 2 and store data for the second pass
(NR == FNR) && onPage2 && /^1 5 0 [1-9][0-9]*[0-9]*/ { store[$4,$5] }
# In the second pass suppress output of stored patterns on pages 1 and 2
(NR != FNR) && (onPage1 || onPage2) && (($4,$5) in store) { next }
# Print all the rest in the second pass
NR != FNR
The complete awk program is the six rules above (the "#" lines are
comments), and, as I said, untested.
It might already work as you expect, but if not it might at least give you
some hints on how to approach this type of problem using awk.
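For comparison, here is a minimal Python sketch of the same two-pass idea, covering all three steps of the original request. It is only an illustration: the function name `strip_duplicates` is made up, the lines are assumed to be held in memory without trailing newlines, and the sample data is a shortened version of the excerpt in the question.

```python
import re

PAGE_RE = re.compile(r"\(Page (\d+) of ")
PATTERN2_RE = re.compile(r"^1 5 0 ([1-9][0-9]*) ([1-9][0-9]*) tr$")


def strip_duplicates(lines):
    """Two passes over the same lines: collect keys on page 2, then filter."""
    # Pass 1: remember columns 4 and 5 of every Pattern (2) line on page 2.
    keys = set()
    page = 0
    for line in lines:
        m = PAGE_RE.search(line)
        if m:
            page = int(m.group(1))
        m = PATTERN2_RE.match(line)
        if page == 2 and m:
            keys.add((m.group(1), m.group(2)))

    # Pass 2: on pages 1 and 2, drop lines whose columns 4 and 5 were stored.
    kept = []
    page = 0  # reset the page state before the second pass
    for line in lines:
        m = PAGE_RE.search(line)
        if m:
            page = int(m.group(1))
        fields = line.split()
        if page in (1, 2) and len(fields) >= 5 and (fields[3], fields[4]) in keys:
            continue
        kept.append(line)
    return kept


# Shortened sample based on the excerpt in the question.
sample = [
    "TR12 setfont (Page 1 of 4 ) 2 0 12 515 755 st",
    "1 5 0 537 526 tr",
    "1 2 180 161 324 sq",
    "454 320 446 320 dl",
    "TR12 setfont (Page 2 of 4 ) 2 0 12 515 755 st",
    "1 5 0 161 324 tr",
    "1 5 45 193 346 sq",
]
result = strip_duplicates(sample)
```

Like the awk rules, this compares only columns 4 and 5 in the second pass, so any line on page 1 or 2 that shares those two values is dropped, whatever its other columns are.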
Hope that helps.
Janis
01-19-06 07:57 AM
It can be done with both sed and awk. Here is a 2-pass sed solution that
uses bash process substitution (you may need to make some adjustments to
fit your situation).
$ sed -f <(sed -n -e '/(Page 2/,/(Page 3/{s:^1 5 0 \(.*\) tr.*:/\1/d:p' \
    -e '}' pfile) pfile
TR12 setfont (Page 1 of 4 ) 2 0 12 515 755 st
0 1 550 539 575 514 box
( ) 0 1 12 552 524.5 st
1 5 0 537 526 tr
454 320 446 320 dl
...
TR12 setfont (Page 2 of 4 ) 2 0 12 515 755 st
1 5 45 193 346 sq
...
The logic is straightforward. First, make a sed script from the input
file:
/161 324/d
/71 330/d
/196 337/d
Then, with that script, we can delete all lines (on both page 1 and
page 2) that contain pattern (2). To get a more accurate result, try to
generate a script like this yourself:
/^\(.[^ ]* \)\{3\}161 324 /d
/^\(.[^ ]* \)\{3\}71 330 /d
/^\(.[^ ]* \)\{3\}196 337 /d
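To show why the anchored form is more accurate, here is a small Python sketch, not part of the sed solution itself. The helper names `anchored_patterns` and `is_deleted` are made up, and the line containing "171 330" is an invented example of a false positive rather than a line from the OP's file.

```python
import re


def anchored_patterns(keys):
    """Build one regex per key that only matches the key in columns 4 and 5."""
    # Three leading fields of non-space characters, then the key, then a space.
    return [re.compile(r"^(?:[^ ]+ ){3}" + re.escape(key) + " ") for key in keys]


patterns = anchored_patterns(["161 324", "71 330", "196 337"])


def is_deleted(line):
    """True if any anchored pattern matches the line."""
    return any(p.search(line) for p in patterns)


# Invented line: "71 330" appears only as part of "171 330" in columns 3-4.
tricky = "0 1 171 330 575 514 box"
loose_hit = re.search("71 330", tricky) is not None   # unanchored /71 330/
strict_hit = is_deleted(tricky)                        # anchored to columns 4-5
```

Here the unanchored search hits the invented line while the anchored pattern leaves it alone, and a real Pattern (1) line such as "1 2 90 71 330 tr" is still matched.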
--
Regards,
hq00e
01-19-06 07:57 AM
Using Ruby:
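a = [['0']]   # groups of lines; the first element of each group is its page number
while gets
  # a page header starts a new group
  a << [ $1 ] if $_ =~ /^TR12 setfont \(Page (\d) of /
  a.last << $_
end
a.assoc('2').reject!{ |s|
  if md = /^1 5 0( \d+ \d+ )/.match( s )
    # drop the page 1 lines whose columns 4 and 5 match
    a.assoc('1').reject!{ |x| x =~ /^\d+ \d+ \d+#{md[1]}/ }
    true   # always drop the matched page 2 line (reject! returns nil if nothing was removed)
  end
}
a.each{ |x| puts x[1..-1] }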
01-19-06 01:11 PM
Try this:
$ cat file
...
TR12 setfont (Page 1 of 4 ) 2 0 12 515 755 st
0 1 550 539 575 514 box
( ) 0 1 12 552 524.5 st
1 5 0 537 526 tr
1 2 180 161 324 sq
1 2 90 71 330 tr
1 2 180 196 337 tr
454 320 446 320 dl
...
TR12 setfont (Page 2 of 4 ) 2 0 12 515 755 st
1 5 0 161 324 tr
1 5 0 71 330 tr
1 5 0 196 337 tr
1 5 45 193 346 sq
...
$ cat rmvLines.awk
BEGIN{ ARGV[ARGC++] = ARGV[1]; phase = 0 }
/Page 1/ && (phase == 0) { phase = 1 } # first pass, page 1
/Page 2/ && (phase == 1) { phase = 2 } # first pass, page 2
/Page 3/ && (phase == 2) { phase = 3 } # first pass, page 3+
/Page 1/ && (phase != 1) { phase = 4 } # second pass, page 1
/Page 2/ && (phase == 4) { phase = 5 } # second pass, page 2
/Page 3/ && (phase == 5) { phase = 6 } # second pass, page 3+
{ key = $4 " " $5 }
phase == 2 && /^1 5 0 [1-9][0-9]* [1-9][0-9]* tr$/ { keys[key] }
phase ~ /4|5/ && !(key in keys) { print }
phase ~ /0|6/ { print }
$ awk -f rmvLines.awk file
...
TR12 setfont (Page 1 of 4 ) 2 0 12 515 755 st
0 1 550 539 575 514 box
( ) 0 1 12 552 524.5 st
1 5 0 537 526 tr
454 320 446 320 dl
...
TR12 setfont (Page 2 of 4 ) 2 0 12 515 755 st
1 5 45 193 346 sq
...
Regards,
Ed.
01-19-06 11:24 PM
zcui@yahoo.com wrote:
> What I need is
> 1) Find the Pattern (2) first by
> /^1 5 0 [1-9][0-9]*[0-9]* [1-9][0-9]*[0-9]* tr$/, save the column 4
> and 5 as a string (pattern x).
This is the most difficult part. To get the 3 lines that follow
'TR12 setfont...' and then slice out 'x' and 'y' (columns 4 and 5),
assuming exactly three Pattern (2) lines directly follow the page header:
sed -n -e '/^TR12 setfont (Page 2 of 4/ {n; N; N; p; q}' \
| awk '{print $4, $5}' > x_y
> 2) Then, search the string (pattern x) and delete the line in Page 1.
Something like
sed '/^TR12 setfont (Page 1 of 4/,/Page 2/ { /.../d; /.../d; /.../d;}'
--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
01-20-06 11:03 PM
Thank you all for the help.
Since I have to modify the data for several other patterns
besides patterns 1 and 2, I finally wrote a Perl program to do the
job.
I'll take your suggestions and try sed and awk later when I have time.
Thanks again.
Scott
01-20-06 11:03 PM
Would you mind posting the Perl program so we can see how it looks in
contrast to the awk program?
Ed.
01-20-06 11:03 PM
How about this?
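open F,$ARGV[0];
undef %pat;
# first pass: remember columns 4 and 5 of the Pattern (2) lines on page 2
while (<F>) {
    $N = $1 if /Page (\d+) of /;
    $pat{$1} = 1 if $N == 2 && /^1 5 0 (\d+ \d+) tr$/;
}
seek(F,0,0);
# second pass: on pages 1 and 2, skip lines whose columns 4 and 5 were stored
while (<F>) {
    $N = $1 if /Page (\d+) of /;
    next if $N < 3 && /^\d+ \d+ \d+ (\d+ \d+) / && $pat{$1};
    print;
}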
It would be nice if awk can refer to the matched pattern directly.
James
Ed Morton wrote:
> Would you mind posting the Perl program so we can see how it looks in
> contrast to the awk program?
01-21-06 07:49 AM
James wrote:
> How about this?
>
> open F,$ARGV[0];
> undef %pat;
> while (<F> ) {
> $N = $1 if /Page (\d+) of /;
> $pat{$1} = 1 if $N == 2 && /^1 5 0 (\d+ \d+) tr$/;
If I understand it correctly, \d represents any digit, but the OP only
wants numbers that start with 1-9, not with zero.
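To illustrate the difference, here is a small Python check. The sample line with a leading zero in column 4 is made up for the demonstration and is not taken from the OP's file.

```python
import re

# \d+ accepts any run of digits, including ones that start with zero.
loose = re.compile(r"^1 5 0 (\d+ \d+) tr$")
# The OP's stricter form requires each number to start with 1-9.
strict = re.compile(r"^1 5 0 ([1-9][0-9]* [1-9][0-9]*) tr$")

line = "1 5 0 071 330 tr"  # invented line with a leading zero in column 4
loose_match = loose.match(line) is not None
strict_match = strict.match(line) is not None
```

Whether this matters depends on whether such numbers can occur in the real PostScript files, which the thread does not say.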
> }
> seek(F,0,0);
I assume the above gets you back to the start of the file, so you're
about to start the second phase of parsing.
> while (<F> ) {
> $N = $1 if /Page (\d+) of /;
Do you need to reset $N to zero before starting this second loop so it
doesn't retain its value from the first pass for the header text
preceding "Page 1"?
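A small Python sketch of that concern, under stated assumptions: the helper `pages_seen` and the two-page sample with a made-up header line are invented for illustration. It tracks which page number is in effect for each line when the second pass starts with the stale value versus a reset one.

```python
import re

PAGE_RE = re.compile(r"Page (\d+) of ")


def pages_seen(lines, page=0):
    """Return the page number in effect for each line, starting from `page`."""
    seen = []
    for line in lines:
        m = PAGE_RE.search(line)
        if m:
            page = int(m.group(1))
        seen.append(page)
    return seen


lines = [
    "%!PS header text",  # made-up text before the first page marker
    "TR12 setfont (Page 1 of 2 ) 2 0 12 515 755 st",
    "TR12 setfont (Page 2 of 2 ) 2 0 12 515 755 st",
]
first_pass = pages_seen(lines)
stale = pages_seen(lines, page=first_pass[-1])  # second pass without a reset
fresh = pages_seen(lines, page=0)               # second pass with a reset
```

Without the reset, the header text is treated as if it were on the last page seen in the first pass. If the OP's file really ends on page 4, the `$N < 3` test would skip the header anyway, so the stale value would only bite for files that end on page 1 or 2.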
> next if $N < 3 && /^\d+ \d+ \d+ (\d+ \d+) / && $pat{$1};
> print;
> }
> It would be nice if awk can refer to the matched pattern directly.
Yes, it would. Thanks for posting that (but please don't top-post in
future).
In case anyone else finds it interesting, the equivalent design written
in awk (given the OP's posted input format) would be:
BEGIN{ ARGV[ARGC++] = ARGV[1] }
/Page [[:digit:]] of / { N = $4 }
{ key = $4" "$5 }
NR == FNR {
    # set the flag only on a match, so later lines with the same key can't clear it
    if (N == 2 && /^1 5 0 ([[:digit:]]+ ){2}tr$/) pat[key] = 1
    next
}
N < 3 && /^([[:digit:]]+ ){5}/ && pat[key] { next }
{ print }
The awk one's a little shorter, but there's not a huge difference,
really. The main difference, I think, is that as James pointed out, in
awk you can't refer to matched patterns so I had to explicitly hard-code
the field numbers to create "key" and "N", which was fine in this case
but could present problems in other situations.
Obviously, if your awk doesn't support RE intervals (supported by gawk
--re-interval, or any POSIX awk), then you need to write the full
"[[:digit:]]+ [[:digit:]]+ [[:digit:]]+ [[:digit:]]+ [[:digit:]]+ "
instead of just "([[:digit:]]+ ){5}", etc.
Regards,
Ed.
All times are GMT.