Sed, Awk, or both?
Web Server forum

    Sed, Awk, or both?  
zcui@yahoo.com
01-18-06 10:55 PM

I need a script to automatically modify PostScript files.  The
PostScript file has patterns like following:

...
TR12 setfont (Page 1 of 4         ) 2 0 12 515 755 st
0 1 550 539 575 514 box
( ) 0 1 12 552 524.5 st
1 5 0 537 526 tr
1 2 180 161 324 sq               <--- Pattern (1)
1 2 90 71 330 tr                 <--- Pattern (1)
1 2 180 196 337 tr               <--- Pattern (1)
454 320 446 320 dl
...
TR12 setfont (Page 2 of 4         ) 2 0 12 515 755 st
1 5 0 161 324 tr                  <--- Pattern (2)
1 5 0 71 330 tr                   <--- Pattern (2)
1 5 0 196 337 tr                  <--- Pattern (2)
1 5 45 193 346 sq
...

Pattern (1) is only in Page 1, its column 1-3 and 6 can be something
else.
Pattern (2) is only in Page 2, its column 1-3 and 6 are always same,
like "1   5   0   x   y    tr"

What I need is
1) Find the Pattern (2) first by
/^1 5 0 [1-9][0-9]*[0-9]* [1-9][0-9]*[0-9]* tr$/,
save the column 4 and 5 as a string (pattern x).
2) Then, search the string (pattern x) and delete the line in Page 1.
3) Delete all the lines found by Pattern (2) on Page 2.

What's the best way to do these?

Thanks for any suggestions.
Scott
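
For readers who want the three steps spelled out, here is a minimal two-pass sketch in Python. It is only an illustration of the request above, not code from any poster: the function name `strip_matches` is hypothetical, the page-header regex is taken from the sample lines, and it assumes that Pattern (1) lines always have exactly six columns and that lines arrive without trailing newlines.

```python
import re

# Page header as it appears in the sample, e.g. "TR12 setfont (Page 2 of 4 ...".
PAGE_RE = re.compile(r"^TR12 setfont \(Page (\d+) of ")
# Pattern (2): "1 5 0 x y tr", where x and y do not start with zero.
PAT2_RE = re.compile(r"^1 5 0 ([1-9][0-9]*) ([1-9][0-9]*) tr$")


def strip_matches(lines):
    """Drop Pattern (2) lines from page 2 and the page 1 lines that
    share their columns 4 and 5 (hypothetical helper)."""
    # Pass 1: collect (x, y) from Pattern (2) lines found on page 2.
    keys = set()
    page = 0
    for line in lines:
        header = PAGE_RE.match(line)
        if header:
            page = int(header.group(1))
        hit = PAT2_RE.match(line)
        if page == 2 and hit:
            keys.add((hit.group(1), hit.group(2)))

    # Pass 2: copy everything except the lines to be deleted.
    out = []
    page = 0
    for line in lines:
        header = PAGE_RE.match(line)
        if header:
            page = int(header.group(1))
        fields = line.split()
        # Assumption: a Pattern (1) line has exactly six columns.
        if page == 1 and len(fields) == 6 and tuple(fields[3:5]) in keys:
            continue
        if page == 2 and PAT2_RE.match(line):
            continue
        out.append(line)
    return out
```

Reading the file into a list of lines, calling `strip_matches`, and writing the result back out would cover steps 1) to 3) in one run.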






    Re: Sed, Awk, or both?
Janis Papanagnou
01-19-06 01:49 AM

zcui@yahoo.com wrote:
> I need a script to automatically modify PostScript files.  The
> PostScript file has patterns like following:
>
> ...
> TR12 setfont (Page 1 of 4         ) 2 0 12 515 755 st
> 0 1 550 539 575 514 box
> ( ) 0 1 12 552 524.5 st
> 1 5 0 537 526 tr
> 1 2 180 161 324 sq               <--- Pattern (1)
> 1 2 90 71 330 tr                 <--- Pattern (1)
> 1 2 180 196 337 tr               <--- Pattern (1)
> 454 320 446 320 dl
> ...
> TR12 setfont (Page 2 of 4         ) 2 0 12 515 755 st
> 1 5 0 161 324 tr                  <--- Pattern (2)
> 1 5 0 71 330 tr                   <--- Pattern (2)
> 1 5 0 196 337 tr                  <--- Pattern (2)
> 1 5 45 193 346 sq
> ...
>
> Pattern (1) is only in Page 1, its column 1-3 and 6 can be something
> else.
> Pattern (2) is only in Page 2, its column 1-3 and 6 are always same,
> like "1   5   0   x   y    tr"
>
> What I need is
> 1) Find the Pattern (2) first by
> /^1 5 0 [1-9][0-9]*[0-9]* [1-9][0-9]*[0-9]* tr$/,
> save the column 4 and 5 as a string (pattern x).
> 2) Then, search the string (pattern x) and delete the line in Page 1.
> 3) Delete all the lines found by Pattern (2) on Page 2.
>
> What's the best way to do these?
>
> Thanks for any suggestions.
> Scott
>

I'd use awk to solve that.

If I understand your task correctly, you'll need two-pass processing, so
name the data file twice when you call your program, e.g. as in

awk -f yourprog.awk yourdata.ps yourdata.ps

Your awk program needs commands like the subsequent ones... (untested)

Identify the page and memorize the state

/TR12 setfont \(Page /   { onPage1 = 0; onPage2 = 0 }
/TR12 setfont \(Page 1 / { onPage1 = 1; onPage2 = 0 }
/TR12 setfont \(Page 2 / { onPage1 = 0; onPage2 = 1 }

In the first pass find pattern only on page 2 and store data for second pass

(NR == FNR) && onPage2 && /^1 5 0 [1-9][0-9]*[0-9]*/ { store[$4,$5] }

In the second pass suppress on page 1 output of stored patterns

(NR != FNR) && onPage1 && (($4,$5) in store) { next }

Print all the rest in the second pass

NR != FNR

The complete awk program is the six lines above and, as I said, untested.
It might already work as you expect, but if not it might at least give you
some hints on how to approach this type of problem using awk.

Hope that helps.

Janis





    Re: Sed, Awk, or both?
hq00e
01-19-06 07:57 AM


zcui@yahoo.com wrote:
> I need a script to automatically modify PostScript files.  The
> PostScript file has patterns like following:
>
> ...
> TR12 setfont (Page 1 of 4         ) 2 0 12 515 755 st
> 0 1 550 539 575 514 box
> ( ) 0 1 12 552 524.5 st
> 1 5 0 537 526 tr
> 1 2 180 161 324 sq               <--- Pattern (1)
> 1 2 90 71 330 tr                 <--- Pattern (1)
> 1 2 180 196 337 tr               <--- Pattern (1)
> 454 320 446 320 dl
> ...
> TR12 setfont (Page 2 of 4         ) 2 0 12 515 755 st
> 1 5 0 161 324 tr                  <--- Pattern (2)
> 1 5 0 71 330 tr                   <--- Pattern (2)
> 1 5 0 196 337 tr                  <--- Pattern (2)
> 1 5 45 193 346 sq
> ...
>
> Pattern (1) is only in Page 1, its column 1-3 and 6 can be something
> else.
> Pattern (2) is only in Page 2, its column 1-3 and 6 are always same,
> like "1   5   0   x   y    tr"
>
> What I need is
> 1) Find the Pattern (2) first by
> /^1 5 0 [1-9][0-9]*[0-9]* [1-9][0-9]*[0-9]* tr$/,
> save the column 4 and 5 as a string (pattern x).
> 2) Then, search the string (pattern x) and delete the line in Page 1.
> 3) Delete all the lines found by Pattern (2) on Page 2.

It can be done both with sed and awk. Here is a 2-pass sed solution
(you may need to do some adjustment to fit your situation).

$ sed -f <(sed -n -e '/(Page 2/,/(Page 3/{s:^1 5 0 \(.*\) tr.*:/\1/d:p' \
      -e } pfile) pfile
TR12 setfont (Page 1 of 4         ) 2 0 12 515 755 st
0 1 550 539 575 514 box
( ) 0 1 12 552 524.5 st
1 5 0 537 526 tr
454 320 446 320 dl
...
TR12 setfont (Page 2 of 4         ) 2 0 12 515 755 st
1 5 45 193 346 sq
...

The logic is straightforward. First, make a sed script from the input
file:

/161 324/d
/71 330/d
/196 337/d

Then, with that script, we can delete all lines (on both page 1 and
page 2) that contain the pattern (2) values. To get a more accurate
result, try to generate a script like this yourself:
/^\(.[^ ]* \)\{3\}161 324 /d
/^\(.[^ ]* \)\{3\}71 330 /d
/^\(.[^ ]* \)\{3\}196 337 /d
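
As a rough illustration of generating the more accurate script, the sketch below builds the anchored delete commands in Python. The helper name `make_sed_script` is hypothetical, and detecting the page 2 range by the substrings "(Page 2 " and "(Page 3 " is an assumption based on the sample. Note that a script built this way still applies to every page when sed runs it over the whole file.

```python
import re

# Pattern (2) lines as in the sample: "1 5 0 x y tr".
PAT2_RE = re.compile(r"^1 5 0 (\d+ \d+) tr")


def make_sed_script(lines):
    """Build anchored sed delete commands from the Pattern (2)
    lines found on page 2 (hypothetical helper)."""
    script = []
    in_page2 = False
    for line in lines:
        # Assumption: pages are delimited by "(Page N " in the header line.
        if "(Page 2 " in line:
            in_page2 = True
        elif "(Page 3 " in line:
            in_page2 = False
        hit = PAT2_RE.match(line)
        if in_page2 and hit:
            # Skip three space-separated columns, then require columns 4-5.
            script.append(r"/^\(.[^ ]* \)\{3\}" + hit.group(1) + " /d")
    return script
```

Writing the returned commands to a file, one per line, gives a script that can be passed to `sed -f`.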

--
Regards,
hq00e






    Re: Sed, Awk, or both?
William James
01-19-06 07:57 AM

zcui@yahoo.com wrote:
> I need a script to automatically modify PostScript files.  The
> PostScript file has patterns like following:
>
> ...
> TR12 setfont (Page 1 of 4         ) 2 0 12 515 755 st
> 0 1 550 539 575 514 box
> ( ) 0 1 12 552 524.5 st
> 1 5 0 537 526 tr
> 1 2 180 161 324 sq               <--- Pattern (1)
> 1 2 90 71 330 tr                 <--- Pattern (1)
> 1 2 180 196 337 tr               <--- Pattern (1)
> 454 320 446 320 dl
> ...
> TR12 setfont (Page 2 of 4         ) 2 0 12 515 755 st
> 1 5 0 161 324 tr                  <--- Pattern (2)
> 1 5 0 71 330 tr                   <--- Pattern (2)
> 1 5 0 196 337 tr                  <--- Pattern (2)
> 1 5 45 193 346 sq
> ...
>
> Pattern (1) is only in Page 1, its column 1-3 and 6 can be something
> else.
> Pattern (2) is only in Page 2, its column 1-3 and 6 are always same,
> like "1   5   0   x   y    tr"
>
> What I need is
> 1) Find the Pattern (2) first by
> /^1 5 0 [1-9][0-9]*[0-9]* [1-9][0-9]*[0-9]* tr$/,
> save the column 4 and 5 as a string (pattern x).
> 2) Then, search the string (pattern x) and delete the line in Page 1.
> 3) Delete all the lines found by Pattern (2) on Page 2.
>
> What's the best way to do these?
>
> Thanks for any suggestions.
> Scott

Using Ruby:

a = [['0']]
while gets
  a << [ $1 ]   if  $_  =~  /^TR12 setfont \(Page (\d) of /
  a.last << $_
end
a.assoc('2').reject!{ |s|
  if md = /^1 5 0( \d+ \d+ )/.match( s )
    a.assoc('1').reject!{ |x| x =~ /^\d+ \d+ \d+#{md[1]}/ }
    true   # reject! returns nil when nothing was removed; always drop this line
  end
}
a.each{ |x| puts x[1..-1] }






    Re: Sed, Awk, or both?
Ed Morton
01-19-06 01:11 PM

zcui@yahoo.com wrote:

> I need a script to automatically modify PostScript files.  The
> PostScript file has patterns like following:
>
> ...
> TR12 setfont (Page 1 of 4         ) 2 0 12 515 755 st
> 0 1 550 539 575 514 box
> ( ) 0 1 12 552 524.5 st
> 1 5 0 537 526 tr
> 1 2 180 161 324 sq               <--- Pattern (1)
> 1 2 90 71 330 tr                 <--- Pattern (1)
> 1 2 180 196 337 tr               <--- Pattern (1)
> 454 320 446 320 dl
> ...
> TR12 setfont (Page 2 of 4         ) 2 0 12 515 755 st
> 1 5 0 161 324 tr                  <--- Pattern (2)
> 1 5 0 71 330 tr                   <--- Pattern (2)
> 1 5 0 196 337 tr                  <--- Pattern (2)
> 1 5 45 193 346 sq
> ...
>
> Pattern (1) is only in Page 1, its column 1-3 and 6 can be something
> else.
> Pattern (2) is only in Page 2, its column 1-3 and 6 are always same,
> like "1   5   0   x   y    tr"
>
> What I need is
> 1) Find the Pattern (2) first by
> /^1 5 0 [1-9][0-9]*[0-9]* [1-9][0-9]*[0-9]* tr$/,
> save the column 4 and 5 as a string (pattern x).
> 2) Then, search the string (pattern x) and delete the line in Page 1.
> 3) Delete all the lines found by Pattern (2) on Page 2.
>
> What's the best way to do these?
>
> Thanks for any suggestions.
> Scott
>

Try this:

$ cat file
...
TR12 setfont (Page 1 of 4         ) 2 0 12 515 755 st
0 1 550 539 575 514 box
( ) 0 1 12 552 524.5 st
1 5 0 537 526 tr
1 2 180 161 324 sq
1 2 90 71 330 tr
1 2 180 196 337 tr
454 320 446 320 dl
...
TR12 setfont (Page 2 of 4         ) 2 0 12 515 755 st
1 5 0 161 324 tr
1 5 0 71 330 tr
1 5 0 196 337 tr
1 5 45 193 346 sq
...

$ cat rmvLines.awk
BEGIN{ ARGV[ARGC++] = ARGV[1]; phase = 0 }
/Page 1/ && (phase == 0) { phase = 1 } # first pass, page 1
/Page 2/ && (phase == 1) { phase = 2 } # first pass, page 2
/Page 3/ && (phase == 2) { phase = 3 } # first pass, page 3+
/Page 1/ && (phase != 1) { phase = 4 } # second pass, page 1
/Page 2/ && (phase == 4) { phase = 5 } # second pass, page 2
/Page 3/ && (phase == 5) { phase = 6 } # second pass, page 3+

{ key = $4 " " $5 }
phase == 2 && /^1 5 0 [1-9][0-9]* [1-9][0-9]* tr$/ { keys[key] }
phase ~ /4|5/ && !(key in keys) { print }
phase ~ /0|6/ { print }

$ awk -f rmvLines.awk file
...
TR12 setfont (Page 1 of 4         ) 2 0 12 515 755 st
0 1 550 539 575 514 box
( ) 0 1 12 552 524.5 st
1 5 0 537 526 tr
454 320 446 320 dl
...
TR12 setfont (Page 2 of 4         ) 2 0 12 515 755 st
1 5 45 193 346 sq
...

Regards,

Ed.





    Re: Sed, Awk, or both?
William Park
01-19-06 11:24 PM

zcui@yahoo.com wrote:
> I need a script to automatically modify PostScript files.  The
> PostScript file has patterns like following:
>
> ...
> TR12 setfont (Page 1 of 4         ) 2 0 12 515 755 st
> 0 1 550 539 575 514 box
> ( ) 0 1 12 552 524.5 st
> 1 5 0 537 526 tr
> 1 2 180 161 324 sq               <--- Pattern (1)
> 1 2 90 71 330 tr                 <--- Pattern (1)
> 1 2 180 196 337 tr               <--- Pattern (1)
> 454 320 446 320 dl
> ...
> TR12 setfont (Page 2 of 4         ) 2 0 12 515 755 st
> 1 5 0 161 324 tr                  <--- Pattern (2)
> 1 5 0 71 330 tr                   <--- Pattern (2)
> 1 5 0 196 337 tr                  <--- Pattern (2)
> 1 5 45 193 346 sq
> ...
>
> Pattern (1) is only in Page 1, its column 1-3 and 6 can be something
> else.
> Pattern (2) is only in Page 2, its column 1-3 and 6 are always same,
> like "1   5   0   x   y    tr"
>
> What I need is
> 1) Find the Pattern (2) first by
> /^1 5 0 [1-9][0-9]*[0-9]* [1-9][0-9]*[0-9]* tr$/,
> save the column 4 and 5 as a string (pattern x).

This is the most difficult part.  To get 3 lines from 'TR12 setfont...',
then slice out 'x' and 'y' (column 4 and 5),

sed -n -e '/^TR12 setfont (Page 2 of 4/ {n; N; N; p; q}'  \
| awk '{print $4, $5}' > x_y

> 2) Then, search the string (pattern x) and delete the line in Page 1.

Something like
sed '/^TR12 setfont (Page 1 of 4/,/Page 2/ { /.../d; /.../d; /.../d;}'

> 3) Delete all the lines found by Pattern (2) on Page 2.
>
> What's the best way to do these?
>
> Thanks for any suggestions.
> Scott

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/





    Re: Sed, Awk, or both?
zcui@yahoo.com
01-20-06 11:03 PM

Thank you all for the help.

Since I have to modify the data for several other different patterns
besides the pattern 1 and 2, I finally wrote a PERL program to do the
job.

I'll take your suggestions to try Sed and Awk later when I have time.

Thanks again.
Scott






    Re: Sed, Awk, or both?
Ed Morton
01-20-06 11:03 PM

zcui@yahoo.com wrote:

> Thank you all for the help.
>
> Since I have to modify the data for several other different patterns
> besides the pattern 1 and 2, I finally wrote a PERL program to do the
> job.
>
> I'll take your suggestions to try Sed and Awk later when I have time.
>
> Thanks again.
> Scott
>

Would you mind posting the PERL program so we can see how it looks in
contrast to the awk program?

Ed.





    Re: Sed, Awk, or both?
James
01-20-06 11:03 PM

How about this?

open F,$ARGV[0];
undef %pat;
while (<F> ) {
        $N = $1 if /Page (\d+) of /;
        $pat{$1} = 1 if $N == 2 && /^1 5 0 (\d+ \d+) tr$/;
}
seek(F,0,0);
while (<F> ) {
        $N = $1 if /Page (\d+) of /;
        next if $N < 3 && /^\d+ \d+ \d+ (\d+ \d+) / && $pat{$1};
        print;
}

It would be nice if awk can refer to the matched pattern directly.

James

Ed Morton wrote:
> zcui@yahoo.com wrote:
> 
>
> Would you mind posting the PERL program so we can see how it looks in
> contrast to the awk program?
>
> 	Ed.






    Re: Sed, Awk, or both?
Ed Morton
01-21-06 07:49 AM

James wrote:
> How about this?
>
> open F,$ARGV[0];
> undef %pat;
> while (<F> ) {
>         $N = $1 if /Page (\d+) of /;
>         $pat{$1} = 1 if $N == 2 && /^1 5 0 (\d+ \d+) tr$/;

If I understand it correctly, \d represents any digit, but the OP only
wants numbers that start with 1-9, not with zero.
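
To make that difference concrete, here is a small sketch in Python (chosen only for illustration; the names `ANY_DIGITS`, `NO_LEADING_ZERO` and `matches` are hypothetical). It compares a `\d+` pattern with one that requires each number to start with 1-9, on a made-up line whose fourth column has a leading zero.

```python
import re

# \d+ accepts any run of digits, including ones with a leading zero.
ANY_DIGITS = re.compile(r"^1 5 0 (\d+ \d+) tr$")
# [1-9][0-9]* requires each number to start with a non-zero digit.
NO_LEADING_ZERO = re.compile(r"^1 5 0 ([1-9][0-9]* [1-9][0-9]*) tr$")


def matches(pattern, line):
    """True if the compiled pattern matches the line from its start."""
    return pattern.match(line) is not None
```

With these, "1 5 0 71 330 tr" is accepted by both patterns, while the hypothetical "1 5 0 071 330 tr" is accepted only by `ANY_DIGITS`.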

> }
> seek(F,0,0);

I assume the above gets you back to the start of the file, so you're
about to start the second phase of parsing.

> while (<F> ) {
>         $N = $1 if /Page (\d+) of /;

Do you need to reset $N to zero before starting this second loop so it
doesn't retain its value from the first pass for the header text
preceding "Page 1"?

>         next if $N < 3 && /^\d+ \d+ \d+ (\d+ \d+) / && $pat{$1};
>         print;
> }


> It would be nice if awk can refer to the matched pattern directly.

Yes, it would. Thanks for posting that (but please don't top-post in
future).

In case anyone else finds it interesting, the equivalent design written
in awk (given the OP's posted input format) would be:

BEGIN{ ARGV[ARGC++] = ARGV[1] }
/Page [[:digit:]] of / { N = $4 }
{ key = $4" "$5 }
NR == FNR {
        # set only on a match, so a later non-matching line cannot reset it
        if (N == 2 && /^1 5 0 ([[:digit:]]+ ){2}tr$/) pat[key] = 1
        next
}
N < 3 && /^([[:digit:]]+ ){5}/ && pat[key] { next }
{ print }

The awk one's a little shorter, but there's not a huge difference,
really. The main difference, I think, is that, as James pointed out, in
awk you can't refer to matched patterns, so I had to explicitly hard-code
the field numbers to create "key" and "N", which was fine in this case
but could present problems in other situations.
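
The trade-off between a captured group and a hard-coded field number can be sketched as follows, in Python purely for illustration. The function names are hypothetical and the header layout is assumed to be the one in the OP's sample.

```python
import re

# Header as in the OP's sample: "... (Page 2 of 4 ...".
HEADER_RE = re.compile(r"\(Page (\d+) of ")


def page_from_capture(line):
    """Page number taken from the regex capture group, or None."""
    hit = HEADER_RE.search(line)
    return int(hit.group(1)) if hit else None


def page_from_field(line):
    """Page number taken from the 4th whitespace-separated field,
    or None if the line is not a page header."""
    if not HEADER_RE.search(line):
        return None
    # Hard-coded position: only correct while the header layout stays fixed.
    return int(line.split()[3])
```

Both give the same answer on the sample header. Only the capture version keeps working if extra text ever appears in front of "TR12", which is the kind of situation where fixed field numbers become a problem.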

Obviously, if your awk doesn't support RE intervals (supported by gawk
--re-interval, or any POSIX awk), then you need to write the full
"[[:digit:]]+ [[:digit:]]+ [[:digit:]]+ [[:digit:]]+ [[:digit:]]+ "
instead of just "([[:digit:]]+ ){5}", etc.
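
The equivalence of the interval form and the spelled-out form can be checked with a short sketch, again in Python for illustration only. Python's `re` module does not support POSIX classes such as `[[:digit:]]`, so `[0-9]` stands in for it here, and the helper name `expand_interval` is hypothetical.

```python
import re


def expand_interval(unit, count):
    """Spell out `count` repetitions of `unit`, for engines that
    lack the {n} interval syntax (hypothetical helper)."""
    return unit * count


# Interval form, using [0-9] in place of [[:digit:]].
WITH_INTERVAL = re.compile(r"^([0-9]+ ){5}")
# The same pattern written out in full, five units in a row.
EXPANDED = re.compile("^" + expand_interval("[0-9]+ ", 5))
```

Both patterns accept a line that starts with five space-terminated numbers, such as "1 2 180 161 324 sq", and both reject "454 320 446 320 dl", which has only four.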

Regards,

Ed.

> James
>
> Ed Morton wrote:
> 
>
>




