Unix Shell - removing html with sed


jsamdirect

2007-05-24, 1:18 pm

I am trying to remove the following HTML with sed but am not having any
luck. I tried using ".*" like this:

sed -e 's/<STRONG>Place mouse.*<\/\TABLE>//' filename.txt

but it doesn't appear to match the string and remove the code. Does
anyone know how I can do this?

<STRONG>Place mouse over picture below to change view above.<SCRIPT
language=JavaScript><!-- Beginvar pic01 = new Image();var pic02 = new
Image(); var pic03 = new Image();pic01.src = "http://www.myserver.com/
itset.jpg";pic02.src = "http://www.myserver.com/itset.jpg";function
doButtons(picimage) {eval("document['picture'].src = " + picimage +
".src");}// End --></SCRIPT> </STRONG></FONT><TABLE><TBODY><TR><TD
vAlign=top><TABLE borderColor=#dfdfdf cellSpacing=0 cellPadding=2
border=1><TBODY><TR><TD vAlign=center><A
onmouseover="doButtons('pic01')"><FONT face="arial, helvetica, sans-
serif" size=2><IMG src="http://www.myserver.com/itset.jpg" width=50
border=0></FONT></A><FONT face="arial, helvetica, sans-serif" size=2>
</FONT></TD><TD vAlign=center><A
onmouseover="doButtons('pic02')"><FONT face="arial, helvetica, sans-
serif" size=2><IMG src="http://www.myserver.com/itset.jpg" width=50
border=0></FONT></A><FONT face="arial, helvetica, sans-serif" size=2>
</FONT></TD></TR></TBODY></TABLE>

Michael Tosch

2007-05-24, 7:18 pm

jsamdirect wrote:
> I am trying to remove the following with sed but am not having any
> luck. I tried using the ".*" options like this, "sed -e 's/
> <STRONG>Place mouse.*<\/\TABLE>//' filename.txt" but it doesn't appear
> to match the string and remove the code. Anyone know how I can do
> this?
>
> <STRONG>Place mouse over picture below to change view above.<SCRIPT
> language=JavaScript><!-- Beginvar pic01 = new Image();var pic02 = new
> Image(); var pic03 = new Image();pic01.src = "http://www.myserver.com/
> itset.jpg";pic02.src = "http://www.myserver.com/itset.jpg";function
> doButtons(picimage) {eval("document['picture'].src = " + picimage +
> ".src");}// End --></SCRIPT> </STRONG></FONT><TABLE><TBODY><TR><TD
> vAlign=top><TABLE borderColor=#dfdfdf cellSpacing=0 cellPadding=2
> border=1><TBODY><TR><TD vAlign=center><A
> onmouseover="doButtons('pic01')"><FONT face="arial, helvetica, sans-
> serif" size=2><IMG src="http://www.myserver.com/itset.jpg" width=50
> border=0></FONT></A><FONT face="arial, helvetica, sans-serif" size=2>
> </FONT></TD><TD vAlign=center><A
> onmouseover="doButtons('pic02')"><FONT face="arial, helvetica, sans-
> serif" size=2><IMG src="http://www.myserver.com/itset.jpg" width=50
> border=0></FONT></A><FONT face="arial, helvetica, sans-serif" size=2>
> </FONT></TD></TR></TBODY></TABLE>
>



sed only has the standard greedy match .* so I suggest Perl with its
non-greedy (minimal) match .*?

perl -pe 's#<STRONG>Place mouse.*?</STRONG>##'


--
Michael Tosch @ hp : com

jsamdirect

2007-05-24, 7:18 pm

> sed only has the standard greedy match .* so I suggest PERL with its
> minimum match .*?
>
> perl -pe 's#<STRONG>Place mouse.*?</STRONG>##'
>
> --
> Michael Tosch @ hp : com


Worked like a charm.... Thanks!

William James

2007-05-25, 7:17 am

On May 24, 1:25 pm, Michael Tosch <eed...@NO.eed.SPAM.ericsson.PLS.se>
wrote:
> jsamdirect wrote:
>
>
> sed only has the standard greedy match .* so I suggest PERL with its
> minimum match .*?
>
> perl -pe 's#<STRONG>Place mouse.*?</STRONG>##'


This only works when everything you are trying to match is on one
line, because perl -p processes its input one line at a time. To match
across lines, read the whole input first:

ruby -e 'puts gets(nil).
sub(/<STRONG>Place mouse.*?<\/STRONG>/m,"")'


Copyright 2003 - 2008 webservertalk.com