Regex for anchor Tag
| meendar@gmail.com 2007-12-30, 1:27 pm |
| Hi,
Does anyone know a regular expression for finding the anchor tags in an
HTML page?
Data: xxxxxxxxxxxx<a href ="xxxx.com" ></a>
I just need: <a href ="xxxx.com
Thanks,
Meendar
| |
| Barry Margolin 2007-12-30, 1:27 pm |
| In article
<4eadab70-70ed-4da7-9867-1839fa4d5c6e@l6g2000prm.googlegroups.com>,
meendar@gmail.com wrote:
> Hi,
>
> Does anyone know a regular expression for finding the anchor tags in an
> HTML page?
>
>
> Data : xxxxxxxxxxxx<a href ="xxxx.com" ></a>
>
> I just need <a href ="xxxx.com
You don't want the '"' at the end of the URL? And what about the
closing '>'?
< *[aA] [^>]*
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
| |
| meendar@gmail.com 2008-01-01, 1:36 am |
| On Dec 30 2007, 10:50 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
> You don't want the '"' at the end of the URL? And what about the
> closing '>'?
>
> < *[aA] [^>]*
>
> --
> Barry Margolin, bar...@alum.mit.edu
> Arlington, MA
> *** PLEASE post questions in newsgroups, not directly to me ***
> *** PLEASE don't copy me on replies, I'll read them in the group ***
> < *[aA] [^>]
There may be other attributes after the href, e.g.:
<a href = "xxxx.com" title ="new"></a>
I am looking for only <a href = "xxxx.com
| |
| Barry Margolin 2008-01-01, 1:36 am |
| In article
<7e658650-0442-492a-aae4-405e7721072c@t1g2000pra.googlegroups.com>,
meendar@gmail.com wrote:
> On Dec 30 2007, 10:50 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
>
>
> There may be other attributes after the href, e.g.:
>
> <a href = "xxxx.com" title ="new"></a>
>
> I am looking for only <a href = "xxxx.com
< *a +href *= *"[^"]*
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
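
For anyone who wants to try the pattern above, here is a minimal Python sketch. The sample strings come from this thread; the names `HREF_PREFIX` and `find_href_prefixes` are made up for illustration, and the `re.IGNORECASE` flag is an added assumption so that `<A HREF=...>` is also found.

```python
import re

# The pattern from the post, unchanged. IGNORECASE is an assumption added
# here; the post itself only covers lowercase "a" and "href".
HREF_PREFIX = re.compile(r'< *a +href *= *"[^"]*', re.IGNORECASE)


def find_href_prefixes(html):
    """Return every '<a href="...' prefix, stopping before the closing quote."""
    return HREF_PREFIX.findall(html)


if __name__ == "__main__":
    page = 'xxxxxxxxxxxx<a href = "xxxx.com" title ="new"></a>'
    print(find_href_prefixes(page))
```

Note that the pattern only matches when href is the first attribute and its value is double-quoted; a tag such as <a title="new" href="xxxx.com"> is skipped.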
| |
| Scott Lurndal 2008-01-01, 7:23 pm |
| meendar@gmail.com writes:
>On Dec 30 2007, 10:50 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
>
>
>There may be other attributes after the href, e.g.:
>
><a href = "xxxx.com" title ="new"></a>
>
>I am looking for only <a href = "xxxx.com
Use an XSL stylesheet processed by xsltproc,
e.g. something like:
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method="text"/>
<xsl:template match="text()"/>
<xsl:template match="a">
<xsl:text>&lt;a href="</xsl:text><xsl:value-of select="@href"/><xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>
Run this through xsltproc:
$ cat /tmp/a.html
<html>
<head>
</head>
<body>
<a href="test1" fred="joe">test</a>
<a href="test2" fred="billbob">frod</a>
</body>
</html>
$ cat /tmp/a.xsl
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method="text"/>
<xsl:template match="text()"/>
<xsl:template match="a">
<xsl:text>&lt;a href="</xsl:text><xsl:value-of select="@href"/><xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>
$ cat /tmp/a.html | xsltproc /tmp/a.xsl -
<a href="test1
<a href="test2
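
For comparison, the same parser-based idea can be sketched without a stylesheet, using Python's standard `html.parser` module. This is not what was posted in the thread; the names `AnchorCollector` and `anchor_prefixes` are hypothetical, and the output format simply imitates the xsltproc run above.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def anchor_prefixes(html):
    """Return '<a href="...' prefixes, one per anchor that has an href."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return ['<a href="' + href for href in collector.hrefs]


if __name__ == "__main__":
    page = """<html><body>
<a href="test1" fred="joe">test</a>
<a href="test2" fred="billbob">frod</a>
</body></html>"""
    for line in anchor_prefixes(page):
        print(line)
```

Unlike the regular expression, this does not care where href sits among the attributes or how it is quoted, and it tolerates HTML that is not well-formed XML.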