Inserting base href into header of xml/html file
| Dietrich 2006-08-29, 7:33 pm |
| Hi everybody,
I want to write a bash script that puts a
<!-- base href = "$1" -->
into the head of an HTML/XML file, where $1 is the original base
href of the locally saved file (for possible later reference).
The problem is: some HTML files simply start with <html><head>.
In that case the script can put the new line at the very top of the
file.
But there are also files starting with
<!DOCTYPE html PUBLIC [...]
or with
<?xml version="
for instance when a certain encoding such as "UTF-8" or "UTF-16" is used.
Lower and upper case could vary, of course.
In that case this declaration must stay the first thing in the XML/HTML
file and the inserted line should come right after it. Possibly there is
no newline after the whole <!DOCTYPE[..]> or <?xml[..]>, and then a
newline (cr&lf or only lf, matching how the HTML/XML file was saved) must
be inserted as well.
So I must handle these other possibilities: find the corresponding ">"
of the "<!DOCTYPE html" or "<?xml",
put in a newline after it, and insert the base href line mentioned
above after that line.
--> Finding the "<!DOCTYPE html" or "<?xml" in the first line with
grep is quite easy, but how do I find the corresponding ">"? I suppose
it must always be the *next* ">".
There is no way of knowing the exact length or layout of the
"<!DOCTYPE html" / "<?xml" piece of code.
--> How do I distinguish between an existing cr/crlf after the found ">"
and no cr/crlf?
--> How do I determine whether cr or crlf should be used?
Any help appreciated,
Greetings from Fuerth, Franconia, Germany
Have a nice day
D
| Jan Tomka 2006-08-30, 7:33 am |
| Dietrich wrote:
> I want to write a bash script that puts a
> <!-- base href = "$1" -->
> into the head of an HTML/XML file, where $1 is the original base
> href of the locally saved file (for possible later reference).
>
> The problem is: some HTML files simply start with <html><head>.
> In that case the script can put the new line at the very top of the
> file.
>
> But there are also files starting with
> <!DOCTYPE html PUBLIC [...]
> or with
> <?xml version="
> for instance when a certain encoding such as "UTF-8" or "UTF-16" is used.
> Lower and upper case could vary, of course.
>
> [...]
>
> --> How to distinguish if cr or crlf should be used?
Dietrich,
as far as I can understand, you do not want to append a line after the
"xml" or "DOCTYPE" tags as much as you want to simply prepend a line
before the "html" tag.
Try this short script and modify it to comply fully with your needs:
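#!/bin/sh
# Usage: base.sh BASEHREF FILENAME
# Puts the comment on its own line in front of the first literal "<html>".
# printf is used because "echo -e" is not portable under /bin/sh.
printf '%s\n' "/<html>/s/<html>/\\
<!-- base href = \"$1\" -->\\
&/" w q | ed -s "$2"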
Watch out for the first argument, though. It is pasted into ed's
"s" command and needs to be escaped properly: a "/", "&" or "\" in the
URL would otherwise be interpreted by ed. Note that the "html" element
may carry attributes, which is not handled here.
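The two caveats above (escaping the argument, and an "html" tag with attributes or in upper case) could be sidestepped roughly as follows. This is only a sketch under a few assumptions: the function name insert_base_comment is made up for illustration, the file is in an ASCII-compatible encoding (so not UTF-16), and the result is written to standard output instead of editing the file in place. The URL is handed to awk through the environment, so it is never parsed as part of a pattern or replacement.

```shell
#!/bin/sh
# insert_base_comment BASEHREF FILE   (hypothetical helper name)
# Prints FILE to stdout with the comment line inserted directly before
# the first <html ...> tag, matched case-insensitively.
insert_base_comment() {
    BASE_HREF=$1 awk '
        # Act only on the first line that contains an opening html tag.
        !done && (pos = match(tolower($0), /<html[ \t\r>]|<html$/)) {
            # Reuse a trailing CR so CRLF files keep CRLF line endings.
            eol = ($0 ~ /\r$/) ? "\r" : ""
            comment = "<!-- base href = \"" ENVIRON["BASE_HREF"] "\" -->"
            # Anything before the tag (DOCTYPE, xml prolog) gets its own line.
            if (pos > 1) print substr($0, 1, pos - 1) eol
            print comment eol
            print substr($0, pos)
            done = 1
            next
        }
        { print }
    ' "$2"
}
```

It could be called as insert_base_comment "$1" "$2" > "$2.new", followed by moving the new file over the old one once the result has been checked.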
However, in case of well-formed XHTML, I would prefer an XSL
transformation.
The LF/CR problem can be solved using the unix2dos and dos2unix commands.
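To decide which of the two conversions applies, one option is to look at how the first line of the file ends. The following is a minimal sketch, assuming an ASCII-compatible encoding (it will not work on UTF-16 files) and assuming that the first line is representative of the whole file; detect_eol is a made-up name.

```shell
#!/bin/sh
# detect_eol FILE   (hypothetical helper name)
# Prints "crlf" if the first line of FILE ends in a carriage return,
# otherwise "lf".
detect_eol() {
    # Command substitution strips trailing newlines but keeps the CR.
    cr=$(printf '\r')
    case $(head -n 1 "$1") in
        *"$cr") echo crlf ;;
        *)      echo lf ;;
    esac
}
```

A script could then run the insertion with plain LF line endings and call unix2dos on the result only when detect_eol reported "crlf" for the original file.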
Hope this helps,
Jan