Unix Shell - Re-number the tags in file.


Author Re-number the tags in file.
bshah@citadon.com

2005-10-24, 3:45 pm

Hi Gurus,
I need help with a shell script.

I have a sample XML file with a number of tags, each carrying an id
number, and I need to renumber them.
Here is sample.xml:
------------------------------
<Text id="0" time="1129" guid="E1A6C156406E510" color="55" fontsize="6"
italic="false" bold="false" opaque="false" underline="false"
font="Arial" hyperlink="" comment="">
<Text id="0" time="1129835577" guid="7EFE6B53"
color="0,0,255" fontsize="0.139751" italic="false" bold="false"
opaque="false" underline="false" font="Arial" hyperlink="" comment="">
<Text id="0" time="1129835545" guid="3CCF4BC"
color="0,0,255" fontsize="0.450309" italic="false" bold="false"
opaque="false" underline="false" font="Arial" hyperlink="" comment="">
<Text id="0" time="1129835590" guid="38FC928"
color="0,0,255" fontsize="0.186335" italic="false" bold="false"
opaque="false" underline="false" font="Arial" hyperlink="" comment="">
---------------------------------

Basically I have to renumber every Text id attribute, which currently
has the value "0" (it can be anything). In the sample file there are 4
occurrences of Text id, so I need to number them from 0 to 3.

I am trying with for and while loops, but it doesn't work.
Here is my script:
SUM=0
COUNT=$(cat sample | grep "Text id=" | grep "guid=" | wc -l)
echo $COUNT

for id in `cat sample | grep "Text id=" | grep "guid="`
do
while [ $SUM -le $COUNT ]
do
SUM=`expr $SUM + 1`
echo $SUM
echo $id
#sed command to replace numbers from 0 to 3
done
done

It doesn't work at all. Any help is greatly appreciated. Any other
simpler method will also be OK.
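
[Editor's sketch, not part of the original post.] One reason the loop above cannot work as written, assuming a POSIX-style shell: `for id in` followed by a command substitution iterates over whitespace-separated words, not over lines, whereas `while read` consumes one line per iteration. The data and the function names below are made up for illustration.

```shell
# Illustrative only: two lines of input, each containing four words.
sample_text='<Text id="0" time="1129" guid="E1A6"
<Text id="0" time="1130" guid="7EFE"'

# for ... in `cmd` splits the command output on whitespace, so each
# attribute becomes its own loop item.
count_for_items() {
    n=0
    for item in `printf '%s\n' "$sample_text"`; do
        n=`expr $n + 1`
    done
    echo "$n"
}

# while read consumes one whole line per iteration.
count_read_lines() {
    n=0
    while IFS= read -r line; do
        n=`expr $n + 1`
    done <<EOF
$sample_text
EOF
    echo "$n"
}

count_for_items    # prints 8 (words)
count_read_lines   # prints 2 (lines)
```

In addition, the inner while loop in the posted script runs to completion for the first word, so SUM has already passed COUNT before the second word is seen.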

Benjamin Schieder

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> I have a sample xml with has no of tags each having some no. and i need
> to re-number it.
> Here is the sample.xml
> ------------------------------
> <Text id="0" time="1129" guid="E1A6C156406E510" color="55" fontsize="6"
> italic="false" bold="false" opaque="false" underline="false"
> font="Arial" hyperlink="" comment="">
> <Text id="0" time="1129835577" guid="7EFE6B53"
> color="0,0,255" fontsize="0.139751" italic="false" bold="false"
> opaque="false" underline="false" font="Arial" hyperlink="" comment="">
> <Text id="0" time="1129835545" guid="3CCF4BC"
> color="0,0,255" fontsize="0.450309" italic="false" bold="false"
> opaque="false" underline="false" font="Arial" hyperlink="" comment="">
> <Text id="0" time="1129835590" guid="38FC928"
> color="0,0,255" fontsize="0.186335" italic="false" bold="false"
> opaque="false" underline="false" font="Arial" hyperlink="" comment="">
> ---------------------------------
>
> Basically i have to replace all Text id tag which has no "0" (it can be
> anything) to re-number it. meaning in the sample file there are 4
> occurances of Text id so i need to number each text id starting from 0
> to 3


Hmm...
count=0
old='Text id="0"'
while IFS= read -r line ; do
    # The glob stars must stay outside the quotes to act as wildcards.
    if [[ ${line} = *"${old}"* ]] ; then
        new="Text id=\"${count}\""
        echo "${line/"${old}"/${new}}"
        count=$((count+1))
    else
        echo "${line}"
    fi
done < file

This might work in bash, but is entirely untested.

Greetings,
Benjamin

--
____ _ _ ____ _ _ _ _____ __ __
/ ___|| | / \ / ___|| | | ( ) ____| \/ |
\___ \| | / _ \ \___ \| |_| |/| _| | |\/| |
___) | |___ / ___ \ ___) | _ | | |___| | | |
|____/|_____/_/ \_\____/|_| |_| |_____|_| |_|
play online: telnet://slashem.crash-override.net
view scores: http://slashem.crash-override.net
watch deaths: irc://irc.freenode.net#slashem

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Benjamin,
Thanks for your prompt response.
It still doesn't work. I also need to check for guid.
It doesn't do anything. Please help.
count=0
while read line ; do
if [[ ${line} = '*Text id="0"*' ]] && [[ ${line} = 'guid=* ]]
; then
echo "${line/Text id="0"/Text id="${count}"}"
sed "s/Text id=\"0\"/Text id=\"${count}\"/" > new_count
count=$((${count}+1))
# else
# echo "${line}"
fi
done < sample

William Park

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Hi Gurus,
> I need help with shell script.
>
> I have a sample xml with has no of tags each having some no. and i need
> to re-number it.
> Here is the sample.xml
> ------------------------------
> <Text id="0" time="1129" guid="E1A6C156406E510" color="55" fontsize="6"
> italic="false" bold="false" opaque="false" underline="false"
> font="Arial" hyperlink="" comment="">
> <Text id="0" time="1129835577" guid="7EFE6B53"
> color="0,0,255" fontsize="0.139751" italic="false" bold="false"
> opaque="false" underline="false" font="Arial" hyperlink="" comment="">
> <Text id="0" time="1129835545" guid="3CCF4BC"
> color="0,0,255" fontsize="0.450309" italic="false" bold="false"
> opaque="false" underline="false" font="Arial" hyperlink="" comment="">
> <Text id="0" time="1129835590" guid="38FC928"
> color="0,0,255" fontsize="0.186335" italic="false" bold="false"
> opaque="false" underline="false" font="Arial" hyperlink="" comment="">
> ---------------------------------
>
> Basically i have to replace all Text id tag which has no "0" (it can be
> anything) to re-number it. meaning in the sample file there are 4
> occurances of Text id so i need to number each text id starting from 0
> to 3


1.
- split your text on '<Text id="0" '.
- re-generate text with '<Text id="N" ', where N is sequence you
want.

Eg.
slurp=`< sample.xml`
a=( "${slurp|,<Text id=\"0\" }" )
n=$(( ${#a[*]} - 1 ))
b=( '<Text id="'{0..n}'" ' )

You now have
a=( aaa bbb ccc ddd )
b=( 0 1 2 )
You need to "zip" them together, so that the end result is
c=( aaa 0 bbb 1 ccc 2 ddd )

This can be done by
arrayzip -a c a b
pp_trim -a c

To print them out,
printf '%s' "${c[@]}"
echo
or
echo "${c[@]|,}"

2.
- process the lines sequentially, assuming only one <Text> tag and
one 'id' attribute per line.

Eg.
n=0
while read; do
if match -2 REPLY '<Text id="0" ' submatch; then
echo "${submatch[0]}<Text id=\"$n\" ${submatch[1]}"
n=$(( n + 1 ))
fi
done

Here 'match -2' will return string segments before and after the
regex pattern. And, they are returned in 'submatch' array variable.

Ref:
http://freshmeat.net/projects/bashdiff/
http://home.eol.ca/~parkw/index.html#bashdiff

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/

bshah@citadon.com

2005-10-24, 3:45 pm

Hi William,
Sorry, I don't have the match command on Solaris 8, and my script is
written for the Korn shell, not bash.
I need to make the script suggested by Benjamin work, but somehow it
doesn't. The sed command doesn't do anything. Any other suggestions for
the Korn shell?
Please advise.
Regards
B

Janis Papanagnou

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
>
> I have a sample xml with has no of tags each having some no. and i need
> to re-number it.
> [...]
> Basically i have to replace all Text id tag which has no "0" (it can be
> anything) to re-number it. meaning in the sample file there are 4
> occurances of Text id so i need to number each text id starting from 0
> to 3


bshah@citadon.com wrote:
> Hi Benjamin,
> Thanks for your prompt response.
> It still doesn't work. I also need to check for guid.
> It doesn't do anything. Please Help
> count=0
> while read line ; do
> if [[ ${line} = '*Text id="0"*' ]] && [[ ${line} = 'guid=* ]]
> ; then
> echo "${line/Text id="0"/Text id="${count}"}"
> sed "s/Text id=\"0\"/Text id=\"${count}\"/" > new_count
> count=$((${count}+1))
> # else
> # echo "${line}"
> fi
> done < sample


Try this...

awk '{
sub(/Text id="0"/, "Text id=\"" nr["textid"]++ "\"")
sub(/guid="0"/, "guid=\"" nr["guid"]++ "\"")
print
}' < sample


Janis

William James

2005-10-24, 3:45 pm


bshah@citadon.com wrote:
> Hi Gurus,
> I need help with shell script.
>
> I have a sample xml with has no of tags each having some no. and i need
> to re-number it.
> Here is the sample.xml
> ------------------------------
> <Text id="0" time="1129" guid="E1A6C156406E510" color="55" fontsize="6"
> italic="false" bold="false" opaque="false" underline="false"
> font="Arial" hyperlink="" comment="">
> <Text id="0" time="1129835577" guid="7EFE6B53"
> color="0,0,255" fontsize="0.139751" italic="false" bold="false"
> opaque="false" underline="false" font="Arial" hyperlink="" comment="">
> <Text id="0" time="1129835545" guid="3CCF4BC"
> color="0,0,255" fontsize="0.450309" italic="false" bold="false"
> opaque="false" underline="false" font="Arial" hyperlink="" comment="">
> <Text id="0" time="1129835590" guid="38FC928"
> color="0,0,255" fontsize="0.186335" italic="false" bold="false"
> opaque="false" underline="false" font="Arial" hyperlink="" comment="">
> ---------------------------------
>
> Basically i have to replace all Text id tag which has no "0" (it can be
> anything) to re-number it. meaning in the sample file there are 4
> occurances of Text id so i need to number each text id starting from 0
> to 3
>
> I am trying with for and while loop but doesn't work.
> Here is my script
> SUM=0
> COUNT=$(cat sample | grep "Text id=" | grep "guid=" | wc -l)
> echo $COUNT
>
> for id in `cat sample | grep "Text id=" | grep "guid="`
> do
> while [ $SUM -le $COUNT ]
> do
> SUM=`expr $SUM + 1`
> echo $SUM
> echo $id
> #sed command to replace numbers from 0 to 3
> done
> done
>
> It doesn't work at all. Any help is greatly appreciated. Any other
> simpler method will also be OK.


ruby -pe'BEGIN{$c=-1}
sub(/<Text id="\d+(?=".*guid=)/){"<Text id=\"#{ $c+=1 }"}'

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Janis,
We need sed to replace the count, correct? I need to replace only Text
id, not guid, so the second sub will not be required, correct?

I tried it; it prints the entire file.

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Janis,
Sorry for my previous email. It does seem to work, but the numbers are
not correct.
It gives the following numbers:
Text id="11"
Text id="34"
Text id="61"
Text id="84"
instead of 0, 1, 2 and 3.
Why does it generate these odd numbers?
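
[Editor's sketch, not part of the original post.] The reported values track line positions. awk evaluates the arguments of sub() on every input line before the call is made, so a post-increment inside the replacement string advances once per line, whether or not the pattern matches. A minimal reproduction on a made-up five-line input (the function names are illustrative):

```shell
# Hypothetical 5-line input: the tag sits on lines 2 and 5.
make_input() {
    printf '%s\n' '<Page>' '<Text id="0">' '<Point>' '</Point>' '<Text id="0">'
}

# The replacement string (including nr++) is evaluated for every input
# line before sub() runs, so nr counts lines, not matches.
renumber_unguarded() {
    awk '{ sub(/Text id="0"/, "Text id=\"" nr++ "\""); print }'
}

make_input | renumber_unguarded | grep 'Text id'
# <Text id="1">
# <Text id="4">
```

Each id comes out as the line number minus one. An input with one tag per line would hide the effect, since the counter and the line position then advance together.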

Janis Papanagnou

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Hi Janis,
> Sorry for my previous email It does seem to work but the no is not
> correct.
> It gives following no.:
> Text id="11"
> Text id="34"
> Text id="61"
> Text id="84"
> instead of 0 ,1, 2 and 3.
> How come it generates odd nos?
>


I get from my program with your test data...


<Text id="0" time="1129" guid="E1A6C156406E510" color="55" fontsize="6"
italic="false" bold="false" opaque="false" underline="false"
font="Arial" hyperlink="" comment="">
<Text id="1" time="1129835577" guid="7EFE6B53"
color="0,0,255" fontsize="0.139751" italic="false" bold="false"
opaque="false" underline="false" font="Arial" hyperlink="" comment="">
<Text id="2" time="1129835545" guid="3CCF4BC"
color="0,0,255" fontsize="0.450309" italic="false" bold="false"
opaque="false" underline="false" font="Arial" hyperlink="" comment="">
<Text id="3" time="1129835590" guid="38FC928"
color="0,0,255" fontsize="0.186335" italic="false" bold="false"
opaque="false" underline="false" font="Arial" hyperlink="" comment="">


The guid is unchanged, because in your example you had none with value 0.

Janis

Janis Papanagnou

2005-10-24, 3:45 pm

It would be helpful if you quote some context when posting.

bshah@citadon.com wrote:
> Hi janis,
> We need sed to replace the count correct? i need to replace only Text
> id not guid so another sub will not be required correct?


What's your comment about guid, then? If you want to replace only when
guid on the same line matches some condition then try something like...

awk '/guid=WHATEVER/ {
sub(/Text id="0"/, "Text id=\"" nr["textid"]++ "\"")
} 1 ' < sample

(This one is untested.)

Janis

Janis Papanagnou

2005-10-24, 3:45 pm

Janis Papanagnou wrote:
> It would be helpful if you quote some context when posting.
>
> bshah@citadon.com wrote:
>
>
>
> What's your comment about guid, then? If you want to replace only when
> guid on the same line matches some condition then try something like...
>
> awk '/guid=WHATEVER/ {
> sub(/Text id="0"/, "Text id=\"" nr["textid"]++ "\"")
> } 1 ' < sample
>
> (This one is untested.)


More precisely, you don't need the array in that simple case, it's only
helpful if you automatically want to extract arbitrary tags and index
the array with the tags. Otherwise simply use...

sub(/Text id="0"/, "Text id=\"" nr++ "\"")

If this does not work in your environment, as you mentioned elsethread,
we need to know more about it.

Janis

William Park

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Hi Willam,
> Sorry i don't have match command on Solaris8 and my script is an
> extension of K-Shell not BASH.
> I need to make the script suggested by Benjamin to work but somehow it
> doesn't. the sed command doesnt do anything. Any other suggetions in
> K-Shell?


None. Download and compile. :-)

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Janis,
On the actual file it doesn't work. It gives a weird result. Also, I
don't need to replace guid at all. It is only a check: the Text id
number should be replaced only on tags where both Text id and guid exist.
Also, I would love to know, if possible, how nr["textid"]++ works.
Thanks again for the help.
Here is the actual file:


<CDLInfo v1="1" v2="7" v3="0" v4="12">
<DriverInfo v1="1" v2="5" v3="0" v4="23">Pdf2DL</DriverInfo>
</CDLInfo>
<ImageList imagecount="0">
</ImageList>
<PageList pagecount="2">
<Page index="0" left="0.0" bottom="0.0" right="8464.96"
top="10984.7">
<AuthorList authorcount="2">
<Author name="an">
<Text id="0" time="1129835521"
guid="91EE8C8249F3874B9E1A6C156406E510" color="0,0,255"
fontsize="0.248446" italic="false" bold="false" opaque="false"
underline="false" font="Arial" hyperlink="" comment="">
<PointList pointcount="4">
<Point>
<x>778.454</x>
<y>10131.8</y>
</Point>
<Point>
<x>3336.99</x>
<y>10131.8</y>
</Point>
<Point>
<x>3336.99</x>
<y>9722.44</y>
</Point>
<Point>
<x>778.454</x>
<y>9722.44</y>
</Point>
</PointList>
<TextLine>Page1Markup1</TextLine>
</Text>
</Author>
<Author name="an">
<Text id="0" time="1129835577"
guid="B06497DEF31DFA40B7EFE6B538261ED5" color="0,0,255"
fontsize="0.139751" italic="false" bold="false" opaque="false"
underline="false" font="Arial" hyperlink="" comment="">
<PointList pointcount="4">
<Point>
<x>6066.1</x>
<y>9841.84</y>
</Point>
<Point>
<x>8198.21</x>
<y>9841.84</y>
</Point>
<Point>
<x>8198.21</x>

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Janis,
Sorry to trouble you.

Well, somehow I am getting a weird result.
/usr/xpg4/bin/awk '{
sub(/Text id="0"/, "Text id=\"" nr["textid"]++ "\"")
print
}' < sample > testno

I am trying it on the actual file. Let me try again.

Janis Papanagnou

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Also i would love to know if poosible how nr["textid"]++ works.


As I said, in this context it is unnecessary, you may take a simple
scalar variable and increment it, nr++, but ok...

Above, nr is an (associative) array indexed by an arbitrary string,
"textid" in this case. Initially the array contains no data. As
soon as you reference an element of the array, like nr["textid"],
the element is created and initialized to 0 (or the empty
string ""). The expression above evaluates to the current
value (0 initially) and then increments the element, so that the
next time it is accessed it yields the next higher value (and is
incremented again).
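
[Editor's sketch, not part of the original post.] The behaviour described above can be observed in isolation in a BEGIN block, with no input at all. The function name is illustrative only.

```shell
# An unset array element reads as 0 in numeric context; the
# post-increment yields the old value and then stores old value + 1.
show_counter() {
    awk 'BEGIN {
        first  = nr["textid"]++   # yields 0, element becomes 1
        second = nr["textid"]++   # yields 1, element becomes 2
        print first, second, nr["textid"]
    }'
}

show_counter   # prints: 0 1 2
```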

Janis

bshah@citadon.com

2005-10-24, 3:45 pm

Thanks Janis.
Why is it not working with the actual sample? Also, what about wildcard
matching for Text id="*" and guid=*?
Best regards

Janis Papanagnou

2005-10-24, 3:45 pm

You are referring to a response of mine but following up on William's reply.
PLEASE learn how to post in Usenet (this is no web forum)!

bshah@citadon.com wrote:
> Hi Janis,
> One more thing. The Text id="0" can be any no i mean text id="1" or 2
> or 3 any no. Is there a way to put wild character like sub(/Text
> id="*"/, "Text id=\"" nr["textid"]++ "\"") as you mentioned earlier
> awk '/guid=*/ as guid can also be any no as seen in the sample file
> and is justg for the purpose
> to replace only when guid on the same line matches some condition.
> Best Regards
> B
>


To substitute arbitrary index numbers use this line in the program I
posted...

sub(/Text id="[0-9]+"/, "Text id=\"" nr++ "\"")
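
[Editor's sketch, not part of the original post.] In a regular expression, `*` is not a shell-style wildcard; it means "zero or more of the preceding item", so digits are matched with a bracket expression such as `[0-9]+`. The helper below (an illustrative name) reports whether a line contains a Text id with at least one digit.

```shell
matches_id() {
    # Prints "yes" if the line contains Text id="<digits>", else "no".
    printf '%s\n' "$1" | awk '{ print (/Text id="[0-9]+"/ ? "yes" : "no") }'
}

matches_id '<Text id="0" time="1">'     # yes
matches_id '<Text id="123" time="1">'   # yes
matches_id '<Text id="" time="1">'      # no
```

Using `[0-9]*` instead of `[0-9]+` would also accept the empty id in the last case.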


Janis

bshah@citadon.com

2005-10-24, 3:45 pm

Sorry Janis. I will take care in future. Why doesn't it work on the
actual file? It gives weird numbers.

Janis Papanagnou

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Thanks Janis
> WHy it is not working with actual sample.


It is working in my environment with both of your posted samples.

> Also what about wild
> character matching for Text id=*" and guid=*.
> Best regards


Answered elsethread.

I think I cannot help you further. Good luck.

Janis

bshah@citadon.com

2005-10-24, 3:45 pm

OK Thanks Janis. Appreciate all your Help.

Chris F.A. Johnson

2005-10-24, 3:45 pm

On 2005-10-21, bshah@citadon.com wrote:
> Hi Jenis,
> Sorry to trouble you.
>
> Well somehow i am getting weired result.


What is the result?

What's weird about it?

What are you expecting?

> /usr/xpg4/bin/awk '{
> sub(/Text id="0"/, "Text id=\"" nr["textid"]++ "\"")
> print
> }' < sample > testno
>
> I am trying on actual file.


What does the input look like?

--
Chris F.A. Johnson <http://cfaj.freeshell.org>
==================================================================
Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
<http://www.torfree.net/~chris/books/cfaj/ssr.html>

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Chris,
Well, the script given by Janis works, but the result is weird. The Text
id= numbers should come out as 0, 1, 2, 3, but instead they come out as
23, 41, 61, 84, etc. I am not sure why. I am running Solaris 8. I also
compiled and installed gawk and tried it, but no luck. However, Janis
can't reproduce this.
Any help is greatly appreciated.

Chris F.A. Johnson

2005-10-24, 3:45 pm

On 2005-10-22, bshah@citadon.com wrote:
> Hi Chris,
> Well the Script given by Janis works but the result is weired.


What script was that?

This is Usenet, not a web forum (though it is also bastardized on
several web sites). You cannot know whether the reader can see or
has seen the previous posts, or, if they have been seen, whether
the reader remembers what they were about.

When using groups.google.com to reply to a Usenet article (better
to use a real newsreader), click on "show options" at the top of
the article, then click on the "Reply" at the bottom of the
article headers. This will quote the previous message in the
accepted manner. Trim the parts of it that are not relevant to
your follow-up.

> The Text id= number should come as 0 ,1,2,3 but instead it comes as
> 23 ,41,61,84 etc...not sure why. I am running Solaris 8. I also
> compiled and installed gawk and tried but no luck. However Janis
> can't reproduce the same.


> Any help is greatly apprciated.
>



bshah@citadon.com

2005-10-24, 3:45 pm

I will check it out.
Here is the script:

/usr/xpg4/bin/awk '{
sub(/Text id="[0-9]+"/, "Text id=\""nr++"\"")
print
}' < sample > testno

Chris F.A. Johnson

2005-10-24, 3:45 pm

On 2005-10-22, bshah@citadon.com wrote:
> I will check it out.


Check what out?

> Here is the script:
>
> /usr/xpg4/bin/awk '{
> sub(/Text id="[0-9]+"/, "Text id=\""nr++"\"")
> print
> }' < sample > testno


What is that supposed to do?

What is the input?

What doesn't it do?

Each post to usenet should be as complete as possible in itself.
Please reread my previous post.

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Chris,
Sorry about that.
Here are the details:

Basically I have to renumber all <Text id= > tags, which currently have
the number "0" (it can be anything). In the sample file there are 4
occurrences of Text id, so I need to number each Text id from 0
to 3.

Here is the sample file in which I need to renumber the tags.
<?xml version="1.0" encoding="UTF-8"?>
<CDLInfo v1="1" v2="7" v3="0" v4="12">
<DriverInfo v1="1" v2="5" v3="0" v4="23">Pdf2DL</DriverInfo>
</CDLInfo>
<ImageList imagecount="0">
</ImageList>
<PageList pagecount="2">
<Page index="0" left="0.0" bottom="0.0" right="8464.96"
top="10984.7">
<AuthorList authorcount="2">
<Author name="nian">
<Text id="0" time="1129835521"
guid="91EE8C8249F3874B9E1A6C156406E510" color="0,0,255"
fontsize="0.248446" italic="false" bold="false" opaque="false"
underline="false" font="Arial" hyperlink="" comment="">
<PointList pointcount="4">
<Point>
<x>778.454</x>
<y>10131.8</y>
</Point>
<Point>
<x>3336.99</x>
<y>10131.8</y>
</Point>
<Point>
<x>3336.99</x>
<y>9722.44</y>
</Point>
<Point>
<x>778.454</x>
<y>9722.44</y>
</Point>
</PointList>
<TextLine>Page1Markup1</TextLine>
</Text>
</Author>
<Author name="anian">
<Text id="0" time="1129835577"
guid="B06497DEF31DFA40B7EFE6B538261ED5" color="0,0,255"
fontsize="0.139751" italic="false" bold="false" opaque="false"
underline="false" font="Arial" hyperlink="" comment="">
<PointList pointcount="4">
<Point>
<x>6066.1</x>
<y>9841.84</y>
</Point>
<Point>
<x>8198.21</x>
<y>9841.84</y>
</Point>
<Point>
<x>8198.21</x>
<y>9611.57</y>
</Point>
<Point>
<x>6066.1</x>
<y>9611.57</y>
</Point>
</PointList>
<TextLine>Page1Markup2</TextLine>
</Text>
</Author>
</AuthorList>
</Page>
<Page index="1" left="0.0" bottom="0.0" right="8464.96"
top="10984.7">
<AuthorList authorcount="2">
<Author name="Srini Balasubramanian">
<Text id="0" time="1129835545"
guid="521AD866670EB54AA13CCF4BC645F544" color="0,0,255"
fontsize="0.450309" italic="false" bold="false" opaque="false"
underline="false" font="Arial" hyperlink="" comment="">
<PointList pointcount="4">
<Point>
<x>693.17</x>
<y>2729.11</y>
</Point>
<Point>
<x>4855.06</x>
<y>2729.11</y>
</Point>
<Point>
<x>4855.06</x>
<y>1987.13</y>
</Point>
<Point>
<x>693.17</x>
<y>1987.13</y>
</Point>
</PointList>
<TextLine>Page2Markup1</TextLine>
</Text>
</Author>
-----------------------------------------------------------------

Here is the script, which runs but doesn't put the numbers in correctly.

/usr/xpg4/bin/awk '{
sub(/Text id="0"/, "Text id=\"" nr++ "\"")
print
}' < sample > testno

It puts seemingly random numbers in the <Text id> tags in the final output.

Regards
B

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Chris,
The problem is ++nr in the script
/usr/xpg4/bin/awk '{
sub(/Text id="0"/, "Text id=\"" nr++ "\"")
print
}' < sample > testno
It puts in the previous line number instead of 0 1 2 3...
Is there a way we can serialize this?

Chris F.A. Johnson

2005-10-24, 3:45 pm

On 2005-10-22, bshah@citadon.com wrote:
> Hi Chris,
> Sorry about that.


Sorry about what? You are still not quoting properly.

> Here is the detail:
>
> Basically i have to replace all <Text id= > tag which has no "0" (it
> can be anything) to re-number it. meaning in the sample file there
> are 4 occurances of Text id so i need to number each text id
> starting from 0 to 3


There are only three instances in your file.

> Here is the sample file from which i need to renumber tags.
><?xml version="1.0" encoding="UTF-8"?>

[snip]
> <Text id="0" time="1129835521"
> guid="91EE8C8249F3874B9E1A6C156406E510" color="0,0,255"
> fontsize="0.248446" italic="false" bold="false" opaque="fa
> lse" underline="false" font="Arial" hyperlink="" comment="">
> <PointList pointcount="4">

[snip]
> <Author name="anian">
> <Text id="0" time="1129835577"
> guid="B06497DEF31DFA40B7EFE6B538261ED5" color="0,0,255"
> fontsize="0.139751" italic="false" bold="false" opaque="fa
> lse" underline="false" font="Arial" hyperlink="" comment="">
> <PointList pointcount="4">

[snip]
> <Text id="0" time="1129835545"
> guid="521AD866670EB54AA13CCF4BC645F544" color="0,0,255"
> fontsize="0.450309" italic="false" bold="false" opaque="fa
> lse" underline="false" font="Arial" hyperlink="" comment="">
> <PointList pointcount="4">

[snip]
> </Author>
> -----------------------------------------------------------------
>
> Here is the script which works but doesn't put the numbers correctly.
>
> usr/xpg4/bin/awk '{
> sub(/Text id="0"/, "Text id=\"" nr++ "\"")
> print
> }' < sample > testno
>
> It puts some random nos in final output on <Text id> tags.


Take a good look at those numbers; they are not random.
(Hint: grep -n 'Text id' sample)

Try:

if (sub(/Text id="0"/, "Text id=\"" nr "\"")) ++nr

Chris F.A. Johnson

2005-10-24, 3:45 pm

On 2005-10-22, bshah@citadon.com wrote:
> Hi Chris,
> The problem is ++nr in the script


What problem???????????

How many times do you need to be told?????

QUOTE THE POST YOU ARE REPLYING TO!!!!!!!!!!!!

> usr/xpg4/bin/awk '{
> sub(/Text id="0"/, "Text id=\"" nr++ "\"")
> print
> }' < sample > testno
> puts the previous line no. instead of 0 1 2 3...
> Is there a way we can searlize this?
>



bshah@citadon.com

2005-10-24, 3:45 pm

I was new to this forum. I think I am now following your instructions.
Let me know if this post is OK or not.

The sub call in the current script
sub(/Text id="0"/, "Text id=\"" nr++ "\"")
replaces Text id= with the previous line number and not with 0, 1, ...
serially.
Is there a way to replace it with serial numbers 1 2 3 etc. rather than
line numbers?
Regards

Chris F.A. Johnson

2005-10-24, 3:45 pm

On 2005-10-22, bshah@citadon.com wrote:
> I was new to this forum.


This is not a "forum"; it is a Usenet newsgroup.

> Now i think i am following your instructions.


I still don't see any quoting.

> Let me know if this post is OK or not.


Not.

> The current script sub function
> sub(/Text id="0"/, "Text id=\"" nr++ "\"")
> replaces Text id= with the previous line no. and not with regular 0 1
> serially.
> Is there a way to replace with searil nos 1 2 3 etc rather than line
> nos.?


Read my other post. I gave you the solution.

tester

2005-10-24, 3:45 pm

Hope this looks OK.

The solution you gave works except for the first one. It doesn't put "0"
in the first Text id; the rest are fine.

Jan Schampera

2005-10-24, 3:45 pm

Chris F.A. Johnson wrote:

> On 2005-10-22, bshah@citadon.com wrote:
>
> This is not a "forum"; it is a Usenet newsgroup.

Googler...

J.

--
"Be liberal in what you accept, and conservative in what you send."
- J. B. Postel, master of the net.

Chris F.A. Johnson

2005-10-24, 3:45 pm

On 2005-10-22, tester wrote:
> Hope this looks ok.


No.

Once more: Click on "show options" at the top of the article, then
click on the "Reply" at the bottom of the article
headers (the bottom of the headers, not the bottom of
the article). This will quote the previous message in
the accepted manner. Trim the parts of it that are not
relevant to your follow-up.

> The soultion you gave works except first one. It doesn't put "0" in the
> text id rest are fine.


What does it put?

tester

2005-10-24, 3:45 pm

Hi Janis,
This solution works:
if (sub(/Text id="0"/, "Text id=\"" 0+nr "\"")) ++nr
Regards
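
[Editor's sketch, not part of the original post.] Adding 0 forces awk to treat the variable as a number, which matters the first time through: a variable that has never been assigned is empty in string context but 0 in numeric context. The function name is illustrative only.

```shell
show_coercion() {
    awk 'BEGIN {
        # nr has never been assigned at this point.
        print "[" nr "]"         # string context: empty
        print "[" (0 + nr) "]"   # numeric context: 0
    }'
}

show_coercion
# []
# [0]
```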


Janis Papanagnou wrote:
> Janis Papanagnou wrote:
>
> More precisely, you don't need the array in that simple case, it's only
> helpful if you automatically want to extract arbitrary tags and index
> the array with the tags. Otherwise simply use...
>
> sub(/Text id="0"/, "Text id=\"" nr++ "\"")
>
> If this does not work in your environment, as you mentioned elsethread,
> we need to know more about it.
>
> Janis


Chris F.A. Johnson

2005-10-24, 3:45 pm

On 2005-10-22, tester wrote:
> Hi Janis,
> This soultion works:
> if (sub(/Text id="0"/, "Text id=\"" 0+nr "\"")) ++nr


Better, but please don't top-post.
> Janis Papanagnou wrote:
[snip]

Bill Marcum

2005-10-24, 3:45 pm

On 21 Oct 2005 18:50:23 -0700, bshah@citadon.com
<bshah@citadon.com> wrote:
> Hi Chris,
> The problem is ++nr in the script
> usr/xpg4/bin/awk '{
> sub(/Text id="0"/, "Text id=\"" nr++ "\"")
> print
> }' < sample > testno
> puts the previous line no. instead of 0 1 2 3...
> Is there a way we can searlize this?
>

Perhaps you want to use ++nr instead of nr++?


--
Youth is a disease from which we all recover.
-- Dorothy Fuldheim

Enrique Perez-Terron

2005-10-24, 3:45 pm

On Sat, 22 Oct 2005 04:21:40 +0200, tester <bshah@citadon.com> wrote:

> Hope this looks ok.
>
> The soultion you gave works except first one. It doesn't put "0" in the
> text id rest are fine.


"Tester", "Bshah", whatever your name or alias be,

do you understand the word "quote"?

Look at this very message I am writing now. Can you see there are some lines
above my text, which reproduce the text you wrote? That is a quote. This message
quotes your message. I quote your message because I want everybody who
reads this message to see what I am responding to.


OK, what is wrong with Chris's solution? I don't think you will ever even
*try* to learn this. Since Chris moved the "++" operator away from the
"nr" variable in the "sub" function call, awk has no reason to treat
"nr" as a number.

The variable "nr" occurs between two text strings:

"blah blah" nr "more blah blah"

then awk thinks "nr" is a string too, and concatenates (joins) the three strings.

The first time this happens, the variable "nr" has not yet been used in any way,
so it is just empty. That is why it fails. You can fix it by putting a number,
say 0, in "nr" before the data processing begins.

BEGIN {nr = 0}

{ if (sub(/Text id="0"/, "Text id=\"" nr "\"")) ++nr }


Another way would be to put the ++ back where it was, but instead make sure
the command is executed only for those lines that actually contain the
pattern /Text id=/



/Text id=/ {
    sub(/Text id="[0-9]*"/, "Text id=\"" nr++ "\"")
}
1   # a true pattern with no action prints every line

I have changed the regular expressions a bit because you said the zero could be
anything.
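
[Editor's sketch, not part of the original post.] Putting the pieces together, assuming a POSIX awk: the counter is initialized in BEGIN and advanced only when sub() reports a replacement. The function name and the shortened input are made up for illustration; the input mimics the layout of the posted file, with the guid attribute on the line after Text id.

```shell
renumber_text_ids() {
    awk '
        BEGIN { nr = 0 }   # make nr numeric before the first match
        {
            # sub() returns 1 when it replaced something, so the counter
            # advances only on lines that carried the attribute.
            if (sub(/Text id="[0-9]+"/, "Text id=\"" nr "\"")) ++nr
            print
        }
    '
}

printf '%s\n' \
    '<Author name="a">' \
    '<Text id="0" time="1129835521"' \
    'guid="91EE" color="0,0,255">' \
    '</Author>' \
    '<Text id="0" time="1129835577"' \
    'guid="B064" color="0,0,255">' |
    renumber_text_ids | grep 'Text id'
# <Text id="0" time="1129835521"
# <Text id="1" time="1129835577"
```

On Solaris 8 the same program would presumably be run with /usr/xpg4/bin/awk, as in the earlier posts, reading from the real file instead of the inline sample.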

-Enrique

Copyright 2003 - 2008 webservertalk.com