Unix Shell - Printing multiple tag lines from file

Home > Archive > Unix Shell > October 2005 > Printing multiple tag lines from file






Author Printing multiple tag lines from file
bshah@citadon.com

2005-10-24, 3:45 pm

Hi Gurus,
I am trying to print multiple tag lines from a file, but somehow I am not
able to make it work. Please help.

Here is my sample file

sample.xml
------------------------------
<TAG No 1>
Test1
Test2
Test3
Test4
</TAG>
HELP1

<TAG No 2>
pic1
pic2
pic3
</TAG>
HELP2

-------------------------

Likewise there are multiple tags, and I want each tag block, from <TAG No 1>
to </TAG>, written to a new file.
Expected result:
file1
--------
<TAG No 1>
Test1
Test2
Test3
Test4
</TAG>

file2
---------
<TAG No 2>
pic1
pic2
pic3
</TAG>

Any help is greatly appreciated.
Best Regards
B

William James

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Hi Gurus,
> I am trying to print multiple tag lines from files but some how am not
> able to make it work. Please help
>
> Here is my sample file
>
> sample.xml
> ------------------------------
> <TAG No 1>
> Test1
> Test2
> Test3
> Test4
> </TAG>
> HELP1
>
> <TAG No 2>
> pic1
> pic2
> pic3
> </TAG>
> HELP2
>
> -------------------------
>
> Like wise there are multiple tags and i want each tag from <TAG NO 1>
> to </TAG> to a new file.
> Expected Result
> file1
> --------
> <TAG No 1>
> Test1
> Test2
> Test3
> Test4
> </TAG>
>
> file2
> ---------
> <TAG No 2>
> pic1
> pic2
> pic3
> </TAG>
>
> Any Help is greatly appreciated.
> Best Regards
> B


ruby -e 'ARGF.read.scan(/<.*?>.*?<.*?>/m).each_with_index{ |x,i|
File.open( "file#{ i+1 }", "w"){|f| f.puts x } }' infile

Steffen Schuler

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Hi Gurus,
> I am trying to print multiple tag lines from files but some how am not
> able to make it work. Please help
>
> Here is my sample file
>
> sample.xml
> ------------------------------
> <TAG No 1>
> Test1
> Test2
> Test3
> Test4
> </TAG>
> HELP1
>
> <TAG No 2>
> pic1
> pic2
> pic3
> </TAG>
> HELP2
>
> -------------------------
>
> Like wise there are multiple tags and i want each tag from <TAG NO 1>
> to </TAG> to a new file.
> Expected Result
> file1
> --------
> <TAG No 1>
> Test1
> Test2
> Test3
> Test4
> </TAG>
>
> file2
> ---------
> <TAG No 2>
> pic1
> pic2
> pic3
> </TAG>
>
> Any Help is greatly appreciated.
> Best Regards
> B
>


A GNU awk solution:

#!/usr/bin/gawk -f
/<TAG/ {
file = "file" gensub(/^[ \t]*<TAG[ \t]+No[ \t]+([0-9]+)>[ \t]*$/,
"\\1", "1")
}
/<TAG/,/<\/TAG>/ {print >> file}
/<\/TAG>/ {close(file)}

Regards,

Steffen
Loki Harfagr

2005-10-24, 3:45 pm

On Thu, 20 Oct 2005 12:59:05 +0200, Steffen Schuler wrote:

> bshah@citadon.com wrote:
>
> A GNU awk solution:
>
> #!/usr/bin/gawk -f
> /<TAG/ {
> file = "file" gensub(/^[ \t]*<TAG[ \t]+No[ \t]+([0-9]+)>[ \t]*$/,
> "\\1", "1")
> }
> /<TAG/,/<\/TAG>/ {print >> file}
> /<\/TAG>/ {close(file)}



Good; here is a slight simplification:
$ awk '/<TAG No/{close(file); file="outta"0+gensub(/<TAG No/,"",$1)}
/<TAG No /,/<\/TAG>/{ print > file}' inputfile

bshah@citadon.com

2005-10-24, 3:45 pm

Thanks, Gurus.
Is there a way to do it with awk rather than gawk?
This will become part of an existing Korn shell script.

Loki Harfagr wrote:
> On Thu, 20 Oct 2005 12:59:05 +0200, Steffen Schuler wrote:
>
>
>
> Good, then a slight simplification :
> $ awk '/<TAG No/{close(file); file="outta"0+gensub(/<TAG No/,"",$1)}
> /<TAG No /,/<\/TAG>/{ print > file}' inputfile


bshah@citadon.com

2005-10-24, 3:45 pm

Hi Loki,
Can you please explain the code below:
awk '/<TAG No/{close(file); file="outta"0+gensub(/<TAG No/,"",$1)}
/<TAG No /,/<\/TAG>/{ print > file}' inputfile

Will this create file0, file1, and so on? What are gensub and "outta"0?
I would greatly appreciate it. Is this GNU awk or plain awk?
Best Regards
B

Steffen Schuler

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Thanks Gurus.
> Is there a way to do it with awk rahter than gawk?
> as these extension will be a become a part of exisitng K-shell script.
>
> Loki Harfagr wrote:
>
>
>
>


awk '/<TAG No/{close(file); s=$0; sub(/<TAG No/,"",s); file = "file" 0+s}
/<TAG No /,/<\/TAG>/{ print > file}' sample.xml

Steffen
bshah@citadon.com

2005-10-24, 3:45 pm

Hi Steffen,
I tried this in the Korn shell:
awk '/<TAG No/{close(file); s=$0; sub(/<TAG No/,"",s); file = "file"
0+s} /<TAG No /,/<\/TAG>/{ print > file}' tags

Got an error:

+ awk /<TAG No/{close(file); s=$0; sub(/<TAG No/,"",s); file = "file"
0+s} /<TAG No /,/<\/TAG>/{ print > file} tags
awk: syntax error near line 1
awk: illegal statement near line 1

Steffen Schuler

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Hi Loki,
> Can you please explain the code below:
> awk '/<TAG No/{close(file); file="outta"0+gensub(/<TAG No/,"",$1)}
> /<TAG No /,/<\/TAG>/{ print > file}' inputfile
>
> Will this create a file0, file1..like that? What is gensub and outta"0
> ?
> Will greatly appreciate it. Is this GNU Awk or awk?
> Best Regards
> B
>


from gawk(1):

gensub(r, s, h [, t])
    Search the target string t for matches of the regular expression r.
    If h is a string beginning with g or G, then replace all matches of r
    with s. Otherwise, h is a number indicating which match of r to
    replace. If t is not supplied, $0 is used instead. Within the
    replacement text s, the sequence \n, where n is a digit from 1 to 9,
    may be used to indicate just the text that matched the n'th
    parenthesized subexpression. The sequence \0 represents the entire
    matched text, as does the character &. Unlike sub() and gsub(), the
    modified string is returned as the result of the function, and the
    original target string is not changed.

That's GNU AWK code.

gensub(/<TAG No/, "", x) replaces "<TAG No" with "" in $0
so gensub(...) == "1>"

0 + "1>" is equal to 1: adding 0 forces "1>" to be converted to a number, and
that conversion takes the longest numeric prefix of "1>", i.e. "1"
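A small sketch of this conversion rule, runnable with any POSIX awk; the three strings are made-up examples. (Strictly, removing "<TAG No" leaves " 1>" with a leading blank; the conversion skips leading blanks, so the result is the same.)

```shell
#!/bin/sh
# Adding 0 forces awk to turn a string into a number, using the
# longest numeric prefix of the string (leading blanks are skipped).
out=$(awk 'BEGIN {
    print 0 + "1>"      # numeric prefix "1" -> 1
    print 0 + " 2>"     # leading blank skipped -> 2
    print 0 + "\"3\">"  # starts with a double quote, no numeric prefix -> 0
}')
printf '%s\n' "$out"
```

The last case is worth remembering: a value wrapped in quotes converts to 0, not to the number inside the quotes.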

/<TAG No/,/<\/TAG>/ is an address range (a range pattern) matching all lines from a
"<TAG No" line through the next "</TAG>" line, inclusive
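A minimal sketch of the range pattern on its own, using a shortened copy of the sample input (shortened only for brevity): with no action, awk prints every line inside each range, and the HELP lines outside the ranges are skipped.

```shell
#!/bin/sh
# A range pattern /start/,/end/ selects every line from a line matching
# "start" through the next line matching "end", inclusive.
sample='<TAG No 1>
Test1
</TAG>
HELP1
<TAG No 2>
pic1
</TAG>
HELP2'
# No action given, so awk just prints the selected lines.
selected=$(printf '%s\n' "$sample" | awk '/<TAG No/,/<\/TAG>/')
printf '%s\n' "$selected"
```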

print > file writes the line ($0) to the file; the file is truncated when it is first opened, and later prints append while it stays open

close(file) closes an open file; closing a file that is not open is not an error
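The truncate-on-open behaviour matters once close() is involved. This sketch, using a hypothetical scratch file "out" in a temporary directory, shows that a print > file issued after close(file) starts the file over:

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
awk -v f="$tmp/out" 'BEGIN {
    print "a" > f   # first open: file truncated, "a" written
    print "b" > f   # still open: "b" appended
    close(f)
    print "c" > f   # reopened: truncated again
    close(f)
}'
cat "$tmp/out"   # prints just: c
```

So a script that computes the same file name for every block, closes it, and reopens it with > ends up holding only the last block.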
Steffen Schuler

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Hi Steffen,
> I tried this in K shell:
> awk '/<TAG No/{close(file); s=$0; sub(/<TAG No/,"",s); file = "file"
> 0+s} /<TAG No /,/<\/TAG>/{ print > file}' tags
>
> Got an error:
>
> + awk /<TAG No/{close(file); s=$0; sub(/<TAG No/,"",s); file = "file"
> 0+s} /<TAG No /,/<\/TAG>/{ print > file} tags
> awk: syntax error near line 1
> awk: illegal statement near line 1
>


That error was caused by USENET mangling (splitting) lines that
were too long.

awk '/<TAG No/ {
close(file)
s=$0
sub(/<TAG No/,"",s)
file = "file" 0+s
}
/<TAG No /,/<\/TAG>/ {
print > file
}' tags
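To try this POSIX version end to end, a sketch like the following runs it on the sample input in a scratch directory. The temporary directory and the heredoc copy of sample.xml are assumptions for the demo; the awk program is Steffen's, with comments added. On Solaris the old /usr/bin/awk may still reject sub().

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
cat > sample.xml <<'EOF'
<TAG No 1>
Test1
Test2
Test3
Test4
</TAG>
HELP1

<TAG No 2>
pic1
pic2
pic3
</TAG>
HELP2
EOF
awk '/<TAG No/ {
close(file)          # finish the previous output file, if any
s=$0
sub(/<TAG No/,"",s)  # "<TAG No 1>" -> " 1>"
file = "file" 0+s    # " 1>" -> 1, so file = "file1"
}
/<TAG No /,/<\/TAG>/ {
print > file
}' sample.xml
cat file1 file2
```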
bshah@citadon.com

2005-10-24, 3:45 pm

Thanks, Steffen. That was great.
So this is GNU awk, not the awk I was using in my script, and that is why I got a syntax
error, correct? Should this go away if I install GNU awk?
Best Regards
B

Steffen Schuler

2005-10-24, 3:45 pm

Steffen Schuler wrote:
> bshah@citadon.com wrote:
>
>
> that was an error because of line mangling/splitting in the USENET
> because of too long lines.
>
> awk '/<TAG No/ {
> close(file)
> s=$0
> sub(/<TAG No/,"",s)
> file = "file" 0+s
> }
> /<TAG No /,/<\/TAG>/ {
> print > file
> }' tags


This solution is pure POSIX AWK not GNU AWK.
Loki Harfagr

2005-10-24, 3:45 pm

On Thu, 20 Oct 2005 10:18:18 -0700, bshah wrote:

> Hi Loki,
> Can you please explain the code below:
> awk '/<TAG No/{close(file); file="outta"0+gensub(/<TAG No/,"",$1)}
> /<TAG No /,/<\/TAG>/{ print > file}' inputfile
>
> Will this create a file0, file1..like that? What is gensub and outta"0
> ?


gensub is a gawk extension of the sub() function family.
"outta" is just a literal, used here as the file name prefix.
The expression
"outta"0+gensub(/<TAG No/,"",$1)
will:
- strip the recurring starter "<TAG No" from the line (strictly, $1 is
passed as gensub's third, "how", argument, so the target is $0)
- the rest is something like "2>"; adding the number 0 to it forces an
arithmetic conversion, which uses only the leading numeric part, giving 2
- finally concatenate the literal file prefix ("outta") with the 2
Thus the final output will be a file name built from:
"outta", the given fixed prefix (you may like to use
"myOutputFile" instead ;-)),
and the number extracted from the <TAG* line.
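Loki's expression relies on gawk's gensub. A sketch of the same file-name construction in plain POSIX awk, swapping gensub for sub() on a copy of the line (a substitution, not Loki's code), might look like this; the input line is a made-up example:

```shell
#!/bin/sh
name=$(echo '<TAG No 2>' | awk '{
    s = $0
    sub(/<TAG No/, "", s)   # s is now " 2>"
    # Loki wrote "outta"0+...: binary + binds tighter than concatenation,
    # so it means "outta" (0 + ...). The parentheses make that explicit.
    print "outta" (0 + s)
}')
echo "$name"   # outta2
```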

> Will greatly appreciate it. Is this GNU Awk or awk?


I'm afraid that gensub is gawk-only, but the arithmetic conversion
(0+...) is standard awk, and Steffen gave you the correct translation
to plain awk (s=$0; sub(/<TAG No/,"",s); file = "file" 0+s)

Hope I was clear enough; it sometimes happens to me :-)

And thanks to Steffen for the port :-)
bshah@citadon.com

2005-10-24, 3:45 pm

Hi Steffen,
Sorry for the newbie questions; I am not very familiar with awk.
The last script doesn't have an input file. It should be like this,
correct?

awk '/<TAG No/ {
close(file)
s=$0
sub(/<TAG No/,"",s)
file = "file" 0+s
}
/<TAG No /,/<\/TAG>/ {
print > file } inputfile

bshah@citadon.com

2005-10-24, 3:45 pm

This gives an error:
awk: syntax error near line 4
awk: illegal statement near line 4


awk '/<TAG No/ {
close(file)
s=$0
sub(/<TAG No/,"",s)
file = "file" 0+s
}
/<TAG No /,/<\/TAG>/ {
print > file
}' tags

Steffen Schuler

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> Hi Steffen,
> Sorry for newbie questions as i am not much familier with awk that
> much. The last script doesn't have inputfile. It should be like this
> correct?
>
> awk '/<TAG No/ {
> close(file)
> s=$0
> sub(/<TAG No/,"",s)
> file = "file" 0+s
> }
> /<TAG No /,/<\/TAG>/ {
> print > file } inputfile


#corrected:
print > file }' inputfile
bshah@citadon.com

2005-10-24, 3:45 pm

Still doesn't work. Please help.

awk '/<TAG No/ {
close(file)
s=$0
sub(/<TAG No/,"",s)
file = "file" 0+s
}
/<TAG No /,/<\/TAG>/ {
print > file }' tags

awk: syntax error near line 4
awk: illegal statement near line 4

Some problem with line 4 - sub(/<TAG No/,"",s)

Enrique Perez-Terron

2005-10-24, 3:45 pm

On Thu, 20 Oct 2005 05:44:53 +0200, <bshah@citadon.com> wrote:

> Hi Gurus,
> I am trying to print multiple tag lines from files but some how am not
> able to make it work. Please help
>
> Here is my sample file
>
> sample.xml
> ------------------------------
> <TAG No 1>
> Test1
> Test2
> Test3
> Test4
> </TAG>
> HELP1
>
> <TAG No 2>
> pic1
> pic2
> pic3
> </TAG>
> HELP2
>
> -------------------------
>
> Like wise there are multiple tags and i want each tag from <TAG NO 1>
> to </TAG> to a new file.
> Expected Result
> file1
> --------
> <TAG No 1>
> Test1
> Test2
> Test3
> Test4
> </TAG>
>
> file2
> ---------
> <TAG No 2>
> pic1
> pic2
> pic3
> </TAG>


awk '/<TAG No 1>/,/<\/TAG>/ {print $0 >>"file1"}
/<TAG No 2>/,/<\/TAG>/ {print $0 >>"file2"}' sample.xml

-Enrique
Steffen Schuler

2005-10-24, 3:45 pm

You seem to have a very old AWK:

bshah@citadon.com wrote:
> Sorry for the trouble..
>
> PLease help
>
> awk '/<TAG No/ {
> close(file)
> s=$0


replace the following line:

> sub(/<TAG No/,"",s)


by:

s = substr(s,9)

> file = "file" 0+s}
> /<TAG No /,/<\/TAG>/ {
> print > file}' tags
>
> Doesn't seems to be working.
> It gives: awk: syntax error near line 4
> awk: illegal statement near line 4
>
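As a sketch of the substr() workaround: "<TAG No " is 8 characters long, so substr(s, 9) keeps everything from the tag number on. This assumes every tag line starts exactly with "<TAG No " (no leading blanks, single spaces); the two input lines are made up for illustration.

```shell
#!/bin/sh
out=$(printf '%s\n' '<TAG No 1>' '<TAG No 12>' | awk '{
    s = substr($0, 9)   # "<TAG No 1>" -> "1>", "<TAG No 12>" -> "12>"
    print "file" 0+s    # numeric prefix of s appended to "file"
}')
printf '%s\n' "$out"
```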

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Enrique,
It won't work, as there can be any number of TAGs in the XML; I am not sure how many.

bshah@citadon.com

2005-10-24, 3:45 pm

Hi Steffen,
That works, but it puts all the tags in one file, file0, instead of
separate files. Will a newer version of awk solve this problem?

file0
<TAG No 1>
Test1
Test2
Test3
Test4
</TAG>
<TAG No 2>
pic1
pic2
pic3
</TAG>

Steffen Schuler

2005-10-24, 3:45 pm

bshah@citadon.com wrote:
> HI Steffen,
> That works but it puts all the tags in one file file0 instead of
> separate files. DOes newer version of awk will solve this problem?


install GNU AWK
Enrique Perez-Terron

2005-10-24, 3:45 pm

On Thu, 20 Oct 2005 20:22:19 +0200, <bshah@citadon.com> wrote:

> Sorry for the trouble..
>
> PLease help
>
> awk '/<TAG No/ {
> close(file)
> s=$0
> sub(/<TAG No/,"",s)
> file = "file" 0+s}
> /<TAG No /,/<\/TAG>/ {
> print > file}' tags


I copy'n'pasted the above lines.

I don't get any error messages. I get exactly the specified
output in the OP. I used file name "sample.xml" instead of
"tags", that's the whole difference.

What version of awk are you using? What operating system?

-Enrique
bshah@citadon.com

2005-10-24, 3:45 pm

Thanks a lot, guys. A special thanks to Steffen. Guys on this forum
ROCK!!!

Well, the script works like a charm, and the explanations were great as well. I
used another version of awk and it works. I couldn't find out which version
of awk shipped with Solaris 8, but I used the alternate
version of awk from the xpg4 directory and it worked.
Thanks once again to everyone in this forum for so much support.
Best Regards
B
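For a Korn shell script that must run on Solaris and elsewhere, one hedged option is to pick the awk binary once and use it everywhere. /usr/xpg4/bin/awk is the path that worked in this thread; the variable name AWK and the fallback to plain awk are assumptions, not something the posters suggested. The constructs are plain POSIX sh, which ksh also accepts.

```shell
#!/bin/sh
# Prefer the XPG4 awk on Solaris; fall back to "awk" on PATH elsewhere.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
else
    AWK=awk
fi
echo "using: $AWK"
# Quick sanity check that this awk understands sub().
echo '<TAG No 3>' | "$AWK" '{ s = $0; sub(/<TAG No/, "", s); print 0 + s }'
```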

James

2005-10-24, 3:45 pm

If you don't mind PERL,

perl -e '
while (<> ) {
if (/^<TAG No (\d+)/) {
$num=$1;
open F, ">file$num";
} elsif (/^<\/TAG/) {
close F;
$num=0;
} elsif ($num > 0) {
print F;
}
}' sample.xml

James

2005-10-24, 3:45 pm

The last line was missing in the previous post.
Here it is again.

perl -e '
while (<> ) {
if (/^<TAG No (\d+)/) {
$num=$1;
open F, ">file$num";
} elsif (/^<\/TAG/) {
close F;
$num=0;
} elsif ($num > 0) {
print F;
}
}' sample.xml


James

John W. Krahn

2005-10-24, 3:45 pm

James wrote:
> If you don't mind PERL,


http://perldoc.perl.org/perlfaq1.html#What's-the-difference-between-%22perl%22-and-%22Perl%22%3f


> PERL -e '
> while (<> ) {
> if (/^<TAG No (\d+)/) {
> $num=$1;
> open F, ">file$num";
> } elsif (/^<\/TAG/) {
> close F;
> $num=0;
> } elsif ($num > 0) {
> print F;
> }
> }' sample.xml



perl -ne'((/<TAG No (\d+)>/ && open F, ">file$1") .. /<\/TAG>/) && print F'



John
--
use Perl;
program
fulfillment
Loki Harfagr

2005-10-24, 3:45 pm

On Thu, 20 Oct 2005 12:08:25 -0700, bshah wrote:

> Thanks a lot Guys. A special thanks to Steffen. Guys on this forum
> ROCK!!!
>
> Well the script works like a charm and a great explanations as well. I
> used another version of awk and it works. I couldn't find the exsiting
> version of awk that shipped with Solaris 8 but i used the alternate
> version of awk from xpg4 folder and it worked.


Yes, I (and, I believe, most of the other posters) forgot
to point out that nobody, you included, should use the overburnt,
PACA-suntanned, crippled-to-the-carcinomial-core, alpha-era old awk that sits
directly in the path under Solarisk. If Ed wasn't on vacation, you'd have had
this made crystal clear from the very first post :D)

Apart from this, please do yourself a favor if you can afford it,
any place you can install gawk, just do it !-)
Though you may keep some *other* awks at hand just to exercise your
skill ;-)

> Thanks once again to all Guys in this forum. for so much Support.
> Best Regards
> B


That's nice of you, especially as, because of vacations, you only got
answers from tiers 1 and 2; just come back when the real boys
are back in town #;D)
James

2005-10-24, 3:45 pm

Hi John,
I noticed my PERL script does not print the <TAG ...>.

Anyway, interesting one-line script you have there.
perl -ne '((/<TAG No (\d+)>/ && open F, ">file$1") .. /<\/TAG>/) &&
print F'

What if you don't want to print the TAGs, only the lines between the
TAGs?
Just curious.

James

John W. Krahn

2005-10-24, 3:45 pm

James wrote:
> Hi John,
> I noticed my PERL script does not print the <TAG ...>.
>
> Anyway, interesting one-line script you have there.
> PERL -ne '((/<TAG No (\d+)>/ && open F, ">file$1") .. /<\/TAG>/) &&
> print F'
>
> What if you don't want to print the TAGs, only the lines between the
> TAGs?
> Just curious.



perl -ne'((/<TAG No (\d+)>/ && open F, ">file$1") .. /<\/TAG>/) && !/<\/?TAG/
&& print F'




John
--
use Perl;
program
fulfillment
bshah@citadon.com

2005-10-24, 3:45 pm

It's nice of you all to give such a huge response.

Well, I have one more problem with my actual scenario.

The script:
/usr/xpg4/bin/awk '/<Page index=/ {
close(file)
s=$0
sub(/<Page index=/,"",s)
file = "file" 0+s
}
/<Page index=/,/<\/Page/ {
print > file}' sample.xml

Here is sample.xml:
<Page index=0">
Test1
Test2
Test3
Test4
</Page>
HELP1
<Page index="1">
pic1
pic2
pic3
</Page>
HELP2

It doesn't work correctly. It creates only file0, containing page 1, as below:
Output of file0:
<Page index="1">
pic1
pic2
pic3
</Page>
It should create two files, file0 and file1, with:
file0 -> <Page index="0"> </Page>
file1 -> <Page index="1"> </Page>

What I understood from the explanation (correct me if I am wrong):
the sub function replaces <Page index= with "" (the empty string) in the string $0. (Does this
$0 refer to the /<Page index=/ pattern after awk, or to the line <Page index="0"> from
the file that matches? I don't know; please clarify what $0 is here.)
Then it creates files based on the numbers 0 and 1, as file0 and file1. The 0+s is just
an arithmetic operation to get the number, because after replacing
<Page index= with "" the remaining portion will be "0", and as we
are doing arithmetic, 0+s will ignore the quotes and only consider
the numbers. In that case it should output two files, file0 and file1, with
the respective tags. Please clarify, and add details to make this more
understandable to me and to others who are new to the sub function. Also,
please let me know why this is not working as expected.
Best Regards
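A likely explanation, offered as a sketch rather than a definitive answer: $0 is the whole current input line, e.g. <Page index="1">, not the pattern. After sub() removes <Page index=, s is "1"> including the double quotes, so 0+s has no numeric prefix and evaluates to 0 (for the first line, <Page index=0">, the result 0 happens to be right). Both pages therefore go to file0, and because close(file) is followed by a fresh print > file, the second page overwrites the first. One possible fix, assuming the page number is the first run of digits on the line, is to extract it with match(), RSTART and RLENGTH, which are standard POSIX awk (so /usr/xpg4/bin/awk should accept them). The scratch directory and heredoc copy of sample.xml are only for the demo.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
cat > sample.xml <<'EOF'
<Page index=0">
Test1
Test2
Test3
Test4
</Page>
HELP1
<Page index="1">
pic1
pic2
pic3
</Page>
HELP2
EOF
awk '/<Page index=/ {
    close(file)                  # finish the previous page file, if any
    s = $0                       # $0 is the whole current input line
    sub(/<Page index=/, "", s)   # s is now "1"> (quotes still there)
    n = 0
    if (match(s, /[0-9]+/))      # locate the first run of digits
        n = substr(s, RSTART, RLENGTH)
    file = "file" n
}
/<Page index=/,/<\/Page/ {
    print > file
}' sample.xml
```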



Copyright 2003 - 2008 webservertalk.com