Unix Shell - Parse an XML File using Shell script

karthik.prabaharan@gmail.com

2005-10-24, 3:45 pm

Hello,

Could someone help me in this regard?

I have an input file with more than 1000 entries in XML format, like
the one below:

<UserData id="id10">
<UserValue value="1wCBEW9yBJSOzC" title="__TEST "></UserValue>
<UserValue value="1L2T-5K762-B" title="__ITEM_ID"></UserValue>
<UserValue value="11" title="__REVISION_ID"></UserValue>
<UserValue value="WIRING ASSEMBLY" title="__NAME"></UserValue>
<UserValue value="HUB" title="Exported To"></UserValue>
<UserValue value="1L2T-5K762-B" title="Item ID"></UserValue>
<UserValue value="11" title="Item Revision"></UserValue>
<UserValue value="Own" title="Owning Site"></UserValue>
<UserValue value="Pub" title="Release"></UserValue></Part>
<Part id="id17" instanceRefs="id27 id35" name="1L2T ASSEMBLY-4 PIN TR
(view)">

I would like the output file to list every entry in the following
format:

Item ID       Item Revision  Owning Site  Release
1L2T-5K762-B  11             Own          Pub


Any help is highly appreciated.

Thanks, Karthik

James

2005-10-24, 3:45 pm

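Using a Perl snippet in a shell script:

#!/bin/bash
perl -e '
@L = ("Item ID", "Item Revision", "Owning Site", "Release");
print join("\t", @L), "\n";
while (<>) {
    # Capture the value and title attributes found on this line.
    $h{$2} = $1 if /value="(.+?)"\s+title="(.+?)"/;
    # Print a row as soon as all four wanted titles have been seen,
    # so that every record in the file gets its own line.
    if (@L == grep(exists $h{$_}, @L)) {
        print join("\t", @h{@L}), "\n";
        %h = ();
    }
}
' "$1"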

$ script xml.file
Item ID Item Revision Owning Site Release
1L2T-5K762-B 11 Own Pub

James

Steffen Schuler

2005-10-24, 3:45 pm

karthik.prabaharan@gmail.com wrote:
> I have an Input file ... in xml format:
> ...
> Thanks, Karthik


An AWK script:

#!/usr/bin/awk -f
# With FS set to a double quote, a line such as
#   <UserValue value="11" title="Item Revision"></UserValue>
# splits into $1 = <UserValue value=, $2 = the value,
# $3 = " title=" and $4 = the title.
function check(a1, a3) {
    return a1 ~ /^[ \t]*<UserValue[ \t]+value=$/ && a3 ~ /^[ \t]+title=$/
}
BEGIN {
    FS = "\""
    print "Item ID Item Revision Owning Site Release"
}
check($1, $3) && $4 ~ /^Item ID$/ {
    itemId = $2
}
check($1, $3) && $4 ~ /^Item Revision$/ {
    itemRev = $2
}
check($1, $3) && $4 ~ /^Owning Site$/ {
    ownSite = $2
}
check($1, $3) && $4 ~ /^Release$/ {
    release = $2
}
# Print a row once all four fields of a record have been collected.
itemId != "" && itemRev != "" && ownSite != "" && release != "" {
    format = "%-14s%6s %-14s%-10s\n"
    printf(format, itemId, itemRev, ownSite, release)
    itemId = itemRev = ownSite = release = ""
}

Regards,

Steffen

William Park

2005-10-24, 3:45 pm

karthik.prabaharan@gmail.com wrote:
> I have an Input file ... in xml format:
> ...


If your input data is nicely line-oriented like the above, then you can
extract 'value' and 'title' attributes using Sed, Python, Perl, or even
ordinary shell.
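
As a rough illustration of the "ordinary shell" option, here is a minimal sketch that uses only POSIX parameter expansion. It assumes every UserValue element sits on a single line, as in the sample, and that a record is complete once all four wanted titles have been seen. The function name extract_rows is made up for this example.

```shell
# extract_rows FILE: print a header plus one tab-separated row per record.
# The "( ... )" body runs in a subshell, so its variables stay private.
extract_rows() (
    id= rev= site= rel=
    printf '%s\t%s\t%s\t%s\n' 'Item ID' 'Item Revision' 'Owning Site' 'Release'
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *'<UserValue '*'value="'*'title="'*) ;;   # looks like a UserValue line
            *) continue ;;
        esac
        value=${line#*value=\"}; value=${value%%\"*}  # text between value=" and the next "
        title=${line#*title=\"}; title=${title%%\"*}  # text between title=" and the next "
        case $title in
            'Item ID')       id=$value ;;
            'Item Revision') rev=$value ;;
            'Owning Site')   site=$value ;;
            'Release')       rel=$value ;;
        esac
        # A record is complete once all four fields have been seen.
        if [ -n "$id" ] && [ -n "$rev" ] && [ -n "$site" ] && [ -n "$rel" ]; then
            printf '%s\t%s\t%s\t%s\n' "$id" "$rev" "$site" "$rel"
            id= rev= site= rel=
        fi
    done < "$1"
)
```

Called as "extract_rows file.xml > out.txt", it writes the header followed by one row per record. Values are cut at the first double quote, so entities such as &quot; are left undecoded.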

However, to handle more general syntax, you need an XML parser.
Expat is the first one that comes to mind. Expat interfaces are
available for Gawk and for the Bash shell.

For the Bash shell extension to the Expat XML parser, see
http://home.eol.ca/~parkw/index.html#expat

E.g.

start()    # Usage: start tag att=value...
{
    case $1 in
        UserData)
            unset Item_ID Item_Revision Owning_Site Release
            ;;
        UserValue)
            declare "${@:2}"
            case $title in
                __ITEM_ID) Item_ID=$value ;;
                __REVISION_ID) Item_Revision=$value ;;
                Owning\ Site) Owning_Site=$value ;;
                Release) Release=$value ;;
            esac
            ;;
    esac
}
end()    # Usage: end tag
{
    case $1 in
        UserData)
            echo "$Item_ID $Item_Revision $Owning_Site $Release"
            ;;
    esac
}

echo "Item_ID Item_Revision Owning_Site Release"
expat -s start -e end < file.xml
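
Where no Expat binding is installed, part of the "more general syntax" problem can be reduced by normalizing the input before matching. The sketch below is not an XML parser: it assumes double-quoted attributes and no comments or CDATA sections, and it does not decode entities. It joins all lines, puts each tag on a line of its own, and then reads the attributes in whatever order they appear. The names one_tag_per_line, attr and extract_rows_loose are hypothetical helpers, not part of any tool mentioned in this thread.

```shell
# one_tag_per_line FILE: turn newlines and tabs into spaces, then break the
# text before every "<", so each tag lands on one line without its "<".
one_tag_per_line() {
    tr '\n\t' '  ' < "$1" | tr '<' '\n'
}

# attr NAME TAGTEXT: print the value of attribute NAME; fail if it is absent.
attr() {
    case $2 in
        *" $1=\""*)
            attr_rest=${2#*" $1=\""}     # drop everything up to NAME="
            attr_val=${attr_rest%%\"*}   # keep the text before the next quote
            printf '%s\n' "$attr_val"
            ;;
        *) return 1 ;;
    esac
}

# extract_rows_loose FILE: header plus one tab-separated row per record.
# The "( ... )" body runs in a subshell, so its variables stay private.
extract_rows_loose() (
    id= rev= site= rel=
    printf '%s\t%s\t%s\t%s\n' 'Item ID' 'Item Revision' 'Owning Site' 'Release'
    one_tag_per_line "$1" | while IFS= read -r tag || [ -n "$tag" ]; do
        case $tag in
            'UserValue '*) ;;            # only UserValue start tags matter
            *) continue ;;
        esac
        title=$(attr title "$tag") && value=$(attr value "$tag") || continue
        case $title in
            'Item ID')       id=$value ;;
            'Item Revision') rev=$value ;;
            'Owning Site')   site=$value ;;
            'Release')       rel=$value ;;
        esac
        # A record is complete once all four fields have been seen.
        if [ -n "$id" ] && [ -n "$rev" ] && [ -n "$site" ] && [ -n "$rel" ]; then
            printf '%s\t%s\t%s\t%s\n' "$id" "$rev" "$site" "$rel"
            id= rev= site= rel=
        fi
    done
)
```

Because the lines are rejoined first, this also copes with a tag that wraps across two lines, as the Part element in the sample does, and with several elements on one line.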

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/

Enrique Perez-Terron

2005-10-24, 3:45 pm

On Thu, 20 Oct 2005 19:13:05 +0200, <karthik.prabaharan@gmail.com> wrote:

> I have an Input file ... in xml format:
> ...


#!/usr/bin/perl

$format = "%-20s %15s %-14s %-10s\n";
@titles = ("Item ID", "Item Revision", "Owning Site", "Release");

printf $format, @titles;

%row = ();    # values collected for the current record

while (<>) {
    chomp;
    if (/<UserValue /i) {
        ($value) = /\svalue="(.*?)"/i or die "UserValue with no value at line $.: $_\n";
        ($title) = /\stitle="(.*?)"/i or die "UserValue with no title at line $.: $_\n";
        # Keep only the wanted titles; strings are compared with eq, not ==.
        $row{$title} = $value if grep { $_ eq $title } @titles;
    }
    # The sample data closes each record with </Part>.
    if (/<\/part\s*>/i) {
        printf $format, map { $row{$_} } @titles;
        %row = ();
    }
}


XML tag and attribute names are case-sensitive, so you can drop the "i"
modifier after the closing "/" of each pattern, as long as the pattern uses
the same case as the file (e.g. "Part", not "part").

-Enrique

bsh

2005-10-24, 3:45 pm


karthik.prabaharan@gmail.com wrote:
> I have an Input file ... in xml format:
> ...
> Thanks, Karthik


In addition to all the previous excellent suggestions, I should
probably round out the choices with an XML parser already
written -- and debugged! -- by Steve Coile and Aharon Robbins:

XMLparse.awk 1.1
ftp://ftp.freefriends.org/arnold/Awkstuff/xmlparser.awk
http://groups-beta.google.com/group...e2fb62778b31600

=Brian

bsh

2005-10-24, 3:45 pm

bsh wrote:
> karthik.prabaharan@gmail.com wrote:

Or maybe even the XMLgawk patch to gawk(1):

https://sourceforge.net/projects/xmlgawk/
http://home.vrweb.de/~juergen.kahrs/gawk/XML/

=Brian


Copyright 2003 - 2008 webservertalk.com