Unix administration - Uncompressing files coming out of cpio -o on the fly??







Jeff

2004-01-23, 5:03 pm

I'm trying to move lots of compressed Oracle datafiles from one system
to another. The fastest scheme I've come up with so far is to use a
combination of netcat and cpio on both sides. I'm transferring many
files (around a hundred or so) across 20 "streams".

Here's the setup:

I have multiple files in a directory on the SOURCE server:

/dir/file.1.Z
/dir/file.2.Z
/dir/file.3.Z
...

On that machine I'm running:
cd /dir ; find . | cpio -oa -H odc | nc -w 5 TARGET 1310


On the TARGET server, I start this a few minutes beforehand:

cd /dir ; nc -l -p 1310 | cpio -ium &


The cpio/nc sits there on the TARGET server until it gets a connection
and then it begins dumping files out in /dir.

What I want to do is uncompress those files on the fly and still have
the filenames, etc.

Can anyone think of a way to do this? I'm still working on it myself,
but maybe someone could think up a shortcut.

One thing I thought of was writing a script that loops around all of
the directories and uncompresses files that are done writing (I could
store all filesizes in an array and check it every minute or two, or I
could transfer filesizes ahead of time from SOURCE to TARGET). I
suppose this would work but I'd much rather have it all in one command
on the TARGET machine.
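A minimal sketch of the "transfer file sizes ahead of time" variant described above. It assumes a POSIX shell (on Solaris, /usr/xpg4/bin/sh or ksh rather than the old /bin/sh), a manifest of "SIZE NAME" lines sent over from SOURCE, and file names without blanks. The function name `uncompress_finished`, the manifest name `sizes.txt` and the `UNCOMPRESS` override are hypothetical.

```sh
# Uncompress each .Z file once its size on TARGET matches the size
# recorded on SOURCE. Assumed manifest format: one "SIZE NAME" line per
# file, NAME relative to the current directory, no blanks in NAME.
#
# On SOURCE the manifest could be built with, e.g.:
#   cd /dir && find . -type f -name '*.Z' |
#     while read f; do echo $(wc -c < "$f") "$f"; done > sizes.txt

UNCOMPRESS=${UNCOMPRESS:-uncompress}   # command that expands one .Z file

# One pass over the manifest; returns 0 once every file has been expanded.
uncompress_finished() {
    pending=0
    while read size name; do
        if [ ! -f "$name" ]; then
            # .Z gone and plain file present: expanded on an earlier pass.
            # Neither present: cpio has not reached this file yet.
            [ -f "${name%.Z}" ] || pending=1
            continue
        fi
        have=$(( $(wc -c < "$name") ))   # arithmetic strips wc padding
        if [ "$have" -eq "$size" ]; then
            $UNCOMPRESS "$name" || pending=1
        else
            pending=1                    # cpio is still writing it
        fi
    done < "$1"
    return $pending
}

# Poll once a minute until everything is done:
#   cd /dir && until uncompress_finished sizes.txt; do sleep 60; done
```

This keeps everything on TARGET in one loop, at the cost of relying on the sizes being exact; a file that happens to reach its final size before cpio has closed it would be uncompressed slightly early.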

Thanks a bunch!

Jeff
Frederico Fonseca

2004-01-23, 5:03 pm

On 22 Sep 2003 08:13:25 -0700, spam@lightweb.net (Jeff) wrote:
[snip]



Unless David's suggestion does what you require, maybe the following will help.

If you can run an FTP server on your "origin" machine, I would
use the latest version of ncftp (www.ncftp.com) for the transport.

With this version you can build scripts like the following.

Script1.

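# $1 = local directory, $2 = remote directory
ncftpget -u user -p password -R server-name "$1" "$2"
# uncompress every .Z file that arrived, including those in subdirectories
find "$1" -name '*.Z' -exec uncompress {} \;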


Execute the script as
$ ./script1 /localdir /remotedir
to copy all files from /remotedir (and its subdirectories) to /localdir
and uncompress them immediately after the transfer.

------
Script2.

# $1 = local directory, $2 = remote path (quoted so the local shell does not expand it)
ncftpget -u user -p password server-name "$1" "$2"
uncompress "$1"/*.Z


Execute the script as
$ ./script2 /localdir /remotedir/remotefiles
to copy all /remotedir/remotefiles to /localdir
and uncompress them immediately after the transfer.




Frederico Fonseca
email: frederico_fonseca at syssoft-int.com
frogswallow

2004-01-23, 5:03 pm

Frederico Fonseca <real-email-in-msg-spam@email.com> wrote in message news:<sbdumvk8f193gv3cs4a1pdu1rkfbnu1nhp@4ax.com>...
> On 22 Sep 2003 08:13:25 -0700, spam@lightweb.net (Jeff) wrote:
>

<clip>

I would pipe your data to standard output/input using something
like...

(ssh login@hostname tar cvf - /put/path/here) | tar xvf - /put/target/here

I haven't tested this, but I think it should meet the criteria
you're looking for. If ssh is an option, as I assumed above, you
should also be able to do a recursive scp. From what your previous
command suggested, the files are located in one directory.

Hope that helps.
Ted
Tim Haynes

2004-01-23, 5:03 pm

thomas_e_ward@yahoo.com (frogswallow) writes:

> <clip>
>
> I would pipe your data to standard output/input using something
> like...
>
> (ssh login@hostname tar cvf - /put/path/here) | tar xvf - /put/target/here
>
> I haven't tested to see that this would work, but I think that it should
> meet the criteria you're looking for. If ssh is an option like I assumed
> above, you should also be able to do a recursive scp. The files are
> located in one directory, from what your previous command suggested.



It depends what you're aiming for. If the files are sizeable and the
network is fast, then find, cpio and nc are the best way to saturate the
network connection.

There are a few amendments I'd make to your command above:

a) stylistic - the subshell () is unnecessary;

b) indeterminate compression-level; I use a small amount of compression for
interactive sessions, but `-oCompression=no' for bulk transfers - we don't
want to waste CPU doing both compression *and* encryption when it could be
kicking data out over the LAN (of course, this is the exact opposite
problem if you're running over a slow remote link; then you want maximum
compression instead);

c) the finger-memorized `-cvf' should probably just be `cf'; we don't want
to waste time printing each filename through the ssh session that's being
used for bulk transfer when it's already being printed as the files are
unpacked

d) the paths do not line up well (and you're missing a `-C'). If you run
your above command, then the created tar-file will have files
./put/path/here
./put/path/here/fileone
./put/path/here/filetwo
...
in it; now what you've written will unpack that "tar-file" looking only for
the file `/put/target/here'. Assuming you meant `-C /put/target/here', that
will relocate the contents of the tarball into `/put/target/here', thus
creating files:
/put/target/here/put/path/here
/put/target/here/put/path/here/fileone
/put/target/here/put/path/here/filetwo
...
etc.
So you probably wanted to fix the source tar-file, so it contains the
least-required number of parent-directories.

Summarizing all the above:

| ssh -oCompression=no user@remotebox "cd /base/dir; tar cf - ." | \
| tar xvpf - -C /local/basedir

is my replacement suggestion. (I suspect you can use -C on the first tar
command, too, but that's not quite as logical to understand, IMO.)
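A local illustration of point (d), assuming GNU tar: the pipe stands in for the ssh link, and the directory names under a temporary directory are hypothetical. It shows how the paths stored in the archive decide where files land under `-C`.

```sh
# Build a small source tree in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/base/dir" "$demo/out1" "$demo/out2"
echo one > "$demo/base/dir/fileone"
echo two > "$demo/base/dir/filetwo"

# 1. Archive the directory by its full path. GNU tar drops the leading "/"
#    (with a warning, silenced here), so the whole path is recreated
#    underneath the -C directory.
tar cf - "$demo/base/dir" 2>/dev/null | tar xpf - -C "$demo/out1"

# 2. Archive "." from inside the directory, as in the summary command:
#    the files land directly in the -C directory.
(cd "$demo/base/dir" && tar cf - .) | tar xpf - -C "$demo/out2"

find "$demo/out1" "$demo/out2" -type f
```

GNU tar also accepts `-C` on the create side (`tar cf - -C /base/dir .`), which is the variant suspected to work above; other tar implementations are worth checking against their own man pages.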

HTH - if not, it's a list of things to consider for posterity

~Tim
--
23:27:46 up 108 days, 14:05, 9 users, load average: 0.12, 0.26, 0.18
piglet@stirfried.vegetable.org.uk |Not every discomfort should
http://spodzone.org.uk/cesspit/ |be criminalised. (Bill Unruh)
Jeff

2004-01-23, 5:03 pm

Tim Haynes <usenet-20030922@stirfried.vegetable.org.uk> wrote in message news:<86d6dswisi.fsf@potato.vegetable.org.uk>...
[snip]





Well, I gave up. I couldn't find a way to uncompress the
previously compressed files on the fly. I realize I could write an app
to do it, but it's not worth the trouble right now.

Our situation is this:

We have two E6500 servers with 24+ processors each. They're connected
via gigabit ethernet. There are 700 gigs of oracle datafiles, in about
200 files I believe. It was necessary to compress them because of
disk-space constraints on the source server. The backups are cold. We
take those cold files and transfer them across and then decompress
them on the target server.

I realized yesterday that this is only temporary, because we will be
switching to transferring the original, uncompressed datafiles across
the network. I went and snagged afio and noticed it has an option to
compress each file inside the archive on the fly. You can even change
that to encrypt or whatever. Unfortunately, it won't decompress files
that were compressed before the archive was created. I tried that.

What I'm going to try is executing 20 simultaneous find | afio | nc
combinations, using gzip compression on the fly for each file.
Because each file is compressed separately, a failure should hopefully
corrupt only one file rather than the entire archive. Let's hope our boxes can stand up to it. Our
backup window is 6 hours.
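A sketch of fanning the transfer out over N streams, shown with cpio as in the original command rather than afio, whose per-file compression options are not reproduced here. It assumes that ports BASEPORT..BASEPORT+N-1 are free, that a matching `nc -l -p PORT | cpio -ium` listener is already running on TARGET for each port, and that file names contain no newlines. The function names, the `list.K` files and the `DRY_RUN` switch are hypothetical; by default the pipelines are only printed.

```sh
# Split a file list (on stdin) round-robin into list.0 .. list.(N-1)
# in the current directory.
split_lists() {            # usage: split_lists N < filelist
    awk -v n="$1" '{ print > ("list." ((NR - 1) % n)) }'
}

# Start one cpio | nc pipeline per non-empty list, each on its own port.
start_streams() {          # usage: start_streams N TARGET BASEPORT
    n=$1 target=$2 base=$3
    k=0
    while [ "$k" -lt "$n" ]; do
        if [ -s "list.$k" ]; then
            cmd="cpio -oa -H odc < list.$k | nc -w 5 $target $((base + k))"
            if [ "${DRY_RUN:-1}" = 1 ]; then
                echo "$cmd"                # dry run: show the pipeline
            else
                sh -c "$cmd" &             # real run: start it in background
            fi
        fi
        k=$((k + 1))
    done
    wait                                   # block until every stream ends
}

# Example:
#   cd /dir && find . -type f -name '*.Z' | split_lists 20
#   start_streams 20 TARGET 1310               # print the 20 pipelines
#   DRY_RUN=0 start_streams 20 TARGET 1310     # actually run them
```

Round-robin balances the number of files, not bytes; with datafiles of very different sizes, some streams will finish well before others.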


This is all preparation for a migration from the source to the target
server. We're getting off the old direct-attached SAN and onto a
newer one on the target server, along with OS patches, etc.


Thanks for all the responses!


Jeff
Tim Haynes

2004-01-23, 5:03 pm

spam@lightweb.net (Jeff) writes:

[snip]

> Our situation is this:
>
> We have two E6500 servers with 24+ processors each. They're connected via
> gigabit ethernet. There are 700 gigs of oracle datafiles, in about 200
> files I believe.



OK, I neglected the specifics of your problem in my generic spiel.

Consider this:

for i in `ssh otherbox "cd /some/dir; ls -1"`
do
j=`echo $i | sed 's/\.Z$//'`
ssh -oCompression=no otherbox "zcat /some/dir/$i" > $j
echo $i to $j done
done

IOW, you get the list of files from the remote box (should run reasonably
quickly), and for each one, hop over there and uncompress it, bringing it
back to a file (minus the trailing `.Z') "here".
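The loop above handles one file at a time. A sketch of the same idea running several transfers at once, closer to the 20 streams from the original post, assuming a POSIX shell and file names without blanks. `REMOTE`, `REMOTE_DIR`, `MAXJOBS`, `fetch_one` and `fetch_all` are hypothetical names; `fetch_one` can be redefined to test the loop without ssh.

```sh
REMOTE=${REMOTE:-otherbox}
REMOTE_DIR=${REMOTE_DIR:-/some/dir}
MAXJOBS=${MAXJOBS:-4}

# Stream one file, uncompressing it on the remote side.
fetch_one() {
    ssh -oCompression=no "$REMOTE" "zcat $REMOTE_DIR/$1"
}

# Fetch each named file into the current directory, minus its .Z suffix,
# running up to MAXJOBS transfers per batch.
fetch_all() {              # usage: fetch_all NAME...
    running=0
    for i in "$@"; do
        j=${i%.Z}                       # strip the trailing .Z
        fetch_one "$i" > "$j" &
        running=$((running + 1))
        if [ "$running" -ge "$MAXJOBS" ]; then
            wait                        # let the whole batch finish
            running=0
        fi
    done
    wait                                # finish the last partial batch
}

# Example, run in the destination directory:
#   fetch_all $(ssh "$REMOTE" "cd $REMOTE_DIR; ls -1")
```

This is a simple batch barrier rather than a true job pool: each batch waits for its slowest file before the next one starts.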

Any use to man or Sun?

~Tim
--
18:41:35 up 109 days, 9:19, 10 users, load average: 0.07, 0.07, 0.08
piglet@stirfried.vegetable.org.uk |April comes to the new grass
http://spodzone.org.uk/cesspit/ |On the hills of gold
Ian Fitchet

2004-01-23, 5:03 pm

spam@lightweb.net (Jeff) writes:

> on the TARGET server I'm running this a few minutes before:
>
> cd /dir ; nc -l -p 1310 | cpio -ium &



When this was first posted, I had an idea about just sticking a -t
flag on the cpio and piping cpio into xargs uncompress, which was
rubbish on two counts: firstly, -t means do not create files, and
secondly, you can't be sure when the file has been completely created.

Then it occurred to me that not only does -v also print the file name
(d'oh! -- in my defence I rarely use verbose flags) but, thanks to the
serialisation of cpio, you know the previous file has been completely
created when you see the next file name. Cue awk and everyone's
favourite example of handling the case where the current line is not
the same as the previous one. Upgrade to nawk/gawk to make a call to
an external program and your problem is solved.

Well, it works on a trivial example for me.

Oh, and I chose to use NR rather than...

... | cpio -iumv | nawk 'NR > 1 { system("uncompress " p) } { p = $1 } END { if (NR > 0) system("uncompress " p) }'

And if you still want the file names you can throw a tee into the
pipeline or a print into the awk.
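To see the lag-by-one logic without a cpio archive, here is a sketch that feeds a list of names through an awk program shaped like the one above, printing the commands instead of calling system(). The filter on names ending in .Z is an addition (it skips entries such as the "." directory that `find .` emits), and the function name `lagged` is hypothetical. In real use the names come from cpio -iv; with some cpio implementations (GNU cpio, as far as I know) the verbose listing goes to standard error, so a 2>&1 may be needed before the pipe.

```sh
# Announce each name as it arrives, but act on the *previous* name,
# which cpio has finished writing by the time the next one shows up.
lagged() {
    awk '/\.Z$/ { print "received " $1
                  if (n++ > 0) print "  uncompress " p
                  p = $1 }
         END    { if (n > 0) print "  uncompress " p }'
}

printf '%s\n' . file.1.Z file.2.Z file.3.Z | lagged
# received file.1.Z
# received file.2.Z
#   uncompress file.1.Z
# received file.3.Z
#   uncompress file.2.Z
#   uncompress file.3.Z
```

The last file is only handled in the END block, i.e. once cpio has exited, which is exactly when it is guaranteed to be complete.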

Cheers,

Ian


Copyright 2003 - 2008 webservertalk.com