Unix Programming - File Transfers with zlib







bwaichu@yahoo.com

2006-09-19, 1:36 am

I am in the process of writing a simple file transfer program.
I have successfully sent a file to netcat using this idiom:

while ((size = read(file_no, buf, BUFSIZ)) != -1 && size != 0) {
    if ((write(sockfd, buf, size)) <= 0)
        break;
    sent += size;
}

Now, I would like to be able to send a file that is compressed on one
end and decompressed on the other. My first choice is to use zlib.

But I have two questions:

1) How do I determine the best buffer size to send and compress with?
I intend to be sending files of approximately 300 megabytes. Is there a
good way to test buffer sizes, so I can make an educated decision?

2) Are there better alternatives to read() and write() in this scenario
over a TCP connection?

Thanks.

joe@invalid.address

2006-09-19, 1:32 pm

"bwaichu@yahoo.com" <bwaichu@yahoo.com> writes:

> I am in the process of writing a simple file transfer program.
> I have successfully sent a file to netcat using this idiom:
>
> while ((size = read(file_no, buf, BUFSIZ)) != -1 && size != 0) {
>     if ((write(sockfd, buf, size)) <= 0)
>         break;
>     sent += size;
> }
>
> Now, I would like to be able to send a file that is compressed on one
> end and decompressed on the other. My first choice is to use zlib.
>
> But I have two questions:
>
> 1) how do I determine the best size buffer to send and compress with?
> I intend to be sending files that approximate 300 megabytes. Is there a
> good way to test buffer sizes, so I can make an educated decision?


The zlib documentation gives guidelines for buffer size. You can
compress/uncompress in chunks so you don't have to buffer the whole
300 MB. See the zlib documentation at http://www.zlib.net. There's
some sample code with annotations at http://www.zlib.net/zlib_how.html

> 2) are there better alternatives to read() and write() in this scenario
> over a TCP connection?


That depends on your requirements.

Joe

James Antill

2006-09-19, 7:26 pm

On Mon, 18 Sep 2006 22:21:51 -0700, bwaichu@yahoo.com wrote:

> I am in the process of writing a simple file transfer program.
> I have successfully sent a file to netcat using this idiom:
>
> while ((size = read(file_no, buf, BUFSIZ)) != -1 && size != 0) {
>     if ((write(sockfd, buf, size)) <= 0)
>         break;
>     sent += size;


This is wrong: write() on a socket can easily return a value greater
than 0 but less than size (a partial write). You also don't want to exit
immediately if read() returns -1 (check errno and decide what to do,
e.g. retry on EINTR).
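A minimal sketch of the loop with both points addressed, assuming blocking descriptors: write_all() keeps calling write() until the whole buffer has gone out, and EINTR from either call is retried rather than treated as a failure. The helper names write_all and copy_fd are hypothetical.

```c
#include <errno.h>
#include <stdio.h>      /* BUFSIZ */
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes to fd. write() on a socket may accept fewer bytes
 * than requested, so keep going until everything has been written. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)          /* interrupted by a signal: retry */
                continue;
            return -1;                   /* real error: caller checks errno */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy file_no to sockfd; returns the number of bytes sent, or -1. */
long long copy_fd(int file_no, int sockfd)
{
    char buf[BUFSIZ];
    long long sent = 0;

    for (;;) {
        ssize_t size = read(file_no, buf, sizeof buf);
        if (size == 0)
            return sent;                 /* end of file */
        if (size < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(sockfd, buf, (size_t)size) != 0)
            return -1;
        sent += size;
    }
}
```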

> }
>
> Now, I would like to be able to send a file that is compressed on one
> end and decompressed on the other. My first choice is to use zlib.


I'm hoping you mean "use zlib to produce gzip-compatible output"; using
the plain zlib format is a bad idea.

> 2) are there better alternatives to read() and write() in this scenario
> over a TCP connection?


For file transfer, you _might_ want to compress to disk and then
sendfile() the result. This will win big if you keep the result around and
can just re-send it if someone else requests the same file.
Note that you might have latency requirements that dictate not doing that
(at least for the first request).

Also you might want to use send() with the MSG_MORE flag instead of
write() (but personally, I find just setting the TCP_CORK option on the
connection and using write() easier).
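A minimal Linux sketch of the TCP_CORK alternative: set the option, issue several write() calls, then clear it, which flushes any partially filled segment. tcp(7) also notes a 200 ms ceiling on how long corked output is held back. set_cork is a hypothetical helper name.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>    /* TCP_CORK: Linux-specific */
#include <sys/socket.h>

/* Turn corking on (1) or off (0). While corked, the kernel holds back
 * partial frames so several small write()s go out as full segments;
 * clearing the option sends whatever is still queued. */
int set_cork(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on);
}
```

Typical use is set_cork(sock, 1), then write() the header and the data, then set_cork(sock, 0).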

--
James Antill -- james@and.org
http://www.and.org/and-httpd

bwaichu@yahoo.com

2006-09-20, 1:32 am


James Antill wrote:
>
> I'm hoping you mean "use zlib to produce gzip compatible compressions",
> using straight zlib is a bad idea.


Why is it a bad idea? Can you demonstrate why it's a bad idea? OpenSSH
uses zlib compression with a fixed 4096-byte buffer, and the compression
functions in OpenSSH are pretty simple.

>
>
> For file transfer, you _might_ want to compress to disk and then
> sendfile() the result. This will win big if you keep the result around and
> can just re-send it if someone else requests the same file.
> Note that you might have latency requirements that dictate not doing that
> (at least for the first request).


This would also eat up a lot of disk space. Let's say I want to send 10
files that are all about 200-300 megabytes each. Why would I want to
compress them and save the compressed copies to disk? What if a file is
the same size after compression because it is an AVI? That just adds
disk I/O for no reason.

> Also you might want to use send() instead of write() with the MSG_MORE
> flag (but personally, I find just setting the CORK flag on the connection
> and using write() is easier).


Please explain.

James Antill

2006-09-20, 7:52 pm

On Tue, 19 Sep 2006 17:40:49 -0700, bwaichu@yahoo.com wrote:

>
> James Antill wrote:
>
> Why is it a bad idea? Can you demonstrate why it's a bad idea?
> openSSH uses zlib compression with a 4096 fixed size buffer. And the
> compression functions in openSSH are pretty simple.


I'd argue it's always bad due to interoperability. The zlib format is
needlessly different from plain gzip, and plain gzip can be read by many
things. This is especially important, IMO, for a file server where you
might actually want to fetch the compressed version and uncompress it
later (there are no standard tools to uncompress raw zlib data).

>
> This would also eat up a lot of disk space. Let's say I want to send
> 10 files that are all about 200-300 megabytes each. Why would I want to
> compress them and save the compression to disk?


Latency and throughput.
If one of those files gets requested twice, that would be a huge saving.
And disk space is very cheap compared to network latency.

> What if the file I am sending
> after being compressed is the same size due to it being an avi? This
> just adds disk I/O for no reason.


Then don't keep it, but keep the knowledge that it isn't worth
compressing. Then you'll have better latency with almost no space usage.

>
> Please explain.


From "man 2 send":

MSG_MORE (Since Linux 2.4.4)
The caller has more data to send. This flag is used with TCP
sockets to obtain the same effect as the TCP_CORK socket option
(see tcp(7)), with the difference that this flag can be set on a
per-call basis.

For TCP_CORK documentation, see "man 7 tcp".
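A minimal sketch of the per-call MSG_MORE variant, which is Linux-specific per the man page quoted above: the header is sent with MSG_MORE so the kernel may hold it until the body follows, and the final send() omits the flag so the data is pushed out. The function names send_all and send_header_and_body are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* send() until all len bytes are gone, passing flags on every call. */
static int send_all(int sock, const char *p, size_t len, int flags)
{
    while (len > 0) {
        ssize_t n = send(sock, p, len, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* MSG_MORE on the header: more data follows, so don't push a small
 * segment yet. No flag on the body: the queued data is sent. */
int send_header_and_body(int sock, const char *hdr, size_t hlen,
                         const char *body, size_t blen)
{
    if (send_all(sock, hdr, hlen, MSG_MORE) != 0)
        return -1;
    return send_all(sock, body, blen, 0);
}
```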

--
James Antill -- james@and.org
http://www.and.org/and-httpd

bwaichu@yahoo.com

2006-09-21, 1:35 am


James Antill wrote:

> I'd argue it's always bad due to interoperability. zlib is needlessly
> different from straight gzip, and straight gzip can be read by many
> things. This is esp. important, IMO, for a file server where you might
> actually want to get the compressed version and then uncompress it (there
> are no std. tools to uncompress zlib).


I would uncompress the packets as they arrived and write the result to
a file. Let me experiment more on this one.

>
> From "man 2 send":
>
> MSG_MORE (Since Linux 2.4.4)
> The caller has more data to send. This flag is used with TCP
> sockets to obtain the same effect as the TCP_CORK socket option
> (see tcp(7)), with the difference that this flag can be set on a
> per-call basis.
>
> ...for TCP_CORK documentation see "man 7 tcp".


I am not finding MSG_MORE in the man pages on my OpenBSD box. I just
searched sys/socket.h, and I'm not finding it there either. Is that flag
only available on Linux?

James Antill

2006-09-21, 1:25 pm

On Wed, 20 Sep 2006 22:33:36 -0700, bwaichu@yahoo.com wrote:

> James Antill wrote:
>
> I am not finding MSG_MORE on my openBSD box man pages. I just searched
> sys/socket.h, and I'm not finding it there either. Is that a flag only
> available to linux?


TCP_CORK was initially only in Linux, but has been added to Solaris. As
far as I know none of the BSDs have added support for it though.
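Given that portability gap, one hedged approach is to compile the corking call only where TCP_CORK is defined and fall back to plain writes elsewhere. A minimal sketch; try_set_cork is a hypothetical name, and on systems without the option the caller would simply batch data into larger write() calls.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Cork the socket where TCP_CORK exists; elsewhere do nothing.
 * Returns 1 if the option was set, 0 if unsupported, -1 on error. */
int try_set_cork(int sock, int on)
{
#ifdef TCP_CORK
    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on) != 0)
        return -1;
    return 1;
#else
    (void)sock;                 /* unused on this platform */
    (void)on;
    return 0;
#endif
}
```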

--
James Antill -- james@and.org
http://www.and.org/and-httpd








Copyright 2003 - 2008 webservertalk.com