Unix Programming - Parsing ZIP headers


Author Parsing ZIP headers
Jem Berkes

2004-01-26, 11:34 am

I develop a GNU GPL'd mail tool for UNIX that filters attachments by
detecting dangerous filenames. It works precisely as expected, but one
situation I'm trying to find a solution for is the case where dangerous
filenames are compressed inside an attached ZIP file.

I realize that there are libraries for ZIP handling, but I prefer to avoid
such thorough external libraries when all I really need to do is peek
inside the ZIP header to read the filenames contained within.

Is this a relatively easy thing to do? Can anyone point me to any documents
that can help me learn what the header looks like? (I'm guessing there are
various versions of it).

--
Jem Berkes
http://www.sysdesign.ca/
Tim Haynes

2004-01-26, 7:34 pm

Jem Berkes <jb@users.pc9.org> writes:
quote:

> I realize that there are libraries for ZIP handling, but I prefer to
> avoid such thorough external libraries when all I really need to do is
> peek inside the ZIP header to read the filenames contained within.
>
> Is this a relatively easy thing to do? Can anyone point me to any
> documents that can help me learn what the header looks like? (I'm
> guessing there are various versions of it).



I believe it must be easy...

I just opened a .zip file in emacs, and it gave me exactly this data, an
index of what's in the file. Running `strings' on the blighter, the
filenames are present both between the individual files' compressed data,
and in a small paragraph right at the tail of the output (search for `@', 6
bytes of unprintable gibberish[0], then the filename - these things may
well occur at fixed offsets back from the end of the file, too).

[0] at a guess, this, plus the bits after the name, might contain either
offsets into the zip file for where it starts, and/or hints as to which
algorithm was used (store, explode, expand,..).
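
As a rough illustration of the "small paragraph right at the tail" described above: in the published ZIP format that tail ends with an "end of central directory" record, which starts with the bytes PK\x05\x06 and is 22 bytes long plus an optional comment. Because of that comment the record is not at a strictly fixed offset from the end, so a search backwards is needed. The function name find_eocd is made up for this sketch, and ZIP64 archives are not handled.

```python
import struct

EOCD_SIG = b"PK\x05\x06"
# signature, disk no., disk with directory, entries on this disk,
# total entries, directory size, directory offset, comment length
EOCD_FMT = "<4sHHHHIIH"
EOCD_SIZE = struct.calcsize(EOCD_FMT)  # 22 bytes


def find_eocd(data):
    """Return (record_offset, entry_count, dir_size, dir_offset) or None.

    Hypothetical helper: looks for the last end-of-central-directory
    signature within the final 22 + 65535 bytes of `data`.
    """
    # The trailing comment is at most 0xFFFF bytes, which bounds the search.
    start = max(0, len(data) - EOCD_SIZE - 0xFFFF)
    pos = data.rfind(EOCD_SIG, start)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        return None
    fields = struct.unpack_from(EOCD_FMT, data, pos)
    entry_count, dir_size, dir_offset = fields[4], fields[5], fields[6]
    return pos, entry_count, dir_size, dir_offset
```

The returned directory offset points at the index of filenames; the values are whatever the archive claims, so a filter should bounds-check them before trusting them.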

HTH?

~Tim
--
08:19:20 up 55 days, 11:32, 0 users, load average: 0.09, 0.18, 0.12
piglet@stirfried.vegetable.org.uk |Cries of mercy rise like rockets
http://spodzone.org.uk/cesspit/ |Through the paths of the redeemed
Jem Berkes

2004-01-27, 1:34 am

>> I realize that there are libraries for ZIP handling, but I prefer to
quote:

>
> I believe it must be easy...
>
> I just opened a .zip file in emacs, and it gave me exactly this data,
> an index of what's in the file. Running `strings' on the blighter, the
> filenames are present both between the individual files' compressed
> data, and in a small paragraph right at the tail of the output (search
> for `@', 6 bytes of unprintable gibberish[0], then the filename -
> these things may well occur at fixed offsets back from the end of the
> file, too).
>
> [0] at a guess, this, plus the bits after the name, might contain
> either offsets into the zip file for where it starts, and/or hints as
> to which algorithm was used (store, explode, expand,..).



Interesting, you're right: the ASCII filename is pretty clearly in there.
Hmm, this may be much easier than I thought. If a spec doesn't turn up
I'll 'reverse engineer' the process from empirical data; there are so many
example ZIP files out there to make this easy.

Hopefully renattach 1.2.1 will be able to look for dangerous filenames
inside ZIP files then.

--
Jem Berkes
http://www.sysdesign.ca/
Robert Harris

2004-01-27, 2:34 am

At:

<http://www.info-zip.org/doc/appnote-iz-latest.zip>

There is a ZIPped spec of the ZIP format.

Robert

Colin McKinnon

2004-01-27, 9:35 am

Jem Berkes spilled the following:
quote:

> I develop a GNU GPL'd mail tool for UNIX that filters attachments by
> detecting dangerous filenames. It works precisely as expected, but one
> situation I'm trying to find a solution for is the case where dangerous
> filenames are compressed inside an attached ZIP file.
>



Last time I looked zip files were compressed tar archives (using good old
fashioned 'compress'). So tar -tvZf should do the trick.

Beware though, a fun way to hurt a mailscanner is to send it a zip bomb -
ISR it's possible to get compression ratios of >> 1000:1 using (e.g.) a file
of null characters. (have a google for 42.zip)
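
One cheap precaution against that, sketched here under assumptions: the ZIP central directory records a compressed and an uncompressed size for every entry, so a scanner can compute the declared ratio without decompressing anything. The names declared_sizes and worst_ratio are invented for this example, ZIP64 is ignored, and the sizes are only what the archive claims, so this is a first-pass check rather than a guarantee.

```python
import struct

EOCD_FMT = "<4sHHHHIIH"           # end of central directory record (22 bytes)
CDH_FMT = "<4sHHHHHHIIIHHHHHII"   # central directory file header (46 bytes)
CDH_SIZE = struct.calcsize(CDH_FMT)


def declared_sizes(data):
    """Yield (compressed_size, uncompressed_size) for each directory entry."""
    start = max(0, len(data) - 22 - 0xFFFF)
    eocd = data.rfind(b"PK\x05\x06", start)
    if eocd < 0 or eocd + 22 > len(data):
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FMT, data, eocd)
    count, pos = fields[4], fields[6]
    for _ in range(count):
        if pos + CDH_SIZE > len(data):
            raise ValueError("central directory runs past end of data")
        hdr = struct.unpack_from(CDH_FMT, data, pos)
        if hdr[0] != b"PK\x01\x02":
            raise ValueError("bad central directory signature")
        yield hdr[8], hdr[9]
        # Skip the fixed header plus filename, extra field and comment.
        pos += CDH_SIZE + hdr[10] + hdr[11] + hdr[12]


def worst_ratio(data):
    """Return the largest declared uncompressed/compressed ratio."""
    worst = 0.0
    for csize, usize in declared_sizes(data):
        if usize == 0:
            continue  # empty entries cannot expand
        # A non-empty entry with zero compressed size is treated as infinite.
        worst = max(worst, usize / csize if csize else float("inf"))
    return worst
```

A filter could refuse or quarantine anything whose worst_ratio exceeds some locally chosen limit; the right limit is a policy decision and not something the format defines.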

C.
Felix Tilley

2004-01-27, 3:34 pm

In article <Xns947CBCE5D39A2jbuserspc9org@130.179.16.24>, Mon, 26 Jan 2004
17:34:09 -0700, "Jem Berkes" <jb@users.pc9.org> wrote:
quote:

> I develop a GNU GPL'd mail tool for UNIX that filters attachments by
> detecting dangerous filenames. It works precisely as expected, but one
> situation I'm trying to find a solution for is the case where dangerous
> filenames are compressed inside an attached ZIP file.
>
> I realize that there are libraries for ZIP handling, but I prefer to
> avoid such thorough external libraries when all I really need to do is
> peek inside the ZIP header to read the filenames contained within.
>
> Is this a relatively easy thing to do? Can anyone point me to any
> documents that can help me learn what the header looks like? (I'm
> guessing there are various versions of it).
>



From Linux:

man zip
man unzip

Otherwise, there may be better places to ask the question.
Unfortunately, I do not know where to ask the question either.

I know that some ISPs scan ZIP files for viruses. I don't remember which
ones. I think some corporations do it too.

I do not think I have been very helpful.

--

Felix Tilley
Rank: Capt
Fanatic Lartvocate
FL# 555-LART
Bjørn Augestad

2004-01-27, 6:34 pm

Jem Berkes wrote:
quote:

> I develop a GNU GPL'd mail tool for UNIX that filters attachments by
> detecting dangerous filenames. It works precisely as expected, but one
> situation I'm trying to find a solution for is the case where dangerous
> filenames are compressed inside an attached ZIP file.
>
> I realize that there are libraries for ZIP handling, but I prefer to avoid
> such thorough external libraries when all I really need to do is peek
> inside the ZIP header to read the filenames contained within.
>
> Is this a relatively easy thing to do? Can anyone point me to any documents
> that can help me learn what the header looks like? (I'm guessing there are
> various versions of it).
>



Google is your friend. ;-)

I googled for "zip file format" and found this document.
http://www.pkware.com/products/ente...rs/appnote.html



HTH
Bjørn


--
The world's fastest web server is now available
at http://highlander.metasystems.no:2000. Enjoy!
Fredrik Roubert

2004-01-28, 1:36 am

On 27 Jan 2004 00:34:09 GMT, Jem Berkes wrote:
quote:

> I realize that there are libraries for ZIP handling, but I prefer to avoid
> such thorough external libraries when all I really need to do is peek
> inside the ZIP header to read the filenames contained within.
>
> Is this a relatively easy thing to do? Can anyone point me to any documents
> that can help me learn what the header looks like? (I'm guessing there are
> various versions of it).



Parsing the ZIP directory is quite easy, and I see that you've already
got pointers to documentation. If you want sample code as well, you can
download the zlib source code, and look at the program minizip in the
contrib directory:

http://www.gzip.org/zlib/
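
For a sense of how little code the directory walk takes, here is a minimal sketch that lists filenames by hand using the record layouts from the published format description. It is not taken from minizip; list_zip_names is a made-up name, ZIP64 and multi-disk archives are ignored, and decoding names as cp437 is an assumption (plain ASCII names come out the same either way).

```python
import struct

EOCD_FMT = "<4sHHHHIIH"           # end of central directory record (22 bytes)
CDH_FMT = "<4sHHHHHHIIIHHHHHII"   # central directory file header (46 bytes)
CDH_SIZE = struct.calcsize(CDH_FMT)


def list_zip_names(data):
    """Return the filenames recorded in the central directory of `data`."""
    # The record sits at the end, followed only by an optional comment
    # of at most 0xFFFF bytes.
    start = max(0, len(data) - struct.calcsize(EOCD_FMT) - 0xFFFF)
    eocd = data.rfind(b"PK\x05\x06", start)
    if eocd < 0 or eocd + 22 > len(data):
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FMT, data, eocd)
    count, pos = fields[4], fields[6]   # total entries, directory offset

    names = []
    for _ in range(count):
        if pos + CDH_SIZE > len(data):
            raise ValueError("central directory runs past end of data")
        hdr = struct.unpack_from(CDH_FMT, data, pos)
        if hdr[0] != b"PK\x01\x02":
            raise ValueError("bad central directory signature")
        name_len, extra_len, comment_len = hdr[10], hdr[11], hdr[12]
        raw = data[pos + CDH_SIZE : pos + CDH_SIZE + name_len]
        names.append(raw.decode("cp437", errors="replace"))
        # Each entry is the fixed header plus three variable-length fields.
        pos += CDH_SIZE + name_len + extra_len + comment_len
    return names
```

Reading the whole attachment into memory and passing the bytes in is enough for a mail filter, since the attachment has already been decoded at that point.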

Cheers // Fredrik Roubert

--
Möllevångsvägen 6c | +46 46 188127
SE-222 40 Lund | http://www.df.lth.se/~roubert/
Jem Berkes

2004-01-28, 2:38 am

>> I realize that there are libraries for ZIP handling, but I prefer to
quote:

>
> Google is your friend. ;-)
>
> I googled for "zip file format" and found this document.
> http://www.pkware.com/products/ente...rs/appnote.html



Thanks! I appreciate all the help, everyone.

--
Jem Berkes
http://www.sysdesign.ca/
Frank Pilhofer

2004-01-28, 1:35 pm

Jem Berkes <jb@users.pc9.org> wrote:
quote:

>
> Is this a relatively easy thing to do? Can anyone point me to any documents
> that can help me learn what the header looks like? (I'm guessing there are
> various versions of it).
>



www.wotsit.org

Frank

--
Frank Pilhofer ........................................... fp@fpx.de
How is it that people looking for a helping hand tend to overlook
the one at the end of their arm? - Alfred E. Neuman
Jem Berkes

2004-01-28, 2:35 pm

>> Is this a relatively easy thing to do? Can anyone point me to any
quote:

>
> www.wotsit.org



Thanks Frank, well that's a neat site and I can't believe I never ran
across it before. Bookmarked it now, probably will refer to it frequently
in the future!

--
Jem Berkes
http://www.sysdesign.ca/
Jem Berkes

2004-01-28, 3:35 pm

>>> Is this a relatively easy thing to do? Can anyone point me to any
quote:

>
> Thanks Frank, well that's a neat site and I can't believe I never ran
> across it before. Bookmarked it now, probably will refer to it
> frequently in the future!



This also makes me remember another advantage of coding my own
implementations of specifications (as opposed to using an existing
library). In renattach I use my own MIME parser which is more forgiving to
specification violations, which viruses often take advantage of.

For instance, I think there is at least one worm that distributed a corrupt
ZIP file. Does anyone have a copy of such a thing? I'm interested in this
because the virus author might be trying to send a corrupt format in order
to bypass (anal) parsers, if you see what I'm getting at.
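
A forgiving approach along those lines, as a sketch only: instead of trusting the index at the end of the archive, scan the whole attachment for local file header signatures (PK\x03\x04) and read the filename that follows each one. This still works when the trailing directory is missing or damaged. The name scan_local_names is invented for this example; the same four bytes can occur by chance inside compressed data, so false positives are possible, and names are decoded as cp437 by assumption.

```python
import struct

LFH_SIG = b"PK\x03\x04"
# signature, version needed, flags, method, mod time, mod date,
# crc32, compressed size, uncompressed size, name length, extra length
LFH_FMT = "<4sHHHHHIIIHH"
LFH_SIZE = struct.calcsize(LFH_FMT)  # 30 bytes


def scan_local_names(data):
    """Return filenames found after every local file header signature.

    Deliberately lenient: no central directory is required, and
    truncated or implausible headers are skipped rather than rejected.
    """
    names = []
    pos = data.find(LFH_SIG)
    while pos >= 0:
        if pos + LFH_SIZE <= len(data):
            hdr = struct.unpack_from(LFH_FMT, data, pos)
            name_len = hdr[9]
            raw = data[pos + LFH_SIZE : pos + LFH_SIZE + name_len]
            # Keep only names that are non-empty and fully present.
            if name_len > 0 and len(raw) == name_len:
                names.append(raw.decode("cp437", errors="replace"))
        # Continue searching just past this signature.
        pos = data.find(LFH_SIG, pos + 4)
    return names
```

For filtering purposes an occasional false positive is usually harmless, since a garbage "filename" is unlikely to match a dangerous extension, while a missed real entry is the case that matters.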

--
Jem Berkes
http://www.sysdesign.ca/