Parsing ZIP headers
| Jem Berkes 2004-01-26, 11:34 am |
| I develop a GNU GPL'd mail tool for UNIX that filters attachments by
detecting dangerous filenames. It works precisely as expected, but one
situation I'm trying to find a solution for is the case where dangerous
filenames are compressed inside an attached ZIP file.
I realize that there are libraries for ZIP handling, but I prefer to avoid
such thorough external libraries when all I really need to do is peek
inside the ZIP header to read the filenames contained within.
Is this a relatively easy thing to do? Can anyone point me to any documents
that can help me learn what the header looks like? (I'm guessing there are
various versions of it).
--
Jem Berkes
http://www.sysdesign.ca/
| |
| Tim Haynes 2004-01-26, 7:34 pm |
| Jem Berkes <jb@users.pc9.org> writes:
quote:
> I realize that there are libraries for ZIP handling, but I prefer to
> avoid such thorough external libraries when all I really need to do is
> peek inside the ZIP header to read the filenames contained within.
>
> Is this a relatively easy thing to do? Can anyone point me to any
> documents that can help me learn what the header looks like? (I'm
> guessing there are various versions of it).
I believe it must be easy... 
I just opened a .zip file in emacs, and it gave me exactly this data, an
index of what's in the file. Running `strings' on the blighter, the
filenames are present both between the individual files' compressed data,
and in a small paragraph right at the tail of the output (search for `@', 6
bytes of unprintable gibberish[0], then the filename - these things may
well occur at fixed offsets back from the end of the file, too).
[0] at a guess, this, plus the bits after the name, might contain either
offsets into the zip file for where it starts, and/or hints as to which
algorithm was used (store, explode, expand,..).
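For reference, the names that `strings' shows between the compressed data sit in fixed-layout "local file headers", and the guess in [0] is close: the bytes in front of each name record the compression method and the sizes, while the records at the tail (the central directory) also carry each entry's offset into the file. A minimal Python sketch of decoding one local header, assuming the layout published in PKWARE's APPNOTE (signature PK\x03\x04, 30 fixed bytes, little-endian). The helper name and the dictionary keys are made up for illustration.

```python
import struct

# Fixed 30-byte part of a local file header (little-endian):
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")

# Only the two most common method codes; anything else is shown as a number.
METHODS = {0: "stored", 8: "deflated"}

def read_local_header(data, offset=0):
    """Decode the local file header at `offset` (hypothetical helper)."""
    if offset + LOCAL_HEADER.size > len(data):
        raise ValueError("not enough data for a local file header")
    (sig, _version, _flags, method, _mtime, _mdate, _crc,
     csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("no local file header at this offset")
    name_start = offset + LOCAL_HEADER.size
    name = data[name_start:name_start + name_len]
    return {
        "name": name.decode("cp437"),  # assumes the classic DOS code page
        "method": METHODS.get(method, method),
        "compressed_size": csize,
        "uncompressed_size": usize,
        # The compressed bytes start after the name and the extra field.
        "data_offset": name_start + name_len + extra_len,
    }
```

Calling read_local_header(open("some.zip", "rb").read()) on an ordinary archive should describe its first entry, since the first local header normally starts at offset 0.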
HTH?
~Tim
--
08:19:20 up 55 days, 11:32, 0 users, load average: 0.09, 0.18, 0.12
piglet@stirfried.vegetable.org.uk |Cries of mercy rise like rockets
http://spodzone.org.uk/cesspit/ |Through the paths of the redeemed
| |
| Jem Berkes 2004-01-27, 1:34 am |
| quote:
>> I realize that there are libraries for ZIP handling, but I prefer to
>
> I believe it must be easy... 
>
> I just opened a .zip file in emacs, and it gave me exactly this data,
> an index of what's in the file. Running `strings' on the blighter, the
> filenames are present both between the individual files' compressed
> data, and in a small paragraph right at the tail of the output (search
> for `@', 6 bytes of unprintable gibberish[0], then the filename -
> these things may well occur at fixed offsets back from the end of the
> file, too).
>
> [0] at a guess, this, plus the bits after the name, might contain
> either offsets into the zip file for where it starts, and/or hints as
> to which algorithm was used (store, explode, expand,..).
Interesting, you're right: the ASCII filename is pretty clearly in there.
Hmm, this may be much easier than I thought. If a spec doesn't turn up,
I'll 'reverse engineer' the format from empirical data; there are so many
example ZIP files out there that this should be easy.
Hopefully renattach 1.2.1 will be able to look for dangerous filenames
inside ZIP files, then.
--
Jem Berkes
http://www.sysdesign.ca/
| |
| Colin McKinnon 2004-01-27, 9:35 am |
| Jem Berkes spilled the following:
quote:
> I develop a GNU GPL'd mail tool for UNIX that filters attachments by
> detecting dangerous filenames. It works precisely as expected, but one
> situation I'm trying to find a solution for is the case where dangerous
> filenames are compressed inside an attached ZIP file.
>
Last time I looked, compressed tar archives (using good old-fashioned
'compress') were .tar.Z files, and tar -tvZf lists those. ZIP is a separate
format with its own directory, so unzip -l should do the trick here.
Beware though: a fun way to hurt a mail scanner is to send it a zip bomb.
I seem to recall it's possible to get compression ratios of >> 1000:1 using
(e.g.) a file of null characters (have a Google for 42.zip).
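A scanner that only reads filenames never inflates anything, but one that does unpack entries can guard against this by capping the output instead of trusting the sizes declared in the headers. A minimal Python sketch of that idea, assuming the entry uses DEFLATE (method 8) and that the caller has already sliced the compressed bytes out of the archive. The function name and the default chunk size are illustrative, not from any existing tool.

```python
import zlib

def inflates_beyond(raw_deflate, limit, chunk=65536):
    """Return True if raw DEFLATE data expands to more than `limit` bytes.

    Output is produced at most `chunk` bytes at a time and the loop stops
    as soon as the limit is crossed, so a bomb is never fully unpacked.
    Malformed input raises zlib.error.
    """
    d = zlib.decompressobj(-15)  # negative wbits = raw deflate, as used in ZIP
    total = 0
    data = raw_deflate
    while not d.eof:
        out = d.decompress(data, chunk)
        total += len(out)
        if total > limit:
            return True
        data = d.unconsumed_tail
        if not out and not data:
            break  # input ran out before the stream ended (truncated data)
    return False
```

A caller might reject any attachment for which inflates_beyond(payload, 50 * 1024 * 1024) is true; the 50 MB figure is only an example threshold.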
C.
| |
| Felix Tilley 2004-01-27, 3:34 pm |
| In article <Xns947CBCE5D39A2jbuserspc9org@130.179.16.24>, Mon, 26 Jan 2004
17:34:09 -0700, "Jem Berkes" <jb@users.pc9.org> wrote:
quote:
> I develop a GNU GPL'd mail tool for UNIX that filters attachments by
> detecting dangerous filenames. It works precisely as expected, but one
> situation I'm trying to find a solution for is the case where dangerous
> filenames are compressed inside an attached ZIP file.
>
> I realize that there are libraries for ZIP handling, but I prefer to
> avoid such thorough external libraries when all I really need to do is
> peek inside the ZIP header to read the filenames contained within.
>
> Is this a relatively easy thing to do? Can anyone point me to any
> documents that can help me learn what the header looks like? (I'm
> guessing there are various versions of it).
>
From Linux:
man zip
man unzip
Otherwise, there may be better places to ask the question.
Unfortunately, I do not know where to ask the question either.
I know that some ISPs scan ZIP files for viruses. I don't remember which
ones. I think some corporations do it too.
I do not think I have been very helpful.
--
Felix Tilley
Rank: Capt
Fanatic Lartvocate
FL# 555-LART
| |
| Bjørn Augestad 2004-01-27, 6:34 pm |
| Jem Berkes wrote:
quote:
> I develop a GNU GPL'd mail tool for UNIX that filters attachments by
> detecting dangerous filenames. It works precisely as expected, but one
> situation I'm trying to find a solution for is the case where dangerous
> filenames are compressed inside an attached ZIP file.
>
> I realize that there are libraries for ZIP handling, but I prefer to avoid
> such thorough external libraries when all I really need to do is peek
> inside the ZIP header to read the filenames contained within.
>
> Is this a relatively easy thing to do? Can anyone point me to any documents
> that can help me learn what the header looks like? (I'm guessing there are
> various versions of it).
>
Google is your friend. ;-)
I googled for "zip file format" and found this document.
http://www.pkware.com/products/ente...rs/appnote.html
HTH
Bjørn
--
The worlds fastest web server is now available
at http://highlander.metasystems.no:2000. Enjoy!
| |
| Fredrik Roubert 2004-01-28, 1:36 am |
| On 27 Jan 2004 00:34:09 GMT, Jem Berkes wrote:
quote:
> I realize that there are libraries for ZIP handling, but I prefer to avoid
> such thorough external libraries when all I really need to do is peek
> inside the ZIP header to read the filenames contained within.
>
> Is this a relatively easy thing to do? Can anyone point me to any documents
> that can help me learn what the header looks like? (I'm guessing there are
> various versions of it).
Parsing the ZIP directory is quite easy, and I see that you've already
got pointers to documentation. If you want sample code as well, you can
download the zlib source code, and look at the program minizip in the
contrib directory:
http://www.gzip.org/zlib/
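To give an idea of how little is involved, here is a minimal Python sketch of reading the directory by hand: find the end-of-central-directory record near the tail of the file, then step through the fixed 46-byte directory entries, each followed by its name. It assumes the record layouts from PKWARE's APPNOTE, a single-volume archive without ZIP64 extensions, and names in the DOS code page. The function name is made up for illustration and this is not how minizip itself is written.

```python
import struct

EOCD = struct.Struct("<4sHHHHIIH")           # end of central directory, 22 bytes
CDIR = struct.Struct("<4sHHHHHHIIIHHHHHII")  # central directory entry, 46 bytes

def list_zip_names(data):
    """Return the filenames recorded in the central directory of `data`."""
    # The end record sits at the tail, followed only by an optional comment
    # of at most 65535 bytes, so search backwards within that window.
    pos = data.rfind(b"PK\x05\x06", max(0, len(data) - EOCD.size - 65535))
    if pos < 0 or len(data) - pos < EOCD.size:
        raise ValueError("end of central directory record not found")
    (_sig, _disk, _cd_disk, _n_here, n_total,
     _cd_size, cd_offset, _comment_len) = EOCD.unpack_from(data, pos)

    names = []
    offset = cd_offset
    for _ in range(n_total):
        if offset + CDIR.size > len(data):
            raise ValueError("central directory runs past end of data")
        fields = CDIR.unpack_from(data, offset)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("bad central directory signature")
        name_len, extra_len, comment_len = fields[10:13]
        start = offset + CDIR.size
        names.append(data[start:start + name_len].decode("cp437"))
        # Skip the variable-length name, extra field and comment.
        offset = start + name_len + extra_len + comment_len
    return names
```

The same fields tuple also holds the compression method (index 4) and the offset of the entry's local header (index 16), if more than the names is needed.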
Cheers // Fredrik Roubert
--
Möllevångsvägen 6c | +46 46 188127
SE-222 40 Lund | http://www.df.lth.se/~roubert/
| |
| Frank Pilhofer 2004-01-28, 1:35 pm |
| Jem Berkes <jb@users.pc9.org> wrote:
quote:
>
> Is this a relatively easy thing to do? Can anyone point me to any documents
> that can help me learn what the header looks like? (I'm guessing there are
> various versions of it).
>
www.wotsit.org
Frank
--
Frank Pilhofer ........................................... fp@fpx.de
How is it that people looking for a helping hand tend to overlook
the one at the end of their arm? - Alfred E. Neuman
| |
| Jem Berkes 2004-01-28, 2:35 pm |
| quote:
>> Is this a relatively easy thing to do? Can anyone point me to any
>
> www.wotsit.org
Thanks Frank, well that's a neat site and I can't believe I never ran
across it before. Bookmarked it now, probably will refer to it frequently
in the future!
--
Jem Berkes
http://www.sysdesign.ca/
| |
| Jem Berkes 2004-01-28, 3:35 pm |
| quote:
>>> Is this a relatively easy thing to do? Can anyone point me to any
>
> Thanks Frank, well that's a neat site and I can't believe I never ran
> across it before. Bookmarked it now, probably will refer to it
> frequently in the future!
This also makes me remember another advantage of coding my own
implementations of specifications (as opposed to using an existing
library). In renattach I use my own MIME parser which is more forgiving to
specification violations, which viruses often take advantage of.
For instance, I think there is at least one worm that distributed a corrupt
ZIP file. Does anyone have a copy of such a thing? I'm interested in this
because the virus author might be sending a deliberately corrupt file in order
to get past overly strict parsers, if you see what I'm getting at.
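One way to be forgiving here is to ignore the archive's overall structure and simply collect a name from every byte sequence that looks like a ZIP header. A minimal Python sketch of that approach, assuming the standard signatures (PK\x03\x04 for local headers, PK\x01\x02 for directory entries) and field offsets from PKWARE's APPNOTE. The function name and the 1024-byte cap on name length are arbitrary choices for illustration, and this is not renattach's actual code.

```python
import struct

def scan_for_names(data, max_name=1024):
    """Collect filenames from anything that looks like a ZIP header.

    The end-of-archive record is never consulted, so names are still
    found in truncated or deliberately damaged files.
    """
    # signature -> (offset of the name-length field, size of the fixed header)
    layouts = {b"PK\x03\x04": (26, 30),   # local file header
               b"PK\x01\x02": (28, 46)}   # central directory entry
    names = []
    pos = data.find(b"PK")
    while pos != -1:
        layout = layouts.get(data[pos:pos + 4])
        if layout and pos + layout[1] <= len(data):
            (name_len,) = struct.unpack_from("<H", data, pos + layout[0])
            raw = data[pos + layout[1]:pos + layout[1] + name_len]
            # Skip implausible hits, e.g. "PK" bytes inside compressed data.
            if 0 < name_len <= max_name and len(raw) == name_len:
                name = raw.decode("cp437")
                if name not in names:
                    names.append(name)
        pos = data.find(b"PK", pos + 2)
    return names
```

The trade-off is the occasional false positive from header-like bytes inside file contents, which for a filter that blocks dangerous names errs on the cautious side.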
--
Jem Berkes
http://www.sysdesign.ca/