Unix Shell - Script to read pdf files


Author Script to read pdf files
Claus Dormeier

2005-04-25, 2:52 am

I am looking for a shell script which is able to read the content of a PDF file.
Is there a special script, or might I use standard shell commands?
Thanks, Claus
Alan Connor

2005-04-25, 7:55 am

On comp.unix.shell, in
<d6bb8e19.0504242338.100fa73a@posting.google.com>, "Claus
Dormeier" wrote:

> I am looking for a shell script which is able to
> read the content of a pdf file. Is there a special
> script, or might I use standard shell commands.
> Thanks, Claus


Normally, you'd use a PDF previewer, like gv, but you
probably know that.

Assuming you want to pull the text out of a PDF file for
further processing, there's this:

ps2ascii (1) - Ghostscript translator from PostScript
or PDF to ASCII

The output is not very good: basically unformatted ASCII.

Here's another, this being the description of a Debian
package.

Package: pstotext
Installed-Size: 86
Maintainer: J.H.M. Dassen (Ray) <jdassen@debian.org>
Architecture: i386
Version: 1.8g-4
Depends: gs | gs-aladdin (>= 3.51), libc6 (>= 2.2.4-4)
Size: 30814

Description: Extract text from PostScript and PDF files. pstotext
extracts text (in the ISO 8859-1 character set) from a PostScript
or PDF (Portable Document Format) file. Thus, pstotext is similar
to the ps2ascii program that comes with ghostscript. The output
of pstotext is however better than that of ps2ascii, because
pstotext deals better with punctuation and ligatures.

endquote.
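
For what it's worth, here is a minimal sketch of how either tool could be wrapped for use in a script. The function name pdf_to_text and the file name report.pdf are made up for illustration, and it assumes both tools write the extracted text to standard output when given a single file argument. It tries pstotext first, since its description claims better output, and falls back to ps2ascii.

```shell
#!/bin/sh
# Sketch only: extract text from a PDF with whichever tool is installed.
# pdf_to_text is a hypothetical helper name, not part of any package.

pdf_to_text() {
    pdf=$1
    if [ ! -r "$pdf" ]; then
        echo "pdf_to_text: cannot read '$pdf'" >&2
        return 1
    fi
    # Prefer pstotext (better punctuation and ligatures), then ps2ascii.
    if command -v pstotext >/dev/null 2>&1; then
        pstotext "$pdf"
    elif command -v ps2ascii >/dev/null 2>&1; then
        ps2ascii "$pdf"
    else
        echo "pdf_to_text: neither pstotext nor ps2ascii found" >&2
        return 1
    fi
}

# Example use (report.pdf is a placeholder file name):
# pdf_to_text report.pdf | grep -i 'invoice'
```

Once the text is on standard output, the usual shell tools (grep, sed, awk, wc) can take over.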

Hope this helps, Claus,


AC


Michael Vilain

2005-04-25, 5:53 pm

In article <d6bb8e19.0504242338.100fa73a@posting.google.com>,
1948@mailinator.com (Claus Dormeier) wrote:

> I am looking for a shell script which is able to read the content of a pdf
> file.
> Is there a special script, or might I use standard shell commands.
> Thanks, Claus


Install Acrobat Reader for your OS. It can read the
PDF format. Or use a Perl CPAN module:

http://search.cpan.org/~antro/PDF-111/PDF.pm

--
DeeDee, don't press that button! DeeDee! NO! Dee...


