Script to read pdf files

Claus Dormeier 2005-04-25, 2:52 am
I am looking for a shell script that can read the content of a PDF file.
Is there a special script, or can I use standard shell commands?
Thanks, Claus

Alan Connor 2005-04-25, 7:55 am
On comp.unix.shell, in
<d6bb8e19.0504242338.100fa73a@posting.google.com>, "Claus
Dormeier" wrote:
> I am looking for a shell script which is able to
> read the content of a pdf file. Is there a special
> script, or might I use standard shell commands.
> Thanks, Claus
Normally, you'd use a PDF previewer, like gv, but you
probably know that.
Assuming you want to pull the text out of a PDF file for
further processing, there's this:
ps2ascii (1) - Ghostscript translator from PostScript
or PDF to ASCII
The output is not very good: it is basically unformatted ASCII.
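A minimal sketch of using ps2ascii from a script, assuming Ghostscript's ps2ascii is on the PATH and is called as "ps2ascii input.pdf output.txt". The helper name pdf_to_text is hypothetical. It writes foo.txt next to each foo.pdf given on the command line.

```shell
#!/bin/sh
# Extract the text of each PDF named on the command line with ps2ascii
# (shipped with Ghostscript), writing foo.txt next to foo.pdf.
# pdf_to_text is a hypothetical helper name, not an existing command.

pdf_to_text() {
    pdf=$1
    # Only accept .pdf/.PDF names so the .txt output name is predictable.
    case $pdf in
        *.pdf|*.PDF) ;;
        *) echo "pdf_to_text: not a .pdf file: $pdf" >&2; return 1 ;;
    esac
    if [ ! -r "$pdf" ]; then
        echo "pdf_to_text: cannot read $pdf" >&2
        return 1
    fi
    if ! command -v ps2ascii >/dev/null 2>&1; then
        echo "pdf_to_text: ps2ascii not found (is Ghostscript installed?)" >&2
        return 1
    fi
    # ps2ascii takes an input file and an optional output file;
    # ${pdf%.*} strips the .pdf extension before appending .txt.
    ps2ascii "$pdf" "${pdf%.*}.txt"
}

for f in "$@"; do
    pdf_to_text "$f"
done
```

Saved as, say, pdf2txt.sh (a hypothetical name), running "sh pdf2txt.sh *.pdf" would produce one .txt file per PDF, with the same unformatted output described above.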
Here's another, this being the description of a Debian
package.
Package: pstotext
Installed-Size: 86
Maintainer: J.H.M. Dassen (Ray) <jdassen@debian.org>
Architecture: i386
Version: 1.8g-4
Depends: gs | gs-aladdin (>= 3.51), libc6 (>= 2.2.4-4)
Size: 30814
Description: Extract text from PostScript and PDF files. pstotext
extracts text (in the ISO 8859-1 character set) from a PostScript
or PDF (Portable Document Format) file. Thus, pstotext is similar
to the ps2ascii program that comes with ghostscript. The output
of pstotext is however better than that of ps2ascii, because
pstotext deals better with punctuation and ligatures.
endquote.
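Since pstotext is described as giving better output than ps2ascii, a script can prefer it and fall back to ps2ascii. This is a sketch under the assumption that both tools, when given only an input file, write the extracted text to standard output. The function name pdf_text is hypothetical.

```shell
#!/bin/sh
# Print the text of a PDF on standard output, preferring pstotext
# (better punctuation/ligature handling, per its Debian description)
# and falling back to Ghostscript's ps2ascii.
# pdf_text is a hypothetical helper name.

pdf_text() {
    if [ ! -r "$1" ]; then
        echo "pdf_text: cannot read $1" >&2
        return 1
    fi
    if command -v pstotext >/dev/null 2>&1; then
        pstotext "$1"          # assumed to write to stdout by default
    elif command -v ps2ascii >/dev/null 2>&1; then
        ps2ascii "$1"          # no output file given, so output goes to stdout
    else
        echo "pdf_text: neither pstotext nor ps2ascii is installed" >&2
        return 1
    fi
}
```

Because the text goes to stdout, it combines with standard shell commands, for example "pdf_text report.pdf | grep -i invoice".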
Hope this helps, Claus,
AC

Michael Vilain 2005-04-25, 5:53 pm
In article <d6bb8e19.0504242338.100fa73a@posting.google.com>,
1948@mailinator.com (Claus Dormeier) wrote:
> I am looking for a shell script which is able to read the content of a pdf
> file.
> Is there a special script, or might I use standard shell commands.
> Thanks, Claus
Download Acrobat Reader for your OS and install it. It can read the
PDF format. Or use a Perl module from CPAN:
http://search.cpan.org/~antro/PDF-111/PDF.pm
--
DeeDee, don't press that button! DeeDee! NO! Dee...