[TriLUG] PDF to text
Owen Berry
oberry at trilug.org
Wed Feb 1 14:04:43 EST 2006
ps2ascii looks like it might work. The files may be password protected,
though (I have the password), and I couldn't get ps2ascii to work on a
protected file. The other downside is that I would have to install
Ghostscript. :-(
Thanks for the suggestion.
Owen
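
For the password-protected case, a minimal sketch follows. It assumes a
Ghostscript build that ships the txtwrite device and accepts the
-sPDFPassword switch; neither was tried in this thread. pdf_to_text and
DRY_RUN are hypothetical names made up for the illustration. Unprotected
files still go through plain ps2ascii, as suggested below.

```shell
#!/bin/sh
# pdf_to_text: hypothetical helper, not part of Ghostscript.
# Usage: pdf_to_text file.pdf [password]
# Set DRY_RUN=1 to print the command instead of running it.
pdf_to_text() {
    pdf=$1
    password=${2:-}

    if [ -z "$password" ]; then
        # Unprotected file: ps2ascii can read the PDF directly.
        set -- ps2ascii "$pdf"
    else
        # Protected file: call gs directly so the password can be passed.
        # Assumption: this gs build includes the txtwrite device.
        set -- gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=txtwrite \
            -sOutputFile=- "-sPDFPassword=$password" "$pdf"
    fi

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Show the command line that would be executed.
        echo "$*"
    else
        "$@"
    fi
}
```

Usage would be along the lines of "pdf_to_text pdffile.pdf mypassword >
pdffile.txt". Note that a password given on the command line is visible
to other users in the process list while gs runs.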
On Wed, 2006-02-01 at 13:26 -0500, Christopher J. Knowles wrote:
> On Wednesday 01 February 2006 12:01, Ian Kilgore wrote:
> > Owen Berry wrote:
> > | Anyone know of a command line utility for extracting text from a pdf
> > | file, other than the one included in xpdf (pdftotext)? pdftotext does
> > | exactly what I want, but I would like to avoid pulling in the rest of
> > | xpdf, if possible, as this is for a server.
> > |
> > | BTW, I'm using it combined with the perlfect search engine, so the text
> > | does not need to be formatted nicely or anything.
> > |
> > | Thanks,
> > | Owen
> >
> > You can pipe pdf2ps | ps2ascii
>
> Or, according to the ps2ascii manpage (and some quick experimentation), you
> can just "ps2ascii pdffile.pdf > pdffile.txt"
>
> (When I just tried the pdf2ps | ps2ascii, it gave me a blank... while just
> running it through ps2ascii seems to work.)
>
> CJK