[TriLUG] PDF to text

Joseph E. ODoherty joey at odoherty.net
Wed Feb 1 18:53:09 EST 2006


On Wed, Feb 01, 2006 at 02:04:43PM -0500, Owen Berry wrote:
> which case I couldn't get ps2ascii to work. The downside is that I would
> have to install ghostscript. :-(

Well, you have to render the PDF/PostScript _somehow_!

If you really need something small and unobtrusive you could statically
compile ps2ascii yourself and throw away all the parts you don't need.

> On Wed, 2006-02-01 at 13:26 -0500, Christopher J. Knowles wrote:
> > On Wednesday 01 February 2006 12:01, Ian Kilgore wrote:
> > > Owen Berry wrote:
> > > | Anyone know of a command line utility for extracting text from a pdf
> > > | file, other than the one included in xpdf (pdftotext)? pdftotext does
> > > | exactly what I want, but I would like to avoid pulling in the rest of
> > > | xpdf, if possible, as this is for a server.
> > > |
> > > | BTW, I'm using it combined with the perlfect search engine, so the text
> > > | does not need to be formatted nicely or anything.
> > > |
> > > | Thanks,
> > > | Owen
> > >
> > > You can pipe pdf2ps | ps2ascii
> > 
> > Or, according to the ps2ascii manpage (and some quick experimentation), you can
> > just "ps2ascii pdffile.pdf > pdffile.txt"
> > 
> > (When I just tried the pdf2ps | ps2ascii, it gave me a blank... while just 
> > running it through ps2ascii seems to work.)
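
For what it's worth, the two approaches quoted above could be wrapped up
roughly like this. It is only a sketch: pdf_to_text is a made-up helper
name, I'm assuming your pdf2ps accepts "-" to write PostScript to stdout,
and both tools still need ghostscript underneath.

```shell
#!/bin/sh
# pdf_to_text INPUT.pdf OUTPUT.txt   (hypothetical helper, not a real tool)
# Try ps2ascii directly on the PDF first; if that fails or produces an
# empty file, fall back to pdf2ps piped into ps2ascii.
pdf_to_text() {
    src=$1
    dst=$2
    # Direct route: let ps2ascii read the PDF itself.
    if ps2ascii "$src" > "$dst" 2>/dev/null && [ -s "$dst" ]; then
        return 0
    fi
    # Fallback route: assumes "-" makes pdf2ps write to stdout.
    pdf2ps "$src" - 2>/dev/null | ps2ascii > "$dst" 2>/dev/null
    # Succeed only if some text actually came out.
    [ -s "$dst" ]
}
```

You would call it as "pdf_to_text pdffile.pdf pdffile.txt" and check the
exit status; a non-zero status means neither route produced any text.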

-- 
pub  1024D/B663781B 2001-11-13 Joey O'Doherty <joey(at)odoherty(dot)net>
     Key fingerprint = F76B 9ACA 4197 C707 6E4D  2B78 E430 101A B663 781B

     The sad fact is that "national security" has become the root password 
     to the Constitution. -- Phil Karn


