*Tags*: ps to text, ps2text
pstotext is a program that works with Ghostscript to extract plain text from PostScript and PDF files.
pstotext works by sending a library, followed by the PostScript file, to the Ghostscript interpreter. The library intercepts the text rendering operators and sends information about the text back to pstotext. This information includes character metrics and encoding vectors, so in most situations we're able to reconstruct the plain text (converted to ISO Latin 1 encoding), with correct word breaks and good guesses about line breaks. It even works for rotated text!
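The idea of recovering word breaks from character metrics can be illustrated with a small sketch. This is not pstotext's actual algorithm: the `Glyph` record, the `join_glyphs` function, and the `gap_ratio` threshold are hypothetical names chosen for illustration. It assumes each character arrives with a left edge and an advance width, and inserts a space whenever the horizontal gap to the next character is large relative to the previous character's width.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    """One rendered character (hypothetical record, not pstotext's format)."""
    char: str
    x: float      # left edge of the character on the page
    width: float  # advance width taken from the font metrics


def join_glyphs(glyphs: List[Glyph], gap_ratio: float = 0.3) -> str:
    """Rebuild a line of text, adding a space where the gap is wide.

    gap_ratio is an assumed tuning value: a gap wider than this fraction
    of the previous character's width is treated as a word break.
    """
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > gap_ratio * prev.width:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


if __name__ == "__main__":
    line = [
        Glyph("h", 0, 5), Glyph("i", 5, 5),
        Glyph("y", 15, 5), Glyph("o", 20, 5),
    ]
    print(join_glyphs(line))
```

A real extractor would also need to handle rotated text and varying font sizes, which this sketch ignores.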
Home page: http://www.research.compaq.com/SRC/virtualpaper/pstotext.html
Download: http://www.research.compaq.com/SRC/virtualpaper/cgi-bin/nph-download.tcl/pstotext.tar.Z?object=pstotext
Usage: pstotext [option|file]...
Options:
  -cork            assume Cork encoding for dvips output
  -landscape       rotate 270 degrees
  -landscapeOther  rotate 90 degrees
  -portrait        don't rotate (default)
  -bboxes          output one word per line with bounding box
  -debug           show Ghostscript output and error messages
  -gs "command"    Ghostscript command
  -                read from stdin (default if no files specified)
  -output file     output results to "file" (default is stdout)
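For scripted use, the options above can be assembled into a command line and run from Python. The following is a minimal sketch, assuming `pstotext` and Ghostscript are installed and on the PATH. The helper names `build_pstotext_command` and `run_pstotext` are hypothetical; only the flags themselves come from the usage text. The output is decoded as ISO Latin 1, matching the encoding pstotext is described as producing.

```python
import subprocess
from typing import List, Optional

# Orientation flags taken from the usage text above.
_ORIENTATION_FLAGS = {
    "portrait": "-portrait",
    "landscape": "-landscape",
    "landscapeOther": "-landscapeOther",
}


def build_pstotext_command(
    input_file: Optional[str] = None,
    output_file: Optional[str] = None,
    orientation: str = "portrait",
    cork: bool = False,
    bboxes: bool = False,
    debug: bool = False,
    gs_command: Optional[str] = None,
    executable: str = "pstotext",  # assumed to be on the PATH
) -> List[str]:
    """Build an argument list for pstotext (hypothetical helper)."""
    if orientation not in _ORIENTATION_FLAGS:
        raise ValueError(f"unknown orientation: {orientation!r}")
    argv = [executable, _ORIENTATION_FLAGS[orientation]]
    if cork:
        argv.append("-cork")
    if bboxes:
        argv.append("-bboxes")
    if debug:
        argv.append("-debug")
    if gs_command is not None:
        argv += ["-gs", gs_command]
    if output_file is not None:
        argv += ["-output", output_file]
    # "-" tells pstotext to read the document from stdin.
    argv.append(input_file if input_file is not None else "-")
    return argv


def run_pstotext(input_file: str, **options) -> str:
    """Run pstotext on a file and return the extracted text from stdout."""
    argv = build_pstotext_command(input_file, **options)
    result = subprocess.run(argv, capture_output=True, check=True)
    return result.stdout.decode("latin-1")
```

For example, `run_pstotext("paper.ps", orientation="landscape")` would return the extracted text of a hypothetical `paper.ps` as a string, provided `output_file` is left unset so the text goes to stdout.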