Ps


Table of Contents

PreScript 
Basic Info 
Help 
Version 2.2 
Other PostScript Converters 

PreScript 

Basic Info 

Usage 

prescript format input [output]
  • format is either plain or html.
  • input is the input filename, a PostScript file.
  • output is the output filename. By default, it is the input filename with the path removed and the suffix replaced by either .txt or .html.
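
The default naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not PreScript's actual code; the function name default_output_name and the SUFFIXES table are hypothetical.

```python
import os

# Output suffix for each format named in the usage text.
SUFFIXES = {"plain": ".txt", "html": ".html"}


def default_output_name(input_path, fmt):
    """Return the default output filename (hypothetical helper)."""
    if fmt not in SUFFIXES:
        raise ValueError("format must be 'plain' or 'html'")
    base = os.path.basename(input_path)   # remove the path
    stem, _old_suffix = os.path.splitext(base)  # remove the old suffix
    return stem + SUFFIXES[fmt]
```

Under these assumptions, an input of /home/user/papers/thesis.ps with format plain would give thesis.txt, written relative to the current directory because the path is dropped.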

Info 

PreScript is a utility for extracting text from PostScript files.

Related Urls 

Description 

Features 

PreScript offers:

PostScript conversion to plain ASCII or HTML
PreScript is really a PostScript to plain text converter, but rudimentary HTML can also be produced. Tags are inserted to mark paragraphs (<p>), short lines (<br>), page breaks (<hr>), and headers and footers (italicized with <i>…</i>).
Paragraph boundary detection
PreScript determines the line spacing of a document and uses it, together with indentation, to determine paragraph boundaries.
Hyphenation removal
Words hyphenated across a line break are rejoined.
Ligature translation
Most ligatures used by TeX documents are detected. PreScript does not track font changes, which makes it impossible to detect all ligatures reliably.
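
To make the paragraph and hyphenation features more concrete, here is a minimal Python sketch of the general idea. It is not PreScript's algorithm. It assumes each line is already available as a (y, x, text) tuple, with y measured down the page and x the left indent, and the names split_paragraphs, join_lines, and the slack factor are all hypothetical.

```python
from collections import Counter


def split_paragraphs(lines, slack=1.5):
    """Group (y, x, text) lines into lists of text, one list per paragraph."""
    if not lines:
        return []
    gaps = [round(b[0] - a[0], 1) for a, b in zip(lines, lines[1:])]
    # Assumption: the most common gap is the document's normal line spacing.
    normal = Counter(gaps).most_common(1)[0][0] if gaps else 0
    paragraphs = [[lines[0][2]]]
    for prev, cur in zip(lines, lines[1:]):
        wide_gap = (cur[0] - prev[0]) > normal * slack
        indented = cur[1] > prev[1]
        if wide_gap or indented:
            paragraphs.append([cur[2]])      # start a new paragraph
        else:
            paragraphs[-1].append(cur[2])    # continue the current one
    return paragraphs


def join_lines(texts):
    """Join the lines of one paragraph, rejoining words split by a hyphen."""
    out = ""
    for text in texts:
        text = text.strip()
        if out.endswith("-"):
            out = out[:-1] + text            # drop the hyphen, glue the halves
        elif out:
            out += " " + text
        else:
            out = text
    return out
```

The hyphen rule here is deliberately naive: a genuine compound such as "well-known" that happens to break at the hyphen would lose it. Likewise, treating any increase in indent as a paragraph start would misread hanging indents.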

Releases 

The PreScript 0.1 distribution
This distribution is the most stable; it is what you should use to do real work.
The PreScript 2 distribution
This is a beta release of our latest version.

Comments 

Help 

Support 

Quick Help 

Detail Help 

Version 2.2 

This is a beta release of our latest version. This version is a lot cleaner and faster; it is also extensible (users can write their own renderers), better documented, and better at predicting line, paragraph, and page breaks.

File size 

Dependencies 

PreScript is written in PostScript and Python. You will need Ghostscript (version 4.01 or later) and the Python interpreter (version 1.4 or later).
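
The version requirements above amount to a simple comparison of dotted version numbers. The sketch below shows one way to do that in Python. It assumes the version strings have already been obtained and contain only digits and dots; the helper names parse_version and meets_minimum are hypothetical and not part of PreScript.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(found, minimum):
    """Return True if version string found is at least minimum."""
    return parse_version(found) >= parse_version(minimum)
```

Comparing tuples of integers avoids the trap of comparing strings, where "10.0" would sort before "4.01". Version strings with letters in them (for example a beta suffix) would need extra handling that this sketch leaves out.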

Other PostScript Converters 

Here is a summary of other PostScript to text converters we found.

pstotext
From the DEC Virtual Paper research project. Consists of a PostScript program and a C program. Probably the best PostScript to text converter (after PreScript, of course).
ps2html, The Sequel
Developed at Johns Hopkins University to convert JHU journal articles to HTML. This converter attempts to preserve the formatting of the original PostScript document, but is tied to PostScript files generated with a specific package (QuarkXPress?). A table describing a number of parameters is used to aid conversion and can be modified for new formats. Uses a variation of Ghostscript's ps2ascii.ps.
ps2ascii.ps
Part of the Ghostscript distribution. ps2ascii.ps is considerably less robust than PreScript.
ps2a.sh
A PostScript program similar to Ghostscript's ps2ascii.ps.
ps2ascii.shar
A PostScript program and Perl script.
ps2ascii.pl
A Perl script that extracts parenthesized text from a PostScript file.
ps2txt
A standalone C program that extracts parenthesized text. Includes some special code to deal with dvips-generated files.
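
For readers unfamiliar with the "parenthesized text" approach used by the last two tools, the following Python sketch shows the basic idea: scan the PostScript source and collect the contents of (...) string literals, honoring nested parentheses, backslash escapes, and % comments. It is an illustration written for this page, not code from ps2ascii.pl or ps2txt, and the function name extract_strings is hypothetical. Octal escapes are mapped straight to code points, which assumes a Latin-1-like font encoding.

```python
def extract_strings(ps_source):
    """Return the contents of every (...) string literal in ps_source."""
    simple = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f"}
    strings, buf = [], []
    depth = 0                      # parenthesis nesting; 0 means outside a string
    i, n = 0, len(ps_source)
    while i < n:
        ch = ps_source[i]
        if depth == 0:
            if ch == "%":          # comment: skip to the end of the line
                while i < n and ps_source[i] not in "\r\n":
                    i += 1
                continue
            if ch == "(":          # start of a string literal
                depth, buf = 1, []
            i += 1
            continue
        if ch == "\\" and i + 1 < n:
            nxt = ps_source[i + 1]
            if nxt in "01234567":  # octal escape, one to three digits
                j = i + 1
                while j < n and j < i + 4 and ps_source[j] in "01234567":
                    j += 1
                buf.append(chr(int(ps_source[i + 1:j], 8) & 0xFF))
                i = j
                continue
            if nxt != "\n":        # backslash-newline is a line continuation
                buf.append(simple.get(nxt, nxt))
            i += 2
            continue
        if ch == "(":
            depth += 1             # balanced inner parentheses are literal text
        elif ch == ")":
            depth -= 1
            if depth == 0:         # end of the outermost string
                strings.append("".join(buf))
                i += 1
                continue
        buf.append(ch)
        i += 1
    return strings
```

This recovers raw string fragments only. It does nothing about word spacing, line breaks, hex strings written as <...>, or text produced by procedures, which is where a converter that actually interprets the PostScript has the advantage.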