====== PreScript ======

PreScript is a utility for extracting text from PostScript files. PreScript offers:

**PostScript conversion to plain ASCII or HTML.**\\
PreScript is primarily a PostScript to plain text converter, but it can also produce rudimentary HTML. Tags are inserted to mark elements such as paragraphs, short lines, and page breaks.

Usage: ''prescript format input [output]''

* format is either plain or html.
* input is the input filename, a PostScript file.
* output is the output filename. By default, the output filename is the input filename with the path removed and the suffix replaced by either .txt or .html.
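
The default output name rule above can be sketched in Python. This is a minimal illustration of the rule as described here, not PreScript's own code: the function name ''default_output_name'' and the ''SUFFIXES'' table are hypothetical, and the sketch assumes that "suffix" means the final extension of the input filename.

```python
import os

# Hypothetical mapping from the two documented formats to their suffixes.
SUFFIXES = {"plain": ".txt", "html": ".html"}


def default_output_name(fmt: str, input_path: str) -> str:
    """Derive an output filename from the input path, following the rule above."""
    if fmt not in SUFFIXES:
        raise ValueError("format must be 'plain' or 'html'")
    # Remove the directory part of the path.
    base = os.path.basename(input_path)
    # Drop the final extension (assumption: only the last suffix is replaced).
    stem, _old_suffix = os.path.splitext(base)
    return stem + SUFFIXES[fmt]


if __name__ == "__main__":
    print(default_output_name("plain", "/tmp/papers/report.ps"))
    print(default_output_name("html", "report.ps"))
```

Under these assumptions, running PreScript on ''/tmp/papers/report.ps'' with the plain format and no explicit output name would write ''report.txt'' in the current directory.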
=====Notes=====
PreScript is a port of a Perl program used by the New Zealand Digital Library project to convert computer science technical reports to HTML. The Perl version is deemed unfit for public release because the code is quite messy (a consequence of Perl's cumbersome syntax for defining objects). The Python version is considerably easier to understand, maintain, and extend. The technical paper [[http://files.greenstone.org/software/prescript/prescript.ps.gz|prescript.ps.gz]] documents the algorithms and heuristics used in PreScript 0.1. An updated version of the paper for PreScript 2 is included in its distribution archive.
=====Other PostScript Converters=====
Here is a summary of other PostScript to text converters we found.
**[[http://www.research.digital.com/SRC/virtualpaper/pstotext.html|pstotext]]**\\
From the DEC Virtual Paper research project. PostScript program and C program. Probably the best PostScript to text converter (after PreScript, of course). \\
**[[http://stasi.bradley.edu/ftp/pub/ps2html/ps2html-v2.html|ps2html, The Sequel]]**\\
Developed at Johns Hopkins University to convert JHU journal articles to HTML. This converter attempts to preserve the formatting of the original PostScript document, but is tied to PostScript files generated with a specific package (QuarkXPress?). A table describing a number of parameters is used to aid conversion and can be modified for new formats. Uses a variation of Ghostscript's ps2ascii.ps. \\
**ps2ascii.ps**\\
Part of the Ghostscript distribution. ps2ascii.ps is considerably less robust than PreScript. \\
**[[ftp://ftp.mpce.mq.edu.au/pub/comp/src/ps2a.sh|ps2a.sh]]**\\
A PostScript program similar to Ghostscript's ps2ascii.ps. \\
**[[ftp://apocalypse.engr.ucf.edu/usr/ssd/ps2ascii.shar|ps2ascii.shar]]**\\
A PostScript program and Perl script. \\
**[[ftp://wilma.cs.brown.edu/pub/postscript/ps2ascii.pl|ps2ascii.pl]]**\\
A Perl script that extracts parenthesized text from a PostScript file. \\
**[[ftp://ftp.funet.fi/pub/archive/alt.sources/volume92/Feb/920223.01.gz|ps2txt]]**\\
A standalone C program that extracts parenthesized text. Includes some special code to handle dvips-generated files. \\
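
Several of the converters above work by extracting parenthesized text, that is, the ''(...)'' string literals in the PostScript source. The following Python sketch illustrates that general technique. It is not the code of ps2ascii.pl or ps2txt, and the function name ''extract_ps_strings'' is hypothetical. It handles nested parentheses, backslash escapes, and ''%'' comments, but ignores hexadecimal strings, font encodings, and text positioning, which is one reason this approach is fragile on real documents.

```python
def extract_ps_strings(data: str) -> list[str]:
    """Return the contents of top-level (...) string literals in PostScript source."""
    escapes = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f"}
    strings = []
    current = []
    depth = 0  # nesting depth of parentheses inside the current string
    i = 0
    while i < len(data):
        ch = data[i]
        if depth == 0:
            if ch == "%":
                # Outside a string, '%' starts a comment that runs to end of line.
                while i < len(data) and data[i] not in "\r\n":
                    i += 1
                continue
            if ch == "(":
                depth = 1
                current = []
        elif ch == "\\" and i + 1 < len(data):
            nxt = data[i + 1]
            if nxt in "01234567":
                # Octal escape: one to three octal digits.
                j = i + 1
                while j < len(data) and j < i + 4 and data[j] in "01234567":
                    j += 1
                current.append(chr(int(data[i + 1:j], 8) & 0xFF))
                i = j
                continue
            if nxt != "\n":
                # Named escape, or an escaped literal such as \( \) \\.
                current.append(escapes.get(nxt, nxt))
            # A backslash followed by a newline is a line continuation: emit nothing.
            i += 2
            continue
        elif ch == "(":
            # Balanced parentheses may appear unescaped inside a string.
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth -= 1
            if depth == 0:
                strings.append("".join(current))
            else:
                current.append(ch)
        else:
            current.append(ch)
        i += 1
    return strings


if __name__ == "__main__":
    sample = "72 720 moveto\n(Hello) show\n( wor\\(ld\\)) show\n% (ignored)\n(a(b)c) show\n"
    print(extract_ps_strings(sample))
```

The result is a list of raw string fragments in the order they appear in the file. Turning those fragments into readable text still requires deciding where words, lines, and paragraphs begin and end.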